楼主: jieforest

SQL vs NoSQL KO. Postgres vs Mongo

[复制链接]
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
11#
 楼主| 发表于 2015-9-4 23:06 | 只看该作者
2.3.3 BSON VS JsonB

So how do our contenders perform? The first thing an astute reader will notice, is the similarity between JSONB and BSON, both are a json structure stored as binary internally. How do they differ? The first point of difference is that JSONB will output fully standards compliant JSON, as described by the RFC[9], while BSON has does not. This is, however a double edged sword. For example, JSONB does not support a native binary type unlike BSON, nor, more pertinently a date type.

2.3.4 Performance Comparison

As demonstrated here: Postgres is faster, uses less memory on disk, and is all around more performant for JSON storage and reads then Mongo.[10]
Winner: Postgres. By technical Victory. Postgres is faster, less memory intensive, and more standards compliant then MongoDB. However, if you require some of the intrinsic type checking of BSON, and this type checking must be done in a denormalized manner, rather then by table columns, or if your need to do complex access updates on a JSON attribute, Mongo ekes out a victory. Once Postgres 9.5 ships, the latter seized to be a concern.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
12#
 楼主| 发表于 2015-9-4 23:07 | 只看该作者
2.4 Complex Model Relations, Access Patterns, and Normalized Data

2.4.1 Normalized Data in Postgres

This is more or less what postgres does. It allows you to encode data relationships into tables, using foreign keys to encode relationships between tables. It allows you to join between tables to pull in data from across table boundaries[11]. For example:
  1. CREATE TABLE USERS( id SERIAL PRIMARY KEY, organization_id INTEGER, name TEXT );
  2. CREATE TABLE ORGANIZATIONS( id SERIAL PRIMARY KEY, name TEXT );
复制代码
Like learning from posts like this? Subscribe for more!

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
13#
 楼主| 发表于 2015-9-4 23:08 | 只看该作者
Now to select we can either:
  1. SELECT USERS.* FROM USERS, ORGANIZATIONS WHERE USERS.organization_id = ORGANIZATIONS.id AND ORGANIZATIONS.name LIKE '%bar%';
复制代码
Like learning from posts like this? Subscribe for more!

Or
  1. SELECT USERS.* FROM USERS INNER JOIN ORGANIZATIONS ON USERS.organization_id = ORGANIZATIONS.id WHERE ORGANIZATIONS.name LIKE '%bar%';
复制代码
Like learning from posts like this? Subscribe for more!

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
14#
 楼主| 发表于 2015-9-4 23:08 | 只看该作者
2.4.2 Normalized Data in MongoDB

Loosely speaking Mongo collections map to Postgres tables, while Mongo Documents map to Postgres Rows. It is important to note that MongoDB does not support Joins, forcing you to query for nested relationships directly if you choose to store either a direct key id or a DBRef, or letting you fetch directly from the nested object. While storing a nested object like so:
  1. db.createCollection('users');
  2. db.users.insert({name: 'jack', organization: {name: 'foo corp'}});
复制代码
Is a trivial solution, it is also denormalized, and from a business logic point of view, presumes that organization has no life, independently of user. What if this isn't true, or you would like to have normalized data? MongoDB allows two ways to achieve this:

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
15#
 楼主| 发表于 2015-9-4 23:09 | 只看该作者
1) DBRefs allow you to embed direct references to other documents in your documents. However, this will force additional queries to be run every time to fetch your referenced documents, and should (as per MongoDB's documentation) be avoided when possible

2)
  1. db.createCollection('users');
  2. db.createCollection('organizations');
  3. db.organizations.insert({name: 'foo corp'});
  4. db.organizations.insert({name: 'bar corp'});
  5. var foo = db.organizations.find({name: 'foo corp'});
  6. db.users.insert({name: "Jack", organization_id: foo});
复制代码
You can then use application level logic to extract the organization_id, fetch that organization separately, and join the data in the application. Note that, this bypasses any transaction logic you have built or use, unless your transaction logic is handled at the application level. Winner: Postgres By knockout.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
16#
 楼主| 发表于 2015-9-5 19:10 | 只看该作者
2.4.3 Performance Comparison

MongoDB also provides an aggregation framework that lets users mimic many of the JOIN functioanlities of the Relational world. A good (albeit somewhat biased in writing style) performance comparison is provided here[12]. TL;DR; Postgres is about three times faster at joins and aggregation than Mongo. Additionally, the MongoDB aggregation Pipeline can only handle a single collection. An important note here, is that this benchmark only really applies to a one machine database.

2.5 Scaling

There are fundamentally two types of scaling, horizontal scale, and vertical scale. Vertical scaling loosely means adding resources to a given machine. More RAM, more CPU cores. Horizontal Scale means multiple machines running your database.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
17#
 楼主| 发表于 2015-9-5 19:11 | 只看该作者
2.5.1 Scaling Postgres

As long as you can maintain Vertical Scale, Postgres scaling is trivial. You add more power to your machine, bump up the resources allocated to Postgres, and you're off to the races. But sooner or later this will hit a ceiling. Scaling Postgres Horizontally is significantly harder, but doable. There are several valid strategies to achieve this. At the core these are replication for reads (a master machine that allows writes, and multiple read only machines), and sharding. Sharding is a complex topic with multiple solutions, from application level load balancing, down to database level logic to store the shardid as part of the primary key as done at instagram[13] A detailed discussion of the approaches is out of scope of this article.

2.5.2 Scaling Mongo

MongoDB Supports Sharding at the technology level. When sharded, collections are partitioned by a shard key. Mongo's query routers can then identify the right shard to read from. A great resource to achieve good sharding, complete with best practices for balancing shard sizes, can be found in the documentation[14]

Winner: MongoDB. Technical Victory due to native sharding support and ease.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
18#
 楼主| 发表于 2015-9-5 19:11 | 只看该作者
2.6 Rapid Prototyping

So, you have investors breathing down your neck, and you owe a prototype yesterday. What technology do you choose for your data store? While Postgres seems to be the clear victor above, there are a few advantages to using Mongo:

1. As the store is schema-less, as your requirements rapidly change you do not need to continuously write migrations

2. You do not need to think through your data-model, ensuring normalization.

3. You do not need to write SQL, as The query language is JSON like, and will feel very familiar to anybody with Javascript experience.

4. It is probably fair to say that at this early juncture a lot of your data is of suboptimal importance, and your organization can survive its loss or corruption, thus the strong guarantees provided by Postgres are not necessary.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
19#
 楼主| 发表于 2015-9-5 19:12 | 只看该作者
The downside: All data is equally likely to be lost. If your organization deals with enterprise customers, or handles financial data, MongoDB is very simply not an option[15]. Additionally, while it is true that MongoDB is easier to scale down the road, Postgres is also scalable (if its good enough for Instagram....).

Victory: MongoDB, Technical Victory. Assuming you do not already have postgres and/or database expertise, MongoDB's simpler query interface and lack of requirements for schema migration/maintenance make it easier to rapidly prototype in. Just be aware that unless you fit a small set of niche use cases where individual, small scale dataloss is truly irrelevant (running large scale analytics on normally distributed datasets), you will eventually have to throw your database away and rewrite swathes of your application.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
20#
 楼主| 发表于 2015-9-5 19:13 | 只看该作者
3. Summary

Postgres comes out the clear victor of this fight. There are valid use cases for MongoDB such as reporting on and aggregating large datasets of normally distributed data, and storing TRULY denormalized data, data where relationships are mostly non existent, documents are large, mostly unstructured, and with little to no overlap, while data loss is largely irrelevant (Log Parsing and Caching come to mind). However as a general purpose database, Postgres is clearly the dominant fighter in this arena, and if some denormalized data is required, like say a set of optional parameters on a user (eye color, height, weight, hair color), Postgres' JSONB column is more than sufficient.

使用道具 举报

回复

您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

TOP技术积分榜 社区积分榜 徽章 团队 统计 知识索引树 积分竞拍 文本模式 帮助
  ITPUB首页 | ITPUB论坛 | 数据库技术 | 企业信息化 | 开发技术 | 微软技术 | 软件工程与项目管理 | IBM技术园地 | 行业纵向讨论 | IT招聘 | IT文档
  ChinaUnix | ChinaUnix博客 | ChinaUnix论坛
CopyRight 1999-2011 itpub.net All Right Reserved. 北京盛拓优讯信息技术有限公司版权所有 联系我们 未成年人举报专区 
京ICP备16024965号-8  北京市公安局海淀分局网监中心备案编号:11010802021510 广播电视节目制作经营许可证:编号(京)字第1149号
  
快速回复 返回顶部 返回列表