SQL vs NoSQL KO. Postgres vs Mongo

jieforest · 发表于 2015-9-4 23:06

2.3.3 BSON VS JsonB

So how do our contenders perform? The first thing an astute reader will notice, is the similarity between JSONB and BSON, both are a json structure stored as binary internally. How do they differ? The first point of difference is that JSONB will output fully standards compliant JSON, as described by the RFC[9], while BSON has does not. This is, however a double edged sword. For example, JSONB does not support a native binary type unlike BSON, nor, more pertinently a date type.

2.3.4 Performance Comparison

As demonstrated here: Postgres is faster, uses less memory on disk, and is all around more performant for JSON storage and reads then Mongo.[10]
Winner: Postgres. By technical Victory. Postgres is faster, less memory intensive, and more standards compliant then MongoDB. However, if you require some of the intrinsic type checking of BSON, and this type checking must be done in a denormalized manner, rather then by table columns, or if your need to do complex access updates on a JSON attribute, Mongo ekes out a victory. Once Postgres 9.5 ships, the latter seized to be a concern.

jieforest · 发表于 2015-9-4 23:07

2.4 Complex Model Relations, Access Patterns, and Normalized Data

2.4.1 Normalized Data in Postgres

This is more or less what postgres does. It allows you to encode data relationships into tables, using foreign keys to encode relationships between tables. It allows you to join between tables to pull in data from across table boundaries[11]. For example:

CREATE TABLE USERS( id SERIAL PRIMARY KEY, organization_id INTEGER, name TEXT );
CREATE TABLE ORGANIZATIONS( id SERIAL PRIMARY KEY, name TEXT );

复制代码

Like learning from posts like this? Subscribe for more!

jieforest · 发表于 2015-9-4 23:08

Now to select we can either:

SELECT USERS.* FROM USERS, ORGANIZATIONS WHERE USERS.organization_id = ORGANIZATIONS.id AND ORGANIZATIONS.name LIKE '%bar%';

复制代码

Like learning from posts like this? Subscribe for more!

Or

SELECT USERS.* FROM USERS INNER JOIN ORGANIZATIONS ON USERS.organization_id = ORGANIZATIONS.id WHERE ORGANIZATIONS.name LIKE '%bar%';

复制代码

Like learning from posts like this? Subscribe for more!

jieforest · 发表于 2015-9-4 23:08

2.4.2 Normalized Data in MongoDB

Loosely speaking Mongo collections map to Postgres tables, while Mongo Documents map to Postgres Rows. It is important to note that MongoDB does not support Joins, forcing you to query for nested relationships directly if you choose to store either a direct key id or a DBRef, or letting you fetch directly from the nested object. While storing a nested object like so:

db.createCollection('users');
db.users.insert({name: 'jack', organization: {name: 'foo corp'}});

复制代码

Is a trivial solution, it is also denormalized, and from a business logic point of view, presumes that organization has no life, independently of user. What if this isn't true, or you would like to have normalized data? MongoDB allows two ways to achieve this:

jieforest · 发表于 2015-9-4 23:09

1) DBRefs allow you to embed direct references to other documents in your documents. However, this will force additional queries to be run every time to fetch your referenced documents, and should (as per MongoDB's documentation) be avoided when possible

2)

db.createCollection('users');
db.createCollection('organizations');
db.organizations.insert({name: 'foo corp'});
db.organizations.insert({name: 'bar corp'});
var foo = db.organizations.find({name: 'foo corp'});
db.users.insert({name: "Jack", organization_id: foo});

复制代码

You can then use application level logic to extract the organization_id, fetch that organization separately, and join the data in the application. Note that, this bypasses any transaction logic you have built or use, unless your transaction logic is handled at the application level. Winner: Postgres By knockout.

jieforest · 发表于 2015-9-5 19:10

2.4.3 Performance Comparison

MongoDB also provides an aggregation framework that lets users mimic many of the JOIN functioanlities of the Relational world. A good (albeit somewhat biased in writing style) performance comparison is provided here[12]. TL;DR; Postgres is about three times faster at joins and aggregation than Mongo. Additionally, the MongoDB aggregation Pipeline can only handle a single collection. An important note here, is that this benchmark only really applies to a one machine database.

2.5 Scaling

There are fundamentally two types of scaling, horizontal scale, and vertical scale. Vertical scaling loosely means adding resources to a given machine. More RAM, more CPU cores. Horizontal Scale means multiple machines running your database.

jieforest · 发表于 2015-9-5 19:11

2.5.1 Scaling Postgres

As long as you can maintain Vertical Scale, Postgres scaling is trivial. You add more power to your machine, bump up the resources allocated to Postgres, and you're off to the races. But sooner or later this will hit a ceiling. Scaling Postgres Horizontally is significantly harder, but doable. There are several valid strategies to achieve this. At the core these are replication for reads (a master machine that allows writes, and multiple read only machines), and sharding. Sharding is a complex topic with multiple solutions, from application level load balancing, down to database level logic to store the shardid as part of the primary key as done at instagram[13] A detailed discussion of the approaches is out of scope of this article.

2.5.2 Scaling Mongo

MongoDB Supports Sharding at the technology level. When sharded, collections are partitioned by a shard key. Mongo's query routers can then identify the right shard to read from. A great resource to achieve good sharding, complete with best practices for balancing shard sizes, can be found in the documentation[14]

Winner: MongoDB. Technical Victory due to native sharding support and ease.

jieforest · 发表于 2015-9-5 19:11

2.6 Rapid Prototyping

So, you have investors breathing down your neck, and you owe a prototype yesterday. What technology do you choose for your data store? While Postgres seems to be the clear victor above, there are a few advantages to using Mongo:

1. As the store is schema-less, as your requirements rapidly change you do not need to continuously write migrations

2. You do not need to think through your data-model, ensuring normalization.

3. You do not need to write SQL, as The query language is JSON like, and will feel very familiar to anybody with Javascript experience.

4. It is probably fair to say that at this early juncture a lot of your data is of suboptimal importance, and your organization can survive its loss or corruption, thus the strong guarantees provided by Postgres are not necessary.

jieforest · 发表于 2015-9-5 19:12

The downside: All data is equally likely to be lost. If your organization deals with enterprise customers, or handles financial data, MongoDB is very simply not an option[15]. Additionally, while it is true that MongoDB is easier to scale down the road, Postgres is also scalable (if its good enough for Instagram....).

Victory: MongoDB, Technical Victory. Assuming you do not already have postgres and/or database expertise, MongoDB's simpler query interface and lack of requirements for schema migration/maintenance make it easier to rapidly prototype in. Just be aware that unless you fit a small set of niche use cases where individual, small scale dataloss is truly irrelevant (running large scale analytics on normally distributed datasets), you will eventually have to throw your database away and rewrite swathes of your application.

jieforest · 发表于 2015-9-5 19:13

3. Summary

Postgres comes out the clear victor of this fight. There are valid use cases for MongoDB such as reporting on and aggregating large datasets of normally distributed data, and storing TRULY denormalized data, data where relationships are mostly non existent, documents are large, mostly unstructured, and with little to no overlap, while data loss is largely irrelevant (Log Parsing and Caching come to mind). However as a general purpose database, Postgres is clearly the dominant fighter in this arena, and if some denormalized data is required, like say a set of optional parameters on a user (eye color, height, weight, hair color), Postgres' JSONB column is more than sufficient.

SQL vs NoSQL KO. Postgres vs Mongo

浏览过的版块