Introduction to MongoDB for Java, PHP and Python Developers

jieforest · posted on 2012-5-31 12:33

Additional features of note

MongoDB has many useful features: geo indexing (how close am I to X?), distributed file storage, capped collections (older documents are deleted automatically), the aggregation framework (like SQL projections, but across distributed nodes and without the complexity of MapReduce for basic operations), load sharing for reads via replication, auto sharding for scaling writes, high availability, and your choice of durability (journaling) and/or data safety (making sure a copy exists on other servers).

Architecture: Replica Sets, Autosharding

MongoDB's model lets you start with a basic setup and adopt more features as your growth and needs change, without much trouble or redesign. MongoDB uses replica sets to provide read scalability and high availability.

Autosharding is used to scale writes (and reads). Replica sets and autosharding go hand in hand if you need to scale out massively. With MongoDB, scaling out seems easier than with traditional approaches, because many things come built in and happen automatically.

Less operation/administration and a lower TCO than other solutions seem likely. However, you still need capacity planning (good guess), monitoring (test your guess), and the ability to determine your current needs (adjust your guess).

jieforest · posted on 2012-5-31 12:35

Replica Sets

The major advantages of replica sets are business continuity through high availability, data safety through data redundancy, and read scalability through load sharing (reads). Replica sets use a shared-nothing architecture.

A fair bit of the brains of replica sets is in the client libraries. The client libraries are replica set aware. With replica sets, MongoDB language drivers know the current primary.

A language driver is a library for a particular programming language; think JDBC driver or ODBC driver, but for MongoDB.

All write operations go to the primary. If the primary is down, the drivers know how to get to the new (elected) primary. This is auto failover for high availability.

Data is replicated after it is written. Drivers always write to the replica set's primary (also called the master), and the primary then replicates to the secondaries (slaves). The primary is not fixed; it is elected by the members of the set.

Typically you have at least three MongoDB instances in a replica set on different server machines (see figure 2). You can add more replicas of the primary if you like for read scalability, but you only need three for high availability failover.

There is a way to sort of get down to two, but let's leave that out for this article. Except for this small tidbit, there are advantages of having three versus two in general.

If you have two instances and one goes down, the remaining instance takes on 100% more load than before (its load doubles). If you have three instances and one goes down, the load on each remaining instance goes up by only 50%.

If you typically run three boxes at 50% capacity and you have an outage, the two remaining boxes will run at 75% capacity until you get the failed box repaired or replaced. If business continuity is important for your application, then having at least three instances in a replica set sounds like a good plan anyway (not all applications need it).
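The arithmetic behind these numbers can be checked with a small sketch. It assumes the load is spread evenly across all instances, which is a simplification; the function names are made up for illustration.

```javascript
// Percentage increase in load on each surviving instance when some fail,
// assuming traffic is spread evenly across instances (a simplification).
function loadIncreasePercent(totalInstances, failedInstances) {
  const survivors = totalInstances - failedInstances;
  if (survivors <= 0) {
    throw new Error("no surviving instances");
  }
  // Each survivor goes from 1/total of the traffic to 1/survivors of it.
  return (totalInstances / survivors - 1) * 100;
}

// Utilization of each survivor, given its utilization before the outage.
function utilizationAfterFailure(utilizationBefore, totalInstances, failedInstances) {
  return (utilizationBefore * totalInstances) / (totalInstances - failedInstances);
}

console.log(loadIncreasePercent(2, 1)); // two instances, one down: load doubles
console.log(loadIncreasePercent(3, 1)); // three instances, one down
console.log(utilizationAfterFailure(0.5, 3, 1)); // boxes that ran at 50% before
```

With two instances the survivor's load rises by 100%, with three it rises by 50%, and three boxes that ran at 50% end up at 75%.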

Figure 2: Replica Sets

jieforest · posted on 2012-5-31 12:36

In general, once replication is set up it just works.

However, in your monitoring, which is easy to do in MongoDB, you want to see how fast data is getting replicated from the primary (master) to replicas (slaves).

The slower the replication is, the dirtier your reads are. The replication is by default async (non-blocking). Slave data and primary data can be out of sync for however long it takes to do the replication.
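As a rough illustration of why slow replication means stale reads, here is a toy in-memory model. It is not MongoDB's replication protocol; the `ReplicaPair` class and its methods are hypothetical names made up for this sketch.

```javascript
// Toy model of asynchronous replication; not MongoDB's actual protocol.
class ReplicaPair {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // writes applied on the primary but not yet on the secondary
  }

  // Non-blocking write: returns as soon as the primary has the data.
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  // One replication pass: the secondary catches up with the primary.
  replicate() {
    for (const [key, value] of this.pending) {
      this.secondary.set(key, value);
    }
    this.pending = [];
  }

  // Replication lag, measured here as the number of unreplicated writes.
  lag() {
    return this.pending.length;
  }
}

const pair = new ReplicaPair();
pair.write("category:42", "Garden tools");
const staleRead = pair.secondary.get("category:42"); // secondary has not caught up yet
const lagBefore = pair.lag();
pair.replicate();
const freshRead = pair.secondary.get("category:42"); // now matches the primary
```

Until `replicate()` runs, a read from the secondary misses the new value. That window is what you watch when you monitor replication lag.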

There are already whole books written just on making Mongo scalable. If you work at a Foursquare-like company, or at a company where high availability is very important, and you use Mongo, I suggest reading such a book.

By default, replication is non-blocking/async. This might be acceptable for some data (category descriptions in an online store), but not for other data (a shopping cart's credit card transaction data).

For important data, the client can block until data is replicated on all servers or written to the journal (journaling is optional). The client can force the master to sync to slaves before continuing.

This sync blocking is slower. Async/non-blocking is faster and is often described as eventual consistency. Waiting for a master to sync is a form of data safety.

MongoDB offers several forms of data safety, from syncing to at least one other server to waiting for the data to be written to a journal (durability). Here is a list of some data safety options for MongoDB:

1. Wait until write has happened on all replicas

2. Wait until write is on two servers (primary and one other)

3. Wait until write has occurred on majority of replicas

4. Wait until write operation has been written to journal

(The above is not an exhaustive list of options.)
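The four options above can be written as write-concern documents. The sketch below only builds those documents and works out how many members must acknowledge a write; it assumes a three-member replica set, and `acksRequired` is a hypothetical helper, not part of any driver.

```javascript
// Assumption for illustration: a replica set with three members.
const MEMBER_COUNT = 3;

// Write-concern documents corresponding to the four options listed above.
const writeConcerns = {
  allReplicas: { w: MEMBER_COUNT }, // 1. wait for every member
  primaryPlusOne: { w: 2 },         // 2. wait for the primary and one other
  majority: { w: "majority" },      // 3. wait for a majority of members
  journaled: { w: 1, j: true },     // 4. wait for the journal write on the primary
};

// Hypothetical helper: number of members that must acknowledge the write.
function acksRequired(writeConcern, memberCount) {
  if (writeConcern.w === "majority") {
    return Math.floor(memberCount / 2) + 1;
  }
  return writeConcern.w;
}

for (const [name, wc] of Object.entries(writeConcerns)) {
  console.log(name, JSON.stringify(wc), "acks:", acksRequired(wc, MEMBER_COUNT));
}
```

In current drivers and shells, such a document is typically passed as a `writeConcern` option on the write. In MongoDB of this article's era, the same `w` and `j` settings were passed to the `getLastError` command after the write.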

The key word in each option above is wait. The more syncing and durability you ask for, the more waiting there is, and the harder it is to scale cost effectively.

jieforest · posted on 2012-5-31 12:37

Journaling: Is durability overvalued if RAM is the new Disk? Data Safety versus durability

It may seem strange to some that journaling was added as late as version 1.8 to MongoDB.

Journaling only became the default (on 64-bit operating systems) with MongoDB 2.0. Prior to that, if the data was very important, you typically used replication to make sure write operations were copied to a replica before proceeding.

The thought being that one server might go down, but two servers are very unlikely to go down at the same time.

Unless somebody backs a truck over a high voltage utility pole, causing all of your air conditioning equipment to stop working long enough for all of your servers to overheat at once, but that never happens (it happened to Rackspace and Amazon).

And if you were worried about this, you would have replication across availability zones, but I digress.

At one point MongoDB did not have single-server durability; now it does, with the addition of journaling.

But this is far from a moot point. The general thought from the MongoDB community was, and maybe still is, that to achieve Web Scale, durability was a thing of the past. After all, memory is the new disk.

If you could get the data onto a second server or two, then the chances of them all going down at once are very, very low. How often do servers go down these days? What are the chances of two servers going down at once?

The general thought from the MongoDB community was (is?) that durability is overvalued and was just not Web Scale.

Whether this is a valid point or not, there was much fun made about this at MongoDB's expense (rated R, Mature 17+).

jieforest · posted on 2012-6-1 12:41

An article on when to use MongoDB journaling versus the older recommendations would be a welcome addition. Generally it seems journaling is mostly a requirement for very sensitive financial data and single-server solutions.

Your results may vary, and don't trust my math. It has been a few years since I got a B+ in statistics, and I am no expert on the SLAs of modern commodity servers (the above was just spitballing).

If you have ever used a single non-clustered RDBMS system for a production system that relied on frequent backups and transaction log (journaling) for data safety, raise your hand.

Ok, if you raised your hand, then you just may not need autosharding or replica sets. To start with MongoDB, just use a single server with journaling turned on. If you require speed, you can configure MongoDB journaling to batch writes to the journal (which is the default).

This is a good model to start out with and probably very much like quite a few applications you have already worked on (assuming that most applications don't need high availability).

The difference is, of course, that if your application is later deemed to need high availability, read scalability, or write scalability, MongoDB has you covered. Also, setting up high availability seems easier on MongoDB than on other, more established solutions.

Figure 3: Simple setup with journaling and single server ok for a lot of applications

jieforest · posted on 2012-6-1 12:42

There are three main process actors for autosharding: mongod (database daemon), mongos, and the client driver library. Each mongod instance gets a shard. Mongod is the process that manages databases and collections.

Mongos is a router: it routes writes to the correct mongod instance for autosharding. Mongos also works out which shards hold data for a query. To the client driver, mongos looks more or less like a mongod process (autosharding is transparent to the client drivers).
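The routing idea can be sketched in a few lines. This is a simplified model, not mongos itself: the chunk table, the shard names, and the two routing functions are all hypothetical, and the shard key is assumed to be a plain number.

```javascript
// Hypothetical chunk table: each chunk covers shard-key values in [min, max)
// and lives on exactly one shard.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" },
];

// Route a single-key operation (a write or a lookup by shard key) to one shard.
function routeByKey(table, key) {
  const chunk = table.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Route a range query [from, to) to every shard that owns an overlapping chunk.
function routeByRange(table, from, to) {
  const shards = table
    .filter((c) => from < c.max && to > c.min)
    .map((c) => c.shard);
  return [...new Set(shards)];
}

console.log(routeByKey(chunkTable, 1500));        // a write goes to a single shard
console.log(routeByRange(chunkTable, 500, 1500)); // a query may touch several shards
```

A query that does not include the shard key cannot be narrowed this way and has to be sent to every shard.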

Figure 4: MongoDB Autosharding

jieforest · posted on 2012-6-2 01:37

Autosharding increases write and read throughput, and helps with scale out. Replica sets are for high availability and read throughput. You can combine them as shown in figure 5.

Figure 5: MongoDB Autosharding plus Replica Sets for scalable reads, scalable writes, and high availability

You shard on an indexed field in a document. Mongos collaborates with config servers (mongod instances acting as config servers), which have the shard topology (where do the key ranges live).

Shards are just normal mongod instances. Config servers hold metadata about the cluster and are also mongod instances.

jieforest · posted on 2012-6-2 01:38

The data in each shard is further broken down into pieces called chunks. A chunk is 64 MB worth of documents for a collection. The config servers track which shard each chunk lives on.

Autosharding works by moving these chunks around and distributing them among the shards. The mongos processes have a balancer routine that wakes up every so often and checks how many chunks each shard has.

If a particular shard has too many chunks (nine more chunks than another shard), then mongos starts to move data from one shard to another to balance the data capacity amongst the shards.
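A minimal sketch of that balancing rule follows. It is a simulation, not the real balancer: it uses the nine-chunk threshold mentioned above as the trigger and then, as an assumption, keeps moving one chunk at a time until the shards are within one chunk of each other. The `balance` function and the shard names are made up for illustration.

```javascript
// chunkCounts: { shardName: numberOfChunks }.
// Returns the balanced counts and the list of simulated chunk migrations.
function balance(chunkCounts, threshold = 9) {
  const counts = { ...chunkCounts };
  const migrations = [];

  // Find the shards with the most and the fewest chunks.
  const extremes = () => {
    const names = Object.keys(counts);
    const most = names.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const least = names.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    return { most, least };
  };

  let { most, least } = extremes();
  // Below the threshold the balancer does nothing.
  if (counts[most] - counts[least] < threshold) {
    return { counts, migrations };
  }
  // Assumption: once triggered, migrate until the shards are nearly even.
  while (counts[most] - counts[least] > 1) {
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
    ({ most, least } = extremes());
  }
  return { counts, migrations };
}

const result = balance({ shard0: 20, shard1: 2, shard2: 8 });
console.log(result.counts, result.migrations.length);
```

In the real system each migration copies documents between mongod instances, so the cost that matters is the data moved, not just the chunk count.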

Once the data is moved, the config servers are updated in a two-phase commit (updates to the shard topology are only allowed if all three config servers are up).

The config servers contain a versioned shard topology and are the gatekeeper for autosharding balancing. This topology maps which shard has which keys.

The config servers are like a DNS server for shards. The mongos process uses the config servers to find where shard keys live.

Mongod instances are shards that can be replicated using replica sets for high availability.

Mongos and config server processes do not need to be on their own server and can live on a primary box of a replica set for example.

For sharding you need at least three config servers, and shard topologies cannot change unless all three are up at the same time. This ensures consistency of the shard topology.

The full autosharding topology is shown in figure 6. If you would like to know more, Kristina Chodorow, author of Scaling MongoDB, gave an excellent talk on the internals of MongoDB sharding at OSCON 2011.

Figure 6: MongoDB Autosharding full topology for large deployment including Replica Sets, Mongos routers, Mongod Instance, and Config Servers

jieforest · posted on 2012-6-2 01:40

MapReduce

MongoDB has MapReduce capabilities for batch processing similar to Hadoop. Massive aggregation is possible through the divide and conquer nature of MapReduce.

Before the Aggregation Framework, MongoDB's MapReduce could be used to implement what you might do with SQL projections (SQL GROUP BY).

MongoDB also added the aggregation framework, which negates the need for MapReduce for common aggregation cases. In MongoDB, Map and Reduce functions are written in JavaScript.

These functions are executed on the servers (mongod). This allows the code to run next to the data it operates on (think stored procedures, but meant to execute on distributed nodes, with the results then collected and filtered).

The results can be copied to a results collection.
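To make the shape of the map and reduce functions concrete, here is a tiny in-memory stand-in. It is not MongoDB's implementation: `runMapReduce` is a hypothetical helper, and `emit` is passed to the map function as an argument here, whereas in MongoDB it is a function the server provides. The sample documents are invented.

```javascript
// Tiny in-memory stand-in for a MapReduce job; not MongoDB's implementation.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: as in MongoDB, `this` is the current document.
  for (const doc of docs) {
    map.call(doc, emit);
  }
  // Reduce phase: one call per key that has more than one emitted value.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Invented sample data.
const employees = [
  { name: "Ann", dept_name: "eng", salary: 100 },
  { name: "Bob", dept_name: "eng", salary: 120 },
  { name: "Cy", dept_name: "ops", salary: 90 },
];

function mapSalary(emit) {
  emit(this.dept_name, this.salary);
}

function sumValues(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

const totals = runMapReduce(employees, mapSalary, sumValues);
console.log(totals);
```

On a sharded cluster the reduce function may be called again on partial results from each shard, so it has to return the same kind of value it receives.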

MongoDB also provides incremental MapReduce. This allows you to run MapReduce jobs over collections, and then later run a second job but only over new documents in the collection.

You can use this to reduce the work required, by merging new data into an existing results collection.

jieforest · posted on 2012-6-3 01:04
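The merge step of an incremental job can be sketched as follows. This is an in-memory illustration, not MongoDB code: `incrementalReduce`, the accessor functions, and the sample values are hypothetical. It relies on the reduce function being safe to re-apply to its own output.

```javascript
// existingResults: Map of key -> value reduced by an earlier job.
// newDocs: only the documents added since that job ran.
function incrementalReduce(existingResults, newDocs, keyOf, valueOf, reduce) {
  const merged = new Map(existingResults);
  for (const doc of newDocs) {
    const key = keyOf(doc);
    const value = valueOf(doc);
    // Re-reduce the old result together with the new value.
    merged.set(key, merged.has(key) ? reduce(key, [merged.get(key), value]) : value);
  }
  return merged;
}

const sum = (key, values) => values.reduce((a, b) => a + b, 0);

// Results of a previous run (invented numbers) and newly arrived documents.
const previous = new Map([["eng", 220], ["ops", 90]]);
const arrivals = [
  { dept_name: "eng", salary: 30 },
  { dept_name: "hr", salary: 50 },
];

const updated = incrementalReduce(previous, arrivals, (d) => d.dept_name, (d) => d.salary, sum);
console.log([...updated]);
```

In MongoDB itself, this kind of merge is requested through the `out` option of the mapReduce command with its `reduce` action, combined with a query that selects only the new documents.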

Aggregation Framework

The Aggregation Framework was added in MongoDB 2.1.

It is similar to SQL group by. Before the Aggregation framework, you had to use MapReduce for things like SQL's group by.

Using the Aggregation framework capabilities is easier than MapReduce. Let's cover a small set of aggregation functions and their SQL equivalents inspired by the Mongo docs as follows:

Count

SQL

SELECT COUNT(*) FROM employees

MongoDB

db.employees.aggregate([
  { $group: { _id: null, count: { $sum: 1 } } }
])

Sum the salaries of employees who are not retired, grouped by department

SQL

SELECT dept_name, SUM(salary) FROM employees WHERE retired=false GROUP BY dept_name

MongoDB

db.employees.aggregate([
  { $match: { retired: false } },
  { $group: { _id: "$dept_name", total: { $sum: "$salary" } } }
])

This is quite an improvement over using MapReduce for common projections like these. Mongo will take care of collecting and summing the results from multiple shards if need be.
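For readers who want to see what the two pipeline stages compute, here is a plain JavaScript equivalent run over an in-memory array. It is only an illustration of the semantics, with invented sample documents and a hypothetical function name; it does not talk to MongoDB.

```javascript
// Invented sample documents.
const employees = [
  { name: "Ann", dept_name: "eng", salary: 100, retired: false },
  { name: "Bob", dept_name: "eng", salary: 120, retired: false },
  { name: "Cy", dept_name: "ops", salary: 90, retired: false },
  { name: "Dee", dept_name: "ops", salary: 80, retired: true },
];

// Plain JavaScript equivalent of the $match and $group stages.
function sumSalaryByDept(docs) {
  const totals = new Map();
  for (const doc of docs) {
    if (doc.retired !== false) continue; // $match: { retired: false }
    // $group: { _id: "$dept_name", total: { $sum: "$salary" } }
    totals.set(doc.dept_name, (totals.get(doc.dept_name) || 0) + doc.salary);
  }
  return [...totals].map(([dept, total]) => ({ _id: dept, total }));
}

const byDept = sumSalaryByDept(employees);
console.log(byDept);

// The count example corresponds to counting every document.
const count = employees.length;
console.log(count);
```

The pipeline form states the same thing declaratively, which is what lets the server run it close to the data and combine results from several shards.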