Original poster: jieforest

Introduction to MongoDB for Java, PHP and Python Developers

21#
Original poster | Posted on 2012-6-2 01:40
MapReduce

MongoDB has MapReduce capabilities for batch processing similar to Hadoop. Massive aggregation is possible through the divide and conquer nature of MapReduce.

Before the Aggregation Framework existed, MongoDB's MapReduce was the way to implement what you would do in SQL with GROUP BY.

MongoDB has since added the Aggregation Framework, which removes the need for MapReduce in common aggregation cases. In MongoDB, Map and Reduce functions are written in JavaScript.

These functions are executed on the servers (mongod). This allows the code to run next to the data it operates on (think stored procedures, but meant to execute on distributed nodes, with the results then collected and filtered).

The results can be copied to a results collection.
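To make the map/emit/reduce flow concrete, here is a minimal sketch in plain JavaScript that runs without a server. The mapReduce helper, the emit callback passed to map, and the sample order documents are all hypothetical and made up for illustration. This is not MongoDB's implementation or API (there, map reads the document from this and calls a global emit); it only has the same shape of computation: map emits key/value pairs, values are grouped by key, and reduce folds each group into one result.

```javascript
// Hypothetical in-memory stand-in for a MapReduce job; not MongoDB's API.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  // Map phase: each document may emit any number of (key, value) pairs.
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: fold the values collected for each key into one value.
  const results = [];
  for (const [key, values] of groups) {
    results.push({ _id: key, value: reduce(key, values) });
  }
  return results;
}

// Made-up sample data: total order amount per customer.
const orders = [
  { customer: "ann", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "ann", amount: 7 },
];

const totals = mapReduce(
  orders,
  (doc, emit) => emit(doc.customer, doc.amount),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);

console.log(totals);
```

On a real cluster the map and reduce phases would run on the mongod nodes that hold the data; here everything happens in one process, which is enough to see the data flow.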

MongoDB also provides incremental MapReduce. This allows you to run MapReduce jobs over collections, and then later run a second job but only over new documents in the collection.

You can use this to reduce the work required by merging new data into an existing results collection.
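The merge step can be sketched in plain JavaScript as follows. The mergeIncremental helper and the numbers are hypothetical. The idea is that the second job only produces results for the new documents, and each of those results is folded into the stored value for the same key by calling reduce again. In MongoDB itself this corresponds, as far as I know, to running mapReduce with a query that selects only the new documents and the out: { reduce: ... } output option.

```javascript
// Hypothetical sketch: fold a batch of new per-key values into stored results.
function mergeIncremental(existing, fresh, reduce) {
  const merged = new Map(existing.map((r) => [r._id, r.value]));
  for (const { _id, value } of fresh) {
    // Re-reduce when the key already has a stored value; otherwise just add it.
    merged.set(
      _id,
      merged.has(_id) ? reduce(_id, [merged.get(_id), value]) : value
    );
  }
  return [...merged].map(([_id, value]) => ({ _id, value }));
}

const sumValues = (key, values) => values.reduce((a, b) => a + b, 0);

// Results of an earlier job (made-up numbers).
const storedTotals = [
  { _id: "ann", value: 17 },
  { _id: "bob", value: 5 },
];
// Results of a second job that only looked at documents added since then.
const newTotals = [
  { _id: "bob", value: 3 },
  { _id: "cat", value: 9 },
];

const mergedTotals = mergeIncremental(storedTotals, newTotals, sumValues);
console.log(mergedTotals);
```

This only works when reduce can be applied to its own output, which is why a sum is a convenient example: reducing [17 + 5... ] in one pass or in two passes gives the same totals.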

22#
Original poster | Posted on 2012-6-3 01:04
Aggregation Framework

The Aggregation Framework was added in MongoDB 2.1.

It is similar to SQL GROUP BY. Before the Aggregation Framework, you had to use MapReduce for things like SQL's GROUP BY.

Using the Aggregation Framework is easier than using MapReduce. Let's cover a small set of aggregation functions and their SQL equivalents, inspired by the Mongo docs:

Count

SQL
  SELECT COUNT(*) FROM employees
MongoDB
  db.employees.aggregate([
    { $group: { _id: null, count: { $sum: 1 } } }
  ])
Sum the salaries of employees who are not retired, grouped by department

SQL
  SELECT dept_name, SUM(salary) FROM employees WHERE retired=false GROUP BY dept_name
MongoDB
  db.employees.aggregate([
    { $match: { retired: false } },
    { $group: { _id: "$dept_name", total: { $sum: "$salary" } } }
  ])
This is quite an improvement over using MapReduce for common aggregations like these. Mongo will take care of collecting and summing the results from multiple shards if need be.
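If it helps to see what the two pipeline stages compute, here is a rough plain-JavaScript equivalent that runs without a server. The staff array and its values are made up, and this is only a sketch of the semantics of $match followed by $group with $sum, not how MongoDB executes a pipeline.

```javascript
// Made-up employee documents for illustration.
const staff = [
  { name: "A", dept_name: "eng", salary: 100, retired: false },
  { name: "B", dept_name: "eng", salary: 150, retired: false },
  { name: "C", dept_name: "ops", salary: 90, retired: true },
  { name: "D", dept_name: "ops", salary: 80, retired: false },
];

// Rough equivalent of { $match: { retired: false } }.
const active = staff.filter((e) => e.retired === false);

// Rough equivalent of { $group: { _id: "$dept_name", total: { $sum: "$salary" } } }.
const byDept = new Map();
for (const e of active) {
  byDept.set(e.dept_name, (byDept.get(e.dept_name) || 0) + e.salary);
}
const deptTotals = [...byDept].map(([_id, total]) => ({ _id, total }));

console.log(deptTotals);
```

The sketch returns groups in first-seen order; a real $group stage makes no promise about the order of its output documents unless you add a sort.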



23#
Original poster | Posted on 2012-6-3 01:05
Installing MongoDB

Let's mix in some code samples to try out along with the concepts.

To install MongoDB, go to the MongoDB download page, then download and untar/unzip the archive to ~/mongodb-platform-version/.

Next, create the directory that will hold the data and create a mongodb.config file (/etc/mongodb/mongodb.config) that points to that directory, as follows:

Listing: Installing MongoDB
  $ sudo mkdir -p /etc/mongodb/data
  $ cat /etc/mongodb/mongodb.config
  dbpath=/etc/mongodb/data

The /etc/mongodb/mongodb.config file has one line, dbpath=/etc/mongodb/data, that tells mongod where to put the data. Next, link the MongoDB directory to /usr/local/mongodb and add its bin directory to the PATH environment variable, as follows:

24#
Original poster | Posted on 2012-6-3 01:07
Last edited by jieforest on 2012-6-3 01:08

Listing: Setting up MongoDB on your path
  $ sudo ln -s ~/mongodb-platform-version/ /usr/local/mongodb
  $ export PATH=$PATH:/usr/local/mongodb/bin

Run the server, passing the configuration file that we created earlier.

Listing: Running the MongoDB server
  $ mongod --config /etc/mongodb/mongodb.config

Mongo comes with a nice console application called mongo that lets you execute commands and JavaScript. JavaScript is to Mongo what PL/SQL is to Oracle's database. Let's fire up the console app and poke around.

Listing: Firing up the mongo console application
  $ mongo
  MongoDB shell version: 2.0.4
  connecting to: test
  > db.version()
  2.0.4
  >





25#
Original poster | Posted on 2012-6-4 09:09
One of the nice things about MongoDB is the self-describing console. It is easy to see what commands a MongoDB database supports with db.help(), as follows:

Client: mongo db.help()
  > db.help()
  DB methods:
  db.addUser(username, password[, readOnly=false])
  db.auth(username, password)
  db.cloneDatabase(fromhost)
  db.commandHelp(name) returns the help for the command
  db.copyDatabase(fromdb, todb, fromhost)
  db.createCollection(name, { size : ..., capped : ..., max : ... } )
  db.currentOp() displays the current operation in the db
  db.dropDatabase()
  db.eval(func, args) run code server-side
  db.getCollection(cname) same as db['cname'] or db.cname
  db.getCollectionNames()
  db.getLastError() - just returns the err msg string
  db.getLastErrorObj() - return full status object
  db.getMongo() get the server connection object
  db.getMongo().setSlaveOk() allow this connection to read from the nonmaster member of a replica pair
  db.getName()
  db.getPrevError()
  db.getProfilingStatus() - returns if profiling is on and slow threshold
  db.getReplicationInfo()
  db.getSiblingDB(name) get the db at the same server as this one
  db.isMaster() check replica primary status
  db.killOp(opid) kills the current operation in the db
  db.listCommands() lists all the db commands
  db.logout()
  db.printCollectionStats()
  db.printReplicationInfo()
  db.printSlaveReplicationInfo()
  db.printShardingStatus()
  db.removeUser(username)
  db.repairDatabase()
  db.resetError()
  db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into { cmdObj : 1 }
  db.serverStatus()
  db.setProfilingLevel(level,{slowms}) 0=off 1=slow 2=all
  db.shutdownServer()
  db.stats()
  db.version() current version of the server
  db.getMongo().setSlaveOk() allow queries on a replication slave server
  db.fsyncLock() flush data to disk and lock server for backups
  db.fsyncUnlock() unlocks server following a db.fsyncLock()

You can see that some of the commands refer to concepts we discussed earlier. Now let's create an employees collection and do some CRUD operations on it.

26#
Original poster | Posted on 2012-6-4 09:10
Create Employee Collection
  > use tutorial;
  switched to db tutorial
  > db.getCollectionNames();
  [ ]
  > db.employees.insert({name:'Rick Hightower', gender:'m', phone:'520-555-1212', age:42});
  Mon Apr 23 23:50:24 [FileAllocator] allocating new datafile /etc/mongodb/data/tutorial.ns, ...

The use command switches to a database. If that database does not exist, it is lazily created the first time we write to it.

The db object refers to the current database. The current database does not have any document collections to start with (this is why db.getCollectionNames() returns an empty list). To create a document collection, just insert a new document.

Collections, like databases, are lazily created when they are first used. You can see that two collections were created when we inserted our first document into the employees collection:
  > db.getCollectionNames();
  [ "employees", "system.indexes" ]

The first collection is our employees collection, and the second is used to hold onto the indexes we create.

To list all employees, just call the find method on the employees collection.
  > db.employees.find()
  { "_id" : ObjectId("4f964d3000b5874e7a163895"), "name" : "Rick Hightower",
      "gender" : "m", "phone" : "520-555-1212", "age" : 42 }

27#
Original poster | Posted on 2012-6-4 09:12
The above is the query syntax for MongoDB. There is no separate SQL-like language. You just execute JavaScript code, passing documents, which are just JavaScript associative arrays, err, I mean JavaScript objects. To find a particular employee, you do this:
  > db.employees.find({name:"Bob"})

Bob quit, so to find another employee, you would do this:
  > db.employees.find({name:"Rick Hightower"})
  { "_id" : ObjectId("4f964d3000b5874e7a163895"), "name" : "Rick Hightower", "gender" : "m", "phone" : "520-555-1212", "age" : 42 }

The console application just prints the document right to the screen. I don't feel 42. At least I am not 100, as shown by this query:
  > db.employees.find({age:{$lt:100}})
  { "_id" : ObjectId("4f964d3000b5874e7a163895"), "name" : "Rick Hightower", "gender" : "m", "phone" : "520-555-1212", "age" : 42 }

Notice that to get employees younger than 100, you pass a document with a subdocument: the key is the operator ($lt), and the value is the operand (100). Mongo supports the operators you would expect, like $lt for less than, $gt for greater than, etc. If you know JavaScript, it is easy to inspect fields of a document, as follows:
  > db.employees.find({age:{$lt:100}})[0].name
  Rick Hightower
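To see why a query is "just a document", here is a small sketch of how such a query document could be evaluated against a document in plain JavaScript. The matchesQuery function and the second sample person are hypothetical, and only equality, $lt and $gt are handled; the real server supports many more operators and works very differently internally.

```javascript
// Hypothetical matcher supporting plain equality plus $lt and $gt only.
const comparators = new Map([
  ["$lt", (actual, expected) => actual < expected],
  ["$gt", (actual, expected) => actual > expected],
]);

function matchesQuery(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const isOperatorDoc =
      condition !== null &&
      typeof condition === "object" &&
      Object.keys(condition).every((k) => comparators.has(k));
    if (!isOperatorDoc) return doc[field] === condition; // plain equality
    // Every operator in the subdocument must hold for the field's value.
    return Object.entries(condition).every(([op, expected]) =>
      comparators.get(op)(doc[field], expected)
    );
  });
}

const people = [
  { name: "Rick Hightower", age: 42 },
  { name: "Old Timer", age: 101 }, // made-up second document
];

const under100 = people.filter((p) => matchesQuery(p, { age: { $lt: 100 } }));
console.log(under100.map((p) => p.name));
```

Because the operators live in a subdocument, several can be combined on one field, for example { age: { $gt: 18, $lt: 100 } }.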
If we were going to query, sort or shard on employees.name, then we would want to create an index as follows:
  > db.employees.ensureIndex({name:1}); // ascending index; descending would be -1
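As a rough intuition for what an ascending index buys you, the sketch below keeps a sorted list of name keys pointing back to document positions and finds a name with binary search instead of scanning every document. The buildIndex and lookup helpers and the extra names are hypothetical. MongoDB's indexes are B-trees rather than a sorted array, so treat this as an analogy only.

```javascript
// Hypothetical stand-in for an ascending index on "name":
// a sorted array of { key, pos } entries pointing back into the collection.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search: O(log n) comparisons instead of scanning every document.
function lookup(index, key) {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key === key) return index[mid].pos;
    if (index[mid].key < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1; // not found
}

const roster = [
  { name: "Rick Hightower" },
  { name: "Alice Example" }, // made-up names
  { name: "Zed Sample" },
];
const nameIndex = buildIndex(roster, "name");
const rickPos = lookup(nameIndex, "Rick Hightower");
console.log(rickPos, roster[rickPos]);
```

Building the sorted structure is the expensive part, which is the same trade-off an index build makes on a large collection.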

28#
Original poster | Posted on 2012-6-5 12:32
Indexing is a blocking operation by default, so if you are indexing a large collection, it could take several minutes and perhaps much longer.

This is not something you want to do casually on a production system. There are options to build indexes as a background task and to set up a unique index, there are complications around indexing on replica sets, and much more.

If you are running queries that rely on certain indexes to be performant, you can check whether an index exists with db.employees.getIndexes(). You can also see a list of indexes as follows:
  > db.system.indexes.find()
  { "v" : 1, "key" : { "_id" : 1 }, "ns" : "tutorial.employees", "name" : "_id_" }
By default, all documents get an object id. If you do not give an object an _id, it will be assigned one by the system (like a criminal suspect gets a lawyer). You can use that _id to look up an object with findOne, as follows:
  > db.employees.findOne({_id : ObjectId("4f964d3000b5874e7a163895")})
  { "_id" : ObjectId("4f964d3000b5874e7a163895"), "name" : "Rick Hightower",
     "gender" : "m", "phone" : "520-555-1212", "age" : 42 }
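The system-assigned _id is not an opaque random string. The first four bytes of an ObjectId are the creation time in seconds since the Unix epoch, so you can recover when a document was created. The sketch below decodes that prefix in plain JavaScript; the objectIdTimestamp helper name is made up, and it works on the hex string rather than on a real ObjectId object.

```javascript
// The first 8 hex characters (4 bytes) of an ObjectId are a Unix timestamp in seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000);
}

const created = objectIdTimestamp("4f964d3000b5874e7a163895");
console.log(created.toISOString()); // 2012-04-24T06:50:24.000Z
```

That UTC time is consistent with the "Mon Apr 23 23:50:24" line the server logged during the insert, assuming the machine's clock was seven hours behind UTC.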

29#
Original poster | Posted on 2012-6-5 12:33
Java and MongoDB

Pssst! Here is a dirty little secret. Don't tell your Node.js friends or Ruby friends this. More Java developers use MongoDB than Ruby and Node.js developers. They are just not as loud about it. Using MongoDB with Java is very easy.

The language driver for Java seems to be a straight port of something written with JavaScript in mind, and the usability suffers a bit because Java does not have literals for maps/objects like JavaScript does.

Thus an API written for a dynamic language does not quite fit Java. There is room for a lot of usability improvement in the MongoDB Java language driver (hint, hint). There are alternatives to using just the straight MongoDB language driver (mjorm, morphia, and Spring Data MongoDB support), but I have not picked a clear winner.

I'd love just some usability improvements in the core driver without the typical Java annotation fetish, perhaps a nice Java DAO DSL (see section on criteria DSL if you follow the link).

Setting up Java and MongoDB

Let's go ahead and get started then with Java and MongoDB.

Download the latest Mongo driver from GitHub (https://github.com/mongodb/mongo-java-driver/downloads) and put it somewhere you can later add to your classpath, as follows:
  $ mkdir -p tools/mongodb/lib
  $ cp mongo-2.7.3.jar tools/mongodb/lib

30#
Original poster | Posted on 2012-6-5 12:34
These instructions assume you are using Eclipse; if you are not, by now you know how to translate them to your IDE anyway. The short story is: put the mongo jar file on your classpath. You can put the jar file anywhere, but I like to keep mine in ~/tools/.

If you are using Eclipse, it is best to create a classpath variable so other projects can use the same variable and not go through the trouble again. Create a new Eclipse Java project in a new workspace.

Now right-click your new project, open the project properties, and go to Java Build Path->Libraries->Add Variable->Configure Variable, as shown in figure 7.

Figure 7: Adding Mongo jar file as a classpath variable in Eclipse



In Eclipse, from "Project Properties->Java Build Path->Libraries", click "Add Variable", select "MONGO", click "Extend…", and select the jar file you just downloaded.
