Original poster: jieforest

[Repost] Goodbye MongoDB, Hello PostgreSQL

11#
Original poster | Posted on 2015-3-19 22:53
Migrating a Subset

Before we would even consider migrating all our data, we needed to run tests on a small subset of the final data. There’s no point in migrating everything if you already know that even a small chunk of data is going to give you lots of trouble.

While there are existing tools that can handle this, we also had to transform some data (e.g. renamed fields, changed types, etc.) and therefore had to write our own tools. These tools were mostly one-off Ruby scripts that each performed a specific task, such as moving over reviews, cleaning up encodings, correcting primary key sequences and so on.
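A minimal sketch of what the transformation step in such a one-off script might look like. The field names, the column names and the `document_to_row` helper are all hypothetical and only illustrate renaming fields and converting types; the original scripts are not shown in the article.

```ruby
require 'time'

# Hypothetical mapping from old document field names to new column names.
FIELD_RENAMES = {
  '_id'       => 'legacy_id',
  'createdAt' => 'created_at',
  'body'      => 'content'
}.freeze

# Hypothetical per-column type conversions, applied after renaming.
TYPE_CASTS = {
  'legacy_id'  => ->(value) { value.to_s },
  'created_at' => ->(value) { value.is_a?(String) ? Time.parse(value).utc : value },
  'rating'     => ->(value) { Integer(value) }
}.freeze

# Turns one document (a Hash with String keys) into a row Hash.
# Fields without a rename or cast are copied over unchanged.
def document_to_row(document)
  document.each_with_object({}) do |(field, value), row|
    column = FIELD_RENAMES.fetch(field, field)
    cast   = TYPE_CASTS[column]
    row[column] = cast && !value.nil? ? cast.call(value) : value
  end
end

row = document_to_row(
  '_id'       => 'abc123',
  'createdAt' => '2014-05-01T12:00:00Z',
  'rating'    => '4',
  'body'      => 'Great'
)
```

Keeping the renames and casts in plain lookup tables means a script like this can be adjusted per collection without touching the conversion logic.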

The initial testing phase didn’t reveal any problems that might block the migration, although some parts of our data did need attention. For example, certain user-submitted content wasn’t always encoded correctly and as a result couldn’t be imported without being cleaned up first. Another interesting change was converting the language names of reviews from their full names (“dutch”, “english”, etc.) to language codes, as our new sentiment analysis stack uses language codes instead of full names.
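The two clean-up steps described above could be sketched roughly as follows. This is an illustration under assumptions, not the scripts that were actually used: the lookup table contains only the two names mentioned in the text, the codes "nl" and "en" are assumed, and the helper names `language_code` and `clean_utf8` are hypothetical. The encoding clean-up relies on Ruby's standard `String#scrub`.

```ruby
# Hypothetical lookup table. Only "dutch" and "english" are named in the text,
# and the codes on the right-hand side are an assumption.
LANGUAGE_CODES = {
  'dutch'   => 'nl',
  'english' => 'en'
}.freeze

# Maps a full language name to its code and fails loudly on unknown names,
# so that unexpected values surface during the test migration.
def language_code(name)
  LANGUAGE_CODES.fetch(name.to_s.strip.downcase) do |key|
    raise ArgumentError, "no language code known for #{key.inspect}"
  end
end

# Treats the bytes as UTF-8 and replaces every invalid byte sequence with "?".
# Works on a copy, so the input string is left untouched.
def clean_utf8(text)
  text.dup.force_encoding(Encoding::UTF_8).scrub('?')
end
```

Raising on an unknown language name, rather than silently passing it through, is a design choice for a test run on a subset: it makes odd values visible before the full migration.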

12#
Original poster | Posted on 2015-3-19 22:54
Updating Applications

By far the most time was spent updating applications, especially those that relied heavily on MongoDB’s aggregation framework. Throw in a few legacy Rails applications with low test coverage and you have yourself a few weeks’ worth of work. The process of updating these applications was basically as follows:

1. Replace MongoDB driver/model setup code with PostgreSQL related code

2. Run tests

3. Fix a few tests

4. Run tests again, rinse and repeat until all tests pass

13#
Original poster | Posted on 2015-3-19 22:54
This post was last edited by jieforest on 2015-3-19 22:54

For non-Rails applications we settled on Sequel, while we stuck with ActiveRecord for our Rails applications (at least for now). Sequel is a wonderful database toolkit, supporting most (if not all) PostgreSQL-specific features that we might want to use. Its query-building DSL is also much more powerful than ActiveRecord’s, although it can be a bit verbose at times.

As an example, say you want to calculate how many users use each locale, along with each locale’s percentage of the entire set of users. In plain SQL such a query could look like the following:

  SELECT locale,
         count(*) AS amount,
         (count(*) / sum(count(*)) OVER ()) * 100.0 AS percentage
  FROM users
  GROUP BY locale
  ORDER BY percentage DESC;
14#
Original poster | Posted on 2015-3-19 22:54
In our case this would produce the following output (when using the PostgreSQL command-line interface):
   locale | amount |        percentage
  --------+--------+--------------------------
   en     |   2779 | 85.193133047210300429000
   nl     |    386 | 11.833231146535867566000
   it     |     40 |  1.226241569589209074000
   de     |     25 |  0.766400980993255671000
   ru     |     17 |  0.521152667075413857000
          |      7 |  0.214592274678111588000
   fr     |      4 |  0.122624156958920907000
   ja     |      1 |  0.030656039239730227000
   ar-AE  |      1 |  0.030656039239730227000
   eng    |      1 |  0.030656039239730227000
   zh-CN  |      1 |  0.030656039239730227000
  (11 rows)
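To make the arithmetic behind the percentage column explicit, here is a small plain-Ruby sketch that redoes the calculation outside the database, using the amounts from the output above. It only illustrates what `sum(count(*)) OVER ()` contributes (the total across all groups); the variable names are made up for this example.

```ruby
# Counts per locale, copied from the "amount" column above.
# The nil key stands for the row with an empty locale.
amounts = {
  'en' => 2779, 'nl' => 386, 'it' => 40, 'de' => 25, 'ru' => 17, nil => 7,
  'fr' => 4, 'ja' => 1, 'ar-AE' => 1, 'eng' => 1, 'zh-CN' => 1
}

# Plays the role of sum(count(*)) OVER (): one total over all groups.
total = amounts.values.sum

# Each group's count divided by the total, times 100. Rational keeps the
# division exact instead of truncating like Integer division would.
percentages = amounts.transform_values do |amount|
  Rational(amount, total) * 100
end

percentages.each do |locale, percentage|
  puts format('%-6s | %6d | %9.5f', locale.to_s, amounts[locale], percentage)
end
```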
15#
Original poster | Posted on 2015-3-19 22:55
Sequel allows you to write the above query in plain Ruby without the need for string fragments (which ActiveRecord often requires):
  # A literal "*" so that count(star) is rendered as count(*)
  star = Sequel.lit('*')

  User.select(:locale)
      .select_append { count(star).as(:amount) }
      .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
      .group(:locale)
      .order(Sequel.desc(:percentage))