12
返回列表 发新帖
楼主: jieforest

[转载] Apache Cassandra and Low-Latency Applications

[复制链接]
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
11#
 楼主| 发表于 2015-2-3 23:55 | 只看该作者
As soon as read performance issues were resolved, we performed tests in the mixed mode (reads + writes). Here we observed a situation which is described in the chart below (Picture 2). After each flush to SSTable Cassandra started to read data from disks, which in its turn caused increased timeouts on the client side. This problem was relevant for the HDD+RAM configuration because reading from SSD did not result in additional timeouts.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
12#
 楼主| 发表于 2015-2-3 23:56 | 只看该作者
Picture 2. Disk usage in the mixed mode (reads + writes) before improvements.

We tried to tinker around with Cassandra configuration options, namely, populate_io_cache_on_flush (which is described above). This option was turned off by default, meaning that filesystem cache was not populated with new SSTables. Therefore when the data from a new SSTable was accessed, it was read from HDD. Setting its value to true fixed the issue. The chart below (Picture 3) displays disk reads after the improvement.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
13#
 楼主| 发表于 2015-2-3 23:56 | 只看该作者
Picture 3. Disk usage in the mixed mode (reads + writes) after improvements.

In other words, Cassandra stopped reading from disks after the whole dataset was cached in memory even in the mixed mode. It’s noteworthy that populate_io_cache_on_flush option is turned on by default in Cassandra starting from version 2.1, though it was excluded from the configuration file. The summary below (Table 2) describes the changes we tried and their impact.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
14#
 楼主| 发表于 2015-2-3 23:56 | 只看该作者
Table 2. Changes to Cassandra and the system itself and their effect on latency.

Finally, after applying the changes described in this post, we achieved acceptable results for both SSD and HDD+RAM configurations. Much effort was also put into tuning a Cassandra client (we used Astyanax) to operate well with replication factor two and reliably return control on time in case of a timeout. We would also like to share some details about operations automation, monitoring, as well as ensuring proper work of the cross data center replication, but it is very difficult to cover all the aspects in a single post. As stated above, we had gone to production with HDD+RAM configuration and it worked reliably with no surprises, including Cassandra upgrade on the live cluster without downtime.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
15#
 楼主| 发表于 2015-2-3 23:57 | 只看该作者
Conclusion
Cassandra was new to us when it was introduced into the project. We had to spend a lot of time exploring its features and configuration options. It allowed us to implement the required architecture and deliver the system on time. And at the same time we gained a great experience. We carried out significant work integrating Cassandra into our workflow. All our changes in Cassandra source code were contributed back to the community. Our digital marketing client benefited by having a more stable and scalable infrastructure with automated synchronization reducing the amount of time they had to maintain the systems.

使用道具 举报

回复
论坛徽章:
277
马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14马上有车
日期:2014-02-19 11:55:14马上有车
日期:2014-02-18 16:41:112014年新春福章
日期:2014-02-18 16:41:11版主9段
日期:2012-11-25 02:21:03ITPUB年度最佳版主
日期:2014-02-19 10:05:27现任管理团队成员
日期:2011-05-07 01:45:08
16#
 楼主| 发表于 2015-2-3 23:57 | 只看该作者
over.

使用道具 举报

回复

您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

TOP技术积分榜 社区积分榜 徽章 团队 统计 知识索引树 积分竞拍 文本模式 帮助
  ITPUB首页 | ITPUB论坛 | 数据库技术 | 企业信息化 | 开发技术 | 微软技术 | 软件工程与项目管理 | IBM技术园地 | 行业纵向讨论 | IT招聘 | IT文档
  ChinaUnix | ChinaUnix博客 | ChinaUnix论坛
CopyRight 1999-2011 itpub.net All Right Reserved. 北京盛拓优讯信息技术有限公司版权所有 联系我们 未成年人举报专区 
京ICP备16024965号-8  北京市公安局海淀分局网监中心备案编号:11010802021510 广播电视节目制作经营许可证:编号(京)字第1149号
  
快速回复 返回顶部 返回列表