Original poster: jieforest

Evaluating Apache Cassandra as a Cloud Database

#21 | Original poster | Posted 2012-7-23 00:46
Solving the Cloud Mixed-Workload Problem

A primary benefit that DataStax Enterprise provides to enterprises needing smart big data management capabilities is its ability to service real-time, analytic, and enterprise search data operations in the same database cluster without any one workload impacting the others. The key to making this possible is the underlying architecture of Cassandra.

#22 | Original poster | Posted 2012-7-23 00:48
Hadoop Analytics in the Cloud

Built into DataStax Enterprise is an enhanced Hadoop distribution that utilizes Cassandra for many of its core services. DataStax Enterprise provides integrated Hadoop MapReduce, Hive, Pig, Mahout, and job/task tracking capabilities, replacing Hadoop’s HDFS storage layer with Cassandra (CassandraFS).

The end product is a single integrated solution that provides increased reliability, simpler deployment, and lower total cost of ownership (TCO) than a traditional Hadoop solution. DataStax Enterprise is also fully compatible with existing HDFS, Hadoop, and Hive tools and utilities.

Another benefit of using Hadoop in DataStax Enterprise is that it eliminates the complexity and single points of failure of the typical Hadoop HDFS layer. From an operational standpoint, there is no need to set up a Hadoop name node, secondary name node, ZooKeeper, and so on.

Instead, DataStax Enterprise provides a single layer in which every node is a peer of the others and automatically knows its position in the cluster. On startup, all DataStax Enterprise nodes automatically start a Hadoop task tracker, and one of the nodes is elected to be the job tracker.

If the job tracker node fails, the job tracker is automatically restarted on a different node. DataStax Enterprise utilizes full data locality awareness for Hadoop task assignment.
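
The behavior described above (peer nodes, one job tracker, automatic failover, locality-aware task assignment) can be pictured with a small simulation. This is only a sketch: the `Node` and `Cluster` classes are hypothetical, and the election rule (first live node by name) is an assumption made for illustration, since the text does not say how DataStax Enterprise actually picks the job tracker.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    # Keys this node holds a replica of (stand-in for locally stored data).
    local_keys: set = field(default_factory=set)


class Cluster:
    """Toy model of a peer-to-peer cluster; not DataStax Enterprise code."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.job_tracker = None
        self.elect_job_tracker()

    def live_nodes(self):
        return [n for n in self.nodes if n.alive]

    def elect_job_tracker(self):
        # Assumed rule for illustration: the first live node by name wins.
        live = sorted(self.live_nodes(), key=lambda n: n.name)
        self.job_tracker = live[0] if live else None
        return self.job_tracker

    def fail(self, name):
        # Mark a node as down; restart the job tracker elsewhere if needed.
        for n in self.nodes:
            if n.name == name:
                n.alive = False
        if self.job_tracker is not None and not self.job_tracker.alive:
            self.elect_job_tracker()

    def assign_task(self, key):
        # Prefer a live node that already stores the data (data locality),
        # and fall back to any live node otherwise.
        live = self.live_nodes()
        local = [n for n in live if key in n.local_keys]
        candidates = local or live
        return candidates[0] if candidates else None


if __name__ == "__main__":
    cluster = Cluster([
        Node("node1", local_keys={"a"}),
        Node("node2", local_keys={"a", "b"}),
        Node("node3", local_keys={"b"}),
    ])
    print(cluster.job_tracker.name)
    cluster.fail("node1")
    print(cluster.job_tracker.name)
```

Every node here can run tasks and any live node can take over as job tracker, which is the property that removes the dedicated master found in a conventional Hadoop deployment.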

#23 | Original poster | Posted 2012-7-23 00:50
Search With Solr in the Cloud

DataStax Enterprise includes strong enterprise search support via Lucene and Apache Solr. Coming from the Apache Lucene project, Solr is the most popular open source enterprise search platform in use today.

Solr’s primary features include robust full-text search, hit highlighting, faceted search, rich document (e.g., PDF, Microsoft Word) handling, and geospatial search.

By integrating Solr into the DataStax Enterprise big data platform, DataStax extends Solr’s capabilities and overcomes a number of shortcomings of native Solr, such as:

• Lack of data durability. Community Solr has no write-ahead log, so data can be lost if a node crashes. With Solr in DataStax Enterprise, this risk of data loss does not exist

• A write bottleneck. In community Solr, all writes go through a single master. With DataStax Enterprise, users can read from and write to any Solr node in the cluster

• Manual replication and sharding. In community Solr, this is a manual process that requires careful planning for scaling and failover. DataStax Enterprise supplies automatic sharding and has no single point of failure

• Manual re-indexing of data. In DataStax Enterprise, indexes can be rebuilt automatically

• Index writes confined to one data center. Writes to indexes in community Solr cannot span multiple data centers; there is only a single master that replicates via rsync. In DataStax Enterprise, writes to search indexes in different data centers are merged together (i.e., writes can occur anywhere)

• Inflexible index management. Solr indexes in DataStax Enterprise can be dropped, recreated, and rebuilt on the fly, unlike in native Solr

#24 | Original poster | Posted 2012-7-23 00:51
In essence, just as DataStax Enterprise turns Hadoop into a fault-tolerant, dynamically scalable analytics system with no single point of failure, it automatically does the same for Solr and enterprise search operations.

Using Cassandra as the underlying foundation, DataStax Enterprise allows search data to be written to any participating search node in a DataStax Enterprise cluster. New search nodes can be added online to increase both fault tolerance and performance, with near-linear gains.

Those currently using Solr will be at home with DataStax Enterprise, as the solution is 100 percent Solr compatible, with all Solr utilities, APIs, and so on, included.
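
Since the Solr APIs are retained, a search request should look like an ordinary Solr HTTP query. The sketch below builds such a URL using standard Solr request parameters (`q`, `rows`, `wt`, `hl`, `hl.fl`, `facet`, `facet.field`) for the hit highlighting and faceted search features mentioned earlier. The helper name `build_solr_query_url`, the host, and the core name `demo.docs` are assumptions for illustration and do not come from the original text.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text,
                         highlight_field=None, facet_field=None, rows=10):
    """Build a Solr /select URL (hypothetical helper, standard Solr params)."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if highlight_field:
        # Hit highlighting on a single field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if facet_field:
        # Faceted search: count matching documents per value of the field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Host and core name are placeholders; any search node could be targeted.
    print(build_solr_query_url(
        "http://localhost:8983", "demo.docs", "body:cassandra",
        highlight_field="body", facet_field="author",
    ))
```

Because any search node can accept reads and writes, the `base_url` could point at any of them; a client would typically spread requests across nodes rather than targeting a single master.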

#25 | Original poster | Posted 2012-7-27 11:39
A Complete Big Data Platform for the Cloud

A key benefit of DataStax Enterprise is the tight feedback loop between real-time applications and the analytics and search operations that naturally follow. Traditionally, users were forced either to move data between systems via complex ETL processes or to perform both functions on the same system, with the risk of one impacting the other. In big data environments, this process can be time-consuming and burdensome.

With DataStax Enterprise, real-time, analytic, and search big data operations take place in the same distributed system, but users can dedicate certain nodes solely to analytics or search so that those workloads don’t slow down real-time processing. Users simply define one or more replica groups and configure the role of each as Cassandra, Hadoop, HDFS (i.e., HDFS without a job/task tracker), or search/Solr nodes. Writes are instantly replicated between all nodes.
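
One way to picture replica groups is as named data centers in Cassandra's `NetworkTopologyStrategy`, where each group gets its own replica count. That mapping, the group names (`Cassandra`, `Analytics`, `Solr`), the keyspace name, and the helper below are assumptions for illustration; the statement uses the CQL 3 form, and older releases used a different syntax.

```python
def keyspace_replication_cql(keyspace, replicas_per_group):
    """Return a CQL 3 CREATE KEYSPACE statement (hypothetical helper).

    replicas_per_group maps a replica group (data center) name to the
    number of copies of each row kept in that group.
    """
    if not replicas_per_group:
        raise ValueError("at least one replica group is required")
    options = ", ".join(
        f"'{group}': {count}" for group, count in replicas_per_group.items()
    )
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # Three copies for real-time traffic, two each for analytics and search.
    print(keyspace_replication_cql(
        "app", {"Cassandra": 3, "Analytics": 2, "Solr": 2}
    ))
```

With a layout like this, every write reaches all three groups, while analytics jobs and search queries read only from their own nodes and leave the real-time nodes alone.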

With DataStax Enterprise, users truly have the best of all worlds for big data management. They have all the power of Cassandra serving their high-volume, high-velocity real-time applications; the power of Hadoop, Hive, and Pig working directly against the same data for analytics; and Solr for enterprise search in the same distributed database. The result is smart workload isolation for big data applications that is much simpler to manage and more reliable than any alternative.

#26 | Original poster | Posted 2012-7-28 06:47
Figure 3: DataStax Enterprise – real-time, analytic, and search data in one cloud database

#27 | Original poster | Posted 2012-7-28 06:47
Visual Database Management

DataStax Enterprise includes a visual, browser-based management solution named OpsCenter Enterprise. It allows a developer or administrator to manage and monitor the health of cloud database deployments from a centralized web console.

#28 | Original poster | Posted 2012-7-28 06:48
Figure 4: OpsCenter Enterprise database cluster ring view

#29 | Original poster | Posted 2012-7-28 06:49
OpsCenter Enterprise uses an agent-based architecture to monitor and carry out tasks on each node in a DataStax Enterprise cluster. Through an intuitive, graphical point-and-click interface, a user can see the state of a cluster, which nodes are up or down, and what kind of performance users are experiencing. Key events are reported to a centralized dashboard and displayed along with other vital statistics.

#30 | Original poster | Posted 2012-7-28 06:51
Figure 5: OpsCenter dashboard

Analytic operations can also be monitored and controlled from within OpsCenter Enterprise:

Figure 6: OpsCenter analytic operations monitoring
