
Apache Cassandra Terminology Risks and Rewards

Posted on 2012-7-10 07:19
Recently, there's been growing support for changing the terminology we use to describe the data model of Cassandra. This has people somewhat divided, and although I've gone on record as supporting the decision, I too am a bit torn. I can appreciate both perspectives, and there are both risks and rewards associated with the switch.

The two controversial terms are Keyspace and Column Family. The terms roughly correlate to the more familiar relational equivalents: Schema and Table. I think the transition from Keyspace to Schema is a fairly easy one. Logically speaking, in relational databases, a schema is a collection of tables. Likewise, in Cassandra, a Keyspace is a collection of Column Families.

The sticking point is Column Family. Conceptually, everyone can visualize a table as an n x m matrix of data. Although you can mentally map a Column Family into that same logical construct, buyer beware.

The Risks:

A data model for a column-oriented database is typically *much* different from an analogous model designed for an RDBMS. To support the "standard" queries that a relational database provides on tables, you need to model your data differently. Assuming a column family has the same capabilities as a table will lead you to all sorts of headaches. Consider, for example, range queries and indexing.
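
To make the range-query point concrete, here is a minimal in-memory sketch in Python. It is not Cassandra's API. It assumes the usual wide-row approach, where the value you want to range over (here a timestamp) is used as the column name, and columns within a row are kept sorted by name. The `WideRow` class and the sensor data are hypothetical, invented for illustration.

```python
from bisect import bisect_left, bisect_right, insort


class WideRow:
    """Hypothetical stand-in for one row whose columns stay sorted by name."""

    def __init__(self):
        self._names = []   # column names, kept in sorted order
        self._values = {}  # column name -> value

    def put(self, name, value):
        # Insert the name in sorted position the first time it is seen.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# One row per sensor; each reading is a column named by its timestamp.
readings = {"sensor-1": WideRow()}
row = readings["sensor-1"]
row.put("2012-07-10T07:00", 21.5)
row.put("2012-07-10T09:00", 23.1)
row.put("2012-07-10T08:00", 22.0)

# The range query works only because the model put the timestamp in the column name.
morning = row.slice("2012-07-10T07:30", "2012-07-10T09:00")
```

In a relational table you would simply add a WHERE clause on a timestamp column. In this sketch the same query is possible only because the model was designed around it up front.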

When data modeling, I don't relate column families to tables at all. For me, it's easier to think of column families as a map of maps. Then just remember that the top-level map can be distributed across a set of machines. Using that mental model, you are more likely to create a data model that is compatible with a column-oriented database. Think of column families as tables, and you may get yourself into trouble that will require significant refactoring.
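
The map-of-maps mental model can be sketched in a few lines of Python. This is only an illustration, not how Cassandra is implemented: the node names, the `node_for` helper, and the sample users are all hypothetical, and hashing the row key modulo the node count is a simplified stand-in for a real partitioner.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for(row_key: str) -> str:
    """Pick a node by hashing the row key (simplified stand-in for a partitioner)."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


# Outer map: row key -> inner map. Inner map: column name -> value.
users = {
    "alice": {"email": "alice@example.com", "city": "Philadelphia"},
    "bob": {"email": "bob@example.com"},  # rows need not share the same columns
}


def get_column(column_family, row_key, column_name):
    """Look up by row key first, then by column name within that row."""
    return column_family[row_key][column_name]


# The top-level map is what gets spread across machines, one row key at a time.
placement = {row_key: node_for(row_key) for row_key in users}
```

Every lookup starts from a row key, which is why queries that do not know the row key are the ones that force you to remodel.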

The Rewards:

With a strong movement towards polyglot persistence architectures, and tools that need to span the different persistence mechanisms, I can see a strong motivation to align terminology. Consider ETL tools (e.g. Talend), design tools (e.g. Erwin), and even SQL clients (e.g. good old Toad).

The popularity of Cassandra's CQL is further evidence that people want to interact with NoSQL databases using tried-and-true SQL (ironically). And maybe we should "give the people what they want", especially if it simultaneously eases the transition for newcomers.

The Big Picture:

Theologically, and in an ideal world, I agree with Jonathan's point: "The point is that thinking in terms of the storage engine is difficult and unnecessary. You can represent that data relationally, which is the Right Thing to do both because people are familiar with that world and because it decouples model from representation, which lets us change the latter if necessary."

Pragmatically, I've found that it is often necessary to consider the storage engine, at least until that engine has all the features and functions that allow me to ignore it.

Realistically, any terminology change is going to take a long time. The client APIs (Hector, Astyanax, etc.) probably aren't changing anytime soon, and the documentation still reflects the "legacy" terminology. It's only on my radar because we decided to evolve the terminology in the RefCard that we just released. Only time will tell what will come of "The Great Cassandra Terminology Debates of 2012", but it's guaranteed there will be people on both sides of the fence -- as I find myself occasionally straddling it. =)


