Original poster: jieforest

Evolution of Schema Management in NoSQL Data Stores

#11 | Original poster | Posted 2013-9-20 11:59
3. We introduce a generic NoSQL database programming language that abstracts from the APIs of the most prominent NoSQL systems. Our language clearly distinguishes the state of the persisted data from the state of the objects in the application space. This is a vital aspect, since NoSQL data stores offer a very restricted API, and data manipulation happens in the application code.

4. By implementing our schema evolution operations in our NoSQL database programming language, we show that they can be implemented for a large class of NoSQL data stores.

5. We investigate whether a proposed schema evolution operation is safe to execute.

6. Apart from exploring eager migration, we introduce the notion of lazy migration and point out its potential for future research in the database community.
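Contribution 3 separates the state of the persisted data from the state of objects in the application space. The following is a minimal Python sketch of that separation, assuming a toy in-memory store whose only operations are `get`, `put`, and `delete` by key. The class `ToyStore` and the `(kind, id)` key layout are illustrative assumptions, not the paper's language or any product's API. `get` hands the application a copy, so edits stay in application space until the entity is explicitly written back with `put`.

```python
import copy
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str]  # (kind, id): an assumed key layout


class ToyStore:
    """Hypothetical in-memory store with a deliberately restricted API:
    get, put, and delete by key only."""

    def __init__(self) -> None:
        self._data: Dict[Key, Dict[str, Any]] = {}

    def put(self, key: Key, entity: Dict[str, Any]) -> None:
        # Persist a snapshot; later edits to the caller's object do not leak in.
        self._data[key] = copy.deepcopy(entity)

    def get(self, key: Key) -> Optional[Dict[str, Any]]:
        # Return a copy: the application works on its own object state.
        stored = self._data.get(key)
        return copy.deepcopy(stored) if stored is not None else None

    def delete(self, key: Key) -> None:
        self._data.pop(key, None)


store = ToyStore()
store.put(("User", "1"), {"name": "Ada"})

user = store.get(("User", "1"))
user["name"] = "Ada Lovelace"             # changes application state only
print(store.get(("User", "1"))["name"])   # Ada
store.put(("User", "1"), user)            # now the persisted state changes
print(store.get(("User", "1"))["name"])   # Ada Lovelace
```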

#12 | Original poster | Posted 2013-9-21 12:40
Structure.

In the next section, we start with an overview of the state of the art in NoSQL data stores. Section 3 introduces our declarative language for evolving the data and its structure.

In Section 4, we define an abstract and generic NoSQL database programming language for accessing NoSQL data stores. The operations of our language are available in many popular NoSQL systems. With this formal basis, we can implement our schema evolution operations eagerly (Section 5).

Alternatively, schema evolution can be handled lazily. In Section 6, we sketch the capabilities of object mappers that allow lazy migration. In Section 7, we discuss related work on schema evolution in relational databases, XML applications, and NoSQL data stores. We then conclude with a summary and an outlook on our future work.
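The difference between eager and lazy migration can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the object-mapper approach of Section 6: each stored entity is assumed to carry a hypothetical `version` property, and the hypothetical table `MIGRATIONS` maps a version to a function that upgrades an entity by one step. Instead of rewriting every stored entity up front (eager), `load` upgrades an entity only when the application reads it (lazy).

```python
from typing import Any, Callable, Dict

Entity = Dict[str, Any]


def _v1_to_v2(entity: Entity) -> Entity:
    # Hypothetical migration step: rename "username" to "login".
    upgraded = dict(entity)
    upgraded["login"] = upgraded.pop("username")
    upgraded["version"] = 2
    return upgraded


# Migration steps keyed by the version they upgrade *from*.
MIGRATIONS: Dict[int, Callable[[Entity], Entity]] = {1: _v1_to_v2}
CURRENT_VERSION = 2


def load(raw: Entity) -> Entity:
    """Lazy migration: bring an entity up to date only when it is read.
    Entities without a "version" property are treated as version 1."""
    entity = raw
    while entity.get("version", 1) < CURRENT_VERSION:
        entity = MIGRATIONS[entity.get("version", 1)](entity)
    return entity


old = {"username": "ada", "version": 1}
print(load(old))  # {'version': 2, 'login': 'ada'}
```

A practical object mapper would usually also write the upgraded entity back, so each entity pays the migration cost only once; that step is left out here.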

#13 | Original poster | Posted 2013-9-21 12:41
2. NoSQL Data Stores

We focus on NoSQL data stores hosted in a cloud environment. Typically, such systems scale to large amounts of data, and are schema-less or schema-flexible. We begin with a categorization of popular systems, discussing their commonalities and differences.

We then point out the NoSQL data stores that we consider in this paper with their core characteristics. In doing so, we generalize from proprietary details and introduce a common terminology.

2.1 State of the art

NoSQL data stores vary hugely in terms of data model, query model, scalability, architecture, and persistence design. Several taxonomies for NoSQL data stores have been proposed. Since we focus on schema evolution, a categorization of systems by data model is most natural for our purposes.

We thus resort to a (very common) classification [8, 34] into (1) key-value stores, (2) document stores, and (3) extensible record stores. Often, extensible record stores are also called wide column stores or column family stores.

#14 | Original poster | Posted 2013-9-21 12:42
(1) Key-value stores.

Systems like Redis [30, Chapter 8] or Riak [4] store data as pairs of a unique key and a value. Key-value stores do not manage the structure of these values.

There is no concept of schema beyond distinguishing keys and values. Accordingly, the query model is very basic: only inserts, updates, and deletes by key are supported, but no query predicates on values. Since key-value stores do not manage the schema of values, schema evolution is the responsibility of the application.
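To make this division of labor concrete, here is a hypothetical key-value store in Python whose values are opaque bytes (`KeyValueStore` and `rename_field` are illustrative names, not the Redis or Riak client APIs). Because the store cannot look inside a value, even a simple property rename has to be done by the application: read each value, decode it, change it, and write it back. The sketch assumes JSON-encoded values and the availability of a key scan, which not every key-value store offers.

```python
import json
from typing import Dict, Iterator


class KeyValueStore:
    """Hypothetical key-value store: values are opaque bytes and every
    operation goes through a key. There is no "find values where ..."."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def keys(self) -> Iterator[str]:
        # Assumption: a key scan is available (not every store offers one).
        return iter(list(self._data))


def rename_field(kv: KeyValueStore, old: str, new: str) -> None:
    """Application-side schema evolution: decode, change, and re-encode
    every value, because the store never looks inside one."""
    for key in kv.keys():
        doc = json.loads(kv.get(key))
        if old in doc:
            doc[new] = doc.pop(old)
            kv.put(key, json.dumps(doc).encode())


kv = KeyValueStore()
kv.put("user:1", json.dumps({"username": "ada"}).encode())
rename_field(kv, "username", "login")
print(json.loads(kv.get("user:1")))  # {'login': 'ada'}
```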

(2) Document stores.

Systems such as MongoDB [10] or Couchbase [7] also store key-value pairs. However, they store “documents” in the value part. The term “document” connotes loosely structured sets of name-value pairs, typically in JSON (JavaScript Object Notation) format or in BSON, a binary representation of JSON with a richer set of types.

Name-value pairs represent the properties of data objects. Names are unique, and name-value pairs are sometimes even referred to as key-value pairs. The document format is hierarchical, so values may be scalar, lists, or even nested documents. Documents within the same document store may differ in their structure, since there is no fixed schema.

Queries in document stores are more expressive than in key-value stores. Apart from inserting, updating, and deleting documents based on the document key, we may query documents based on their properties.

The query languages differ from system to system. Some systems, such as MongoDB, have an integrated query language for ad-hoc queries, whereas other systems, such as CouchDB [30, Chapter 6] and Couchbase, do not. There, the user predefines views in the form of MapReduce functions [12, 34].
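Where no ad-hoc query language exists, a query has to be set up in advance as a view. The sketch below imitates that idea in Python under simplifying assumptions: a view is a map function that emits `(key, value)` pairs per document, plus an optional reduce step. All names are hypothetical; actual CouchDB and Couchbase views are written in JavaScript and maintained by the server, which this toy version does not attempt.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List, Tuple

Doc = Dict[str, Any]
MapFn = Callable[[Doc], Iterator[Tuple[Any, Any]]]


def by_author(doc: Doc) -> Iterator[Tuple[Any, Any]]:
    # Map function: emit (key, value) pairs; documents without "author" emit nothing.
    if "author" in doc:
        yield doc["author"], doc["title"]


def build_view(docs: Dict[str, Doc], map_fn: MapFn) -> Dict[Any, List[Any]]:
    """Materialize a predefined view: run the map function over every document
    and group the emitted values by key."""
    index: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs.values():
        for key, value in map_fn(doc):
            index[key].append(value)
    return dict(index)


def count_reduce(values: List[Any]) -> int:
    # Optional reduce step: aggregate all values emitted under one key.
    return len(values)


docs = {
    "a": {"author": "Ada", "title": "Engines"},
    "b": {"author": "Ada", "title": "Notes"},
    "c": {"title": "Anonymous"},
}
view = build_view(docs, by_author)
print(view["Ada"])                # ['Engines', 'Notes']
print(count_reduce(view["Ada"]))  # 2
```

The design consequence is that any question the views do not anticipate requires defining a new view first.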

#15 | Original poster | Posted 2013-9-21 12:42
An interesting and orthogonal point is the behavior in evaluating predicate queries: When a document does not contain a property mentioned in a query predicate, then this property is not even considered in query evaluation.
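One practical consequence for schema evolution can be sketched in Python, assuming that a document lacking a predicate's property simply does not match it (`matches` and the `likes` property are hypothetical, and exact semantics differ between systems). If newer application code starts writing a `likes` property, a query for `likes = 0` silently skips every older document, even though those posts have no likes either.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def matches(doc: Doc, predicate: Dict[str, Any]) -> bool:
    """Conjunctive equality predicate. A document that lacks a property named
    in the predicate does not satisfy that conjunct."""
    return all(name in doc and doc[name] == value
               for name, value in predicate.items())


blogs: List[Doc] = [
    {"id": 1, "title": "Old post"},              # written before "likes" existed
    {"id": 2, "title": "New post", "likes": 0},  # written by newer code
]

unliked = [doc["id"] for doc in blogs if matches(doc, {"likes": 0})]
print(unliked)  # [2]: the old post is silently left out
```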

Document stores are schema-less, so documents may effortlessly evolve in structure: properties can be added to or removed from a particular document without affecting the remaining documents. Typically, there is no schema definition language that would allow the application developer to manage the structure of documents globally, across all documents.
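The following Python sketch uses plain dictionaries as stand-ins for documents. It evolves two documents independently and then shows that, without a global schema definition, the set of structures in use can only be recovered by scanning all documents. `structural_variants` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List

Doc = Dict[str, Any]

users: List[Doc] = [
    {"name": "Ada", "email": "ada@example.org"},
    {"name": "Bob", "email": "bob@example.org"},
]

# Evolve single documents: add a property to the first user and drop one
# from the second. No other document is touched and nothing rejects the change.
users[0]["twitter"] = "@ada"
del users[1]["email"]


def structural_variants(docs: List[Doc]) -> Counter:
    """Count how many documents share each set of property names. With no
    global schema to consult, this is recovered by scanning the data."""
    return Counter(frozenset(doc) for doc in docs)


for props, count in structural_variants(users).items():
    print(sorted(props), count)
# ['email', 'name', 'twitter'] 1
# ['name'] 1
```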

(3) Extensible record stores. Extensible record stores such as BigTable [9] or HBase [13] actually provide a loosely defined schema. Data is stored as records.

A schema defines families of properties, and new properties can be added within a property family on a per-record basis. (Properties and property families are often also referred to as columns and column families.) Typically, the schema cannot be defined up front, and extensible record stores allow the ad-hoc creation of new properties.

However, properties cannot be renamed or easily reassigned from one property family to another. Thus, certain challenges from schema evolution in relational database systems carry over to extensible record stores.
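A hypothetical Python model of an extensible record table (`RecordTable` is illustrative and does not mirror the BigTable or HBase client APIs): column families are declared when the table is created, while individual columns inside a family appear on a per-record basis. Since this model has no rename primitive, `rename_column` falls back to copying each value into the new column and deleting the old one, record by record; moving a property to another family would have to work the same way.

```python
from typing import Dict, Set


class RecordTable:
    """Hypothetical extensible record table: column families are declared
    up front, columns inside a family are created ad hoc per record."""

    def __init__(self, families: Set[str]) -> None:
        self.families = set(families)
        # row key -> family -> column -> value
        self.rows: Dict[str, Dict[str, Dict[str, str]]] = {}

    def put(self, row: str, family: str, column: str, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        families = self.rows.setdefault(row, {})
        families.setdefault(family, {})[column] = value  # new column on the fly

    def delete(self, row: str, family: str, column: str) -> None:
        self.rows.get(row, {}).get(family, {}).pop(column, None)


def rename_column(table: RecordTable, family: str, old: str, new: str) -> None:
    """No rename primitive: copy each value to the new column, then delete
    the old column, record by record."""
    for row, families in table.rows.items():
        columns = families.get(family, {})
        if old in columns:
            table.put(row, family, new, columns[old])
            table.delete(row, family, old)


t = RecordTable({"profile"})
t.put("u1", "profile", "nick", "ada")
t.put("u2", "profile", "nick", "bob")
t.put("u2", "profile", "city", "Paris")  # only u2 gets this column
rename_column(t, "profile", "nick", "handle")
print(t.rows["u2"])  # {'profile': {'city': 'Paris', 'handle': 'bob'}}
```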

#16 | Original poster | Posted 2013-9-21 12:43
Google Datastore [15] is built on top of Megastore [3] and BigTable, and is very flexible and comfortable to use. For instance, it very effectively implements multitenancy for all its users.

The Cassandra system [1] is an exception among extensible record stores, since it is much more restrictive regarding schema. Properties are actually defined up front, even with a “CREATE TABLE” statement, and the schema is altered globally with an “ALTER TABLE” statement. So while Cassandra is an extensible record store [8, 34], it is not schema-less or schema-flexible. In this work, we will exclusively consider schema-less data stores.

#17 | Original poster | Posted 2013-9-22 09:16
A word on NULL values.

The handling of NULL values in NoSQL data stores deserves attention, as the treatment of unknown values is a factor in schema evolution. In relational database systems, NULL values represent unknown information, and are processed with a three-valued logic in query evaluation. Yet in NoSQL data stores, there is no common notion of NULLs across systems:

1. Some systems follow the same semantics of NULL values as relational databases, e.g. [10].

2. Some systems allow for NULL values to be stored, but do not allow NULLs in query predicates, e.g. [1, 15].

3. Some systems do not allow NULL values at all, e.g. [13], arguing that NULL values only waste storage.

While there is no common strategy for handling unknown values yet, the discussion is ongoing and lively. Obviously, there is a semantic difference between a property value that is not known (such as the first name for a particular user), and a property value that does not exist for a variant of an entity (since home addresses and business addresses are structured differently). Consequently, some NoSQL data stores which formerly did not support NULL values have introduced them in later releases [11, 30, Chapter 6].
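The distinction between unknown and non-existent values, and the three-valued logic used for relational NULLs, can be illustrated in Python. In this sketch, `None` stands both for NULL and for the truth value UNKNOWN, and a missing dictionary key stands for a property that does not exist. `eq3`, `and3`, and `describe` are hypothetical helpers, and no particular NoSQL system is modelled.

```python
from typing import Any, Dict, Optional

Doc = Dict[str, Any]
UNKNOWN = None  # stands for SQL's third truth value


def eq3(a: Any, b: Any) -> Optional[bool]:
    """SQL-style equality: any comparison involving NULL (None) is UNKNOWN."""
    if a is None or b is None:
        return UNKNOWN
    return a == b


def and3(x: Optional[bool], y: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: FALSE dominates, then UNKNOWN, then TRUE."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return UNKNOWN
    return True


_ABSENT = object()


def describe(doc: Doc, prop: str) -> str:
    # Distinguish "stored but unknown" from "not part of this entity at all".
    value = doc.get(prop, _ABSENT)
    if value is _ABSENT:
        return "absent"
    return "unknown" if value is None else "known"


print(eq3(None, "Ada"))                               # None
print(and3(True, None), and3(False, None))            # None False
print(describe({"first_name": None}, "first_name"))   # unknown
print(describe({"street": "Main St"}, "first_name"))  # absent
```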


#18 | Original poster | Posted 2013-9-22 09:17
In Section 4, we present a generic NoSQL data store programming language. As the approaches to handling NULL values are so manifold, we choose to disregard NULLs, both as values and in queries, until a consensus has been established among NoSQL data stores.

2.2 NoSQL Data Stores in Scope for this Paper

In this paper, we investigate schema evolution for feature-rich, interactive web applications that are backed by NoSQL data stores. This makes document stores and schema-less extensible record stores our primary platforms of interest. Since key-value stores do not know any schema apart from distinguishing keys and values, we believe they are not the technology of choice for our purposes; after all, one cannot even run the most basic predicate queries, e.g. to find all blogs posted within the last ten hours.

We assume a high-level, abstract view of document stores and extensible record stores and introduce our terminology. Our terminology follows that of Google Datastore [15]. We also state our assumptions on the data and query model.


#19 | Original poster | Posted 2013-9-22 09:17
Data model.

Objects stored in the NoSQL data store are called entities. Each entity belongs to a kind, which is a name given to groups of semantically similar objects. Queries can then be specified over all entities of the same kind. Each entity has a unique key, which consists of the entity kind and an id. Entities have several properties (corresponding to attributes in the relational world). Each entity property consists of a name and a value. Properties may be scalar or multi-valued, or they may consist of nested entities.
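The data model maps naturally onto a few Python dataclasses. This is a sketch under assumptions: the class names `Key` and `Entity` and the example properties are hypothetical, and real systems differ in whether and how nested entities carry keys of their own.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Key:
    kind: str  # name of the group of semantically similar entities
    id: str    # unique within the kind


@dataclass
class Entity:
    key: Key
    # Property name -> value; a value may be a scalar, a list (multi-valued),
    # or a nested Entity.
    properties: Dict[str, Any] = field(default_factory=dict)


post = Entity(
    Key("Blog", "42"),
    {
        "title": "Schema evolution",                          # scalar
        "tags": ["nosql", "migration"],                       # multi-valued
        "author": Entity(Key("User", "7"), {"name": "Ada"}),  # nested entity
    },
)
print(post.key.kind, post.properties["author"].properties["name"])  # Blog Ada
```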

Query model.

Entities can be inserted and deleted based on their key. We can formulate queries against all entities of a kind. At the very least, we assume that a NoSQL data store supports conjunctive queries with equality comparisons. This functionality is commonly provided by document stores and extensible record stores alike.
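A hypothetical Python store offering exactly the operations the query model assumes, and nothing more: put, get, and delete by key, plus conjunctive equality queries over all entities of one kind. `DataStore` and its method names are illustrative assumptions, not an existing API.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]  # (kind, id)
Props = Dict[str, Any]


class DataStore:
    """Hypothetical store with key-based writes and reads plus conjunctive
    equality queries over the entities of a single kind."""

    def __init__(self) -> None:
        self._entities: Dict[Key, Props] = {}

    def put(self, key: Key, props: Props) -> None:
        self._entities[key] = dict(props)

    def get(self, key: Key) -> Props:
        return dict(self._entities[key])

    def delete(self, key: Key) -> None:
        self._entities.pop(key, None)

    def query(self, kind: str, **conds: Any) -> List[Key]:
        """Keys of all entities of `kind` with p1 = v1 AND p2 = v2 ..."""
        return [
            key for key, props in self._entities.items()
            if key[0] == kind
            and all(p in props and props[p] == v for p, v in conds.items())
        ]


ds = DataStore()
ds.put(("Blog", "1"), {"author": "Ada", "lang": "en"})
ds.put(("Blog", "2"), {"author": "Ada", "lang": "de"})
ds.put(("User", "7"), {"author": "Ada"})
print(ds.query("Blog", author="Ada", lang="en"))  # [('Blog', '1')]
```

Range predicates, such as finding all blogs posted within the last ten hours, go beyond this minimal assumption, even though many systems support them.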

Freedom of schema.

We assume that the global structure of entities cannot be fixed in advance. The structure of a single entity can be changed at any time, according to the developers’ needs.


#20 | Original poster | Posted 2013-9-22 09:18
  evolutionop ::= add | delete | rename | move | copy;
  add         ::= "add" property "=" value [selection];
  delete      ::= "delete" property [selection];
  rename      ::= "rename" property "to" pname [selection];
  move        ::= "move" property "to" kname [complexcond];
  copy        ::= "copy" property "to" kname [complexcond];
  selection   ::= "where" conds;
  complexcond ::= "where" (joincond | conds | (joincond "and" conds));
  joincond    ::= property "=" property;
  conds       ::= cond {"and" cond};
  cond        ::= property "=" value;
  property    ::= kname "." pname;
  kname       ::= identifier;
  pname       ::= identifier;
Figure 2. EBNF of the NoSQL schema evolution language.

Example 1. The blogging application example from the Introduction is coherent with this terminology and these assumptions. □
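To show how the grammar of Figure 2 fits together, here is a small recursive-descent parser for it in Python. The grammar leaves `value` undefined, so the sketch assumes a value is either an integer or a double-quoted string; all class and function names are hypothetical, and the parser only builds a syntax tree without executing any operation. Because `joincond` and `cond` both begin with `property "="`, the parser reads that common prefix first and then decides by looking at the next token.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

Token = Tuple[str, Any]
# value (assumed: integer or double-quoted string) | identifier | "=" or "."
TOKEN_RE = re.compile(r'\s*(?:(\d+)|"([^"]*)"|([A-Za-z_]\w*)|(=|\.))')

def tokenize(src: str) -> List[Token]:
    tokens: List[Token] = []
    src, pos = src.strip(), 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {src[pos:]!r}")
        num, text, ident, sym = m.groups()
        if num is not None:
            tokens.append(("value", int(num)))
        elif text is not None:
            tokens.append(("value", text))
        elif ident is not None:
            tokens.append(("ident", ident))  # keywords are identifiers here
        else:
            tokens.append(("sym", sym))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, src: str) -> None:
        self.toks = tokenize(src)
        self.i = 0

    def peek(self) -> Optional[Token]:
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, kind: str, text: Any = None) -> Any:
        tok = self.peek()
        if tok is None or tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text if text is not None else kind}, got {tok}")
        self.i += 1
        return tok[1]

    def at_keyword(self, word: str) -> bool:
        return self.peek() == ("ident", word)

    def prop(self) -> Tuple[str, str]:
        # property ::= kname "." pname
        kname = self.take("ident")
        self.take("sym", ".")
        return kname, self.take("ident")

    def condition(self) -> Tuple[str, Tuple[str, str], Any]:
        # cond ::= property "=" value;  joincond ::= property "=" property
        left = self.prop()
        self.take("sym", "=")
        nxt = self.peek()
        if nxt is not None and nxt[0] == "ident":
            return ("join", left, self.prop())
        return ("eq", left, self.take("value"))

    def where(self, allow_join: bool) -> List[Tuple[str, Tuple[str, str], Any]]:
        # selection ::= "where" conds; complexcond also allows one leading joincond
        if not self.at_keyword("where"):
            return []
        self.take("ident", "where")
        items = [self.condition()]
        while self.at_keyword("and"):
            self.take("ident", "and")
            items.append(self.condition())
        joins = [k for k, item in enumerate(items) if item[0] == "join"]
        if joins and not (allow_join and joins == [0]):
            raise SyntaxError("misplaced join condition")
        return items

    def statement(self) -> Dict[str, Any]:
        op = self.take("ident")
        if op == "add":
            p = self.prop()
            self.take("sym", "=")
            ast = {"op": op, "property": p, "value": self.take("value"),
                   "where": self.where(allow_join=False)}
        elif op == "delete":
            ast = {"op": op, "property": self.prop(),
                   "where": self.where(allow_join=False)}
        elif op in ("rename", "move", "copy"):
            p = self.prop()
            self.take("ident", "to")
            # rename targets a pname; move and copy target a kname
            ast = {"op": op, "property": p, "to": self.take("ident"),
                   "where": self.where(allow_join=op != "rename")}
        else:
            raise SyntaxError(f"unknown operation: {op}")
        if self.peek() is not None:
            raise SyntaxError(f"trailing input: {self.peek()}")
        return ast

print(Parser('add Blog.likes = 0 where Blog.lang = "en"').statement())
print(Parser("move User.url to Blog where User.id = Blog.uid").statement()["where"])
```

The entity kinds and property names in the usage lines (`Blog.likes`, `User.url`, `Blog.uid`) are made up for illustration.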

Copyright 1999-2011 itpub.net. All rights reserved. Beijing Shengtuo Youxun Information Technology Co., Ltd.