Original poster: jieforest

Managing Schema Evolution in NoSQL Data Stores

31#
OP | Posted 2013-9-24 12:24
To mark the account as expired (state “x”), we write



To change the user’s password to “g2g”, we write

32#
OP | Posted 2013-9-24 12:25
Evaluating operations.

Operations may change the state of the data store and the application space. We call the former the data store state and the latter the application state. We denote the impact of operations by rules of the form

⟦op⟧(ds, as) = (ds′, as′)

where op denotes the operation to be executed on the data store state ds and the application state as. Evaluating the operation changes the data store state to ds′ and the application state to as′.

Operations may be executed in sequence, which we define as
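The formula for sequencing did not survive in this post. As a rough Python sketch, one can model an operation as a function from a pair (ds, as) to a new pair, and assume that a sequence simply feeds the result of one operation into the next. The state encoding and the names put_all, clear_app and sequence are illustrative assumptions, not part of the original text.

```python
from functools import reduce
from typing import Callable, Dict, Tuple

# Hypothetical encoding: a state maps entity keys (kind, id) to property dictionaries.
State = Dict[Tuple[str, int], dict]
Operation = Callable[[State, State], Tuple[State, State]]


def put_all(ds: State, app: State) -> Tuple[State, State]:
    """Copy every entity of the application state into the data store state."""
    return {**ds, **app}, app


def clear_app(ds: State, app: State) -> Tuple[State, State]:
    """Empty the application state; the data store state is unchanged."""
    return ds, {}


def sequence(*ops: Operation) -> Operation:
    """Assumed semantics: run ops left to right, passing each result pair on."""
    def run(ds: State, app: State) -> Tuple[State, State]:
        return reduce(lambda state, op: op(*state), ops, (ds, app))
    return run


ds0: State = {}
as0: State = {("user", 1): {"login": "alice"}}
ds1, as1 = sequence(put_all, clear_app)(ds0, as0)
```

Each operation returns new states instead of mutating its inputs, which mirrors the rule notation where (ds, as) is mapped to a fresh pair.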

33#
OP | Posted 2013-9-25 12:45
4.1 Manipulating Entities

We next formalize operations common to most NoSQL data stores, namely creating and persisting entities, as well as retrieving and deleting single entities. Figure 5 defines our operations. Let Kind be the set of entity kinds. Let Id be a set denoting identifiers.

The set of entity keys is defined as Keys = Kind × Id, i.e. an entity key is a tuple of the kind and an identifier. Entity properties are named.

Let Names be the set of property names. A property value can be

Rule 3 adds a new property with name n and value v to the entity with key κ. Adding a nested entity as a property is specified in Rule 4. Rule 5 removes the property with name n from the entity with key κ: by setting the property value to ⊥, the property by that name is no longer defined.

Persisting entities. Rule 6 persists the entity with key κ, replicating this entity to the data store state. The put operation replaces any entity with the same key, should one exist. Rule 7 deletes the entity with key κ from the data store state. With Rule 8, we retrieve a particular entity by key from the data store state.
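To make the split between application state and data store state more tangible, here is a minimal in-memory sketch in Python. The class Store, its method names, and the sample user entity are assumptions made for illustration; Figure 5 and Example 7 are not reproduced in this post, so the exact rules may differ in detail.

```python
import copy


class Store:
    """Toy model: ds is the data store state, app the application state."""

    def __init__(self):
        self.ds = {}
        self.app = {}

    def new(self, key):
        # Create an empty entity in the application state only.
        self.app[key] = {}

    def set_property(self, key, name, value):
        # Add or overwrite a property of the application copy.
        self.app[key][name] = value

    def remove_property(self, key, name):
        # Afterwards the property is no longer defined on the application copy.
        self.app[key].pop(name, None)

    def put(self, key):
        # Replicate the entity to the data store, replacing any entity with that key.
        self.ds[key] = copy.deepcopy(self.app[key])

    def delete(self, key):
        # Remove the entity from the data store state.
        self.ds.pop(key, None)

    def get(self, key):
        # Assumption: loading copies the stored entity into the application state.
        if key in self.ds:
            self.app[key] = copy.deepcopy(self.ds[key])


store = Store()
user = ("user", 42)          # a key is a (kind, id) tuple
store.new(user)
store.set_property(user, "login", "alice")
store.set_property(user, "pwd", "secret")
store.put(user)
store.remove_property(user, "pwd")   # changes the application copy only
```

Until put is called again, the stored entity still carries the pwd property, which is the point of keeping the two states separate.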

Example 9. The following sequence of operations creates the entity from Example 7 and persists it in the data store.

34#
OP | Posted 2013-9-25 12:46
4.2 Queries

Given an entity key κ, we define the function kind(κ) such that it returns the kind of this entity. Then Rule 9 retrieves all entities from the store that are of the specified kind c. In addition to querying for a particular kind, we can also query with a predicate θ, as described by Rule 10.

We consider conjunctive queries, with equality as the only comparison operator. This type of query is typically supported by all of today’s NoSQL data stores. Some systems have more expressive query languages (e.g. with additional comparison operators and support for disjunctive queries, yet typically no joins).

More precisely, θ is a conjunctive query over atoms of the form n = v, where n is a property name and v is a property value from Dom. The predicate θ is evaluated on one entity at a time.
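A conjunctive equality query is simple enough to sketch in a few lines of Python. Here θ is encoded as a dictionary of name/value atoms, which is an assumption of this sketch, as are the function names matches and query and the sample data.

```python
def matches(entity: dict, theta: dict) -> bool:
    """True if every atom n = v of theta holds for this single entity."""
    return all(name in entity and entity[name] == value
               for name, value in theta.items())


def query(ds: dict, kind: str, theta: dict) -> dict:
    """Return all entities of the given kind that satisfy theta."""
    return {key: props for key, props in ds.items()
            if key[0] == kind and matches(props, theta)}


ds = {
    ("user", 1): {"login": "alice", "state": "x"},
    ("user", 2): {"login": "bob", "state": "a"},
    ("blogpost", 1): {"author": "alice"},
}
expired = query(ds, "user", {"state": "x"})
all_users = query(ds, "user", {})       # empty conjunction: query by kind only
nobody = query(ds, "user", {"state": "x", "login": "bob"})
```

Because the predicate only ever looks at one entity, no join is expressible in this form.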

35#
OP | Posted 2013-9-25 12:49
Let asθ be the result of evaluating query θ, i.e.

⟦get(θ)⟧(ds, ∅) = (ds, asθ)

and let K = {κ | (κ → π) ∈ asθ} be the keys of all entities in the query result. We can then evaluate the for-loop as follows.

while (K != ∅) do
  there exists some key κ in K;
  K := K \ {κ};
  evaluate operation op for the binding of x to key κ:
  (ds, as) := ⟦op[x/κ]⟧(ds, as);
od

Above, op[x/κ] is obtained from operation op by first substituting each occurrence of x in op by κ, and next replacing all operands κ.n in query predicates by the value of “getProperty(κ, n)”.

Example 11. We add a new property “email” to all user entities in the data store, and initialize it with the empty string ε.

foreach x in get(kind = “user”) do
  setProperty(x, email, ε); put(x)
od

5. Safe and Eager Migration

Now that we have a generic NoSQL database programming language, we can implement the declarative schema evolution operations from Section 3. We believe the declarative operations cover common schema evolution tasks. For more complex migration scenarios, we can always resort to a programmatic solution. This matches the situation with relational databases, where an “ALTER TABLE” statement covers the typical schema alterations, but where more complex transformations require an ETL process to be set up, or a custom migration script to be coded.

Figure 6 shows the implementation for the operations add, delete, and rename. A for-loop fetches all matching entities from the data store, modifies them, and updates their version property (as introduced in Section 3). The updated entities are then persisted.

Figure 7 shows the implementation for copy and move. Again, entities are fetched from the NoSQL data store one by one, updated, and then persisted. This requires joins between entities. Since joins are not supported in most NoSQL data stores, they need to be encoded in the application logic.

This batch update corresponds to the recommendation of NoSQL data store providers on how to handle schema evolution (e.g. [23]).

Note that the create-or-replace semantics inherent in our NoSQL database programming language make for a well-defined behavior of operations. For instance, renaming the property “text” in blogposts to “content” (cf. Example 4) effectively overwrites any existing property named content.
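The batch update loop and the create-or-replace behavior can be tried out with a small Python sketch. It assumes a dictionary-based data store and hypothetical helper names (foreach, add_email, rename_text_to_content); the loop follows the while-loop evaluation of the for-loop, picking keys in arbitrary order.

```python
def foreach(ds, kind, op):
    """Evaluate 'foreach x in get(kind = ...) do op od' on a dict-based store."""
    # Query result, loaded into an initially empty application state.
    app = {key: dict(props) for key, props in ds.items() if key[0] == kind}
    pending = set(app)                 # K, the keys of the query result
    while pending:                     # while K is not empty
        key = pending.pop()            # pick some key and remove it from K
        op(ds, app, key)               # evaluate op with x bound to that key


def add_email(ds, app, key):
    # setProperty(x, email, empty string); put(x)
    app[key]["email"] = ""
    ds[key] = dict(app[key])


def rename_text_to_content(ds, app, key):
    # setProperty(x, content, getProperty(x, text)); removeProperty(x, text); put(x)
    # Assumes every matching entity has a "text" property.
    app[key]["content"] = app[key].pop("text")
    ds[key] = dict(app[key])


ds = {
    ("user", 1): {"login": "alice"},
    ("user", 2): {"login": "bob"},
    ("blogpost", 1): {"text": "new body", "content": "stale"},
}
foreach(ds, "user", add_email)
foreach(ds, "blogpost", rename_text_to_content)
```

After the second loop, the old value of content is gone: the rename silently overwrote it, which is well defined but not necessarily what a developer intended.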

36#
OP | Posted 2013-9-25 12:50
Moreover, the version property added to all entities makes the migration robust in case of interruptions. NoSQL data stores commonly offer very limited transaction support. For instance, Google Datastore only allows transactions to span up to five entities in so-called cross-group transactions (or alternatively, provides the concept of entity groups not supported in our NoSQL database programming language) [15]. So a large-scale migration cannot be performed as an atomic action. By restricting migrations to all entities of a particular version (using the where-clause), we may correctly recover from interrupts, even for move and copy operations.

Legend: Let c be a kind, let n be a property name, and let v be a property value from Dom. θ is a conjunctive query over properties.

add c.n = v where θ
foreach e in get(kind = c ∧ θ) do
  setProperty(e, n, v);
  setProperty(e, version, getProperty(e, version) + 1);
  put(e)
od

delete c.n where θ
foreach e in get(kind = c ∧ θ) do
  removeProperty(e, n);
  setProperty(e, version, getProperty(e, version) + 1);
  put(e)
od

rename c.n to m where θ
foreach e in get(kind = c ∧ θ) do
  setProperty(e, m, getProperty(e, n));
  removeProperty(e, n);
  setProperty(e, version, getProperty(e, version) + 1);
  put(e)
od

Figure 6. Implementing add, delete, and rename.
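The recovery argument can be illustrated with a short Python sketch of the add operation, restricted to one version through the where-clause. The function add_property, its fail_after parameter (used only to simulate a crash), and the sample data are assumptions of this sketch.

```python
def add_property(ds, kind, name, value, from_version, fail_after=None):
    """Sketch of 'add kind.name = value where version = from_version'.

    Returns the number of entities migrated in this run.
    """
    done = 0
    for key in sorted(ds):
        entity = ds[key]
        # The where-clause: skip entities that were already migrated.
        if key[0] != kind or entity["version"] != from_version:
            continue
        if fail_after is not None and done == fail_after:
            raise RuntimeError("simulated interruption")
        # Set the property, bump the version, and persist in one step.
        ds[key] = {**entity, name: value, "version": from_version + 1}
        done += 1
    return done


ds = {("user", i): {"login": f"u{i}", "version": 1} for i in range(3)}

interrupted = False
try:
    add_property(ds, "user", "email", "", from_version=1, fail_after=2)
except RuntimeError:
    interrupted = True

migrated_before_resume = sum(e["version"] == 2 for e in ds.values())
resumed = add_property(ds, "user", "email", "", from_version=1)
```

Re-running the same operation after the crash touches only the entities still at version 1, so no entity is migrated twice.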

37#
OP | Posted 2013-9-25 12:51
Interestingly, not all migrations that can be specified are desirable. For instance, assuming a 1:N relationship between users and the blogposts they have written, the result of the migration

copy user.url to blogpost where user.login = blogpost.author

does not depend on the order in which blogpost entities are updated.

However, if there is an N:M relationship between users and blogposts, e.g. because we specify the copy operation as a cross product between all users and all blogposts,

copy user.url to blogpost

then the execution order influences the migration result. Naturally, we want to know whether a migration is safe before we execute it. Concretely, we say a migration is safe if it does not produce more than one entity with the same key.

The following propositions follow from the implementations of schema evolution operators in Figures 6 and 7.

Proposition 1.  An add, delete, or rename operation is safe.

Proposition 2. For a move or copy operation and a data store state ds, the safety of executing the operation on ds can be decided in O(|ds|²).

Deciding whether a copy or move operation is safe can be done in a simulation run of the evolution operator. If an entity has already been updated in such a “dry-run” and is to be overwritten with different property values, then the migration is not safe.
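A dry run of this kind might look as follows in Python. This is only a sketch under assumptions: the function copy_is_safe, the join callback, and the sample users and blogposts are hypothetical, and the nested loop over source and target entities reflects the quadratic bound of Proposition 2.

```python
def copy_is_safe(ds, src_kind, prop, dst_kind, join=None):
    """Simulate 'copy src_kind.prop to dst_kind [where join]' without writing.

    Returns False as soon as a target entity would be overwritten with a
    different value than the one already planned for it.
    """
    planned = {}                                   # target key -> value to be copied
    for s_key, source in ds.items():
        if s_key[0] != src_kind or prop not in source:
            continue
        for d_key, target in ds.items():
            if d_key[0] != dst_kind:
                continue
            if join is not None and not join(source, target):
                continue
            if d_key in planned and planned[d_key] != source[prop]:
                return False                       # result would depend on the order
            planned[d_key] = source[prop]
    return True


ds = {
    ("user", 1): {"login": "alice", "url": "a.example"},
    ("user", 2): {"login": "bob", "url": "b.example"},
    ("blogpost", 1): {"author": "alice"},
    ("blogpost", 2): {"author": "bob"},
}


def same_author(user, post):
    return user["login"] == post["author"]


safe_with_join = copy_is_safe(ds, "user", "url", "blogpost", same_author)
safe_cross_product = copy_is_safe(ds, "user", "url", "blogpost")
```

With the join condition each blogpost receives exactly one url; without it, both users compete for every blogpost and the dry run reports the migration as unsafe.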

In relational data exchange, the existence of solutions for relational mappings under constraints is a highly related problem. There, it can be shown that while the existence of solutions is an undecidable problem per se, for certain restrictions, the problem is PTIME-decidable (cf. Corollary 2.15 in [2]). Moreover, the vehicle for checking for solutions is the chase algorithm, which fails when equality-generating dependencies in the target schema are violated. This is essentially the same idea as our dry-run producing

38#
OP | Posted 2013-9-26 12:53
6. An Outlook on Lazy Migration

Our NoSQL database programming language can also express operations for lazy migration. To illustrate this on an intuitive level, we encode some features of the Objectify object mapper [27].

We will make use of some self-explanatory additional language constructs, such as if-statements and local variables. Additionally, we assume an operation “hasProperty(κ, n)” that tests whether the entity with key κ in the application state has a property by name n.

Example 12. The following example is adapted from the Objectify documentation. It illustrates how properties are renamed when an entity is loaded from the data store and translated into a Java object.

The Java class Person is mapped to an entity. The annotation @Id marks the identifier for this entity, the entity kind is derived from the class name. The earlier version of this entity has a property “name”, which is now renamed to “fullName”. Legacy entities do not yet have the property “fullName”. When they are loaded, the object mapper assigns the value of property “name” to the class attribute “fullName”. The next time that the entity is persisted, its new version will be stored.
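    import com.googlecode.objectify.annotation.AlsoLoad;
    import com.googlecode.objectify.annotation.Id;

    public class Person {
        @Id Long id;
        // Also accept the legacy property "name" when loading this field.
        @AlsoLoad("name") String fullName;
    }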
In our NoSQL database programming language, we implement the annotation @AlsoLoad as follows.
    Key p := (“Person”, id);
    if hasProperty(p, name) do
      setProperty(p, fullName, getProperty(p, name));
      removeProperty(p, name)
    od
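For readers who want to experiment with this behavior outside Objectify, the load-time rename can be mimicked in plain Python. The functions load_person and save_person and the sample entity are assumptions of this sketch; it follows the pseudocode above and does not model how Objectify itself treats entities that carry both properties.

```python
def load_person(ds, person_id):
    """Load a Person entity and rename the legacy property on the fly."""
    entity = dict(ds[("Person", person_id)])   # work on an application copy
    if "name" in entity:                        # hasProperty(p, name)
        entity["fullName"] = entity.pop("name")
    return entity


def save_person(ds, person_id, entity):
    """Persist the application copy, replacing the stored entity."""
    ds[("Person", person_id)] = dict(entity)


ds = {("Person", 1): {"name": "Ada Lovelace"}}

loaded = load_person(ds, 1)
stored_before_save = dict(ds[("Person", 1)])    # still the legacy structure
save_person(ds, 1, loaded)
loaded_again = load_person(ds, 1)               # nothing left to migrate
```

The stored entity only changes when it is saved again, which is what makes the migration lazy.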

39#
OP | Posted 2013-9-26 12:54
Example 13. The following example is adapted from [27]. The annotation @OnLoad specifies the migration for an entity when it is loaded. If the entity has properties street and city, these properties are moved to a new entity storing the address. These properties are then discarded from the person entity when it is persisted (specified by the annotation @IgnoreSave). Saving an entity is done by calling the Objectify function ofy().save().
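    import static com.googlecode.objectify.ObjectifyService.ofy;

    import com.google.appengine.api.datastore.Entity;
    import com.googlecode.objectify.annotation.Id;
    import com.googlecode.objectify.annotation.IgnoreSave;
    import com.googlecode.objectify.annotation.OnLoad;

    public class Person {
        @Id Long id;
        @IgnoreSave String street;
        @IgnoreSave String city;

        // Called after loading: moves street and city into a separate address entity.
        @OnLoad void onLoad() {
            if (this.street != null && this.city != null) {
                Entity a = new Entity("address");
                a.setProperty("person", this.id);
                a.setProperty("street", this.street);
                a.setProperty("city", this.city);
                ofy().save().entity(a);
            }
        }
    }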
We implement the method with annotation @OnLoad as follows.
    Key p := (“Person”, id);
    if ( hasProperty(p, street) ∧ hasProperty(p, city) ) do
      Key a := (“Address”, id);
      new(a);
      setProperty(a, person, id);
      setProperty(a, street, getProperty(p, street));
      setProperty(a, city, getProperty(p, city));
      put(a);
      removeProperty(p, street);
      removeProperty(p, city);
    od

40#
OP | Posted 2013-9-26 12:54
It remains future work to explore lazy migrations in greater detail, and develop mechanisms to statically check them prior to execution: The perils of using such powerful features in an uncontrolled manner, on production data, are evident. Lazy migration is particularly difficult to test prior to launch, since we cannot foretell which entities will be touched at runtime. After all, users may return after years and re-activate their accounts, upon which the object mapper tries to evolve ancient data.

It is easy to imagine scenarios where lazy migration fails, due to artifacts in the entity structure that developers are no longer aware of. In particular, we would like to be able to determine whether an annotation for lazy migration is safe. At the very least, we would like to check whether a lazy migration is idempotent, so that when transactions involving evolutions fail, there is no harm done in re-applying the migration.
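Idempotence can at least be probed dynamically on sample data, as the following Python sketch suggests. This is not the static check called for above: it only tests one concrete state. The functions split_address (modeled on Example 13), bump_version, and is_idempotent_on are hypothetical names introduced for illustration.

```python
import copy


def split_address(ds, person_id):
    """Lazy migration modeled on Example 13: move street and city to an address entity."""
    ds = copy.deepcopy(ds)
    person = ds[("Person", person_id)]
    if "street" in person and "city" in person:
        ds[("Address", person_id)] = {
            "person": person_id,
            "street": person.pop("street"),
            "city": person.pop("city"),
        }
    return ds


def bump_version(ds, person_id):
    """A deliberately non-idempotent migration: every run increments the version."""
    ds = copy.deepcopy(ds)
    person = ds[("Person", person_id)]
    person["version"] = person.get("version", 0) + 1
    return ds


def is_idempotent_on(migration, ds, entity_id):
    """Apply the migration once and twice and compare the resulting states."""
    once = migration(ds, entity_id)
    twice = migration(once, entity_id)
    return once == twice


ds = {("Person", 7): {"fullName": "Ada", "street": "Main St 1", "city": "London"}}
split_ok = is_idempotent_on(split_address, ds, 7)
bump_ok = is_idempotent_on(bump_version, ds, 7)
```

The guard on the legacy properties is what makes the first migration safe to re-apply; the second one has no such guard and drifts with every retry.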

