Original poster: jieforest

Distributed database Hypertable 0.9.7.6 released

#21 | Original poster | Posted 2013-5-31 12:29
1. control - This field consists of bit flags that describe the format of the remaining fields.  In certain circumstances the timestamp or revision number may be absent, or the two may be identical, in which case they are collapsed into a single field.  The control field records that information and tells Hypertable how to interpret the key.

2. row key - This field contains a '\0' terminated string that represents the row key.

3. column family - This field is a single-byte field that indicates the column family code.  

4. column qualifier - This field contains a '\0' terminated string that represents the column qualifier.

5. flag - Deletes are handled through the insertion of special "delete" records (or tombstones) that indicate that some portion of a row's cells have been deleted.  These delete records are applied at query time and the deleted cells are garbage collected during major compactions.

6. timestamp - This field is an 8-byte (64-bit) field that contains the cell timestamp, represented as nanoseconds since the Unix epoch.  By default, the timestamp is stored big-endian, ones' complement, so that within a given cell, versions are stored newest to oldest.

7. revision - This field is an 8-byte (64-bit) field that contains a high-resolution timestamp that is currently used internally to provide snapshot isolation for queries.
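
To see why the big-endian, ones' complement encoding in item 6 sorts versions newest to oldest, here is a minimal Python sketch. The function names are made up for illustration and this is not Hypertable's own serialization code; it only shows the byte-ordering idea.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF  # keep values inside 64 bits

def encode_timestamp(ts_ns: int) -> bytes:
    """Pack the ones' complement of a 64-bit timestamp as 8 big-endian bytes."""
    return struct.pack(">Q", ~ts_ns & MASK64)

def decode_timestamp(raw: bytes) -> int:
    """Invert encode_timestamp."""
    (inverted,) = struct.unpack(">Q", raw)
    return ~inverted & MASK64

older = 1_370_000_000_000_000_000  # nanoseconds since the Unix epoch
newer = older + 1

# Byte strings compare lexicographically, so the newer version sorts first.
versions = sorted([encode_timestamp(older), encode_timestamp(newer)])
print([decode_timestamp(v) for v in versions] == [newer, older])
```

Because the bytes are big-endian, a plain byte-wise comparison of the encoded field agrees with numeric order, and the complement flips that order so the newest version comes first.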

#22 | Original poster | Posted 2013-5-31 12:30
ACCESS GROUPS

Access Groups provide a way to control the physical storage of column data to optimize disk I/O.  Access Groups are defined in the table schema and instruct Hypertable to physically store all data for columns within the same access group together on disk.  This feature allows you to optimize queries for frequently accessed columns by reducing the amount of data transferred from disk during query execution.  Disk I/O is limited to just the data from the access groups of the columns specified in the query.  For example, consider the following schema.
  CREATE TABLE User (
    name,
    address,
    photo,
    profile,
    ACCESS GROUP default (name, address, photo),
    ACCESS GROUP profile (profile)
  );

#23 | Original poster | Posted 2013-6-5 10:00
Hypertable will create two physical groupings of column data, one for the name, address, and photo columns, and another for the profile column.  The following diagram illustrates this physical grouping.

#24 | Original poster | Posted 2013-6-5 10:00
Consider the following query for the profile column of the User table.
  SELECT profile FROM User;
The execution of this query will be efficient because only the data for the profile column will be transferred from disk during query execution.
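
As a rough illustration of how access groups narrow disk I/O, the Python sketch below models the User schema as a plain dictionary and works out which access groups a query has to read. The dictionary and the helper name are hypothetical; this is not how Hypertable represents schemas internally.

```python
# Hypothetical in-memory model of the User schema; not Hypertable's API.
ACCESS_GROUPS = {
    "default": ["name", "address", "photo"],
    "profile": ["profile"],
}

def groups_to_read(columns):
    """Return the sorted names of access groups holding any requested column."""
    wanted = set(columns)
    return sorted(
        group for group, cols in ACCESS_GROUPS.items() if wanted & set(cols)
    )

print(groups_to_read(["profile"]))          # ['profile']
print(groups_to_read(["name", "profile"]))  # ['default', 'profile']
```

A query on profile alone touches only the profile group, so the name, address, and photo data never has to leave disk.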

#25 | Original poster | Posted 2013-6-5 10:01
RANGESERVER INSERT HANDLING

The following diagram illustrates how inserts are handled inside the RangeServer.

#26 | Original poster | Posted 2013-6-5 10:01
Step 1: Commit Log - Inserts are appended to the Commit log, which resides in the distributed filesystem (DFS).  The append is followed by a sync operation that tells the filesystem to persist any buffered writes to disk.  If multiple insert requests are pending, or a GROUP_COMMIT_INTERVAL is configured for the table, the sync operation is performed after multiple Commit log appends to improve throughput.

Step 2: Add to map - The inserts are added to the in-memory CellCache (equivalent to the Memtable in the Bigtable paper).

Step 3: Acknowledge - Acknowledgement is sent back to the application.

Background Maintenance Threads - Over time, as the CellCaches fill memory, background maintenance threads "spill" the in-memory CellCache data to on-disk CellStore files.  This frees memory inside the RangeServer so that it can accept more inserts.

This design makes Hypertable writes durable and consistent because inserts are not acknowledged until the Commit log has been successfully written to.
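
The three steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions: a local file stands in for the DFS commit log, a dict stands in for the CellCache, and the class and method names are invented for the example. It does not model group commit or the background spill to CellStores.

```python
import os
import tempfile

class MiniRangeServer:
    """Toy insert path; a local file stands in for the DFS commit log."""

    def __init__(self, log_path):
        self.log = open(log_path, "ab")
        self.cell_cache = {}  # stand-in for the in-memory CellCache

    def insert(self, key: str, value: str) -> bool:
        # Step 1: append to the commit log, then sync to persist the write.
        self.log.write(f"{key}\t{value}\n".encode("utf-8"))
        self.log.flush()
        os.fsync(self.log.fileno())
        # Step 2: add the cell to the in-memory map.
        self.cell_cache[key] = value
        # Step 3: acknowledge, which happens only after the log write succeeded.
        return True

    def close(self):
        self.log.close()

log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
server = MiniRangeServer(log_path)
acked = server.insert("row1", "hello")
server.close()
```

If the append or the sync raises, insert never reaches the acknowledgement, which is the ordering that the durability argument relies on.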

#27 | Original poster | Posted 2013-6-5 10:02
RANGESERVER QUERY HANDLING

The following diagram illustrates how queries are handled inside the RangeServer.



Data for a range can reside in the in-memory CellCache as well as in some number of on-disk CellStores (see following section).  To evaluate a query over a table range, the RangeServer must create a unified view of the data.  It does this with a MergeScanner object, which merges the sorted key/value pairs coming from the CellCache and CellStores.  This unified stream of key/value pairs is then filtered to produce the desired results.
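
The merge-then-filter idea can be sketched with Python's heapq.merge, which lazily merges inputs that are already sorted. The function name and sample data below are made up, and keys are simplified to plain row strings; the real MergeScanner works on full Hypertable keys and also has to apply delete records.

```python
import heapq

def merge_scan(sources, predicate):
    """Merge already-sorted (key, value) streams, then filter the result."""
    merged = heapq.merge(*sources, key=lambda kv: kv[0])
    for key, value in merged:
        if predicate(key, value):
            yield key, value

# One in-memory CellCache and two on-disk CellStores, each sorted by key.
cell_cache = [("row2", "b"), ("row5", "e")]
cell_store_1 = [("row1", "a"), ("row4", "d")]
cell_store_2 = [("row3", "c"), ("row6", "f")]

result = list(
    merge_scan(
        [cell_cache, cell_store_1, cell_store_2],
        lambda key, value: "row2" <= key <= "row4",
    )
)
print(result)  # [('row2', 'b'), ('row3', 'c'), ('row4', 'd')]
```

Because each source is already sorted, the merge only ever needs to look at the head of each stream, so no source has to be loaded in full.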

#28 | Original poster | Posted 2013-6-6 09:41
CELLSTORE FORMAT

Over time, the RangeServers will write in-memory CellCaches to on-disk files, called CellStores, whose format is shown in the accompanying illustration.  The following describes the sections of the CellStore file format.

Compressed blocks of cells (key/value pairs) - This section consists of a series of sorted, compressed blocks of sorted key/value pairs.  By default, the compressed blocks are approximately 64KB in size.  This size can be controlled by the Hypertable.RangeServer.CellStore.DefaultBlockSize property.  These blocks are the minimum unit of data transfer from disk.

Bloom Filter - After the compressed blocks of key/value pairs comes the bloom filter.  This is a probabilistic data structure that describes the keys that exist (with high likelihood) in the CellStore.  It also signals if a key is definitively not present, which helps the RangeServer avoid unnecessary block transfer and decompression.

Block Index - After the bloom filter comes the block index.  This index lists, for each block, the last key in the block followed by the block offset.

Trailer - At the end of the CellStore is the trailer.  The trailer contains general statistics about the CellStore and includes the version number of the CellStore format so that the RangeServer can interpret it correctly.
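
To make the roles of the bloom filter and the block index concrete, here is a small Python sketch of a point lookup. Everything in it is illustrative: the TinyBloom class, its sizes and hash choice, and the sample index entries are assumptions, not the actual CellStore on-disk layout.

```python
import bisect
import hashlib

class TinyBloom:
    """Small Bloom filter; bit count and hash choice are illustrative only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: str) -> bool:
        # False means definitely absent; True means probably present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )

# Block index: (last key in block, block offset), sorted by key.
block_index = [("row3", 0), ("row6", 65536), ("row9", 131072)]
last_keys = [key for key, _ in block_index]

def find_block_offset(key: str):
    """Return the offset of the only block that could hold key, or None."""
    i = bisect.bisect_left(last_keys, key)
    return block_index[i][1] if i < len(block_index) else None

bloom = TinyBloom()
for stored_key in ("row1", "row3", "row5", "row9"):
    bloom.add(stored_key)

def lookup(key: str):
    if not bloom.may_contain(key):
        return None  # skip the block read and decompression entirely
    return find_block_offset(key)

print(lookup("row5"))  # 65536
```

Storing the last key of each block lets a binary search pick the single candidate block, and the bloom filter check in front of it avoids that read for most absent keys.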

#29 | Original poster | Posted 2013-6-6 09:41
QUERY ROUTING

The following diagram illustrates the data structures that support the query routing algorithm, which determines how queries are sent to the relevant RangeServers.

#30 | Original poster | Posted 2013-6-6 09:41
METADATA Table

Hypertable has a special table called the METADATA table that contains a row for each range in the system.  Its Location column indicates which RangeServer is currently serving the range.  Though the diagram shows IP addresses in the Location column, the system stores a proxy name for the RangeServer in that column so that it can run on public clouds such as Amazon's EC2 and operate correctly in the face of server restarts and IP address changes.  A two-level hierarchy is overlaid on top of the METADATA table.  The first range is the ROOT range, which contains pointers to the second-level ranges.  These, in turn, contain pointers to the USER ranges, which are the ranges that make up regular user or application defined tables.
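
The two-level walk can be sketched in Python as two binary searches. The data below is hypothetical and heavily simplified: each entry is just (end row of the range it covers, pointer), the names meta-1, meta-2, rs1, rs2 and rs3 are invented, and a single user table is assumed.

```python
import bisect

# Hypothetical contents: (end row of the covered range, pointer).
ROOT = [("m", "meta-1"), ("\uffff", "meta-2")]
SECOND_LEVEL = {
    "meta-1": [("f", "rs1"), ("m", "rs2")],
    "meta-2": [("t", "rs3"), ("\uffff", "rs1")],
}

def _find(entries, row):
    """Return the pointer of the first entry whose end row is >= row."""
    ends = [end for end, _ in entries]
    return entries[bisect.bisect_left(ends, row)][1]

def locate(row: str) -> str:
    """Walk ROOT, then a second-level range, to find the serving RangeServer."""
    meta_range = _find(ROOT, row)                # level 1: ROOT range
    return _find(SECOND_LEVEL[meta_range], row)  # level 2: METADATA range

print(locate("apple"))  # rs1
print(locate("house"))  # rs2
print(locate("zebra"))  # rs1
```

The values returned are proxy names rather than addresses, matching the point above that a RangeServer can change IP address without its METADATA entries becoming stale.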

Client Library

The Client Library provides the application programming interface (API) that allows an application to talk to Hypertable.  This library is linked into each Hypertable application and handles query routing.  The client library includes a METADATA cache, which contains the range location information obtained by walking the METADATA hierarchy.  Most application range location requests are served directly out of this cache.  The ThriftBroker, which provides a high-level language interface to Hypertable, links against the client library and is a long-lived process, so its METADATA cache is usually fresh and populated.  For this reason, we recommend that short-lived applications (e.g. CGI programs) use the Thrift interface to avoid having to walk the METADATA hierarchy for each request.
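
A minimal sketch of why a warm cache matters, assuming a single table and a hypothetical walk_metadata callable that returns (start row, end row, location) for the range containing a row. The class is invented for this example and is not the client library's real cache.

```python
import bisect

class RangeLocationCache:
    """Caches range locations so repeated lookups skip the METADATA walk."""

    def __init__(self, walk_metadata):
        self._walk = walk_metadata  # callable(row) -> (start, end, location)
        self._ends = []             # sorted end rows of cached ranges
        self._ranges = {}           # end row -> (start row, location)
        self.walks = 0              # number of walks actually performed

    def locate(self, row: str) -> str:
        i = bisect.bisect_left(self._ends, row)
        if i < len(self._ends):
            start, location = self._ranges[self._ends[i]]
            if start < row:         # row falls inside (start, end]
                return location
        # Cache miss: walk the hierarchy once and remember the whole range.
        self.walks += 1
        start, end, location = self._walk(row)
        if end not in self._ranges:
            bisect.insort(self._ends, end)
        self._ranges[end] = (start, location)
        return location

def fake_walk(row):
    """Stand-in for the real walk: two ranges split at row 'm'."""
    return ("", "m", "rs1") if row <= "m" else ("m", "\uffff", "rs2")

cache = RangeLocationCache(fake_walk)
locations = [cache.locate(r) for r in ("apple", "banana", "zebra", "yak")]
print(locations, cache.walks)  # ['rs1', 'rs1', 'rs2', 'rs2'] 2
```

Four lookups cost only two walks here. A short-lived process starts with an empty cache every time, which is the cost the ThriftBroker recommendation avoids.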
