Original poster: jieforest

S3QL: an online data storage system

#11 | Original poster | Posted 2013-7-16 17:14
Storage Backends

S3QL supports several backends for storing data with different service providers and over different protocols. A storage URL specifies a backend together with some backend-specific information and uniquely identifies an S3QL file system. The form of the storage URL depends on the backend and is described for each backend below.
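
As a rough sketch of how a storage URL ties a file system to a backend, the snippet below splits a URL into its backend name and the backend-specific remainder. The helper name split_storage_url and the example bucket are made up for illustration; this is not S3QL's own parsing code.

```python
# Hypothetical helper, not part of S3QL: separates the backend name
# (the URL scheme) from the backend-specific part of a storage URL.
KNOWN_BACKENDS = {"gs", "s3", "swift", "rackspace", "s3c", "local"}


def split_storage_url(url):
    """Return (backend, rest) for a storage URL such as 's3://bucket/prefix'."""
    scheme, sep, rest = url.partition("://")
    if not sep or scheme not in KNOWN_BACKENDS:
        raise ValueError(f"unrecognized storage URL: {url!r}")
    return scheme, rest


# 'mybucket' and 'backups/' are placeholder names.
print(split_storage_url("s3://mybucket/backups/"))
```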

All storage backends respect the http_proxy and https_proxy environment variables.

#12 | Original poster | Posted 2013-7-16 17:15
Google Storage

Google Storage is an online storage service offered by Google. To use the Google Storage backend, you need to have (or sign up for) a Google account and then activate Google Storage for it. The account is free; you pay only for the storage and traffic that you actually use. Once you have created the account, make sure to activate legacy access.

To create a Google Storage bucket, you can use, for example, the Google Storage Manager. The storage URL for accessing the bucket in S3QL is then:
  gs://<bucketname>/<prefix>
Here bucketname is the name of the bucket, and prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL. This allows you to store several S3QL file systems in the same Google Storage bucket.
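
To illustrate why a prefix lets several file systems share one bucket, here is a minimal sketch. The function names, the bucket name and the object name "obj_1" are all hypothetical; S3QL's real object naming is not shown in this thread.

```python
# Hypothetical illustration only: shows how a per-file-system prefix keeps
# object names from colliding inside a single bucket.
def parse_bucket_url(url, scheme="gs"):
    """Split '<scheme>://<bucketname>/<prefix>' into (bucketname, prefix)."""
    head = scheme + "://"
    if not url.startswith(head):
        raise ValueError(f"not a {scheme} storage URL: {url!r}")
    bucket, _, prefix = url[len(head):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name")
    return bucket, prefix


def object_key(prefix, name):
    """The prefix is prepended to every object name."""
    return prefix + name


for url in ("gs://mybucket/fs1/", "gs://mybucket/fs2/"):
    bucket, prefix = parse_bucket_url(url)
    # Same bucket and same object name, but distinct keys.
    print(bucket, object_key(prefix, "obj_1"))
```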

Note that the backend login and password for accessing your Google Storage bucket are not your Google account name and password, but the Google Storage developer access key and Google Storage developer secret that you can manage with the Google Storage key management tool.

#13 | Original poster | Posted 2013-7-16 17:15
Amazon S3

Amazon S3 is the online storage service offered by Amazon Web Services (AWS). To use the S3 backend, you first need to sign up for an AWS account. The account is free; you pay only for the storage and traffic that you actually use. After that, you need to create a bucket that will hold the S3QL file system, e.g. using the AWS Management Console. For best performance, it is recommended to create the bucket in the geographically closest storage region, but not the US Standard region (see below).

The storage URL for accessing S3 buckets in S3QL has the form:
  s3://<bucketname>/<prefix>
Here bucketname is the name of the bucket, and prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL. This allows you to store several S3QL file systems in the same S3 bucket.

Note that the backend login and password for accessing S3 are not the user ID and password that you use to log into the Amazon website, but the AWS access key ID and AWS secret access key shown under My Account/Access Identifiers.

#14 | Original poster | Posted 2013-7-16 17:16
Reduced Redundancy Storage (RRS)

S3QL does not allow the use of reduced redundancy storage. The reason is a combination of three factors:

1. RRS has relatively low reliability: on average, you lose one out of every ten thousand objects per year. So you can expect to lose some data occasionally.

2. When fsck.s3ql asks S3 for a list of the stored objects, this list includes even those objects that have been lost. Therefore fsck.s3ql cannot detect lost objects, and lost data only becomes apparent when you actually try to read from a file whose data has been lost. This is a (very unfortunate) peculiarity of Amazon S3.

3. Due to the data de-duplication feature of S3QL, unnoticed lost objects may cause further data loss later on (see Data Durability for details).
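
To put the first factor into numbers, here is a back-of-the-envelope sketch. It takes the loss rate quoted above (one object in ten thousand per year) and additionally assumes that objects are lost independently of each other, which is a simplification. The function names and the object count are made up for the example.

```python
# Rough estimate only. Assumes independent object losses at the rate
# quoted in the text; real storage failures may be correlated.
ANNUAL_LOSS_RATE = 1.0 / 10_000


def expected_losses(n_objects, years=1.0, rate=ANNUAL_LOSS_RATE):
    """Approximate expected number of lost objects (linear in time)."""
    return n_objects * rate * years


def p_any_loss(n_objects, years=1.0, rate=ANNUAL_LOSS_RATE):
    """Probability that at least one object is lost in the given period."""
    return 1.0 - (1.0 - rate) ** (n_objects * years)


n = 100_000  # hypothetical number of stored objects
print(expected_losses(n))  # about 10 objects per year
print(p_any_loss(n))       # very close to 1
```

Under these assumptions, a file system with a hundred thousand objects is almost certain to lose at least one of them within a year.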

#15 | Original poster | Posted 2013-7-16 17:16
OpenStack/Swift

OpenStack is an open-source cloud server application suite. Swift is the cloud storage module of OpenStack. Swift/OpenStack storage is offered by many different companies.

The storage URL for the OpenStack backend has the form:
  swift://<hostname>[:<port>]/<container>[/<prefix>]
Note that the storage container must already exist. Most OpenStack providers offer a web frontend that you can use to create storage containers. prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL. This allows you to store several S3QL file systems in the same container.
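
The optional parts of the swift:// form can be made concrete with a small parser. This is only a sketch of the URL shape given above, written with a regular expression; it is not S3QL's own parser, and the host and container names are placeholders.

```python
import re

# Mirrors swift://<hostname>[:<port>]/<container>[/<prefix>].
# Hypothetical helper for illustration, not S3QL code.
SWIFT_URL = re.compile(
    r"^swift://(?P<host>[^/:]+)"   # hostname
    r"(?::(?P<port>\d+))?"         # optional :port
    r"/(?P<container>[^/]+)"       # container name
    r"(?:/(?P<prefix>.*))?$"       # optional /prefix
)


def parse_swift_url(url):
    """Return (hostname, port or None, container, prefix)."""
    m = SWIFT_URL.match(url)
    if m is None:
        raise ValueError(f"not a swift storage URL: {url!r}")
    port = int(m.group("port")) if m.group("port") else None
    return m.group("host"), port, m.group("container"), m.group("prefix") or ""


print(parse_swift_url("swift://storage.example.com:8080/box/fs1/"))
print(parse_swift_url("swift://storage.example.com/box"))
```

A port of None here simply means that none was given in the URL; which default the backend then uses is not covered by this sketch.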

#16 | Original poster | Posted 2013-7-17 09:24
Rackspace CloudFiles

Rackspace CloudFiles uses OpenStack internally, so it is possible to just use the OpenStack/Swift backend (see above) with auth.api.rackspacecloud.com as the host name and your Rackspace API key as the backend passphrase. However, in this case you are restricted to using containers in the default storage region.

To access containers in other storage regions, there is a special rackspace backend that uses a storage URL of the form:
  rackspace://<region>/<container>[/<prefix>]
The storage container must already exist in the selected region. prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL and can be used to store several S3QL file systems in the same container.

You can create a storage container for S3QL using the Cloud Control Panel (click on Files in the topmost menu bar).

Note

As of January 2012, Rackspace does not give any durability or consistency guarantees (see Important Rules to Avoid Losing Data for why this is important). However, Rackspace support agents seem prone to claiming very high guarantees. Unless explicitly backed by their terms of service, any such statement should thus be viewed with suspicion. S3QL developers have also repeatedly experienced similar issues with the credibility and competence of Rackspace support.

#17 | Original poster | Posted 2013-7-17 09:24
S3 compatible

The S3 compatible backend allows S3QL to access any storage service that uses the same protocol as Amazon S3. The storage URL has the form:
  s3c://<hostname>:<port>/<bucketname>/<prefix>
Here bucketname is the name of an (existing) bucket, and prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL. This allows you to store several S3QL file systems in the same bucket.

#18 | Original poster | Posted 2013-7-17 09:25
Local

S3QL is also able to store its data on the local file system. This can be used to back up data on external media, or to access external services that S3QL cannot talk to directly (e.g., it is possible to store data over SSH by first mounting the remote system using sshfs and then using the local backend to store the data in the sshfs mountpoint).

The storage URL for local storage is:
  local://<path>
Note that you have to write three consecutive slashes to specify an absolute path, e.g. local:///var/archive. Also, relative paths are automatically converted to absolute paths before the authentication file (see Storing Authentication Information) is read. For example, if you are in the /home/john directory and try to mount local://s3ql, the corresponding section in the authentication file must match the storage URL local:///home/john/s3ql.
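
The relative-to-absolute conversion described above can be sketched as follows. The function normalize_local_url is a made-up name and only illustrates the rule; the working directory is passed in explicitly so that the example does not depend on where it is run.

```python
import posixpath


def normalize_local_url(url, cwd):
    """Illustrative only: expand a relative local:// URL against cwd.

    An absolute path (three slashes in total) is returned unchanged.
    """
    head = "local://"
    if not url.startswith(head):
        raise ValueError(f"not a local storage URL: {url!r}")
    path = url[len(head):]
    # posixpath.join discards cwd when path is already absolute.
    absolute = posixpath.normpath(posixpath.join(cwd, path))
    return head + absolute


print(normalize_local_url("local://s3ql", cwd="/home/john"))
print(normalize_local_url("local:///var/archive", cwd="/home/john"))
```

The first call yields the local:///home/john/s3ql form that the authentication file has to match; the second leaves the absolute URL as it is.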

#19 | Original poster | Posted 2013-7-17 09:26
Important Rules to Avoid Losing Data

Most S3QL backends store data in distributed storage systems. These systems differ from a traditional, local hard disk in several important ways. To avoid losing data, read this section very carefully.

Rules in a Nutshell

To avoid losing your data, obey the following rules:

1. Know what durability you can expect from your chosen storage provider. The durability describes how likely it is that a stored object becomes damaged over time. Such data corruption can never be prevented completely; techniques like geographic replication and RAID storage only reduce its likelihood (i.e., increase the durability).

2. When choosing a backend and storage provider, keep in mind that when using S3QL, the effective durability of the file system data will be reduced because of S3QL’s data de-duplication feature.

3. Determine your storage service’s consistency window. The consistency window that is important for S3QL is the smaller of the times for which:

3.1 a newly created object may not yet be included in the list of stored objects
3.2 an attempt to read a newly created object may fail with the storage service reporting that the object does not exist

If one of the above times is zero, we say that as far as S3QL is concerned the storage service has immediate consistency.

If your storage provider claims that neither of the above can ever happen, while at the same time promising high durability, you should choose a respectable provider instead.

#20 | Original poster | Posted 2013-7-17 09:27
4. When mounting the same file system on different computers (or on the same computer but with different --cachedir directories), the time that passes between the first and second invocation of mount.s3ql must be at least as long as your storage service’s consistency window. If your storage service offers immediate consistency, you do not need to wait at all.

5. Before running fsck.s3ql or s3qladm, the file system must have been left untouched for the length of the consistency window. If your storage service offers immediate consistency, you do not need to wait at all.
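
Rules 3 to 5 can be turned into a small calculation. The sketch below is only an illustration: the function names and the delay values are made up, and the real delays have to come from your storage provider.

```python
import time


def consistency_window(listing_delay_s, read_delay_s):
    """Rule 3: the window relevant to S3QL is the smaller of the two delays."""
    return min(listing_delay_s, read_delay_s)


def seconds_to_wait(last_activity_ts, window_s, now=None):
    """Rules 4 and 5: time still to wait before a second mount.s3ql,
    fsck.s3ql or s3qladm run. Zero means it is safe to proceed."""
    if now is None:
        now = time.time()
    return max(0.0, window_s - (now - last_activity_ts))


# Hypothetical provider: new objects may be missing from listings for
# up to 30 s and unreadable for up to 10 s.
window = consistency_window(listing_delay_s=30.0, read_delay_s=10.0)
print(seconds_to_wait(last_activity_ts=100.0, window_s=window, now=104.0))
```

With immediate consistency the window is zero, so seconds_to_wait always returns 0.0 and no waiting is needed.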

The rest of this section explains the above rules and the reasons for them in more detail. It also contains a list of the consistency windows for a number of larger storage providers.

Consistency Window List

The following is a list of the consistency windows (as far as S3QL is concerned) for a number of storage providers. This list doesn’t come with any guarantees and may be outdated. If your storage provider is not included, or if you need more reliable information, check with your storage provider.

Storage Provider                        Consistency
--------------------------------------  -----------
Amazon S3 in the US Standard region     Eventual
Amazon S3 in other regions              Immediate
Google Storage                          Immediate
Rackspace CloudFiles                    Eventual
