12
返回列表 发新帖
楼主: Sky-Tiger

Architecture Summit 2008 - Write up

[复制链接]
论坛徽章:
350
2006年度最佳版主
日期:2007-01-24 12:56:49NBA大富翁
日期:2008-04-21 22:57:29地主之星
日期:2008-11-17 19:37:352008年度最佳版主
日期:2009-03-26 09:33:53股神
日期:2009-04-01 10:05:56NBA季后赛大富翁
日期:2009-06-16 11:48:01NBA季后赛大富翁
日期:2009-06-16 11:48:01ITPUB年度最佳版主
日期:2011-04-08 18:37:09ITPUB年度最佳版主
日期:2011-12-28 15:24:18ITPUB年度最佳技术原创精华奖
日期:2012-03-13 17:12:05
11#
 楼主| 发表于 2009-4-27 22:42 | 只看该作者
A couple of interesting Points discovered during this session

Apple's Macbooks outnumbered PCs almost 3 to 1,
Flickr deploys new code about 10 times per day. Cal: "Our last deployment was 8minutes ago."

使用道具 举报

回复
论坛徽章:
350
2006年度最佳版主
日期:2007-01-24 12:56:49NBA大富翁
日期:2008-04-21 22:57:29地主之星
日期:2008-11-17 19:37:352008年度最佳版主
日期:2009-03-26 09:33:53股神
日期:2009-04-01 10:05:56NBA季后赛大富翁
日期:2009-06-16 11:48:01NBA季后赛大富翁
日期:2009-06-16 11:48:01ITPUB年度最佳版主
日期:2011-04-08 18:37:09ITPUB年度最佳版主
日期:2011-12-28 15:24:18ITPUB年度最佳技术原创精华奖
日期:2012-03-13 17:12:05
12#
 楼主| 发表于 2009-4-27 22:42 | 只看该作者
Data Management discussion

Day two of the meeting started with a discussion on data management patterns. We started the discussion with an attempt to answer what brings people to look into distributed data management in the first place. We laid out three concrete options:

-Performance
-Latency
-Scalability

We started the discussion with the thought that all points are equally important but we quickly agreed that in reality the cost of addressing all at the same time may not always bea good fit for everyone. For example, Memcached is a good example of a solution that was designed to address performance and scalability but completely ignore dealing with the reliability and consistency aspects. On the other hand, all the vendors in the Room(GigaSpaces, Oracle, Terracotta) spent a lot of effort ensuring the reliability and the consistency of the memory-based solution. The adoption of both types of solutions indicates that people are willing to trade some of the functionality for cost or for other reasons and all-or nothing reasoning doesn't seem to apply here.

John Davies raised another driver for moving to distributed data management: availability/reliability. The fact that your data is not stored in a centralized location means that the chances for total failure are smaller. This sparked a small debate on whether indeed reliability is one of the driving forces or just a feature and not necessarily a driving force.During this debate it was clear that most people have a slightly different definition in mind when they use the term reliability. John Purdy suggested a definition for reliability that was quickly accepted by everyone in the room:

Availability/Reliability can be broken down into the following properties:

* Durability
* Consistency
* Availability and
* MTBF

After we agreed on the baseline of what brings people to look into distributed data management in the first place and a basic definition for availability, the discussion moved to cover other aspects of distributed data management.

What is the impact of distributed data management on latency?

* Affinity – in many cases the number of hops required to access the data has a strong impact on latency. Ensuring that the business logic is close to where the data is can reduce that overhead significantly. That statement holds true for any form of data management, distributed or centralized. It becomes more interesting when dealing with distributed management scenarios, since the data can be spread over the network and therefore you may have different latency for different sets of data. A common option to avoid this overhead is to execute the business logic where the data is. A service-oriented architecture can provide a good example of co-locating logic and data, where a service is responsible for both its data and the operations performed on it. An algorithmic trading
application is another example – each node is responsible for very fast processing on a
subset of the overall data.

使用道具 举报

回复
论坛徽章:
350
2006年度最佳版主
日期:2007-01-24 12:56:49NBA大富翁
日期:2008-04-21 22:57:29地主之星
日期:2008-11-17 19:37:352008年度最佳版主
日期:2009-03-26 09:33:53股神
日期:2009-04-01 10:05:56NBA季后赛大富翁
日期:2009-06-16 11:48:01NBA季后赛大富翁
日期:2009-06-16 11:48:01ITPUB年度最佳版主
日期:2011-04-08 18:37:09ITPUB年度最佳版主
日期:2011-12-28 15:24:18ITPUB年度最佳技术原创精华奖
日期:2012-03-13 17:12:05
13#
 楼主| 发表于 2009-4-27 22:42 | 只看该作者
*Latency under load – contention on the same lock or a large number of concurrent users increase the time it takes for the data server to serve these requests, and impacts the overall latency. Distributed data management can smooth out the impact of these two factors by spreading the load and contestations between multiple data partitions.

*Latency vs. consistency tradeoff – guaranteeing consistent, ordered operations requires serialization, which increases latency. There is an explicit tradeoff to be made here – one can improve latency, but at the cost of relaxing consistency.

*Latency and multiple replicas – for read operations, reading data in parallel from multiple replicas can improve latency, because you take advantage of the faster responders. (If you need to read from only k replicas in order to be sure you are reading the correct value, the overall latency is the latency of the k'th fastest, not the overall slowest.)

Is data distribution a leaky abstraction?

There have been many attempts to make the transition from centralized data model to distributed data model seamless through abstraction. There is a good chance that things wouldn't work as expected with this level of abstraction – for example, join queries work very differently if all the data is placed in one centralized location or if it spread across distributed data partitions. The same thing applies for any aggregated function such as SUM, AND, or any blocking operations.

Hiding these details from the application can lead to bad design, which in many cases will only be discovered at later stages of the application development. On the other hand,forcing explicit change on the application can also be a painful process. So the question is,at which level abstraction can become useful and at what point it becomes "leaky". I suggested that the best measure is the chances of failure. If in 80% of the cases chances are that users will choose the wrong option – this is a clear indication for a leaky abstraction. There are various way to deal with that:

1. Avoid any abstraction and force explicit change in the application code.
2. Provide an abstraction, but throw a warning when there is a chance for bad use.
3. Provide an abstraction, but also explicit semantics (through annotations or special query semantics) for dealing with distributed data management, such as affinity semantics,semantics for parallel execution and map/reduce semantics.

使用道具 举报

回复
论坛徽章:
350
2006年度最佳版主
日期:2007-01-24 12:56:49NBA大富翁
日期:2008-04-21 22:57:29地主之星
日期:2008-11-17 19:37:352008年度最佳版主
日期:2009-03-26 09:33:53股神
日期:2009-04-01 10:05:56NBA季后赛大富翁
日期:2009-06-16 11:48:01NBA季后赛大富翁
日期:2009-06-16 11:48:01ITPUB年度最佳版主
日期:2011-04-08 18:37:09ITPUB年度最佳版主
日期:2011-12-28 15:24:18ITPUB年度最佳技术原创精华奖
日期:2012-03-13 17:12:05
14#
 楼主| 发表于 2009-4-27 22:42 | 只看该作者
In a follow-up discussion with John P. and Shay B. it seemed that additional semantics (affinity, parallel, map/reduce) to JPA could serve as a good starting point for such an abstraction for Java developers.

What will be the database's role in future architecture?

Databases are largely valuable in providing logical-physical abstractions, and are probably not going disappear from our world any time soon. However, at the same time it is clear that databases are not going to serve as a general-purpose solution for all data requirements. Moving to distributed data management is becoming more common, as value of data grows on the one hand, and the need for scaling and faster performance for
processing the data emerges on the other hand. Currently there are three main
approaches to this challenge:

Distributed database (Similar to Oracle RAC) – with this option the database is broken into partitions and provides a single driver for interacting with these databases.

Distributed Caching – in this case the database is kept centralized and offload a large part of the read load by putting the data in read-mostly cache.

In-Memory Data Grids – in this case a data grid is used as the system of record instead of the database. The database is used as a background service which maintains the data in a durable storage and used as a read-only access point.

All options require a change – a common fallacy is that if you're already using a database,choosing database partitioning is the most natural next step because it is "seamless". If you choose to change a database from a centralized model to a distributed model, you will most likely need to re-design your schema domain model. You will need to change all applications accordingly to conform to this new schema change. Since in most cases the database is used as a central piece that serves lots of applications (online, reporting,legacy), this change is going to have large impact.

使用道具 举报

回复
论坛徽章:
350
2006年度最佳版主
日期:2007-01-24 12:56:49NBA大富翁
日期:2008-04-21 22:57:29地主之星
日期:2008-11-17 19:37:352008年度最佳版主
日期:2009-03-26 09:33:53股神
日期:2009-04-01 10:05:56NBA季后赛大富翁
日期:2009-06-16 11:48:01NBA季后赛大富翁
日期:2009-06-16 11:48:01ITPUB年度最佳版主
日期:2011-04-08 18:37:09ITPUB年度最佳版主
日期:2011-12-28 15:24:18ITPUB年度最佳技术原创精华奖
日期:2012-03-13 17:12:05
15#
 楼主| 发表于 2009-4-27 22:42 | 只看该作者
On the other hand, caching seem to be a simpler tactical change that helps to optimize the existing system and allows you to apply the change only for the application that needs it most. In this case, the change is significantly smaller, because you only need to change the read queries that hit the database most.

Data grid goes a step further and provides the ability to completely decouple the database while at the same time keeping the database in sync with all changes made into the data grid. This solution is good for read/write scenarios and for cases where most of the frequently-used data can reside in-memory.

Every decision between these options has trade-offs and costs associated with it. In reality you'll rarely go through a process of examining all the options and the cost associated with each one of them. For example, choosing MySQL over Oracle will not always be cheaper if you already have Oracle in place, trained DBAs, etc. The license cost is only part of TCO and not always the most significant one. The same applies to Memcached or any other open source solution. Unfortunately, people rarely apply these cost measurements in their decision-making process (especially when dealing with free-license products). Applying these measures makes it much more likely to discover, sooner rather then later, that what seems to be a natural solution might not actually be the best suited for your needs,compared to the alternatives.

使用道具 举报

回复
论坛徽章:
4
16#
发表于 2009-4-27 23:43 | 只看该作者
每天晚上发这么多,有多少人看呢

使用道具 举报

回复
论坛徽章:
4
17#
发表于 2009-4-27 23:44 | 只看该作者
幸运红包。29PUB币!

使用道具 举报

回复
论坛徽章:
131
乌索普
日期:2017-09-26 13:06:30马上加薪
日期:2014-11-22 01:34:242014年世界杯参赛球队: 尼日利亚
日期:2014-06-17 15:23:23马上有对象
日期:2014-05-11 19:35:172014年新春福章
日期:2014-04-04 16:16:58马上有对象
日期:2014-03-08 16:50:54马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14
18#
发表于 2009-4-28 23:49 | 只看该作者
头晕啊

使用道具 举报

回复
论坛徽章:
131
乌索普
日期:2017-09-26 13:06:30马上加薪
日期:2014-11-22 01:34:242014年世界杯参赛球队: 尼日利亚
日期:2014-06-17 15:23:23马上有对象
日期:2014-05-11 19:35:172014年新春福章
日期:2014-04-04 16:16:58马上有对象
日期:2014-03-08 16:50:54马上加薪
日期:2014-02-19 11:55:14马上有对象
日期:2014-02-19 11:55:14马上有钱
日期:2014-02-19 11:55:14马上有房
日期:2014-02-19 11:55:14
19#
发表于 2009-4-28 23:51 | 只看该作者
原帖由 freedom2k 于 2009-4-27 23:43 发表
每天晚上发这么多,有多少人看呢

使用道具 举报

回复

您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

TOP技术积分榜 社区积分榜 徽章 团队 统计 知识索引树 积分竞拍 文本模式 帮助
  ITPUB首页 | ITPUB论坛 | 数据库技术 | 企业信息化 | 开发技术 | 微软技术 | 软件工程与项目管理 | IBM技术园地 | 行业纵向讨论 | IT招聘 | IT文档
  ChinaUnix | ChinaUnix博客 | ChinaUnix论坛
CopyRight 1999-2011 itpub.net All Right Reserved. 北京盛拓优讯信息技术有限公司版权所有 联系我们 未成年人举报专区 
京ICP备16024965号-8  北京市公安局海淀分局网监中心备案编号:11010802021510 广播电视节目制作经营许可证:编号(京)字第1149号
  
快速回复 返回顶部 返回列表