|
*Latency under load – contention on the same lock or a large number of concurrent users increases the time the data server takes to serve requests, and so raises overall latency. Distributed data management can smooth out the impact of these two factors by spreading the load and the contention across multiple data partitions.
*Latency vs. consistency tradeoff – guaranteeing consistent, ordered operations requires serialization, which increases latency. There is an explicit tradeoff to be made here – one can improve latency, but at the cost of relaxing consistency.
*Latency and multiple replicas – for read operations, reading data in parallel from multiple replicas can improve latency, because you take advantage of the faster responders. (If you need to read from only k replicas in order to be sure you are reading the correct value, the overall latency is that of the k-th fastest replica, not that of the slowest.)
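The replica point above can be made concrete with a small sketch. It assumes every replica is queried at the same moment and that the read completes as soon as k responses have arrived. The function name `kth_fastest_latency` and the sample latencies are invented for illustration only.

```python
import heapq


def kth_fastest_latency(replica_latencies_ms, k):
    """Latency of a parallel read that completes once k replicas have answered.

    Assumes all replicas are queried at the same time, so the read finishes
    when the k-th response arrives.
    """
    if not 1 <= k <= len(replica_latencies_ms):
        raise ValueError("k must be between 1 and the number of replicas")
    # The k-th response to arrive is the k-th smallest individual latency.
    return heapq.nsmallest(k, replica_latencies_ms)[-1]


# Made-up response times for five replicas, in milliseconds.
latencies_ms = [12, 95, 18, 40, 250]

print(kth_fastest_latency(latencies_ms, 1))  # fastest single replica: 12
print(kth_fastest_latency(latencies_ms, 3))  # read needing 3 answers: 40
print(max(latencies_ms))                     # waiting for every replica: 250
```

With these numbers, a read that needs three answers completes in 40 ms even though the slowest replica takes 250 ms.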
Is data distribution a leaky abstraction?
There have been many attempts to make the transition from a centralized data model to a distributed data model seamless through abstraction. There is a good chance that things will not work as expected at this level of abstraction – for example, join queries behave very differently when all the data sits in one centralized location than when it is spread across distributed data partitions. The same applies to any aggregate function, such as SUM or AND, and to any blocking operation.
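A rough sketch of why aggregates behave differently once data is partitioned. The partitions are modelled here as plain Python lists, and all function names are hypothetical. The average case is an added illustration, not something taken from the text above: it was chosen because it shows how an aggregate that looks trivial on centralized data can give a wrong answer when it is applied naively per partition.

```python
# Hypothetical data set split unevenly across three partitions.
partitions = [[10, 20, 30], [40], [50, 60]]


def distributed_sum(parts):
    """SUM composes cleanly: sum each partition, then sum the partial results."""
    partial_sums = [sum(p) for p in parts]  # runs locally on each partition
    return sum(partial_sums)                # gather step, waits for all partitions


def naive_distributed_avg(parts):
    """Averaging the per-partition averages ignores the partition sizes."""
    partial_avgs = [sum(p) / len(p) for p in parts]
    return sum(partial_avgs) / len(partial_avgs)


def distributed_avg(parts):
    """A correct average needs each partition to return (sum, count)."""
    partials = [(sum(p), len(p)) for p in parts]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(distributed_sum(partitions))        # 210
print(distributed_avg(partitions))        # 35.0
print(naive_distributed_avg(partitions))  # about 38.33, not the true average
```

In both correct versions the gather step cannot finish until every partition has answered, which is the sense in which such operations are blocking.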
Hiding these details from the application can lead to bad design, which in many cases will only be discovered at later stages of application development. On the other hand, forcing explicit change on the application can also be a painful process. So the question is: at which level does abstraction become useful, and at what point does it become "leaky"? I suggested that the best measure is the chance of failure. If users are likely to choose the wrong option in 80% of cases, that is a clear indication of a leaky abstraction. There are various ways to deal with that:
1. Avoid any abstraction and force explicit change in the application code.
2. Provide an abstraction, but throw a warning when there is a chance for bad use.
3. Provide an abstraction, but also explicit semantics (through annotations or special query semantics) for dealing with distributed data management, such as affinity semantics, semantics for parallel execution and map/reduce semantics. |
|