My application recently finished switching from a master-slave MySQL configuration to Apache Cassandra (version 2.0.4). The switch took my team around four months and involved rewriting every part of our application that touched the database, as well as migrating all existing data (we managed to do all of this without downtime!).
mysql dead. hail cassandra
— Dave (@tildedave) February 21, 2014
The availability guarantees provided by Cassandra were the main motivation for this switch: the database has no single point of failure. The week after we finished our switch, I had to fill out a business continuity plan: our application failure scenario went from the loss of a single node (our MySQL master) to the simultaneous unrecoverable loss of three complete datacenters. Combined with a Global Traffic Manager setup across our three datacenters, Cassandra will allow our site to remain operational even if two of them fail, all without the loss of customer data.
Of course, a database with no single point of failure doesn't guarantee that your application stays up. It's no good having a highly available database if you only have one edge device handling all incoming traffic. During our database migration, we ran into a number of application-level failure scenarios. Some of these we induced ourselves through our test scenarios, while others happened to us in our lower environments (staging, preprod, test) during the slow march to production. None of these failures was caused by the database itself; each came from a component that we had added on top of it.