Wednesday, January 18, 2012

Recovering big data from a big (or small) disaster

Cassandra offers some cool fault-tolerant features and a couple of different options for recovery from simple crashes or catastrophic failures.

The two primary options are:

1)  Automatic Replication – the more popular option, and
2)  Traditional Snapshot backups and restores


Option 1) Automatic Replication

In most serious production deployments, Cassandra is set up as a cluster of nodes spread across different physical servers. These servers could be on the same rack, on different racks, in different physical locations (or data centers), or even in completely different clouds (for example, a hybrid cloud combining the public Amazon EC2 cloud with an internal private cloud).

Since there is no traditional master/slave architecture, there is also no single point of failure. Cassandra uses a peer-to-peer implementation and can be configured to automatically replicate every incoming write to multiple locations. This can provide excellent insulation from unexpected failures or scheduled downtime.

Possible replication options include:

1)  Replication between different physical server racks

2)  Replication between geographically dispersed data centers

3)  Replication between public Cloud providers and in house servers

The replication strategy is configurable, and uses an internal construct called a “snitch”. The snitch is a configurable component on each node that keeps track of the physical proximity of other nodes (for example: which nodes are in my physical server, rack, data center or cloud?) and uses that information to ensure that replicas are distributed across servers according to the desired replication option. You specify these replication options when you create your keyspaces in Cassandra.
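
To make that a bit more concrete, here’s roughly what the moving parts look like. The data center and rack names (DC1, RAC1, etc.), the IP addresses and the keyspace name MyKeyspace are all just placeholders, and the exact syntax varies a bit between Cassandra versions, so treat this as a sketch rather than gospel:

    # cassandra.yaml: pick a snitch that understands your topology
    endpoint_snitch: PropertyFileSnitch

    # cassandra-topology.properties: map each node to a data center and rack
    10.0.1.10=DC1:RAC1
    10.0.1.11=DC1:RAC2
    192.168.5.20=DC2:RAC1

    # Keyspace definition (cassandra-cli): keep 3 copies in DC1 and 2 in DC2
    create keyspace MyKeyspace
      with placement_strategy = 'NetworkTopologyStrategy'
      and strategy_options = {DC1:3, DC2:2};

With something like that in place, every write to MyKeyspace gets copied to three nodes in DC1 and two in DC2, and the snitch tries to make sure those copies don’t all land on the same rack.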

The number of replicas is also configurable, starting from a value of 1 (which means only one copy is kept, on one node) up to n. You typically want to specify a replication factor greater than 1 (for example 3) to get the fault-tolerance benefits on top of the linear scalability. That linear scalability is the biggest draw in Big Data technologies like Cassandra: if you are running into performance overloads due to excess demand, just throw a few more physical servers into the cluster, with a Cassandra node on each one, and the nodes will talk to each other and automatically figure out how to share the work so that performance improves.

By the same token, when demand is slow, you want to be able to remove a bunch of physical servers from your cloud to save some $. If your replication factor is set to a higher value (for example 3), you can just go ahead and remove some servers (for example, unplug a rack, or release some instances back to EC2) and Cassandra will continue to serve the data seamlessly, because it has additional copies on other servers. Nice, eh?
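
If you’d rather take nodes out gracefully instead of just unplugging them, nodetool can do that too. A quick sketch (the host name is made up, and the exact commands and flags differ a little between versions):

    # Tell a node to stream its data to the remaining replicas
    # and leave the ring cleanly
    nodetool -h 10.0.1.11 decommission

    # If a node is already dead and gone, have the rest of the
    # cluster take over its range
    nodetool removetoken <token-of-the-dead-node>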

Cassandra uses an internal protocol called Gossip to keep track of the whole network topology. This (supposedly) low-intensity chatter keeps track of which nodes are up or down, and incoming write and read requests are directed accordingly. For example, if a node crashes, gossip will spread the word, and any future writes or reads will avoid the crashed node and go to a replica instead. When the node comes back online, gossip will report that it is back and data will start flowing in and out of it again. Pretty neat stuff.
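
You can peek at gossip’s view of the world from any live node. For example (the output format varies by version):

    # Show every node’s data center, rack, load and Up/Down status
    nodetool -h 10.0.1.10 ring

    # Raw gossip state, if you’re curious what the nodes are chattering about
    nodetool -h 10.0.1.10 gossipinfo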

However, keep in mind that this does not come for free: the higher the replication factor, the more expensive reads and writes become. Just how much impact there is remains to be seen with some actual tests that I’ll be blogging about soon.

Option 2) Traditional Snapshot backups and restores

Cassandra comes with a built-in set of tools. One of them (nodetool) has an option to take an instant snapshot of the data and store it someplace. (No need to shut down Cassandra: this is done live, from the command line.) Snapshots are stored as files which you can then copy and ship off somewhere safe. There is an incremental backup option available as well, so you don’t need to capture the entire snapshot each time. Pretty useful when you are talking terabytes.
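
Here’s the kind of thing I mean, sketched out. The keyspace name, snapshot tag and paths are placeholders, and the exact nodetool flags and directory layout depend on your Cassandra version:

    # Take a point-in-time snapshot of one keyspace, with a label
    nodetool -h localhost snapshot -t backup_20120118 MyKeyspace

    # The snapshot shows up as hard-linked files under the keyspace’s
    # data directory, in a snapshots/<tag>/ subfolder. Copy those files
    # somewhere safe, then reclaim the space:
    nodetool -h localhost clearsnapshot MyKeyspace

    # For the incremental option, flip this flag in cassandra.yaml;
    # newly flushed data files then get hard-linked into a backups/ folder
    incremental_backups: true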

The restore process is equally simple, but it does require a shutdown of the node. You need to stop the node, clean out the old data (by deleting the contents of the data folder), copy the archived snapshot and any incremental backups into the data folder, and restart Cassandra.
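
As a rough sketch of what that looks like on a single node (the /backups paths are hypothetical, the data paths assume a default /var/lib/cassandra install, and the directory layout differs between versions):

    # 1. Stop the node (however you normally run Cassandra)
    sudo service cassandra stop

    # 2. Clear out the commit log and the keyspace’s old data files
    rm -rf /var/lib/cassandra/commitlog/*
    rm -rf /var/lib/cassandra/data/MyKeyspace/*

    # 3. Copy the archived snapshot files, plus any incremental backups,
    #    back into the keyspace’s data directory
    cp /backups/MyKeyspace/snapshot/* /var/lib/cassandra/data/MyKeyspace/
    cp /backups/MyKeyspace/incrementals/* /var/lib/cassandra/data/MyKeyspace/

    # 4. Start the node back up
    sudo service cassandra start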

More information on this subject can be found on both the Apache Cassandra wiki and the DataStax FAQ page, both of which I found quite handy in my research.

Couldn’t be simpler. Now, that’s the theory. I plan to run some experiments, and the next blog post(s) will tell the real story and explore these questions:

1)  Does it work as simply as advertised?

2)  What are the performance trade-offs?

3)  Any gotchas...

Stay tuned!