Multiple instances of Cassandra on the same server?

Now that we've kicked the tires and installed Cassandra, it's time to take her for a drive.

Before going there, I'd like to mention that, after my earlier post about the problems I had installing DataStax’s version of Cassandra on SUSE, I installed it on a RedHat Linux server using their RPM installer. This time it installed smoothly, without any issues, and it works nicely.

So I will say that the RPM packages work as advertised. If you are a RedHat user (perhaps Debian too, though I haven’t tried Debian myself), the DataStax installers could be a good choice for you: they've done a great job packaging these and have tested things well. We prefer to use SUSE, so it is still a problem for us. At some point we'll need to go back and see what we can do to get it to work, or even better, someone at DataStax hopefully will!

Now for our first test-drive:

You can set up as many instances of Cassandra as you want on a single server if you don't have the luxury of multiple servers. That said, it is pretty cheap to do this on EC2, and I would recommend that as a better alternative to what I am going to describe here.

Anyway, let's assume that, like me at the moment, you are stuck with one server. You can still do pretty decent work with Cassandra if you make some tweaks to the internal configuration of each instance. Each of these tweaks can also be easily automated in a shell script, so that when you go grab a new version from Apache (and you’ll be doing a lot of this, I promise) you just run your script and it will copy the Apache distro into multiple folders and tweak the settings below in each folder. For purposes of this post, I am keeping my description simple: 3 instances, configured manually.

These are the steps:

1) Copy the Apache tarball to your Linux server, untar it and copy the contents into three folders called apache-cassandra-instance-1, apache-cassandra-instance-2 and apache-cassandra-instance-3. Three is good enough for initial testing and playing around, and certainly better than a single instance, which will not really allow you to learn much, since most of the interesting stuff in Cassandra happens on a multiple-node cluster.

2) Make changes to the configuration files in each of the 3 new instance folders. You need to change the following:

In conf/cassandra-env.sh, set MAX_HEAP_SIZE="XXXM", where XXX can be 256 or 512, etc.

In conf/cassandra.yaml, set listen_address and rpc_address to the loopback addresses 127.0.0.1, 127.0.0.2 and 127.0.0.3, so each instance binds to its own address (and can keep the same port numbers) but they all loop back to the same server. It's a kludge but it works.

Change the JMX (RMI) port, JMX_PORT in conf/cassandra-env.sh, to three different ports, for example 8081, 8082 and 8083.
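Steps 1 and 2 are the part that is easy to script. Here is a minimal sketch of the idea in Python rather than shell. The function names (instance_plan, set_yaml_key, create_instances) are my own and purely illustrative, and it assumes cassandra.yaml contains top-level listen_address: and rpc_address: lines. It only patches cassandra.yaml; the heap size and JMX port live in a shell file and are left for hand-editing here, and it does not touch the data or commit log directories, which the steps above don't cover.

```python
import re
import shutil
from pathlib import Path

# Assumption: instance i uses JMX port 8080 + i, matching the
# 8081-8083 addresses used with JConsole in this post.
BASE_JMX_PORT = 8080


def instance_plan(count=3):
    """Return the per-instance settings described in the steps above."""
    plan = []
    for i in range(1, count + 1):
        address = f"127.0.0.{i}"  # one loopback address per instance
        plan.append({
            "folder": f"apache-cassandra-instance-{i}",
            "listen_address": address,
            "rpc_address": address,
            "jmx_port": BASE_JMX_PORT + i,
        })
    return plan


def set_yaml_key(text, key, value):
    """Replace the value of a top-level 'key: value' line in YAML text."""
    pattern = re.compile(rf"^{re.escape(key)}:.*$", re.MULTILINE)
    new_text, hits = pattern.subn(lambda m: f"{key}: {value}", text)
    if hits == 0:
        raise KeyError(f"{key} not found in YAML text")
    return new_text


def create_instances(distro_dir, target_root, count=3):
    """Copy the untarred distro once per instance and patch cassandra.yaml."""
    created = []
    for inst in instance_plan(count):
        dest = Path(target_root) / inst["folder"]
        shutil.copytree(distro_dir, dest)  # fails if dest already exists
        yaml_path = dest / "conf" / "cassandra.yaml"
        text = yaml_path.read_text()
        for key in ("listen_address", "rpc_address"):
            text = set_yaml_key(text, key, inst[key])
        yaml_path.write_text(text)
        created.append(dest)
    return created


if __name__ == "__main__":
    # Print the plan only; call create_instances(...) with your own paths.
    for inst in instance_plan():
        print(inst)
```

Running the file as-is only prints the plan, so you can check the addresses and ports before pointing create_instances at a real untarred distro.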

To start up Cassandra, run bin/cassandra -f, which keeps it in the foreground. Repeat the same for the 3 instances in three separate windows.

You can now use JConsole to inspect Cassandra: at the login prompt, enter the address 127.0.0.1:8081, 127.0.0.2:8082 or 127.0.0.3:8083 to inspect the corresponding Cassandra instance, for example to look at the heap size.

There’s one last important step. Cassandra uses hashing to divide data across the cluster (ring) of nodes. Each node has an “Initial Token”, which is the node's logical position in the ring. You need to calculate and set these initial tokens in each node. More about this in the next post.
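As a rough preview of the arithmetic, here is a small Python sketch that spaces the tokens evenly around the ring. It assumes the RandomPartitioner, whose token space runs from 0 to 2**127 - 1; the function name initial_tokens is just for illustration, and a different partitioner would need a different calculation.

```python
# Assumption: RandomPartitioner, with tokens in the range 0 .. 2**127 - 1.
RING_SIZE = 2 ** 127


def initial_tokens(node_count):
    """Return evenly spaced tokens, one per node, starting at 0."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


# One token per instance for the 3-instance setup in this post.
for node, token in enumerate(initial_tokens(3), start=1):
    print(f"apache-cassandra-instance-{node}: initial_token: {token}")
```

Each printed value would go into the initial_token setting in that instance's conf/cassandra.yaml.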

Cassandra and Hadoop -Hands-On and Hype-Free - the good news, the problems and all...

Wednesday, December 21, 2011
