Jul 21, 2017

Cassandra Design Best Practices

Cassandra is a great NoSQL product, it provides near real time performance for designed queries and enables high availablity w/ linear scale growth as it uses the eventually consistent paradigm.
 In this post we will focus on some best practices for this great product: 

 How many nodes do you need? 
  1. Number of nodes should be odd in order to support votes during downtime/network cut.
  2. Minimial number should 5, as lower number (3) will result in high stress on the machines during node failure (replicaiton factor is 2 in this case, and each node will have to read 50% of the data and write 50% of data. When you select replication factor 3, each node will need to read 15% of data and write 15% of the data. Therefore, recovery will be much faster, and higher chances performance and availablility will not be affected.
C* like any other data store loves fast disks (SSD) although its SSTables and INSERT only architecture and as much memory as your data.
In particular your nodes should be 32GB to 512GB RAM each (and not less than 8GB in produciton and 4GB in development). This is a common issue since C* was coded in Java. For small machines you should avoid G1 and keep w/ CMS.

JVM heap size should be max of 8GB too avoid too long "stop the world" during GC.
If you feel the default Heap size (max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)) does not fit your needs, try to set it between 1/4 to 1/2 of your RAM, but not more than 8GB. 

C* is also CPU intensive and 16 cores are recommended (and not less than 2 cores for development).

Repair and Repair Strategy
nodetool repair is probably one of the most common tasks on C* cluster. 
  1. You can run it on a single node or on a whole cluster.
  2. Repair should run before reaching the gc_grace_seconds (default 10 days) that will remove thombstones
  3. You should run it durring off-peak hours (probably during weekend) if you keep w/ the gc_grace_seconds.
  4. You can take this numbers down, but it will affect your backup and recovery strategy (see details about recovery from failure using hints).
You can optimize the repair process by using the following flags:
  1. -seq: repair token after token: slower and safer
  2. -local: run only on the local data center to avoid downtime of both in any case
  3. -parallel: fastest mode: run on all datacenters in parallel
  4. -j: the number parallel jobs on a node (1-4), using more threads will stress the nodes, but will help end the task faster.
We recommend to select your strategy based on height of your peaks and the sensitivity of your data. If your system has the same level of traffic 24/7, consider doing things slow and sequencial.
The higher your peaks, the more stress your should do on your system during off peak hours.

There are several backup strategies you can have:
  1. Utilize your storage/cloud storage snapshot capabilities. 
  2. Use C*  nodetool  snapshot command. This one is very similar to your storage capabilities but enables backup only the data and not the whole machine.
  3. Use C* incremental backup that will enable point in time recovery. This process is not a daily process, but requires copying and managing small files all the time
  4. Mix C* snapshots and incremental backups to minimize the time of recovery while keeping the point of time recovery option.
  5. Snapshots and commit log: complex process to recover that supports point in time recovery, as you need to reply the commit log.
We recommend to use the daily snapshot if your data is not critical and you want to minimize your Ops costs, or the mix C* snapshots and incremental backup when you must have a point in time recovery.

There are several approaches to go:
  1. Commerical Software:
    1. DataStax OpsCenter solution: as almost every other OSS, DataStax that provides the commerical version of C*, provides a pay for managment and moniotring solution
  2. Commercial Service including
    1. NewRelic: provides a C* plugin as part of its platform
    2. DataDog: with a nice hint on what should be monitored.
  3. Use Open Source with common integration:
    1. Graphite, Grafana:or Prometheus: 3 tools that can work together or apart and integrated w/ time series and relevant metrics.
    2. Old style Nagios and Zabbix that provides communitry plugins
If you choose a  DIY solution there some hints that you can find in the commercial products and services and also in the folowing resources:
  1. Basic monitoring thresholds
  2. Nagios out of the box plugins that thresholds can be extracted from
For example:
  1. Heap usage: 85% (warning), 95% (error)
  2. GC ConcurrentMarkSweep: 9 (warning), 15 Error
Our recommendation is starting when possible with an existing service/product, get experience w/ the metrics that are relevant for your environment, and if needed, implement based on them your own setup.

Lightweight transactions are meant to enable case studies that requires sequence (or some type of transactions) in an eventually consistent environment.
Yet notice, that it's a minimal solution that is aimed to serialize tasks in a single table.

We believe that this is a good solution, but the if your data requires consistent soluton, you should avoid eventually consitent solution and look for SQL solution (with native transactions) or NoSQL solution like MongoDB.

C* Internals
What to know more? use the following videos or get the O'REILY book 

Bottom Line
C* is indeed a great product. However, it definittly not an entry level solution for data storage, and managing it requires skills and expertise

Keep Performing,
Moshe Kaplan


Intense Debate Comments

Ratings and Recommendations