Choosing Cassandra consistency levels

Last updated on 18 September 2021

It is very important to understand Cassandra consistency levels, to choose the best balance between speed and consistency. In this article I’ve put some quick notes for persons who are new to Cassandra, and some good practices that in my opinion it’s better to follow.

Blixy is choosing the best read and write consistency levels for a customer’s application.

Changing the consistency level

The default consistency level is  ONE , for both writes and reads.

We cannot set a different default. Not even at session level.

We can set the consistency level for each query we run. We need to do that at driver level – check the documentation of your driver to find out how to do it. Note that the  CONSISTENCY  statement is not CQL: it only works with  cqlsh .

Quick facts

  • Reads and writes support different consistency levels. Most of them however are shared.
  • Consistency levels behavior depends on your replication factor. If you have multiple datacentres (even if they are just logical datacentres) draw a map of your keyspaces redundancy.
  • Read Cassandra documentation carefully. You really need to understand consistency levels to avoid slow queries and failures.

Consistency levels speed

Some general facts to keep in mind:

  • Stricter consistency levels produce more accurate results but they are slower (obviously).
  • Queries that return many rows are more affected by strict consistency levels.
  • Slow or badly configured networks increase the cost of strict consistency levels.

Consistency levels accuracy

  • Consistency levels based on a quorum require a consensus from the majority of nodes. The majority of two nodes is two nodes. If you use quorums, you need an odd number of nodes. If you use local quorums (majority of nodes in a specific datacentre), you need an odd number of nodes in each datacentre.
  • Most of the times you don’t want to use local consistency levels. Cassandra is smart enough to contact nodes in the same cluster, when possible. But if it’s not possible, you have two choices: contact nodes in other datacentres, or fail. Inter-datacentre traffic can be slow, but it’s the price to pay to avoid application failures. Compare the two costs (failures and slowdowns), and opt for the lowest.

Performance and accuracy tradeoff

Ideally, one would always prefer to have the most accurate result. But this is not how Cassandra works. You can achieve that result, but this implies communication between nodes and datacentres. Generally, you do that only for some specific queries – or not at all. If we think that this is a problem for your use case, we should ask ourselves why you are not using a relational DBMS.

The default isolation level,  ONE , is usually fine. The node(s) that receives the query will contact the node that owns that relevant rows, and will return the results to the client. But, other than that, there is no network traffic.

If we are considering a stricter isolation level for a certain query, weshould ask yourself the following questions:

  • Does Cassandra actually return data that is not enough accurate for our use case? If there is no such problem, we should not change the consistency level.
  • How often are Cassandra nodes unavailable, and for how much time? You should have this information and a SLO. Together, they answer the question if a strict consistency level is too risky.
  • If we are considering to change the consistency level for many queries, or queries that are executed often, or queries that return big results, or queries on secondary indexes or  ALLOW FILTERING , we should do a benchmark. The result of this test and the SLO determine if the change is fine.

More specific advice

The previous tips are, in my opinion, general rules that we should always follow. However I also have practical advice for more specific cases.

Abort or retry?

Application developers, after choosing a consistency level for a given query, should ask themselves: what should happen if the consistency level cannot be satisfied because some servers are down?

Should the query simply fail, and an error message be returned to the users, asking them to come back later? Fine.

But you can also retry the query with a more relaxed consistency level. The idea behind this is quite simple: ideally we’d like the query to be executed with, say,  QUORUM ; but if that’s not possible, instead of failing, we can run it with  TWO  or even  ONE .

Sound like a good idea? Good, but these retries should not create a too high traffic between the client and Cassandra. Consider “remembering” that a certain consistency level cannot be satisfied, and don’t retry it until a certain timeout has expired. Something similar to the following pseudo-code:

set last_fail = '01-01-1970 00:00:00'
function run_query:
if last_fail < (now() - timeout):
set consistency = 'QUORUM'
try query
if query fails:
set last_fail = now()
else:
set consistency = 'ONE'
try query
if query fails:
return error to user

When QUORUM is too demanding

To decrease the chances that certain consistency levels cannot be satisfied because not enough nodes are running, we can add new nodes. With two caveats:

  1. The  ALL  consistency level will become harder to satisfy, not easier. It’s better to avoid it in this situation.
  2. The  QUORUM  and  LOCAL_QUORUM  consistency levels require that the majority of nodes is running, so adding more nodes could increase the number of nodes that must run for certain queries to be satisfied.

A solution is to avoid  ALL  and replace  QUORUM  with other consistency levels. For example, suppose you have a replication factor of 3, and some queries use  QUORUM . Two nodes need to be running. But if we add one node,  QUORUM  will require three nodes to be running. So in this case, it is better to replace  QUORUM  with  TWO .

Load balancing with logical datacentres

Suppose you have a limited number of queries returning many rows. Maybe they it’s OK if they are consuming resources, but you don’t want them to slow down other queries, to avoid that the application used by customers slows down.

You can achieve this by adding nodes in a different datacentre. But this implies two problems: you need to have a second datacentre but maybe you don’t, and you will have traffic between different datacentres.

The solution is to use a logical datacentre. If you use  GossipingPropertyFileSnitch , you can set the datacentre’s name by editing the  cassandra-rackdc.properties  file. Even if all nodes run in the same physical datacentre, you can tell Cassandra that they run on different datacentres. These are logical datacentres.

Federico Razzoli

Did you like this article?

About Federico Razzoli

Federico is Vettabase Ltd director, and he's an expert database consultant specialised in the MariaDB and MySQL ecosystems.

2 Replies to “Choosing Cassandra consistency levels”

  1. “For example, suppose you have three nodes, and some queries use QUORUM . Two nodes need to be running. But if we add one node, QUORUM will require three nodes to be running. So in this case, it is better to replace QUORUM with TWO .”

    Hello. I dont uderstand this. Because how many nodes have to be running doesnot depends on how many node are in the cluster but it depends only on replication factor. If you have RF=3, and 10 nodes in cluster (or 15 nodes in the cluster), CL=Quorum – all it means you have to wait for 2 nodes for response.

Leave a Reply

Your email address will not be published. Required fields are marked *

*