Choosing Cassandra consistency levels wisely

by Federico Razzoli | Aug 9, 2021 | Cassandra


It is very important to understand Cassandra consistency levels in order to choose the best balance between speed and consistency. In this article I’ve put together some quick notes for people who are new to Cassandra, along with some good practices that, in my opinion, are worth following.

Blixy is choosing the best read and write consistency levels for a customer’s application.

Changing the consistency level

The default consistency level is ONE, for both writes and reads.

We cannot set a different default. Not even at session level.

We can set the consistency level for each query we run. We need to do that at the driver level – check your driver’s documentation to find out how. Note that the CONSISTENCY statement is not CQL: it only works in cqlsh.
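
For example, with the DataStax Python driver the consistency level can be attached to each statement. This is only a minimal sketch, assuming an illustrative contact point, keyspace and table:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Contact point and keyspace are illustrative
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('shop')

# The consistency level travels with the statement, not with the session
query = SimpleStatement(
    "SELECT * FROM orders WHERE customer_id = %s",
    consistency_level=ConsistencyLevel.QUORUM
)
rows = session.execute(query, (42,))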

Quick facts

  • Reads and writes support different sets of consistency levels. Most of them, however, are shared.
  • The behaviour of consistency levels depends on your replication factor. If you have multiple datacentres (even if they are just logical datacentres), draw a map of your keyspaces’ redundancy – see the snippet after this list.
  • Read the Cassandra documentation carefully. You really need to understand consistency levels to avoid slow queries and failures.
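
As a starting point for that map, the replication settings of every keyspace can be read from the system_schema.keyspaces table. A minimal sketch with the DataStax Python driver, assuming an illustrative contact point:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# system_schema.keyspaces lists every keyspace together with its replication
# strategy and per-datacentre replication factors (Cassandra 3.0 and later)
rows = session.execute("SELECT keyspace_name, replication FROM system_schema.keyspaces")
for row in rows:
    print(row.keyspace_name, row.replication)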

Consistency levels speed

Some general facts to keep in mind:

  • Stricter consistency levels produce more accurate results but they are slower (obviously).
  • Queries that return many rows are more affected by strict consistency levels.
  • Slow or badly configured networks increase the cost of strict consistency levels.

Consistency levels accuracy

  • Consistency levels based on a quorum require a response from the majority of replicas. The majority of two replicas is two replicas, so a replication factor of 2 gives a quorum that tolerates no failures. If you use quorums, you want an odd replication factor; if you use local quorums (the majority of replicas in a specific datacentre), you want an odd replication factor in each datacentre. The small calculation after this list makes the arithmetic explicit.
  • Most of the time you don’t want to use local consistency levels. Cassandra is smart enough to contact nodes in the same datacentre when possible. But if that’s not possible, you have two choices: contact nodes in other datacentres, or fail. Inter-datacentre traffic can be slow, but it’s the price to pay to avoid application failures. Compare the two costs (failures and slowdowns), and opt for the lower one.
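
To make the quorum arithmetic from the first point explicit, here is a tiny sketch; the formula floor(RF / 2) + 1 is standard quorum maths, not a Cassandra API:

def quorum(replication_factor):
    # A quorum is the smallest majority of replicas: floor(RF / 2) + 1
    return replication_factor // 2 + 1

for rf in (2, 3, 4, 5):
    print(f"RF={rf}: quorum needs {quorum(rf)} replicas up, tolerates {rf - quorum(rf)} down")

# RF=2: quorum needs 2 replicas up, tolerates 0 down
# RF=3: quorum needs 2 replicas up, tolerates 1 down
# RF=4: quorum needs 3 replicas up, tolerates 1 down
# RF=5: quorum needs 3 replicas up, tolerates 2 down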

Performance and accuracy tradeoff

Ideally, one would always prefer to have the most accurate result. But this is not how Cassandra works. You can achieve that, but it implies communication between nodes and datacentres. Generally, you do that only for some specific queries – or not at all. If you think this is a problem for your use case, you should ask yourself why you are not using a relational DBMS.

The default consistency level, ONE, is usually fine. The node that receives the query (the coordinator) will contact a node that owns the relevant rows, and will return the results to the client. But, other than that, there is no network traffic.

If we are considering a stricter consistency level for a certain query, we should ask ourselves the following questions:

  • Does Cassandra actually return data that is not accurate enough for our use case? If there is no such problem, we should not change the consistency level.
  • How often are Cassandra nodes unavailable, and for how long? You should have this information and an SLO. Together, they tell you whether a strict consistency level is too risky.
  • If we are considering changing the consistency level for many queries, or queries that are executed often, or queries that return large results, or queries that use secondary indexes or ALLOW FILTERING, we should run a benchmark – see the sketch after this list. The result of this test and the SLO determine whether the change is acceptable.
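
Such a benchmark doesn’t need to be elaborate to be useful. A minimal sketch, assuming the same illustrative keyspace and table as above and the DataStax Python driver, simply times the same query at two consistency levels:

import time

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])   # illustrative contact point
session = cluster.connect('shop')  # illustrative keyspace

CQL = "SELECT * FROM orders WHERE customer_id = %s"  # illustrative query

def average_latency(level, runs=100):
    # Run the same query repeatedly and return the average latency in seconds
    stmt = SimpleStatement(CQL, consistency_level=level)
    start = time.monotonic()
    for _ in range(runs):
        session.execute(stmt, (42,))
    return (time.monotonic() - start) / runs

print("ONE    :", average_latency(ConsistencyLevel.ONE))
print("QUORUM :", average_latency(ConsistencyLevel.QUORUM))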

More specific advice

The previous tips are, in my opinion, general rules that we should always follow. However, I also have practical advice for more specific cases.

Abort or retry?

Application developers, after choosing a consistency level for a given query, should ask themselves: what should happen if the consistency level cannot be satisfied because some servers are down?

Should the query simply fail, and an error message be returned to the users, asking them to come back later? Fine.

But you can also retry the query with a more relaxed consistency level. The idea behind this is quite simple: ideally we’d like the query to be executed with, say, QUORUM; but if that’s not possible, instead of failing, we can run it with TWO or even ONE.

Sounds like a good idea? Good, but these retries should not create too much traffic between the client and Cassandra. Consider “remembering” that a certain consistency level cannot be satisfied, and don’t retry it until a certain timeout has expired. Something similar to the following sketch, shown here with the DataStax Python driver (the timeout value and the choice of QUORUM and ONE as the strict and relaxed levels are illustrative):

from datetime import datetime, timedelta
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

RETRY_TIMEOUT = timedelta(minutes=5)   # illustrative value
last_fail = datetime(1970, 1, 1)       # when QUORUM last failed

def run_query(session, cql, params=None):
    # session is a cassandra-driver Session, like the one created earlier
    global last_fail
    if last_fail < datetime.now() - RETRY_TIMEOUT:
        # QUORUM has not failed recently: try the strict level first
        try:
            stmt = SimpleStatement(cql, consistency_level=ConsistencyLevel.QUORUM)
            return session.execute(stmt, params)
        except Exception:   # in real code, catch the driver's Unavailable / NoHostAvailable
            last_fail = datetime.now()   # remember the failure, fall back to ONE
    # QUORUM failed recently (or just now): run with ONE instead
    stmt = SimpleStatement(cql, consistency_level=ConsistencyLevel.ONE)
    return session.execute(stmt, params)   # if this fails too, the error reaches the caller

When QUORUM is too demanding

To decrease the chances that certain consistency levels cannot be satisfied because not enough replicas are up, we can increase the replication factor (adding new nodes to host the extra replicas, if necessary). With two caveats:

  1. The ALL consistency level will become harder to satisfy, not easier, because every additional replica is one more node that must respond. It’s better to avoid it in this situation.
  2. The QUORUM and LOCAL_QUORUM consistency levels require that the majority of replicas respond, so increasing the replication factor can increase the number of replicas that must be up for certain queries to succeed.

A solution is to avoid ALL and replace QUORUM with a fixed consistency level. For example, suppose you have a replication factor of 3, and some queries use QUORUM: two replicas need to be up. But if we raise the replication factor to 4, QUORUM will require three replicas to be up. So in this case, it is better to replace QUORUM with TWO.

Load balancing with logical datacentres

Suppose you have a limited number of queries returning many rows. Maybe it’s OK if they consume resources, but you don’t want them to slow down other queries, so that the application used by customers doesn’t slow down.

You can achieve this by adding nodes in a different datacentre. But this implies two problems: you need a second datacentre, which maybe you don’t have, and you will have traffic between different datacentres.

The solution is to use a logical datacentre. If you use GossipingPropertyFileSnitch, you can set the datacentre’s name by editing the cassandra-rackdc.properties file. Even if all nodes run in the same physical datacentre, you can tell Cassandra that they run in different datacentres. These are logical datacentres.
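
On the client side, the heavy queries then have to be targeted at the logical datacentre. A minimal sketch with the DataStax Python driver, assuming the keyspace is replicated to both logical datacentres; the names 'main' and 'analytics' (declared as dc=main or dc=analytics in each node’s cassandra-rackdc.properties) are purely illustrative:

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Regular traffic prefers the 'main' logical datacentre,
# heavy reporting queries prefer the 'analytics' one
main_profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='main'))
)
analytics_profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='analytics'))
)

cluster = Cluster(
    ['10.0.0.1'],  # illustrative contact point
    execution_profiles={EXEC_PROFILE_DEFAULT: main_profile, 'analytics': analytics_profile},
)
session = cluster.connect('shop')  # illustrative keyspace

# A heavy query explicitly routed to the analytics nodes
session.execute("SELECT * FROM orders", execution_profile='analytics')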

If you need expert advice to make your Cassandra databases more scalable or more reliable, consider our Cassandra Health Checks.

Federico Razzoli


All content in this blog is distributed under the Creative Commons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please credit Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since the year 2000. In the past 20+ years, he has served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarised his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. An experienced speaker, Federico presents at professional conferences and meetups and conducts database training. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland, where he lives.


2 Comments

  1. jp

    “For example, suppose you have three nodes, and some queries use QUORUM . Two nodes need to be running. But if we add one node, QUORUM will require three nodes to be running. So in this case, it is better to replace QUORUM with TWO .”

    Hello. I don’t understand this, because how many nodes have to be running doesn’t depend on how many nodes are in the cluster; it depends only on the replication factor. If you have RF=3 and 10 nodes in the cluster (or 15 nodes in the cluster), CL=QUORUM only means you have to wait for 2 nodes to respond.

    • Federico Razzoli

      Yes, you are correct. Fixed in this way: “suppose … and you have a replication factor of 3”.
      Thank you!

