Cassandra consistency levels are a tricky concept until you familiarise yourself with them. They’re based on a simple consideration: not all data and not all queries require the same level of correctness.
By default:
- a data change is considered successful as soon as one node has accepted it, for better performance;
- a query involves only one node, to improve performance;
- so, a query may not “see” data changes that haven’t reached that particular node yet.
Sometimes this is not acceptable. That’s why we can change the consistency level for an individual read, for an individual write, or for the current session.
What if we want to make sure that some or all queries are completely correct, so that no stale data is returned? We can adjust the consistency levels. Read the rest of this article to learn how they can be configured.
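For example, here is how these options look with the DataStax Python driver. This is a minimal sketch: the contact point, keyspace and table names are hypothetical, not something from this article.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.query import SimpleStatement

# Session-wide default: every statement executed through this session
# uses QUORUM, unless the statement overrides the level itself.
profile = ExecutionProfile(consistency_level=ConsistencyLevel.QUORUM)
cluster = Cluster(["127.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# Individual write: this statement waits for an acknowledgement from ALL replicas.
write = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",  # hypothetical table
    consistency_level=ConsistencyLevel.ALL,
)
session.execute(write, (42, "Ada"))

# Individual read: this statement is happy with a single replica.
read = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(read, (42,)).one()
```

In cqlsh, the session-wide equivalent is the CONSISTENCY command, e.g. `CONSISTENCY QUORUM;`.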
Understanding quorums and data centres
A quorum is the smallest majority of a group. It’s often described as 50% + 1. Or, if you prefer a precise expression, it should be described in this way:
floor(population / 2 + 1)
In a Cassandra cluster, a quorum is 50% of the nodes + 1, rounded down.
But we’re talking about logical nodes here, because keyspaces have a replication factor. A replication factor of 3 means that every row must exist on 3 nodes. A higher replication factor increases data redundancy, but it slows down the reads and writes whose consistency level involves more replicas.
If you’re using a single data centre, the number of logical nodes is the keyspace’s replication factor. Now suppose that you’re using multiple data centres, perhaps logical ones. Then the number of logical nodes is the sum of the replication factors of all data centres.
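A quick sketch of this arithmetic in plain Python (the data centre names and replication factors are made up for illustration):

```python
from math import floor

def quorum(logical_nodes: int) -> int:
    """Smallest majority of a replica set: floor(population / 2 + 1)."""
    return floor(logical_nodes / 2 + 1)

# Single data centre: logical nodes = the keyspace's replication factor.
print(quorum(3))  # RF 3 -> quorum of 2

# Multiple data centres: logical nodes = sum of all replication factors.
replication = {"dc1": 3, "dc2": 3}  # hypothetical topology
print(quorum(sum(replication.values())))  # 6 logical nodes -> quorum of 4
```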
The Maximum Consistency formula
What is the most intuitive way to obtain Maximum Consistency? Setting the ALL consistency level for all reads and writes. But this is the slowest option, as it implies a lot of unnecessary work under the hood.
Instead, you’ll need to consider:
- the number of nodes affected by the writes you’re interested in;
- the number of nodes participating in the read operation you’re interested in.
If you want to obtain Maximum Consistency, you need to make sure that the sum of these two numbers is strictly greater than the number of logical nodes: that way, the replicas involved in a read always overlap with the replicas that acknowledged the latest write.
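Here is that rule as a one-line check (W, R and N are my own shorthand for those three numbers, not Cassandra terms):

```python
def reads_see_writes(w: int, r: int, n: int) -> bool:
    """True if every read replica set overlaps the last write's replica set."""
    return w + r > n

# With a replication factor of 3 in a single data centre (n = 3):
print(reads_see_writes(3, 1, 3))  # ALL writes, ONE reads       -> True
print(reads_see_writes(1, 3, 3))  # ONE writes, ALL reads       -> True
print(reads_see_writes(2, 2, 3))  # QUORUM writes, QUORUM reads -> True
print(reads_see_writes(1, 1, 3))  # ONE writes, ONE reads       -> False
```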
Single data centre
Sacrificing write speed:
- Writes consistency level: ALL
- Reads consistency level: ONE

Sacrificing read speed:
- Writes consistency level: ONE
- Reads consistency level: ALL

Distributing the overhead between writes and reads:
- Writes consistency level: QUORUM
- Reads consistency level: QUORUM
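As a sketch, here is the balanced combination with the Python driver (the keyspace and table names are hypothetical):

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # hypothetical keyspace

# With RF 3, a QUORUM write waits for 2 replicas and a QUORUM read consults
# 2 replicas: the two sets always overlap, because 2 + 2 > 3.
write = SimpleStatement(
    "UPDATE accounts SET balance = %s WHERE id = %s",  # hypothetical table
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (100, 42))

read = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
balance = session.execute(read, (42,)).one()
```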
Multiple data centres
The above solutions also work with multiple data centres.
However, if we can send related writes and reads to the same data centre, we can use this combination:
- Writes consistency level: LOCAL_QUORUM
- Reads consistency level: LOCAL_QUORUM
What if this solution satisfies our redundancy requirements, but we can’t make sure to send related writes and reads to the same data centre? In this case writes and reads can both use QUORUM, which is computed over the logical nodes of all data centres. (Note that EACH_QUORUM is not an option for reads: Cassandra only supports it for writes.)
If this is not enough for our redundancy requirements, writes can use EACH_QUORUM and reads can use LOCAL_QUORUM: every data centre then stores a quorum of each write, so any local quorum read is guaranteed to overlap with it.
If writes use ALL, reads can use LOCAL_ONE, or the other way around. Note that this may be a bit risky: an operation using ALL fails if any replica is down, and an operation using LOCAL_ONE fails if the local data centre doesn’t have a healthy node able to accept it.
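A multi-data-centre sketch with the Python driver; the data centre name, contact point, keyspace and table are all hypothetical:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
from cassandra.query import SimpleStatement

# Route statements to the local data centre, defaulting to LOCAL_QUORUM.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc1")),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(["10.0.0.1"], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# When reads may land in any data centre, harden the write side instead:
# an EACH_QUORUM write reaches a quorum in every data centre...
write = SimpleStatement(
    "INSERT INTO events (id, payload) VALUES (%s, %s)",  # hypothetical table
    consistency_level=ConsistencyLevel.EACH_QUORUM,
)
session.execute(write, (1, "hello"))

# ...so a LOCAL_QUORUM read in any data centre overlaps with it.
read = SimpleStatement(
    "SELECT payload FROM events WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
row = session.execute(read, (1,)).one()
```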
One, two, three
You may know that Cassandra also supports the ONE, TWO and THREE consistency levels. Can we use them to achieve Maximum Consistency for clusters of 5 logical nodes or fewer? We can, but these consistency levels are not flexible: they specify a number that remains constant when the number of logical nodes increases or decreases. If it decreases, this may result in failures; if it increases, writes and reads may stop overlapping, silently breaking the consistency guarantee.
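Reusing the overlap rule from above, here is why fixed levels stop working when the replica count changes:

```python
def reads_see_writes(w: int, r: int, n: int) -> bool:
    return w + r > n

# THREE + THREE is fine for 5 logical nodes...
print(reads_see_writes(3, 3, 5))  # -> True
# ...but silently stops guaranteeing consistency if the cluster grows to 7:
print(reads_see_writes(3, 3, 7))  # -> False
# QUORUM adapts by itself: floor(7 / 2 + 1) = 4 replicas on each side.
print(reads_see_writes(4, 4, 7))  # -> True
```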
Should we aim for Maximum Consistency with Cassandra?
This is a reasonable question that we should ask ourselves.
Maximum Consistency means the maximum consistency that Cassandra can guarantee – this is a term I made up, you won’t find it elsewhere. But even if you use consistency levels as explained in this article, you might still occasionally face problems with lost tombstones. In practice, some deletions may be “forgotten” under some circumstances, and erased data may resurrect. This is a topic for another post.
Cassandra is designed to be fast, with some compromises that affect data correctness. Trying to achieve Maximum Consistency means that we don’t take advantage of Cassandra’s unique characteristics, and we use it for a purpose it wasn’t designed for.
If this is an exception, it is not a problem: we can use relaxed consistency levels for most of our writes and queries, and occasionally achieve Maximum Consistency.
For more practical advice on how to use Cassandra’s consistency levels, see my older post Choosing Cassandra consistency levels.
Federico Razzoli