Last updated on 20 Ottobre 2020
Transactions are the least known RDBMS features. Everyone knows they exist, but few know how they work. For this reason, it is not obvious that they can be read only. This article explains what read only transactions are and why we should use them when appropriate.
What is a read only transaction?
An objection I’ve heard more than once is: transactions are for writes,
SELECTs don’t even happen inside a transaction! But this is wrong. Transactions are not only about
COMMIT. They are also about isolation.
As a general rule, this is what happens. When you start a transaction we acquire a view, or snapshot, on data. Until the end of the transaction we will see that snapshot, not modifications made by other connections.
Reads are generally non-locking. After you’ve read a row, other users can still modify it, even if your transaction is still in progress. Internally, MySQL will keep the information that allow to “rebuild” data as they were when you acquired the snapshot, so you can still see them. This information is preserved in the transaction logs.
To be clear, we actually always use transactions. By default, we run queries in
autocommit mode, and we don’t start or end transactions explicitly. But even in this case, every SQL statement we run is a transaction.
There is no such thing as reading objective data. We always read from (or write to) snapshots acquired at a certain point of time, that include all the changes made by committed transactions.
Why should we use read only transactions?
The answer should be obvious now. When we want to run more than one
SELECT, and we want the results to be consistent, we should do it in a read only transaction. For example, we could do something like this:
SET @date := CURRENT_DATE(); SELECT COUNT(DISTINCT user_id) FROM product WHERE date = @date; SELECT COUNT(DISTINCT product_id) FROM product WHERE date = @date;
If we want the two results to reflect the reality at a certain point in time, we need to do this in a transaction:
START TRANSACTION READ ONLY; SET @date := CURRENT_DATE(); SELECT COUNT(DISTINCT user_id) FROM product WHERE date = @date; SELECT COUNT(DISTINCT product_id) FROM product WHERE date = @date; COMMIT;
The question implies something: why should we use a read only transaction, when we can use a regular read write transaction? If for some reason we try to write something, the statement will fail. So it’s simpler to just always use read write transactions.
The reason for using read only transactions is mentioned in the MySQL manual:
8.5.3 Optimizing InnoDB Read-Only Transactions – 25 August 2019
InnoDBcan avoid the overhead associated with setting up the transaction ID (
TRX_IDfield) for transactions that are known to be read-only. A transaction ID is only needed for a transaction that might perform write operations or locking reads such as
SELECT ... FOR UPDATE. Eliminating unnecessary transaction IDs reduces the size of internal data structures that are consulted each time a query or data change statement constructs a read view.
Isolation level is a setting that defines how much the current transaction is isolated from concurrent transactions. It can be changed at session level, before starting the next transaction. There is also a global setting, which takes effect for sessions that don’t change their isolation level. By default it is
Isolation levels are:
SERIALIZABLE here, as it’s just a variation of
REPEATABLE READ and it does not make sense with read only transactions)
This list is only valid for MySQL. Different DBMSs support different isolation levels.
The behavior described above refers to
REPEATABLE READ. To recap, it acquires a snapshot of the data, which is used for the whole duration of the transaction.
Another isolation level is
READ COMMITTED. It acquires a new snapshot for each query in the transaction. But this does not make sense for a read only transaction.
A usually underestimated isolation level is
READ UNCOMMITTED. The reason why it is underestimated is that it reads data that are being written by other transactions, but are not yet committed – and possibly, will never be. For this reason, highly inconsistent data may be returned.
However, this can be an important optimisation. In my experience, it can make some queries much faster and greatly reduce CPU usage, especially when you are running expensive analytics
SELECTs on write-intensive servers.
There are cases when results inconsistencies don’t matter. A typical example is running aggregations on many rows: an average on one million values will not change sensibly. Another example is when reading session-level data, that no one else is modifying; or old data, that no one is supposed to modify anymore.
If you want to use
READ UNCOMMITTED, there is no reason to wrap multiple queries in a single transaction. For each query, you can run something like this:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; START TRANSACTION; SELECT COUNT(*) FROM product; COMMIT;
MetaData Lock and the rollback trick
When accessing a table using a non-transactional engine (like MyISAM) inside a transaction, MySQL will return a warning. They seem to assume that there is no reason to access such tables from a transaction, which is false. MariaDB doesn’t return such warning.
In reality, transactions acquire metadata locks (MDL). This means that other connections will not be able to run an
ALTER TABLE on those tables until the transaction finishes. Preventing such problems is very important and should be done every time it’s useful.
mysqldump rolls back
mysqldump --single-transaction uses default read write transactions, not read only ones. My guess is that they don’t consider it important for a single long transaction, but only for many concurrent transactions.
However, the MariaDB version of mysqldump does something interesting with rollbacks:
ROLLBACK TO SAVEPOINT in --single-transaction mode to release metadata lock on table which was already dumped. This allows to avoid blocking concurrent DDL on this table without sacrificing correctness, as we won't access table second time and dumps created by --single-transaction mode have validity point at the start of transaction anyway.
mysqldump in MariaDB repo, taken from commit 624dd71b9419555eca8baadc695e3376de72286f
Note that this doesn't make --single-transaction mode with concurrent DDL safe in general case. It just improves situation for people for whom it might be working.
- An MDL blocks some operations, so it can be a problem just like any other lock, especially in a long transaction;
- An MDL is only useful for data that we are going to access again, just like any other lock;
- Transactions support savepoints in the middle of a transaction, and we can rollback to a savepoint instead of rolling the whole transaction back;
- Doing so releases the MDLs on objects that were only accessed after the savepoint.
Here we discussed a feature that should probably be used more often. Actually, if you know from the beginning that a transaction is read only, there is no reason not to inform MySQL about that.
We discussed which isolation levels make sense for read only transactions, and the nice
ROLLBACK TO SAVEPOINT trick used by mysqldump for MariaDB.
Did you notice any mistake? Do you have ideas to contribute?