Sometimes it is necessary to update multiple triggers in production. It may be necessary to do so without causing temporary inconsistencies between them, which could cause a service disruption or lead to incorrect data being written into the database.
To update triggers atomically, without temporary inconsistencies, we need to run CREATE TRIGGER and DROP TRIGGER inside a transaction. But this is not supported by MySQL and MariaDB: if we try to do so, the current transaction is implicitly committed. MariaDB has a variable called @@in_transaction that allows us to demonstrate this clearly:
MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.000 sec)

MariaDB [test]> SELECT @@in_transaction;
+------------------+
| @@in_transaction |
+------------------+
|                1 |
+------------------+
1 row in set (0.000 sec)

MariaDB [test]> DROP TRIGGER bi_t;
Query OK, 0 rows affected (0.001 sec)

MariaDB [test]> SELECT @@in_transaction;
+------------------+
| @@in_transaction |
+------------------+
|                0 |
+------------------+
1 row in set (0.000 sec)
The problem is actually more general: MySQL and MariaDB don't support transactional DDL. But let's focus on triggers. I'll explain what problems this causes, and then I will propose a solution.
NOTE: MariaDB 10.6 supports crash-safe DDL, meaning that in case of a server crash triggers can only be present and consistent, or not present at all. For example, no orphaned files will be left behind. But this doesn't mean that triggers can be created or dropped as part of a transaction.
The problems
The problems are:
- Phantom triggers – this concerns each individual updated trigger;
- Inconsistent triggers – this concerns a table's triggers as a whole;
- Atomicity – this concerns failures.
Phantom triggers
If I want to replace one trigger with another, as mentioned, I will need to run DROP TRIGGER followed by CREATE TRIGGER. This implies that, in the (hopefully brief) interval between these statements, the trigger will simply not exist.
If I could run these statements in a transaction, this would not be a problem. In fact, for other users, both statements would take effect together when I COMMIT the transaction. But again, this cannot happen in MySQL and MariaDB. On a heavily used server, we should definitely expect some users to write to the table without activating a trigger.
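To make the window concrete, here is a minimal sketch of the unsafe sequence. The table t and the trigger bi_t are hypothetical names, and the trigger body is just an example:

DROP TRIGGER bi_t;
-- Any INSERT executed at this point bypasses the trigger logic.
CREATE TRIGGER bi_t
  BEFORE INSERT ON t
  FOR EACH ROW
  SET NEW.created_at = NOW();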
However, MariaDB has a solution for this problem: CREATE OR REPLACE TRIGGER. In this way the existing trigger (if any) is dropped and the new one is created, but the operation is atomic. No other user can see the table without a trigger.
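For example, the replacement above could be done atomically like this (same hypothetical names):

CREATE OR REPLACE TRIGGER bi_t
  BEFORE INSERT ON t
  FOR EACH ROW
  SET NEW.created_at = NOW();
-- Other sessions always see either the old or the new trigger,
-- never a table with no trigger.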
Inconsistent triggers
Even with the first problem solved, in most cases we have a second problem: we have multiple triggers associated with a table, and we always want them to be consistent with each other.
This is better explained with an example. Suppose that we have BEFORE INSERT and BEFORE UPDATE triggers. These triggers perform some data validation, and return an error if some data is not valid. If you want to know how to do this, take a look at an article I wrote some years ago on my personal blog: Validating data using a TRIGGER.
Anyway, the problem here is that, if we change the validation rules in one trigger, we want to change them in the other trigger too. At no point in time do we want the triggers to be inconsistent.
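To illustrate, here is a sketch of such a pair of triggers; the product table, its price column and the validation rule are made up for the example:

DELIMITER //
CREATE OR REPLACE TRIGGER bi_product
  BEFORE INSERT ON product
  FOR EACH ROW
BEGIN
  -- Same rule as in the BEFORE UPDATE trigger below:
  -- if it changes here, it must change there too.
  IF NEW.price <= 0 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'price must be positive';
  END IF;
END //
CREATE OR REPLACE TRIGGER bu_product
  BEFORE UPDATE ON product
  FOR EACH ROW
BEGIN
  IF NEW.price <= 0 THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'price must be positive';
  END IF;
END //
DELIMITER ;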
Atomicity
Our trigger operations should completely succeed or completely fail. If, for example, we lose our connection in the middle of these changes, all the changes we've made should be undone, to avoid inconsistencies.
The solution
If you have used MySQL for centuries like me, you may remember the LOCK TABLES statement. And probably you remember it with horror. It was developed in ancient times as an alternative to transactions, when MyISAM (or probably its ancestor, ISAM) was the default storage engine and InnoDB simply didn't exist. Basically, since you could not use a real transaction, you locked a whole table with a read lock or a write lock. Then you did some stuff, and then you ran UNLOCK TABLES. Operations executed inside the LOCK block were not atomic, but at least other users could not interfere with them. Nowadays, I only see that command in legacy code – but yes, I still see it.
Why does LOCK TABLES help in this case? Because the lock survives statements like DROP TRIGGER and CREATE TRIGGER.
So basically what you would do is the following (a complete sketch is shown after the list):
- LOCK TABLE something WRITE; – the command is queued until all existing metadata locks (transactions or DDL in progress) are released; this would happen even if we directly ran DROP TRIGGER;
- you DROP the old triggers;
- you CREATE the new triggers;
- UNLOCK TABLES.
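Here is a minimal sketch of the whole procedure, again with hypothetical table and trigger names:

-- Queued until all metadata locks on t are released;
-- once acquired, it blocks all other reads and writes on t.
LOCK TABLE t WRITE;

-- No other session can use t here, so nobody can see
-- a missing or half-updated set of triggers.
DROP TRIGGER bi_t;
DROP TRIGGER bu_t;
CREATE TRIGGER bi_t BEFORE INSERT ON t FOR EACH ROW
  SET NEW.updated_at = NOW();
CREATE TRIGGER bu_t BEFORE UPDATE ON t FOR EACH ROW
  SET NEW.updated_at = NOW();

-- All the new triggers become visible to other sessions at once.
UNLOCK TABLES;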
What can go wrong
The above solution seems good to me for most cases, but it is not perfect. I encourage you to read this section to understand which problems you may encounter.
Lost connection or server crash
We are emulating an atomic series of operations, but the logic that makes them atomic runs on the client, not in the server. Without a built-in solution, some possible problems cannot be addressed:
- Lost connections: If the connection is lost, the table will be unlocked, and the triggers already created will be used by all users. However, some triggers may still be needed but not created yet.
- mysqld crash: When mysqld crashes, of course we restart it. But on restart, tables will be unlocked, and possibly some of the triggers have already been created, while others have not.
Neither of these risks can be avoided. What we can do to mitigate them is to be fast and, if something bad happens, re-establish the connection and complete the migration as soon as possible.
LOCK TABLES and Galera
In the comments, a concern was raised about LOCK TABLES not working as expected on Galera.
Actually, LOCK TABLES seems to work perfectly on the node where it is executed. The problem is that the lock is not propagated to other nodes.
So I would only advise this approach if one of these best practices is followed:
- All writes go to the same node;
- Writes to each set of databases (or at least tables) go to the same node.
If so, you are reasonably safe. If you have automated failover and you want to be extra safe, run LOCK TABLES and UNLOCK TABLES on all nodes. If one of them cannot be reached from your host but it seems to be up, you may prefer to postpone the migration until the problem is solved.
Waiting for a metadata lock
As mentioned, any DDL may have to wait a long time for a metadata lock to be acquired. Even a long-running SELECT can cause this problem. Such a wait may also block replication.
There is no way to avoid this. When it happens, you may want to kill the connection and postpone the migration; or you can kill the long running transaction/query to unblock the migration.
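To spot the blocker, you can inspect the process list: sessions stuck behind the lock show a state like 'Waiting for table metadata lock'. Here is a sketch using information_schema, available in both MySQL and MariaDB (the 60-second threshold is an arbitrary example):

-- Sessions waiting for a metadata lock, plus long-running
-- statements that may be holding it.
SELECT id, user, time, state, info
FROM information_schema.PROCESSLIST
WHERE state LIKE '%metadata lock%'
   OR time > 60
ORDER BY time DESC;

-- If appropriate, unblock the migration by killing the blocker:
-- KILL <id_of_the_blocking_session>;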
Conclusions
What are the drawbacks of this solution?
- If the connection is lost at some point, no rollback happens. In other words, the Atomicity problem mentioned above is not solved in any way.
- If a long transaction is running, our LOCK TABLE statement will be queued for a long time. And it will cause several other statements (including SELECTs) to be queued. But this also happens with a simple DROP TRIGGER.
- For a brief moment, the table will be completely write-locked.
What are the pros?
- No disruption.
- Isolation between concurrent sessions.
- Once the lock is acquired, the operations on triggers will be fast.
Other methods are possible, but they are more complicated and not necessarily less risky. For example, if you use row-based replication, you could update the triggers on a slave, promote it to master, and then update the triggers on the master. But then you have a risk of conflicts during the failover.
All in all, this method is good enough for most situations.
Federico Razzoli
Photo credit: Stefano Trucco