MariaDB has many configurable timeouts. Some of them can only be configured by DBAs (database administrators). Others can be adjusted by developers in the application code, though DBAs can configure defaults. MariaDB timeouts can be grouped in these categories:
- Network timeouts;
- Connection timeouts;
- Query timeouts;
- Transaction and locks timeouts.
But why would one change the default timeouts? Isn’t it dangerous?
Actually the default timeouts are very high, so they are only hit in extreme cases. For example, reading a single network package has a 30 seconds timeout. The reasoning behind this is clear: this can’t make communications fail in any real-world environment (hopefully!), so it’s a safe value. And surely when this default was set, in the 90s, this motivation sounded reasonable. Unfortunately, it isn’t today. If a MariaDB instance has to wait 30 seconds for each network package, this will bring no benefit, it will stress other parts of your infrastructure, and will possibly crash one of them. Before discussing timeouts, we’ll discuss this point.
Note that this article only discusses the timeouts that directly affect users, for example the timeouts for their queries or their connection attempts. There are other timeouts that the DBAs should care about, like the timeouts that affect the communication between Galera nodes, or binary log group commits. We might write articles on those topics, if there is interest.
Why timeouts should be reasonable
A timeout is the amount of time that can be spent, as a maximum, waiting for an operation to complete. In this context, the operations involve some form of communication between MariaDB and a client. We’re examining low-level timeouts, such as timeouts for individual TCP packets, as well as high-level timeouts, such as timeouts for queries and idle transactions. But the idea is more or less the same: if an interaction between the client and the database takes too long, abort it.
Why do I consider timeouts important? Because a blocking operation taking a long time might have a chain of consequences, and potentially cause a disaster.
Consider a typical architecture. Here we are only considering the components that lay between the application and the database:
- An application server layer runs some sort of web applications.
- Optionally, each pod or server running the database also runs a load balancer, such as ProxySQL or HAProxy.
- Optionally, a layer of independent load balancers exists.
- The database server receives SQL queries from the load balancers and processes them.
If a database is not responsive and timeouts are too high, the following situations might occur:
- An application server sends a query to a load balancer.
- The load balancer forwards the query to a running database server.
- The query takes a long time, and it locks rows.
- Other queries, from different clients, try to access those rows but are blocked.
- The load balancers’ capacity is saturated and they stop accepting queries from the application.
- Or, the load balancers keep accepting connections but they remain queued, the application servers resources are consumed by blocked outbound connections, and they crash.
If the relevant timeouts are set to a more reasonable value, like 1 second or 5 seconds, this disaster is unlikely to happen. Naturally, the more intensive your database workload is, the more dangerous long timeouts are.
Network timeouts
MariaDB has timeouts to receive (read) or send (write) a single TCP packet from/to the client. Their names are:
These variables default to 30, which means 30 seconds. This is a long wait for a single network packet. To be conservative, I recommend a value of 2.
If a package can’t be sent or received, MariaDB will try to perform the operation again. The number of attempts is defined by
. The default is 30, I recommend 3.net_retry_count
Lowering these timeouts will protect your database, for example, in the following scenarios:
- Long queries are killed on the client side, terminating a thread in the application server. MariaDB will usually assume that the client is still connected, until the timeout is reached.
- A human user connects to MariaDB with a GUI like DBeaver, runs a long query, then closes the GUI. Some GUIs don’t close active connections and don’t close running queries on exit. Even if they do, in case of a crash the queries and connections will remain open.
- The network becomes slow for some reason.
Connection timeouts
connect_timeout
– When a client establishes a connection, the server will wait this number of seconds for a single TCP package. Default: 10. Recommended value: 2.
wait_timeout
– A connection can stay idle for some time. When a connection is doing nothing and no transaction is active, this timeout applies. Default: 28800 (8 hours). Recommendation: it makes sense to use a load balancer (such as ProxySQL) and reutilise connections to MariaDB. But keeping idle connections open for 8 hours is clearly a waste of resources. Normally, a value of 1800 (30 minutes) is safe and prudent.
interactive_timeout
– The above option has a special case: some connections are supposed to be interactive. This is the case when you use a GUI to work visually with a MariaDB database: you launch queries, you read the results, and some time is supposed to pass between these operations. When an application connects to MariaDB with any official driver, it has the option of declaring the connection as interactive. The only difference between an interactive and a non-interactive connection is that, for interactive connections, this timeout applies. Default: 2147483 (24+ days). Recommendation: 30 minutes. However, make sure to specify MAX_USER_CONNECTIONS
for human users, otherwise they might open many connections and leave them open by mistake, consuming MariaDB resources.
Query timeouts
Query timeouts can be set at several level:
- The
max_statement_time
is a default timeout for SQL queries. Default: 0 (disabled). - A query timeout can also be set on a per-user basis with
CREATE USER ... MAX_STATEMENT_TIME
. - A timeout can also be set for an individual query with
SET STATEMENT max_statement_time
.
For the vast majority of queries, even 1 second is an unreasonable duration. A web page typically runs tens of queries, and it’s important that each of them takes a short time.
My recommendation is:
- Set the default (the
max_statement_time
variable) to a prudent but relatively low value, such as 5. - Human users, such as analysts or DBAs, should have a much higher default timeout, especially on replicas.
- If you have at least one separate MariaDB user for jobs (which is a great practice), this user should have a higher timeout, too, because jobs are often supposed to run more expensive queries during off-peak times.
- mariadb-dump and optionally other tools should use a user with a high query timeout.
- But any user, even an OLTP application, might periodically run queries that are allowed to take a longer time. This is normally fine, as long as it’s an exception rather than the rule. In those cases,
max_statement_time
should be increased for individual queries.
Transaction and lock timeouts
idle_transaction_timeout
is the timeout for idle transactions. These transactions started, possibly read and wrote some data, but then they did nothing until the timeout was reached. Default: 0 (disabled). Recommendation: 3.
It is also possible to define different values for read-only and read-write transactions, but I don’t recommend to do so, because in most cases this would unnecessarily complicate things. However, if you wish to do so, you can use the idle_readonly_transaction_timeout
and idle_write_transaction_timeout
variables.
innodb_lock_wait_timeout
determines how much time a query can wait for a row lock to be released. Generally speaking, row locks are usually held by transactions that modified a row. When the timeout is reached, an error is raised. Regardless how long this timeout is, the application should be able to handle this error, and usually retry the transaction a reasonable number of times. Default: 50 seconds. Recommendation: 3 seconds.
Waiting for row locks might cause deadlocks. This is especially true with databases that write frequently, or have many foreign keys. Low values for innodb_lock_wait_timeout
will help reducing deadlocks, but won’t eliminate them. To avoid deadlocks, innodb_lock_wait_timeout
needs to be set to zero. But in this case, transactions will fail more often, and the application needs to be able to retry them. If you do this, and make deadlocks impossible, you should also set innodb_deadlock_detect
to zero. This will disable InnoDB deadlock detection, which can be expensive for CPUs.
lock_wait_timeout
is similar to the previous variable, but it concerns metadata locks (MDLs). Generally speaking, an MDL is acquired by any read or write, and it blocks changes to a table structure. This guarantees, for example, that a column cannot be added or dropped in the middle of a SELECT
. Default: 86400 (1 day). Recommendation: 3 seconds.
slave_abort_blocking_timeout
– A query on a replica might block a DDL statement (CREATE
, ALTER
or DROP
). This happens, for example, if data analysts connect directly to a replica to query data manually, and they run a long SELECT. In these cases, the blocking query will be killed automatically when this timeout is reached. Default: 31536000 (1 year). Recommendation: this value should be much lower to be useful, but an acceptable value depends on what your analysts do, and how critical queries are for their job. Please check with your analysts. Also, keep into account tools that periodically export data from MariaDB to another technology.
Conclusions
I recommend to modify timeouts according to the recommendations in this article. This will avoid inefficient resource consumption, or even disasters, caused by other technologies waiting for a response from MariaDB. Before changing timeouts in production, you should perform adequate tests.
Federico Razzoli
0 Comments