Sometimes it is possible to do with one query what most people would do with two or more queries. This is done by using expressive SQL syntax that is often ignored. This principle applies to any database, but this article focuses on MySQL and MariaDB syntax.
Why reducing the number of queries is good
Reducing the number of queries is important with any DBMS. There are two reasons for this: reducing the latency of applications; and reducing the workload for the database servers.
Reducing the number of queries reduces the latency of a web page. A web page is not completely loaded until all queries are executed. And running a query involves sending a text over a network, and waiting for the results.
But also remember that the DBMS is usually your application’s bottleneck, and we shouldn’t increase its workload without a reason. While simple queries are fast, they always involve a certain amount of work that we tend to forget: listening to data sent by clients, parsing the SQL, mapping the table/column names to actual data structures, checking permissions, finding the best execution strategy… and more.
LAST_INSERT_ID()
In relational databases, the following situation occurs quite often: we insert a new row into a table, we let the DBMS generate the id automatically, and then we need to know that id to insert another row into a linked table. For example we create a row in the order table, and than we create a row in ordered_product – which will contain the order_id.
MySQL has the LAST_INSERT_ID()
function, which returns the last id generated by the current session. This allows to insert rows in this way:
INSERT INTO order (id, user_id, date)
VALUES (DEFAULT, 24, DATE());
INSERT INTO ordered_product (order_id, product_id)
VALUES (LAST_INSERT_ID(), 12);
Inserting multiple rows
When we insert several rows into the same table, there is no need to run a separate INSERT
statement for each row. With MySQL, it is possible to insert many rows with a single statement:
INSERT INTO usert (first_name, last_name) VALUES
('Emma', 'Goldman'),
('Ada', 'Lovelace'),
('John', 'Doe');
Duplicate rows
Sometimes we want to insert a row that could be already there. In this context, “already there” means that trying to insert it will violate a UNIQUE
constraint or the primary key uniqueness. If the row is already there, depending on the situation, we may want to:
- Receive an error/warning and insert nothing;
- Replace the old row or some of the columns.
The most commonly used solution is to run a SELECT
to check if the row is present, and then run an INSERT
if the row is not found; otherwise, run nothing or an UPDATE
, or even a DELETE
+ INSERT
. For all these cases, I propose a better solution.
If you don’t want to insert anything in case the row is already there, you can just run the INSERT
and check if it returned an error. If that’s the case, the transaction will not roll back automatically.
If you are running a multi-row INSERT
, you may want to insert the rows that are not duplicate. To do so, use INSERT IGNORE
. For each row that cannot be inserted, you will receive a warning, not an error, and non-duplicate rows will still be inserted. If you receive an error, you know that the whole statement failed for some other reason and nothing was written.
To entirely replace a row, the simplest solution is to use REPLACE
. I don’t suggest to do so. If you want to, be sure to understand what happens when you have foreign keys or triggers – and if you don’t have them now, ask yourself what will happen if they are added later. Even AUTO_INCREMENT
behaviour with REPLACE
is buggy (pardon, I mean: is interesting).
Instead, we can use INSERT ... ON DUPLICATE KEY UPDATE
. This allows to replace all columns (maybe setting some of them to DEFAULT()
) or some of them.
Modifying a value
Sometimes we know that a value is part of a certain row, and we need to modify it. Common examples are increasing a counter by one when a page is visited, or decreasing the quantity of products in stack. A less common example is modifying a string, by adding a suffix.
In such cases, the most common solution is to read a value from the database, modifying it in the application, and then writing it. So we have a SELECT
followed by an UPDATE
.
Normally, you can achieve the same result in this way:
UPDATE product SET qty = qty - 1 WHERE id = 24;
(For string concatenation, check an older article: How to concatenate strings in MySQL and MariaDB)
But what if we also need to know the value? Well, we can use variables:
UPDATE product
SET qty = (@qty := qty - 1)
WHERE id = 24;
Unfortunately, while this still works in MySQL 8.0, we will get this error:
mysql> SHOW WARNINGS G
*************************** 1. row ***************************
Level: Warning
Code: 1287
Message: Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'.
1 row in set (0.00 sec)
They did this change because variables are sometimes set in a SELECT
as a dirty workaround because window functions were historically not supported in MySQL. However MySQL supports them, and this hack is not needed anymore.
Except that this particular case has nothing to do with windowing functions. The solution proposed in the error message doesn’t help here.
Multi-table DELETE and UPDATE
Suppose you want to delete or modify some posts, and all comments associated to them. You don’t need to run DELETE
or UPDATE
on two tables separately. MySQL supports multi-table DELETE
and multi-table UPDATE
instead, which are somewhat similar to JOIN
s.
I recommend to check the documentation pages I linked. Using subqueries (UPDATE ... WHERE if IN (SELECT ...)
) may be more intuitive, but the optimiser will not be able to choose a proper query plan.
Autocommit
In InnoDB (MySQL storage engine) nothing happens outside a transaction. It doesn’t really matter if we start and commit a transaction implicitly or not. Therefore the following pieces of code are equivalent:
SET autocommit := 1;
INSERT INTO t (a, b, c) VALUES (1, 2, 3);
START TRANSACTION;
INSERT INTO t (a, b, c) VALUES (1, 2, 3);
COMMIT;
Autocommit is ON
by default. So, unless we changed it explicitly, we can just run:
INSERT INTO t (a, b, c) VALUES (1, 2, 3);
However, I saw ORMs enclosing single-statement transactions in START TRANSACTION
and COMMIT
statements. Try to avoid that.
AND CHAIN, AND RELEASE
When we complete a transaction, successfully or not, we probably want to start another or to disconnect from MySQL.
To start another transaction, we could use this syntax:
COMMIT AND CHAIN;
ROLLBACK AND CHAIN;
No lock or snapshot is taken until we write or read some data, so there is no drawback in using this feature. We simply avoid to run one more START TRANSACTION
.
To close the connection when the transaction ends:
COMMIT AND RELEASE;
ROLLBACK AND RELEASE;
This doesn’t avoid to run another SQL instruction. But it shortens the time we stay connected to MySQL or the proxy.
If this is a web page or application that will reconnect soon to MySQL or a proxy, it may be better not to disconnect at all. In any case, if we use a proxy, it probably shouldn’t disconnect from MySQL.
You may want to make the or the default behavior, for the current session or globally, using the completion_type
variable. This would be ok for a database abstraction library which handles the transactions, but I don’t suggest to use it in other cases, because this is not a well-known feature, and it would confuse developers.
Stored procedures and functions
I’ve already mentioned why stored programs can potentially be a huge performance optimisation, in the article 3 good reasons to use Stored Procedures. Let me summarise those reasons again:
- Less statements sent from the client to the server (one
CALL
statement instead of several queries); - Less results sent from the server to the client (intermediate results are handled inside the stored procedure);
- Less locks – the key here is to understand that locking problems remain, but at least they should be held for a short time.
I want to remind you about a particular type of stored programs: stored aggregate functions, currently supported in MariaDB but not MySQL. See my article MariaDB Stored Aggregate Functions by examples. With aggregate functions, you may loop on a huge amount of rows and return a single value, saving a lot of client-server traffic.
DO
The DO
statement is an almost unknown MySQL syntax. It’s purpose is to invoke a function without returning any result to the client. I believe that it is worth using it in some situations:
- When invoking a stored function which writes some data and returns a potentially big resultset, but sometimes we have no use for the resultset. If we are doing this, however, we are probably using a function to do something that should be done by a procedure.
- When we run functions or queries because of their side effects inside a stored procedure, and we don’t want to send multiple resultsets to the client.
INSERT SELECT and CREATE TABLE SELECT
We have two useful statements to copy rows between tables:
INSERT SELECT
copies the result of aSELECT
to an existing table;CREATE TABLE
This will reduce server resource usageSELECT
copies it to a new table.
Typically, these are one time operations, so using these statements is convenient but it’s not a relevant performance optimisation. However sometimes they are used in cron jobs, for example to automate the archival of old rows, moving them from current tables into compressed or remote tables.
DELETE / INSERT RETURNING
MariaDB implemented the DELETE RETURNING
statement, and more recently (version 10.5) INSERT RETURNING
and REPLACE RETURNING
. They address the following use cases:
- We want to read details about a row before deleting it.
- We want to insert a row and then use it in the code (this is something that ORMs commonly do).
Here’s a DELETE RETURNING
example. INSERT RETURNING
is very similar.
DELETE FROM user
WHERE id = 24
RETURNING email;
For more information, see MariaDB RETURNING statements.
SQL_CALC_FOUND_ROWS
In a search page, the results are often paged (for example a website could display 10 results per page, just like Google). It is not an uncommon practice to show the total number of results found.
Let me say that, in most cases, I consider that as a bad practice – because the query to show that number can be horribly slow. Also, it is a common practice to use ORDER BY
and LIMIT
to page the results. This is also a bad practice.
But if you are doing it, there’s at least something that you can do to improve the search page performance without a big effort. Instead of running a separate query with COUNT(*)
to get the number of total results, just add SQL_CALC_FOUND_ROWS
to your query and then run a SELECT FOUND_ROWS()
.
SQL_CALC_FOUND_ROWS
and FOUND_ROWS()
are deprecated in MySQL 8 and will be removed in a future version of MySQL. As a consequence, I can only recommend them on MariaDB – keeping in mind that they are just a mitigation of a bad practice.
Side note: bug #101325 asked not to remove SQL_CALC_FOUND_ROWS
, but Oracle censored it.
SET and SHOW
This is not a hint about something you should do; it is more about what you should avoid. ORMs tend to abuse of statements like SET
and SHOW
.
SET
is generally ran when a session starts, or even before each query. It’s used to set some session variables to be sure, for example, that MySQL will expect the correct character set to be used. However, this is only necessary if the MySQL configuration needs some corrections. If it is the case, it’s better to fix it. Then, the ORM should be instructed not to run SET
statements, and just assume that the configuration is correct.
That said, if you want to use SET
, keep in mind that you can set more than one variable with the same statement:
SET @my_var := 1,
@your_var := 2,
@her_var := 3;
SHOW
statements (SHOW TABLES
, SHOW COLUMNS
, …) are used by some ORMs to retrieve the remote table definitions from the database. If this is done for every new session, this operation consume resources. The ORM should already know the tables structures instead.
I remember that in a company where I have worked, PMM per-statement graphs shown that SHOW CREATE TABLE
was ran up to 2K times per second when worker threads started. No one had idea that this happened, because it was done by the ORM transparently. However, this should be avoided.
See also
Related courses:
Conclusions
When it comes to optimising an application performance, what we need to do is to find the slow queries and fix them or build proper indexes. But sensibly reducing the number of executed queries and communication between the application and the database also helps. This will reduce server resource usage, in particular CPU and network.
The above mentioned SQL statements are part of the SQL language supported by MySQL and MariaDB, so there is no reason not to use them. ORM authors should implement them, and, until they do so, developers should consider writing at least part of their queries without using an ORM.
Federico Razzoli
0 Comments