Schedule Meeting

a

MariaDB/MySQL: use SQL properly to run less queries

by | Oct 29, 2020 | MariaDB, MySQL, SQL Language

Need Help?  Click Here for Expert Support

Sometimes it is possible to do with one query what most people would do with two or more queries. This is done by using expressive SQL syntax that is often ignored. This principle applies to any database, but this article focuses on MySQL and MariaDB syntax.

Does your application talk too much
to the database?

Why reducing the number of queries is good

Reducing the number of queries is important with any DBMS. There are two reasons for this: reducing the latency of applications; and reducing the workload for the database servers.

Reducing the number of queries reduces the latency of a web page. A web page is not completely loaded until all queries are executed. And running a query involves sending a text over a network, and waiting for the results.

But also remember that the DBMS is usually your application’s bottleneck, and we shouldn’t increase its workload without a reason. While simple queries are fast, they always involve a certain amount of work that we tend to forget: listening to data sent by clients, parsing the SQL, mapping the table/column names to actual data structures, checking permissions, finding the best execution strategy… and more.

LAST_INSERT_ID()

In relational databases, the following situation occurs quite often: we insert a new row into a table, we let the DBMS generate the id automatically, and then we need to know that id to insert another row into a linked table. For example we create a row in the order table, and than we create a row in ordered_product – which will contain the order_id.

MySQL has the LAST_INSERT_ID() function, which returns the last id generated by the current session. This allows to insert rows in this way:

INSERT INTO order (id, user_id, date)
    VALUES (DEFAULT, 24, DATE());
INSERT INTO ordered_product (order_id, product_id)
    VALUES (LAST_INSERT_ID(), 12);

Inserting multiple rows

When we insert several rows into the same table, there is no need to run a separate INSERT statement for each row. With MySQL, it is possible to insert many rows with a single statement:

INSERT INTO usert (first_name, last_name) VALUES
    ('Emma', 'Goldman'),
    ('Ada', 'Lovelace'),
    ('John', 'Doe');

Duplicate rows

Sometimes we want to insert a row that could be already there. In this context, “already there” means that trying to insert it will violate a UNIQUE constraint or the primary key uniqueness. If the row is already there, depending on the situation, we may want to:

  1. Receive an error/warning and insert nothing;
  2. Replace the old row or some of the columns.

The most commonly used solution is to run a SELECT to check if the row is present, and then run an INSERT if the row is not found; otherwise, run nothing or an UPDATE, or even a DELETE + INSERT. For all these cases, I propose a better solution.

If you don’t want to insert anything in case the row is already there, you can just run the INSERT and check if it returned an error. If that’s the case, the transaction will not roll back automatically.

If you are running a multi-row INSERT, you may want to insert the rows that are not duplicate. To do so, use INSERT IGNORE. For each row that cannot be inserted, you will receive a warning, not an error, and non-duplicate rows will still be inserted. If you receive an error, you know that the whole statement failed for some other reason and nothing was written.

To entirely replace a row, the simplest solution is to use REPLACE. I don’t suggest to do so. If you want to, be sure to understand what happens when you have foreign keys or triggers – and if you don’t have them now, ask yourself what will happen if they are added later. Even AUTO_INCREMENT behaviour with REPLACE is buggy (pardon, I mean: is interesting).

Instead, we can use INSERT ... ON DUPLICATE KEY UPDATE. This allows to replace all columns (maybe setting some of them to DEFAULT()) or some of them.

Modifying a value

Sometimes we know that a value is part of a certain row, and we need to modify it. Common examples are increasing a counter by one when a page is visited, or decreasing the quantity of products in stack. A less common example is modifying a string, by adding a suffix.

In such cases, the most common solution is to read a value from the database, modifying it in the application, and then writing it. So we have a SELECT followed by an UPDATE.

Normally, you can achieve the same result in this way:

UPDATE product SET qty = qty - 1 WHERE id = 24;

(For string concatenation, check an older article: How to concatenate strings in MySQL and MariaDB)

But what if we also need to know the value? Well, we can use variables:

UPDATE product
    SET qty = (@qty := qty - 1)
    WHERE id = 24;

Unfortunately, while this still works in MySQL 8.0, we will get this error:

mysql> SHOW WARNINGS G
*************************** 1. row ***************************
  Level: Warning
   Code: 1287
Message: Setting user variables within expressions is deprecated and will be removed in a future release. Consider alternatives: 'SET variable=expression, ...', or 'SELECT expression(s) INTO variables(s)'.
1 row in set (0.00 sec)

They did this change because variables are sometimes set in a SELECT as a dirty workaround because window functions were historically not supported in MySQL. However MySQL supports them, and this hack is not needed anymore.

Except that this particular case has nothing to do with windowing functions. The solution proposed in the error message doesn’t help here.

Multi-table DELETE and UPDATE

Suppose you want to delete or modify some posts, and all comments associated to them. You don’t need to run DELETE or UPDATE on two tables separately. MySQL supports multi-table DELETE and multi-table UPDATE instead, which are somewhat similar to JOINs.

I recommend to check the documentation pages I linked. Using subqueries (UPDATE ... WHERE if IN (SELECT ...)) may be more intuitive, but the optimiser will not be able to choose a proper query plan.

Autocommit

In InnoDB (MySQL storage engine) nothing happens outside a transaction. It doesn’t really matter if we start and commit a transaction implicitly or not. Therefore the following pieces of code are equivalent:

SET autocommit := 1;
INSERT INTO t (a, b, c) VALUES (1, 2, 3);

START TRANSACTION;
INSERT INTO t (a, b, c) VALUES (1, 2, 3);
COMMIT;

Autocommit is ON by default. So, unless we changed it explicitly, we can just run:

INSERT INTO t (a, b, c) VALUES (1, 2, 3);

However, I saw ORMs enclosing single-statement transactions in START TRANSACTION and COMMIT statements. Try to avoid that.

AND CHAIN, AND RELEASE

When we complete a transaction, successfully or not, we probably want to start another or to disconnect from MySQL.

To start another transaction, we could use this syntax:

COMMIT AND CHAIN;
ROLLBACK AND CHAIN;

No lock or snapshot is taken until we write or read some data, so there is no drawback in using this feature. We simply avoid to run one more START TRANSACTION.

To close the connection when the transaction ends:

COMMIT AND RELEASE;
ROLLBACK AND RELEASE;

This doesn’t avoid to run another SQL instruction. But it shortens the time we stay connected to MySQL or the proxy.

If this is a web page or application that will reconnect soon to MySQL or a proxy, it may be better not to disconnect at all. In any case, if we use a proxy, it probably shouldn’t disconnect from MySQL.

You may want to make the or the default behavior, for the current session or globally, using the completion_type variable. This would be ok for a database abstraction library which handles the transactions, but I don’t suggest to use it in other cases, because this is not a well-known feature, and it would confuse developers.

Stored procedures and functions

I’ve already mentioned why stored programs can potentially be a huge performance optimisation, in the article 3 good reasons to use Stored Procedures. Let me summarise those reasons again:

  • Less statements sent from the client to the server (one CALL statement instead of several queries);
  • Less results sent from the server to the client (intermediate results are handled inside the stored procedure);
  • Less locks – the key here is to understand that locking problems remain, but at least they should be held for a short time.

I want to remind you about a particular type of stored programs: stored aggregate functions, currently supported in MariaDB but not MySQL. See my article MariaDB Stored Aggregate Functions by examples. With aggregate functions, you may loop on a huge amount of rows and return a single value, saving a lot of client-server traffic.

DO

The DO statement is an almost unknown MySQL syntax. It’s purpose is to invoke a function without returning any result to the client. I believe that it is worth using it in some situations:

  • When invoking a stored function which writes some data and returns a potentially big resultset, but sometimes we have no use for the resultset. If we are doing this, however, we are probably using a function to do something that should be done by a procedure.
  • When we run functions or queries because of their side effects inside a stored procedure, and we don’t want to send multiple resultsets to the client.

INSERT SELECT and CREATE TABLE SELECT

We have two useful statements to copy rows between tables:

  • INSERT SELECT copies the result of a SELECT to an existing table;
  • CREATE TABLE This will reduce server resource usageSELECT copies it to a new table.

Typically, these are one time operations, so using these statements is convenient but it’s not a relevant performance optimisation. However sometimes they are used in cron jobs, for example to automate the archival of old rows, moving them from current tables into compressed or remote tables.

DELETE / INSERT RETURNING

MariaDB implemented the DELETE RETURNING statement, and more recently (version 10.5) INSERT RETURNING and REPLACE RETURNING. They address the following use cases:

  • We want to read details about a row before deleting it.
  • We want to insert a row and then use it in the code (this is something that ORMs commonly do).

Here’s a DELETE RETURNING example. INSERT RETURNING is very similar.

DELETE FROM user
    WHERE id = 24
    RETURNING email;

For more information, see MariaDB RETURNING statements.

SQL_CALC_FOUND_ROWS

In a search page, the results are often paged (for example a website could display 10 results per page, just like Google). It is not an uncommon practice to show the total number of results found.

Let me say that, in most cases, I consider that as a bad practice – because the query to show that number can be horribly slow. Also, it is a common practice to use ORDER BY and LIMIT to page the results. This is also a bad practice.

But if you are doing it, there’s at least something that you can do to improve the search page performance without a big effort. Instead of running a separate query with COUNT(*) to get the number of total results, just add SQL_CALC_FOUND_ROWS to your query and then run a SELECT FOUND_ROWS().

SQL_CALC_FOUND_ROWS and FOUND_ROWS()are deprecated in MySQL 8 and will be removed in a future version of MySQL. As a consequence, I can only recommend them on MariaDB – keeping in mind that they are just a mitigation of a bad practice.

Side note: bug #101325 asked not to remove SQL_CALC_FOUND_ROWS, but Oracle censored it.

SET and SHOW

This is not a hint about something you should do; it is more about what you should avoid. ORMs tend to abuse of statements like SET and SHOW.

SET is generally ran when a session starts, or even before each query. It’s used to set some session variables to be sure, for example, that MySQL will expect the correct character set to be used. However, this is only necessary if the MySQL configuration needs some corrections. If it is the case, it’s better to fix it. Then, the ORM should be instructed not to run SET statements, and just assume that the configuration is correct.

That said, if you want to use SET, keep in mind that you can set more than one variable with the same statement:

SET @my_var    := 1,
    @your_var  := 2,
    @her_var   := 3;

SHOW statements (SHOW TABLES, SHOW COLUMNS, …) are used by some ORMs to retrieve the remote table definitions from the database. If this is done for every new session, this operation consume resources. The ORM should already know the tables structures instead.

I remember that in a company where I have worked, PMM per-statement graphs shown that SHOW CREATE TABLE was ran up to 2K times per second when worker threads started. No one had idea that this happened, because it was done by the ORM transparently. However, this should be avoided.

See also

Related courses:

Conclusions

When it comes to optimising an application performance, what we need to do is to find the slow queries and fix them or build proper indexes. But sensibly reducing the number of executed queries and communication between the application and the database also helps. This will reduce server resource usage, in particular CPU and network.

The above mentioned SQL statements are part of the SQL language supported by MySQL and MariaDB, so there is no reason not to use them. ORM authors should implement them, and, until they do so, developers should consider writing at least part of their queries without using an ORM.

Federico Razzoli

Did you like this article?

Picture credit

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

Validating rows with CHECK constraints in MariaDB

Validating rows with CHECK constraints in MariaDB

Relational databases provide several ways to validate data. CHECK constraints are a powerful tool for in-database data validation. Their impact on performance is minimal, if any. In this article we'll discuss MariaDB support for CHECK constraints. Note that the CHECK...

MariaDB ColumnStore SQL limitations

MariaDB ColumnStore SQL limitations

MariaDB ColumnStore is an extremely fast and scalable solution for analytics built on MariaDB, which Vettabase supports. MariaDB ColumnStore inherits the MariaDB SQL dialect, and many MariaDB features. However, numerous MariaDB features are not available for the...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *