Schedule Meeting

a

Hints to optimise queries with a LIKE comparison

by | Sep 15, 2022 | MariaDB, MySQL, PostgreSQL

Need Help?  Click Here for Expert Support

In SQL, using the LIKE operator is a powerful way to find strings that match a certain pattern. It’s suitable for most use cases, thanks to its two wildcard characters:

  • _ means any one character.
  • % means any sequence of zero or more characters.

However, many queries out there are slow because they use LIKE operations that can’t take advantage of an index. This article discusses how to optimise a LIKE operation.

Searching the beginning of a text

There are two ways to use an index for searching:

  • To search for a single value (WHERE id = 24).
  • To search for a range of values (a closed range: WHERE id BETWEEN 24 AND 42, or an open range: WHERE id < 24).

When we use LIKE to search the beginning of a text, in practice we perform a range search. In fact, these two queries are equivalent:

SELECT ... FROM ... WHERE name LIKE 'B%';
SELECT ... FROM ... WHERE name >= 'B' AND name < 'C';

An index can often be used to find the beginning of a text with LIKE.

Sometimes a LIKE search checks the beginning of a text but contains more wildcard characters. For example: name LIKE 'A%B%C'. They still use indexes, but keep reading to see if you can optimise them better.

Most queries are more complex than these examples, so they may combine several filters, an ORDER BY and a GROUP BY. To make sure that a complex query can use an index, you need to understand how indexes work.

Searching the middle of a text

A LIKE operation that searches the middle of a text cannot use an index. But let’s see if you can still do something to optimise your query.

Search the beginning of the text when possible

This may look like a dumb hint, but it isn’t. In the real world, I’ve seen many queries performing a '%middle of text%' search when they could have performed a 'beginning%of%text%' search. Probably the developers didn’t take the time to verify if the beginning of text version was enough, or maybe they just made a mistake. Whatever the reason is, it’s a good idea to double-check this kind of searches.

Using the single-character wildcard

I have already mentioned that LIKE has two wildcard characters: _ and %.

Use _ or a sequence of _ rather than %, whenever possible. This will reduce the amount of characters to search. Take the following examples:

SELECT ... FROM ... WHERE code LIKE '%x%';
SELECT ... FROM ... WHERE code LIKE '__x%';

Let’s suppose that our code has 20 characters. The first version of the query will search all of the characters. The second version will only search the third character. Even if both queries can’t use indexes, the second will cause less work, so it will be faster.

Merging searches

Sometimes you need to search for multiple strings in the middle of a text, but you know in which order they appear if they are present. You should combine them into one search to make a query faster.

Let’s take the following query as an example:

SELECT ... FROM ... WHERE code LIKE '%XX%' AND code LIKE '%YY%';

If you know that “XX” cannot appear after “YY”, you can optimise the query in this way:

SELECT ... FROM ... WHERE code LIKE '%XX%YY%';

This may be an optimisation because the search for “YY” will only be performed after the first appearance of “XX”, and won’t happen at all if “XX” is not present.

You can optimise the query even more if you know how many characters would separate the two strings, as an exact number or as a minimum:

-- exactly 3 characters in the middle:
SELECT ... FROM ... WHERE code LIKE '%XX__YY%';
-- at least 3 characters in the middle:
SELECT ... FROM ... WHERE code LIKE '%XX__%YY%';

SIMILAR TO

Some DBMSs, like PostgreSQL, support the SIMILAR TO syntax, which is more flexible than LIKE but less powerful than REGEXP. Some argue that SIMILAR TO is useless, but we’re going to use it here for simplicity – nothing prevents you from using a regular expression, if you prefer to do so.

SIMILAR TO can occasionally be used to combine two LIKE conditions. See the example below:

SELECT ... FROM ... WHERE code LIKE '%XX%' OR code LIKE '%YY%';
SELECT ... FROM ... WHERE code LIKE '%(XX|YY)%';

Flags

Sometimes we use LIKE to search for text (probably codes) containing a certain flag. In this case, there is usually a small number of possible flags we may need to look for. This is very different from a LIKE search based on whatever our application users may type.

If the number of important flags is small (I’d say 3, just to give you a number), you may consider creating boolean columns that are TRUE if the flag is present, and FALSE otherwise. You may use generated columns to enforce the boolean value correctness:

ALTER TABLE table_name
    ADD COLUMN is_tkk BOOLEAN
    AS (product_code LIKE '%tkk%')
    STORED;

This expression will be calculated when a row is INSERTed or the product_code column is UPDATEd.

GIN indexes

I mentioned that the problem with LIKE searches in any position of the text is that no index can be used, even if a suitable index exists. Actually, it’s only true for Btree indexes, because they are ordered data structures. Now, think about another ordered data structure familiar to everyone: a book index. You can quickly find the topic “Belgium”, or all topics starting with B, or with Be. But if you want to find all topics containing “giu”, you’ll have to search the whole index.

PostgreSQL supports many index types, and one of these types is GIN. An explanation of GIN indexes would be out of the scope of this article, but let’s just say that they are suitable for searches in the middle of a text.

CREATE INDEX idx_gin_code ON code USING gin (code);

Searching the end of a text

A LIKE operation that searches the end of a text cannot use an index. However, there a workaround.

If we reverse both the text in the column and the LIKE pattern, we’ll have a search at the beginning of the text. Here’s the example:

SELECT ... FROM ... WHERE code LIKE '%abc';
-- will become:
SELECT ... FROM ... WHERE code_rev LIKE REVERSE('%abc');

code_rev can either be populated by the application, or it can be a generated column. See the example below:

ALTER TABLE table_name
    ADD COLUMN code_rev VARCHAR(200)
        AS (REVERSE(code))
        STORED,
    ADD INDEX idx_code_rev (code_rev);

If the codes aren’t short, we should ask ourselves if we need to have the whole reversed text in code_rev. To speed up writes and reduce the index size, we can reverse only the first N characters instead like this: REVERSE(LEFT(code, 10)).

Optimising regular expressions with LIKE

It’s worth mentioning that LIKE can also be added to a query with REGEXP, adding a redundant condition, to allow the DBMS to use an index. I called this technique the regexp + LIKE optimisation.

Conclusions

We discussed how to optimise a LIKE condition to make sure that it can take advantage of an index. To learn more about query optimisation, you may consider attending our MariaDB Query Optimisation or MySQL Query Optimisation training courses.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

The Mystery of ProxySQL Galera Writer Switchover!

The Mystery of ProxySQL Galera Writer Switchover!

Writer switchover issues with ProxySQL and Galera can feel like an unsolved puzzle. Recently, I encountered two strange behaviors that revealed underlying issues in how ProxySQL interacts with Galera clusters. In this post, I’ll walk through the unexpected behaviors I...

Enforcing strong passwords for MariaDB users

Enforcing strong passwords for MariaDB users

MariaDB users normally connect using a password. Weak passwords are a common security problem, especially when passwords are generated by humans. However, MariaDB comes with plugins that help validating passwords to make sure they are strong enough. This article is a...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *