Hints to optimise queries with a LIKE comparison

Last updated on 17 Settembre 2022

In SQL, using the LIKE operator is a powerful way to find strings that match a certain pattern. It’s suitable for most use cases, thanks to its two jolly characters:

  • _ means any one character.
  • % means any sequence of zero or more characters.

However, many queries out there are slow because they use LIKE operations that can’t take advantage of an index. This article discusses how to optimise a LIKE operation.

Hints to optimise queries with a LIKE comparison, by Federico Razzoli

Searching the beginning of a text

There are two ways to use an index for searching:

  • To search for a single value ( WHERE id = 24 ).
  • To search for a range of values (a closed range: WHERE id BETWEEN 24 AND 42 , or an open range: WHERE id < 24 ).

When we use LIKE to search the beginning of a text, in practice we perform a range search. In fact, these two queries are equivalent:

SELECT ... FROM ... WHERE name LIKE 'B%';
SELECT ... FROM ... WHERE name >= 'B' AND name < 'C';

An index can often be used to find the beginning of a text with LIKE .

Sometimes a LIKE search checks the beginning of a text but contains more jolly characters. For example: name LIKE 'A%B%C' . They still use indexes, but keep reading to see if you can optimise them better.

Most queries are more complex than these examples, so they may combine several filters, an ORDER BY and a GROUP BY . To make sure that a complex query can use an index, you need to understand how indexes work.

Searching the middle of a text

A LIKE operation that searches the middle of a text cannot use an index. But let’s see if you can still do something to optimise your query.

Search the beginning of the text when possible

This may look like a dumb hint, but it isn’t. In the real world, I’ve seen many queries performing a '%middle of text%' search when they could have performed a 'beginning%of%text%' search. Probably the developers didn’t take the time to verify if the beginning of text version was enough, or maybe they just made a mistake. Whatever the reason is, it’s a good idea to double-check this kind of searches.

Using the single-character jolly

I have already mentioned that LIKE has two jolly characters: _ and % .

Use _ or a sequence of _ rather than % , whenever possible. This will reduce the amount of characters to search. Take the following examples:

SELECT ... FROM ... WHERE code LIKE '%x%';
SELECT ... FROM ... WHERE code LIKE '__x%';

Let’s suppose that our code has 20 characters. The first version of the query will search all of the characters. The second version will only search the third character. Even if both queries can’t use indexes, the second will cause less work, so it will be faster.

Merging searches

Sometimes you need to search for multiple strings in the middle of a text, but you know in which order they appear if they are present. You should combine them into one search to make a query faster.

Let’s take the following query as an example:

SELECT ... FROM ... WHERE code LIKE '%XX%' AND code LIKE '%YY%';

If you know that “XX” cannot appear after “YY”, you can optimise the query in this way:

SELECT ... FROM ... WHERE code LIKE '%XX%YY%';

This may be an optimisation because the search for “YY” will only be performed after the first appearance of “XX”, and won’t happen at all if “XX” is not present.

You can optimise the query even more if you know how many characters would separate the two strings, as an exact number or as a minimum:

-- exactly 3 characters in the middle:
SELECT ... FROM ... WHERE code LIKE '%XX__YY%';
-- at least 3 characters in the middle:
SELECT ... FROM ... WHERE code LIKE '%XX__%YY%';

SIMILAR TO

Some DBMSs, like PostgreSQL, support the SIMILAR TO syntax, which is more flexible than LIKE but less powerful than REGEXP . Some argue that SIMILAR TO is useless, but we’re going to use it here for simplicity – nothing prevents you from using a regular expression, if you prefer to do so.

SIMILAR TO can occasionally be used to combine two LIKE conditions. See the example below:

SELECT ... FROM ... WHERE code LIKE '%XX%' OR code LIKE '%YY%';
SELECT ... FROM ... WHERE code LIKE '%(XX|YY)%';

Flags

Sometimes we use LIKE to search for text (probably codes) containing a certain flag. In this case, there is usually a small number of possible flags we may need to look for. This is very different from a LIKE search based on whatever our application users may type.

If the number of important flags is small (I’d say 3, just to give you a number), you may consider creating boolean columns that are TRUE if the flag is present, and FALSE otherwise. You may use generated columns to enforce the boolean value correctness:

ALTER TABLE table_name
ADD COLUMN is_tkk BOOLEAN
AS (product_code LIKE '%tkk%')
STORED;

This expression will be calculated when a row is INSERT ed or the product_code column is UPDATE d.

GIN indexes

I mentioned that the problem with LIKE searches in any position of the text is that no index can be used, even if a suitable index exists. Actually, it’s only true for Btree indexes, because they are ordered data structures. Now, think about another ordered data structure familiar to everyone: a book index. You can quickly find the topic “Belgium”, or all topics starting with B, or with Be. But if you want to find all topics containing “giu”, you’ll have to search the whole index.

PostgreSQL supports many index types, and one of these types is GIN. An explanation of GIN indexes would be out of the scope of this article, but let’s just say that they are suitable for searches in the middle of a text.

CREATE INDEX idx_gin_code ON code USING gin (code);

Searching the end of a text

A LIKE operation that searches the end of a text cannot use an index. However, there a workaround.

If we reverse both the text in the column and the LIKE pattern, we’ll have a search at the beginning of the text. Here’s the example:

SELECT ... FROM ... WHERE code LIKE '%abc';
-- will become:
SELECT ... FROM ... WHERE code_rev LIKE REVERSE('%abc');

code_rev can either be populated by the application, or it can be a generated column. See the example below:

ALTER TABLE table_name
ADD COLUMN code_rev VARCHAR(200)
AS (REVERSE(code))
STORED,
ADD INDEX idx_code_rev (code_rev);

If the codes aren’t short, we should ask ourselves if we need to have the whole reversed text in code_rev . To speed up writes and reduce the index size, we can reverse only the first N characters instead like this: REVERSE(LEFT(code, 10)) .

Optimising regular expressions with LIKE

It’s worth mentioning that LIKE can also be added to a query with REGEXP , adding a redundant condition, to allow the DBMS to use an index. I called this technique the regexp + LIKE optimisation.

Conclusions

We discussed how to optimise a LIKE condition to make sure that it can take advantage of an index. To learn more about query optimisation, you may consider attending our MariaDB Query Optimisation or MySQL Query Optimisation training courses.

Federico Razzoli

A proposito di Federico Razzoli

Federico is Vettabase Ltd founder, and he's an expert database consultant specialised in the MariaDB and MySQL ecosystems.

Lascia un commento

Il tuo indirizzo email non sarà pubblicato.

*