Last updated on 17 September 2022
In SQL, using the
LIKE operator is a powerful way to find strings that match a certain pattern. It’s suitable for most use cases, thanks to its two jolly characters:
_means any one character.
%means any sequence of zero or more characters.
However, many queries out there are slow because they use
LIKE operations that can’t take advantage of an index. This article discusses how to optimise a
Searching the beginning of a text
There are two ways to use an index for searching:
- To search for a single value (
WHERE id = 24).
- To search for a range of values (a closed range:
WHERE id BETWEEN 24 AND 42, or an open range:
WHERE id < 24).
When we use
LIKE to search the beginning of a text, in practice we perform a range search. In fact, these two queries are equivalent:
SELECT ... FROM ... WHERE name LIKE 'B%'; SELECT ... FROM ... WHERE name >= 'B' AND name < 'C';
An index can often be used to find the beginning of a text with
LIKE search checks the beginning of a text but contains more jolly characters. For example:
name LIKE 'A%B%C'. They still use indexes, but keep reading to see if you can optimise them better.
Most queries are more complex than these examples, so they may combine several filters, an
ORDER BY and a
GROUP BY. To make sure that a complex query can use an index, you need to understand how indexes work.
Searching the middle of a text
LIKE operation that searches the middle of a text cannot use an index. But let’s see if you can still do something to optimise your query.
Search the beginning of the text when possible
This may look like a dumb hint, but it isn’t. In the real world, I’ve seen many queries performing a
'%middle of text%' search when they could have performed a
'beginning%of%text%' search. Probably the developers didn’t take the time to verify if the beginning of text version was enough, or maybe they just made a mistake. Whatever the reason is, it’s a good idea to double-check this kind of searches.
Using the single-character jolly
I have already mentioned that
LIKE has two jolly characters: _ and
_ or a sequence of
_ rather than
%, whenever possible. This will reduce the amount of characters to search. Take the following examples:
SELECT ... FROM ... WHERE code LIKE '%x%'; SELECT ... FROM ... WHERE code LIKE '__x%';
Let’s suppose that our code has 20 characters. The first version of the query will search all of the characters. The second version will only search the third character. Even if both queries can’t use indexes, the second will cause less work, so it will be faster.
Sometimes you need to search for multiple strings in the middle of a text, but you know in which order they appear if they are present. You should combine them into one search to make a query faster.
Let’s take the following query as an example:
SELECT ... FROM ... WHERE code LIKE '%XX%' AND code LIKE '%YY%';
If you know that “XX” cannot appear after “YY”, you can optimise the query in this way:
SELECT ... FROM ... WHERE code LIKE '%XX%YY%';
This may be an optimisation because the search for “YY” will only be performed after the first appearance of “XX”, and won’t happen at all if “XX” is not present.
You can optimise the query even more if you know how many characters would separate the two strings, as an exact number or as a minimum:
-- exactly 3 characters in the middle: SELECT ... FROM ... WHERE code LIKE '%XX__YY%'; -- at least 3 characters in the middle: SELECT ... FROM ... WHERE code LIKE '%XX__%YY%';
Some DBMSs, like PostgreSQL, support the
SIMILAR TO syntax, which is more flexible than
LIKE but less powerful than
REGEXP. Some argue that
SIMILAR TO is useless, but we’re going to use it here for simplicity – nothing prevents you from using a regular expression, if you prefer to do so.
SIMILAR TO can occasionally be used to combine two
LIKE conditions. See the example below:
SELECT ... FROM ... WHERE code LIKE '%XX%' OR code LIKE '%YY%'; SELECT ... FROM ... WHERE code LIKE '%(XX|YY)%';
Sometimes we use
LIKE to search for text (probably codes) containing a certain flag. In this case, there is usually a small number of possible flags we may need to look for. This is very different from a
LIKE search based on whatever our application users may type.
If the number of important flags is small (I’d say 3, just to give you a number), you may consider creating boolean columns that are
TRUE if the flag is present, and
FALSE otherwise. You may use generated columns to enforce the boolean value correctness:
ALTER TABLE table_name ADD COLUMN is_tkk BOOLEAN AS (product_code LIKE '%tkk%') STORED;
This expression will be calculated when a row is
INSERTed or the
product_code column is
I mentioned that the problem with
LIKE searches in any position of the text is that no index can be used, even if a suitable index exists. Actually, it’s only true for Btree indexes, because they are ordered data structures. Now, think about another ordered data structure familiar to everyone: a book index. You can quickly find the topic “Belgium”, or all topics starting with B, or with Be. But if you want to find all topics containing “giu”, you’ll have to search the whole index.
PostgreSQL supports many index types, and one of these types is GIN. An explanation of GIN indexes would be out of the scope of this article, but let’s just say that they are suitable for searches in the middle of a text.
CREATE INDEX idx_gin_code ON code USING gin (code);
Searching the end of a text
LIKE operation that searches the end of a text cannot use an index. However, there a workaround.
If we reverse both the text in the column and the
LIKE pattern, we’ll have a search at the beginning of the text. Here’s the example:
SELECT ... FROM ... WHERE code LIKE '%abc'; -- will become: SELECT ... FROM ... WHERE code_rev LIKE REVERSE('%abc');
code_rev can either be populated by the application, or it can be a generated column. See the example below:
ALTER TABLE table_name ADD COLUMN code_rev VARCHAR(200) AS (REVERSE(code)) STORED, ADD INDEX idx_code_rev (code_rev);
If the codes aren’t short, we should ask ourselves if we need to have the whole reversed text in
code_rev. To speed up writes and reduce the index size, we can reverse only the first
N characters instead like this:
Optimising regular expressions with LIKE
It’s worth mentioning that
LIKE can also be added to a query with
REGEXP, adding a redundant condition, to allow the DBMS to use an index. I called this technique the regexp + LIKE optimisation.
We discussed how to optimise a LIKE condition to make sure that it can take advantage of an index. To learn more about query optimisation, you may consider attending our MariaDB Query Optimisation or MySQL Query Optimisation training courses.