In SQL, using the LIKE
operator is a powerful way to find strings that match a certain pattern. It’s suitable for most use cases, thanks to its two wildcard characters:
_
means any one character.%
means any sequence of zero or more characters.
However, many queries out there are slow because they use LIKE
operations that can’t take advantage of an index. This article discusses how to optimise a LIKE
operation.
Searching the beginning of a text
There are two ways to use an index for searching:
- To search for a single value (
WHERE id = 24
). - To search for a range of values (a closed range:
WHERE id BETWEEN 24 AND 42
, or an open range:WHERE id < 24
).
When we use LIKE
to search the beginning of a text, in practice we perform a range search. In fact, these two queries are equivalent:
SELECT ... FROM ... WHERE name LIKE 'B%';
SELECT ... FROM ... WHERE name >= 'B' AND name < 'C';
An index can often be used to find the beginning of a text with LIKE
.
Sometimes a LIKE
search checks the beginning of a text but contains more wildcard characters. For example: name LIKE 'A%B%C'
. They still use indexes, but keep reading to see if you can optimise them better.
Most queries are more complex than these examples, so they may combine several filters, an ORDER BY
and a GROUP BY
. To make sure that a complex query can use an index, you need to understand how indexes work.
Searching the middle of a text
A LIKE
operation that searches the middle of a text cannot use an index. But let’s see if you can still do something to optimise your query.
Search the beginning of the text when possible
This may look like a dumb hint, but it isn’t. In the real world, I’ve seen many queries performing a '%middle of text%'
search when they could have performed a 'beginning%of%text%'
search. Probably the developers didn’t take the time to verify if the beginning of text version was enough, or maybe they just made a mistake. Whatever the reason is, it’s a good idea to double-check this kind of searches.
Using the single-character wildcard
I have already mentioned that LIKE
has two wildcard characters: _ and %
.
Use _
or a sequence of _
rather than %
, whenever possible. This will reduce the amount of characters to search. Take the following examples:
SELECT ... FROM ... WHERE code LIKE '%x%';
SELECT ... FROM ... WHERE code LIKE '__x%';
Let’s suppose that our code has 20 characters. The first version of the query will search all of the characters. The second version will only search the third character. Even if both queries can’t use indexes, the second will cause less work, so it will be faster.
Merging searches
Sometimes you need to search for multiple strings in the middle of a text, but you know in which order they appear if they are present. You should combine them into one search to make a query faster.
Let’s take the following query as an example:
SELECT ... FROM ... WHERE code LIKE '%XX%' AND code LIKE '%YY%';
If you know that “XX” cannot appear after “YY”, you can optimise the query in this way:
SELECT ... FROM ... WHERE code LIKE '%XX%YY%';
This may be an optimisation because the search for “YY” will only be performed after the first appearance of “XX”, and won’t happen at all if “XX” is not present.
You can optimise the query even more if you know how many characters would separate the two strings, as an exact number or as a minimum:
-- exactly 3 characters in the middle:
SELECT ... FROM ... WHERE code LIKE '%XX__YY%';
-- at least 3 characters in the middle:
SELECT ... FROM ... WHERE code LIKE '%XX__%YY%';
SIMILAR TO
Some DBMSs, like PostgreSQL, support the SIMILAR TO
syntax, which is more flexible than LIKE
but less powerful than REGEXP
. Some argue that SIMILAR TO
is useless, but we’re going to use it here for simplicity – nothing prevents you from using a regular expression, if you prefer to do so.
SIMILAR TO
can occasionally be used to combine two LIKE
conditions. See the example below:
SELECT ... FROM ... WHERE code LIKE '%XX%' OR code LIKE '%YY%';
SELECT ... FROM ... WHERE code LIKE '%(XX|YY)%';
Flags
Sometimes we use LIKE
to search for text (probably codes) containing a certain flag. In this case, there is usually a small number of possible flags we may need to look for. This is very different from a LIKE
search based on whatever our application users may type.
If the number of important flags is small (I’d say 3, just to give you a number), you may consider creating boolean columns that are TRUE
if the flag is present, and FALSE
otherwise. You may use generated columns to enforce the boolean value correctness:
ALTER TABLE table_name
ADD COLUMN is_tkk BOOLEAN
AS (product_code LIKE '%tkk%')
STORED;
This expression will be calculated when a row is INSERT
ed or the product_code
column is UPDATE
d.
GIN indexes
I mentioned that the problem with LIKE
searches in any position of the text is that no index can be used, even if a suitable index exists. Actually, it’s only true for Btree indexes, because they are ordered data structures. Now, think about another ordered data structure familiar to everyone: a book index. You can quickly find the topic “Belgium”, or all topics starting with B, or with Be. But if you want to find all topics containing “giu”, you’ll have to search the whole index.
PostgreSQL supports many index types, and one of these types is GIN. An explanation of GIN indexes would be out of the scope of this article, but let’s just say that they are suitable for searches in the middle of a text.
CREATE INDEX idx_gin_code ON code USING gin (code);
Searching the end of a text
A LIKE
operation that searches the end of a text cannot use an index. However, there a workaround.
If we reverse both the text in the column and the LIKE
pattern, we’ll have a search at the beginning of the text. Here’s the example:
SELECT ... FROM ... WHERE code LIKE '%abc';
-- will become:
SELECT ... FROM ... WHERE code_rev LIKE REVERSE('%abc');
code_rev can either be populated by the application, or it can be a generated column. See the example below:
ALTER TABLE table_name
ADD COLUMN code_rev VARCHAR(200)
AS (REVERSE(code))
STORED,
ADD INDEX idx_code_rev (code_rev);
If the codes aren’t short, we should ask ourselves if we need to have the whole reversed text in code_rev
. To speed up writes and reduce the index size, we can reverse only the first N
characters instead like this: REVERSE(LEFT(code, 10))
.
Optimising regular expressions with LIKE
It’s worth mentioning that LIKE
can also be added to a query with REGEXP
, adding a redundant condition, to allow the DBMS to use an index. I called this technique the regexp + LIKE optimisation.
Conclusions
We discussed how to optimise a LIKE condition to make sure that it can take advantage of an index. To learn more about query optimisation, you may consider attending our MariaDB Query Optimisation or MySQL Query Optimisation training courses.
Federico Razzoli
0 Comments