Schedule Meeting

a

Query Optimisation: Using indexes for WHERE with multiple conditions

by | Jun 5, 2025 | PostgreSQL, MariaDB, MySQL

Need Help?  Click Here for Expert Support

You have a beautiful application, but a page is slow. After some investigation, you find out that a query is unexpectedly slow, ruining user experience and causing frustration for you. It is a simple query with a WHERE clause and nothing else. You tried to build an index with all the columns mentioned in the query, but nothing changed. The EXPLAIN command shows that the index is not used, which seems to you inexplicable. Why is this happening? This article will probably help you find out the reason!

Query optimisation is not rocket science, but it is not always intuitive. There are some principles that one should know, concerning for example the order of columns in an index, how the SQL operators can be mixed, and more. Here I’ll explain how to write a complex (ish) WHERE clause that uses the right index.

Index Access Types

An index is an ordered data structure. Keep this in mind, because it’s the key to understand almost all query optimisation techniques. Yes, an index is usually a tree (at least in traditional databases or storage engines), but what you need to know for now is that it’s ordered.

That being said, it is intuitive that a database engine has three ways to access an index:

  • Point search;
  • Range scan;
  • Index scan.

A point search is essentially what you do when you look for a word in a dictionary (unless it’s an online dictionary!). You open it, you read a word, you decide whether your search should continue backward or forward, and you repeat the operation until you search what you’re looking for. Or, in less common cases, until you find out that the word you’re looking for is not in the dictionary. In computer science, the binary search algorithm does this exact operation. Just in a more precise and mathematically defined way.

A point search in an index is performed, for example, when we run a query like this:

SELECT id FROM starship WHERE serial = 'NCC-1701';

A range scan starts with a point search. Then, the next index entries are read until the end of the range is found. A range scan can be performed in any direction: backward or forward.

For example, suppose we have a table with an index on the (surname, name) columns, in this order. A range scan on an index is performed to run a query like this:

SELECT id FROM person WHERE surname = 'Baker' AND name BETWEEN 'Colin' AND 'Tom';

The database will look for Colin Baker in the index (or for the first entry where name > 'Colin'), it will scan the index forward, and it will stop when it finds Tom Baker (or the first entry where name > 'Tom'). A backward scan from Tom to Colin might also be used. But in some databases a backward scan is slightly slower than a forward scan, so it tends to only happen with ORDER BY surname DESC, name DESC.

A range can also be open-ended:

SELECT id FROM person WHERE surname < 'Coltrane';
SELECT id FROM person WHERE surname > 'Davis';

This doesn’t change much. They’re usually performed as a scan from the beginning of the index to 'Coltrane', and from 'Davis' to the end of the index. The order can be reversed. In InnoDB indexes, the beginning and the end of an index are physical records called Infimum and Supremum.

An index scan is the least optimal way to use an index. It often means that all rows are read. Reading the rows from the index instead of reading them from a table is a micro-optimisation that will hardly make a query fast. There are, however, two important exceptions:

  • Maybe the index needs to be completely scanned, but it still helps because rows need be read in a certain order. This happens with ORDER BY and GROUP BY.
  • If a LIMIT clause or equivalent syntax is used, only a certain number of rows are read. For example, with ORDER BY id LIMIT 20, you’ll never read more than 20 rows. However, ORDER BY id LIMIT 20 OFFSET 1000 will read 1020 rows.

Order of Columns

I can’t stress enough that an index is an ordered data structure. As a consequence, the order of its columns matter. An index on (surname, name) is ordered by surname, and rows with the same surname are ordered by name. An index on (name, surname) is exactly the opposite. Sometimes it makes sense to have both, as they can serve different queries.

But why does this matter? Well, consider the following queries.

SELECT id FROM author WHERE surname = 'Asimov';

This query can benefit from an index that starts by surname. It can’t benefit from an index that starts by any other column, even if it contains the surname column, because 'Asimov' might be anywhere in the index.

SELECT id FROM author WHERE surname = 'Asimov' AND name = 'Isaac';

This query can benefit from indexes that starts with (surname, name, ...), as well as indexes that start with (name, surname, ...).

SELECT id FROM person WHERE surname = 'Parker' AND (name IN ('Charlie', 'Peter') OR name IS NULL);

This query can benefit from the same indexes as the previous one.

But let’s complicate things a little bit. Suppose we have an gender column, and an index on (surname, name, gender). Can this index be used to speed up the following query?

SELECT id FROM person WHERE surname = 'Izzard' AND gender = 'F';

Yes and no. The index will be used to find entries that match the first condition: WHERE surname = 'Izzard'. But it won’t be used for the second condition. For each index entry that matches the first condition, the database engine will have to do a lookup in the table, to check the value of the gender column.

Why? Because the index is on (surname, name, gender). Both surname and gender should theoretically be used, but there is a “hole” between them, which makes it impossible to use gender. I suggest you remember this rule of thumb:

A database engine can use a whole index, or its leftmost column(s).

Back to our example, some rows that don’t match the WHERE clause will be read from the table, but not all of them. Will this damage the query performance remarkably? It depends on data distribution. If the non-matching rows that are read from the table are many, then yes, the query might be unnecessarily slow and we should build an index on (surname, gender) to fix the problem. But if only a few non-matching rows will be read from the table, the negative effect on performance might be irrelevant, or even impossible to measure. In this case, don’t build an index, as it’s better avoid having too many indexes on a table.

Covering Indexes

I should mention here covering and non-covering indexes. An index is considered a covering index for a certain query if the query can be performed by reading exclusively the index.

Suppose we have this query:

SELECT name, surname FROM person WHERE surname = 'Picard';

If we have an index on (surname, name), that is a covering index for this query. The following indexes are not covering indexes for this query, because the query can’t use them, or it can use them but it also needs to read the table:

  • (email);
  • (surname);
  • (name, surname).

What if we have an index on (surname, gender, name)? Yes, it’s still a covering index! Because the surname column appears first, and name is contained in the index. There is a hole between them, but it doesn’t matter, because name doesn’t need to be used for filtering or sorting, it only needs to be returned to the client.

Covering Indexes and Primary Keys

Consider this query:

SELECT surname, id FROM person WHERE surname = 'Picard';

You might think that an index on surname only is not a covering index, because the query also needs to read id. But id is the primary key, so the index might be covering or not, depending which technology we’re using.

With some databases and storage engines, like InnoDB, the primary key columns are appended to each secondary index. The table itself starts with the primary key columns and is ordered by the primary key. Primary key values are a reference used to match a secondary index entries to the table rows.

With most technologies, this is not the case. Some databases allow us to cluster a table, meaning that the table is ordered by the primary key (or any other index specified by the user). But this functionality usually has important limitations. For example, PostgreSQL allows to change the table order with the CLUSTER command, but the order won’t be maintained over time, and the clustering index values won’t be appended to other indexes. If you use another database engine, consult its documentation to know whether tables are organised by primary keys and whether some form of clustered indexes are supported.

What’s wrong with non-covering indexes?

A query is faster if it uses a covering index. But this is usually not so important for query optimisation. However, remember:

In some edge cases, non-covering index can damage a query speed instead of improving it.

The reason was explained in this article: When indexes make SQL queries slower. I won’t repeat the explanation here, so please check that article.

Logical Operators

We’ve already seen some queries with the AND operator:

SELECT id FROM author WHERE surname = 'Asimov' AND name = 'Isaac';
SELECT id FROM person WHERE surname = 'Parker' AND (name IN ('Charlie', 'Peter') OR name IS NULL);

AND is database-friendly, because:

  • It’s possible to use an index for all the conditions concatenated by AND;
  • If only some conditions concatenated by AND can use an index, the query is at least partly optimised.

OR is also friendly when it concatenates conditions on the same columns. Consider these examples:

SELECT id FROM author WHERE surname = 'Baker' OR surname = 'Parker';
SELECT id FROM author WHERE (surname = 'Baker' AND name = 'Tom') OR (surname = 'Parker' AND name = 'Charlie');

If we have an index on (surname, name) both these queries will result into two searches instead of one, but I expect them to be fast.

A similar case:

SELECT id FROM person WHERE surname = 'Baker' AND name IN 'Colin', 'Tom';

IN is just a fancy syntax for OR: we might also write it as name = 'Colin' OR name = 'Tom'. This query is still optimal: first the engine will search for the Baker’s, then, in this range, it will look for the Colin’s and the Tom’s.

However, OR is much less database-friendly when it concatenates conditions on different columns:

SELECT id FROM person WHERE name = 'James' OR surname = 'Kirk';

It’s not possible to run this query using a single index. Even if we have an index on (surname, name), it will be ignored. Sure, the engine might look for 'Kirk' using an index, but how would this help? It would still need to check all other rows to see if the name is 'James'.

However, if we have two different indexes that start with surname and name, and if those conditions are expected to be selective enough, some engines will use them. Two different searches will be made, and the results will be merged in memory.

Some engines, however, might decide not to do that. Why? Because that query plan consists in running two different complete searches. Running a full table scan might be faster. Generally, there is a way to force the engine to run the search using indexes:

(
    SELECT id FROM person WHERE name = 'James'
) UNION (
    SELECT id FROM person WHERE surname = 'Kirk'
);

Note that this query is equivalent to UNION DISTINCT, meaning that the duplicates are removed. Rows where name = 'James' AND surname = 'Kirk' will only be returned once. To remove duplicates, the DBMS will have to use an in-memory (or, in some cases, on-disk) temporary table. This has a performance cost, which can be high if many results are returned. If we don’t expect many duplicates and we can handle them on the application side, we can use UNION ALL, which won’t remove duplicates.

In both cases, we should also ask ourselves if it’s better to force the database to run two different searches, or if we should leave it free to run a full table scan. The answer, once again, depends on data distribution. But ideally, it would be better to implement an application logic that doesn’t require these types of queries.

Mixing Point Searches and Range Scans

Based on what we discussed above, the = operator is for point searches, and operators like < or <= are for range searches. Conditions that use these operators can be combined and still be able to take advantage from an index, but with some limitations.

We can combine any number of point searches, as long as we follow the rules above:

SELECT id FROM person WHERE surname = 'Baker' AND name = 'Tom' AND gender = 'M';

This query can fully benefit from an index that starts with surname, name and gender (not necessarily in this order).

But if we add a range search, things become slightly more complicated:

SELECT id FROM person WHERE surname = 'Baker' AND name > 'T';

This query would be able to use an index on (surname, name): it will find the first index entry with surname = 'Baker' AND name > 'T', and it will scan the next entries until the next surname.

But the query wouldn’t benefit from an index on (name, surname), not even partially. Sure, theoretically it could use it to find the names > 'T', but we only need the Baker’s, which would be randomly scattered across the whole index. So, no, that index is not useful in this case. If the database engine decides to use it (which is unlikely), the index will probably make the query slower, not faster.

The rule is the following:

We can check any number of conditions on the leftmost column of an index. But if we have a range condition on a column, the next column in the index cannot be used.

So, if we have an index on the columns (a, b, c), the following queries can fully benefit from it:

SELECT ... WHERE a = 1 AND b = 2 AND c = 3;
SELECT ... WHERE a = 1 AND b = 2 AND c > 3;
SELECT ... WHERE a = 1 AND b = 2 AND c = 3 AND d = 4; -- but for matching rows the table needs be checked
SELECT ... WHERE a = 1 AND b > 2;
SELECT ... WHERE a < 1;

But the following queries, the index can only partially be used:

-- only columns a, b
SELECT ... WHERE a = 1 AND b > 2 AND c = 3;
-- only column a
SELECT ... WHERE a > 1 AND b = 2 AND c = 3;
-- only column a
SELECT ... WHERE a > 1 AND b > 2;

Once again: will this make the query horribly slow? It depends on data distribution. The query will likely limit the number of rows that need be read from the table. If the number of such rows is still big, the query will be slow. If the number of rows to be searched for in the table is small, the query should be fast.

Functions and Expressions

Functions and expressions generally prevent index usage, even when theoretically they shouldn’t.

Consider these examples:

SELECT id FROM person WHERE LEFT(surname, 3) = 'Bak';
SELECT id FROM post WHERE DATE(pub_date) = '2025-01-01';
SELECT id FROM post WHERE YEAR(pub_date) = '2025';

They should theoretically use indexes because:

  • An index can be used to look for values starting by 'Bak';
  • DATE(pub_date) is the same as pub_date, provided that it’s a column of type DATE;
  • YEAR(pub_date) = '2025' is a range search.

But the database engine’s component that elaborate a strategy to run the queries, the query planner / query optimiser, does not know anything about what functions return, and does not evaluate expressions. A function is just a black box for the optimiser.

But the above queries can easily be rewritten without using functions:

SELECT id FROM person WHERE surname LIKE 'Bak%';
SELECT id FROM post WHERE pub_date = '2025-01-01';
SELECT id FROM post WHERE pub_date BETWEEN '2025-01-01' AND '2025-12-31';

In some cases, it’s not possible to get rid of functions in the SQL query. In such cases, you can still create a generated column with an index on it. This will be the topic of a future article.

Note that functions and expressions are only harmful when they wrap a column name. A function won’t damage query performance in cases like these:

SELECT ... WHERE date = NOW();
SELECT ... WHERE surname LIKE LEFT('Baker', 3);
SELECT ... WHERE id > RAND() * 100 ORDER BY id LIMIT 1;

Quick Do’s and Dont’s

I really suggest you read the whole article to get a better understanding of indexes and the causes of slowness of a WHERE clause. But it’s good to summarise the main practices the should be followed or avoided by software engineers:

  • Build indexes that can be used by as many queries as possible.
  • If you have a slow query, try to build an index that starts with columns that are mentioned in the WHERE clause.
  • Try to avoid ORs on different columns.
  • Try to avoid having more than one range condition, and aim to have index that start with columns on which you have equality conditions.
  • Because of the two points above, it’s often impossible to use an index for every condition in the WHERE clause. Try to use an index for the most selective conditions, so that only a few rows need to be read from the table.
  • Avoid passing a column to a function in the WHERE clause, if you can.

Conclusions

Hopefully, now you have a better understanding of what indexes are, and what they can (or cannot) be used for. While this article is about the WHERE clause only, and while it’s just a brief summary of the factors that affect a WHERE clause speed, we discussed topics that should help you optimise the majority of your queries.

You might still have doubts, problems that you don’t know how to solve, or special cases where it’s not clear why an index is not used. Remember that we’re here to help: talk to your boss, and suggest purchasing our services.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *