Schedule Meeting

a

When indexes make SQL queries slower

by | Apr 15, 2025 | MariaDB, MySQL, PostgreSQL

Need Help?  Click Here for Expert Support

During our SQL optimisation training courses, I always stress the need of understanding how databases run SQL queries internally. Which is easier and more intuitive than you’d think, anyway. But if you overlook this aspect, there is always a risk of sticking to simple solutions, and consider them universal even if they aren’t. A typical example is… add an index on the columns used by your query. As I’ll show in this article, there are cases when adding an index can make some queries slower. It’s all about numbers, but no complex math is involved.

Quick reminder: what is an index?

An index can be defined on one or more columns. It can be a regular index, or a UNIQUE index, which means that it won’t accept duplicate values other than NULL.

Indexes are ordered data structures. Yes, you might know that they are B-Tree data structures, or a variation of B-Tree (usually B+Tree). But B+Tree is just an advanced ordered data structure: the lead nodes of the tree can be read from the leftmost to the rightmost to scan all values in ascending order. The other nodes can be seen as a map to find any specific value stored in a leaf node.

There are exceptions. Some databases support index types that are not ordered, the most common being the hash index type. Those indexes might be harmful too, but for different reasons. They’re outside of the scope of this article.

Index cardinality and selectivity

Index cardinality is the number of unique values contained in your index. A high cardinality means that there are many unique values, a low cardinality means that there are just a few unique values. Some examples:

  • For a primary key, the cardinality is the highest possible: it is the number of rows, because each row is unique.
  • For a UNIQUE index, it might be as high, or much lower: it’s the number of rows less NULL pseudo-values, because NULLs can be duplicated.
  • For a BOOLEAN NOT NULL column, it is 2, which is a very low number!

In the context of query optimisation, it’s commonly said that good indexes have high cardinality. That’s a simplification, and sometimes it’s completely wrong. What counts is how much selective an index is for the particular queries it serves. In other words, the important aspect is how many rows an index can exclude from the query plan. Let me show you a couple of examples.

High cardinality, low selectivity

If the cardinality is the highest possible, the selectivity for each query should also be excellent, right? Wrong. Because you might search for ranges that are not very selective. For example, suppose you have an index on a surname and name columns. Each combination is unique. But if you’re looking for all surnames starting with B, they might be a lot!

SELECT * FROM customer WHERE surname ILIKE 'B%';

Suppose we have an index on (surname, name), and its content is the following:

surnamename
AbbottJack
BabbageCharles
BakerColin
BakerTom
BensonRobert
BookPeter
BrewsterEmma
ConstableLisa
KeenanMartha
WindsorWinston

See? In this particular case, the above query will return more than half of the rows! You might say that this example is not very realistic, and you’re right. More realistic examples might include postcodes (if your customers are concentrated in a geographical area), country codes, product codes, and so on.

Low cardinality, high selectivity

Let’s see a different case, where cardinality is low but selectivity is high. How is it possible? Well, consider the case of a table with blog posts. The table has a deleted column, which is BOOLEAN NOT NULL, so only two values are admitted. But the number of deleted values is probably very low. So, the index will have terrible selectivity for a query like this:

SELECT * FROM blog_post WHERE deleted = FALSE;

But it will have great selectivity for the opposite query:

SELECT * FROM blog_post WHERE deleted = TRUE;

So, it’s important to look at index selectivity for practical queries, rather than cardinality.

Why is low selectivity so bad for query performance?

This is a great question. And once again, the correct answer is: sometimes it’s not!

Let me show you an example:

SELECT COUNT(*) FROM blog_post WHERE deleted = FALSE;

This query will have to read all the index entries where deleted = FALSE. Suppose they are tens of thousands. And suppose that only 20 rows have DELETED = TRUE. The index will exclude those 20 rows from the query execution. Over all, this is irrelevant. But it’s still the right thing to do or, at least, it won’t cause any harm.

But consider this query:

SELECT * FROM blog_post WHERE deleted = FALSE;

Do you see the difference? The first query could be executed just by reading the index. It had all the needed information. In other words, the index on deleted is a covering index for the first query. But this time, the database will have to read all the columns. But the index only contains one column. So, for each column that matches our WHERE condition, the database will have to do an additional search, to retrieve the row from the data. This increases the work to do a lot!

To be clear, normally a database will not use this index for this query. But it could. Maybe the statistics on indexes data distribution are obsolete or wrong, maybe the query planner has a nasty bug, maybe there are more complex reasons. Whatever the reason, it shouldn’t happen!

Another example:

SELECT * FROM blog_post WHERE (publication_date BETWEEN '2020-01-01' AND '2020-01-30') AND deleted = FALSE;
  • 20,000 rows in the table;
  • 19,980 rows with deleted = FALSE;
  • 100 rows for the requested time period;
  • 99 undeleted rows for the requested time period.

Now, how many rows will be read?

  • If no index is used (table scan), 20,000 rows are read.
  • If an index on publication_date is used, 100 rows are read. This is good enough!
    • For the sake of completeness: 1 row will be read even if this is not necessary. Again, I wouldn’t care.
  • If an index on delete alone is used, 19,980 entries are read from the index.
    • For each of them, the database will read the corresponding row to evaluate the condition on publication_date.
    • Only 99 rows match this condition and really needed to be read.
  • If an index on (publication_date, deleted) exists, only 99 index entries are read.
    • For each of them, the corresponding row will be read, too.

In practice, it is unlikely that the index is used in the case above. But even if it’s never used improperly (which is hard to verify), probably it shouldn’t exist. Remember: each index makes writes a bit slower, takes space in the buffer pool, and takes storage space.

An ASCII diagram from Claude

I asked Claude Sonnet 3.7 to draw a funny ASCII diagram to visually explain why using an index on deleted would be terrible for performance.

+----------------------------------------------+
|  QUERY: SELECT * FROM blog_post              |
|         WHERE deleted = 0                    |
|                                              |
|  (Translation: "Give me basically            |
|   everything except the trash, please")      |
+----------------------------------------------+
                     |
                     | "Let me check my handy index!"
                     v
+----------------------------------------------+
|     THE WORLD'S LEAST HELPFUL INDEX          |
|        (Secondary Index on deleted)          |
|----------------------------------------------|
| deleted: 0  | id: 1     <--- "Found one!"    |
|----------------------------------------------|
| deleted: 0  | id: 2     <--- "Another one!"  |
|----------------------------------------------|
| deleted: 0  | id: 3     <--- "And another!"  |
|----------------------------------------------|
| deleted: 0  | id: 4     <--- "Still going!"  |
|----------------------------------------------|
| ...         | ...       <--- "This may take  |
|             |                 a while..."    |
|----------------------------------------------|
| deleted: 0  | id: 998   <--- "Almost done!"  |
|----------------------------------------------|
| deleted: 0  | id: 999   <--- "Last one!"     |
|----------------------------------------------|
| deleted: 1  | id: 5     <--- "Nope, trash!"  |
+----------------------------------------------+
                     |
                     | "Now I'll look up EACH ONE individually!"
                     v
+----------------------------------------------+
|   PRIMARY INDEX (looking very exhausted)     |
|----------------------------------------------|
| id: 1 | title: 'Why databases hate me' | ... |
|----------------------------------------------|
| id: 2 | title: 'My index brings all    | ... |
|       |         the rows to the yard'  |     |
|----------------------------------------------|
| id: 3 | title: '99 problems but an     | ... |
|       |         efficient query        |     |
|       |         ain't one'             |     |
|----------------------------------------------|
|                    ...                       |
+----------------------------------------------+
                     |
         "After 999 separate lookups, I'm exhausted!"
                     |
                     v
+----------------------------------------------+
|            DATABASE PERFORMANCE              |
|----------------------------------------------|
|                                              |
|   TABLE SCAN:    |############      | Fast!  |
|                  |                   |        |
|   USING INDEX:   |################## | Slow!  |
|                  |################## |        |
|                  |################## |        |
|                  |################## | "Still |
|                  |################## |  going |
|                  |                   |  ..."  |
+----------------------------------------------+

+----------------------------------------------+
|        MYSQL OPTIMIZER'S INNER MONOLOGUE     |
+----------------------------------------------+
|                                              |
| "Let's see, I could:                         |
|                                              |
|  1. Read the whole table in one smooth pass  |
|                                              |
|  -OR-                                        |
|                                              |
|  2. Check this index, then jump back and     |
|     forth 999 times like I'm playing the     |
|     world's worst game of database pinball   |
|                                              |
|  Hmm, tough choice...                        |
|                                              |
|  *chooses option 2*                          |
|                                              |
|  Developer: *facepalm*"                      |
|                                              |
+----------------------------------------------+

No bad for something that is believed to be a “stochastic parrot”, isn’t it? Let me know what you think about this use of AI. Personally, I think that it’s ok to use it, as long as you remain in control and verify the quality of its output. Some other companies use it to write article from scratch… and they end up describing database features that just don’t exist! But this is a good story for another time.

Conclusions

A friend told me: You database people are so annoying. Whatever I ask you, you reply: it depends! Unfortunately, that’s partly true. It’s because databases are complex pieces of software in charge of complex tasks, so there aren’t many universal truths that you can just apply to your specific case. You need to understand how they work (to some extent!) and then apply that knowledge differently, from case to case.

Indexes are perfect examples. They are supposed to speed up queries, and usually they do it very well, leading to impressive optimisations. But not always, and actually sometimes they might damage your application horribly. To debug the problem, you’ll need to find out how indexes are implemented, and how the DBMS uses them. If you need help, you can use our training courses!

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

MariaDB SEQUENCE: a Simple Approach to Synthetic Data

MariaDB SEQUENCE: a Simple Approach to Synthetic Data

MariaDB SEQUENCE is a storage engine that generates a sequence of positive integer numbers. However, in this article I will show you that it's easy to use SEQUENCE to generate more complex sequences, that are not necessarily numeric. This is a very convenient way to...

Stored Procedures in MariaDB: Smarter, Easier and More Powerful

Stored Procedures in MariaDB: Smarter, Easier and More Powerful

MariaDB supports stored procedures written in procedural SQL. Which essentially means "SQL with IF's and loops". Most DBMSs do the same, but every project supports different, incompatible syntaxes. MariaDB implemented Oracle's dialect, called PL/SQL. The base of the...

The EXPLAIN command and its variants in MariaDB and MySQL

The EXPLAIN command and its variants in MariaDB and MySQL

For a very long time, the EXPLAIN command remained the same in the MariaDB and MySQL world. It was good enough in most cases, but the community welcomed the implementation of extensions and variants. However, MariaDB and MySQL improved it in different ways. So, if...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *