Finding Duplicate Indexes and Unused Indexes in MariaDB/MySQL

Last updated on 16 June 2021

Indexes are designed to speed up queries against a table. However, they come with a cost: every time a INSERT/UPDATE/DELETE statement is executed against a table, its indexes must be updated. And since indexes are ordered data structures, updating them is more expensive than updating the table itself. Further more, indexes take space on disk. That’s why you should periodically check which indexes are not useful anymore, and drop them.

Finding Duplicate and Unused Indexes
Blixy is looking for duplicate and unused indexes in a MariaDB database!

Finding useless indexes

So we want to delete useless indexes. But which indexes are useless? There are two types of them:

  • Duplicate indexes;
  • Unused indexes.

Duplicate indexes

In the most obvious case, a duplicate index is one that is identical to an existing index. But an index can also be duplicate if it contains a subset of the information contained in another index – or, more precisely, if it is the prefix of another index. We’ll talk about this in another article.

There is at least one tool to find such indexes: pt-duplicate-key-checker.

It finds duplicate indexes, and outputs a list of them, as well as the ALTER TABLEs to drop them. It also outputs the index definitions, so you will easily find out why a certain index is a duplicate.

This tool is safe with respect to constraints. UNIQUE indexes are not considered duplicates, unless they duplicate other UNIQUE indexes. Foreign keys are considered duplicates only if they are identical.

Paranoid modes

pt-duplicate-key-checker is usually called without any arguments. For completeness, I will mention here some additional checks that could be done.

  • --all-structs: Ignored the index types. An index can duplicate another even if they are of different types – for example, a FULLTEXT index and a regular BTREE. This is normally wrong, but there is an exception: a BTREE could duplicate a HASH index. If your database has improper HASH indexes, you may want to try this parameter. But then, you will have to manually check the output.
    • NOTE: InnoDB creates all indexes are BTREEs and uses a feature called Adaptive Hash Index to monitor how indexes are used, and internally decide if an index should be converted to HASH, or converted back to BTREE.
  • --ignore-order: Normally, an index on (a, b) should not be considered as a duplicate of (b, a). The reason is that some queries can only be server by one of these indexes, as explained in the previous article. This parameter causes these indexes to be considered as duplicates. This is only correct if we compare both these columns with = and don’t use GROUP BY and ORDER BY on them. If you have this non-common use case (in other words, if your database is used as a pure key/value store) you may use --ignore-order to spot duplicates, but you will have to check manually the output.

Unused indexes

It’s also useful to check for unused indexes. There are a couple of methods to find them. But it’s important to remember that they both will only find indexes that were not used after the last MariaDB/MySQL restart. If the last restart is too recent, do not rely on these methods! Remember that some indexes may be created to speed up jobs that only execute once a week, or once a month.

When eliminating indexes, follow this procedure:

  1. Eliminate duplicate indexes;
  2. Wait a certain amount of time (eg: 1 week?) because some previously unused indexes could be necessary now;
  3. Eliminate unused indexes.

The user_statistics method

If you use MariaDB or Percona Server, you have a plugin called user_statistics. It can also be installed in MySQL.

If you enable it, you will have a table called INDEX_STATISTICS, in theinformation_schema database. All indexes not present in that table were not used since the last restart of MySQL. To spot them you can use this query:

SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
    FROM information_schema.STATISTICS st
    LEFT JOIN information_schema.INDEX_STATISTICS idx
        ON  idx.INDEX_NAME    = st.INDEX_NAME
        AND idx.TABLE_NAME    = st.TABLE_NAME
        AND idx.TABLE_SCHEMA  = st.TABLE_SCHEMA
    WHERE
        (idx.INDEX_NAME IS NULL OR idx.ROWS_READ = 0)
        AND st.NON_UNIQUE = 1
    ORDER BY 1, 2, 3
;

The performance_schema method

There is a less safe alternative. For that, you need to have the performance_schema enabled. If it is disabled, you need to restart your server to enable it.

Unfortunately, the performance_schema is disabled by default in MariaDB since version 10.0.

With the performance_schema enabled, you can run this query to find unused indexes:

SELECT p.OBJECT_SCHEMA, p.OBJECT_NAME, p.INDEX_NAME
    FROM performance_schema.table_io_waits_summary_by_index_usage p
    INNER JOIN information_schema.STATISTICS i
        ON
                p.OBJECT_SCHEMA = i.TABLE_SCHEMA
            AND p.OBJECT_NAME   = i.TABLE_NAME
            AND p.INDEX_NAME    = i.INDEX_NAME
    WHERE
        -- exclude PKs and UNIQUEs
            i.NON_UNIQUE = 1
        -- unused 
        AND p.count_star = 0
    ORDER BY p.object_schema, p.object_name, p.index_name
;

Why is it less safe? Because it assumes that an index is only unused if it never caused an IO wait.

EDIT 16th June 2021: I modified the above query to exclude primary keys and UNIQUE indexes.

Risks

Duplicate indexes are not necessarily unused. This is not a problem if another index can be used in place of the duplicate. But there is an exception. MySQL and MariaDB support the IGNORE/USE/FORCE INDEX syntax, which mentions an index by name. If the index doesn’t exist, the query fails – even in the care of IGNORE INDEX.

Dropping an unused index is safe. As mentioned, indexes found with the user_statistics method are surely unused. But indexes found with the performance_schema method could theoretically be used.

To reduce these risks, MySQL 8 and MariaDB 10.6 introduced useful features: MySQL 8 invisible indexes and MariaDB 10.6 ignored indexes. You can make an index invisible in this way:

-- MariaDB
ALTER TABLE t ALTER INDEX idx_a IGNORED;
-- MySQL
ALTER TABLE t ALTER INDEX idx_a INVISIBLE;

If an index is invisible, MySQL will pretend that it doesn’t exist. Therefore, if it is mentioned by a query with IGNORE/USE/FORCE INDEX, the query will return an error. And if it is used by a query, the query will become slow – or it will use another index. The advantage is that making an index invisible or visible is an instantaneous operation. Therefore, even if the table is very big, if a problem appears, we can make the index visible again in a moment.

It is important to drop invisible indexes once we’re sure they are not needed. An invisible index still needs to be kept up to date, which has an overhead. We don’t want to have an overhead for something that we don’t need.

See also

Ebooks

Related articles

Related courses

Conclusions

We saw how to find duplicate indexes, including some edge cases that are not reported by default by pt-duplicate-key-checker. Then we saw how to find unused indexes: the performance_schema database should always be enabled, but for this particular task the user_statistics plugin is better. Finally, we discussed how to use invisible indexes to reduce risks.

Listing useless indexes is part of my MariaDB Health Checks and MySQL Health Checks.

As usual my conclusion is: what do you think? If you have questions, disagree with something or have more ideas, please comment.

Federico Razzoli

About Federico Razzoli

Federico is Vettabase Ltd director, and he's an expert database consultant specialised in the MariaDB and MySQL ecosystems. Federico is also a free software supporter. He believes in information sharing and reciprocal help. He thinks that multiple organisations contributing the same repositoy and benefiting from each other's work are better than many companies re-inventing the wheel. He thinks this view is not just compatible with business, but it's today and tomorrow's way of doing business. Federico also loves beers from the North of Europe, wines from Italy, bagpipes, nature hiking and the history of computing.

Leave a Reply

Your email address will not be published. Required fields are marked *

*