Schedule Meeting

a

Finding Duplicate Indexes and Unused Indexes in MariaDB/MySQL

by | Jun 8, 2021 | MariaDB, MySQL

Need Help?  Click Here for Expert Support

Indexes are designed to speed up queries against a table. However, they come with a cost: every time a INSERT/UPDATE/DELETE statement is executed against a table, its indexes must be updated. And since indexes are ordered data structures, updating them is more expensive than updating the table itself. Further more, indexes take space on disk. That’s why you should periodically check which indexes are not useful anymore, and drop them.

Finding Duplicate and Unused Indexes
Blixy is looking for duplicate and unused indexes in a MariaDB database!

Finding useless indexes

So we want to delete useless indexes. But which indexes are useless? There are two types of them:

  • Duplicate indexes;
  • Unused indexes.

Duplicate indexes

In the most obvious case, a duplicate index is one that is identical to an existing index. But an index can also be duplicate if it contains a subset of the information contained in another index – or, more precisely, if it is the prefix of another index. We’ll talk about this in another article.

There is at least one tool to find such indexes: pt-duplicate-key-checker.

It finds duplicate indexes, and outputs a list of them, as well as the ALTER TABLEs to drop them. It also outputs the index definitions, so you will easily find out why a certain index is a duplicate.

This tool is safe with respect to constraints. UNIQUE indexes are not considered duplicates, unless they duplicate other UNIQUE indexes. Foreign keys are considered duplicates only if they are identical.

Paranoid modes

pt-duplicate-key-checker is usually called without any arguments. For completeness, I will mention here some additional checks that could be done.

  • --all-structs: Ignored the index types. An index can duplicate another even if they are of different types – for example, a FULLTEXT index and a regular BTREE. This is normally wrong, but there is an exception: a BTREE could duplicate a HASH index. If your database has improper HASH indexes, you may want to try this parameter. But then, you will have to manually check the output.
    • NOTE: InnoDB creates all indexes are BTREEs and uses a feature called Adaptive Hash Index to monitor how indexes are used, and internally decide if an index should be converted to HASH, or converted back to BTREE.
  • --ignore-order: Normally, an index on (a, b) should not be considered as a duplicate of (b, a). The reason is that some queries can only be server by one of these indexes, as explained in the previous article. This parameter causes these indexes to be considered as duplicates. This is only correct if we compare both these columns with = and don’t use GROUP BY and ORDER BY on them. If you have this non-common use case (in other words, if your database is used as a pure key/value store) you may use --ignore-order to spot duplicates, but you will have to check manually the output.

Unused indexes

It’s also useful to check for unused indexes. There are a couple of methods to find them. But it’s important to remember that they both will only find indexes that were not used after the last MariaDB/MySQL restart. If the last restart is too recent, do not rely on these methods! Remember that some indexes may be created to speed up jobs that only execute once a week, or once a month.

When eliminating indexes, follow this procedure:

  1. Eliminate duplicate indexes;
  2. Wait a certain amount of time (eg: 1 week?) because some previously unused indexes could be necessary now;
  3. Eliminate unused indexes.

The user_statistics method

If you use MariaDB or Percona Server, you have a plugin called user_statistics. It can also be installed in MySQL.

If you enable it, you will have a table called INDEX_STATISTICS, in theinformation_schema database. All indexes not present in that table were not used since the last restart of MySQL. To spot them you can use this query:

SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
    FROM information_schema.STATISTICS st
    LEFT JOIN information_schema.INDEX_STATISTICS idx
        ON  idx.INDEX_NAME    = st.INDEX_NAME
        AND idx.TABLE_NAME    = st.TABLE_NAME
        AND idx.TABLE_SCHEMA  = st.TABLE_SCHEMA
    WHERE
        (idx.INDEX_NAME IS NULL OR idx.ROWS_READ = 0)
        AND st.NON_UNIQUE = 1
    ORDER BY 1, 2, 3
;

The performance_schema method

There is a less safe alternative. For that, you need to have the performance_schema enabled. If it is disabled, you need to restart your server to enable it.

Unfortunately, the performance_schema is disabled by default in MariaDB since version 10.0.

With the performance_schema enabled, you can run this query to find unused indexes:

SELECT p.OBJECT_SCHEMA, p.OBJECT_NAME, p.INDEX_NAME
    FROM performance_schema.table_io_waits_summary_by_index_usage p
    INNER JOIN information_schema.STATISTICS i
        ON
                p.OBJECT_SCHEMA = i.TABLE_SCHEMA
            AND p.OBJECT_NAME   = i.TABLE_NAME
            AND p.INDEX_NAME    = i.INDEX_NAME
    WHERE
        -- exclude PKs and UNIQUEs
            i.NON_UNIQUE = 1
        -- unused 
        AND p.count_star = 0
    ORDER BY p.object_schema, p.object_name, p.index_name
;

Why is it less safe? Because it assumes that an index is only unused if it never caused an IO wait.

EDIT 16th June 2021: I modified the above query to exclude primary keys and UNIQUE indexes.

Risks

Duplicate indexes are not necessarily unused. This is not a problem if another index can be used in place of the duplicate. But there is an exception. MySQL and MariaDB support the IGNORE/USE/FORCE INDEX syntax, which mentions an index by name. If the index doesn’t exist, the query fails – even in the care of IGNORE INDEX.

Dropping an unused index is safe. As mentioned, indexes found with the user_statistics method are surely unused. But indexes found with the performance_schema method could theoretically be used.

To reduce these risks, MySQL 8 and MariaDB 10.6 introduced useful features: MySQL 8 invisible indexes and MariaDB 10.6 ignored indexes. You can make an index invisible in this way:

-- MariaDB
ALTER TABLE t ALTER INDEX idx_a IGNORED;
-- MySQL
ALTER TABLE t ALTER INDEX idx_a INVISIBLE;

If an index is invisible, MySQL will pretend that it doesn’t exist. Therefore, if it is mentioned by a query with IGNORE/USE/FORCE INDEX, the query will return an error. And if it is used by a query, the query will become slow – or it will use another index. The advantage is that making an index invisible or visible is an instantaneous operation. Therefore, even if the table is very big, if a problem appears, we can make the index visible again in a moment.

It is important to drop invisible indexes once we’re sure they are not needed. An invisible index still needs to be kept up to date, which has an overhead. We don’t want to have an overhead for something that we don’t need.

See also

Ebooks

Related articles

Related courses

Conclusions

We saw how to find duplicate indexes, including some edge cases that are not reported by default by pt-duplicate-key-checker. Then we saw how to find unused indexes: the performance_schema database should always be enabled, but for this particular task the user_statistics plugin is better. Finally, we discussed how to use invisible indexes to reduce risks.

To get an expert review of your database schema and configuration, consider our MariaDB Health Checks and MySQL Health Checks.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

Enforcing strong passwords for MariaDB users

Enforcing strong passwords for MariaDB users

MariaDB users normally connect using a password. Weak passwords are a common security problem, especially when passwords are generated by humans. However, MariaDB comes with plugins that help validating passwords to make sure they are strong enough. This article is a...

Validating rows with CHECK constraints in MariaDB

Validating rows with CHECK constraints in MariaDB

Relational databases provide several ways to validate data. CHECK constraints are a powerful tool for in-database data validation. Their impact on performance is minimal, if any. In this article we'll discuss MariaDB support for CHECK constraints. Note that the CHECK...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *