Finding Duplicate Indexes and Unused Indexes in MariaDB/MySQL

Last updated on 8 July 2021

Indexes are designed to speed up queries against a table. However, they come with a cost: every time a INSERT / UPDATE / DELETE statement is executed against a table, its indexes must be updated. And since indexes are ordered data structures, updating them is more expensive than updating the table itself. Further more, indexes take space on disk. That’s why you should periodically check which indexes are not useful anymore, and drop them.

Finding Duplicate and Unused Indexes
Blixy is looking for duplicate and unused indexes in a MariaDB database!

Finding useless indexes

So we want to delete useless indexes. But which indexes are useless? There are two types of them:

  • Duplicate indexes;
  • Unused indexes.

Duplicate indexes

In the most obvious case, a duplicate index is one that is identical to an existing index. But an index can also be duplicate if it contains a subset of the information contained in another index – or, more precisely, if it is the prefix of another index. We’ll talk about this in another article.

There is at least one tool to find such indexes: pt-duplicate-key-checker.

It finds duplicate indexes, and outputs a list of them, as well as the  ALTER TABLE s to drop them. It also outputs the index definitions, so you will easily find out why a certain index is a duplicate.

This tool is safe with respect to constraints.  UNIQUE  indexes are not considered duplicates, unless they duplicate other UNIQUE indexes. Foreign keys are considered duplicates only if they are identical.

Paranoid modes

pt-duplicate-key-checker is usually called without any arguments. For completeness, I will mention here some additional checks that could be done.

  • --all-structs : Ignored the index types. An index can duplicate another even if they are of different types – for example, a FULLTEXT index and a regular BTREE. This is normally wrong, but there is an exception: a BTREE could duplicate a HASH index. If your database has improper HASH indexes, you may want to try this parameter. But then, you will have to manually check the output.
    • NOTE: InnoDB creates all indexes are BTREEs and uses a feature called Adaptive Hash Index to monitor how indexes are used, and internally decide if an index should be converted to HASH, or converted back to BTREE.
  • --ignore-order : Normally, an index on (a, b) should not be considered as a duplicate of (b, a). The reason is that some queries can only be server by one of these indexes, as explained in the previous article. This parameter causes these indexes to be considered as duplicates. This is only correct if we compare both these columns with  =  and don’t use  GROUP BY  and  ORDER BY  on them. If you have this non-common use case (in other words, if your database is used as a pure key/value store) you may use  --ignore-order  to spot duplicates, but you will have to check manually the output.

Unused indexes

It’s also useful to check for unused indexes. There are a couple of methods to find them. But it’s important to remember that they both will only find indexes that were not used after the last MariaDB/MySQL restart. If the last restart is too recent, do not rely on these methods! Remember that some indexes may be created to speed up jobs that only execute once a week, or once a month.

When eliminating indexes, follow this procedure:

  1. Eliminate duplicate indexes;
  2. Wait a certain amount of time (eg: 1 week?) because some previously unused indexes could be necessary now;
  3. Eliminate unused indexes.

The user_statistics method

If you use MariaDB or Percona Server, you have a plugin called user_statistics. It can also be installed in MySQL.

If you enable it, you will have a table called  INDEX_STATISTICS , in the information_schema  database. All indexes not present in that table were not used since the last restart of MySQL. To spot them you can use this query:

SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
FROM information_schema.STATISTICS st
LEFT JOIN information_schema.INDEX_STATISTICS idx
ON  idx.INDEX_NAME    = st.INDEX_NAME
AND idx.TABLE_NAME    = st.TABLE_NAME
AND idx.TABLE_SCHEMA  = st.TABLE_SCHEMA
WHERE
(idx.INDEX_NAME IS NULL OR idx.ROWS_READ = 0)
AND st.NON_UNIQUE = 1
ORDER BY 1, 2, 3
;

The performance_schema method

There is a less safe alternative. For that, you need to have the  performance_schema  enabled. If it is disabled, you need to restart your server to enable it.

Unfortunately, the  performance_schema  is disabled by default in MariaDB since version 10.0.

With the  performance_schema  enabled, you can run this query to find unused indexes:

SELECT p.OBJECT_SCHEMA, p.OBJECT_NAME, p.INDEX_NAME
FROM performance_schema.table_io_waits_summary_by_index_usage p
INNER JOIN information_schema.STATISTICS i
ON
p.OBJECT_SCHEMA = i.TABLE_SCHEMA
AND p.OBJECT_NAME   = i.TABLE_NAME
AND p.INDEX_NAME    = i.INDEX_NAME
WHERE
-- exclude PKs and UNIQUEs
i.NON_UNIQUE = 1
-- unused 
AND p.count_star = 0
ORDER BY p.object_schema, p.object_name, p.index_name
;

Why is it less safe? Because it assumes that an index is only unused if it never caused an IO wait.

EDIT 16th June 2021: I modified the above query to exclude primary keys and UNIQUE indexes.

Risks

Duplicate indexes are not necessarily unused. This is not a problem if another index can be used in place of the duplicate. But there is an exception. MySQL and MariaDB support the  IGNORE/USE/FORCE INDEX  syntax, which mentions an index by name. If the index doesn’t exist, the query fails – even in the care of  IGNORE INDEX .

Dropping an unused index is safe. As mentioned, indexes found with the user_statistics method are surely unused. But indexes found with the  performance_schema  method could theoretically be used.

To reduce these risks, MySQL 8 and MariaDB 10.6 introduced useful features: MySQL 8 invisible indexes and MariaDB 10.6 ignored indexes. You can make an index invisible in this way:

-- MariaDB
ALTER TABLE t ALTER INDEX idx_a IGNORED;
-- MySQL
ALTER TABLE t ALTER INDEX idx_a INVISIBLE;

If an index is invisible, MySQL will pretend that it doesn’t exist. Therefore, if it is mentioned by a query with  IGNORE/USE/FORCE INDEX , the query will return an error. And if it is used by a query, the query will become slow – or it will use another index. The advantage is that making an index invisible or visible is an instantaneous operation. Therefore, even if the table is very big, if a problem appears, we can make the index visible again in a moment.

It is important to drop invisible indexes once we’re sure they are not needed. An invisible index still needs to be kept up to date, which has an overhead. We don’t want to have an overhead for something that we don’t need.

See also

Ebooks

Related articles

Related courses

Conclusions

We saw how to find duplicate indexes, including some edge cases that are not reported by default by pt-duplicate-key-checker. Then we saw how to find unused indexes: the  performance_schema  database should always be enabled, but for this particular task the user_statistics plugin is better. Finally, we discussed how to use invisible indexes to reduce risks.

Listing useless indexes is part of my MariaDB Health Checks and MySQL Health Checks.

As usual my conclusion is: what do you think? If you have questions, disagree with something or have more ideas, please comment.

Federico Razzoli

About Federico Razzoli

Federico is Vettabase Ltd director, and he's an expert database consultant specialised in the MariaDB and MySQL ecosystems.

Leave a Reply

Your email address will not be published. Required fields are marked *

*