Indexes are designed to speed up queries against a table. However, they come with a cost: every time a
DELETE statement is executed against a table, its indexes must be updated. And since indexes are ordered data structures, updating them is more expensive than updating the table itself. Further more, indexes take space on disk. That’s why you should periodically check which indexes are not useful anymore, and drop them.
Finding useless indexes
So we want to delete useless indexes. But which indexes are useless? There are two types of them:
- Duplicate indexes;
- Unused indexes.
In the most obvious case, a duplicate index is one that is identical to an existing index. But an index can also be duplicate if it contains a subset of the information contained in another index – or, more precisely, if it is the prefix of another index. We’ll talk about this in another article.
There is at least one tool to find such indexes: pt-duplicate-key-checker.
It finds duplicate indexes, and outputs a list of them, as well as the
ALTER TABLEs to drop them. It also outputs the index definitions, so you will easily find out why a certain index is a duplicate.
This tool is safe with respect to constraints.
UNIQUE indexes are not considered duplicates, unless they duplicate other
UNIQUE indexes. Foreign keys are considered duplicates only if they are identical.
pt-duplicate-key-checker is usually called without any arguments. For completeness, I will mention here some additional checks that could be done.
--all-structs: Ignored the index types. An index can duplicate another even if they are of different types – for example, a FULLTEXT index and a regular BTREE. This is normally wrong, but there is an exception: a BTREE could duplicate a HASH index. If your database has improper HASH indexes, you may want to try this parameter. But then, you will have to manually check the output.
- NOTE: InnoDB creates all indexes are BTREEs and uses a feature called Adaptive Hash Index to monitor how indexes are used, and internally decide if an index should be converted to HASH, or converted back to BTREE.
--ignore-order: Normally, an index on (a, b) should not be considered as a duplicate of (b, a). The reason is that some queries can only be server by one of these indexes, as explained in the previous article. This parameter causes these indexes to be considered as duplicates. This is only correct if we compare both these columns with
=and don’t use
ORDER BYon them. If you have this non-common use case (in other words, if your database is used as a pure key/value store) you may use
--ignore-orderto spot duplicates, but you will have to check manually the output.
It’s also useful to check for unused indexes. There are a couple of methods to find them. But it’s important to remember that they both will only find indexes that were not used after the last MariaDB/MySQL restart. If the last restart is too recent, do not rely on these methods! Remember that some indexes may be created to speed up jobs that only execute once a week, or once a month.
When eliminating indexes, follow this procedure:
- Eliminate duplicate indexes;
- Wait a certain amount of time (eg: 1 week?) because some previously unused indexes could be necessary now;
- Eliminate unused indexes.
The user_statistics method
If you use MariaDB or Percona Server, you have a plugin called user_statistics. It can also be installed in MySQL.
If you enable it, you will have a table called
INDEX_STATISTICS, in the
information_schema database. All indexes not present in that table were not used since the last restart of MySQL. To spot them you can use this query:
SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME FROM information_schema.STATISTICS st LEFT JOIN information_schema.INDEX_STATISTICS idx ON idx.INDEX_NAME = st.INDEX_NAME AND idx.TABLE_NAME = st.TABLE_NAME AND idx.TABLE_SCHEMA = st.TABLE_SCHEMA WHERE (idx.INDEX_NAME IS NULL OR idx.ROWS_READ = 0) AND st.NON_UNIQUE = 1 ORDER BY 1, 2, 3 ;
The performance_schema method
There is a less safe alternative. For that, you need to have the
performance_schema enabled. If it is disabled, you need to restart your server to enable it.
performance_schemais disabled by default in MariaDB since version 10.0.
performance_schema enabled, you can run this query to find unused indexes:
SELECT p.OBJECT_SCHEMA, p.OBJECT_NAME, p.INDEX_NAME FROM performance_schema.table_io_waits_summary_by_index_usage p INNER JOIN information_schema.STATISTICS i ON p.OBJECT_SCHEMA = i.TABLE_SCHEMA AND p.OBJECT_NAME = i.TABLE_NAME AND p.INDEX_NAME = i.INDEX_NAME WHERE -- exclude PKs and UNIQUEs i.NON_UNIQUE = 1 -- unused AND p.count_star = 0 ORDER BY p.object_schema, p.object_name, p.index_name ;
Why is it less safe? Because it assumes that an index is only unused if it never caused an IO wait.
EDIT 16th June 2021: I modified the above query to exclude primary keys and
Duplicate indexes are not necessarily unused. This is not a problem if another index can be used in place of the duplicate. But there is an exception. MySQL and MariaDB support the
IGNORE/USE/FORCE INDEX syntax, which mentions an index by name. If the index doesn’t exist, the query fails – even in the care of
Dropping an unused index is safe. As mentioned, indexes found with the user_statistics method are surely unused. But indexes found with the
performance_schema method could theoretically be used.
To reduce these risks, MySQL 8 and MariaDB 10.6 introduced useful features: MySQL 8 invisible indexes and MariaDB 10.6 ignored indexes. You can make an index invisible in this way:
-- MariaDB ALTER TABLE t ALTER INDEX idx_a IGNORED; -- MySQL ALTER TABLE t ALTER INDEX idx_a INVISIBLE;
If an index is invisible, MySQL will pretend that it doesn’t exist. Therefore, if it is mentioned by a query with
IGNORE/USE/FORCE INDEX, the query will return an error. And if it is used by a query, the query will become slow – or it will use another index. The advantage is that making an index invisible or visible is an instantaneous operation. Therefore, even if the table is very big, if a problem appears, we can make the index visible again in a moment.
It is important to drop invisible indexes once we’re sure they are not needed. An invisible index still needs to be kept up to date, which has an overhead. We don’t want to have an overhead for something that we don’t need.
- Dropping database objects in a safe way
- MariaDB Ignored Indexes
- Why Tables need a Primary Key in MariaDB and MySQL
We saw how to find duplicate indexes, including some edge cases that are not reported by default by pt-duplicate-key-checker. Then we saw how to find unused indexes: the
performance_schema database should always be enabled, but for this particular task the user_statistics plugin is better. Finally, we discussed how to use invisible indexes to reduce risks.