Indexes are designed to speed up queries against a table. However, they come with a cost: every time a INSERT
/UPDATE
/DELETE
statement is executed against a table, its indexes must be updated. And since indexes are ordered data structures, updating them is more expensive than updating the table itself. Further more, indexes take space on disk. That’s why you should periodically check which indexes are not useful anymore, and drop them.
Finding useless indexes
So we want to delete useless indexes. But which indexes are useless? There are two types of them:
- Duplicate indexes;
- Unused indexes.
Duplicate indexes
In the most obvious case, a duplicate index is one that is identical to an existing index. But an index can also be duplicate if it contains a subset of the information contained in another index – or, more precisely, if it is the prefix of another index. We’ll talk about this in another article.
There is at least one tool to find such indexes: pt-duplicate-key-checker.
It finds duplicate indexes, and outputs a list of them, as well as the ALTER TABLE
s to drop them. It also outputs the index definitions, so you will easily find out why a certain index is a duplicate.
This tool is safe with respect to constraints. UNIQUE
indexes are not considered duplicates, unless they duplicate other UNIQUE
indexes. Foreign keys are considered duplicates only if they are identical.
Paranoid modes
pt-duplicate-key-checker is usually called without any arguments. For completeness, I will mention here some additional checks that could be done.
--all-structs
: Ignored the index types. An index can duplicate another even if they are of different types – for example, a FULLTEXT index and a regular BTREE. This is normally wrong, but there is an exception: a BTREE could duplicate a HASH index. If your database has improper HASH indexes, you may want to try this parameter. But then, you will have to manually check the output.- NOTE: InnoDB creates all indexes are BTREEs and uses a feature called Adaptive Hash Index to monitor how indexes are used, and internally decide if an index should be converted to HASH, or converted back to BTREE.
--ignore-order
: Normally, an index on (a, b) should not be considered as a duplicate of (b, a). The reason is that some queries can only be server by one of these indexes, as explained in the previous article. This parameter causes these indexes to be considered as duplicates. This is only correct if we compare both these columns with=
and don’t useGROUP BY
andORDER BY
on them. If you have this non-common use case (in other words, if your database is used as a pure key/value store) you may use--ignore-order
to spot duplicates, but you will have to check manually the output.
Unused indexes
It’s also useful to check for unused indexes. There are a couple of methods to find them. But it’s important to remember that they both will only find indexes that were not used after the last MariaDB/MySQL restart. If the last restart is too recent, do not rely on these methods! Remember that some indexes may be created to speed up jobs that only execute once a week, or once a month.
When eliminating indexes, follow this procedure:
- Eliminate duplicate indexes;
- Wait a certain amount of time (eg: 1 week?) because some previously unused indexes could be necessary now;
- Eliminate unused indexes.
The user_statistics method
If you use MariaDB or Percona Server, you have a plugin called user_statistics. It can also be installed in MySQL.
If you enable it, you will have a table called INDEX_STATISTICS
, in theinformation_schema
database. All indexes not present in that table were not used since the last restart of MySQL. To spot them you can use this query:
SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
FROM information_schema.STATISTICS st
LEFT JOIN information_schema.INDEX_STATISTICS idx
ON idx.INDEX_NAME = st.INDEX_NAME
AND idx.TABLE_NAME = st.TABLE_NAME
AND idx.TABLE_SCHEMA = st.TABLE_SCHEMA
WHERE
(idx.INDEX_NAME IS NULL OR idx.ROWS_READ = 0)
AND st.NON_UNIQUE = 1
ORDER BY 1, 2, 3
;
The performance_schema method
There is a less safe alternative. For that, you need to have the performance_schema
enabled. If it is disabled, you need to restart your server to enable it.
Unfortunately, the
performance_schema
is disabled by default in MariaDB since version 10.0.
With the performance_schema
enabled, you can run this query to find unused indexes:
SELECT p.OBJECT_SCHEMA, p.OBJECT_NAME, p.INDEX_NAME
FROM performance_schema.table_io_waits_summary_by_index_usage p
INNER JOIN information_schema.STATISTICS i
ON
p.OBJECT_SCHEMA = i.TABLE_SCHEMA
AND p.OBJECT_NAME = i.TABLE_NAME
AND p.INDEX_NAME = i.INDEX_NAME
WHERE
-- exclude PKs and UNIQUEs
i.NON_UNIQUE = 1
-- unused
AND p.count_star = 0
ORDER BY p.object_schema, p.object_name, p.index_name
;
Why is it less safe? Because it assumes that an index is only unused if it never caused an IO wait.
EDIT 16th June 2021: I modified the above query to exclude primary keys and UNIQUE
indexes.
Risks
Duplicate indexes are not necessarily unused. This is not a problem if another index can be used in place of the duplicate. But there is an exception. MySQL and MariaDB support the IGNORE/USE/FORCE INDEX
syntax, which mentions an index by name. If the index doesn’t exist, the query fails – even in the care of IGNORE INDEX
.
Dropping an unused index is safe. As mentioned, indexes found with the user_statistics method are surely unused. But indexes found with the performance_schema
method could theoretically be used.
To reduce these risks, MySQL 8 and MariaDB 10.6 introduced useful features: MySQL 8 invisible indexes and MariaDB 10.6 ignored indexes. You can make an index invisible in this way:
-- MariaDB
ALTER TABLE t ALTER INDEX idx_a IGNORED;
-- MySQL
ALTER TABLE t ALTER INDEX idx_a INVISIBLE;
If an index is invisible, MySQL will pretend that it doesn’t exist. Therefore, if it is mentioned by a query with IGNORE/USE/FORCE INDEX
, the query will return an error. And if it is used by a query, the query will become slow – or it will use another index. The advantage is that making an index invisible or visible is an instantaneous operation. Therefore, even if the table is very big, if a problem appears, we can make the index visible again in a moment.
It is important to drop invisible indexes once we’re sure they are not needed. An invisible index still needs to be kept up to date, which has an overhead. We don’t want to have an overhead for something that we don’t need.
See also
Ebooks
Related articles
- Dropping database objects in a safe way
- MariaDB Ignored Indexes
- Why Tables need a Primary Key in MariaDB and MySQL
Related courses
Conclusions
We saw how to find duplicate indexes, including some edge cases that are not reported by default by pt-duplicate-key-checker. Then we saw how to find unused indexes: the performance_schema
database should always be enabled, but for this particular task the user_statistics plugin is better. Finally, we discussed how to use invisible indexes to reduce risks.
To get an expert review of your database schema and configuration, consider our MariaDB Health Checks and MySQL Health Checks.
Federico Razzoli
0 Comments