User Statistics is a plugin for MySQL developed by Percona. It is distributed and enabled by default in Percona Server and in MariaDB. It should be possible to compile it against MySQL, but we did not try.
In this article we’ll see a few simple, yet useful queries to analyse the way tables are used, which of them are used most – and therefore, on what we need to focus on to optimise the server’s workload.
Enabling and disabling User Statistics
To enable it, just run this SQL statement:
SET GLOBAL userstat := 1;
Set to 0 to disable. But its impact on performance is very low. And having more information about a server performance is generally more important than a small speed difference.
Use Cases
In this article, we want to investigate table usage. This is not necessarily about bad queries. For that purpose, we can use the slow log and tables from the information schema. Also, an older article shows how to find useless queries that are executed by a server.
Understanding tables usage, in this context, means:
- Seeing which tables are read and written mostly. Then we can focus on optimising them and the queries that involve them.
- Seeing which indexes are being used. Then we can check if the mostly used are as well-designed as they should be.
- Find tables and indexes that are not used at all.
User Statistics creates some tables in the information_schema
. For our purposes, we’ll use two of them:
TABLE_STATISTICS
;INDEX_STATISTICS
.
These tables can also be monitored with some monitoring tool. PMM monitors TABLE_STATISTICS
automatically, if User Statistics is enabled. This allows to:
- Have a visual representation of how reads and writes are distributed among tables;
- Easily find spikes in one table’s usage, that could match some server’s performance decrease;
- Easily seeing if a table is being used a lot all of a sudden. When I was a DBA in companies where multiple microservices could access the same database servers, checking this graph after a performance degradation allowed me to immediately know which team did something wrong.
Queries
See how the workload is distributed over engines and databases
Workload by storage engine (where workload means reads and writes):
SELECT
t.ENGINE,
COUNT(*) AS table_count,
COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
SUM(IFNULL(s.ROWS_READ, 0)) AS SUM_ROWS_READ,
SUM(IFNULL(s.ROWS_CHANGED, 0)) AS SUM_ROWS_CHANGED,
SUM(IFNULL(s.ROWS_CHANGED_X_INDEXES, 0)) AS SUM_ROWS_CHANGED_X_INDEXES
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_STATISTICS s
ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY t.ENGINE
WITH ROLLUP
;
Workload by database:
SELECT
t.TABLE_SCHEMA,
COUNT(*) AS table_count,
COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
SUM(IFNULL(s.ROWS_READ, 0)) AS SUM_ROWS_READ,
SUM(IFNULL(s.ROWS_CHANGED, 0)) AS SUM_ROWS_CHANGED,
SUM(IFNULL(s.ROWS_CHANGED_X_INDEXES, 0)) AS SUM_ROWS_CHANGED_X_INDEXES
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_STATISTICS s
ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY t.TABLE_SCHEMA
WITH ROLLUP
;
Workload by database and engine:
SELECT
t.TABLE_SCHEMA, t.ENGINE,
COUNT(*) AS table_count,
COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
SUM(s.ROWS_READ) AS SUM_ROWS_READ,
SUM(s.ROWS_CHANGED) AS SUM_ROWS_CHANGED,
SUM(s.ROWS_CHANGED_X_INDEXES) AS SUM_ROWS_CHANGED_X_INDEXES
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_STATISTICS s
ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY t.TABLE_SCHEMA, t.ENGINE
WITH ROLLUP
;
Workload by engine and database (same info, opposite approach):
SELECT
t.ENGINE, t.TABLE_SCHEMA,
COUNT(*) AS table_count,
COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
SUM(s.ROWS_READ) AS SUM_ROWS_READ,
SUM(s.ROWS_CHANGED) AS SUM_ROWS_CHANGED,
SUM(s.ROWS_CHANGED_X_INDEXES) AS SUM_ROWS_CHANGED_X_INDEXES
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_STATISTICS s
ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY t.ENGINE, t.TABLE_SCHEMA
WITH ROLLUP
;
See index usage
Rows read with an index vs. read with a tablescan:
SELECT
t.TABLE_SCHEMA, t.TABLE_NAME,
t.ROWS_READ AS TOTAL_ROWS_READ,
SUM(i.ROWS_READ) AS INDEX_ROWS_READ,
t.ROWS_READ - SUM(i.ROWS_READ) AS TABLESCAN_ROWS_READ
FROM information_schema.TABLE_STATISTICS t
LEFT JOIN information_schema.INDEX_STATISTICS i
ON t.TABLE_SCHEMA = i.TABLE_SCHEMA AND t.TABLE_NAME = i.TABLE_NAME
WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME, t.ROWS_READ
;
Note: for some small tables it’s ok to have a tablescan (eg: application configuration). For others it’s not, but if you dump them you have tablescans. This query still helps you to find tables largely read with tablescans where this is not expected.
Index usage with index type added (this allows to see, for example, FULLTEXT
indexes usage; this query also includes unused indexes, which are not included in INDEX_STATISTICS
):
SELECT
s.TABLE_SCHEMA, s.TABLE_NAME,
s.INDEX_NAME, s.INDEX_TYPE,
IFNULL(i.ROWS_READ, 0) AS ROWS_READ
FROM information_schema.STATISTICS s
LEFT JOIN information_schema.INDEX_STATISTICS i
ON s.TABLE_SCHEMA = i.TABLE_SCHEMA
AND s.TABLE_NAME = i.TABLE_NAME
AND s.INDEX_NAME = i.INDEX_NAME
WHERE s.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
;
Specifically list unused indexes:
SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
FROM information_schema.STATISTICS st
LEFT JOIN information_schema.INDEX_STATISTICS idx
ON idx.INDEX_NAME = st.INDEX_NAME
AND idx.TABLE_NAME = st.TABLE_NAME
AND idx.TABLE_SCHEMA = st.TABLE_SCHEMA
WHERE
st.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
AND idx.INDEX_NAME IS NULL
AND st.NON_UNIQUE = 1
ORDER BY st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
;
Compose the ALTER TABLE
s to drop unused indexes:
SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
FROM information_schema.STATISTICS st
LEFT JOIN information_schema.INDEX_STATISTICS idx
ON idx.INDEX_NAME = st.INDEX_NAME
AND idx.TABLE_NAME = st.TABLE_NAME
AND idx.TABLE_SCHEMA = st.TABLE_SCHEMA
WHERE
st.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
AND idx.INDEX_NAME IS NULL
AND st.NON_UNIQUE = 1
ORDER BY st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
;
Miscellaneous
Find unused tables (since last server restart or User Statistics activation):
SELECT
t.TABLE_SCHEMA, t.TABLE_NAME, t.ENGINE
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_STATISTICS s
ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
AND ROWS_READ IS NULL AND ROWS_CHANGED IS NULL
;
Conclusions
Here are the queries I occasionally use to get information from the User Statistics plugin. As a general rule, the performance_schema
gives more in-depth information, but not all these data are available in it. User Statistics is also simpler to use. And it can be enabled at runtime.
These queries only concern the TABLE_STATISTICS
and INDEX_STATISTICS
tables. Other tables can also provide insightful information, and this can be a topic for one or more future articles, if there is interest.
Our MySQL Health Checks include table usage analysis, if User Statistics are enabled. If it is not enabled, the report includes some similar (but less specific) information obtained by querying the information_schema
and performance_schema
databases. This is meant to both give you a better understanding on your workload, and find some problems about your indexes that need to be fixed.
Federico Razzoli
0 Comments