Schedule Meeting

a

Understanding tables usage with User Statistics on MariaDB and Percona Server

by | Aug 12, 2020 | MariaDB

Need Help?  Click Here for Expert Support

User Statistics is a plugin for MySQL developed by Percona. It is distributed and enabled by default in Percona Server and in MariaDB. It should be possible to compile it against MySQL, but we did not try.

In this article we’ll see a few simple, yet useful queries to analyse the way tables are used, which of them are used most – and therefore, on what we need to focus on to optimise the server’s workload.

An old photo of the Vajont Dam before the disaster, in the 1960
Vajont Dam, in Italy – a sad reminder of what can happen
if you fail to analyse your workload.
https://commons.wikimedia.org/wiki/File:Vajont_Dam_1960_panorama.jpg

Enabling and disabling User Statistics

To enable it, just run this SQL statement:

SET GLOBAL userstat := 1;

Set to 0 to disable. But its impact on performance is very low. And having more information about a server performance is generally more important than a small speed difference.

Use Cases

In this article, we want to investigate table usage. This is not necessarily about bad queries. For that purpose, we can use the slow log and tables from the information schema. Also, an older article shows how to find useless queries that are executed by a server.

Understanding tables usage, in this context, means:

  • Seeing which tables are read and written mostly. Then we can focus on optimising them and the queries that involve them.
  • Seeing which indexes are being used. Then we can check if the mostly used are as well-designed as they should be.
  • Find tables and indexes that are not used at all.

User Statistics creates some tables in the information_schema. For our purposes, we’ll use two of them:

  • TABLE_STATISTICS;
  • INDEX_STATISTICS.

These tables can also be monitored with some monitoring tool. PMM monitors TABLE_STATISTICS automatically, if User Statistics is enabled. This allows to:

  • Have a visual representation of how reads and writes are distributed among tables;
  • Easily find spikes in one table’s usage, that could match some server’s performance decrease;
  • Easily seeing if a table is being used a lot all of a sudden. When I was a DBA in companies where multiple microservices could access the same database servers, checking this graph after a performance degradation allowed me to immediately know which team did something wrong.
A measuring tape
Measure your tables
https://www.pxfuel.com/en/free-photo-xnote

Queries

See how the workload is distributed over engines and databases

Workload by storage engine (where workload means reads and writes):

SELECT
        t.ENGINE,
        COUNT(*) AS table_count,
        COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
        SUM(IFNULL(s.ROWS_READ, 0)) AS SUM_ROWS_READ,
        SUM(IFNULL(s.ROWS_CHANGED, 0)) AS SUM_ROWS_CHANGED,
        SUM(IFNULL(s.ROWS_CHANGED_X_INDEXES, 0)) AS SUM_ROWS_CHANGED_X_INDEXES
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.ENGINE
        WITH ROLLUP
;

Workload by database:

SELECT
        t.TABLE_SCHEMA,
        COUNT(*) AS table_count,
        COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
        SUM(IFNULL(s.ROWS_READ, 0)) AS SUM_ROWS_READ,
        SUM(IFNULL(s.ROWS_CHANGED, 0)) AS SUM_ROWS_CHANGED,
        SUM(IFNULL(s.ROWS_CHANGED_X_INDEXES, 0)) AS SUM_ROWS_CHANGED_X_INDEXES
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.TABLE_SCHEMA
        WITH ROLLUP
;

Workload by database and engine:

SELECT
        t.TABLE_SCHEMA, t.ENGINE,
        COUNT(*) AS table_count,
        COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
        SUM(s.ROWS_READ) AS SUM_ROWS_READ,
        SUM(s.ROWS_CHANGED) AS SUM_ROWS_CHANGED,
        SUM(s.ROWS_CHANGED_X_INDEXES) AS SUM_ROWS_CHANGED_X_INDEXES
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.TABLE_SCHEMA, t.ENGINE
        WITH ROLLUP
;

Workload by engine and database (same info, opposite approach):

SELECT
        t.ENGINE, t.TABLE_SCHEMA,
        COUNT(*) AS table_count,
        COUNT(s.ROWS_READ > 0 OR s.ROWS_CHANGED > 0) AS used_table_count,
        SUM(s.ROWS_READ) AS SUM_ROWS_READ,
        SUM(s.ROWS_CHANGED) AS SUM_ROWS_CHANGED,
        SUM(s.ROWS_CHANGED_X_INDEXES) AS SUM_ROWS_CHANGED_X_INDEXES
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.ENGINE, t.TABLE_SCHEMA
        WITH ROLLUP
;

See index usage

Rows read with an index vs. read with a tablescan:

SELECT
        t.TABLE_SCHEMA, t.TABLE_NAME,
        t.ROWS_READ AS TOTAL_ROWS_READ,
        SUM(i.ROWS_READ) AS INDEX_ROWS_READ,
        t.ROWS_READ - SUM(i.ROWS_READ) AS TABLESCAN_ROWS_READ
    FROM information_schema.TABLE_STATISTICS t
    LEFT JOIN information_schema.INDEX_STATISTICS i
        ON t.TABLE_SCHEMA = i.TABLE_SCHEMA AND t.TABLE_NAME = i.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
    GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME, t.ROWS_READ
;

Note: for some small tables it’s ok to have a tablescan (eg: application configuration). For others it’s not, but if you dump them you have tablescans. This query still helps you to find tables largely read with tablescans where this is not expected.

Index usage with index type added (this allows to see, for example, FULLTEXT indexes usage; this query also includes unused indexes, which are not included in INDEX_STATISTICS):

SELECT
        s.TABLE_SCHEMA, s.TABLE_NAME,
        s.INDEX_NAME, s.INDEX_TYPE,
        IFNULL(i.ROWS_READ, 0) AS ROWS_READ
    FROM information_schema.STATISTICS s
    LEFT JOIN information_schema.INDEX_STATISTICS i
        ON s.TABLE_SCHEMA = i.TABLE_SCHEMA
            AND s.TABLE_NAME = i.TABLE_NAME
            AND s.INDEX_NAME = i.INDEX_NAME
    WHERE s.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
;

Specifically list unused indexes:

SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
    FROM information_schema.STATISTICS st
    LEFT JOIN information_schema.INDEX_STATISTICS idx
        ON  idx.INDEX_NAME    = st.INDEX_NAME
        AND idx.TABLE_NAME    = st.TABLE_NAME
        AND idx.TABLE_SCHEMA  = st.TABLE_SCHEMA
    WHERE
        st.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
        AND idx.INDEX_NAME IS NULL
        AND st.NON_UNIQUE = 1
    ORDER BY st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
;

Compose the ALTER TABLEs to drop unused indexes:

SELECT st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
    FROM information_schema.STATISTICS st
    LEFT JOIN information_schema.INDEX_STATISTICS idx
        ON  idx.INDEX_NAME    = st.INDEX_NAME
        AND idx.TABLE_NAME    = st.TABLE_NAME
        AND idx.TABLE_SCHEMA  = st.TABLE_SCHEMA
    WHERE
        st.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
        AND idx.INDEX_NAME IS NULL
        AND st.NON_UNIQUE = 1
    ORDER BY st.TABLE_SCHEMA, st.TABLE_NAME, st.INDEX_NAME
;

Miscellaneous

Find unused tables (since last server restart or User Statistics activation):

SELECT
        t.TABLE_SCHEMA, t.TABLE_NAME, t.ENGINE
    FROM information_schema.TABLES t
    LEFT JOIN information_schema.TABLE_STATISTICS s
        ON t.TABLE_SCHEMA = s.TABLE_SCHEMA AND t.TABLE_NAME = s.TABLE_NAME
    WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
        AND ROWS_READ IS NULL AND ROWS_CHANGED IS NULL
;

See also

Conclusions

Here are the queries I occasionally use to get information from the User Statistics plugin. As a general rule, the performance_schema gives more in-depth information, but not all these data are available in it. User Statistics is also simpler to use. And it can be enabled at runtime.

These queries only concern the TABLE_STATISTICS and INDEX_STATISTICS tables. Other tables can also provide insightful information, and this can be a topic for one or more future articles, if there is interest.

Part of this information is included in my MySQL Health Checks, if User Statistics are enabled. If it is not enabled, the report includes some similar (but less specific) information obtained by querying the information_schema and performance_schema databases. This is meant to both give you a better understanding on your workload, and find some problems about your indexes that need to be fixed.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

Coming up next, a ColumnStore webinar

Coming up next, a ColumnStore webinar

The 27th is fast approaching but you can still signup to our live webinar where we will be exploring the benefits of time series data using MariaDB ColumnStore. We will be jumping into a live demo with some example data and queries simulating high volume time series...

The benefits of MariaDB ColumnStore

The benefits of MariaDB ColumnStore

Last week Richard announced our projects on MariaDB ColumnStore. Take a look at his great post, if you didn't already. Since then, I've got some questions from customers, colleagues and friends: why did you guys decide to robustly invest into ColumnStore, and offer...

Let’s go, MariaDB ColumnStore at Vettabase!

Let’s go, MariaDB ColumnStore at Vettabase!

I have been an avid user of Columnar based storage engines for as long as I have been a DBA. The advantage is instantly obvious once you and your teams start to write or convert queries which previously took tens of minutes, are now returning in fractions of a second....

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *