A summary of MariaDB 10.8: key performance improvements

MariaDB 10.8 was released in May 2022 and I’m writing a review, as I did with MariaDB 10.6. I plan to keep doing this with every major release. Bear in mind that my reviews reflect my personal opinions.

10.8 is a short-term release, which means that the support expires after 1 year. It’s good to experiment with the new features, but shouldn’t be used in production. However, Vettabase supports short-term releases.

Most important changes in MariaDB 10.8

In my opinion, the features I’m describing here are the most important changes in MariaDB 10.8. I’ll discuss them in order of importance.

Lag free ALTER TABLE in replication

Task: MDEV-11675.

With older MariaDB versions and all MySQL versions, replicas can’t start replicating an ALTER TABLE until it has successfully completed on the master. This means that a long running ALTER TABLE on the master will normally be a bottleneck for replication, blocking other data changes from being applied until the master completes the operation.

With this change, replicas don’t wait for ALTER TABLE completion on the master. This is the workflow applied starting from version 10.8:

  1. The master receives an ALTER TABLE statement.
  2. The master logs the statement and starts to run it.
  3. The statement is received by the replicas, that start to run it without unnecessary delays (unless replication is already lagging for other reasons).
  4. If ALTER TABLE succeeds on the master:
    • The master logs a commit.
    • The replicas that already completed the change will make it effective (applications see the new table definition).
    • Other replicas will make it effective as soon as possible.
  5. If ALTER TABLE fails on the master:
    • The master logs a rollback.
    • The replicas that already completed the change will cancel it. Applications will never see the new table definition.
    • Other replicas cancel the operation before it completes.

InnoDB changes

Task: MDEV-14425.

The Redo Log format was modified again to reduce write amplification. See the task for detailed explanations.

This follows the changes made in version 10.5: with MDEV-14425 the Redo Log format was made more efficient, and thanks to MDEV-20907 only one (logically circular) Redo Log file exists now.

Task: MDEV-25342.

innodb_buffer_pool_chunk_size is now allocated dynamically. The automatically set value is innodb_buffer_pool_size / 128 , or 128M, whatever is higher; this value may be increased so that it becomes a multiple of the OS page size (or hugepage size).

The previous default was 128M, to the value has increased for buffer pools bigger than 16G.

Faster Unicode collations

I admit that this part confuses me. The documentation points us to these tasks: MDEV-27266 and MDEV-27265, which are still work in progress. My guess, however, is that part of the code produced as a result of those tasks has been included in 10.8. I think so because I can clearly see improvements in the utf8mb3_unicode_ci and utf8mb4_unicode_ci collations. In my trivial, unprofessional tests, a comparison of two short strings shows more than 25% performance improvement:

SELECT BENCHMARK(1000000000, _utf8mb3'qwerty' = _utf8mb3'qwezxc');
10.6: 20.489 sec
10.8: 14.825 sec
SELECT BENCHMARK(1000000000, _utf8mb4'qwerty' = _utf8mb4'qwezxc');
10.6: 20.828 sec
10.8: 14.234 sec
SELECT BENCHMARK(100000000, _utf8mb4'qwerty' SOUNDS LIKE _utf8mb4'qwezxc');

What about the relative unicode collations, the less precise but faster counterparts? I couldn’t see any improvement here.

Note that I didn’t test longer strings. These “benchmarks” are only good for a first impression.

Descending indexes

Tasks: MDEV-13756MDEV-26938MDEV-26939MDEV-26996.

This is a MySQL 8 feature, and MariaDB has finally implemented it. This feature solves the problem with ORDER BY clauses that mention multiple columns in a different sort order, for example:

SELECT id, publishing_date, title
FROM book
ORDER BY publishing_date DESC, title ASC
;

Optimising a query like this was normally possible with indexed generated columns. For example, in this case a generated column could be defined in this expression: (DATEDIFF({ DATE '2100-01-01' }, publishing_date) , which is the number of days to the year 2100 (more recent values are smaller). Then an index on (days_to_2100, title) could be built, and the query could use: ORDER BY days_to_2100, title .

So, if you really need to optimise such an ORDER BY , you can do it, even with older versions. But you’ll need to use a bad hack that I wouldn’t like to see in a database I own, to be honest.

Notes:

  • I understand that not all possible optimisations are supported yet, but I believe that the most practical case for this feature is the one demonstrated above (a simple ORDER BY , different sort directions).
  • Consider building DESC indexes when you index a column that is normally used in ORDER BY c DESC . This will bring a small improvement because of the InnoDB indexes internal structure (pages form a doubly linked list, records don’t).

JSON Histograms

Tasks: MDEV-21130, MDEV-26519.

This feature concerns optimiser statistics. These are the statistics about data distribution in tables, indexes and columns, intended for optimising queries (in which order the tables will be read, which indexes will be used, etc).

MariaDB implemented Histogram-Based statistics in version 10.4. Histogram-based statistics are collected for columns (not indexes) as engine-independent statistics. JSON histograms allow MariaDB to store more precise statistics about column data distribution. This feature was originally advertised as a 10.7 feature, but it was only merged into 10.8.

JSON histograms enable better query optimisation when we join tables, and we have WHERE conditions on non-indexed columns.

Note that this feature is not used by default. Since version 10.4, engine-independent statistics are not collected by default. To collect them, we can use ANALYZE TABLE table_name PERSISTENT FOR ALL , or ANALYZE TABLE table_name PERSISTENT FOR COLUMNS (column list) . Once collected, engine-independents statistics are used for queriesby default. So, to use this feature, you can do the following:

Usability enhancements

  • mariadb-binlog options --start-position and --end-position now accept GTIDs.
  • Temporal tables partitioning has become easier. We can specify the desired number of partitions, and the rotation will be automatic. For example, if we type
    CREATE TABLE ... PARTITION BY SYSTEM_TIME INTERVAL 1 YEAR PARTITIONS 3
    a temporal table with 3 partitions will be created, one for current data and two for historical data. Partitions will be rotated once a year.
  • While creating a SPIDER table, we can now use the REMOTE_SERVER , REMOTE_DATABASE and REMOTE_TABLE options, instead of using the COMMENT clause.

Conclusions

I listed MariaDB 10.8’s most useful features (in my opinion). Most of those features are performance improvements.

Remember, however, that MariaDB 10.8 is a short-term support release, and it will be discontinued in one year since its first stable release (May 2023). While it is helpful for testing new features, most users shouldn’t use it in production. The latest long-term support release is 10.6.

If you want to take advantage of the recent and old MariaDB performance optimisations, consider MariaDB Health Check service from Vettabase.

Federico Razzoli

About Federico Razzoli

Federico is Vettabase Ltd founder, and he's an expert database consultant specialised in the MariaDB and MySQL ecosystems.

4 Replies to “A summary of MariaDB 10.8: key performance improvements”

  1. That was super informative as usual, thank you!

    What is the difference between just ANALYZE TABLE … vs ANALYZE TABLE … PERSISTENT FOR ALL / COLUMNS (column_list)?

    • Thanks, Karl!
      I had to do some research to be sure to answer correctly, because the answer is not in the documentation. Actually the pages related to ANALYZE TABLE and index statistics seem to be in need for a serious update.

      • As explained in MDEV-7384, the goal of this syntax is to persist statistics on disk even if use_stat_tables is not enabled. By default it is enabled anyway. (the variable name is misspelled in the task)
      • This syntax is also necessary to collect engine-independent statistics, including histograms of non-indexed columns.
      • Right, so just “ANALYZE TABLE tablename;” only collects stats for the particular engine, not engine-independent stats, unless use_stat_tables has a particular value. Whereas “ANALYZE TABLE tablename PERSISTENT FOR …;” does collect engine-independent stats regardless. And engine-independent stats are better for the reasons stated here: https://mariadb.com/kb/en/engine-independent-table-statistics/

        BTW, I don’t think ‘use_stat_tables’ is set by default to a value that will make “ANALYZE TABLE tablename;” collect engine-independent stats. The KB article above says it will only do so if the value is ‘complementary’ or ‘preferably’.

        • You’re right, I double checked: the default value is 'preferably_for_queries' . So by default:
          * It won’t affect ANALYZE ;
          * But if you run ANALYZE ... PERSISTENT queries will use engine-independent stats.

          InnoDB statistics are good enough, now that they’re persistent (10.0 and above). BUT they won’t include any information about non-indexed columns. So if this is important for you, my advice is to explicitly run periodical ANALYZE ... PERSISTENT on specific tables.

          When/why is it important? The main use case is explained here, but essentially it’s useful for queries like this:

          SELECT ...
          FROM a JOIN b ON ...
          WHERE a.non_indexed1 = b.non_indexed2

          I can think of other use cases, but I didn’t check if they are supported by the optimiser.

Leave a Reply

Your email address will not be published.

*