Schedule Meeting

a

A summary of MariaDB 10.8: key performance improvements

by | Jun 14, 2022 | MariaDB

Need Help?  Click Here for Expert Support

MariaDB 10.8 was released in May 2022 and I’m writing a review, as I did with MariaDB 10.6. I plan to keep doing this with every major release. Bear in mind that my reviews reflect my personal opinions.

10.8 is a short-term release, which means that the support expires after 1 year. It’s good to experiment with the new features, but shouldn’t be used in production. However, Vettabase supports short-term releases.

Most important changes in MariaDB 10.8

In my opinion, the features I’m describing here are the most important changes in MariaDB 10.8. I’ll discuss them in order of importance.

Lag free ALTER TABLE in replication

Task: MDEV-11675.

With older MariaDB versions and all MySQL versions, replicas can’t start replicating an ALTER TABLE until it has successfully completed on the master. This means that a long running ALTER TABLE on the master will normally be a bottleneck for replication, blocking other data changes from being applied until the master completes the operation.

With this change, replicas don’t wait for ALTER TABLE completion on the master. This is the workflow applied starting from version 10.8:

  1. The master receives an ALTER TABLE statement.
  2. The master logs the statement and starts to run it.
  3. The statement is received by the replicas, that start to run it without unnecessary delays (unless replication is already lagging for other reasons).
  4. If ALTER TABLE succeeds on the master:
    • The master logs a commit.
    • The replicas that already completed the change will make it effective (applications see the new table definition).
    • Other replicas will make it effective as soon as possible.
  5. If ALTER TABLE fails on the master:
    • The master logs a rollback.
    • The replicas that already completed the change will cancel it. Applications will never see the new table definition.
    • Other replicas cancel the operation before it completes.

InnoDB changes

Task: MDEV-14425.

The Redo Log format was modified again to reduce write amplification. See the task for detailed explanations.

This follows the changes made in version 10.5: with MDEV-14425 the Redo Log format was made more efficient, and thanks to MDEV-20907 only one (logically circular) Redo Log file exists now.

Task: MDEV-25342.

innodb_buffer_pool_chunk_size is now allocated dynamically. The automatically set value is innodb_buffer_pool_size / 128, or 128M, whatever is higher; this value may be increased so that it becomes a multiple of the OS page size (or hugepage size).

The previous default was 128M, to the value has increased for buffer pools bigger than 16G.

Faster Unicode collations

I admit that this part confuses me. The documentation points us to these tasks: MDEV-27266 and MDEV-27265, which are still work in progress. My guess, however, is that part of the code produced as a result of those tasks has been included in 10.8. I think so because I can clearly see improvements in the utf8mb3_unicode_ci and utf8mb4_unicode_ci collations. In my trivial, unprofessional tests, a comparison of two short strings shows more than 25% performance improvement:

SELECT BENCHMARK(1000000000, _utf8mb3'qwerty' = _utf8mb3'qwezxc');
10.6: 20.489 sec
10.8: 14.825 sec

SELECT BENCHMARK(1000000000, _utf8mb4'qwerty' = _utf8mb4'qwezxc');
10.6: 20.828 sec
10.8: 14.234 sec

SELECT BENCHMARK(100000000, _utf8mb4'qwerty' SOUNDS LIKE _utf8mb4'qwezxc');

What about the relative unicode collations, the less precise but faster counterparts? I couldn’t see any improvement here.

Note that I didn’t test longer strings. These “benchmarks” are only good for a first impression.

Descending indexes

Tasks: MDEV-13756MDEV-26938MDEV-26939MDEV-26996.

This is a MySQL 8 feature, and MariaDB has finally implemented it. This feature solves the problem with ORDER BY clauses that mention multiple columns in a different sort order, for example:

SELECT id, publishing_date, title
    FROM book
    ORDER BY publishing_date DESC, title ASC
;

Optimising a query like this was normally possible with indexed generated columns. For example, in this case a generated column could be defined in this expression: (DATEDIFF({ DATE '2100-01-01' }, publishing_date), which is the number of days to the year 2100 (more recent values are smaller). Then an index on (days_to_2100, title) could be built, and the query could use: ORDER BY days_to_2100, title.

So, if you really need to optimise such an ORDER BY, you can do it, even with older versions. But you’ll need to use a bad hack that I wouldn’t like to see in a database I own, to be honest.

Notes:

  • I understand that not all possible optimisations are supported yet, but I believe that the most practical case for this feature is the one demonstrated above (a simple ORDER BY, different sort directions).
  • Consider building DESC indexes when you index a column that is normally used in ORDER BY c DESC. This will bring a small improvement because of the InnoDB indexes internal structure (pages form a doubly linked list, records don’t).

JSON Histograms

Tasks: MDEV-21130, MDEV-26519.

This feature concerns optimiser statistics. These are the statistics about data distribution in tables, indexes and columns, intended for optimising queries (in which order the tables will be read, which indexes will be used, etc).

MariaDB implemented Histogram-Based statistics in version 10.4. Histogram-based statistics are collected for columns (not indexes) as engine-independent statistics. JSON histograms allow MariaDB to store more precise statistics about column data distribution. This feature was originally advertised as a 10.7 feature, but it was only merged into 10.8.

JSON histograms enable better query optimisation when we join tables, and we have WHERE conditions on non-indexed columns.

Note that this feature is not used by default. Since version 10.4, engine-independent statistics are not collected by default. To collect them, we can use ANALYZE TABLE table_name PERSISTENT FOR ALL, or ANALYZE TABLE table_name PERSISTENT FOR COLUMNS (column list). Once collected, engine-independents statistics are used for queriesby default. So, to use this feature, you can do the following:

Usability enhancements

  • mariadb-binlog options --start-position and --end-position now accept GTIDs.
  • Temporal tables partitioning has become easier. We can specify the desired number of partitions, and the rotation will be automatic. For example, if we type
    CREATE TABLE ... PARTITION BY SYSTEM_TIME INTERVAL 1 YEAR PARTITIONS 3
    a temporal table with 3 partitions will be created, one for current data and two for historical data. Partitions will be rotated once a year.
  • While creating a SPIDER table, we can now use the REMOTE_SERVER, REMOTE_DATABASE and REMOTE_TABLE options, instead of using the COMMENT clause.

Conclusions

I listed MariaDB 10.8’s most useful features (in my opinion). Most of those features are performance improvements.

Remember, however, that MariaDB 10.8 is a short-term support release, and it will be discontinued in one year since its first stable release (May 2023). While it is helpful for testing new features, most users shouldn’t use it in production. The latest long-term support release is 10.6.

If you want to take advantage of the recent and old MariaDB performance optimisations, consider MariaDB Health Check service from Vettabase.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

Coming up next, a ColumnStore webinar

Coming up next, a ColumnStore webinar

The 27th is fast approaching but you can still signup to our live webinar where we will be exploring the benefits of time series data using MariaDB ColumnStore. We will be jumping into a live demo with some example data and queries simulating high volume time series...

The benefits of MariaDB ColumnStore

The benefits of MariaDB ColumnStore

Last week Richard announced our projects on MariaDB ColumnStore. Take a look at his great post, if you didn't already. Since then, I've got some questions from customers, colleagues and friends: why did you guys decide to robustly invest into ColumnStore, and offer...

Let’s go, MariaDB ColumnStore at Vettabase!

Let’s go, MariaDB ColumnStore at Vettabase!

I have been an avid user of Columnar based storage engines for as long as I have been a DBA. The advantage is instantly obvious once you and your teams start to write or convert queries which previously took tens of minutes, are now returning in fractions of a second....

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *