Schedule Meeting

a

The UUID data type in MariaDB

by | Apr 12, 2023 | MariaDB, MariaDB Features

Need Help?  Click Here for Expert Support

MariaDB introduced the UUID data type in version 10.7. The first long-term support (LTS) version to include it is 10.11, which was declared stable in February 2023.

In this article we will discuss how to use the UUID type, and why it is often wise to use it as a primary key.

What are UUIDs

A UUID is a value of 128 bits, designed to be unique.

MariaDB implements UUIDs of type 1. This means that the following components are used to generate a UUID:

  • a 60 bits timestamp that represents the number of hundreds of nanoseconds elapsed since the adoption of the Gregorian Calendar (15 October 1582);
  • a clock sequence;
  • a MAC address (on Linux and FreeBSD, including containers and virtual machines). This MAC address guarantees UUID uniqueness.
  • a random number (on other platforms).

How to write and read UUIDs in MariaDB SQL

Let’s see how to work with the UUID type.

To create a UUID column, use the following command:

CREATE OR REPLACE TABLE uuid_test (
    uuid UUID NOT NULL
);

We can insert UUID values as literals. We can type these literals as strings or as hexadecimal numbers. Strings can contain dashes that make them more human-readable, but it’s optional. Actually, dashes are completely ignored. The following ways to insert a value are equivalent:

INSERT INTO uuid_test VALUES
    ('b0dbb53f-d7f3-11ed-b4ed-56000464509c'),
    ('b0dbb53fd7f3-11edb4ed56000464509c'),
    ('b-0-d-bb53fd7f311edb4ed56000464509c'),
    ('b0dbb53f-d7f3-11ed-b4ed-56000464509c');

> SELECT * FROM uuid_test;
+--------------------------------------+
| uuid                                 |
+--------------------------------------+
| b0dbb53f-d7f3-11ed-b4ed-56000464509c |
| b0dbb53f-d7f3-11ed-b4ed-56000464509c |
| b0dbb53f-d7f3-11ed-b4ed-56000464509c |
| b0dbb53f-d7f3-11ed-b4ed-56000464509c |
+--------------------------------------+

Inserting UUIDs as literals can be useful if they are generated by external services, but most of the times you will want MariaDB to generate them automatically. You can do it with the UUID() function, or with the SYS_GUID() function. The latter has been added for Oracle compatibility. The only difference is that UUID() returns string representations of UUIDs with dashes, and the second omits the dashes. But, as mentioned above, when inserting values this difference is irrelevant. Another function, SHORT_UUID(), returns shortened UUIDs, but these values are not of the UUID type.

> INSERT INTO uuid_test
    VALUES
        (UUID()),
        (SYS_GUID())
    RETURNING uuid
;
+--------------------------------------+
| uuid                                 |
+--------------------------------------+
| 48b264c0-d7f6-11ed-b4ed-56000464509c |
| 48b26634-d7f6-11ed-b4ed-56000464509c |
+--------------------------------------+

The RETURNING syntax is very useful if you need to know the automatically generated UUIDs that you insert. See our article on MariaDB RETURNING Statements.

A way to generate a UUID and use it multiple times is the following:


START TRANSACTION;
-- theoretically it's unnecessary to run this SELECT in a transaction
-- but if we use ProxySQL or MaxScale we want all queries to be sent
-- to the same MariaDB instance
SELECT @uuid := UUID();
INSERT INTO band (uuid, name) VALUES (@uuid, 'Deep Purple');
INSERT INTO album (uuid, band_uuid, title) VALUES (UUID(), @uuid, 'Machine Head');
COMMIT;

Note that a UUID value takes 16 bytes. If you store the same value in the form of a string representation, it will be 36 characters. If you use the utf8mb4 character set, it will take 72 bytes. If you use ascii, it will take 36 characters. For more discussions on this topic, see Working with MD5 or other hashes.

UUIDs as primary keys

UUIDs can be used as primary keys. First of all, let’s see what the pros and cons are compared to alternatives. Then we’ll discuss how to use UUID primary keys in practice.

UUID versus AUTO_INCREMENT

In InnoDB tables, primary key values should be inserted in a chronological order. This is true for any B-Tree index with any storage engine, because these indexes are ordered data structures, and inserting a value in the middle of an ordered data structure is not very efficient. But in InnoDB, tables are physically ordered by the primary key. For more details, see Why Tables need a Primary Key in MariaDB and MySQL.

An easy way to satisfy this requirement is to use an AUTO_INCREMENT primary key. In this way, integer numbers will be generated in order. But such keys have some problems:

  1. The generation of AUTO_INCREMENT values is governed by a lock at a table level.
    For write-intensive workloads this can result in frequent waits. See the MariaDB documentation.
  2. When the maximum value is reached, trying to insert new rows will result in an error. Even if there are holes, the missing number won’t be regenerated unless you run ALTER TABLE tab AUTO_INCREMENT = num. But even this statement will fail if some numbers exist in the table that are higher than num. You might think that, if you use BIGINT UNSIGNED, the limit will never be reached. Yet, in many cases even that limit is reached. This doesn’t necessarily mean that a table has billions of rows: AUTO_INCREMENT values are wasted if rows are deleted, or if a transaction that inserts rows rolls back.
    To find out how to make sure that the maximum won’t be reached, see How to monitor that AUTO_INCREMENT columns won’t reach their limit.

UUID versus sequences

It is worth mentioning that MariaDB supports sequences, which allow to generate a numeric ID that is unique across multiple tables. I can show how to do this in a future article, if there is interest. Don’t forget to comment to let me know that you’d appreciate it.

When sequences reach their maximum value, they can restart from the minimum. This solves the AUTO_INCREMENT problem 2 discussed above, if you regularly delete the oldest IDs. But only in that case.

Also, two connections are never allowed to generate the same value, so sequences are still subject to a lock (problem 1). Using a cache for a sequence (which means, multiple values are generated and stored) alleviates the problem, but doesn’t eliminate it.

Note that maintaining a sequence that is used in multiple tables is more complicated than using UUID or AUTO_INCREMENT primary keys. The main reason is simple: you will need to check multiple tables to make sure that old values are deleted fast enough.

I don’t necessarily discourage this practice, because it can solve some real-world problems. However, DBAs generally don’t need more complexity to take care of. UUIDs are more practical for the general case.

UUID primary key performance

The UUID can be used to avoid the AUTO_INCREMENT problems described above.

It was mentioned before that primary key values should be inserted in a chronological order. If you know UUID formats, you surely know that the timestamp bits are “shuffled”, so UUIDs don’t follow a chronological order. That’s why the Laravel PHP framework implements a non-standard ordered UUID. MariaDB uses standard UUIDs of type 1, but its timestamp bits are written on disk in a format that follows the chronological order. So, while you’ll see regular, non-ordered UUIDs, the UUID type follows the requirements for an efficient primary key.

Some might object that UUID values consist of 16 bytes, while BIGINT values are 8 bytes long, and INT values are only 4 bytes long. This has several consequences:

  1. More data is written and read.
  2. The primary key is bigger.
  3. For InnoDB tables, this means that all secondary indexes are bigger too.

To be honest, all this hardly matters for small workloads, while for big workloads increasing the latency a bit is a very reasonable price to pay for eliminating some AUTO_INCREMENT locks.

The increased storage size can be a problem for databases that have limited disk space, or pay too much for storage, or have too big backups already. Typically, DBAs facing such issues should look into different ways to optimise storage:

  • Eliminate obsolete tables that are no longer written or read.
  • Eliminate historical rows that don’t get more reads.
  • Normalise tables. Many successful companies have several very big tables with 100+ columns, which are basically used as a sparse matrix.
  • Eliminate unused and duplicate indexes.
  • Compress big columns that are read, but never used for filtering, sorting, or aggregating,
  • and so on.

Most consultants don’t provide such recommendations, because it would involve skills that are beyond the database itself: understanding how databases are used, collaborating with different teams, etc. So they tend to overstate the importance of minor tweaks and pretend there is nothing more they can do. At Vettabase, we don’t use this approach.

Using UUID primary keys

To create UUID primary keys, it’s a good idea to specify a DEFAULT value:

CREATE OR REPLACE TABLE film (
    uuid UUID NOT NULL DEFAULT UUID(),
    title VARCHAR(200)
);

In this case, applications can simply ignore UUID columns when they insert values, as they are generated automatically:

INSERT INTO film (title) VALUES
    ('Restless Natives'),
    ('The Angels\' Share'),
    ('Edie'),
    ('Trainspotting');

> SELECT * FROM film;
+--------------------------------------+-------------------+
| uuid                                 | title             |
+--------------------------------------+-------------------+
| a473548d-d8be-11ed-b7c0-56000464509c | Restless Natives  |
| a47359ac-d8be-11ed-b7c0-56000464509c | The Angels' Share |
| a4735aba-d8be-11ed-b7c0-56000464509c | Edie              |
| a4735b46-d8be-11ed-b7c0-56000464509c | Trainspotting     |
+--------------------------------------+-------------------+

Conclusion

UUIDs are a modern way to generate unique identifiers. MariaDB introduced a handy UUID data type to store this information efficiently. It is particularly useful as a primary key, because it eliminates the inconvenience of using AUTO_INCREMENT IDs. I recommend that you start using UUIDs as identifiers in your applications.

If you’re interested in MariaDB 10.11 features, make sure you don’t miss my next webinar on April 27th: MariaDB 10.11, key features overview for DBAs.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

The Mystery of ProxySQL Galera Writer Switchover!

The Mystery of ProxySQL Galera Writer Switchover!

Writer switchover issues with ProxySQL and Galera can feel like an unsolved puzzle. Recently, I encountered two strange behaviors that revealed underlying issues in how ProxySQL interacts with Galera clusters. In this post, I’ll walk through the unexpected behaviors I...

Enforcing strong passwords for MariaDB users

Enforcing strong passwords for MariaDB users

MariaDB users normally connect using a password. Weak passwords are a common security problem, especially when passwords are generated by humans. However, MariaDB comes with plugins that help validating passwords to make sure they are strong enough. This article is a...

Validating rows with CHECK constraints in MariaDB

Validating rows with CHECK constraints in MariaDB

Relational databases provide several ways to validate data. CHECK constraints are a powerful tool for in-database data validation. Their impact on performance is minimal, if any. In this article we'll discuss MariaDB support for CHECK constraints. Note that the CHECK...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *