Why Tables need a Primary Key in MariaDB and MySQL

by Federico Razzoli | Aug 10, 2020 | MariaDB, MariaDB InnoDB, MySQL, MySQL InnoDB

In MariaDB and MySQL, all tables should have a primary key – especially InnoDB tables. I mentioned this in several articles, see Primary Key in InnoDB: how they work and best practices. But occasionally people tell me that they are not convinced, and I have to explain the reasons. I am going to write them down here, to keep this article as a future reference. I will include links to other resources for more in-depth information about individual aspects.

As noted on Facebook by Andrew Hutchings (the LinuxJedi), some storage engines don't support indexes at all, so the following considerations don't apply to them. An example is ColumnStore.

Implicit primary keys

First of all, what happens if there is no primary key? There are two possibilities.

If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

MySQL 8.0 manual: 15.6.2.1 Clustered and Secondary Indexes, 28th June 2019

So, it's possible that the table will still have a clustered index, based on an existing UNIQUE index. This is not visible in the information_schema tables or in statements like SHOW CREATE TABLE, and neither are the effects of dropping that particular UNIQUE index. Basically, such an important behavior depends on the presence of a particular UNIQUE index that was created with the table (not later), and on the order of such indexes in the CREATE TABLE statement. This is counter-intuitive and confusing. If we build a view to list tables without a primary key, we need additional logic to check whether there is an implicit primary key, as in the sketch below.
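
Here is a minimal sketch of such logic, based on the standard information_schema.TABLES and information_schema.STATISTICS tables. It only tells us whether a candidate for an implicit clustered index exists; it does not reproduce InnoDB's exact choice, which also depends on the order in which the indexes were created:

-- Sketch: InnoDB tables with no PRIMARY KEY and no UNIQUE index
-- on NOT NULL columns that could serve as an implicit clustered index
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
    FROM information_schema.TABLES t
    WHERE t.ENGINE = 'InnoDB'
    AND t.TABLE_SCHEMA NOT IN
        ('mysql', 'information_schema', 'performance_schema', 'sys')
    -- no explicit primary key
    AND NOT EXISTS (
        SELECT 1
            FROM information_schema.STATISTICS s
            WHERE s.TABLE_SCHEMA = t.TABLE_SCHEMA
            AND s.TABLE_NAME = t.TABLE_NAME
            AND s.INDEX_NAME = 'PRIMARY'
    )
    -- no UNIQUE index whose columns are all NOT NULL
    AND NOT EXISTS (
        SELECT 1
            FROM information_schema.STATISTICS s
            WHERE s.TABLE_SCHEMA = t.TABLE_SCHEMA
            AND s.TABLE_NAME = t.TABLE_NAME
            AND s.NON_UNIQUE = 0
            AND s.INDEX_NAME <> 'PRIMARY'
            GROUP BY s.INDEX_NAME
            HAVING SUM(s.NULLABLE = 'YES') = 0
    );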

Now let’s see what happens if no index is eligible to be a clustered index.

If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

MySQL 8.0 manual: 15.6.2.1 Clustered and Secondary Indexes, 28th June 2019

This clarifies what the clustered index is when there is no primary key and no NOT NULL UNIQUE index. I've met people who thought that this is good, because it saves us from having a table without a primary key. But actually, all the problems explained in the rest of this article apply precisely to tables with this hidden 6-byte clustered index.
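
To find out which tables actually got this hidden index, we can query the InnoDB data dictionary. A hedged sketch for MariaDB (on MySQL 8.0 the dictionary tables are named INNODB_TABLES and INNODB_INDEXES instead; the PROCESS privilege is required):

-- Tables whose clustered index is the hidden, 6-byte GEN_CLUST_INDEX
SELECT t.NAME AS table_name
    FROM information_schema.INNODB_SYS_INDEXES i
    JOIN information_schema.INNODB_SYS_TABLES t ON t.TABLE_ID = i.TABLE_ID
    WHERE i.NAME = 'GEN_CLUST_INDEX';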

Fast queries

This is the weakest reason. For some tables, it doesn’t even apply. Yet, it is something that we should keep in mind.

As I explained in the article Primary Key in InnoDB: how they work and best practices, in InnoDB a table is actually a primary key. The values of all columns are physically stored in the primary key, ordered by primary key values. All other indexes are stored in separate data structures.

Think about what this means for very common queries like this one:

SELECT first_name, last_name
    FROM user
    WHERE id = 24;

InnoDB will look for id=24 in the primary key, and there it finds all the values it needs: a very fast search. But now think about this query:

SELECT first_name, last_name
    FROM user
    WHERE email = 'fu********@no********.uk';

Suppose there is an index on email. InnoDB will look for that beautiful email address in the index; if it finds it, it will find id=24, and then it will need to search for the corresponding entry in the primary key. Two operations instead of one. That's why searches by primary key are much faster with InnoDB.
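
The user table is not defined anywhere in this article; here is a plausible sketch, just to make the two examples above concrete:

-- Hypothetical definition of the user table used in the examples above
CREATE TABLE user (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE INDEX idx_email (email)
) ENGINE = InnoDB;

If a search by email is a hot path, a covering index on (email, first_name, last_name) would let InnoDB answer the second query from the index alone, avoiding the extra lookup in the primary key. But that is a per-query optimisation, not a substitute for choosing a good primary key.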

Clustered index and dict_sys->mutex

The mutex problem is well described in the wonderful article How does InnoDB behave without a Primary Key?, by Jeremy Cole. The core of the problem is that:

This global sequence counter is protected by dict_sys->mutex, even for incrementing (as opposed to using atomic increment). (…)

Given how much other code within InnoDB is protected by dict_sys->mutex, I think it's fair to say any tables with an implicit clustered key (ROW_ID) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable.

How does InnoDB behave without a Primary Key?
Jeremy Cole

I am pretty sure that the article used to list an incomplete series of operations governed by dict_sys->mutex. Unfortunately the list is not there anymore, maybe because it was too incomplete, or it turned out to be incorrect, or simply because it became obsolete with time. What I can add here is that the list included many operations whose correlation with the generation of a clustered index value was quite hard to understand.

Primary key, the binary log and replication

The recommended binary log format is ROW. With this format, the row changes themselves are logged in a binary form, which means:

  • For INSERTs, the whole row is logged;
  • For DELETEs, the primary key value for the erased rows is logged;
  • For UPDATEs, the modified row’s primary key value and the new column values are logged.

Actually, the old values and the complete row images are also logged if binlog_row_image=FULL. This is not necessary, unless we use tools that require it, like Debezium.
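
A minimal sketch of the relevant settings; both variables are dynamic, and they can also be set in the configuration file:

-- Use row-based logging (the recommended format)
SET GLOBAL binlog_format = 'ROW';
-- FULL logs complete before/after row images (needed by tools like Debezium);
-- MINIMAL logs only the primary key plus the changed columns
SET GLOBAL binlog_row_image = 'FULL';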

What happens if there is no primary key? In that case, all column values are logged to identify the row. This makes the binary log larger. But there is a bigger problem.

In MySQL replication, the binary log entries (events) are sent from the master to the slaves so they can be replicated. A deletion or an update by primary key is supposed to be very fast. But a deletion or an update by all columns is slow, because it implies a full table scan for each modified row. The bigger the table without a primary key, the slower the slaves.

This problem is demonstrated by Jean-François Gagné in his article The danger of no Primary Key when replicating in RBR.
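
As Justin notes in the comments below, MySQL offers a partial mitigation: the slave_rows_search_algorithms variable (deprecated in newer 8.0 releases) allows the replica to apply row events with a hash scan when no usable index exists. A hedged sketch:

-- MySQL replica: allow hash scans when applying row events on tables
-- without a usable index; this reduces the penalty, it does not remove it
SET GLOBAL slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';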

MariaDB will add the hidden clustered index values to the binary log at some point. This should fix the performance problem for InnoDB tables. See MDEV-24001.

Primary key and clusters

The same problem appears with clusters (both Galera and group replication), but the effect is different.

Why different? Galera and group replication are synchronous replication mechanisms. A node cannot lag behind others.

Instead, whenever we run a COMMIT against one node, the transaction needs to be certified by the other nodes. The node that received the COMMIT sends the writeset to its fellows, which try to apply it and communicate the result back to the first node. Only after this certification process does the first node return success to the client.

If the certification takes a long time, the COMMIT is slow from a client point of view.

EDIT: as Art wrote in a comment (thank you, Art), Galera has two additional limitations for tables without a primary key. The following quote is from the MariaDB KnowledgeBase, but it is mentioned as a limitation of mainstream Galera:

All tables should have a primary key (multi-column primary keys are supported). DELETE operations are unsupported on tables without a primary key. Also, rows in tables without a primary key may appear in a different order on different nodes.

MariaDB Galera Cluster – Known Limitations, from MariaDB KnowledgeBase

I believe that it only refers to the physical order of rows in InnoDB tablespaces (which usually means table files). I am not sure about the consequences, but I guess that moving tablespaces between different nodes is not supported if there is no primary key. Also, the order of results may differ between nodes for queries without ORDER BY (but we should always use ORDER BY if the order of rows matters).

Making primary keys mandatory

MariaDB 10.1 introduced the innodb_force_primary_key variable, which forces InnoDB tables to have a primary key. Unfortunately, this doesn’t solve the replication problem if we use other storage engines.

In MariaDB 10.6+ I recommend setting wsrep_mode='REQUIRED_PRIMARY_KEY', which is not the default for compatibility reasons. There is also a wsrep_certify_nonPK variable, which automatically generates a primary key when it is missing. This variable is ON by default, even though it can cause unpredictable behaviour. I recommend setting it to OFF and relying on wsrep_mode='REQUIRED_PRIMARY_KEY' instead: in this way no primary key will be automatically generated, and the creation of tables without a primary key will fail.

MySQL 8.0 introduced a better variable: sql_require_primary_key. It simply disallows creating tables without a primary key, as well as dropping the primary key from an existing table.
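
A minimal sketch of these options; availability depends on the version, the wsrep variables only make sense with Galera, and assigning wsrep_mode overwrites any other flags it may contain:

-- MariaDB 10.1+: refuse to create InnoDB tables without a primary key
SET GLOBAL innodb_force_primary_key = ON;

-- MariaDB 10.6+ with Galera
SET GLOBAL wsrep_mode = 'REQUIRED_PRIMARY_KEY';
SET GLOBAL wsrep_certify_nonPK = OFF;

-- MySQL 8.0.13+
SET GLOBAL sql_require_primary_key = ON;
-- From now on, creating a table without a primary key fails with an error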

Conclusions

I wrote this article as a future reference. It has been mentioned many times since it was originally published on Federico-Razzoli.com, and it is now also a Vettabase recommendation. When someone asks why we insist on always having primary keys, we give a quick answer, with this link for more detailed explanations. You may want to do the same.

Our MySQL Health Checks include a list of tables without a primary key, as well as checks for many other bad indexing practices.

Federico Razzoli

All content in this blog is distributed under the Creative Commons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since the year 2000. In the past 20+ years, he has served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. An experienced database events speaker, Federico speaks at professional conferences and meetups and delivers database training. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland, where he lives.


3 Comments

  1. Justin Swanhart

    While the dict_sys mutex is held shortly for tables without PK, and could cause problems /in theory/, /in practice/ it isn’t a problem. I have workloads which insert 2M rows/sec in parallel into tables w/ no PK with concurrent reads.

    Partitioning often makes having a primary key difficult or impossible, and for append only workloads that don’t delete or update rows, there is no replication penalty for not having a primary key. It is also possible to use MIXED binary logging mode which will use statement based replication for UPDATE/DELETE that are deterministic and not cause the replication penalty unless non-deterministic queries change data.

    Having a primary key and not inserting in sorted PK order can make insertions very slow as page splits happen frequently.

    • Justin Swanhart

      As for row ordering, given a table with a PK or UNIQUE, a table scan will return rows in clustered index order. On tables without a clustered index (or rather using the internal GEN_CLUST_IDX) the rows are returned in insertion order.

      • Justin Swanhart

        Also for BULK deletions in RBR, you can set slave_search_algorithm=HASH,INDEX,TABLE (or something like that) which will use a hash scan on the table to apply the bulk deletes. It doesn’t help if you do not delete a lot of rows at once (and depends on the max_binlog_event_size [again not sure about exact variable name] but as long as you can pack lots of deletes in an event, the RBR penalty will be less with a HASH scan.
