MariaDB/MySQL: Working with MD5 or other hashes

by Federico Razzoli | Feb 17, 2022 | MariaDB, MySQL

Need Help? Click Here for Expert Support

Hashes are normally represented in a human-readable format. The easiest way to work with them is to just insert this human-readable form into a table, as a string. But that is not an efficient way to work with hashes. This article explains how to read and write hashes efficiently, using MD5 as an example.

Contents hide

1 Why storing the human-readable string is not efficient

2 Working with an MD5 hash

2.1 Data type

2.2 Inserting or updating an MD5 hash

Why storing the human-readable string is not efficient

Let’s consider MD5. It is a binary string and its length is 128 bits (16 bytes).

Representing it in a human-readable way means to write each byte as an hexadecimal digit. An hexadecimal digit is two characters. So a readable MD5 is 32 characters long.

If you use UTF8, the size of an MD5 is still 32 characters. But a UTF8 character length is variable. The principle is that the most common characters must take 1 byte, but up to 4 bytes can be used, so it’s still possible to represent virtually any alphabet from out planet and beyond. This is a great idea, but it means that processing a UTF8 character requires more CPU time than processing an ASCII character or a byte that is considered as “just a blob”. This additional work is a waste when it comes to hashes.

This article applies with just a few differences to any type of hash. But starting from MariaDB 10.11 it doesn’t apply to UUIDs, because a specific UUID type is now supported.

Working with an MD5 hash

MariaDB and MySQL have all the SQL functions you may need to work with an MD5 hash:

MD5() – Generate a human-readable hash from a string.
UNHEX() – Convert a hexadecimal number to a decimal number that you can insert into a binary column.
HEX() – Convert a decimal number to hexadecimal.

And now… simple recipes for lazy copy/pasters 🙂

Data type

The proper type for a column that will contain hashes is:

BINARY(16)

BINARY is a fixed-length, binary-safe type.

Inserting or updating an MD5 hash

Probably your application has MD5s in a human readable form. This is very common if these MD5s come from an external source, like the md5 system command, or the AWS CLI client. In this case, all you need to do is to convert the hexadecimal number to a string of bytes:

INSERT INTO xmp (hash) VALUES (UNHEX('065d6a3af93b2b889d53080cbb2e5b5b'));

If your application has a string to converto to MD5, you need to use the MD5() function to convert it to a hexadecimal MD5, and then the UNHEX() function to convert it to binary:

INSERT INTO xmp (hash) VALUES (UNHEX(MD5('Inverness')));

Reading an MD5

Find the human-readable form of a hash:

SELECT HEX(hash) FROM xmp;

Given a human-readable MD5, you may want to find the matching row in a table:

WHERE hash = UNHEX('065d6a3af93b2b889d53080cbb2e5b5b')

Given a string, find a hashed match:

WHERE hash = UNHEX(MD5('Inverness'))

Generated columns

We may want to store both the original strings and their hashes. An easy way to do so is to store the hash in a generated column. This column should normally be a STORED column, so that we can build an index on it.

The column can be created in this way:

ALTER TABLE xmp ADD hash BINARY(16)
    AS (UNHEX(MD5(original_string))) STORED;

Note that you may create a virtual column just to simplify your SQL queries, and avoiding including a complex expression. If that is the case, and you don’t plan adding an index, you can create a VIRTUAL column instead. Just replace the STORED keyword with VIRTUAL.

Indexes

Hashes are not ordered, and they are usually unique in a table. So normally you may want to find a single row by hash, but not rows where the hash is greater/lesser than another hash, and you don’t order rows by hash. For this you case, a UNIQUE index on the hash column is enough:

ALTER TABLE t ADD UNIQUE unq_hash (hash);

However, hashes are not really unique. In most cases we treat them as if they were, but we must aware that two different strings may be mapped to the same hash. But this problem should be handled at the application level.

Conclusion

To master more advanced optimisation techniques, consider our SQL optimisation training for teams.

Federico Razzoli

Did you like this article?

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli

Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

MariaDB 11.8 LTS: Parallel Dumps, PARSEC authentication, new SQL syntaxes, and more

Jun 18, 2025

MariaDB released a new Long Term Support version as Generally Available: 11.8. I always review MariaDB LTS versions once they're GA, so it's time to write a new review. Support and timeline MariaDB 11.8 is the latest Long Term Support version (LTS). The...

Query Optimisation: Using indexes for WHERE with multiple conditions

Jun 5, 2025

You have a beautiful application, but a page is slow. After some investigation, you find out that a query is unexpectedly slow, ruining user experience and causing frustration for you. It is a simple query with a WHERE clause and nothing else. You tried to build an...

Vettabase is a MariaDB Foundation sponsor!

Jun 2, 2025

Vettabase is a MariaDB Foundation partner. We contributed documentation about automating MariaDB deployments and configuration management.

Services



Email

Schedule Meeting

Phone

MariaDB/MySQL: Working with MD5 or other hashes

Why storing the human-readable string is not efficient

Working with an MD5 hash

Data type

Inserting or updating an MD5 hash

Reading an MD5

Generated columns

Indexes

Conclusion

Recent Posts

MariaDB 11.8 LTS: Parallel Dumps, PARSEC authentication, new SQL syntaxes, and more

Query Optimisation: Using indexes for WHERE with multiple conditions

Vettabase is a MariaDB Foundation sponsor!

Services

Database Automation

Database Training

Database Health Check

Monthly DBA Time

Database Upgrade

0 Comments

Submit a Comment Cancel reply

Email

Schedule Meeting

Phone

Quick Links

Recent Posts

To BLOB or not to BLOB? The image storage dilemma

MariaDB 11.8 LTS: Parallel Dumps, PARSEC authentication, new SQL syntaxes, and more

Query Optimisation: Using indexes for WHERE with multiple conditions

Policies & Licenses

Follow Us on Social Media