Schedule Meeting

a

Validating rows with CHECK constraints in MariaDB

by | Oct 9, 2024 | MariaDB

Need Help?  Click Here for Expert Support

Relational databases provide several ways to validate data. CHECK constraints are a powerful tool for in-database data validation. Their impact on performance is minimal, if any.

In this article we’ll discuss MariaDB support for CHECK constraints. Note that the CHECK syntax and semantics are the same for most relational databases. The most relevant difference between MariaDB and other products is the functions that can be used in the SQL expressions that filter out invalid data.

What are CHECK constraints?

CHECK clauses can be specified with CREATE TABLE or ALTER TABLE. Each clause specifies an SQL expression that is expected to always return 1 for valid data. These expressions are evaluated when a value is added (INSERT, REPLACE) or modified (UPDATE).

In this example, we’ll create a table with two constraints:

CREATE TABLE employee (
    uuid UUID DEFAULT UUID(),
    full_name VARCHAR(100) NOT NULL
        CHECK (CHAR_LENGTH(full_name) > 3),
    email VARCHAR(100) NOT NULL
        CHECK (email LIKE '%@%')
);

A CHECK constraint validates values individually. For example:

CHECK (price > 0)

They can also validate that values in the same row are consistent with each other. For example:

CHECK (purchase_date <= delivery date)

How to check for errors

CHECK constraints generate an error when we try to write invalid data. Normally, applications will need to know which constraint we attempted to violate, so they can show a proper error message. To make this possible, we will simply need to give constraints a name. The application will be able to obtain the name of the failed constraint with a regular expression.

To create a constraint with a name, we will need the following syntax:

CREATE OR REPLACE TABLE product (
    uuid UUID DEFAULT UUID() PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2) UNSIGNED NOT NULL,
    discount TINYINT UNSIGNED,
    CONSTRAINT chk_name_is_not_empty
        CHECK (name > ''),
    CONSTRAINT chk_discount_is_percentage
        CHECK (discount BETWEEN 0 AND 99)
);

In this case, the constraint names are chk_name_is_not_empty and chk_discount_is_percentage. We defined two constraints. Now, let’s try to insert a row with invalid discount value:

INSERT INTO product (name, price, discount) VALUES ('Commodore 64', 1000.99, 100);

We’ll get this error:

ERROR 4025 (23000): CONSTRAINT `chk_discount_is_percentage` failed for `test`.`product`

A program can easily check which constraint failed, and produce a proper error message.

Here is a simplified PHP example that you can run in the command line:

#!/bin/env php
<?php

$constraint_errors = array(
    'chk_name_is_not_empty' => 'Product name not specified',
    'chk_discount_is_percentage' => 'Invalid product discount'
);

mysqli_report(0);
$con = mysqli_connect('localhost', 'php', 'secret', 'test');

$sql = "INSERT INTO product (name, price, discount) VALUES ('Commodore 64', 1000.99, 100);";
@mysqli_query($con, $sql);

if (mysqli_errno($con) === 4025) {
    $dbError = mysqli_error($con);
    preg_match('/CONSTRAINT `(.*?)` failed/s', $dbError, $matches);
    $constraint_name = $matches[1];
    $error = $constraint_errors[$constraint_name];
    if (!$error) {
        $error = 'Invalid data';
    }
} elseif (mysqli_errno($con)) {
    $error = 'Something went wrong';
}

echo "$error\n";

$constraint_errors is an associative array where the keys are constraint names, and the values are corresponding error messages. Instead of using an array of errors, we might write those arrays in a table and extract the desired messages with a query. This would make sense because those messages are metadata (to some extent). For the same reason, if we choose to follow this path, we should version the table contents together with the schema. But for the purpose of this article, I decided to keep the examples simple.

Error messages, here, are not redundant as they might seem. We rely on mysqli_errno() to identify the error type. 4025 indicates that a constraint failed. To get the constraint name, we use a regular expression against mysqli_error(). If the constraint name is not present in $constraint_errors, we print a more generic 'Invalid data'. For other non-zero error codes, we print an even more generic 'Something went wrong'.

CHECK constraints examples

CHECK constraints can be used to validate data in many ways. Here you will see some examples.

Simple values

  • Value is boolean:
    CHECK is_deleted IN (FALSE, TRUE)
  • Number is in range:
    CHECK pct BETWEEN -100 AND 100
  • String is not empty:
    CHECK description > ''

String validation

  • String length:
    CHECK CHAR_LENGTH(code) = 10
  • String is uppercase:
    CHECK UPPER(code)
  • Email is valid:
    CHECK email LIKE '_%@_%.__%'
  • No special characters:
    CHECK description NOT RLIKE '/.[<>]./'

JSON validation

Columns of type JSON always contain well-formed JSON objects, but we can check if those objects contain what we expect. The purpose is to show what we can do with these constraints, so we’ll use the short syntax again, without names.

  • Non-null JSON:
    CHECK JSON_TYPE(data) <> 'NULL'
  • Array of 5 elements:
    CHECK JSON_TYPE(data) = 'ARRAY' AND JSON_LENGTH(data) = 5
  • Object containing a given key:
    CHECK JSON_TYPE(data) = 'OBJECT'
    AND JSON_EXTRACT(data, '$.keyname') IS NOT NULL
  • Object containing given keys and nothing else:
    CHECK JSON_TYPE(data) = 'OBJECT' AND JSON_LENGTH(data) = 2
    AND JSON_EXTRACT(data, '$.key1') IS NOT NULL
    AND JSON_EXTRACT(data, '$.key2') IS NOT NULL

Inter-column consistency

  • Date consistency:
    birth_date >= death_date
  • If name is specified, surname must be specified:
    CHECK IF(name, surname > '', 1)
  • At least one column is specified:
    CHECK WS_CONCAT('', personal_email, work_email) > ''
  • Only one of two columns is specified:
    CHECK personal_email > '' XOR work_email > ''
  • Only one of many columns is specified:
    CHECK (phone1 > '') + (phone2 > '') + (phone3 > '') = 1

Performance considerations

When I advise our customers to use a CHECK constraint to solve a problem, a common worry is about performance.

Unlike other constraints like foreign keys and triggers, CHECK constraints are pretty simple, fast, and don’t propagate locks to other tables. The caveat is that a CHECK can’t be faster than the SQL expression is it built on. If a constraint uses a slow function, it will affect the INSERTs and UPDATEs speed.

MariaDB has a useful function to check how fast an expression is: BENCHMARK(). See this example:

SELECT
    BENCHMARK(1000000, JSON_TYPE(
        '{"one": 1, "two": 2, "three": 3, "x": null}'
    ) = 'OBJECT')
AS benchmark;
+-----------+
| benchmark |
+-----------+
|         0 |
+-----------+
1 row in set (0.314 sec)

The key points to notice are:

  • We’ve run the benchmark one million times (first argument);
  • The resultset is always 0, so it’s not relevant;
  • What matters is the execution time: 0.314 seconds in this case.

Conclusions

We discussed one of the most overlooked SQL features: data validation using CHECK constraints.

We’ve seen what CHECK constraints are, how to use them to validate data, and how an application can use them to show proper error messages to users. We’ve seen several examples of CHECK usage, including multi-column rules and JSON validation. As we’ve seen, performance of a CHECK constraint usually is not a problem.

If you want to know more about SQL features that can make your team more productive and your databases more solid, consider our training for MariaDB and MySQL.

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli
Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

Recent Posts

MariaDB ColumnStore SQL limitations

MariaDB ColumnStore SQL limitations

MariaDB ColumnStore is an extremely fast and scalable solution for analytics built on MariaDB, which Vettabase supports. MariaDB ColumnStore inherits the MariaDB SQL dialect, and many MariaDB features. However, numerous MariaDB features are not available for the...

Installing MariaDB ColumnStore on Ubuntu, for production

Installing MariaDB ColumnStore on Ubuntu, for production

Let's see how to install MariaDB ColumnStore (single node) on Ubuntu. Let's also prepare the system to run ColumnStore with good performance, as we should always do in production. The following steps should work on any modern Ubuntu version up to and including 24.04...

Writing User Defined Functions for MariaDB in Go

Writing User Defined Functions for MariaDB in Go

Sometimes standard DBMS functionality is not enough, and we seek to extend it. Most database systems, including MariaDB, support some method of creating User Defined Functions (UDFs), but what is a UDF? UDF definition and examples A UDF is simply a user created...

Services

Need Help?  Click Here for Expert Support

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *