Relational databases provide several ways to validate data. CHECK
constraints are a powerful tool for in-database data validation. Their impact on performance is minimal, if any.
In this article we’ll discuss MariaDB support for CHECK
constraints. Note that the CHECK
syntax and semantics are the same for most relational databases. The most relevant difference between MariaDB and other products is the functions that can be used in the SQL expressions that filter out invalid data.
What are CHECK constraints?
CHECK
clauses can be specified with CREATE TABLE
or ALTER TABLE
. Each clause specifies an SQL expression that is expected to always return 1
for valid data. These expressions are evaluated when a value is added (INSERT
, REPLACE
) or modified (UPDATE
).
In this example, we’ll create a table with two constraints:
CREATE TABLE employee (
uuid UUID DEFAULT UUID(),
full_name VARCHAR(100) NOT NULL
CHECK (CHAR_LENGTH(full_name) > 3),
email VARCHAR(100) NOT NULL
CHECK (email LIKE '%@%')
);
A CHECK
constraint validates values individually. For example:
CHECK (price > 0)
They can also validate that values in the same row are consistent with each other. For example:
CHECK (purchase_date <= delivery date)
How to check for errors
CHECK
constraints generate an error when we try to write invalid data. Normally, applications will need to know which constraint we attempted to violate, so they can show a proper error message. To make this possible, we will simply need to give constraints a name. The application will be able to obtain the name of the failed constraint with a regular expression.
To create a constraint with a name, we will need the following syntax:
CREATE OR REPLACE TABLE product (
uuid UUID DEFAULT UUID() PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10, 2) UNSIGNED NOT NULL,
discount TINYINT UNSIGNED,
CONSTRAINT chk_name_is_not_empty
CHECK (name > ''),
CONSTRAINT chk_discount_is_percentage
CHECK (discount BETWEEN 0 AND 99)
);
In this case, the constraint names are chk_name_is_not_empty
and chk_discount_is_percentage
. We defined two constraints. Now, let’s try to insert a row with invalid discount
value:
INSERT INTO product (name, price, discount) VALUES ('Commodore 64', 1000.99, 100);
We’ll get this error:
ERROR 4025 (23000): CONSTRAINT `chk_discount_is_percentage` failed for `test`.`product`
A program can easily check which constraint failed, and produce a proper error message.
Here is a simplified PHP example that you can run in the command line:
#!/bin/env php
<?php
$constraint_errors = array(
'chk_name_is_not_empty' => 'Product name not specified',
'chk_discount_is_percentage' => 'Invalid product discount'
);
mysqli_report(0);
$con = mysqli_connect('localhost', 'php', 'secret', 'test');
$sql = "INSERT INTO product (name, price, discount) VALUES ('Commodore 64', 1000.99, 100);";
@mysqli_query($con, $sql);
if (mysqli_errno($con) === 4025) {
$dbError = mysqli_error($con);
preg_match('/CONSTRAINT `(.*?)` failed/s', $dbError, $matches);
$constraint_name = $matches[1];
$error = $constraint_errors[$constraint_name];
if (!$error) {
$error = 'Invalid data';
}
} elseif (mysqli_errno($con)) {
$error = 'Something went wrong';
}
echo "$error\n";
$constraint_errors
is an associative array where the keys are constraint names, and the values are corresponding error messages. Instead of using an array of errors, we might write those arrays in a table and extract the desired messages with a query. This would make sense because those messages are metadata (to some extent). For the same reason, if we choose to follow this path, we should version the table contents together with the schema. But for the purpose of this article, I decided to keep the examples simple.
Error messages, here, are not redundant as they might seem. We rely on mysqli_errno()
to identify the error type. 4025
indicates that a constraint failed. To get the constraint name, we use a regular expression against mysqli_error()
. If the constraint name is not present in $constraint_errors
, we print a more generic 'Invalid data'
. For other non-zero error codes, we print an even more generic 'Something went wrong'
.
CHECK constraints examples
CHECK
constraints can be used to validate data in many ways. Here you will see some examples.
Simple values
- Value is boolean:
CHECK is_deleted IN (FALSE, TRUE)
- Number is in range:
CHECK pct BETWEEN -100 AND 100
- String is not empty:
CHECK description > ''
String validation
- String length:
CHECK CHAR_LENGTH(code) = 10
- String is uppercase:
CHECK UPPER(code)
- Email is valid:
CHECK email LIKE '_%@_%.__%'
- No special characters:
CHECK description NOT RLIKE '/.[<>]./'
JSON validation
Columns of type JSON always contain well-formed JSON objects, but we can check if those objects contain what we expect. The purpose is to show what we can do with these constraints, so we’ll use the short syntax again, without names.
- Non-null JSON:
CHECK JSON_TYPE(data) <> 'NULL'
- Array of 5 elements:
CHECK JSON_TYPE(data) = 'ARRAY' AND JSON_LENGTH(data) = 5
- Object containing a given key:
CHECK JSON_TYPE(data) = 'OBJECT'
AND JSON_EXTRACT(data, '$.keyname') IS NOT NULL
- Object containing given keys and nothing else:
CHECK JSON_TYPE(data) = 'OBJECT' AND JSON_LENGTH(data) = 2
AND JSON_EXTRACT(data, '$.key1') IS NOT NULL
AND JSON_EXTRACT(data, '$.key2') IS NOT NULL
Inter-column consistency
- Date consistency:
birth_date >= death_date
- If
name
is specified,surname
must be specified:CHECK IF(name, surname > '', 1)
- At least one column is specified:
CHECK WS_CONCAT('', personal_email, work_email) > ''
- Only one of two columns is specified:
CHECK personal_email > '' XOR work_email > ''
- Only one of many columns is specified:
CHECK (phone1 > '') + (phone2 > '') + (phone3 > '') = 1
Performance considerations
When I advise our customers to use a CHECK
constraint to solve a problem, a common worry is about performance.
Unlike other constraints like foreign keys and triggers, CHECK
constraints are pretty simple, fast, and don’t propagate locks to other tables. The caveat is that a CHECK
can’t be faster than the SQL expression is it built on. If a constraint uses a slow function, it will affect the INSERT
s and UPDATE
s speed.
MariaDB has a useful function to check how fast an expression is: BENCHMARK()
. See this example:
SELECT
BENCHMARK(1000000, JSON_TYPE(
'{"one": 1, "two": 2, "three": 3, "x": null}'
) = 'OBJECT')
AS benchmark;
+-----------+
| benchmark |
+-----------+
| 0 |
+-----------+
1 row in set (0.314 sec)
The key points to notice are:
- We’ve run the benchmark one million times (first argument);
- The resultset is always 0, so it’s not relevant;
- What matters is the execution time: 0.314 seconds in this case.
Conclusions
We discussed one of the most overlooked SQL features: data validation using CHECK
constraints.
We’ve seen what CHECK
constraints are, how to use them to validate data, and how an application can use them to show proper error messages to users. We’ve seen several examples of CHECK
usage, including multi-column rules and JSON validation. As we’ve seen, performance of a CHECK
constraint usually is not a problem.
If you want to know more about SQL features that can make your team more productive and your databases more solid, consider our training for MariaDB and MySQL.
Federico Razzoli
0 Comments