MariaDB/MySQL IS NULL conditions and indexes

Last updated on 18 Settembre 2021

It is not uncommon for an application to use  WHERE  conditions like this:

WHERE status = 'DELETED' OR status IS NULL

This particular one comes from the real world, some years ago.

If you run  EXPLAIN , such conditions typically only cause the  type  column to be show  ref_or_null . Whereas, without the check on  NULL , it will be  ref .

But does this mean that only a small detail in the execution will change, while the query will still be extremely fast? In case no one told you, any good database professional has one answer that is almost always valid. And that answer is: it depends. I know it’s frustrating, but maybe I can alleviate the frustration by explaining why it depends.

Ruins of a temple in Baalbak, Lebanon
With time these Roman columns are going NULL
in Baalbak, Lebanon

A look at numbers

I created a test table with slightly more than 3.5M rows. I built an index on the columns  (a, b) , in this order, and another on  (b)  only. Then I’ve run the following queries, and I felt good.

mysql> SELECT COUNT(*) FROM t WHERE (a = 2 OR a IS NULL) AND (b =  5);
+----------+
| COUNT(*) |
+----------+
|      212 |
+----------+
1 row in set (0.01 sec)
mysql> SELECT COUNT(*) FROM t WHERE (a = 2) AND (b =  5 OR b IS NULL);
+----------+
| COUNT(*) |
+----------+
|      120 |
+----------+
1 row in set (0.01 sec)

As you can see, if we add an  IS NULL  condition to any column in the index, the query remains fast. The problem is when we use  IS NULL  on multiple columns.

mysql> SELECT COUNT(*) FROM t WHERE (a = 2 OR a IS NULL) AND (b =  5 OR b IS NULL);
+----------+
| COUNT(*) |
+----------+
|  1466664 |
+----------+
1 row in set (1 min 21.32 sec)

Blimey! It’s dead slow, is’n it? You may think that this depends on the number of rows. We are selecting a much higher number of rows, perhaps that’s the reason for the slowness. We can test it easily: I ran an  UPDATE  to replace all  NULL s with  0  values, and I repeated the query:

mysql> SELECT COUNT(*) FROM t WHERE (a = 2 OR a IS NULL) AND (b =  5 OR b IS NULL);
+----------+
| COUNT(*) |
+----------+
|      120 |
+----------+
1 row in set (0.02 sec)

Very fast, as expected! But, is that slowdown really normal? Let’s see what happens if we look for  0  values:

mysql> SELECT COUNT(*) FROM t WHERE (a = 2 OR a = 0) AND (b =  5 OR b = 0);
+----------+
| COUNT(*) |
+----------+
|  2457536 |
+----------+
1 row in set (1.93 sec)

We selected even more rows this time, but the query took less than 2 seconds. Still slow, but it’s a huge improvement: the previous version took 81 seconds!

Query plans

What is the difference between these two queries? The first could not use the right index, because  ref_or_null  cannot be used on multiple columns. It used the index on  (b)  instead.:

mysql> EXPLAIN SELECT COUNT(*) FROM t WHERE (a = 2 OR a IS NULL) AND (b =  5 OR b IS NULL) \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: ref_or_null
possible_keys: idx_b,idx_a_b
key: idx_b
key_len: 10
ref: NULL
rows: 1815359
filtered: 11.43
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

The other version checks two regular values per column, so it uses range on the right index:

mysql> EXPLAIN SELECT COUNT(*) FROM t WHERE (a = 2 OR a = 0) AND (b =  5 OR b = 0) \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: range
possible_keys: idx_b,idx_a_b
key: idx_a_b
key_len: 10
ref: NULL
rows: 1908763
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

range  search can involve multiple columns from the same index. The  Extra  column confirms that the query is executed by only reading the  idx_a_b  index.

More details on this optimisation are in the IS NULL optimization page, in MySQL documentation.

Equal followed by range

You may or may not be familiar with this aspect of query optimisation, that I will probably describe in a later post. If you know what I’m talking about, you may at least be glad to know that the “= must precede >” rule works with both versions of the query. In other words, it is not affected by the  ref_or_null  search type.

mysql> EXPLAIN SELECT COUNT(*) FROM t WHERE (a = 2000 OR a = 0) AND (b >  5 OR b = 0) \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: range
possible_keys: idx_b,idx_a_b
key: idx_a_b
key_len: 10
ref: NULL
rows: 1761559
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.01 sec)
mysql> EXPLAIN SELECT COUNT(*) FROM t WHERE (a = 2000 OR a IS NULL) AND (b >  5 OR b IS NULL) \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: range
possible_keys: idx_b,idx_a_b
key: idx_a_b
key_len: 10
ref: NULL
rows: 4
filtered: 100.00
Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

See also

Courses:

Conclusions

There are some good reasons to use NULL . For example, Using NULL as default value by FromDual’s Shinguz, argues that using  NULL  makes tables smaller.

But I believe that most of the times we should avoid anything that makes a good query plan impossible. This is something you may want to think about, before declaring your next column  NULL able.

Did you notice any mistake? Do you have thoughts about this article? Please comment!

Federico Razzoli

Did you like this article?

Photo credit

A proposito di Federico Razzoli

Federico is Vettabase Ltd director, and he's an expert database consultant specialised in the MariaDB and MySQL ecosystems.

2 Replies to “MariaDB/MySQL IS NULL conditions and indexes”

  1. Just another reason to use WARP 🙂

    mysql> select sum(c1 is null), sum(c2 is null), sum(c3 is null), sum(c4 is null) from mixed_nulls;
    +—————–+—————–+—————–+—————–+
    | sum(c1 is null) | sum(c2 is null) | sum(c3 is null) | sum(c4 is null) |
    +—————–+—————–+—————–+—————–+
    | 0 | 1048576 | 2097152 | 3145728 |
    +—————–+—————–+—————–+—————–+
    1 row in set (12.16 sec)

    mysql> select count(*), sum(c1 is null), sum(c2 is null), sum(c3 is null), sum(c
    4 is null)
    +———-+—————–+—————–+—————–+—————–+
    | count(*) | sum(c1 is null) | sum(c2 is null) | sum(c3 is null) | sum(c4 is null) |
    +———-+—————–+—————–+—————–+—————–+
    | 4194304 | 0 | 1048576 | 2097152 | 3145728 |
    +———-+—————–+—————–+—————–+—————–+
    1 row in set (12.07 sec)

    mysql> insert into mixed_nulls values (5,null,6,null);
    Query OK, 1 row affected (0.03 sec)

    mysql> select count(*) from mixed_nulls where (c1 = 5 or c1 is null) and (c3 = 5 and c4 is null);
    +———-+
    | count(*) |
    +———-+
    | 0 |
    +———-+
    1 row in set (0.59 sec)

    mysql> select count(*) from mixed_nulls where (c1 = 5 or c1 is null) and (c3 = 5 and c4 is null);
    +———-+
    | count(*) |
    +———-+
    | 0 |
    +———-+
    1 row in set (0.01 sec)

    mysql> select count(*) from mixed_nulls where (c1 = 5 or c1 is null) and (c3 = 6 and c4 is null);
    +———-+
    | count(*) |
    +———-+
    | 1 |
    +———-+
    1 row in set (0.01 sec)

    • Also curious what the plan is for:
      EXPLAIN SELECT * FROM t WHERE (a = 2000 OR a IS NULL) AND (b > 5 OR b IS NULL) \G

      mysql> select * from mixed_nulls where (c1 = 5 or c1 is null) and (c3 = 6 and c4 is null);
      +——+——+——+——+
      | c1 | c2 | c3 | c4 |
      +——+——+——+——+
      | 5 | NULL | 6 | NULL |
      +——+——+——+——+
      1 row in set (0.01 sec)

Lascia un commento

Il tuo indirizzo email non sarà pubblicato. I campi obbligatori sono contrassegnati *

*