MariaDB/MySQL IS NULL conditions and indexes

by Federico Razzoli | Nov 14, 2020 | MariaDB, MySQL

Need Help? Click Here for Expert Support

It is not uncommon for an application to use WHERE conditions like this:

WHERE status = 'DELETED' OR status IS NULL

If you run EXPLAIN, such conditions typically only cause the type column to be show ref_or_null. Whereas, without the check on NULL, it will be ref.

But does this mean that only a small detail in the execution will change, while the query will still be extremely fast? In case no one told you, any good database professional has one answer that is almost always valid. And that answer is: it depends. I know it’s frustrating, but maybe I can alleviate the frustration by explaining why it depends.

A look at numbers

I created a test table with slightly more than 3.5M rows. I built an index on the columns (a, b), in this order, and another on (b) only. Then I’ve run the following queries, and I felt good.

mysql> SELECT COUNT(*) FROM t WHERE (a = 2 OR a IS NULL) AND (b =  5);
+----------+
| COUNT(*) |
+----------+
|      212 |
+----------+
1 row in set (0.01 sec)

mysql> SELECT COUNT(*) FROM t WHERE (a = 2) AND (b =  5 OR b IS NULL);
+----------+
| COUNT(*) |
+----------+
|      120 |
+----------+
1 row in set (0.01 sec)

As you can see, if we add an IS NULL condition to any column in the index, the query remains fast. The problem is when we use IS NULL on multiple columns.

mysql> SELECT COUNT(*) FROM t WHERE (a = 2 OR a IS NULL) AND (b =  5 OR b IS NULL);
+----------+
| COUNT(*) |
+----------+
|  1466664 |
+----------+
1 row in set (1 min 21.32 sec)

Blimey! It’s dead slow, is’n it? You may think that this depends on the number of rows. We are selecting a much higher number of rows, perhaps that’s the reason for the slowness. We can test it easily: I ran an UPDATE to replace all NULLs with 0 values, and I repeated the query:

mysql> SELECT COUNT(*) FROM t WHERE (a = 2 OR a IS NULL) AND (b =  5 OR b IS NULL);
+----------+
| COUNT(*) |
+----------+
|      120 |
+----------+
1 row in set (0.02 sec)

Very fast, as expected! But, is that slowdown really normal? Let’s see what happens if we look for 0 values:

mysql> SELECT COUNT(*) FROM t WHERE (a = 2 OR a = 0) AND (b =  5 OR b = 0);
+----------+
| COUNT(*) |
+----------+
|  2457536 |
+----------+
1 row in set (1.93 sec)

We selected even more rows this time, but the query took less than 2 seconds. Still slow, but it’s a huge improvement: the previous version took 81 seconds!

Query plans

What is the difference between these two queries? The first could not use the right index, because ref_or_null cannot be used on multiple columns. It used the index on (b) instead.:

mysql> EXPLAIN SELECT COUNT(*) FROM t WHERE (a = 2 OR a IS NULL) AND (b =  5 OR b IS NULL) G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: ref_or_null
possible_keys: idx_b,idx_a_b
          key: idx_b
      key_len: 10
          ref: NULL
         rows: 1815359
     filtered: 11.43
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

The other version checks two regular values per column, so it uses range on the right index:

mysql> EXPLAIN SELECT COUNT(*) FROM t WHERE (a = 2 OR a = 0) AND (b =  5 OR b = 0) G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: range
possible_keys: idx_b,idx_a_b
          key: idx_a_b
      key_len: 10
          ref: NULL
         rows: 1908763
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

A range search can involve multiple columns from the same index. The Extra column confirms that the query is executed by only reading the idx_a_b index.

More details on this optimisation are in the IS NULL optimization page, in MySQL documentation.

Equal followed by range

The solution would be to use a regular value instead of NULL. Something that logically means “nothing”, but it is not NULL. Normally, it is 0, or the empty string '', or the beginning of the UNIX epoch '1970-01-01 00:00:00'.

You may or may not be familiar with this aspect of query optimisation, that I will probably describe in a later post. If you know what I’m talking about, you may at least be glad to know that the “= must precede >” rule works with both the following queries. In other words, it is not affected by the ref_or_null limitations.

mysql> EXPLAIN SELECT COUNT(*) FROM t WHERE (a = 2000 OR a = 0) AND (b >  5 OR b = 0) G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: range
possible_keys: idx_b,idx_a_b
          key: idx_a_b
      key_len: 10
          ref: NULL
         rows: 1761559
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.01 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM t WHERE (a = 2000 OR a IS NULL) AND (b >  5 OR b IS NULL) G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
   partitions: NULL
         type: range
possible_keys: idx_b,idx_a_b
          key: idx_a_b
      key_len: 10
          ref: NULL
         rows: 4
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

Conclusions

There are some good reasons to use NULL. For example, Using NULL as default value by FromDual’s Shinguz, argues that using NULL makes tables smaller.

But I believe that most of the times we should avoid anything that makes a good query plan impossible. This is something you may want to think about, before declaring your next column NULLable.

Did you notice any mistake? Do you have thoughts about this article? Please comment!

Federico Razzoli

All content in this blog is distributed under the CreativeCommons Attribution-ShareAlike 4.0 International license. You can use it for your needs and even modify it, but please refer to Vettabase and the author of the original post. Read more about the terms and conditions: https://creativecommons.org/licenses/by-sa/4.0/

About Federico Razzoli

Federico Razzoli is a database professional, with a preference for open source databases, who has been working with DBMSs since year 2000. In the past 20+ years, he served in a number of companies as a DBA, Database Engineer, Database Consultant and Software Developer. In 2016, Federico summarized his extensive experience with MariaDB in the “Mastering MariaDB” book published by Packt. Being an experienced database events speaker, Federico speaks at professional conferences and meetups and conducts database trainings. He is also a supporter and advocate of open source software. As the Director of Vettabase, Federico does business worldwide but prefers to do it from Scotland where he lives.

MariaDB 11.8 LTS: Parallel Dumps, PARSEC authentication, new SQL syntaxes, and more

Jun 18, 2025

MariaDB released a new Long Term Support version as Generally Available: 11.8. I always review MariaDB LTS versions once they're GA, so it's time to write a new review. Support and timeline MariaDB 11.8 is the latest Long Term Support version (LTS). The...

Query Optimisation: Using indexes for WHERE with multiple conditions

Jun 5, 2025

You have a beautiful application, but a page is slow. After some investigation, you find out that a query is unexpectedly slow, ruining user experience and causing frustration for you. It is a simple query with a WHERE clause and nothing else. You tried to build an...

Vettabase is a MariaDB Foundation sponsor!

Jun 2, 2025

Vettabase is a MariaDB Foundation partner. We contributed documentation about automating MariaDB deployments and configuration management.

Services



Email

Schedule Meeting

Phone

MariaDB/MySQL IS NULL conditions and indexes

A look at numbers

Query plans

Equal followed by range

See also

Conclusions

Recent Posts

MariaDB 11.8 LTS: Parallel Dumps, PARSEC authentication, new SQL syntaxes, and more

Query Optimisation: Using indexes for WHERE with multiple conditions

Vettabase is a MariaDB Foundation sponsor!

Services

Database Automation

Database Training

Database Health Check

Monthly DBA Time

Database Upgrade

0 Comments

Submit a Comment Cancel reply

Email

Schedule Meeting

Phone

Quick Links

Recent Posts

To BLOB or not to BLOB? The image storage dilemma

MariaDB 11.8 LTS: Parallel Dumps, PARSEC authentication, new SQL syntaxes, and more

Query Optimisation: Using indexes for WHERE with multiple conditions

Policies & Licenses

Follow Us on Social Media