In the article What does NULL mean in SQL? I explained why I consider SQL NULL inconsistent. I should have made clear that, as a general rule, I don't think it is a good idea to use NULL, even if columns are NULLable by default in SQL.
However, there are exceptions. This article covers the classic cases where using NULL appears much simpler than the alternatives. NULL is actually very useful with UNIQUE indexes, but only apparently useful with foreign keys.
UNIQUE constraints (and UNIQUE indexes, which are functionally the same thing) disallow duplicate values for a certain column or set of columns. But with most DBMSs, UNIQUE indexes allow any number of NULL values.
The most frequent case when this is desirable is when we need to guarantee that some optional data is unique. For example, maybe our users are not required to provide us with their email; but if they do, each user should have a unique email. This becomes very easy if we store missing emails as NULL.
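The behaviour described above can be checked with a small sketch. It uses Python's built-in sqlite3 module only because it needs no server; SQLite is an assumption here, chosen because it treats NULL in UNIQUE indexes the way most DBMSs do. The table name `users` and the sample address are illustrative.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Two users without an email: the UNIQUE index accepts both NULLs.
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES ('ann@example.com')")

# A second user with the same non-NULL email violates the UNIQUE index.
try:
    conn.execute("INSERT INTO users (email) VALUES ('ann@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchone()[0]
print(duplicate_rejected, null_count)
```

Not every DBMS behaves this way, so it is worth running an equivalent check on the one you actually use.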
Some databases support a feature called partial indexes. They are indexes that only include the rows matching a certain WHERE clause. At first glance, they could be a solution, as they would allow us to include only non-NULL data in a UNIQUE index. However:
- MySQL and MariaDB do not support partial indexes;
- PostgreSQL does, but they cannot be
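In theory, the alternative is also simple. Instead of having an email column in the users table, we can store emails in a separate table, which only contains rows for the users who provided one.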
In practice, this forces us to run a JOIN every time we need to read a user's email, which has an impact on performance.
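A minimal sketch of this alternative, again on SQLite through Python's sqlite3 module. The table names `users` and `user_email` and their columns are hypothetical; the point is only to show where the JOIN comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Hypothetical separate table: a row exists only if the user gave an email.
    CREATE TABLE user_email (
        user_id INTEGER NOT NULL REFERENCES users (id),
        email   TEXT NOT NULL UNIQUE
    );
    INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO user_email (user_id, email) VALUES (1, 'ann@example.com');
""")

# Reading a user's email now requires a JOIN. A LEFT JOIN keeps users
# without an email in the result, with a NULL (None in Python) email.
rows = conn.execute("""
    SELECT u.name, e.email
    FROM users u
    LEFT JOIN user_email e ON e.user_id = u.id
    ORDER BY u.id
""").fetchall()
print(rows)
```

No NULL is stored anywhere in this schema, but it still appears in the result of the LEFT JOIN for users who have no email.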
Keep in mind, however, that this second approach is cleaner and more generic. Suppose that at some point the users can provide two emails. With a single table, the typical solution is to add a second email column, which is quite a dirty trick and does not allow us to guarantee the uniqueness of all emails.
So, which method am I cheering for? Neither. In most cases, the former looks more practical. I wouldn't use the second if the only reason is avoiding NULL.
When we have a child table logically linked to a parent table, we can enforce some constraints by adding a foreign key to the child table. So, for example, a child table book can have a foreign key that references the parent table author (in the simplistic case that each book has no more than one author).
One of the constraints that will be enforced is that the child table cannot have an orphan row. So, if a book references a given author id, there must be an author with that id. The exception is that we can have a book whose author reference is NULL. From a logical point of view, this means that we don't know the author of the book. From the implementation point of view, the foreign key will allow us to insert such a book instead of producing an error.
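The following sketch illustrates the rule and its exception on SQLite, through Python's sqlite3 module. The column name `author_id` and the sample rows are assumptions made for the example. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER REFERENCES author (id)  -- NULLable on purpose
    );
    INSERT INTO author (id, name) VALUES (1, 'Jane Austen');
""")

# A book with an existing author, and a book whose author is NULL:
# the foreign key accepts both.
conn.execute("INSERT INTO book (title, author_id) VALUES ('Emma', 1)")
conn.execute("INSERT INTO book (title, author_id) VALUES ('Beowulf', NULL)")

# A book pointing to a non-existent author would be an orphan row.
try:
    conn.execute("INSERT INTO book (title, author_id) VALUES ('Ghost', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

book_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(orphan_rejected, book_count)
```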
Alternatively, we could use a special value as the author id. To be able to do so, we'll need to add a row to the author table that represents an unknown author.
This solution needs to be known by whoever writes the application queries. For example, a query that counts the authors should take into account that one row does not represent a specific author.
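Here is a sketch of the special-value approach and of the counting pitfall, on SQLite through Python's sqlite3 module. The value 0, the constant `UNKNOWN_AUTHOR_ID` and the column name `author_id` are assumptions chosen for illustration; any reserved id would do.

```python
import sqlite3

UNKNOWN_AUTHOR_ID = 0  # hypothetical reserved id for "unknown author"

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author (id)
    );
    -- Row 0 is a placeholder, not a real author.
    INSERT INTO author (id, name) VALUES (0, 'Unknown author'), (1, 'Jane Austen');
    INSERT INTO book (title, author_id) VALUES ('Emma', 1), ('Beowulf', 0);
""")

# A naive count includes the placeholder row.
naive_count = conn.execute("SELECT COUNT(*) FROM author").fetchone()[0]

# Queries about real authors must exclude it explicitly.
real_count = conn.execute(
    "SELECT COUNT(*) FROM author WHERE id <> ?", (UNKNOWN_AUTHOR_ID,)
).fetchone()[0]
print(naive_count, real_count)
```

The column can now be declared NOT NULL, at the price of an exclusion that every query about real authors has to remember.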
Note that this solution can easily be generalised. So we could have one row for unknown authors, one for no author at all (the book is an anthology), one could even be for unknown medieval French authors, and so on.
Now, suppose you are not allowing orphaned rows (books whose author is unknown) and at some point you need to start allowing them. Or the opposite. With this solution, it is simply a matter of adding or deleting a row in a table, and developers can do it themselves. But if you use NULL, you will have to change a column definition and run an ALTER TABLE.
As usual, I’ll be happy to fix errors and discuss your ideas. I want to thank everyone who contributed to this website with their comments, creating a valuable shared knowledge base.
Did you notice any mistakes? Do you have ideas to contribute?