Rarely is the full behavior of the NULL value in SQL taught or described in detail, and with good reason: Some of the SQL rules surrounding NULL can be surprising or unintuitive.

Unfortunately, if you have deal with NULL in real databases, the results can be downright frustrating. The SQLite project, for example, uses trial and error to determine how a database behaves in the presence of NULL values.

Fortunately, Date and Darwen's A Guide to the SQL Standard (fourth edition) [1] describes SQL's rules concerning NULL in good detail.

Intuitively, NULL approximately represents an unknown value.

- An arithmetic operation involving a NULL returns NULL. For example, NULL minus NULL yields NULL, not zero. [2]
- A boolean comparison between two values involving a NULL
returns neither true nor false,
but unknown in SQL's three-valued logic.
[3]
For example, neither NULL equals NULL nor NULL not-equals NULL
is true.
Testing whether a value is NULL requires an expression such as
`IS NULL`

or`IS NOT NULL`

. - An SQL query selects only values whose
`WHERE`

expression evaluates to true, and groups whose`HAVING`

clause evaluates to true. - The aggregate
`COUNT(*)`

counts all NULL and non-NULL tuples;`COUNT`

(attribute) counts all tuples whose attribute value is not NULL. Other SQL aggregate functions ignore NULL values in their computation. [4]

Here is what the SQL standard mandates for some operations involving sets and multisets.

For a simple relation `R`

CREATE TABLE R (a INTEGER);the following queries attempt to reliably determine the maximum known value of the attribute

`a`

in the table `R`

.
For brevity below, let

- empty refer to an empty result of no rows, and
- NULL (in the context of a table or result) refer to a table with one row holding a NULL value.

This is the obvious query:

SELECT MAX(a) FROM R

- If
`R`

is empty, the query must return NULL, not empty. [5] - If
`R`

is a one-row table holding NULL, Date and Darwen don't specifically declare what`MAX()`

should return if its argument consists of exclusively NULL values. - If
`R`

is a table holding NULL and non-NULL integers, NULLs are ignored, and`MAX()`

returns the maximum integer.

This expression of the maximum seems consistent with mathematical logic, but fails completely in SQL:

SELECT DISTINCT a FROM R WHERE a >= ALL (SELECT * FROM R)

- If
`R`

is empty, the query returns empty. The`>= ALL`

test is vacuously true with an empty subquery [6], but there is no value of a to exploit the test. - If
`R`

holds a NULL value, the query returns empty, because the test`a >= ALL(...)`

returns unknown (not false!) for any NULL or maximum non-NULL integer value of`a`

if the subquery includes a NULL value. [7]

This expression is one derivation of maximum as computed in
relational algebra:
subtract all the non-maximum values from the table `R`

,
leaving the maximal ones:

(SELECT DISTINCT * FROM R) EXCEPT (SELECT R.a FROM R, R AS S WHERE R.a < S.a)

- If
`R`

is empty, the query returns empty. - If
`R`

holds a NULL value, the query returns NULL, in addition to whatever maximal integer is present (if any). The lower subquery never includes NULL, so NULL is never subtracted from`R`

.

This expression is another writing of maximum as computed in
relational algebra:
find values not in the non-maximum values of `R`

:

SELECT DISTINCT * FROM R WHERE a NOT IN (SELECT R.a FROM R, R AS S WHERE R.a < S.a)This writing turns out to be subtly different from the last one.

- If
`R`

is empty, the query returns empty. - If
`R`

holds one integer, and at least one NULL value, the query returns NULL, in addition to whatever maximal integer is present (if any). In this case, the subquery is always empty; the one available integer is compared only to NULL values, so do not participate in the subquery's result. NULL`NOT IN`

(empty) is vacuously true as it is in mathematics [8], so NULL is selected as part of the result. - If
`R`

holds more than one integer, the query returns the maximal integer. In this case, the subquery includes at least one value. Now that the subquery is not empty, NULL`NOT IN`

(nonempty result) evaluates to unknown (not false!), and is no longer selected as part of the result. As an aside, NULL`NOT IN`

(nonempty result) returns unknown even if the nonempty result includes NULL. [9]

Because it is somewhat awkward to have an expression for `MAX`

return two rows whose values do not equal, the following expression
adjusts the EXCEPT expression to exclude NULL from the answer:

(SELECT DISTINCT * FROM R) EXCEPT (SELECT R.a FROM R, R AS S WHERE R.a < S.a OR R.a IS NULL)

- If
`R`

is empty, the query returns empty. - If
`R`

holds NULL, the query returns the maximal integer, or empty if`R`

has no integers.`EXCEPT`

will remove NULL from the result if NULL appears in the bottom subquery, even though NULL is not equal to NULL. [10] Similarly,`DISTINCT`

,`UNION`

, and`INTERSECT`

always returns at most one NULL.

Where SQL mandates a behavior for a query above, PostgreSQL complies.

`MAX()`

of NULLs only: returns NULL (consistent with ignoring NULLs, then computing the`MAX()`

of an empty remainder).

Where SQL mandates a behavior for a query above, Oracle complies.

`MAX()`

of NULLs only: returns NULL (consistent with ignoring NULLs, then computing the`MAX()`

of an empty remainder).

No pair of the queries from the list above are equivalent when faced with NULLs in relational data, despite their conceptual similarity.

The two implementations tested, PostgreSQL and Oracle, seem to comply with NULL behavior for the set and multiset operations tested here, even when such behavior is sometimes subtle or unintuitive.

Consider the above a good reason to define away NULLs from relational schema whenever possible.

Footnotes for this page

[1] C. J. Date and Hugh Darwen
A Guide to the SQL Standard. Fourth edition, Addison-Wesley,
Reading, Massachusetts, 1997. (ISBN 0-201-96426-0)

[2] page 236.

[3] page 239.

[4] page 236-237.

[5] page 237.

[6] page 176.

[7] page 244-245.

[8] page 176.

[9] page 244.

[10] page 249.

This document was written originally by Wang Lam for Prof. Jennifer Widom's CS145 class, Spring 2003.