Spooky Scary NULL: Unexpected Danger Lurking in Your Database

NULL, NOT NULL,

Relational databases often store the value NULL in their tables. If you’re not familiar with NULL and what it is, you could run into unexpected problems. In this article, we’ll take a look at how the IN and EXISTS operators work with NULL.

What Is NULL?

In SQL, NULL is a special value indicating that a particular piece of data doesn’t exist in the database. For SQL beginners, it’s often a source of problems. Each column in a table requires values of a specific data type. But when a given field doesn’t store any data, it is initialized with the value NULL by default.

The concept of NULL in the relational model was introduced by Edgar Frank Codd. This involves a three-value logic model that consists of TRUE, FALSE, and UNKNOWN.
These logical values are returned by logical operations.

Halloween Data!

First, let’s familiarize ourselves with the data we’ll be using. We’ll be working with two of the most popular Halloween-themed cities: Salem and Anoka.

Our friends in these cities have organized Halloween parties. They’ve created tables for each city that list the attendees and the costumes they’ll be wearing. John decided to attend the event in Salem; Sara wants to go to Anoka. Unfortunately, the two forgot to specify their costumes when signing up!


Vertabelo Halloween

Take a look at the two data tables below. As you’ll notice, since we don’t know what costumes John and Sara will be wearing, their corresponding halloween_costume values are set to NULL for the time being.

salem_city

id name halloween_costume
1 John NULL
2 Mary skeleton
3 Eva witch
4 Tom pirate

The table salem_city contains information specific to the party in Salem. Everyone but John has specified a costume. Since John didn’t write what he’ll be wearing, his halloween_costume value is set to NULL.

anoka_city

id name halloween_costume
1 Sara NULL
2 Eli ghost
3 Olivier skeleton

Likewise, the table anoka_city contains information specific to the party in Anoka. Again, since we don’t know Sara plans to wear, her halloween_costume value is set to NULL.

Comparing Costumes

To recap, we have data for our friends and their halloween costumes in our database. We need to determine if any of our friends attending the Salem party will be wearing the same costumes as those attending the one in Anoka.

When working with this data, we’ll focus on using the IN operator in a subquery. First, let’s remind ourselves what a subquery is. Simply put, it’s an SQL query that contains another SQL query. If multiple independent subqueries are nested, they may run separately. Otherwise, if one subquery depends on any others to perform its tasks, it will behave as a correlated subquery.

Recall that both tables store the value NULL in the halloween_costume column. Let’s first write an SQL query that retrieves the name and costume of any friend from Salem whose costume is also being worn by someone attending the Anoka party.

SELECT name, halloween_costume
FROM salem_city
WHERE halloween_costume
IN (
    SELECT halloween_costume
    FROM anoka_city
) 

The inner query simply returns a list of costume names from the anoka_city table. For now, keep in mind that this list will contain the value NULL because Sara didn’t say what she’ll be wearing.

Next, the outer query returns the names of attendees and their corresponding costume names from the Salem party whose costumes match any of the ones listed in the anoka_city table.

In the salem_city table, recall that John didn’t list a costume. Perhaps he was in a hurry and forgot about this requirement. Thus, his halloween_costume value is NULL.

So, we have two attendees from different parties whose costumes are both unknown. However, one NULL is not the same as another! Thus, the above query does not return John’s name and his halloween_costume value of NULL. John’s NULL is different from Sara’s NULL.

Here is the resulting set after we run the query:

name halloween_costume
Mary skeleton

Notice that the query only returned the row for Mary from the salem_city table, as her costume is also worn by Olivier at the Anoka party.

Unexpected Results

Now, let’s try something different. What happens if we try to retrieve information about the attendees of the Salem party whose costumes are not stored in the anoka_city table? In other words, what costumes are worn to the Salem party but not to the Anoka party?

By just looking at the tables, you’ll notice there are three such attendees: Eva, Tom, and John. Again, because John’s NULL is not the same as Sara’s, he should technically be included in the resulting set. Here’s the query that corresponds to our logic:

SELECT name, halloween_costume
FROM salem_city
WHERE halloween_costume
NOT IN   (
          SELECT halloween_costume
          FROM anoka_city
        ) ;

After we run this query, the resulting set is, to our surprise, completely empty! We expected three records. What went wrong?

The problem is with the NULL record in the anoka_city table. To fix the error, omit the record with NULL in the halloween_costume column from the subquery.

SELECT name, halloween_costume
FROM salem_city
WHERE halloween_costume
NOT IN (
        SELECT halloween_costume
        FROM anoka_city  
        WHERE halloween_costume IS NOT NULL
      ) ;

Now, the result set contains two records. Eva is dressed as a witch, and Tom is rocking his pirate swag. As expected, these halloween costumes don’t appear in the anoka_city table.

Resulting set:

name halloween_costume
Eva witch
Tom pirate

This is a slight improvement over our original approach, of course. However, we now don’t see John because we excluded the value NULL from our subquery. There’s got to be a better way—and there is!

Solution: EXISTS Operator

We can resolve this problem entirely by using the EXISTS operator in place of IN. The EXISTS operator is used to test for the existence of any record in a subquery. It returns TRUE if the subquery returns at least one record.

Take a look at the modified query:

SELECT s.name, s.halloween_costume
FROM salem_city s
WHERE NOT EXISTS  (
                    SELECT a.halloween_costume
                    FROM anoka_city a
                    WHERE a.halloween_costume = s.halloween_costume
                  ) ;

This allows us to achieve our desired result. Take a look at the updated result set:

name halloween_costume
John NULL
Eva witch
Tom pirate

In this case, all expected records were returned, including John’s!

Of course, if you’re curious to see what John and Sara are really wearing for Halloween, you must attend both parties!

Summary

Databases often store the value NULL in addition to values of specified data types. NULL can create many problems and lead to unexpected results. To avoid these, make sure you account for NULL values when writing your queries!

To learn more about NULL, take a look at our other articles:

“Useful SQL Patterns: Matching Nulls by Masking Nulls”
“Understanding the Use of NULL in SQL Three-Valued Logic”

Dorota Wdzięczna

Dorota is an IT engineer and works as a Data Science Writer for Vertabelo. She has experience as a Java programmer, webmaster, teacher, lecturer, IT specialist, and coordinator of IT systems. In her free time, she loves working in the garden, taking photos of nature, especially macro photos of insects, and visiting beautiful locations in Poland.