Spooky Scary NULL: Unexpected Danger Lurking in Your Database
Relational databases often store the value NULL in their tables. If you’re not familiar with NULL and what it is, you could run into unexpected problems. In this article, we’ll take a look at how the IN and EXISTS operators work with NULL.
What Is NULL?
In SQL, NULL is a special value indicating that a particular piece of data doesn’t exist in the database. For SQL beginners, it’s often a source of problems. Each column in a table requires values of a specific data type. But when a given field doesn’t store any data, it is initialized with the value NULL by default.
The concept of NULL in the relational model was introduced by Edgar Frank Codd. This involves a three-value logic model that consists of TRUE, FALSE, and UNKNOWN.
These logical values are returned by logical operations.
Halloween Data!
First, let’s familiarize ourselves with the data we’ll be using. We’ll be working with two of the most popular Halloween-themed cities: Salem and Anoka.
Our friends in these cities have organized Halloween parties. They’ve created tables for each city that list the attendees and the costumes they’ll be wearing. John decided to attend the event in Salem; Sara wants to go to Anoka. Unfortunately, the two forgot to specify their costumes when signing up!
Take a look at the two data tables below. As you’ll notice, since we don’t know what costumes John and Sara will be wearing, their corresponding halloween_costume
values are set to NULL for the time being.
salem_city
id | name | halloween_costume |
---|---|---|
1 | John | NULL |
2 | Mary | skeleton |
3 | Eva | witch |
4 | Tom | pirate |
The table salem_city
contains information specific to the party in Salem. Everyone but John has specified a costume. Since John didn’t write what he’ll be wearing, his halloween_costume
value is set to NULL.
anoka_city
id | name | halloween_costume |
---|---|---|
1 | Sara | NULL |
2 | Eli | ghost |
3 | Olivier | skeleton |
Likewise, the table anoka_city
contains information specific to the party in Anoka. Again, since we don’t know Sara plans to wear, her halloween_costume
value is set to NULL.
Comparing Costumes
To recap, we have data for our friends and their halloween costumes in our database. We need to determine if any of our friends attending the Salem party will be wearing the same costumes as those attending the one in Anoka.
When working with this data, we’ll focus on using the IN operator in a subquery. First, let’s remind ourselves what a subquery is. Simply put, it’s an SQL query that contains another SQL query. If multiple independent subqueries are nested, they may run separately. Otherwise, if one subquery depends on any others to perform its tasks, it will behave as a correlated subquery.
Recall that both tables store the value NULL in the halloween_costume
column. Let’s first write an SQL query that retrieves the name and costume of any friend from Salem whose costume is also being worn by someone attending the Anoka party.
SELECT name, halloween_costume FROM salem_city WHERE halloween_costume IN ( SELECT halloween_costume FROM anoka_city )
The inner query simply returns a list of costume names from the anoka_city
table. For now, keep in mind that this list will contain the value NULL because Sara didn’t say what she’ll be wearing.
Next, the outer query returns the names of attendees and their corresponding costume names from the Salem party whose costumes match any of the ones listed in the anoka_city
table.
In the salem_city
table, recall that John didn’t list a costume. Perhaps he was in a hurry and forgot about this requirement. Thus, his halloween_costume
value is NULL.
So, we have two attendees from different parties whose costumes are both unknown. However, one NULL is not the same as another! Thus, the above query does not return John’s name and his halloween_costume
value of NULL. John’s NULL is different from Sara’s NULL.
Here is the resulting set after we run the query:
name | halloween_costume |
---|---|
Mary | skeleton |
Notice that the query only returned the row for Mary from the salem_city
table, as her costume is also worn by Olivier at the Anoka party.
Unexpected Results
Now, let’s try something different. What happens if we try to retrieve information about the attendees of the Salem party whose costumes are not stored in the anoka_city
table? In other words, what costumes are worn to the Salem party but not to the Anoka party?
By just looking at the tables, you’ll notice there are three such attendees: Eva, Tom, and John. Again, because John’s NULL is not the same as Sara’s, he should technically be included in the resulting set. Here’s the query that corresponds to our logic:
SELECT name, halloween_costume FROM salem_city WHERE halloween_costume NOT IN ( SELECT halloween_costume FROM anoka_city ) ;
After we run this query, the resulting set is, to our surprise, completely empty! We expected three records. What went wrong?
The problem is with the NULL record in the anoka_city
table. To fix the error, omit the record with NULL in the halloween_costume
column from the subquery.
SELECT name, halloween_costume FROM salem_city WHERE halloween_costume NOT IN ( SELECT halloween_costume FROM anoka_city WHERE halloween_costume IS NOT NULL ) ;
Now, the result set contains two records. Eva is dressed as a witch, and Tom is rocking his pirate swag. As expected, these halloween costumes don’t appear in the anoka_city
table.
Resulting set:
name | halloween_costume |
---|---|
Eva | witch |
Tom | pirate |
This is a slight improvement over our original approach, of course. However, we now don’t see John because we excluded the value NULL from our subquery. There’s got to be a better way—and there is!
Solution: EXISTS Operator
We can resolve this problem entirely by using the EXISTS operator in place of IN. The EXISTS operator is used to test for the existence of any record in a subquery. It returns TRUE if the subquery returns at least one record.
Take a look at the modified query:
SELECT s.name, s.halloween_costume FROM salem_city s WHERE NOT EXISTS ( SELECT a.halloween_costume FROM anoka_city a WHERE a.halloween_costume = s.halloween_costume ) ;
This allows us to achieve our desired result. Take a look at the updated result set:
name | halloween_costume |
---|---|
John | NULL |
Eva | witch |
Tom | pirate |
In this case, all expected records were returned, including John’s!
Of course, if you’re curious to see what John and Sara are really wearing for Halloween, you must attend both parties!
Summary
Databases often store the value NULL in addition to values of specified data types. NULL can create many problems and lead to unexpected results. To avoid these, make sure you account for NULL values when writing your queries!
To learn more about NULL, take a look at our other articles:
“Useful SQL Patterns: Matching Nulls by Masking Nulls”
“Understanding the Use of NULL in SQL Three-Valued Logic”