Useful SQL Patterns: Matching Nulls by Masking Nulls

SQL patterns, matching NULLs, masking NULLs

As you start coding in SQL, you will use some statements and techniques over and over again. We call these “SQL patterns”. This series will look at the most common SQL patterns and consider how to use them.

In database development, SQL developers often find themselves returning to the same SQL statements. Learning about these now, early in your SQL journey, will help you work more efficiently. Today, in the first post of this series, we will consider the match by null SQL pattern.

Why Match by Null?

Often, an SQL developer must join two tables based on the values in a certain column. For the join to work, these columns must have matching values. For example, suppose we have a poorly-designed database with two tables, movies and genres. Every movie in the movie has a genre (action, comedy, adventure), but some movies have not yet been classified by genre. These records have NULL values in the genres column. In the genres table, there is a matching ID for NULL.

So, the movies table has the following columns: movieid, title, and genres. The genres table has two columns, id and genres.


movies and genres tables in Vertabelo

 

Here’s the data in the movies table:

movieid title genres
1 Toy Story (1995) 1
2 Jumanji (1995) 1
3 Grumpier Old Men (1995) 2
4 Waiting to Exhale (1995) 2
5 Father of the Bride Part II (1995) 2
6 Heat (1995) 3
7 Sabrina (1995) 2
8 Tom and Huck (1995) 1
9 Sudden Death (1995) 3
10 GoldenEye (1995)

And here’s what you’d see in the genre table:

id genres
1 Adventure
2 Comedy
3 Action
Unknown

Let’s say that you need to get a list of all movie titles with their genres, including any movies with unknown genres. Obviously, it makes sense to join the two tables using the genres column.

If you were to do a simple join…

SELECT
	title,
	g.genres
FROM
	movies m
	JOIN genres g ON (m.genres = g.id)

… You would get results that leave out any movies with an unknown value in the genre row.

title genres
Tom and Huck (1995) Adventure
Jumanji (1995) Adventure
Toy Story (1995) Adventure
Sabrina (1995) Comedy
Father of the Bride Part II (1995) Comedy
Waiting to Exhale (1995) Comedy
Grumpier Old Men (1995) Comedy
Sudden Death (1995) Action
Heat (1995) Action
GoldenEye (1995)

Hmm. No GoldenEye. What went wrong?

The Problem with Matching by Null

Remember, for a join to work, the values in the joining columns must match. But as regular readers of the Vertabelo Academy blog know, NULL does not equal NULL. NULL is not the same as NULL!

Why not? Sometimes English speakers will use “null” to mean “zero”. But in SQL, NULL does not equal zero. It does not signify “no value”. It is an unknown value. Since we do not know what one NULL represents, we can never say that it is equal to another NULL.

Yet, we still need to match records on the basis of NULL values. How can we do this?

Matching by Null with Masking Nulls

To get around the fact that we can’t match NULLs, we use the COALESCE statement. This masks all NULL values with a default value :

SELECT
	title,
	g.genres
FROM
	movies m
	JOIN genres g
		ON (COALESCE(m.genres, 0.0) = COALESCE(G.ID, 0.0))

The results now include the unknown value:

title genres
Tom and Huck (1995) Adventure
Jumanji (1995) Adventure
Toy Story (1995) Adventure
Sabrina (1995) Comedy
Father of the Bride Part II (1995) Comedy
Waiting to Exhale (1995) Comedy
Grumpier Old Men (1995) Comedy
Sudden Death (1995) Action
Heat (1995) Action
GoldenEye (1995) Unknown

The genres column is a NUMERIC type column, so we have to mask NULL values with a number – in this case, “0”. If the column was of another data type, we would mask it with a default value of that data type. For instance, if we were dealing with a CHAR column, we could use an X. To read more about data types, visit the Vertabelo Academy Blog.

One More Thing About Matching Nulls

Matching nulls is usually a quick fix used to get around data modeling mistakes. It’s better to fix your model rather than to work with null masking statements in join conditions. Still, sometimes this cannot be avoided, as with MERGE statements in complex ETL processing.

By the way, there was a mistake in the design of this example. It was done on purpose to demonstrate the match by null pattern. Can you spot it? Can you fix it?

Aldo Zelen

Aldo is a data architect with a passion for the cloud. From leading a team of data professionals to coding a data warehouse in the cloud, Aldo has experience with the whole lifecycle of data-intensive projects. Aldo spends his free time forecasting geopolitical events in forecasting tournaments.

comments powered by Disqus

GET ACCESS TO EXPERT CONTENT!

Over 85.000 happy students
and counting!