# NULL Values and the GROUP BY Clause

We’ve already covered how to use the `GROUP BY` clause and some aggregation functions like SUM(), AVG(), MAX(), MIN(), COUNT(). In this article, we will explain how the `GROUP BY` clause works when NULL values are involved. We will also explain about using NULLs with the `ORDER BY` clause.

In SQL, NULL is a special marker used to indicate that a data value does not exist in the database. For more details, check out Wikipedia’s explanation of NULL in SQL.

We will use the following `employee` table to illustrate how the `GROUP BY` clause works with NULL values.

EMPLOYEE TABLE
Emplid Name Department Salary
100 John Smith IT 2000
101 Jean Pellu NULL 2500
102 Mary Popins FINANCES 2000
103 Blas Merrieu NULL NULL
104 Joan Piquet IT 1000
105 Jose Gomez IT NULL

## The GROUP BY Clause and NULL Values

Let’s start by executing a simple SQL query with both the `GROUP BY` clause and NULL values:

```SELECT department
FROM employee
GROUP BY department;
```
###### RESULTS
```department
1.
2.	IT
3.	FINANCES
```

Note: I’ve added a numbered list here for clarity; usually the results would be shown as an unnumbered list.

We can see that the first result value is a NULL represented by an empty string (the empty line before the IT department). This empty space represents all the NULL values returned by the `GROUP BY` clause, so we can conclude that `GROUP BY` treats NULLs as valid values.

In the next query, we will count how many employees are in each department, including the “NULL” department:

```SELECT department, count(*)
FROM employee
GROUP BY department;
```
###### RESULTS
```department		count(*)
1. 			2
2. IT        		3
3. FINANCES 		1
```

Note: I’ve added a numbered list here for clarity; usually the results would be shown as an unnumbered list.

Analyzing the previous results from a “`GROUP BY` perspective”, we can conclude that all NULL values are grouped into one value or bucket. This makes it look like NULL is one department with two employees. However, treating NULLs this way – grouping many NULLs into one bucket – does not align with the concept that a NULL value is not equal to any other value, even another NULL.

To explain why NULLs are grouped into one bucket, we need to review the SQL standard. SQL defines “any two values that are equal to one another, or any two NULLs”, as “not distinct”. This definition of “not distinct” allows SQL to group and sort NULLs when the `GROUP BY` clause (or other keywords that perform grouping) is used.

There is another confusing point in the previous result: the way the NULL is represented (by a blank line) is not clear. One interesting way to solve this issue is to use the `COALESCE` function, which converts NULLs to a specific value but leaves other values unchanged. Let’s see the following query:

```SELECT coalesce(department,’Unassigned department’), count(*)
FROM employee
GROUP BY 1;
```
###### RESULTS
```department			count(*)
IT				3
Unassigned department	        2
FINANCES			1
```

### Aggregate Functions and Null Values

Until now we’ve been working with the NULL values in the department column, and we’ve only used the `GROUP BY` clause. Let’s try executing some queries using NULL values as parameters in aggregate functions. First, we’ll use the COUNT() function:

```SELECT COUNT(salary) as “Salaries”
FROM employee
```
###### RESULTS
```Salaries
4
```

Without the `DISTINCT` clause,` COUNT(salary) `returns the number of records that have non-NULL values (2000, 2500, 2000, 1000) in the salary column. So, we can conclude that `COUNT` doesn’t include NULL values.

Let’s try using the `COUNT(distinct column)` aggregate function, which counts all the different values in a column. How does this treat NULL values?

```SELECT COUNT(distinct salary) as “Different Salaries”
FROM employee
```
###### RESULTS
```Different Salaries
3
```

The query returned a “3”, but there are four different salaries: 2000, 2500, 1000, and NULL. Again, we can conclude that the NULL is not included in the resulting value.

Let’s see another example, this time using the `AVG()` aggregate function:

```SELECT coalesce(department,’Unassigned department’), AVG(salary)
FROM employee
GROUP BY 1
```
###### RESULTS
```department			count(*)
Unassigned department    	2500
IT				1500
FINANCES			2000
```

Let’s analyze if NULL values are included in the `AVG()` function. The IT department has three employees with the following salary values: 2000, 1000, and NULL. The `AVG` result for IT is 1500, so it is clear that the NULL value is not considered in the average calculation. (Because (1000 + 2000 ) / 2 = 1500.)

The conclusion is that averages are only calculated using non-NULL values. The general rule is NULL values are not considered in any aggregate function like `SUM()`, `AVG()`, `COUNT()`, `MAX()` and `MIN()`. The exception to this rule is the `COUNT(*)` function, which counts all rows, even those rows with NULL values. Here’s an example:

```SELECT COUNT(*) as “Total Records”
FROM employee
```
###### RESULTS
```Total Records
6
```

As we can see, `COUNT(*)` returns the total number of records in the “`employee`” table, even those records with NULL values in some or all fields.

## The ORDER BY Clause and NULL Values

The SQL standard does not explicitly define a default sort order for NULLs. Some databases like Oracle and PostgreSQL use a `NULLS FIRST` or `NULLS LAST` specification to indicate the place of the NULL value. The following example shows this feature:

```SELECT department, COUNT(*) as “Num of employees” , AVG(salary) as “Avg Dept. Salary”
FROM employee
GROUP BY department
ORDER BY department NULLS LAST
```
###### RESULTS
```department			Num of employees		Avg Dept. Salary
FINANCES			1				2000
IT				3				1500
2				2500

```

### Boolean Expressions Involving NULLS

We normally see TRUE or FALSE as a Boolean result, but it is usual for expressions or conditions that include a NULL to return an UNKNOWN result. The UNKNOWN result is covered in detail in another article we previously published here on our blog.

### TRY IT YOURSELF!

There are many relational database features and functions that produce a specific behavior whenever a NULL value is involved. You can learn more using Vertabelo Academy’s Standard SQL Functions course. Try it out for free!

### Maria Alcaraz

Former Freelance Database Developer, Mother of 4 children