Getting the Hang of the GROUP BY Clause

Grooup By clause

GROUP BY is an important part of the SQL SELECT statement. But new SQL coders can run into some problems when this clause is used incorrectly. Here’s how to avoid those issues.

You’re learning SQL. You know how to SELECT some data FROM a table and how to filter the data with a WHERE clause. You can process the data using aggregate functions (MIN, MAX, SUM, AVG, and others). But when you’re dealing with a lot of data, you may need to narrow it down even more. That’s where the GROUP BY clause comes in – it lets you organize information based on parameters you set.

In this article, we’ll explain how you can use GROUP BY. We’ll also talk about a couple of possible problems and how to avoid or fix them.

sql cheat sheet

Using GROUP BY

To understand how to use GROUP BY, we first need to define a table to practice on. How about one describing people?

| ID | Name     | Gender | Height | Weight | Eye_color |
--------------------------------------------------------
|  1 | Mark     | Male   |    180 |     78 | Blue      |
|  2 | Susan    | Female |    172 |     59 | Brown     |
|  3 | Thomas   | Male   |    179 |     92 | Brown     |
|  4 | Katarina | Female |    164 |     53 | Green     |
|  5 | Mindy    | Female |    170 |     58 | Blue      |
--------------------------------------------------------

So we have our practice table. Now we get into the statistics. For example, what is the average height of all our people? To find out, we type:

SELECT AVG(Height)
FROM People
+-------------+
| AVG(Height) |
+-------------+
|    173.0000 |
+-------------+

Now suppose we want to know the average height by gender. That appears to be quite easy; we simply add a WHERE clause. So we type:

SELECT AVG(Height)
FROM People
WHERE Gender = ‘Male’

But what if we introduce other genders into our table? In that case, we’d have to write additional queries and manually collect the data we need. It’s easier to GROUP our data BY Gender and then compute the average height for each group, as shown below.

GROUP BY Gender

SELECT Gender, AVG(Height)
FROM People
GROUP BY Gender
+--------+-------------+
| Gender | AVG(Height) |
+--------+-------------+
| Female |    168.6667 |
| Male   |    179.5000 |
+--------+-------------+

Grouping seems easy, right? All you need to do is add the GROUP BY clause to your SQL statement. Suppose, though, we need to have two parameters in our search. In that case, we’d need to group by two columns. Let’s say we want to know how many men and women have blue, brown, or green eyes. We’d type:

SELECT Gender, Eye_color, COUNT(*)
FROM People
GROUP BY Gender, Eye_color
+--------+-----------+----------+
| Gender | Eye_color | COUNT(*) |
+--------+-----------+----------+
| Female | Blue      |        1 |
| Female | Brown     |        1 |
| Female | Green     |        1 |
| Male   | Blue      |        1 |
| Male   | Brown     |        1 |
+--------+-----------+----------+

This is just an introduction to GROUP BY. You can use it in many ways. Why not try experimenting with different aggregate functions (like AVG and COUNT) to get a better grasp of GROUP BY?

standard sql functions

Common GROUP BY Errors

Although GROUP BY seems fairly easy to use, it’s common for SQL newbies to find themselves confronted by some confusing error messages. Here are some we come across quite often:

1. Selecting Multiple Values

The most important thing to remember when using GROUP BY is that whatever you’re going to SELECT must be a single value. That’s why we need to use an aggregate function: it takes multiple values and produces a single one to return.

To make it a bit clearer, let’s take a look at our groups:

+--------+--------+
| Gender | Height |
+--------+--------+
| Male   |    180 |
|        |    179 |
| Female |    172 |
|        |    164 |
|        |    170 |
+--------+--------+

When we ask for Height grouped by Gender, we want to get a single value. But here Male has two values for height and Female has three. Which should we choose?

SELECT Gender, Height
FROM People
GROUP BY Gender;
(MYSQL) ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains 
nonaggregated column 'vertabelo.People.Height' which is not functionally dependent on columns 
in GROUP BY clause;

That’s why we need to use an aggregate function (such as AVG, which we used earlier) to get a single value.

2. Using WHERE to Filter Results

Suppose we only want to see eye-color results for groups of people taller than 170 cm. If we try to put that into the WHERE clause, as shown below:

SELECT Gender, Eye_color, COUNT(*)
FROM People
WHERE AVG(Height) > 170
GROUP BY Gender, Eye_color

We get the error message shown below:

(MYSQL) ERROR 1111 (HY000): Invalid use of group function

That’s because the database is grouping our records after filtering them. We want them to filter the result of the GROUP BY statement. To do this, we use another clause called HAVING. It comes after GROUP BY and works like WHERE. The difference is you can use aggregate functions in it. To get the desired result set, we’d write this code:

SELECT Gender, Eye_color, COUNT(*)
FROM People
GROUP BY Gender, Eye_color
HAVING AVG(Height) > 170

Remember that in the database, an SQL SELECT statement always executes in this order:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY

We take a data source, filter the records, group them, filter the groups, select the columns we want and then sort them.

3. Multiple Entity Errors

This is a little more advanced. When we’ve JOINed several tables, it’s quite possible that some entities will have the same names. So we’ll often want results grouped by entity ID rather than the entity’s name.

For example, suppose you are looking at city data for Warsaw. There’s a Warsaw that’s the capital of Poland and a Warsaw that’s in the state of Indiana, USA.

So let’s say we have a table defined like this:

Table City:
---------------------
| ID | Name | Award | 
---------------------

This table describes cities that were given one or more awards. A city is identified by its ID and it can have many awards.

If we wanted to see the number of awards received by a city and have this information grouped by city name, perhaps we would use this query:

SELECT City.Name, COUNT(Award)
FROM City
GROUP BY City.Name

The problem is that by using City.Name, the awards for Warsaw, Poland and Warsaw, Indiana would be added together. After all, both are Warsaw! However they are different places, and as such have different City.ID values. When you think about a city as a database entity, it is identified by its ID rather than its attributes (like Name). If we group results by ID, we’ll get the correct information. Since we still want to display the name, we’ll use something like this:

SELECT City.Name, COUNT(Award)
FROM City
GROUP BY City.ID

The result will have separate entries for the different Warsaws with the values we want. (Normally, there would be another column, like “country” or “state”, to differentiate between those two cities. But for the sake of example, let’s say there isn’t.)

4. Using Values That Are “Not Aggregated”

In the above example, we’ve selected the City.Name attribute and grouped results by the City.ID attribute. In our table, every record with the same ID also has the same city name. Some databases will have no problem with this and will return the expected results, but others will give you an error saying that City.Name is not in the GROUP BY clause and that it is not aggregated. I’ve reproduced the MySQL error message below:

(MYSQL) ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains 
nonaggregated column 'vertabelo.City.Name' which is not functionally dependent on columns 
in GROUP BY clause;

But it is a single value! How do we fix this error? Well, since one City.ID means exactly one City.Name, we can put both of them in the GROUP BY clause:

SELECT City.Name, COUNT(*)
FROM City
JOIN Person ON (Person.CityID = City.ID)
GROUP BY City.ID, City.Name

This should fix the problem.

Practice Makes Perfect

If you’re still not getting how or when to use the GROUP BY clause or if you want to practice using it, check out our SQL Queries course. It covers all the basics as well as the more advanced stuff, so you’ll have lots of opportunities to perfect your SQL skills.

Junior Technical Writer @ Vertabelo

GET ACCESS TO EXPERT SQL CONTENT!