Learning SQL JOINs Using Real-Life Situations

sql join in real-life situations

The JOIN statement lets you work with data stored in multiple tables. This article is a practical introduction to the SQL JOIN.

Imagine if you could only work with one database table at a time. Fortunately, this isn’t anything we have to worry about. Once you learn the JOIN statement, you can start linking data together. This article will give you examples that illustrate how we use JOINs, how each type of JOIN works, and when to use each type. Plus, I’ll share some tips that will help you avoid making common mistakes when using JOINs.

What is the SQL JOIN Clause?

The JOIN clause allows us to combine the columns from two or more tables based on shared column values. Look at the following example. Say we have a students table that holds student names, their respective usernames, and an ID number. We also have a “comments” table that stores any comments students have posted in a forum. Here are the two tables.

Let’s add some test data:

INSERT INTO `students` VALUES
(1,'Jhon','jj2005'),(2,'Albert','salbert'),(3,'Mathew','powermath'),
(4,'Lisa','lisabbc'),(5,'Sandy','imsandyw'),(6,'Tamara','tamy21');
id name forum_username
1 Jhon jj2005
2 Albert salbert
3 Mathew powermath
4 Lisa lisabbc
5 Sandy imsandyw
6 Tamara tamy21
INSERT INTO 'comments' VALUES 
(1,'jj2005','Awesome!'),(2,'jj2005','This is great :)'),
(3,'powermath','Hmmm...'),(4,'imsandyw','Let\'s wait a moment'),
(5,'lisabbc','Sure thing'),(6,'lisabbc','wow!'),
(7,'lisabbc','lol 🙂 🙂 :)');
id forum_username comment
1 jj2005 Awesome!
2 jj2005 This is great 🙂
3 powermath Hmmm…
4 imsandyw Let’s wait a moment
5 lisabbc Sure thing
6 lisabbc wow!
7 lisabbc lol 🙂 🙂 🙂

As you can see, both tables have the forum_username column in common. Therefore, this column can be used to in a JOIN statement to relate the two tables together.

For example, if we wanted to know the actual student name for each of the comments in the forum, we would write this JOIN query:

SELECT 
    students.name, comments.forum_username, comments.comment
FROM
   students 
       INNER JOIN
   comments ON students.forum_username = comments.forum_username
ORDER BY students.name ASC;

The results would look like this:

name forum_username comment
Jhon jj2005 Awesome!
Jhon jj2005 This is great 🙂
Lisa lisabbc lol 🙂 🙂 🙂
Lisa lisabbc wow!
Lisa lisabbc Sure thing
Mathew powermath Hmmm…
Sandy imsandyw Let’s wait a moment

Note: I have purposely used forum_username to illustrate the JOIN concept, but in practice you would use the primary key (in this case, the id column from the students table) to relate tables.

Getting to Know the JOIN Types

There are several types of JOINs. Let’s review them quickly:

INNER JOIN: This JOIN returns records that have a match in both tables based on the join predicate (which comes after the ON keyword). This is the same JOIN as we used in the preceding example. The INNER keyword is optional.

LEFT [OUTER] JOIN: This will return all records from the left table (i.e. the table you list first in the JOIN) and only matching records from the right (i.e. second) table. The OUTER keyword is optional.

Following the student forum example, this would be:

SELECT 
    students.name, comments.forum_username, comments.comment
FROM
   students 
        LEFT JOIN
   comments ON students.forum_username = comments.forum_username
ORDER BY students.name ASC;

This LEFT JOIN would return a record for all students – including Tamara and Albert, who don’t have any comments and wouldn’t be listed in the results of an INNER JOIN. Notice too that in this LEFT JOIN example, Tamara’s and Albert’s comment column results will be NULL.

RIGHT [OUTER] JOIN: This is the inverse of the LEFT JOIN; it returns all records from the right (second) table and only those that have a match from the left (first) table.

A query similar to the previous example would look like this:

SELECT 
    students.name, comments.forum_username, comments.comment
FROM
   students 
        RIGHT JOIN
   comments ON students.forum_username = comments.forum_username
ORDER BY students.name ASC;

FULL [OUTER] JOIN: This is essentially the combination of a LEFT JOIN and a RIGHT JOIN. The result set will include all rows from both tables, populating the columns with table values when possible or with NULLs when there is no match in the counterpart table. This is not a JOIN you’ll use very often in real life. Note: MySQL lacks this statement, but a similar result can be achieved using the UNION of LEFT JOIN and RIGHT JOIN.

Notice that in the following FULL JOIN example we are supplying the ON keyword as we do in LEFT or RIGHT JOINs:

SELECT 
    *
FROM
   students 
        FULL OUTER JOIN
   comments ON students.forum_username = comments.forum_username
ORDER BY students.name ASC;

CROSS JOIN: This is another join type that you won’t use all that often – in this case, because it retrieves the Cartesian product of both tables. Basically, this gives you the combination of all records from both tables. CROSS JOIN does not apply a predicate (there’s no ON keyword), but it’s still possible to filter rows using WHERE. Doing this could make the result set equivalent to an INNER JOIN. In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents, i.e. they can replace each other.

Below, there is no ON clause to filter the results. All possible combinations from both tables will be shown in the result set:

SELECT 
    *
FROM
   students 
    CROSS JOIN
   comments 
ORDER BY students.name ASC;

JOINs in the Real World

The number of scenarios that require a JOIN is endless, but some scenarios do appear more often. Instead of going through the typical table1/table2 example, I’d rather present you with some realistic cases. We can use these to get some practical tips.

The Grandfather, Father, and Son Relationship

A common scenario deals with data that follows this kind of relationship. For instance, a user is located in a city that belongs to a state. The tables (with more or fewer columns) look something like this:

Let’s add some data:

INSERT INTO `states` VALUES (3,'California'),(2,'Florida'),(1,'Nevada');

INSERT INTO `cities` VALUES (1,1,'Miami'),(2,1,'Orlando'),
(3,2,'Las Vegas'),(4,2,'Coyote Springs');

INSERT INTO `users` VALUES (1,1,'Jhon','Doe'),(2,1,'Albert','Thomson'),
(3,2,'Robert','Ford'),(4,3,'Samantha','Simpson');

To obtain the full list of users in a given city and state we will need to join the son table (User) with its father (City) and grandfather (State).

SELECT 
    User.first_name, user.last_name, City.cityname, State.statename
FROM
   users User
        INNER JOIN
   cities City ON User.city_id = City.id
        INNER JOIN
   states State ON City.state_id = State.id;

Already we have a couple of useful tips:

  • The columns used to link tables should be indexed for greater performance.
  • When the columns that link the tables (like the former example) are pointing to the primary key of the related table, then we are talking about foreign keys. In this case, it is better to include this relationship as part of your table definition; it will increase performance. In MySQL, you can create a users/city foreign key like this:
    ALTER TABLE `users` 
    ADD INDEX `fk_city_idx` (`city_id` ASC);
    ALTER TABLE `users` 
    ADD CONSTRAINT `fk_city`
      FOREIGN KEY (`city_id`)
      REFERENCES `cities` (`id`)
      ON DELETE CASCADE
      ON UPDATE CASCADE;
    

    This gives you the additional benefit of the integrity check that will be performed by the engine when data in these tables is updated or deleted.

Suppose we want to find all the users in one state. You add a filtering condition to the query, as shown below:

SELECT 
    User.first_name, user.last_name, City.cityname, State.statename
FROM
   users User
        INNER JOIN
   cities City ON User.city_id = City.id
        INNER JOIN
   states State ON City.state_id = State.id
        AND State.statename = 'Nevada';

Or you could even use an implicit join (shown in bold text), like this:

SELECT 
    User.first_name, user.last_name, City.cityname, State.statename
FROM
   users User,
   cities City,
   states State
WHERE
    User.city_id = City.id
       AND City.state_id = State.id
       AND State.statename = 'Nevada';

But I’d suggest you explicitly write the JOIN and keep the joining criteria and the filtering conditions separate:

SELECT 
    User.first_name,    user.last_name,    City.cityname,    State.statename
FROM
   users User
        INNER JOIN
   cities City ON User.city_id = City.id
        INNER JOIN
   states State ON City.state_id = State.id
WHERE
    State.statename = 'Nevada';

A couple of important concepts:

  • We can see how the son (user) table is filtered based on the conditions used on the grandfather (state) table. In other words, the users were based on a given state. Similarly, we could have filtered results based on the father (city) table and gotten a list of users based on a given city.
  • A general rule of thumb is that the JOIN predicates (the conditions after the ON keyword) should be used for the joining relationship only. Leave the rest of the filtering conditions inside the WHERE section. This will simplify query readability and future code maintenance.

Conversely, we could return states based on certain criteria. For instance, we could retrieve states that have at least one user belonging to that state:

SELECT 
    DISTINCT State.statename
FROM
   states State
        INNER JOIN
   cities City ON City.state_id = State.id
        INNER JOIN
   users User ON User.city_id = City.id

What do we learn here?

  • The INNER JOIN removes any “unused” or unmatched records (records without a match on both sides of the JOIN).
  • DISTINCT filters out duplicate records. Since there could be several users for a state, if we didn’t use DISTINCT we would get as many repeated states as users belonging to it. (The same filtering effect could also be achieved using GROUP BY.)
  • Further filtering could be accomplished by adding WHERE conditions.

The Many-to-Many Relationship

Another typical scenario for JOINs is when records relate to each other in a “many-to-many” or N-to-N manner. Say you have a system where you create badges that are awarded to users. In this case, a user has badges and at the same time a badge has users. These relationships will need a third table that will connect the primary keys from users and badges. It would look something like this:

Let’s add some sample data:

INSERT INTO `badges` VALUES 
(1,'gold heart',100),(2,'silver heart',50),(3,'metal heart',10),(4,'star',5),(5,'wood',2),(6,'dust',1);
INSERT INTO `users` VALUES (1,'Robert','Williams'),(2,'Anthony','McPeters'),(3,'Tania','Krugger'),(4,'JJ','Richards'),(5,'Katy','Thomas');
INSERT INTO `badges_users` VALUES
(1,1,1),(2,4,1),(3,4,2),(4,4,2),(5,4,2),(6,3,3),(7,3,3),(8,2,4);

How can a JOIN fetch us all the users with their respective badges?

SELECT 
    User.first_name,
    User.last_name,
    BU.user_id,
    BU.badge_id,
    Badge.badge_name,
    Badge.badge_points
FROM
   users User
        LEFT JOIN
   badges_users BU ON User.id = BU.user_id
        LEFT JOIN
   badges Badge ON BU.badge_id = Badge.id
ORDER BY Badge.badge_points DESC

Here are a few things to remember about this type of query:

  • We’ve used LEFT JOIN here on purpose because it will show users that have no badge at all. If we had used an INNER JOIN or an implicit inner join (by setting the ID’s equalities in a WHERE), then users who have no badges would not be included in the results. If you want to exclude these users, you should use an INNER JOIN.
  • Also, using a LEFT JOIN means that unused badges will not be listed; we are focusing on the users and not on the badges.
  • Finally, remember to properly index the intermediate table (badges_users). All its foreign keys should be defined.

Now let’s grab all the badges that have at least one user. Expressed in another way, these are badges that have been used at least once. The query would be:

SELECT 
    DISTINCT Badge.badge_name
FROM
   badges Badge
        LEFT JOIN
   badges_users BadgeUser ON Badge.id = BadgeUser.badge_id

And if we wanted to get all unused badges, the query becomes:

SELECT 
    Badge.badge_name, Badge.badge_points
FROM
   badges Badge
        LEFT JOIN
   badges_users BadgeUser ON Badge.id = BadgeUser.badge_id
WHERE
    BadgeUser.badge_id IS NULL

Notice that:

  • We don’t always need to join all three tables. By only joining with the intermediate table (badges_users), which holds the references to the user and badge tables, we can still successfully run our queries.
  • The intermediate table could also be used to save additional information. For instance, it could store the timestamp value when a user was given a certain badge.

Try It Yourself

You will likely daily face these types of situations when dealing with tables that hold related data. I strongly recommend you examine the above examples and test them with the actual data. This will give you a clearer understanding of the concepts involved.

If you have any comments or ideas about practical JOIN scenarios, please feel free to share it so we can all keep learning!

Engineer @ Axones

GET ACCESS TO EXPERT SQL CONTENT!