The JOIN statement lets you work with data stored in multiple tables. This article is a practical introduction to the SQL JOIN.
Imagine if you could only work with one database table at a time. Fortunately, this isn’t anything we have to worry about. Once you learn the
JOIN statement, you can start linking data together. This article will give you examples that illustrate how we use
JOINs, how each type of
JOIN works, and when to use each type. Plus, I’ll share some tips that will help you avoid making common mistakes when using
What is the SQL JOIN Clause?
JOIN clause allows us to combine the columns from two or more tables based on shared column values. Look at the following example. Say we have a
students table that holds student names, their respective usernames, and an ID number. We also have a “comments” table that stores any
comments students have posted in a forum. Here are the two tables.
Let’s add some test data:
INSERT INTO `students` VALUES (1,'Jhon','jj2005'),(2,'Albert','salbert'),(3,'Mathew','powermath'), (4,'Lisa','lisabbc'),(5,'Sandy','imsandyw'),(6,'Tamara','tamy21');
INSERT INTO 'comments' VALUES (1,'jj2005','Awesome!'),(2,'jj2005','This is great :)'), (3,'powermath','Hmmm...'),(4,'imsandyw','Let\'s wait a moment'), (5,'lisabbc','Sure thing'),(6,'lisabbc','wow!'), (7,'lisabbc','lol 🙂 🙂 :)');
|2||jj2005||This is great 🙂|
|4||imsandyw||Let’s wait a moment|
|7||lisabbc||lol 🙂 🙂 🙂|
As you can see, both tables have the
forum_username column in common. Therefore, this column can be used to in a
JOIN statement to relate the two tables together.
For example, if we wanted to know the actual student name for each of the comments in the forum, we would write this JOIN query:
SELECT students.name, comments.forum_username, comments.comment FROM students INNER JOIN comments ON students.forum_username = comments.forum_username ORDER BY students.name ASC;
The results would look like this:
|Jhon||jj2005||This is great 🙂|
|Lisa||lisabbc||lol 🙂 🙂 🙂|
|Sandy||imsandyw||Let’s wait a moment|
Note: I have purposely used
forum_username to illustrate the
JOIN concept, but in practice you would use the primary key (in this case, the
id column from the
students table) to relate tables.
Getting to Know the JOIN Types
There are several types of
JOINs. Let’s review them quickly:
INNER JOIN: This
JOIN returns records that have a match in both tables based on the join predicate (which comes after the
ON keyword). This is the same
JOIN as we used in the preceding example. The
INNER keyword is optional.
LEFT [OUTER] JOIN: This will return all records from the left table (i.e. the table you list first in the
JOIN) and only matching records from the right (i.e. second) table. The
OUTER keyword is optional.
Following the student forum example, this would be:
SELECT students.name, comments.forum_username, comments.comment FROM students LEFT JOIN comments ON students.forum_username = comments.forum_username ORDER BY students.name ASC;
LEFT JOIN would return a record for all students – including Tamara and Albert, who don’t have any comments and wouldn’t be listed in the results of an
INNER JOIN. Notice too that in this
LEFT JOIN example, Tamara’s and Albert’s
comment column results will be NULL.
RIGHT [OUTER] JOIN: This is the inverse of the
LEFT JOIN; it returns all records from the right (second) table and only those that have a match from the left (first) table.
A query similar to the previous example would look like this:
SELECT students.name, comments.forum_username, comments.comment FROM students RIGHT JOIN comments ON students.forum_username = comments.forum_username ORDER BY students.name ASC;
FULL [OUTER] JOIN: This is essentially the combination of a
LEFT JOIN and a
RIGHT JOIN. The result set will include all rows from both tables, populating the columns with table values when possible or with NULLs when there is no match in the counterpart table. This is not a
JOIN you’ll use very often in real life. Note: MySQL lacks this statement, but a similar result can be achieved using the
LEFT JOIN and
Notice that in the following
FULL JOIN example we are supplying the ON keyword as we do in
SELECT * FROM students FULL OUTER JOIN comments ON students.forum_username = comments.forum_username ORDER BY students.name ASC;
CROSS JOIN: This is another join type that you won’t use all that often – in this case, because it retrieves the Cartesian product of both tables. Basically, this gives you the combination of all records from both tables.
CROSS JOIN does not apply a predicate (there’s no
ON keyword), but it’s still possible to filter rows using
WHERE. Doing this could make the result set equivalent to an
INNER JOIN. In MySQL,
CROSS JOIN, and
INNER JOIN are syntactic equivalents, i.e. they can replace each other.
Below, there is no
ON clause to filter the results. All possible combinations from both tables will be shown in the result set:
SELECT * FROM students CROSS JOIN comments ORDER BY students.name ASC;
JOINs in the real time scenarios
The number of scenarios that require a
JOIN is endless, but some scenarios do appear more often. Instead of going through the typical table1/table2 example, I’d rather present you with some realistic cases. We can use these to get some practical tips.
The Grandfather, Father, and Son Relationship
A common scenario deals with data that follows this kind of relationship. For instance, a
user is located in a
city that belongs to a
state. The tables (with more or fewer columns) look something like this:
Let’s add some data:
INSERT INTO `states` VALUES (3,'California'),(2,'Florida'),(1,'Nevada'); INSERT INTO `cities` VALUES (1,1,'Miami'),(2,1,'Orlando'), (3,2,'Las Vegas'),(4,2,'Coyote Springs'); INSERT INTO `users` VALUES (1,1,'Jhon','Doe'),(2,1,'Albert','Thomson'), (3,2,'Robert','Ford'),(4,3,'Samantha','Simpson');
To obtain the full list of users in a given city and state we will need to join the son table (
User) with its father (
City) and grandfather (
SELECT User.first_name, user.last_name, City.cityname, State.statename FROM users User INNER JOIN cities City ON User.city_id = City.id INNER JOIN states State ON City.state_id = State.id;
Already we have a couple of useful tips:
- The columns used to link tables should be indexed for greater performance.
- When the columns that link the tables (like the former example) are pointing to the primary key of the related table, then we are talking about foreign keys. In this case, it is better to include this relationship as part of your table definition; it will increase performance. In MySQL, you can create a users/city foreign key like this:
ALTER TABLE `users` ADD INDEX `fk_city_idx` (`city_id` ASC); ALTER TABLE `users` ADD CONSTRAINT `fk_city` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
This gives you the additional benefit of the integrity check that will be performed by the engine when data in these tables is updated or deleted.
Suppose we want to find all the users in one state. You add a filtering condition to the query, as shown below:
SELECT User.first_name, user.last_name, City.cityname, State.statename FROM users User INNER JOIN cities City ON User.city_id = City.id INNER JOIN states State ON City.state_id = State.id AND State.statename = 'Nevada';
Or you could even use an implicit join (shown in bold text), like this:
SELECT User.first_name, user.last_name, City.cityname, State.statename FROM users User, cities City, states State WHERE User.city_id = City.id AND City.state_id = State.id AND State.statename = 'Nevada';
But I’d suggest you explicitly write the
JOIN and keep the joining criteria and the filtering conditions separate:
SELECT User.first_name, user.last_name, City.cityname, State.statename FROM users User INNER JOIN cities City ON User.city_id = City.id INNER JOIN states State ON City.state_id = State.id WHERE State.statename = 'Nevada';
A couple of important concepts:
- We can see how the son (
user) table is filtered based on the conditions used on the grandfather (
state) table. In other words, the users were based on a given state. Similarly, we could have filtered results based on the father (
city) table and gotten a list of users based on a given city.
- A general rule of thumb is that the
JOINpredicates (the conditions after the
ONkeyword) should be used for the joining relationship only. Leave the rest of the filtering conditions inside the
WHEREsection. This will simplify query readability and future code maintenance.
Conversely, we could return states based on certain criteria. For instance, we could retrieve states that have at least one user belonging to that state:
SELECT DISTINCT State.statename FROM states State INNER JOIN cities City ON City.state_id = State.id INNER JOIN users User ON User.city_id = City.id
What do we learn here?
INNER JOINremoves any “unused” or unmatched records (records without a match on both sides of the
DISTINCTfilters out duplicate records. Since there could be several users for a state, if we didn’t use
DISTINCTwe would get as many repeated states as users belonging to it. (The same filtering effect could also be achieved using
- Further filtering could be accomplished by adding
The Many-to-Many Relationship
Another typical scenario for
JOINs is when records relate to each other in a “many-to-many” or N-to-N manner. Say you have a system where you create badges that are awarded to users. In this case, a user has badges and at the same time a badge has users. These relationships will need a third table that will connect the primary keys from
badges. It would look something like this:
Let’s add some sample data:
INSERT INTO `badges` VALUES (1,'gold heart',100),(2,'silver heart',50),(3,'metal heart',10),(4,'star',5),(5,'wood',2),(6,'dust',1); INSERT INTO `users` VALUES (1,'Robert','Williams'),(2,'Anthony','McPeters'),(3,'Tania','Krugger'),(4,'JJ','Richards'),(5,'Katy','Thomas'); INSERT INTO `badges_users` VALUES (1,1,1),(2,4,1),(3,4,2),(4,4,2),(5,4,2),(6,3,3),(7,3,3),(8,2,4);
How can a
JOIN fetch us all the users with their respective badges?
SELECT User.first_name, User.last_name, BU.user_id, BU.badge_id, Badge.badge_name, Badge.badge_points FROM users User LEFT JOIN badges_users BU ON User.id = BU.user_id LEFT JOIN badges Badge ON BU.badge_id = Badge.id ORDER BY Badge.badge_points DESC
Here are a few things to remember about this type of query:
- We’ve used
LEFT JOINhere on purpose because it will show users that have no badge at all. If we had used an
INNER JOINor an implicit inner join (by setting the ID’s equalities in a
WHERE), then users who have no badges would not be included in the results. If you want to exclude these users, you should use an
- Also, using a
LEFT JOINmeans that unused badges will not be listed; we are focusing on the users and not on the badges.
- Finally, remember to properly index the intermediate table (
badges_users). All its foreign keys should be defined.
Now let’s grab all the badges that have at least one user. Expressed in another way, these are badges that have been used at least once. The query would be:
SELECT DISTINCT Badge.badge_name FROM badges Badge LEFT JOIN badges_users BadgeUser ON Badge.id = BadgeUser.badge_id
And if we wanted to get all unused badges, the query becomes:
SELECT Badge.badge_name, Badge.badge_points FROM badges Badge LEFT JOIN badges_users BadgeUser ON Badge.id = BadgeUser.badge_id WHERE BadgeUser.badge_id IS NULL
- We don’t always need to join all three tables. By only joining with the intermediate table (
badges_users), which holds the references to the
badgetables, we can still successfully run our queries.
- The intermediate table could also be used to save additional information. For instance, it could store the timestamp value when a user was given a certain badge.
Try It Yourself
You will likely daily face these types of situations when dealing with tables that hold related data. I strongly recommend you examine the above examples and test them with the actual data. This will give you a clearer understanding of the concepts involved.
If you have any comments or ideas about practical JOIN scenarios, please feel free to share it so we can all keep learning!