FROM: A Clause with Plenty of Possibilities
Think that FROM only specifies tables in an SQL statement? Think again …
When I first began to work with SQL, I saw the FROM
clause as the simplest part of any query. You’d put only one table there, or may be two or more tables in the case of a join. That was all I thought about the FROM
clause in those days.
As I began to learn more about SQL, I discovered that there are a lot of techniques that can be used in a FROM
. Often, these techniques are the key that unlocks the solution of a complex query.
In this post, we’re going to consider everything that goes into a FROM
clause: one table, multiple tables (in a join), a view, a materialized view, a temporary table, a subquery, or a blend of any of the above.
We will use the following tables, team
and game
, to illustrate our examples.
team
table:
team_id | name | founded |
---|---|---|
1 | Lions | 01-01-1900 |
2 | Penguins | 04-01-1989 |
3 | Blackbirds | 30-12-1950 |
game
table:
team_1 | team_2 | match_day | stadium | points_1 | points_2 |
---|---|---|---|---|---|
1 | 2 | 05-03-2017 | PenguinHouse | 2 | 1 |
1 | 3 | 12-03-2017 | SuperArena | 1 | 0 |
2 | 3 | 19-03-2017 | SuperArena | 1 | 1 |
The Simplest FROM: Just One Table
The simplest way to use the FROM
clause is to put just one table in it. Suppose we want to obtain all the team names from the above table. We can use the following simple query:
SELECT name FROM team
RESULTS
Lions Penguins Blackbirds
Multiple Tables: A Join
We can also have more than one table in the FROM
clause. When we need to read data from two or more tables, we join these tables by listing them in the FROM
:
FROM team, game
A join is a really important operation in SQL and it is one of the building blocks of the relational database model. Please note that when join tables we also need to include in a WHERE
or ON
clause to associate records from one table with their corresponding records in another table. In our example, we use the WHERE
clause to join records from the team
and game
tables. This specifies a join based on the values in the team_id
and team_1
columns:
WHERE team.team_id = game.team_1
If we put two tables in the FROM
clause and we forget the join condition, we have a problem. The query will return a result set called a Cartesian product, which includes all the possible combinations of all matching records in both tables. Cartesian products should almost always be avoided, although under some specific situations they can be necessary.
Here is an example of a proper join. Notice the WHERE
clause:
SELECT * FROM team, game WHERE team.team_id = game.team_1
RESULTS
team_id name founded team_1 team_2 match_day stadium points_1 points_2 1 Lions 1900-01-01 1 2 2017-03-05 PenguinHouse 2 1 1 Lions 1900-01-01 1 3 2017-03-12 SuperArena 1 0 1 Lions 1900-01-01 1 2 2017-03-05 PenguinHouse 2 1
You can also use an alternative syntax to join tables. It looks different, but it returns the same results. This alternative syntax uses two reserved words: JOIN
and ON
. Here is the same query presented in the alternative syntax:
SELECT * FROM team JOIN game ON team.team_id = game.team_1
Both syntaxes are equivalent in terms of results. However, using the JOIN
clause gives you some extra options, like using an OUTER
clause to manage rows that have no corresponding records.
The Cartesian Product
Let’s go back to the Cartesian product. When would we want this in our result set? Suppose we are planning a tournament where every team must play against every other team. We can obtain a list of all possible matches (a Cartesian product) with the following SQL:
SELECT t1.name, t2.name FROM team t1, team t2
RESULTS
Lions Lions Lions Penguins Lions Blackbirds Blackbirds Lions Blackbirds Penguins Blackbirds Blackbirds Penguins Lions Penguins Penguins Penguins Blackbirds
This example is a pure Cartesian product. However, there are some pairs that we don’t want in the result – e.g. having a team play against itself (Lions vs. Lions) or having a redundant match (Lions vs. Penguins and Penguins vs. Lions). To exclude these pairs, we add a condition to the WHERE
clause:
SELECT t1.name, t2.name FROM team t1, team t2 WHERE t1.name > t2.name
RESULTS
Lions Blackbirds Penguins Blackbirds Penguins Lions
The condition in the WHERE
clause discards pairs of teams playing against itself or for duplicate matches to appear in our list.
A Self Join
A self join is a join where both tables are the same, mainly when a table have a foreign key pointing to the same table. Self joins can also be made by joining one table to itself. For example, you may use a self join to compare two columns in the same table against each other.
Suppose we have a table called employee
with columns called employee_id
and manager_id
. The following SQL uses a self join to compare these columns:
SELECT E1.name as employee, E2.name as boss FROM employee E1, employee E2 WHERE E1.manager_id = E2.employee_id
E1
and E2
are aliases for the employee
table. SQL does not allow you to use the same table name more than once per statement, so you must use table aliases to get around that rule. In this example, SQL is treating the employee
table as two different tables. The aliases differentiate between the “two tables”.
Using a View in the FROM Clause
Another option is to use a named view in the FROM
clause. A view appears to be a table but is not. A view is part of a table – it is actually a subset of records displayed by a SELECT
– but it doesn’t store records. As far as the FROM
clause is concerned, a view is treated like a table.
Let’s see an example. Suppose we need a view that shows any teams that have never lost a game. We will create a view titled invictus
to select the appropriate teams. The code is:
CREATE VIEW invictus AS SELECT name FROM team T WHERE NOT EXISTS ( SELECT * FROM game G WHERE (G.team_1 = T.team_id AND G.points_1 < G.points_2) OR (G.team_2 = T.team_id AND G.points_1 > G.points_2) )
Now we can use the view to obtain the unbeaten teams:
SELECT * FROM invictus
RESULTS
Lions
Using a Materialized View in the FROM Clause
A materialized view is similar to a view; it is also created using SELECT
statements on base tables. However, materialized views can store records. These records can be obtained or calculated from regular tables or even other views, so materialized views introduce a kind of redundancy. The records in materialized views are synchronized with the relevant base tables using the REFRESH
statement.
We can use a materialized view in the FROM
clause, just as we did in the previous section. Let’s create a materialized view called mvw_results
using the columns name
and match_day
. We will create a new column, result
, which will store a “WON”, “TIE” or “LOST” value for every game played by the team.
CREATE MATERIALIZED VIEW mvw_results AS -- won matches SELECT team_1 as team_id, match_day, ‘WON’ FROM game WHERE points_1 > points_2 UNION SELECT team_2 as team_id, match_day, ‘WON’ FROM game WHERE points_2 > points_2 -- tied matches UNION SELECT team_1 as team_id, match_day, ‘TIE’ FROM game WHERE points_1 = points_2 UNION SELECT team_2 as team_id, match_day, ‘TIE’ FROM game WHERE points_2 = points_2 -- tied matches UNION SELECT team_2 as team_id, match_day, ‘LOST’ FROM game WHERE points_1 > points_2 UNION SELECT team_1 as team_id, match_day, ‘LOST’ FROM game WHERE points_2 > points_2
Using a Subquery in the FROM Clause
We can also enclose a SELECT
in parenthesis, assign a name to it (thus making it a named subquery), and put it in the FROM
clause. The database will treat it as a regular table.
Let’s see an example. Suppose we want to rank every team in the tournament. To do this, we need to calculate how many points each team has. Two points are awarded for each win and one for each tie.
First, we will create two subqueries: one called q_won
that uses the columns name of team
and quantity of won matches
, and another called q_tied
with the same columns. Let’s use the previous materialized view when we create the two subqueries in the FROM
clause:
SELECT T.name, (COALESCE(sq_won.quantity_won,0) * 2 + COALESCE(sq_tie.quantity_tied,0) * 1 ) points FROM (team T LEFT OUTER JOIN (SELECT team_id , count(*) quantity_won FROM mvw_results WHERE result ='WON' GROUP BY team_id) AS sq_won ON T.team_id = sq_won.team_id ) LEFT OUTER JOIN ( SELECT team_id , count(*) quantity_tied FROM mvw_results WHERE result ='TIE' GROUP BY team_id) AS sq_tie ON T.team_id = sq_tie.team_id ORDER BY points DESC
Using A Temporary Table in the FROM Clause
The last option we will review is the use of a temporary table in the FROM
clause. A temporary table is created during the current Postgres session; it belongs to the session, and other sessions cannot access it. Moreover, other sessions can have a temporary table with the same name.
Temporary tables are destroyed when the session ends or when a DROP TABLE
statement is executed. In the following example, we will do the same query as we did with the subquery example. The difference is that we will use two temporary tables instead of the subqueries. Here is the code:
CREATE TEMPORARY TABLE sq_won AS SELECT team_id , count(*) quantity_won FROM mvw_results WHERE result ='WON' GROUP BY team_id; CREATE TEMPORARY TABLE sq_tie AS SELECT team_id , count(*) quantity_tied FROM mvw_results WHERE result ='TIE' GROUP BY team_id; SELECT T.name, (coalesce(sq_won.quantity_won,0) * 2 + coalesce(sq_tie.quantity_tied,0) * 1 ) points FROM (team T LEFT OUTER JOIN sq_won ON T.team_id = sq_won.team_id ) LEFT OUTER JOIN sq_tie ON T.team_id = sq_tie.team_id ORDER BY points DESC;
So, we did the same query using subqueries and using temporary tables. Is there any difference between these approaches? Yes. We can create indexes on temporary tables; if we need to improve performance, the temporary table approach enables that.
Try It Yourself
In this article, we saw the many things we can use in the FROM
clause: one table, joined tables, views, materialized views, temporary tables, and subqueries. SQL is flexible and allows us to combine all these elements in a single FROM
.
SQL offers several ways to write the same query. As a challenge to the reader, I’m going to mention that it is possible to obtain a materialized view using a CASE
statement instead of a UNION
. How would you do this?
For more SQL information, check out Vertabelo SQL Academy. It’s free to try and covers lots of good stuff!