Happy Thanksgiving! Using SQL to Prepare a Traditional Menu
Thanksgiving is a holiday many Americans celebrate with their families and friends, seated around a table loaded with a hearty feast. The most prominent of all dishes is, of course, the turkey. However, at least a few other specialties are also cooked for Thanksgiving. Do you find it hard to choose what to prepare? If so, we’ve got a solution for you: a special SQL query that will help you pick the best dishes.
Gathering Our Thanksgiving Recipes
Before we move on to write our SQL query, we should quickly store any Thanksgiving recipes we’ve found in a database table. Take a look at the picture below representing a sample set of dishes. By the end of this article, we will have picked three of them to place on our dinner table.
The food
table below contains a list of potential Thanksgiving dishes:
id | name |
---|---|
2 | cranberry sauce |
3 | cornbread stuffing |
4 | mashed potatoes |
5 | mushroom gravy |
6 | sweet potato pie |
7 | marshmallows |
8 | green bean casserole |
9 | pumpkin pie |
10 | mincemeat pie |
As you can tell, the table stores the identifiers and names of the various dishes we nominated as candidates. Since our data are already assembled, let’s go ahead and begin writing our SQL query.
SQL to the Rescue: Choosing Thanksgiving Meals with Queries
It’s quite a difficult task to choose three dishes from the above menu (especially on an empty stomach). To make our lives easier, we’ll use an SQL query that will help us choose three random options that we’ll cook on Thanksgiving Day.
Choosing random IDs
We’ll start with a simple query that we will gradually expand. Take a look at the following line of code:
SELECT random() AS random_value;
Here’s the result of running that query:
random_value |
---|
0.94023854797706 |
The query does only one thing—it calls the random function, which is made available by the PostgreSQL database server. We store this value in a table with a single column that is renamed to random_value
. Other servers may use different names for the random function, but they will all return similar results. The table below presents a summary of the most important database systems and their random number generator names:
database | name of function |
---|---|
Oracle | DBMS_RANDOM.RANDOM |
SQL Server | RAND |
PostgreSQL | RANDOM |
MySQL | RAND |
IBM DB2 | RAND |
More specifically, the function random returns a floating-point number that is greater than or equal to 0 and less than 1. In this case, the query returned the number 0.94023854797706.
The identifiers in the food table are consecutive numbers ranging from 2 to 10, inclusive. However, the value returned by our above query is less than 1. This is a problem. If we’d like to obtain random numbers from a different range, we need to adjust the value returned by the random function.
Take a look at the updated query:
SELECT random() * (10 – 2 + 1) + 2 AS random_value;
Here’s the result of running that query:
random_value |
---|
2.11618995852768 |
Notice that we multiplied the value returned by the function random (something in the range 0–1) by the maximum value in the set of dish identifiers (10) minus the lowest identifier value (2), plus 1. We then also added the minimum value of 2 after all these calculations.
Why does this logic work? Briefly, the expression max - min + 1
evaluates to the number 9 in our case. Since random()
returns values in the range [0, 1), where 0 is inclusive and 1 is exclusive, multiplying this range by 9 simply changes the returned value so it falls in the range [0, 9). Finally, when we add 2 to the end, we update the range to [2, 11), where 11 is once again exclusive. As a result, the value returned by the query above will fall in the range of numbers from 2 to less than 11.
The table below presents an explanation of the step-by-step calculations performed to arrive at the appropriate number range.
random() | 0<=…<1 | 0<=…<1 |
random()•10 | 0•10<=…<1•10 | 0<=…<10 |
random()•(10+1) | 0•11<=…<1•11 | 0<=…<11 |
random()•(10+1)+2 | 0•11+2<=…<1•11+2 | 2<=…<13 |
random()•(10-2+1)+2 | 0•9+2<=…<1•9+2 | 2<=…<11 |
However, the query still has an obvious problem. Dish identifiers are integers, but our query returns decimal numbers! To round our results to integers, we’ll use the function floor
, which takes a number and rounds it to the greatest integer less than or equal to its argument (in other words, traveling leftward along the number line, starting at the argument, the function returns the first integer it encounters on its path). We pass in the number we obtain from the function random as the argument to the function floor
.
Here’s our updated query, implementing the logic discussed above:
SELECT floor( random() * (10 – 2 + 1) + 2 ) AS random_id ;
Here’s the result of running that query:
random_id |
---|
9 |
Here, the query returned a random identifier of 9. Mathematically, the query is guaranteed to return values in the inclusive range of [2, 10] and to never return anything outside this range.
Returning multiple random identifiers
So far, our simple query returns a single random integer in the inclusive range from 2 to 10. However, we want to cook three dishes, not just one, so we need to edit our query once again. Take a look at the following modified version:
SELECT floor( random() * (10 – 2 + 1) + 2 ) AS random_id FROM generate_series(1,9) ;
Here’s the result of running that query:
random_id |
---|
3 |
4 |
6 |
6 |
5 |
2 |
3 |
8 |
2 |
This time around, we used the function generate_series
, which is available in PostgreSQL. The first parameter is the initial value; the second parameter is the final value. The generate_series
function returns 9 rows in this case, and the entire query returns just as many random numbers. So, in short, the query generates nine random integers ranging from 2 to 10.
Returning unique identifiers
The query still does not fulfill our requirements because it does not account for duplicates. Our task is to prepare unique dishes. In order to strictly retrieve unique identifiers, we will need to use the DISTINCT
clause. Take a look at this modified query:
SELECT DISTINCT floor( random() * (10 – 2 + 1) + 2 ) AS random_id FROM generate_series(1,9) ;
Here’s the result of running that query:
random_id |
---|
10 |
6 |
8 |
4 |
2 |
3 |
7 |
Notice that DISTINCT
removed all repeated rows, returning only the seven rows containing unique values. This of course means that there will be fewer numbers in the resulting set, but they will at least be unique. This step is essential, since we need unique dishes for our Thanksgiving Day meal.
There’s one thing we should note: due to the unexpected nature of random number generation, it’s possible that the DISTINCT
clause could return just a single row if all numbers happened to be duplicates (though this is highly unlikely). On the off chance that this does happen, all you need to do is simply run the query again.
Finally, we’d now like to retrieve only three rows from the resulting table. To that end, we will use the LIMIT
clause:
SELECT DISTINCT floor( random() * (10 – 2 + 1) + 2 ) AS random_id FROM generate_series(1,9) LIMIT 3;
Here’s the result of running that query:
random_id |
---|
10 |
9 |
2 |
Awesome! This time around, our query returned three unique dish identifiers.
As the name suggests, the LIMIT
clause limits the number of rows that are returned by a particular query. It is unique to PostgreSQL. In other database systems, limiting the number of rows may be achieved differently. Take a look at the summary table below:
database | keyword |
---|---|
Oracle | ROWNUM |
SQL Server | TOP |
PostgreSQL | LIMIT |
MySQL | LIMIT |
IBM DB2 | FETCH FIRST ROWS |
Returning three random dish names
We’re just one step away from finalizing our query. Instead of identifiers, we want dish names. The new query looks like this:
SELECT name FROM food JOIN ( SELECT DISTINCT floor( random() * (10 – 2 + 1) + 2 ) AS random_id FROM generate_series(1,9) LIMIT 3 ) AS ids ON ids.random_id = food.id ;
Here’s the result of running that query:
name |
---|
mincemeat pie |
pumpkin pie |
cranberry sauce |
Above, we treated the previous query as a subquery that returns a set of three unique identifiers. Then, using an inner join, we joined the food
table with the generated identifiers. We then compared the generated IDs with the dish IDs in the joining condition. In the outer query, we selected only those dish names whose IDs matched the random identifiers generated by the subquery.
Great! We’re all set. In addition to the turkey, we’ll prepare the following dishes for our Thanksgiving Day meal: mincemeat pie, pumpkin pie, and cranberry sauce.
If you’re interested in learning more about SQL and its applications, Vertabelo Academy is a great place for beginners. Check out our interactive SQL Basics course today to get started!