Key Insights for Mastering SQL Queries Using Tinder Data Example

""

Going from zero to one can be daunting in any endeavor. The same is true for learning new programming languages, even simple ones like SQL. In this article, we’ll take a look at some key insights that will help you understand the nuances of SQL queries. If you’ve never used SQL, you’re in the right place.

When learning anything new, you’ll find that there are always some key insights or tips that can help you on your way. Learning SQL queries is no different. Today, we’ll explore some ways you can accelerate your understanding of basic SQL queries, with visualization and analogies as the main drivers of mastery. With the basics covered, we’ll look at some important ways you can expand your SQL knowledge with exercises. This article is intended for complete newbies who have never worked with SQL in their lives.

Laying the groundwork with SELECT

Let’s say you’ve installed Tinder and are looking at the collection of profiles available to you. The value you want to derive is simple: you’re interested in finding a date from among an array of profiles. But there are far too many profiles; you need to refine your search. Perhaps you prefer a tall man or a blue-eyed woman. Whatever your preference, you’ll need to define and refine your interests. The same process is true for the SELECT clause in SQL statements: you want to select some data and filter it by specifying certain criteria.

So if you’d like to translate your Tinder request into a query, you’d write something like:

SELECT NAME, SURNAME FROM TINDER_PROFILES;

The best way to make the syntax stick is to read the statement aloud. Think of it as commanding the database to do something:

There’s a reason why SQL is so intuitive: it’s a fourth-generation programming language (4GL). As such, SQL is really simple to use and understand. If we would like to refine our search, we’d simply need to add a new clause to the statement to form a more complex query:

SELECT NAME, SURNAME FROM TINDER_PROFILES 
WHERE eye_color = ‘blue’ and sex = ‘F’;

In this statement, we’re simply refining our search to reduce the number of profiles we receive from the table named TINDER_PROFILES. How can you describe a database table? Think of it as a collection of every profile out there, where each profile has attributes like name, surname, sex, eye_color, and others. To make the statement more readable, just imagine that the equality sign reads as ‘is’ in the above query.

One thing that confuses newbies is that the attributes we use to filter a table don’t always appear in the select list. But this is perfectly acceptable if you think about it—we’re only interested in selecting certain information, but we also need to apply filters to our search. Whether or not we actually select those filters is up to us.

In JOINs we find strength

Let’s move on to a slightly more difficult way to express ourselves. This time, we’ll use multiple entities—or, as we call them in SQL, tables! Below is an example showing two related tables.


One table contains our friends, and the other table contains their addresses. When we want to command the database to retrieve all of our friends who live in the city of London, we can write something like this:

SELECT NAME, SURNAME FROM FRIENDS 
JOIN ADDRESSES ON (FRIENDS.ADDRESS_ID = ADDRESSES.ID) 
WHERE addressES.CITY = ‘London’

Let’s translate this query to regular English:

The keyword here is JOIN. With a JOIN, we are commanding the database to connect two tables in some manner. The connection is established via identification.

It’s simpler than it sounds. Basically, each friend has an address_id that refers to the city where they live. Each city has its own id. These two attributes are the same, but they’re just named differently and are found in two separate tables.

Bear in mind is that every friend lives only in one city, and each city has a unique id. If we had cities that shared IDs, we would get erroneous result sets in some queries. So let’s say that London is represented by an address_id of 25, and Mumbai has the same ID, 25. Our friend Steve lives at an address with an ID of 25.

The query where we look for our friend’s city is:

SELECT FRIENDS.NAME, ADDRESS.CITY FROM FRIENDS 
JOIN ADDRESSES ON (FRIENDS.ADDRESS_ID = ADDRESSES.ID) WHERE FRIENDS.NAME = ‘Steve’

The result set for the query would look like :

Steve London
Steve Mumbai

As you see in the data name Steve is duplicated to two cities because we have not been diligent and we allowed the table of cities to have two cities with the same ID.

My personal tip is that you should always check the underlying data to see if there are some erroneous rows before joining tables. Checking underlying data will always be part of your routine.

Okay, so we have to be honest with you: structuring our database model like this—with the possibility of identical ids in the same table–is not good SQL database design. If you’re interested, you can learn more about how to create tables in SQL.

This JOIN is an example of an INNER join, but there are many other kinds of SQL JOINs out there.

Unions are a good thing

By now, you should be a little more comfortable with understanding SQL queries that tell a database to return certain data. Now, we’ll look at how you can unify two different result sets. This is not to be confused with JOINing two tables.

Let’s say you have a table that contains all the foods that you like; let’s call this table my_foods. We also have a table for every food that my friend Steve likes, aptly named steve_foods. How could we look at all the foods that we both enjoy? We could write this UNION:

SELECT name FROM my_foods UNION SELECT name from steve_foods;

This query returns food that both you and Steve like. There is one more thing to note here: the resulting names of foods do not contain any duplicates. Food that both you and Steve enjoy, like pizza, will only be listed once in the result set of this query.

If we’d like to know what foods you and Steve both enjoy, we could seek the intersection of the two sets:

SELECT name FROM my_foods 
INTERSECT SELECT name FROM steve_foods;

Here’s a good hack: if we want to list all the foods that you and Steve like, regardless of whether they’re duplicates, we can use the UNION ALL clause, like so:

SELECT name FROM my_foods 
UNION ALL SELECT name FROM steve_foods;

TIP: Always use UNION ALL when possible, as it is much faster than UNION.

Okay, so we have to be honest with you: structuring our database model like this—as two tables, my_foods and steve_foods—is not good SQL database design. If you’re interested, you can learn more about how to create tables in SQL.

Aggregating your knowledge

When you master simple selecting, joining, and unifying, you should move on to learning how you can calculate certain values for entire data sets or groups within these data sets. Calculating a value for a column or data set is known as aggregation, and queries that use aggregation are known as aggregate SQL queries.

Let’s say we want to calculate the average price we spend on different categories of personal items per month. These categories could be food, entertainment, travel, and so on. How would you go about answering this question in the real world?

We would do something like this

  • First, we would log every purchase throughout a given year.
  • In this log, we would write the name of the item, price, category, and time of purchase.
  • We would sum up the prices of all the items within their respective categories.
  • We would then divide the total yearly item expenditure by 12 to get our monthly expenditure.

To get the sum of all the items in a certain category, we would have to GROUP the items BY category.

So the command to the database would be something like this:

	SELECT CATEGORY, SUM (PRICE)/12 AS AVERAGE_EXPENDITURE_OF_CATEGORIES
	FROM EXPENDITURES 
	GROUP BY CATEGORY

We tell the database to select the sum of the PRICE column from the EXPENDITURES table, divide that sum by 12, and name it average_expenditure_of_categories. And we group the results by category. Armed with information about how much we spend in each of these categories, we can now construct a budget.

This special sort of SELECT query—one that calculates values like sums, averages, minimums, and so on for one or more rows—is known as an aggregate query. Aggregation queries are always reductionist in the sense that the data sets they return always have fewer rows than the original table(s). The rows are returned in such a way that there is always one row for the group we are constructing. In our example, that means we have one row per category of item.

It is important to note that we can SELECT only grouped categories and aggregate calculations like SUM, MIN, MAX and MEAN. This usually confuses newbies. A good trick to avoid syntactic errors when writing aggregation queries is to first write the query without the columns in the SELECT clause and to fill them in only after you have constructed the rest of the query.

What next?

In this short article, we looked at the basics of SQL queries and how you can gain insights into mastering SQL. We learned about simple SELECT statements that we can use to select useful information from a table. With JOINs, we expanded our knowledge to include more complex ways to talk to tables. UNIONs are useful for representing and working with our data in set form. With aggregation, we dove into the logic of constructing statements that calculate values over columns.

But this is just the beginning. With these tools, you can start your journey of learning SQL. There are many more usefully features of the language that you can use to solve business problems and pet projects. So open your favorite SQL editor, try out the things we touched on in this article, and continue learning!

Aldo Zelen

Aldo is a data architect with a passion for the cloud. From leading a team of data professionals to coding a data warehouse in the cloud, Aldo has experience with the whole lifecycle of data-intensive projects. Aldo spends his free time forecasting geopolitical events in forecasting tournaments.

comments powered by Disqus

GET ACCESS TO EXPERT SQL CONTENT!