Learn how to extract data from strings in PostgreSQL using the split_part function.
Quite often, we’d like to extract parts of a string when working with text values. A common example is when we have a full name and need to retrieve only the last name. In this article, we’ll examine
split_part, a PostgreSQL string-related function that can be used to extract a substring.
Why Use String Functions in Your Database?
There are many reasons why you’d want to use SQL string functions in your database, but the main reason is to address common data quality issues. In another article titled “How to Deal with Unwanted Characters in SQL”, we took a look at how common SQL string functions can help us control the quality of our data. In this article, we’ll mainly focus on extracting parts of a string for further analysis.
For a better understanding, we’ll work with a table with sample health care data created by a doctor who stores all his patient information in this single table. The table contains the patient’s full name, the date of their visit, the doctor’s diagnosis, the treatment the doctor suggested, and any drugs the doctor prescribed.
Every time a patient visits his office, the doctor creates a new record and manually types his notes into the database. We’ll mainly focus on the
drugs_in_receipt column. If we take a look at the database below, we can see a list comma-separated drugs under that column.
|John Smith||’03/04/2016′||Flu||rest||“aspirin, paracetamol”|
|Mary Deep||’10/11/2016′||Food Poisoning||Rest & diet||“penicillin,paracetamol”|
|Agnes Jason||’03/12/2016′||flu||rest||“aspirin, paracetamol”|
|Johnny SMITH||’15/03/2017′||Food Poisoned||Rest & diet||“penicillin,paracetamol”|
|AGNES Jason||’19/072017′||angina||Rest and don’t speak||“amoxicillin”|
Extracting a Substring from a String
Suppose the first drug in each list is the main drug the patient will need to take, and our doctor wants to extract the main drug from each list under the
drugs_in_receipt column. This field stores a list of drugs prescribed by the doctor for each case. As we already mentioned, these drug names are comma separated, like “aspirin, paracetamol”.
We’ll use the PostgreSQL
split_part function to extract only the first drug name from each record. Here is the query we’ll use:
SELECT split_part(drugs_in_receipt, ',',1) as first_medicine FROM patient_data
split_part function takes three arguments. The first is the name of the field—in this case,
drugs_in_receipt. The second argument is the delimiter (a comma), and the third is the position of the desired element. In our example, we want the first element of the list, so we pass in
1 as the third argument.
If we take a close look at the result, we can see an opening double quote before each drug name. How can we remove this? In one of our other articles, we learned how you can use the SQL TRIM function to remove unwanted characters. For our purposes, we’ll specify what we want
trim to remove—the double quotes.
SELECT trim( split_part( drugs_in_receipt , ',' , 1) , ' “” ') FROM patient_data
There! That was pretty easy, wasn’t it?
Learn More About SQL String Functions
split_part function is one of the most important tools for manipulating string entries in your database. But it’s only one among many other useful functions you should learn if you plan to master SQL. To that end, be sure to check out our Standard SQL Functions course. Try it for free today!