Extracting Data From a String in PostgreSQL

how to do data extraction, data extraction example, data extraction techniques, data extraction tools, data extraction systematic review, data extraction in data mining

Learn how to extract data from strings in PostgreSQL using the split_part function.

Quite often, we’d like to extract parts of a string when working with text values. A common example is when we have a full name and need to retrieve only the last name. In this article, we’ll examine split_part, a PostgreSQL string-related function that can be used to extract a substring.

Why Use String Functions in Your Database?

There are many reasons why you’d want to use SQL string functions in your database, but the main reason is to address common data quality issues. In another article titled “How to Deal with Unwanted Characters in SQL”, we took a look at how common SQL string functions can help us control the quality of our data. In this article, we’ll mainly focus on extracting parts of a string for further analysis.

For a better understanding, we’ll work with a table with sample health care data created by a doctor who stores all his patient information in this single table. The table contains the patient’s full name, the date of their visit, the doctor’s diagnosis, the treatment the doctor suggested, and any drugs the doctor prescribed.

Every time a patient visits his office, the doctor creates a new record and manually types his notes into the database. We’ll mainly focus on the drugs_in_receipt column. If we take a look at the database below, we can see a list comma-separated drugs under that column.

Full_Name date diagnostic treatment Drugs_in_receipt
John Smith ’03/04/2016′ Flu rest “aspirin, paracetamol”
Mary Deep ’10/11/2016′ Food Poisoning Rest & diet “penicillin,paracetamol”
Agnes Jason ’03/12/2016′ flu rest “aspirin, paracetamol”
Johnny SMITH ’15/03/2017′ Food Poisoned Rest & diet “penicillin,paracetamol”
AGNES Jason ’19/072017′ angina Rest and don’t speak “amoxicillin”
Peter Duckerz ’10/10/2017′ Flu Rest “paracetamol”

Extracting a Substring from a String

Suppose the first drug in each list is the main drug the patient will need to take, and our doctor wants to extract the main drug from each list under the drugs_in_receipt column. This field stores a list of drugs prescribed by the doctor for each case. As we already mentioned, these drug names are comma separated, like “aspirin, paracetamol”.

We’ll use the PostgreSQL split_part function to extract only the first drug name from each record. Here is the query we’ll use:

SELECT split_part(drugs_in_receipt, ',',1) as first_medicine
FROM  patient_data

The split_part function takes three arguments. The first is the name of the field—in this case, drugs_in_receipt. The second argument is the delimiter (a comma), and the third is the position of the desired element. In our example, we want the first element of the list, so we pass in 1 as the third argument.

first_medicine
“aspirin
“penicillin
“aspirin
“penicillin
“amoxicillin”
“paracetamol”

If we take a close look at the result, we can see an opening double quote before each drug name. How can we remove this? In one of our other articles, we learned how you can use the SQL TRIM function to remove unwanted characters. For our purposes, we’ll specify what we want trim to remove—the double quotes.

SELECT trim( split_part( drugs_in_receipt , ',' , 1) , ' “” ')
FROM  patient_data
first_medicine
aspirin
penicillin
aspirin
penicillin
amoxicillin
paracetamol

There! That was pretty easy, wasn’t it?

Learn More About SQL String Functions

The PostgreSQL split_part function is one of the most important tools for manipulating string entries in your database. But it’s only one among many other useful functions you should learn if you plan to master SQL. To that end, be sure to check out our Standard SQL Functions course. Try it for free today!

Ignacio is a database consultant from Buenos Aires, Argentina. He’s worked for 15 years as a database consultant for IT companies like Informix and IBM. These days, he teaches databases at Sarmiento University and works as a PostgreSQL independent SQL consultant. A proud father of four kids with 54 years in his backpack, Ignacio plays soccer every Saturday afternoon, enjoying every match as if it’s his last one.

GET ACCESS TO EXPERT SQL CONTENT!