How can you store textual information in database tables? Read this post to discover your options.
Note: This post is intended for readers familiar with SQL data definition language (DDL) and the DDL CREATE statement. To learn more about DDL, check out Vertabelo SQL Academy.
Most data types are classified as NUMERIC, CHARACTER, or DATE. In this post, we’re going to focus on TEXTUAL or CHARACTER data types. These data types handle groups of alphanumeric characters – i.e. letters and numbers – known as strings. They are less restrictive than other data types and have fewer properties. You can store a number in a CHARACTER data type, but you cannot store a letter in a NUMERIC data type.
Every major RDBMS (relational database management system) vendor has some specifics about the data types they support. We will use a more integrated approach and consider the data types we find across vendors.
CHARACTER data types can be classed as one of the following:
- Fixed-length (They have a predetermined, fixed size.)
- Variable-length (The length is determined by the text they store.)
- Very large characters (This is used to store entire files.)
Each of these subtypes can also be broken down to regular and Unicode representations.
When Working With Text In SQL, Remember…
Before we get started with the data types, here are some things you should remember about working with text in SQL:
Text Requires Single Quotes
In SQL, text is defined by single quotes [‘ ‘]. For example, if you wanted to insert a name into a field, the name must have opening and closing single quotes. In the code below, France is inserted into the
INSERT INTO COUNTRY(1, ‘France’)
Text Is Case-Sensitive
Another thing to remember is that text is case-sensitive. Inserting
FRANCE into the database means it will have two separate records for France. Because the capitalization is different, the database does not realize they are the same country.
INSERT INTO COUNTRY(2, ‘FRANCE’)
“France” is not the same as “FRANCE”. Bear that in mind.
— Vertabelo (@Vertabelo) January 3, 2017
Text Has Special Functions
You can find out the length of a text field by using the LENGTH function on a text column. So this command …
SELECT LENGTH(NAME) FROM COUNTRY
… will return the number of letters (6) in the word “France”.
Text data types have other properties, but as they are more advanced we will skip them for now.
Text Is Comparable
You can compare two or more text fields. If you input …
SELECT * FROM COUNTRY WHERE ‘France’ = ‘FRANCE’
… you will not get any rows, as you already know that
FRANCE does not equal
France in SQL!
FIXED-LENGTH CHARACTER data types
The fixed-length character data type is represented as CHARACTER or CHAR. When we define a CHAR data type, we must also define the length (the default value is 1) of the string we anticipate storing in this column. Some databases will allow you to define CHAR without setting the length, but this is not advisable.
When creating a table with the DDL statement, we write:
CREATE TABLE COUNTRY ( ID NUMBER, NAME CHAR(48) );
We expect that the maximal length of any country’s name will be 48 characters (as in “The United Kingdom of Great Britain and Northern Ireland”), so we define the string’s max length as 48.
What happens when we store shorter country names? Suppose we store Poland in our 48-length CHAR field. The RDBMS will pad the remaining 42 spaces with blanks (i.e. blank spaces). This can take a lot of room in the database, so be careful to use CHARs only for smaller fields unless there is a good reason to do otherwise.
Another question arises: What would happen if you inserted a country name that is longer than the column length? Let’s say the UK gave more autonomy to Scotland and changed the country’s official name to “The United Kingdom of Great Britain, Scotland, and Northern Ireland” (58 characters). If you were to enter this, the RDBMS would return an error.
“No problem,” you say, “I’ll just change the column length”. Actually, changing column length is not as easy as it sounds. It’s best to plan in advance for column length. Also, keep in mind that the maximum length of a CHAR changes with RDBMS systems. Oracle sets a max of 2,000 characters; MS SQL limits theirs to 8,000.
The other representation of this data type is NCHAR or NCHARACTER., which states that the characters are Unicode. These data types can store symbols as well as numbers and letters.
Variable-length character data types are called VARCHAR or VARCHAR2 (var stands for variable). CHAR and VARCHAR share some major similarities:
- Both allow users to set maximum string lengths when the type is defined.
- Both have a default value of “1”.
- Some RDBMSes allow you to define the VARCHAR type without specifying length, but once again this is not advisable.
To create a table with the DDL statement we write:
CREATE TABLE COUNTRY ( ID NUMBER, NAME VARCHAR(48) );
Again, we’ve defined the maximum string length as 48 to accommodate the full official name of the UK. The RDBMS does not pad the remaining string space with anything to make it take up 48 characters. So what you put in is what you take up, space-wise! If you put
Poland in a VARCHAR data type, it will only need storage for 6 characters, not 48 as with the CHAR example.
You still need to avoid going over the limit with VARCHAR. The RDBMS will still return an error if you enter a country name that’s longer than 48 characters. So profile (explore the size of) your data beforehand to develop a robust data model.
It’s good to know that in Oracle and MS SQL, the maximum column size for this data type is 4,000. Also, the other representation of this data type is NVARCHAR, which stores the characters as Unicode. You can put all sorts of symbols into an NVARCHAR column.
VERY LARGE CHARACTERS
Sometimes you just need to go beyond! When working with very large blocks of text, such as text files, XML files, white papers, articles, or books, you can define a special data type to store them. However, this kind of data type is not part of the SQL standard, so it differs more by RDBMS than other types. For example:
- In Oracle, this data type is CLOB (Character Large Object), which allows 4 GB per row cell.
- In PostgreSQL, it’s TEXT, and its size has no upper limit.
- In MS SQL, it’s VARCHAR(max), which allows 2 GB per row cell.
Because this data type is so large, some RDBMSes do not support certain operations on it, such as index creation, that they do on CHAR and VARCHAR data types.
Use this data type when and only when you have a very good reason. Do not use it to get around normalizing your model. Don’t use it to de-normalize an existing model. (If you don’t know what database normalization is, you can explore the topic here.)
Let’s quickly summarize the different CHARACTER types:
|VENDOR||FIXED SIZE||VARIABLE SIZE||LARGE SIZE|
|Oracle||Char, Nchar||Varchar, Nvarchar||Clob|
|MSSQL||Char, Nchar||Varchar, Nvarchar||Varchar(max)|
|PostgreSQL||Character, Char||Varchar, Character varying||Text|