SQL Indexing 101


Indexes are one of the most misused and misunderstood entities in physical database design. A good understanding of indexes and how they solve database performance problems is necessary for any database novice. In this article, we’ll look at basic database indexes and their role in database development.

To picture what an index is, consider a textbook. At the end of most textbooks is an index listing all the terms one can find in the text and the pages on which they appear. The same is true of a database index—its purpose is to store the locations of indexed data, with the indexed data stored in a certain order, so we can access the data faster. An index is stored as a separate structure inside a database; it does not alter the structure of a database table in any way when it’s created.

The most important index in modern RDBMSes is known as a B-tree index. There are other types of indexes, too, like bitmap indexes and bloom indexes, but we’re not going to worry about those in this article. Let’s get started!

B-tree indexes

Why are the most common database indexes known as B-tree indexes? Well, quite simply, the “B” stands for balanced. B-tree indexes are balanced in the sense that the depth of any given node in such a “balanced” tree is equal for every indexed data location. In other words, with B-tree indexing, it takes the same amount of time to access the data stored in every location in the table, regardless of that data’s depth in the table.

To see how B-trees are constructed, let’s look at a simple customers table for a pet store:

Let’s explore a simple B-tree structure for an index on the id column, which holds unique integer values from 1 through 34. In our case, the root of the B-tree consists of two values: 32 and 17. This root branches out to two nodes that consist of four values each (32, 28, 24, 21 on the left and 17, 13, 9, 4 on the right). In turn, each of these two nodes branches out to four leaf nodes, for a total of eight leaves in the tree.

How does SQL indexing work?

Let’s say you need to retrieve every column of the row with an id of 10. If the database uses the index to conduct this search, it first starts with the root node and asks if the value 10 is less than or equal to or greater than 17. The database finds that 10 is less than 17, so it branches to the right. In the branch node, we see that 10 is between 13 and 9, so we’re routed to the leaf node that starts with 13. In this leaf node, we find our id and the corresponding address of the row. We see that to get to our desired row, the database had to use only three jumps.

Primary key indexes

Creating primary keys for a table will assign a unique index to the column(s) on which the primary key was constructed. Primary keys are tied to basic CREATE TABLE commands. If you’re not familiar with primary keys, don’t panic. They’re easier than they seem.

To create a UNIQUE index for the name column of the customers table, we would write:

CREATE UNIQUE INDEX name_unique_index ON customers (name);

You can probably guess that this index doesn’t make sense, since you can have customers who share the same name. A better fit for this table would be the creation of a NON-UNIQUE index. The syntax is even simpler; you just omit the UNIQUE keyword.

CREATE INDEX name_index ON customers (name);

Final thoughts

Indexing a database model is a primary database developer task but in some organizations, indexing is also done by database administrators, as they know the inner workings of the database best. The developers need to know how to index properly, as indexes speed up data retrieval. But bad indexing can break applications. The best way to learn SQL indexing is to open a database terminal and start executing queries. Try querying a table with and without indexes; the difference in speed will be quite noticeable, especially for large datasets.

Aldo Zelen

Aldo is a data architect with a passion for the cloud. From leading a team of data professionals to coding a data warehouse in the cloud, Aldo has experience with the whole lifecycle of data-intensive projects. Aldo spends his free time forecasting geopolitical events in forecasting tournaments.

comments powered by Disqus