One of the things you'll care a lot about when working with databases is how quickly you'll be able to find and retrieve data that you want. Not only might it make for a poor experience if a user has to wait too long to complete a task (even a few seconds can be considered unacceptably long!), sometimes even a mere fraction of a second can have a real difference especially for mission-critical applications.
An index helps with finding data rows quickly and efficiently. In this scenario, we'll try our hand at creating indexes for a sample table with data already loaded in. We'll also use the EXPLAIN (ANALYZE) command to take a look at how indexes affect querying. If you'd like to review EXPLAIN first, feel free to go through that scenario before coming back to this one.
The data we're using is storm event data in the United States from 2018 (downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/). The scenario will load the PostgreSQL instance and connect you to the database using the following credentials:
- Username: groot
- Password: password
- Database: workshop
Indexes are extremely handy, but they don't come completely free. Part of maintaining a database is putting in the time and effort to determine when it's appropriate to create an index, and when it might actually be more detrimental.
The official docs have more helpful information on how indexes work:
And if you're interested, the official docs also dive deeper into index usage which we had mentioned on the previous page.
The next scenario will go into a bit more detail on the different types of indexes in PostgreSQL. See you there!
Enjoy learning about PostgreSQL? Sign up for our newsletter and get the latest tips from us each month.
Intro to Indexes
About Database Indexes
A common analogy used to describe the role of database indexes is the index section at the back of a book. The index helps you quickly find specific terms or key words as used throughout the book. In the context of databases, the index stores information on where a data row is located in a table. The database goes to the index and then uses that information to retrieve the requested data.
Each relational database management system may have its own particular ways of how exactly an index is implemented, but generally speaking:
- Each index works on one relation1 - more specifically, the index is based on a column (or columns) in that relation.
- Indexes are automatically created for primary key and unique constraint columns.
- The database query planner takes into account the indexes available when determining the best path for executing a query.
It's also worth noting that an index not only helps with retrieving data (i.e. the WHERE part of the query), it might even help with joins, or sorting the results set (i.e. the ORDER BY clause).
Indexes are their own data structures, and they're also stored on disk along with data tables and other objects. Postgres supports several different index types. The most common type is b-tree (which stands for "Balanced Tree"). B-tree is used by default when you create a new index and don't specify an index type.
Query tuning and optimizing is a pretty big topic, and something that database administrators focus heavily on. With that said, anyone using a database system can benefit from having general knowledge on the role that indexes play.
1Note: In the next steps we'll be working specifically with table indexes. Indexes can be also be created for materialized views, so we say "relation" here.