Difficulty: Intermediate
Estimated Time: 20 minutes

In this learning unit, you will:

  • Understand the purpose and use cases of secondary indexes
  • Recognize the differences between 2i and SASI
  • Explore several examples of using secondary indexes
  • Learn about limitations of secondary indexes

This scenario is also available on our datastax.com/dev site, where you can find many more resources to help you succeed with Apache Cassandra™.

Did you know?

You can use Cassandra as a service in the cloud. Nothing to install, no credit card required. Sign up and launch your database with a few clicks at astra.datastax.com!

In this scenario, you learned about:

  • Purpose and use cases of secondary indexes
  • Differences between 2i and SASI
  • Several examples of using secondary indexes
  • Limitations of secondary indexes

Using Secondary Indexes in Apache Cassandra™

Step 1 of 11

Secondary indexes

Each table only supports a limited set of queries based on its primary key definition. Additional queries can be supported by creating new tables with different primary keys, materialized views or secondary indexes. A secondary index can be created on a table column to enable querying data based on values stored in this column. Internally, a secondary index is represented by additional data structures that are created and automatically maintained on each cluster node. There are two types of secondary indexes:

  • Regular secondary index (2i): a secondary index that uses hash tables to index data and supports equality (=) predicates.
  • SSTable-attached secondary index (SASI): an experimental and more efficient secondary index that uses B+ trees to index data and can support equality (=), inequality (<, <=, >, >=) and even text pattern matching (LIKE).

While secondary indexes may seem very convenient, they have serious performance limitations and their use in production should be limited to the following two rather specific cases:

  • Real-time transactional query: retrieving rows from a large multi-row partition, when both a partition key and an indexed column are used in a query.
  • Expensive analytical query: retrieving a large number of rows from potentially all partitions, when only an indexed low-cardinality column is used in a query. A low-cardinality column is characterized by a large number of duplicates stored in it and a limited data range for its possible values.

For all other use cases, which usually involve high-cardinality columns, you should always prefer tables and materialized views to secondary indexes.

The general recommendation is to have at most one secondary index per table. And most tables do not need any. We provide more details about the secondary index limitations later in this presentation.