Each table only supports a limited set of queries based on its primary key definition. Additional queries can be supported by creating new tables with different primary keys, materialized views or secondary indexes. A secondary index can be created on a table column to enable querying data based on values stored in this column. Internally, a secondary index is represented by additional data structures that are created and automatically maintained on each cluster node. There are two types of secondary indexes:
- Regular secondary index (2i): a secondary index that uses hash tables to index data and supports equality (
- SSTable-attached secondary index (SASI): an experimental and more efficient secondary index that uses B+ trees to index data and can support equality (
=), inequality (
>=) and even text pattern matching (
While secondary indexes may seem very convenient, they have serious performance limitations and their use in production should be limited to the following two rather specific cases:
- Real-time transactional query: retrieving rows from a large multi-row partition, when both a partition key and an indexed column are used in a query.
- Expensive analytical query: retrieving a large number of rows from potentially all partitions, when only an indexed low-cardinality column is used in a query. A low-cardinality column is characterized by a large number of duplicates stored in it and a limited data range for its possible values.
For all other use cases, which usually involve high-cardinality columns, you should always prefer tables and materialized views to secondary indexes.
The general recommendation is to have at most one secondary index per table. And most tables do not need any. We provide more details about the secondary index limitations later in this presentation.