Welcome!
Using Advanced Data Types in Apache Cassandra™
In this scenario, you will learn about:
- Universally unique identifier (UUID) data types
- Collection data types
- Tuple data type
- User-defined types (UDTs)
- Counter data type
This scenario is also available on our datastax.com/dev site, where you can find many more resources to help you succeed with Apache Cassandra™.
Congratulations!
You've completed the scenario!
Scenario Rating
In this scenario, you learned about:
- Universally unique identifier (UUID) data types
- Collection data types
- Tuple data type
- User-defined types (UDTs)
- Counter data type
Your environment is currently being packaged as a Docker container and the download will begin shortly. To run the image locally, once Docker has been installed, use the commands
cat scrapbook_datastax_cassandra-intro/advanced-data-types_container.tar | docker load
docker run -it /datastax_cassandra-intro/advanced-data-types:
Oops!! Sorry, it looks like this scenario doesn't currently support downloads. We'll fix that shortly.

Steps
Using Advanced Data Types in Apache Cassandra™
CQL data types
You may have already used many CQL data types in your table definitions. Most of them, including
TEXT
, VARCHAR
, ASCII
, TINYINT
, SMALLINT
, INT
, BIGINT
, VARINT
,
FLOAT
, DOUBLE
, DECIMAL
, TIME
, TIMESTAMP
, DATE
, DURATION
, BOOLEAN
, BLOB
, and INET
,
are easy to understand and use. There are also several "more advanced"
CQL data types that we cover in this presentation:
- Universally unique identifier (UUID) data types:
UUID
andTIMEUUID
- Collection data types:
SET
,LIST
andMAP
- Tuple data type:
TUPLE
- User-defined types (UDTs):
CREATE TYPE
,ALTER TYPE
,DROP TYPE
andDESCRIBE TYPE
- Counter data type:
COUNTER
Let's get started ...
Start the CQL shell:
cqlsh
Create the keyspace:
CREATE KEYSPACE killr_video
WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC-Houston': 1 };
Set the current working keyspace:
USE killr_video;
UUIDs
A universally unique identifier (UUID) is a 128-bit number that can be automatically generated and used to identify an entity or relationship in a Cassandra database. UUIDs provide an efficient way to assign unique identifiers and help to prevent accidental upserts or eliminate race conditions.
Cassandra Query Language supports the following two UUID data types:
UUID
is a Version 4 UUID that is randomly generated. To generate a value of typeUUID
, you can use functionuuid()
.TIMEUUID
is a Version 1 UUID that is generated based on a MAC address and a timestamp. To generate a value of typeTIMEUUID
, you can use functionnow()
. When needed, the timestamp component of aTIMEUUID
value can be extracted using functionsunixTimestampOf()
ordateOf()
. Moreover,TIMEUUID
values in clustering columns are automatically ordered based on their underlying timestamps and can also be retrieved based on the timestamps using functionsminTimeuuid(timestamp)
andmaxTimeuuid(timestamp)
.
As an example, let's create table users
with partition key id
of type UUID
and insert two rows:
CREATE TABLE users (
id UUID,
name TEXT,
age INT,
PRIMARY KEY ((id))
);
INSERT INTO users (id, name, age)
VALUES (7902a572-e7dc-4428-b056-0571af415df3,
'Joe', 25);
INSERT INTO users (id, name, age)
VALUES (uuid(), 'Jen', 27);
SELECT * FROM users;
Next, create table movies
with partition key id
of type UUID
and insert the following two rows:
id | title | year | duration |
---|---|---|---|
5069cc15-4300-4595-ae77-381c3af5dc5e | Alice in Wonderland | 2010 | 108 |
uuid() | Alice in Wonderland | 1951 | 75 |
Solution
CREATE TABLE movies (
id UUID,
title TEXT,
year INT,
duration INT,
PRIMARY KEY ((id))
);
INSERT INTO movies (id, title, year, duration)
VALUES (5069cc15-4300-4595-ae77-381c3af5dc5e,
'Alice in Wonderland', 2010, 108);
INSERT INTO movies (id, title, year, duration)
VALUES (uuid(), 'Alice in Wonderland', 1951, 75);
SELECT * FROM movies;
Finally, study this more advanced example, where TIMEUUID
is used to both guarantee uniqueness and
provide a timestamp for each row in table comments_by_user
:
CREATE TABLE comments_by_user (
user_id UUID,
comment_id TIMEUUID,
movie_id UUID,
comment TEXT,
PRIMARY KEY ((user_id), comment_id, movie_id)
);
INSERT INTO comments_by_user (user_id, comment_id, movie_id, comment)
VALUES (7902a572-e7dc-4428-b056-0571af415df3,
63b00000-bfde-11d3-8080-808080808080,
5069cc15-4300-4595-ae77-381c3af5dc5e,
'First watched in 2000');
INSERT INTO comments_by_user (user_id, comment_id, movie_id, comment)
VALUES (7902a572-e7dc-4428-b056-0571af415df3,
9ab0c000-f668-11de-8080-808080808080,
5069cc15-4300-4595-ae77-381c3af5dc5e,
'Watched again in 2010');
INSERT INTO comments_by_user (user_id, comment_id, movie_id, comment)
VALUES (7902a572-e7dc-4428-b056-0571af415df3,
now(),
5069cc15-4300-4595-ae77-381c3af5dc5e,
'Watched again today');
SELECT comment_id, dateOf(comment_id) AS date, comment
FROM comments_by_user
WHERE user_id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT comment_id, dateOf(comment_id) AS date, comment
FROM comments_by_user
WHERE user_id = 7902a572-e7dc-4428-b056-0571af415df3
AND comment_id > maxTimeuuid('1999-01-01 00:00+0000')
AND comment_id < minTimeuuid('2019-01-01 00:00+0000')
ORDER BY comment_id DESC;
Sets
A set is an unordered collection of distinct values. In Cassandra, sets are intended for
storing a small number of values of the same type. To define a set data type,
Cassandra Query Language provides construct SET<type>
, where type
can refer to a CQL data type like
INT
, DATE
, UUID
and so forth.
As an example, alter table movies
to add column production
of type SET<TEXT>
:
ALTER TABLE movies ADD production SET<TEXT>;
SELECT title, year, production FROM movies;
Add three production companies for one of the movies:
UPDATE movies
SET production = { 'Walt Disney Pictures',
'Roth Films' }
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
UPDATE movies
SET production = production + { 'Team Todd' }
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
SELECT title, year, production FROM movies;
Next, alter table movies
to add column genres
and
add values Adventure, Family and Fantasy for one of the movies:
Solution
ALTER TABLE movies ADD genres SET<TEXT>;
UPDATE movies
SET genres = { 'Adventure', 'Family', 'Fantasy' }
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
SELECT title, year, genres FROM movies;
Lists
A list is an ordered collection of values, where the same value may occur more than once.
In Cassandra, lists are intended for
storing a small number of values of the same type. To define a list data type,
Cassandra Query Language provides construct LIST<type>
, where type
can refer to a CQL data type like
INT
, DATE
, UUID
and so forth.
As an example, alter table users
to add column searches
of type LIST<TEXT>
:
ALTER TABLE users ADD searches LIST<TEXT>;
SELECT id, name, searches FROM users;
Add three latest search leterals for one of the users:
UPDATE users
SET searches = [ 'Alice in Wonderland' ]
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
UPDATE users
SET searches = searches + [ 'Comedy movies' ]
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
UPDATE users
SET searches = searches + [ 'Alice in Wonderland' ]
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT id, name, searches FROM users;
Delete the oldest search literal and add a new one:
DELETE searches[0] FROM users
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
UPDATE users
SET searches = searches + [ 'New releases' ]
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT id, name, searches FROM users;
Next, alter table users
to add column emails
and
add two email addresses for one of the users:
Solution
ALTER TABLE users ADD emails LIST<TEXT>;
UPDATE users
SET emails = [ '[email protected]',
'[email protected]' ]
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT id, name, emails FROM users;
Maps
A map is a collection of key-value pairs, where each pair has a unique key.
In Cassandra, maps are intended for
storing a small number of key-value pairs of the same type. To define a map data type,
Cassandra Query Language provides construct MAP<type1,type2>
, where type1
and type2
can refer to same or different CQL data types, including
INT
, DATE
, UUID
and so forth.
As an example, alter table users
to add column sessions
of type MAP<TIMEUUID,INT>
:
ALTER TABLE users ADD sessions MAP<TIMEUUID,INT>;
SELECT name, sessions FROM users;
Add two sessions with TIMEUUID
keys and INT
duration values for one of the users:
UPDATE users
SET sessions = { now(): 32,
e22deb70-b65f-11ea-9aac-99396fc4f757: 7 }
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT name, sessions FROM users;
Update a duration value for one of the sessions:
UPDATE users
SET sessions[e22deb70-b65f-11ea-9aac-99396fc4f757] = 9
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT name, sessions FROM users;
Next, alter table users
to add column preferences
and
add key-value pairs (color-scheme: dark) and (quality: auto) for one of the users:
Solution
ALTER TABLE users ADD preferences MAP<TEXT,TEXT>;
UPDATE users
SET preferences['color-scheme'] = 'dark'
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
UPDATE users
SET preferences['quality'] = 'auto'
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT name, preferences FROM users;
Nested collections
It is also possible to define collection data types that contain nested collections, such as list of maps or
set of sets of sets. A nested collection definition has to be designated as FROZEN
, which means that a nested collection
is stored as a single blob value and manipulated as a whole. In other words, when an individual element
of a frozen collection needs to be updated, the entire collection must be overwritten. As a result, nested collections
are generally less efficient unless they hold immutable or rarely changing data.
Let's alter table movies
again to be able to store movie casts and crews in column crew
of type MAP<TEXT,FROZEN<LIST<TEXT>>>
:
ALTER TABLE movies
ADD crew MAP<TEXT,FROZEN<LIST<TEXT>>>;
SELECT title, year, crew FROM movies;
Add a movie crew:
UPDATE movies
SET crew = {
'cast': ['Johnny Depp', 'Mia Wasikowska'],
'directed by': ['Tim Burton']
}
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
SELECT title, year, crew FROM movies;
Tuples
A tuple is a fixed-length list, where values can be of different types.
To define a tuple data type,
Cassandra Query Language provides construct TUPLE<type1,type2,..., typeN>
,
where type1
, type2
, ... , typeN
can refer to same or different CQL data types, including
INT
, DATE
, UUID
and so forth.
Here is an example of a TUPLE
column:
ALTER TABLE users ADD full_name TUPLE<TEXT,TEXT,TEXT>;
UPDATE users
SET full_name = ('Joe', 'The', 'Great')
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT name, full_name FROM users;
Unlike UDTs, individual tuple components cannot be updated without updating the whole tuple. Therefore, you should always prefer UDTs to tuples.
UDTs
A user-defined type (UDT) is a custom data type composed of one or more named and typed fields.
UDT fields can be of simple types, collection types or even other UDTs.
Cassandra Query Language provides statements CREATE TYPE
, ALTER TYPE
, DROP TYPE
and DESCRIBE TYPE
to work with UDTs.
Let's define UDT ADDRESS
with four fields:
CREATE TYPE ADDRESS (
street TEXT,
city TEXT,
state TEXT,
postal_code TEXT
);
Alter table users
to add column address
of type ADDRESS
:
ALTER TABLE users ADD address ADDRESS;
SELECT name, address FROM users;
Add an address for one of the users:
UPDATE users
SET address = { street: '1100 Congress Ave',
city: 'Austin',
state: 'Texas',
postal_code: '78701' }
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT name, address FROM users
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
UPDATE users
SET address.state = 'TX'
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT name,
address.street AS street,
address.city AS city,
address.state AS state,
address.postal_code AS zip
FROM users
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
Next, alter table users
to add column previous_addresses
and
add at least two previous addresses for one of the users:
Solution
ALTER TABLE users
ADD previous_addresses LIST<FROZEN<ADDRESS>>;
UPDATE users
SET previous_addresses = [
{ street: '10th and L St',
city: 'Sacramento',
state: 'CA',
postal_code: '95814' } ]
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT name, previous_addresses FROM users
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
UPDATE users
SET previous_addresses = previous_addresses + [
{ street: 'State St and Washington Ave',
city: 'Albany',
state: 'NY',
postal_code: '12224' } ]
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
SELECT name, address, previous_addresses FROM users
WHERE id = 7902a572-e7dc-4428-b056-0571af415df3;
Counters
A distributed counter is a 64-bit signed integer, whose value can be efficiently modified by concurrent transactions
without causing race conditions.
Counters are useful for keeping track of various statistics by counting events or adding up integer values.
To define a counter column, Cassandra Query Language provides data type COUNTER
. There are
a number of restrictions on how counters can be used in Cassandra:
- A counter cannot be set or reset. It can only be incremented or decremented.
- A counter value does not exist until the first increment or decrement operation is performed.
The first operation assumes the initial counter value of
0
. - A table with one or more counter columns cannot have non-counter columns other than primary key columns. Counter columns cannot be primary key columns.
As an example, let's create a new table to keep track of movie rating statistics:
CREATE TABLE movie_stats (
id UUID,
num_ratings COUNTER,
sum_ratings COUNTER,
PRIMARY KEY ((id))
);
Update the counters to account for two ratings of 7 and 9 for the same movie:
UPDATE movie_stats
SET num_ratings = num_ratings + 1,
sum_ratings = sum_ratings + 7
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
UPDATE movie_stats
SET num_ratings = num_ratings + 1,
sum_ratings = sum_ratings + 9
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
SELECT * FROM movie_stats;
Next, alter table movie_stats
to add another COUNTER
column to keep track of how many times each movie was watched and
increment the counter value three times for one of the movies:
Solution
ALTER TABLE movie_stats ADD num_views COUNTER;
UPDATE movie_stats
SET num_views = num_views + 1
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
UPDATE movie_stats
SET num_views = num_views + 1
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
UPDATE movie_stats
SET num_views = num_views + 1
WHERE id = 5069cc15-4300-4595-ae77-381c3af5dc5e;
SELECT * FROM movie_stats;
It is important to understand that, unlike updates on other data type columns, counter increments and decrements are not idempotent. An idempotent operation produces the same result when executed multiple times. It is safe to retry a timed-out idempotent operation. However, in case of counters, replaying an increment or decrement operation may result in overcounting or undercounting. Therefore, counters should only be used when an absolute precision is not required.
Test Your Understanding
Here is a short quiz for you.