Difficulty: Intermediate
Estimated Time: 25 minutes

Apache Cassandra™ 4.0 improves incremental repair: the process is now more reliable, easier to manage, and ready for production use.

In this scenario you will:

  • populate a small cluster with data and introduce the need for repair
  • execute incremental repair and observe the data being reconciled between nodes
  • learn how to manage repairs on a Cassandra cluster.

In this scenario you have:

  • introduced the need for data repair on a small Cassandra cluster
  • performed the repair and observed the data come back into agreement across the cluster
  • learned about incremental repair in Cassandra 4.0

Repair Improvements

Step 1 of 5

Setup & create data

This scenario requires a two-node cluster, which is being created for you; this may take about three minutes. Wait until the terminal prints a message such as "Cassandra Cluster with nodes <IP_node_1> and <IP_node_2> has started" before proceeding.

Verify that the cluster is up and running:

nodetool status

The output should list two nodes, each in the UN (Up, Normal) status.
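
As a minimal sketch, the same check can be scripted from a shell prompt on either node. The helper names count_un_nodes and cluster_ready are hypothetical, and the sketch assumes that nodetool status prints one line per node with the two-letter status code (such as UN) in the first column:

```shell
# Count the nodes reported as UN (Up, Normal) in `nodetool status`
# output read from standard input.
count_un_nodes() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Hypothetical helper: succeed only when exactly the expected number
# of nodes is in the UN state.
# $1: expected number of nodes
cluster_ready() {
    [ "$(nodetool status | count_un_nodes)" -eq "$1" ]
}
```

For example, cluster_ready 2 && echo "Both nodes are up" prints the message only once both nodes report UN.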

Initialization

Let us initialize all terminals of the scenario (including connecting to the two nodes) by running the following:

echo Initializing terminal 2
ssh $HOST1_IP
echo Initializing terminal 3
ssh $HOST1_IP
echo Initializing terminal 4
ssh $HOST1_IP
echo Initializing terminal 5
ssh $HOST2_IP
echo Initializing terminal 6
ssh $HOST2_IP
echo Initializing terminal 7
ssh $HOST2_IP

Schema creation

Open the CQL Shell on both nodes:

cqlsh $HOST1_IP    # Node1
cqlsh $HOST2_IP    # Node2

The following commands can be run in either CQL shell; we will work on Node1. First, let us create a keyspace with a replication factor of two, so that every row is replicated on both nodes:

CREATE KEYSPACE chemistry WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 2};
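
To double-check the replication settings, the keyspace definition can be read back from the system_schema.keyspaces table. The sketch below is meant for a regular shell prompt rather than the CQL shell; the function name show_replication is hypothetical, and cqlsh is assumed to be on the PATH:

```shell
# Print the replication settings of a keyspace.
# $1: address of a node to connect to, $2: keyspace name
show_replication() {
    cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name = '$2';" "$1"
}
```

Running show_replication $HOST1_IP chemistry should report the SimpleStrategy class with a replication factor of 2. Inside the CQL shell, DESCRIBE KEYSPACE chemistry; gives similar information.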

Set the newly created keyspace as the default for subsequent commands:

USE chemistry;

Finally create a table for storing the periodic table of elements:

CREATE TABLE elements (
    symbol TEXT PRIMARY KEY,
    name TEXT,
    atomic_mass DOUBLE,
    atomic_number INT
);

Data insertion

A CSV file with the first hundred or so elements and their properties is provided for use; to load its contents into the elements table, run the following CQL command:

COPY elements FROM 'elements.csv' WITH HEADER=TRUE;

To verify that the insertion succeeded, query the table, this time from Node2. Look at some of the rows,

USE chemistry;
SELECT * FROM elements LIMIT 10;

and then count how many there are:

SELECT COUNT(*) FROM elements;
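
To cross-check the count, one option is to count the data rows of the CSV file from a regular shell prompt. This is only a sketch: the function name csv_data_rows is hypothetical, and it assumes the file has a single header line and one record per line (no quoted fields spanning several lines):

```shell
# Print the number of data rows in a CSV file.
# $1: path to the CSV file
csv_data_rows() {
    # Skip the header line, then count the non-empty lines that remain.
    tail -n +2 "$1" | awk 'NF { n++ } END { print n + 0 }'
}
```

Assuming the command is run from the directory that holds the file, csv_data_rows elements.csv should print the same number that SELECT COUNT(*) returned.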

Recap

We have created a table and inserted about a hundred rows into it; the table is replicated, in its entirety, on each of the two nodes that form the cluster.
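
One way to see this replication for an individual row is nodetool getendpoints, which lists the nodes responsible for a given partition key. The sketch below wraps it in a hypothetical helper, replica_count, and uses the symbol H purely as an example key:

```shell
# Print how many nodes hold replicas of the partition with the given key.
# $1: keyspace, $2: table, $3: partition key value
replica_count() {
    # getendpoints prints one node address per line; count the non-empty lines.
    nodetool getendpoints "$1" "$2" "$3" | awk 'NF { n++ } END { print n + 0 }'
}
```

With a replication factor of two on a two-node cluster, replica_count chemistry elements H is expected to print 2.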
