Cassandra clusters in production must perform anti-entropy (repair) operations in order to maintain data consistency. Reaper is the preferred solution for anti-entropy operations in Cassandra at scale.
What are anti-entropy (repair) operations in Cassandra?
Every Cassandra cluster will need repairs to maintain data consistency. Tombstones (deletions) or unavailable nodes are common causes of data inconsistency. The Cassandra repair process compares data between replicas and updates all replicas to the latest data. View the Cassandra docs for more details on anti-entropy repair.
This scenario adds Reaper to a Cassandra cluster running in Kubernetes with the Cassandra Operator (cass-operator).
In this scenario, we'll:
- Set up a Cassandra cluster running on Kubernetes
- Install and configure Reaper
- Install the example Pet Clinic app
- View and manage anti-entropy operations in the Cassandra cluster
Let's get started!
In this scenario, we learned how to:
- Install Reaper in a Kubernetes cluster
- Configure Reaper to use Cassandra as a backing store
- Connect Reaper to Cassandra to control repairs
- Perform immediate and scheduled repairs in Cassandra
The setup for Reaper was relatively complex. One of the advantages of using K8ssandra to run Cassandra in Kubernetes is that it takes care of configuring the most commonly used tools for managing Cassandra in production: Reaper, Prometheus, Grafana, and Medusa.
Automated Repair - Reaper
Create a Cassandra Cluster
In this step, you will create a three-node Cassandra cluster. In the terminal, you will see Katacoda setting up the environment by installing kind and Helm, creating the Kubernetes cluster, configuring the Ingress, and installing the Cassandra Operator.
While you wait for the environment, click to open cassandra-cluster.yaml in the Katacoda editor.
Reaper uses Java Management Extensions (JMX) to interact with Cassandra clusters. In this scenario, Reaper runs separately from the Cassandra cluster, in its own pod, so we need to enable remote JMX access in the Cassandra cluster. The configuration file sets the LOCAL_JMX environment variable to no and puts the JMX credentials in
Wait until you see that the environment has been created.
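Before applying the manifest, it can be useful to confirm the remote JMX setting described above. The following is a minimal sketch, assuming cassandra-cluster.yaml is in the current directory and that the variable appears literally as LOCAL_JMX in the file; the helper name show_jmx_settings is hypothetical and not part of the scenario.

```shell
# Hypothetical helper (not part of the scenario): print every line of a
# manifest that mentions LOCAL_JMX, with its line number and one line of
# trailing context so the configured value is visible as well.
show_jmx_settings() {
  manifest="${1:-cassandra-cluster.yaml}"   # assumed default file name
  grep -n -A1 'LOCAL_JMX' "$manifest"
}

# Usage sketch:
#   show_jmx_settings cassandra-cluster.yaml
```

If the command prints nothing, the variable is not set in that file and remote JMX access has probably not been enabled.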
Start the cluster creation.
kubectl apply -f cassandra-cluster.yaml
Use the watch command with kubectl to view pod info, including status. (The watch command updates this info every 2 seconds.)
watch kubectl get pods
The Cassandra cluster will be fully up and running when the READY state of all three cluster-dc1 pods is
Pro Tip: It may take 4-5 minutes before all three Cassandra nodes are up and running.
Click to send a Ctrl-C to stop monitoring the pod state.
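As a non-interactive alternative to watching the pod list, the readiness check can be scripted. The following is a rough sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, ...); the helper name all_pods_ready and the 5-second polling interval are illustrative choices, not part of the scenario.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and succeeds only if every listed pod reports all of its containers ready,
# i.e. both halves of the READY column match (for example 2/2).
all_pods_ready() {
  awk '
    { split($2, r, "/"); total++; if (r[1] != r[2]) notready++ }
    END { exit ((total == 0 || notready > 0) ? 1 : 0) }
  '
}

# Usage sketch: poll every 5 seconds until all pods are ready.
#   until kubectl get pods --no-headers | all_pods_ready; do sleep 5; done
```

The helper checks every pod in the listing, so if other pods share the namespace you may want to filter the input first, for example by piping through grep cluster-dc1. An empty listing is treated as not ready.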