Managing Cassandra backups can be daunting. But Medusa, a component in the K8ssandra ecosystem, simplifies your life considerably!
In this scenario, we will:
- Install Minio, an AWS S3 compatible local storage manager
- Install K8ssandra with Medusa enabled
- Deploy the sample Pet Clinic app
- Perform a backup
- Modify the contents of the Cassandra database using the sample app
- Restore the database contents to its original state
Let's get started!
This scenario showed you how to back up Cassandra using Medusa in K8ssandra.
In this scenario, we:
- Installed Minio, an AWS S3 compatible local storage manager
- Installed K8ssandra with Medusa enabled
- Deployed the sample Pet Clinic app
- Performed a backup
- Modified the Cassandra database using the sample app
- Restored the database to its original state
Medusa makes it easy to back up your Cassandra database!
Automating Backup of Cassandra Clusters in Kubernetes with Medusa
Set up the environment
In this first step we'll get set up by creating a Kubernetes cluster.
Here are the specific pieces we are setting up:
What is kubectl?
kubectl is the command line interface to Kubernetes. It is a very versatile command with many sub-commands and options. Read more here.
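As a quick illustration of those sub-commands, here is a minimal sketch of a first look at a cluster. The function name show_cluster_overview is hypothetical and only groups the commands so that nothing runs until you call it; the kubectl sub-commands themselves are standard.

```shell
# Hypothetical helper: a first look at a cluster with kubectl.
show_cluster_overview() {
  # Version of the kubectl client installed on this machine
  kubectl version --client
  # Address of the control plane kubectl is currently pointed at
  kubectl cluster-info
  # One line per Kubernetes node, with its role and status
  kubectl get nodes
}
```

Once a cluster is running, calling show_cluster_overview should list every node in it.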
What is KinD?
KinD is a development tool we are using to create a Kubernetes cluster running inside a Docker container. As you know, most people use Kubernetes to manage systems of Docker containers. So, KinD is a Docker container that runs Kubernetes to manage other Docker containers - it's a bit recursive.
We use KinD so we can create a multi-node Kubernetes cluster on a single machine. KinD is great because it's relatively lightweight, easy to install and easy to use.
For your reference, here are the commands we used to install KinD.
curl -Lo ./kind https://github.com/kubernetes-sigs/kind/releases/download/v0.7.0/kind-$(uname)-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
Read more here.
What is Helm?
Helm is a package manager (like apt or yum) for Kubernetes. Helm allows you to install and update Kubernetes applications. Helm uses charts. A chart is a specification file we use to tell Helm how to do the installation. Helm downloads the charts from a Helm repo. You can read more here.
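To make the repo and chart workflow concrete, here is a minimal sketch of registering a Helm repo and searching it for charts. The repo alias k8ssandra and the URL https://helm.k8ssandra.io/stable are assumptions, not taken from this scenario, and the function name add_k8ssandra_repo is hypothetical. Check the K8ssandra documentation for the current repo address.

```shell
# Hypothetical helper: register a chart repo and see what it offers.
add_k8ssandra_repo() {
  # Register the repo under a local alias (the URL is an assumption)
  helm repo add k8ssandra https://helm.k8ssandra.io/stable
  # Download the latest chart index from every registered repo
  helm repo update
  # List the charts whose name or description matches "k8ssandra"
  helm search repo k8ssandra
}
```

Nothing runs until you call add_k8ssandra_repo. Installing a chart afterwards uses helm install with a release name and a chart name from the search results.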
- four-node Kubernetes cluster using KinD
Why do we need a four-node Kubernetes cluster?
First, it is important to understand we are working with two types of clusters: a Kubernetes cluster and a Cassandra cluster. The Kubernetes cluster is a set of machines called Kubernetes nodes. The Cassandra cluster is a set of those Kubernetes nodes that host and run the Cassandra software.
From the Kubernetes website: A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node.
We are setting up a four-node Kubernetes cluster so that we have one control-plane node and three worker nodes. We'll use one worker node for the Cassandra cluster, another for the Pet Clinic frontend software, and a third for the Pet Clinic backend software. We are using KinD to create the Kubernetes cluster. You can review the KinD cluster configuration in this file.
In this file you see that we are creating four nodes: a single control-plane node and three worker nodes. Every Kubernetes cluster needs at least one control-plane node to manage the cluster. The worker nodes are where we deploy our Kubernetes resources. The other details in the file are specific to KinD.
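For reference, a minimal KinD configuration with this node layout might look like the sketch below. It is not the scenario's actual file, which contains additional KinD-specific details. The file name kind-config-sketch.yaml and the function name create_cluster are hypothetical.

```shell
# Write a minimal four-node KinD configuration (illustrative only).
cat > kind-config-sketch.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
EOF

# Hypothetical helper: create the cluster, then confirm the nodes exist.
create_cluster() {
  kind create cluster --config kind-config-sketch.yaml
  kubectl get nodes
}
```

Calling create_cluster on a machine with Docker and KinD installed should produce one control-plane node and three workers.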
- Nginx ingress controller
What is an ingress and how does it fit into the Kubernetes architecture?
An ingress provides access from outside a Kubernetes cluster to the components inside the cluster. The controller we are deploying manages instances of ingresses. We'll deploy an instance of an ingress when we install the app.
An ingress usually sits in front of a Kubernetes service.
As a brief refresher, the Kubernetes architecture consists of:
- Containers - usually Docker containers that provide an isolated environment for a program
- Pods - encapsulate one or more containers
- Deployments - encapsulate the replication of pods
- Services - often work as load balancers for a deployment of pods
- Nodes - machines for hosting Pods
Here's a diagram of these components that shows the position of the ingress. Note that we left out the nodes because the diagram gets too cluttered, but you can imagine that Kubernetes maps the various components to nodes/machines within the cluster.
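If you prefer to explore these layers from the command line, the following sketch lists each resource type in turn. The function name show_workload_layers is hypothetical; the kubectl commands are standard.

```shell
# Hypothetical helper: walk the architecture from the outside in.
show_workload_layers() {
  # Ingresses: entry points from outside the cluster
  kubectl get ingress --all-namespaces
  # Services: stable addresses that route traffic to pods
  kubectl get services --all-namespaces
  # Deployments: desired replica counts for groups of pods
  kubectl get deployments --all-namespaces
  # Pods: "-o wide" adds a NODE column showing where each pod runs
  kubectl get pods --all-namespaces -o wide
}
```

The NODE column in the last command's output shows the pod-to-node mapping that the diagram leaves out.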
Note: In this scenario we are using the Nginx ingress controller. K8ssandra uses the Traefik ingress controller for many of its default settings, which include host-based URLs. However, Katacoda (which controls the VM we are using for this exercise) has a proxy, which means we can't use host-based URLs. Instead, we use path-based URLs, which Nginx easily supports. By default, Traefik doesn't support path-based URLs. So, to keep things as simple as possible, we are using the Nginx ingress controller.
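To show what a path-based rule looks like, here is a sketch of an ingress manifest that routes by URL path rather than by host name. Everything specific in it is an assumption for illustration: the ingress name, the /petclinic path, the service name petclinic-frontend, and port 8080 are hypothetical, and the networking.k8s.io/v1 API shown here requires Kubernetes 1.19 or later (older clusters use networking.k8s.io/v1beta1 with a slightly different backend syntax).

```shell
# Write an illustrative path-based ingress manifest (names are hypothetical).
cat > ingress-sketch.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-path-ingress
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /petclinic
        pathType: Prefix
        backend:
          service:
            name: petclinic-frontend
            port:
              number: 8080
EOF
```

A host-based rule would instead add a host name field to the rule so that routing depends on the requested domain, which is what the Katacoda proxy prevents here. You could apply a manifest like this with kubectl apply -f ingress-sketch.yaml once the matching service exists.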
It's a fair amount of work to configure and deploy all the resources necessary for this scenario, so please be patient as it completes.
When all five installation steps are complete, you can proceed to the next step.