Getting Started with Litmus 😮😎
Litmus is a toolset for cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes, helping SREs find weaknesses in their deployments. SREs use Litmus to run chaos experiments, initially in a staging environment and eventually in production, to find bugs and vulnerabilities. Fixing these weaknesses increases the resilience of the system.
Litmus takes a cloud-native approach to create, manage and monitor chaos. Chaos is orchestrated using the following Kubernetes Custom Resource Definitions (CRDs):
- ChaosEngine: A resource to link a Kubernetes application or Kubernetes node to a ChaosExperiment. ChaosEngine is watched by Litmus' Chaos-Operator, which then invokes the chaos experiments.
- ChaosExperiment: A resource to group the configuration parameters of a chaos experiment. ChaosExperiment CRs are created by the operator when experiments are invoked by ChaosEngine.
- ChaosResult: A resource to hold the results of a chaos-experiment. The Chaos-exporter reads the results and exports the metrics into a configured Prometheus server.
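Once Litmus is installed on a cluster, the three resource types above can be listed with kubectl. The following is a minimal sketch, assuming the CRDs are registered under the `litmuschaos.io` API group; the function name `list_litmus_crds` is purely illustrative and not part of Litmus.

```shell
# Sketch: list the custom resource types that Litmus registers.
# Assumes kubectl is configured for a cluster with Litmus installed
# and that the CRDs belong to the litmuschaos.io API group.
list_litmus_crds() {
  kubectl api-resources --api-group=litmuschaos.io --output=name
}
```

Calling `list_litmus_crds` after installation should print one line per Litmus resource type. An empty result suggests the CRDs are not installed yet.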
You should use Litmus because:
- Litmus provides chaos CRDs to manage chaos. Using the chaos API, orchestration, scheduling and complex workflow management can all be done declaratively.
- Most of the generic chaos experiments are readily available for you to get started with your initial chaos engineering needs.
- An SDK is available in Go, Python and Ansible. The SDK quickly generates a basic experiment structure, so developers and SREs only need to add the chaos logic to create a new experiment.
- Simple to complex chaos workflows are easy to construct. Use GitOps and the chaos workflows to scale your chaos engineering efforts and increase the resilience of your Kubernetes platform.
In this scenario you will learn how to:
- Set up and install Litmus on Kubernetes.
- Install Litmus experiments, RBAC and prepare the Chaos Engine.
- Deliver chaos experiments.
- Observe the chaos engine exercise your experiments.
Run your first chaos experiment 📹
Litmus is a toolset for cloud-native chaos engineering. It provides tools to orchestrate chaos on Kubernetes, helping developers and SREs find weaknesses in their application deployments. Litmus can be used to run chaos experiments, initially in a staging environment and eventually in production, to find bugs and vulnerabilities. Fixing these weaknesses increases the resilience of the system. Litmus adopts a “Kubernetes-native” approach, defining chaos intent declaratively via custom resources.
The project is under active development as a Sandbox project with CNCF. This Katacoda scenario will be updated as it evolves.
With these steps you have learned:
✅ Set up and install Litmus on Kubernetes.
✅ Install Litmus experiments, RBAC and prepare the Chaos Engine.
✅ Deliver chaos experiments.
✅ Observe the chaos engine exercise your experiments.
Please let us know about your experience with Litmus, and how you think it could be improved, by filling out this form.
In the last year we've seen Chaos Engineering move from a much talked-about idea to an accepted, mainstream approach to improving and assuring distributed system resilience. As organizations large and small begin to implement Chaos Engineering as an operational process, we're learning how to apply these techniques safely at scale. The approach is definitely not for everyone, and to be effective and safe, it requires organizational support at scale. -- ThoughtWorks Radar
Setting up the application against which chaos will be run
Setting up Nginx
In this scenario we are going to apply chaos to nginx. You can apply chaos to any other application, but here nginx serves as the target.
You might need to wait for a few seconds to a minute for your dev environment to set up.
Next, let's deploy the nginx app in the default namespace:
kubectl create deploy nginx --image=nginx
Verify that the pods are in the Running state:
kubectl get pods --show-labels
You should see something similar to this, with a different hash in your pod name and pod label:
nginx-86c57db685-vpr22 1/1 Running 0 3m15s app=nginx,pod-template-hash=86c57db685
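Rather than re-running the command until the pod shows Running, the check can be scripted. This is a minimal sketch, assuming the Deployment is named `nginx` in the default namespace as created above; the function names and the 120-second timeout are illustrative choices, not part of the scenario.

```shell
# Sketch: block until the nginx Deployment reports all replicas ready.
# The 120s timeout is an arbitrary value chosen for this example.
wait_for_nginx() {
  kubectl rollout status deployment/nginx --namespace default --timeout=120s
}

# Sketch: print the pods selected by the app=nginx label, the same
# label that appears in the --show-labels output above.
nginx_pods() {
  kubectl get pods --namespace default --selector app=nginx --output name
}
```

Running `wait_for_nginx` followed by `nginx_pods` should list the nginx pod once the rollout has completed. Selecting by the `app=nginx` label avoids depending on the generated hash in the pod name.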