Scaling Your Applications, Automatically
Almost always there will be more than one instance of each of your applications on Kubernetes. Multiple instances provide fault tolerance and allow your service to handle more traffic when demand increases. After all, why did you move your applications to a distributed platform like Kubernetes? Because you want to leverage large amounts of CPU, memory, and I/O across your cluster. However, these resources cost money, so you only want your replica count to grow when demand grows. When demand is low, the instances should scale down to save you money and lessen your carbon footprint.
There are three types of scaling in Kubernetes:
- Horizontal Pod Scaling, which changes the number of Pod replicas,
- Vertical Pod Scaling, which changes the CPU and memory allocated to existing Pods,
- Cluster Node Scaling, which changes the number of nodes in the cluster.
This scenario shows you how to achieve horizontal Pod scaling automatically. While you can scale manually, you really want scaling to happen automatically based on demand, so the complete name for this Kubernetes feature is the Horizontal Pod Autoscaler (HPA).
Basic automatic scaling is achieved simply by declaring a CPU utilization threshold along with the minimum and maximum number of Pods. The HPA monitors the current CPU load metric and triggers scaling events when activity rises above or falls below the threshold over a specified period. It's essentially a control loop comparing observed metrics against the declared state.
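The core of that control loop is a simple calculation: per the Kubernetes documentation, the desired replica count is ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal sketch of that arithmetic (the function name is purely illustrative):

```shell
# Sketch of the HPA's core scaling calculation:
# desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
hpa_desired_replicas() {
  local current=$1 observed=$2 target=$3
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (current * observed + target - 1) / target ))
}

# 3 replicas averaging 180m CPU against a 60m target -> scale up to 9
hpa_desired_replicas 3 180 60   # prints 9
# 4 replicas averaging 50% against an 80% target -> scale down to 3
hpa_desired_replicas 4 50 80    # prints 3
```

This is why scaling reacts proportionally: the further the observed metric drifts from the target, the larger the replica adjustment.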
In the following steps you will learn how to:
- install the metrics-server for gathering metrics,
- install a Pod that can be scaled,
- define the scaling rules and the number of Pods to scale up and down,
- increase service demand to trigger scaling up,
- observe scaling up and down.
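As a preview of the scaling rules you will define, an HPA that keeps a Deployment between 1 and 10 replicas at a 50% average CPU target could be declared like this sketch (the names `my-app-hpa` and `my-app` are placeholders, and it assumes the `autoscaling/v2` API is available on your cluster):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa           # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # placeholder Deployment to scale
  minReplicas: 1             # scale down no further than 1 Pod
  maxReplicas: 10            # scale up no further than 10 Pods
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # target 50% average CPU across Pods
```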
This scenario introduced the fundamental techniques for scaling your Pods up and down in a Kubernetes cluster using the Horizontal Pod Autoscaler (HPA). More complex rules can be applied to the HPA triggering logic, and the HPA can reference metrics from other metrics registries such as Prometheus. The HPA uses the standardized Custom Metrics API to reference metrics from different sources.
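For example, assuming a metrics adapter such as prometheus-adapter exposes a per-Pod metric named `http_requests_per_second` through the Custom Metrics API, the `metrics` section of an HPA could reference it like this fragment (the metric name and target value are illustrative):

```yaml
  # Fragment of an HPA spec; assumes a Custom Metrics API adapter is installed
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # served by the adapter, not built in
      target:
        type: AverageValue
        averageValue: "100"              # target 100 requests/s per Pod
```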
With these steps you have learned how to:
- ✔ install the metrics-server for gathering metrics,
- ✔ install a Pod that can be scaled,
- ✔ define the scaling rules and the number of Pods to scale up and down,
- ✔ increase service demand to trigger scaling up,
- ✔ observe scaling up and down.
- Kubernetes Metrics Server
- Horizontal Pod Autoscaler Walkthrough
- Horizontal Pod Scaling
- Cluster Node Scaling
- Vertical Pod Scaling
- Resource quotas
- Load testing tool, Locust
- Locust Helm Chart
Troubleshooting your Horizontal Pod Autoscaler
Your Kubernetes Cluster
For this scenario, Katacoda has just started a fresh Kubernetes cluster for you. Verify it's ready for your use.
kubectl version --short && \
kubectl get componentstatus && \
kubectl get nodes
The Helm package manager used for installing applications on Kubernetes is also available.
helm version --short
You can administer your cluster with the kubectl CLI tool or use the visual Kubernetes Dashboard. Use this script to access the protected Dashboard.