Kubernetes Pod Disruption Budgets: Ensuring Application Availability

Introduction

When managing Kubernetes clusters, rolling updates, node drains, and scaling events can cause temporary downtime for applications. In a production environment, even a brief outage can impact users.

This is where Pod Disruption Budgets (PDBs) come in!

A Pod Disruption Budget ensures that a minimum number of pods remain available during voluntary disruptions like:
✅ Node upgrades
✅ Cluster maintenance
✅ Manual pod evictions

By implementing PDBs, we can prevent downtime while still allowing controlled disruptions for cluster maintenance. Let’s see how to build a highly available application setup using PDBs in Kubernetes.

Step 1: Deploying a Sample Application

Let’s start with a simple Nginx deployment with three replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

Apply the deployment:

kubectl apply -f nginx-deployment.yaml

Check if all pods are running:

kubectl get pods -l app=nginx

Step 2: Creating a Pod Disruption Budget (PDB)

Now, let’s create a PDB to ensure that at least one pod is always running during disruptions.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
  namespace: default
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: nginx

Apply the PDB:

kubectl apply -f nginx-pdb.yaml

Verify the PDB:

kubectl get poddisruptionbudget

Expected output:

NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nginx-pdb   1               N/A               2                     10s

This means at least 1 pod must always remain available, so up to 2 of the 3 replicas can be voluntarily disrupted at a time.
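The same budget can be expressed the other way around with maxUnavailable; here is a minimal sketch (hypothetical file name nginx-pdb-maxunavailable.yaml) that is equivalent for this 3-replica Deployment:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-maxunavailable
  namespace: default
spec:
  maxUnavailable: 2   # with 3 replicas this mirrors minAvailable: 1
  selector:
    matchLabels:
      app: nginx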

Step 3: Testing the Pod Disruption Budget

Let’s try to evict a pod and see how the PDB enforces availability:

kubectl drain <node-name> --ignore-daemonsets --force

kubectl drain removes pods through the Eviction API, so if evicting a pod would violate the PDB, the eviction is refused and the drain keeps retrying until the budget allows it.

Keep in mind that PDBs only guard voluntary disruptions that go through the Eviction API. A direct deletion such as:

kubectl delete pod <pod-name> --grace-period=0 --force

bypasses the budget entirely and will not be blocked, so use drains or the Eviction API whenever you want the PDB to be honored.
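If you want to exercise the Eviction API against a single pod without draining a whole node, one option is to POST an Eviction object with kubectl's --raw flag; a sketch, with the pod name as a placeholder:

cat <<'EOF' > eviction.json
{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": { "name": "<pod-name>", "namespace": "default" }
}
EOF

# The API refuses the eviction (HTTP 429) if it would violate the PDB
kubectl create --raw /api/v1/namespaces/default/pods/<pod-name>/eviction -f eviction.json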

Conclusion

Kubernetes Pod Disruption Budgets help maintain application availability during voluntary disruptions. They guarantee that a minimum number of pods always remains available, which is especially valuable for high-availability applications and stateful workloads like databases.

With PDBs, you can perform cluster upgrades and maintenance without worrying about breaking your application’s availability! 

Would you use PDBs in your setup? Let me know your thoughts in the comments! 👇

Vertical Pod Autoscaling: Optimizing Resource Allocation

Introduction

Efficient resource allocation is crucial for maintaining performance and cost-effectiveness in Kubernetes. Traditional resource allocation requires developers to manually specify CPU and memory limits, often leading to over-provisioning or under-provisioning. The Vertical Pod Autoscaler (VPA) solves this issue by dynamically adjusting resource requests based on actual usage, ensuring that workloads run efficiently.

In this blog post, we will explore:

  • What is Vertical Pod Autoscaler (VPA)?
  • How does VPA work?
  • Step-by-step guide to implementing VPA in Kubernetes
  • YAML configurations and commands
  • Final thoughts on using VPA for optimal resource management

What is Vertical Pod Autoscaler (VPA)?

Vertical Pod Autoscaler (VPA) is a Kubernetes component that automatically adjusts the resource requests (CPU and memory) of pods. It continuously monitors the actual resource usage and updates the resource requests accordingly. This prevents over-provisioning (which leads to wasted resources) and under-provisioning (which can cause application crashes due to resource exhaustion).

Key Components of VPA:

  • Recommender – Analyzes past and current resource usage and provides recommendations for resource allocation.
  • Updater – Ensures that pods are restarted when their resource requirements deviate significantly from the recommended values.
  • Admission Controller – Modifies new pod resource requests based on the latest recommendations.

Deploying Vertical Pod Autoscaler in Kubernetes

Step 1: Install VPA in Your Cluster

To install VPA, clone the official Kubernetes autoscaler repository:

git clone https://github.com/kubernetes/autoscaler.git

Change to the VPA directory:

cd autoscaler/vertical-pod-autoscaler/

Deploy VPA components using the provided script:

./hack/vpa-up.sh

This command installs the necessary components into your Kubernetes cluster.

Step 2: Verify VPA Installation

After installation, check that VPA components are running:

kubectl get pods -n kube-system | grep vpa

Expected output:

vpa-admission-controller-xxxx Running
vpa-recommender-xxxx Running
vpa-updater-xxxx Running

Applying VPA to a Sample Deployment

Step 3: Deploy a Sample Application

Create a simple Nginx deployment without predefined CPU and memory requests.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: sample-container
        image: nginx

Apply the deployment:

kubectl apply -f sample-deployment.yaml

Step 4: Deploy a VPA Resource

Create a VPA resource to manage the sample deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       sample-app
  updatePolicy:
    updateMode: "Auto"

Apply the VPA configuration:

kubectl apply -f sample-vpa.yaml

Step 5: Monitor VPA Recommendations

Check the resource recommendations given by VPA:

kubectl describe vpa sample-app-vpa

This will show the recommended CPU and memory requests based on actual usage patterns.
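If you want recommendations without automatic pod restarts, or want to bound what VPA may request, the VPA spec also supports updateMode "Off" and a resourcePolicy. A sketch under those assumptions (hypothetical name sample-app-vpa-recommend):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app-vpa-recommend
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       sample-app
  updatePolicy:
    updateMode: "Off"          # publish recommendations only, never evict pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "1"
          memory: 512Mi

"Off" is handy while you build confidence in the recommendations; "Initial" applies them only when pods are (re)created.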

Conclusion

Vertical Pod Autoscaler (VPA) ensures that Kubernetes workloads receive the right amount of resources, eliminating the guesswork involved in manual resource allocation. By dynamically adjusting CPU and memory requests, VPA enhances performance, reduces infrastructure costs, and prevents application failures due to resource starvation.

If you’re managing workloads that have fluctuating resource demands, integrating VPA into your Kubernetes setup can significantly improve cluster efficiency.

Start using VPA today and take your Kubernetes resource management to the next level! Drop your thoughts in the comments! 👇

Implementing Rate Limiting in Kubernetes with NGINX Ingress

Introduction

In modern cloud-native applications, APIs are critical components that need to be protected from excessive requests to prevent abuse and ensure fair resource distribution. Rate limiting helps safeguard services from malicious attacks, accidental overloads, and unfair resource consumption.

In this post, we’ll explore how to implement rate limiting in Kubernetes using NGINX Ingress Controller annotations.

Why Rate Limiting Matters

  • Prevents API abuse – Stops excessive requests from a single user.
  • Enhances reliability – Ensures fair usage of backend services.
  • Improves security – Mitigates potential DoS (Denial of Service) attacks.
  • Optimizes performance – Avoids unnecessary overloading of backend applications.

Prerequisites

Before implementing rate limiting, ensure you have the following:

  • A running Kubernetes cluster (Minikube, RKE2, or self-managed).
  • NGINX Ingress Controller installed.
  • An existing application exposed via an Ingress resource.

Step 1: Deploy the NGINX Ingress Controller

If you haven’t already installed the NGINX Ingress Controller, deploy it using Helm:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace kube-system

Verify the deployment:

kubectl get pods -n kube-system | grep nginx-ingress

Once running, proceed to set up rate limiting.

Step 2: Deploy a Sample API

For demonstration purposes, let’s deploy a simple echo server as our backend API.

Deploy the API Pod & Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-server
  labels:
    app: echo-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: echo-server
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      containers:
      - name: echo-server
        image: registry.k8s.io/echoserver:1.10
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: echo-server
spec:
  selector:
    app: echo-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP

Apply the deployment and service:

kubectl apply -f echo-server.yaml

Step 3: Configure Rate Limiting with an Ingress Resource

Create an Ingress Resource with Rate Limiting

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "5"                 # max 5 requests per second per client IP
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "2"    # allows bursts of up to 5 x 2 = 10 requests
    nginx.ingress.kubernetes.io/limit-connections: "20"        # max 20 concurrent connections per client IP
spec:
  ingressClassName: nginx
  rules:
  - host: echo.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo-server
            port:
              number: 80

Apply the Ingress:

kubectl apply -f echo-ingress.yaml

To test locally, add an entry to /etc/hosts:

echo "127.0.0.1 echo.local" | sudo tee -a /etc/hosts

Step 4: Testing the Rate Limits

Use curl to send multiple requests and observe the rate limits in action.

for i in {1..20}; do curl -s -o /dev/null -w "%{http_code}\n" http://echo.local; done

Once the limit is exceeded, requests start getting rejected. By default ingress-nginx answers them with 503 Service Temporarily Unavailable; you can switch this to the more conventional 429 Too Many Requests through the controller ConfigMap (see the patch further below).

Alternatively, use hey to simulate a load test:

hey -n 100 -c 10 http://echo.local

NGINX will enforce the limits defined in the Ingress annotations.
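If you would rather answer throttled requests with 429, the controller ConfigMap exposes limit-req-status-code and limit-conn-status-code. A sketch follows; the ConfigMap name is assumed from the Helm release used earlier, so confirm it with kubectl get configmap -n kube-system first:

kubectl -n kube-system patch configmap nginx-ingress-ingress-nginx-controller \
  --type merge \
  -p '{"data":{"limit-req-status-code":"429","limit-conn-status-code":"429"}}'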

Step 5: Monitoring Rate Limiting Logs

To verify that rate limiting is working, check the logs of the NGINX Ingress Controller:

kubectl logs -n kube-system -l app.kubernetes.io/name=ingress-nginx

Look for entries carrying the rate-limit status code (503 by default, or 429 if you changed it) and for NGINX's "limiting requests" messages.

Conclusion

Implementing rate limiting in Kubernetes with NGINX Ingress is a powerful way to protect APIs from abuse while ensuring fair resource usage. By leveraging NGINX annotations, we can dynamically control:

✅ Request rates
✅ Burst handling
✅ Concurrent connections

This setup is essential for production-grade applications, preventing DDoS attacks, and maintaining system stability.

Have you implemented rate limiting in your Kubernetes clusters? Share your experience in the comments!👇

Setting Up Cluster Autoscaler in Minikube for Development Testing

Introduction

Kubernetes Cluster Autoscaler automatically adjusts the number of nodes in your cluster based on pending workloads. While in production, this typically requires a cloud provider, Minikube provides a way to simulate autoscaling for development and testing.

In this guide, we’ll configure Cluster Autoscaler on Minikube, simulate scaling behaviors, and observe how it increases node capacity when needed.

The Problem: Autoscaling in Development Environments

In production, Kubernetes clusters dynamically scale nodes to handle workload spikes.
In local development, Minikube runs a single node by default, making it challenging to test Cluster Autoscaler.
Solution: Use Minikube’s multi-node feature and the Cluster Autoscaler to simulate real-world autoscaling scenarios.

Step 1: Start Minikube with Multiple Nodes

Since Minikube doesn’t support real autoscaling, we manually start it with multiple nodes to allow Cluster Autoscaler to scale between them.

minikube start --nodes 2

Verify the nodes are running:

kubectl get nodes

Step 2: Install Metrics Server

Cluster Autoscaler relies on resource metrics to make scaling decisions. Install the Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Verify that the Metrics Server is running:

kubectl get deployment metrics-server -n kube-system
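If the Metrics Server pod keeps failing on Minikube (a common symptom of kubelet TLS verification), the bundled addon is usually the simpler route:

minikube addons enable metrics-server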

Step 3: Deploy Cluster Autoscaler

Now, deploy the Cluster Autoscaler to monitor and scale nodes.

Cluster Autoscaler Deployment YAML

Create a file called cluster-autoscaler.yaml and add the following. Note that the manifest references a cluster-autoscaler ServiceAccount in kube-system; create that ServiceAccount with suitable RBAC first (for example from the manifests in the autoscaler repository), otherwise the pod cannot be created:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=minikube
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=false
          resources:
            requests:
              cpu: 100m
              memory: 300Mi
            limits:
              cpu: 500m
              memory: 500Mi

Apply the deployment:

kubectl apply -f cluster-autoscaler.yaml

Check logs to ensure it’s running:

kubectl logs -f deployment/cluster-autoscaler -n kube-system

Step 4: Create a Workload that Triggers Scaling

Now, deploy a workload that requires more resources than currently available, forcing the Cluster Autoscaler to scale up.

Resource-Intensive Deployment YAML

Create a file high-memory-app.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-memory-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: high-memory-app
  template:
    metadata:
      labels:
        app: high-memory-app
    spec:
      containers:
        - name: stress
          image: polinux/stress
          command: ["stress"]
          args: ["--vm", "1", "--vm-bytes", "500M", "--timeout", "60s"]   # ~500M of pressure stays under the 800Mi limit
          resources:
            requests:
              memory: "600Mi"
              cpu: "250m"
            limits:
              memory: "800Mi"
              cpu: "500m"

Apply the deployment:

kubectl apply -f high-memory-app.yaml

Check if the pods are pending:

kubectl get pods -o wide

Pods stuck in the Pending state mean the cluster has run out of schedulable capacity, which is exactly the signal that triggers Cluster Autoscaler.
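To list only the pods that are waiting for capacity:

kubectl get pods --field-selector=status.phase=Pending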

Observing Autoscaler in Action

Now, let’s check how the autoscaler responds:

kubectl get nodes
kubectl get pods -A
kubectl logs -f deployment/cluster-autoscaler -n kube-system

You should see the Cluster Autoscaler increasing the node count to accommodate the pending pods. Once the workload decreases, it should scale down unused nodes.
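Two commands make the scale-up easier to follow while it happens (the grep pattern is just a convenience, adjust it as needed):

# Watch nodes join and leave the cluster
kubectl get nodes --watch

# Surface scheduler and autoscaler events such as FailedScheduling and TriggeredScaleUp
kubectl get events -A --sort-by=.lastTimestamp | grep -iE 'cluster-autoscaler|TriggeredScaleUp|FailedScheduling'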

Why Does This Matter?

  • Understand autoscaler behavior before deploying to production
  • Validate custom scaling policies in a local development setup
  • Optimize resource allocation for cost and performance efficiency

Even though Minikube doesn’t create new cloud nodes dynamically, this method helps developers test scaling triggers and behaviors before running on real cloud environments.

Conclusion: Build Smarter Autoscaling Strategies

Testing Cluster Autoscaler in Minikube provides valuable insights into Kubernetes scaling before moving to production. If you’re developing autoscaling-sensitive applications, mastering this setup ensures better efficiency, cost savings, and resilience.

Have you tested autoscaling in Minikube? Drop your thoughts in the comments!👇

Implementing Horizontal Pod Autoscaling Based on Custom Metrics

Introduction

Kubernetes provides Horizontal Pod Autoscaling (HPA) based on CPU and memory usage. However, many applications require scaling based on custom business metrics, such as:

✅ Request throughput (e.g., HTTP requests per second)
✅ Queue length in message brokers (e.g., Kafka, RabbitMQ)
✅ Database load (e.g., active connections)

In this guide, we will configure HPA using custom metrics from Prometheus and expose them using the Prometheus Adapter.

Prerequisites

  • A running Kubernetes cluster
  • Prometheus installed for metric collection
  • Prometheus Adapter for exposing metrics

Step 1: Deploy Prometheus in Kubernetes

We use the kube-prometheus-stack Helm chart to install Prometheus:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Verify the installation:

kubectl get pods -n monitoring

Step 2: Deploy an Application with Custom Metrics

We will deploy an NGINX application and scale it on an HTTP request metric. The stock nginx image does not expose a Prometheus /metrics endpoint on its own, so in practice you would run an exporter such as nginx-prometheus-exporter alongside it; the rest of this guide assumes a metric named http_requests_total is being scraped for the app.

Create the Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: "500m"
            memory: "256Mi"
          requests:
            cpu: "250m"
            memory: "128Mi"

Apply it:

kubectl apply -f nginx-deployment.yaml

Expose the Application

Create a service to expose NGINX:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP

Apply it:

kubectl apply -f nginx-service.yaml

Step 3: Configure Prometheus to Scrape Custom Metrics

Add a scrape job for the NGINX metrics endpoint:

scrape_configs:
  - job_name: "nginx"
    static_configs:
      - targets: ["nginx-service.default.svc.cluster.local:80"]

Because kube-prometheus-stack runs Prometheus through the Prometheus Operator, you cannot simply kubectl apply a raw prometheus.yml. Instead, pass the job in through the chart's additionalScrapeConfigs value (or define a ServiceMonitor) and upgrade the release, as sketched below.
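A minimal values sketch for that approach (hypothetical file name prometheus-values.yaml):

# prometheus-values.yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: "nginx"
        static_configs:
          - targets: ["nginx-service.default.svc.cluster.local:80"]

Apply it with a Helm upgrade:

helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring -f prometheus-values.yaml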

Verify the metrics in Prometheus UI:

kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090

Open http://localhost:9090, and search for http_requests_total.

Step 4: Install Prometheus Adapter

Prometheus Adapter exposes custom metrics for Kubernetes autoscalers. Install it using Helm:

helm install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring

Verify the installation:

kubectl get pods -n monitoring | grep prometheus-adapter

Check that the metrics APIs are registered. The adapter serves the custom metrics API out of the box; the external metrics API, which the HPA below relies on, only appears after you add an external rule to the adapter's configuration:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
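A default install also points the adapter at the wrong Prometheus address for kube-prometheus-stack, so a values file along these lines is usually needed. This is a sketch with an assumed file name (adapter-values.yaml); the external rule republishes http_requests_total, computed as a per-second rate and scoped by namespace:

# adapter-values.yaml
prometheus:
  url: http://prometheus-kube-prometheus-prometheus.monitoring.svc
  port: 9090
rules:
  external:
    - seriesQuery: 'http_requests_total'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_total"          # keep the name the HPA expects
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'

Apply it with:

helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring -f adapter-values.yaml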

Step 5: Create Horizontal Pod Autoscaler (HPA)

We now create an HPA that scales NGINX based on request rate.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: http_requests_total
      target:
        type: Value
        value: 100

Apply it:

kubectl apply -f nginx-hpa.yaml

Check HPA status:

kubectl get hpa nginx-hpa

Step 6: Load Test and Observe Scaling

Use hey or wrk to simulate traffic. The service DNS name only resolves inside the cluster, so either run the load generator from a pod in the cluster or port-forward the service first:

kubectl port-forward svc/nginx-service 8080:80 &
hey -n 1000 -c 50 http://localhost:8080

Check if new pods are created:

kubectl get pods
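To watch the HPA react, keep an eye on its observed metric value and replica count:

kubectl get hpa nginx-hpa --watch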

Conclusion

By integrating Prometheus Adapter with Kubernetes HPA, we can scale applications based on business-specific metrics like request rates, queue lengths, or latency. This approach ensures better resource efficiency and application performance in cloud-native environments.

If you’re working with Kubernetes, stop relying only on CPU-based autoscaling! Custom metrics give you precision and efficiency. Drop your thoughts in the comments! 👇