Implementing Horizontal Pod Autoscaling Based on Custom Metrics

Introduction

Kubernetes provides Horizontal Pod Autoscaling (HPA) based on CPU and memory usage. However, many applications require scaling based on custom business metrics, such as:

✅ Request throughput (e.g., HTTP requests per second)
✅ Queue length in message brokers (e.g., Kafka, RabbitMQ)
✅ Database load (e.g., active connections)

In this guide, we will configure an HPA that consumes custom metrics collected by Prometheus and exposed to the Kubernetes custom metrics API by the Prometheus Adapter.

Prerequisites

  • A running Kubernetes cluster
  • kubectl and Helm configured against the cluster
  • Prometheus installed for metric collection (Step 1)
  • Prometheus Adapter for exposing metrics to the custom metrics API (Step 4)

Step 1: Deploy Prometheus in Kubernetes

We use the kube-prometheus-stack Helm chart to install Prometheus:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Verify the installation:

kubectl get pods -n monitoring
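
The chart installs the Prometheus Operator together with a Prometheus instance; you can also confirm the instance exists via its custom resource:

kubectl get prometheus -n monitoring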

Step 2: Deploy an Application with Custom Metrics

We will deploy an NGINX application and expose HTTP request metrics from it. The stock nginx image does not emit Prometheus metrics on its own, so the Deployment below runs the official nginx-prometheus-exporter as a sidecar container; it reads NGINX's stub_status endpoint and publishes counters such as nginx_http_requests_total on port 9113.

Create the Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: "500m"
            memory: "256Mi"
          requests:
            cpu: "250m"
            memory: "128Mi"
      # Sidecar that converts NGINX stub_status into Prometheus metrics.
      # Assumes /stub_status is enabled in the NGINX config
      # (see the ConfigMap sketch below).
      - name: exporter
        image: nginx/nginx-prometheus-exporter:1.1.0
        args:
        - --nginx.scrape-uri=http://localhost:80/stub_status
        ports:
        - containerPort: 9113

Apply it:

kubectl apply -f nginx-deployment.yaml
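
The exporter sidecar needs NGINX to serve its stub_status page, which the stock image does not enable. Below is a minimal sketch of a ConfigMap that could be mounted over /etc/nginx/conf.d/default.conf to enable it; the nginx-conf name and the mount itself are assumptions for illustration, not part of the original manifest:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
  namespace: default
data:
  default.conf: |
    server {
      listen 80;
      location / {
        root /usr/share/nginx/html;
        index index.html;
      }
      location /stub_status {
        stub_status;        # basic connection/request counters for the exporter
        allow 127.0.0.1;    # the sidecar connects over localhost
        deny all;
      }
    }

Mount it into the nginx container with a volume and volumeMount at /etc/nginx/conf.d (omitted above for brevity).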

Expose the Application

Create a Service to expose NGINX. A named metrics port lets Prometheus discover the exporter sidecar:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  - name: metrics
    protocol: TCP
    port: 9113
    targetPort: 9113
  type: ClusterIP

Apply it:

kubectl apply -f nginx-service.yaml
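
Confirm the Service has picked up the pod endpoints before wiring up Prometheus:

kubectl get endpoints nginx-service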

Step 3: Configure Prometheus to Scrape Custom Metrics

Because kube-prometheus-stack runs the Prometheus Operator, scrape targets are declared with a ServiceMonitor resource rather than by editing prometheus.yaml by hand. The release label must match the Helm release name so the operator's Prometheus instance picks the monitor up:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx
  namespace: monitoring
  labels:
    release: prometheus
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: nginx
  endpoints:
  - port: metrics
    interval: 15s

Apply it:

kubectl apply -f nginx-servicemonitor.yaml

Verify the metrics in Prometheus UI:

kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090

Open http://localhost:9090 and search for nginx_http_requests_total.
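
The HPA will act on a rate rather than the raw counter, so it helps to sanity-check the per-second rate with a PromQL query first:

rate(nginx_http_requests_total[2m])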

Step 4: Install Prometheus Adapter

Prometheus Adapter serves Prometheus metrics through the Kubernetes custom metrics API so autoscalers can consume them. Install it with Helm, pointing it at the Prometheus service created in Step 1 (the chart's default URL assumes a different setup):

helm install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring \
  --set prometheus.url=http://prometheus-kube-prometheus-prometheus.monitoring.svc \
  --set prometheus.port=9090

Verify the installation:

kubectl get pods -n monitoring | grep prometheus-adapter
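
By default, the adapter's rules convert Prometheus counters ending in _total into per-second rates and drop the suffix, so nginx_http_requests_total surfaces as the custom metric nginx_http_requests. Below is a simplified sketch of that kind of rule as it might appear in the chart's values file; it is illustrative only, not the full default rule set:

rules:
  custom:
  - seriesQuery: 'nginx_http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'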

Check that the custom metrics API is serving:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
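
You can also query the metric for the NGINX pods directly (assuming the exporter sidecar and the rate conversion described above):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/nginx_http_requests" | jq .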

Step 5: Create Horizontal Pod Autoscaler (HPA)

We now create an HPA that scales NGINX based on the per-second request rate. Since the metric is attached to the pods, we use a Pods-type metric with an AverageValue target: the HPA adds replicas whenever the average rate per pod exceeds 100 requests per second. Note that autoscaling/v2beta2 was removed in Kubernetes 1.26; autoscaling/v2 is the stable replacement.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: nginx_http_requests
      target:
        type: AverageValue
        averageValue: "100"

Apply it:

kubectl apply -f nginx-hpa.yaml

Check HPA status:

kubectl get hpa nginx-hpa

Step 6: Load Test and Observe Scaling

Use hey or wrk to simulate traffic. The Service's cluster DNS name only resolves from inside the cluster, so run the load generator in a temporary pod (the williamyeh/hey image is one convenient option), and sustain the load long enough for the HPA to react:

kubectl run load-test --rm -it --restart=Never --image=williamyeh/hey -- -z 2m -c 50 http://nginx-service.default.svc.cluster.local/

Watch new pods being created:

kubectl get pods -w
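
To see the metric value the HPA is reading and its scaling decisions:

kubectl describe hpa nginx-hpa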

Conclusion

By integrating Prometheus Adapter with Kubernetes HPA, we can scale applications based on business-specific metrics like request rates, queue lengths, or latency. This approach ensures better resource efficiency and application performance in cloud-native environments.

If you’re working with Kubernetes, stop relying only on CPU-based autoscaling! Custom metrics give you precision and efficiency. Drop your thoughts in the comments! 👇
