Introduction
Kubernetes provides Horizontal Pod Autoscaling (HPA) based on CPU and memory usage. However, many applications require scaling based on custom business metrics, such as:
- Request throughput (e.g., HTTP requests per second)
- Queue length in message brokers (e.g., Kafka, RabbitMQ)
- Database load (e.g., active connections)
In this guide, we will configure HPA to scale on custom metrics collected by Prometheus and exposed to Kubernetes through the Prometheus Adapter.
Prerequisites
- A running Kubernetes cluster
- Prometheus installed for metric collection
- Prometheus Adapter for exposing metrics
Step 1: Deploy Prometheus in Kubernetes
We use the kube-prometheus-stack Helm chart to install Prometheus:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
Verify the installation:
kubectl get pods -n monitoring
Step 2: Deploy an Application with Custom Metrics
We will deploy an NGINX application and expose HTTP request metrics for it. Note that the stock nginx image does not emit Prometheus metrics on its own, so in practice a metrics exporter runs alongside it.
Create the Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: "500m"
              memory: "256Mi"
            requests:
              cpu: "250m"
              memory: "128Mi"
Apply it:
kubectl apply -f nginx-deployment.yaml
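Because plain nginx serves traffic but publishes no Prometheus metrics, a common pattern is to add a metrics exporter as a sidecar container. Below is a hedged sketch of an extra entry for the Deployment's containers list; the image, flag, and port follow the nginx-prometheus-exporter defaults, and it assumes nginx has been configured to serve its stub_status endpoint at /stub_status:

```yaml
# Hypothetical sidecar entry for the Deployment's containers list.
# Assumes nginx is configured to serve stub_status at /stub_status.
- name: nginx-exporter
  image: nginx/nginx-prometheus-exporter
  args:
    - --nginx.scrape-uri=http://localhost:80/stub_status
  ports:
    - containerPort: 9113   # default port on which the exporter serves /metrics
```

Any exporter will do as long as it publishes a request counter; if you use one, point the Prometheus scrape target in the next step at the exporter port (9113) rather than port 80.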
Expose the Application
Create a service to expose NGINX:
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
Apply it:
kubectl apply -f nginx-service.yaml
Step 3: Configure Prometheus to Scrape Custom Metrics
With kube-prometheus-stack, Prometheus is managed by the Prometheus Operator, so scrape targets are normally added through a ServiceMonitor resource or the chart's additionalScrapeConfigs value rather than by editing prometheus.yml directly. The raw scrape configuration we need looks like this (point it at the exporter port if you run one, rather than port 80):
scrape_configs:
  - job_name: "nginx"
    static_configs:
      - targets: ["nginx-service.default.svc.cluster.local:80"]
If you go the additionalScrapeConfigs route, apply the change by upgrading the Helm release with the updated values:
helm upgrade prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f values.yaml
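With the operator, the idiomatic alternative to a raw scrape config is a ServiceMonitor. Here is a hedged sketch; it assumes the nginx Service itself carries an app: nginx label, exposes a named metrics port, and that the chart's default release-label discovery is in effect:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx
  namespace: monitoring
  labels:
    release: prometheus   # kube-prometheus-stack discovers ServiceMonitors by this label by default
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: nginx          # assumes the Service (not just the pods) has this label
  endpoints:
    - port: metrics       # assumes a named port on the Service
      interval: 15s
```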
Verify the metrics in Prometheus UI:
kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090
Open http://localhost:9090 and query your request counter (for example, http_requests_total) to confirm the nginx target is being scraped.
Step 4: Install Prometheus Adapter
Prometheus Adapter exposes custom metrics for Kubernetes autoscalers. Install it using Helm:
helm install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring
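Out of the box the adapter exposes little of use for this tutorial; it needs a rule telling it which Prometheus series to translate into the custom metrics API. Below is a sketch of a Helm values file, assuming the scraped counter is named http_requests_total and carries namespace and pod labels:

```yaml
# values.yaml for the prometheus-adapter chart (hypothetical rule).
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        # Map Prometheus labels onto Kubernetes resources.
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      # Expose the counter as a rate: http_requests_total -> http_requests_per_second
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Pass it at install time by appending -f values.yaml to the helm install command above (or apply it to an existing release with helm upgrade).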
Verify the installation:
kubectl get pods -n monitoring | grep prometheus-adapter
Check if custom metrics are available:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
Step 5: Create Horizontal Pod Autoscaler (HPA)
We now create an HPA that scales NGINX based on the per-pod request rate. Two corrections to note: the autoscaling/v2beta2 API was removed in Kubernetes 1.26, so we use the stable autoscaling/v2 API, and we target a per-second rate rather than the raw http_requests_total counter, which only ever grows and is therefore meaningless as a scaling target.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumes the adapter exposes a rate over http_requests_total
        target:
          type: AverageValue
          averageValue: "100"
Apply it:
kubectl apply -f nginx-hpa.yaml
Check HPA status:
kubectl get hpa nginx-hpa
You can also query the metric directly through the custom metrics API (adjust the metric name to whatever your adapter rules expose):
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .
Step 6: Load Test and Observe Scaling
Use hey or wrk to simulate traffic. The Service's cluster-internal hostname resolves only from inside the cluster, so run the load generator in a pod, for example:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://nginx-service.default.svc.cluster.local; done"
Alternatively, port-forward the Service and point hey at the local port:
kubectl port-forward svc/nginx-service 8080:80
hey -n 10000 -c 50 http://localhost:8080
Check if new pods are created:
kubectl get pods
Conclusion
By integrating Prometheus Adapter with Kubernetes HPA, we can scale applications based on business-specific metrics like request rates, queue lengths, or latency. This approach ensures better resource efficiency and application performance in cloud-native environments.
If you’re working with Kubernetes, stop relying only on CPU-based autoscaling! Custom metrics give you precision and efficiency. Drop your thoughts in the comments! 