Introduction
In modern cloud-native applications, APIs are critical components that need to be protected from excessive requests to prevent abuse and ensure fair resource distribution. Rate limiting helps safeguard services from malicious attacks, accidental overloads, and unfair resource consumption.
In this post, we’ll explore how to implement rate limiting in Kubernetes using NGINX Ingress Controller annotations.
Why Rate Limiting Matters
- Prevents API abuse – Stops excessive requests from a single user.
- Enhances reliability – Ensures fair usage of backend services.
- Improves security – Mitigates potential DoS (Denial of Service) attacks.
- Optimizes performance – Avoids unnecessary overloading of backend applications.
Prerequisites
Before implementing rate limiting, ensure you have the following:
- A running Kubernetes cluster (Minikube, RKE2, or self-managed).
- NGINX Ingress Controller installed.
- An existing application exposed via an Ingress resource.
Step 1: Deploy the NGINX Ingress Controller
If you haven’t already installed the NGINX Ingress Controller, deploy it using Helm:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace kube-system
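Optionally, block until the controller pod is Ready before moving on (the label selector below follows the chart's defaults):
kubectl wait --namespace kube-system \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/name=ingress-nginx \
  --timeout=120s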
Verify the deployment:
kubectl get pods -n kube-system | grep nginx-ingress
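The Ingress in Step 3 selects this controller via ingressClassName: nginx, so also confirm that the IngressClass was registered (the chart creates one named nginx by default):
kubectl get ingressclass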
Once running, proceed to set up rate limiting.
Step 2: Deploy a Sample API
For demonstration purposes, let’s deploy a simple echo server as our backend API.
Deploy the API Pod & Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-server
  labels:
    app: echo-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: echo-server
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      containers:
        - name: echo-server
          # k8s.gcr.io is deprecated; registry.k8s.io serves the same image
          image: registry.k8s.io/echoserver:1.10
          ports:
            # echoserver:1.10 listens on 8080, not 80
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: echo-server
spec:
  selector:
    app: echo-server
  ports:
    - protocol: TCP
      port: 80
      # Route the Service's port 80 to the container's 8080
      targetPort: 8080
  type: ClusterIP
Apply the deployment and service:
kubectl apply -f echo-server.yaml
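Before wiring up the Ingress, confirm the pods are Running and the Service has endpoints (names and labels here match the manifest above):
kubectl get pods -l app=echo-server
kubectl get endpoints echo-server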
Step 3: Configure Rate Limiting with an Ingress Resource
Create an Ingress Resource with Rate Limiting
Note that ingress-nginx has no limit-burst annotation; the burst size is derived from limit-rps multiplied by limit-burst-multiplier (which defaults to 5). To allow a burst of 10 on top of 5 requests per second, set the multiplier to 2.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    # Allow 5 requests per second per client IP
    nginx.ingress.kubernetes.io/limit-rps: "5"
    # Burst = rps x multiplier, so 5 x 2 = 10 requests above the steady rate
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "2"
    # At most 20 concurrent connections per client IP
    nginx.ingress.kubernetes.io/limit-connections: "20"
spec:
  ingressClassName: nginx
  rules:
    - host: echo.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-server
                port:
                  number: 80
Apply the Ingress:
kubectl apply -f echo-ingress.yaml
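Behind the scenes, the controller translates these annotations into NGINX limit_req_zone/limit_req and limit_conn_zone/limit_conn directives keyed on the client address. If you're curious, you can peek at the rendered config inside the controller pod (the pod name is looked up dynamically here):
POD=$(kubectl get pods -n kube-system -l app.kubernetes.io/name=ingress-nginx \
  -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system "$POD" -- grep -E 'limit_(req|conn)' /etc/nginx/nginx.conf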
To test locally, add an entry to /etc/hosts pointing echo.local at the address where your ingress controller is reachable (127.0.0.1 assumes a local cluster or an active port-forward; on Minikube, use the output of minikube ip instead):
echo "127.0.0.1 echo.local" | sudo tee -a /etc/hosts
Step 4: Testing the Rate Limits
Use curl to send multiple requests and observe the rate limits in action.
for i in {1..20}; do curl -s -o /dev/null -w "%{http_code}\n" http://echo.local; done
Once the rate limit is exceeded, requests start getting rejected. Note that ingress-nginx returns 503 Service Temporarily Unavailable by default for rate-limited requests, not 429.
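If you'd rather return the conventional 429 Too Many Requests, set limit-req-status-code in the controller's ConfigMap. A minimal sketch, assuming the Helm install from Step 1 (the ConfigMap name depends on your release name, so look it up first and substitute it):
# Find the controller ConfigMap created by the Helm chart
kubectl get configmap -n kube-system | grep controller
# Patch it to reject rate-limited requests with 429 (replace the example name)
kubectl patch configmap nginx-ingress-ingress-nginx-controller -n kube-system \
  --type merge -p '{"data":{"limit-req-status-code":"429"}}'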
Alternatively, use hey to simulate a load test:
hey -n 100 -c 10 http://echo.local
NGINX will enforce the limits defined in the Ingress annotations; hey's summary includes a status-code distribution, so you can see at a glance how many requests were rejected.
Step 5: Monitoring Rate Limiting Logs
To verify that rate limiting is working, check the logs of the NGINX Ingress Controller:
kubectl logs -n kube-system -l app.kubernetes.io/name=ingress-nginx
Look for access-log entries with status 503 (or 429 if you changed limit-req-status-code as shown above), which indicate rejected requests.
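To filter out the noise, grep the access log for the rejection status code (503 by default; use 429 if you reconfigured it):
kubectl logs -n kube-system -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep ' 503 '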
Conclusion
Implementing rate limiting in Kubernetes with NGINX Ingress is a powerful way to protect APIs from abuse while ensuring fair resource usage. By leveraging NGINX annotations, we can declaratively control:
- Request rates (limit-rps)
- Burst handling (limit-burst-multiplier)
- Concurrent connections (limit-connections)
This setup is essential for production-grade applications, helping absorb traffic spikes, mitigate DoS attacks, and maintain system stability.
Have you implemented rate limiting in your Kubernetes clusters? Share your experience in the comments!