Implementing Rate Limiting in Kubernetes with NGINX Ingress

Introduction

In modern cloud-native applications, APIs are critical components that need to be protected from excessive requests to prevent abuse and ensure fair resource distribution. Rate limiting helps safeguard services from malicious attacks, accidental overloads, and unfair resource consumption.

In this post, we’ll explore how to implement rate limiting in Kubernetes using NGINX Ingress Controller annotations.

Why Rate Limiting Matters

  • Prevents API abuse – Stops excessive requests from a single user.
  • Enhances reliability – Ensures fair usage of backend services.
  • Improves security – Mitigates potential DoS (Denial of Service) attacks.
  • Optimizes performance – Avoids unnecessary overloading of backend applications.

Prerequisites

Before implementing rate limiting, ensure you have the following:

  • A running Kubernetes cluster (Minikube, RKE2, or self-managed).
  • NGINX Ingress Controller installed.
  • An existing application exposed via an Ingress resource.

Step 1: Deploy the NGINX Ingress Controller

If you haven’t already installed the NGINX Ingress Controller, deploy it using Helm:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace kube-system

Verify the deployment:

kubectl get pods -n kube-system | grep nginx-ingress

Once running, proceed to set up rate limiting.

Step 2: Deploy a Sample API

For demonstration purposes, let’s deploy a simple echo server as our backend API.

Deploy the API Pod & Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-server
  labels:
    app: echo-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: echo-server
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      containers:
      - name: echo-server
        # registry.k8s.io replaces the deprecated k8s.gcr.io registry;
        # note that echoserver listens on port 8080, not 80
        image: registry.k8s.io/echoserver:1.10
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: echo-server
spec:
  selector:
    app: echo-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: ClusterIP

Apply the deployment and service:

kubectl apply -f echo-server.yaml

Step 3: Configure Rate Limiting with an Ingress Resource

Create an Ingress Resource with Rate Limiting

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    # requests per second allowed from a single client IP
    nginx.ingress.kubernetes.io/limit-rps: "5"
    # burst allowance = limit-rps x multiplier (5 x 2 = 10)
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "2"
    # maximum concurrent connections from a single client IP
    nginx.ingress.kubernetes.io/limit-connections: "20"
spec:
  ingressClassName: nginx
  rules:
  - host: echo.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo-server
            port:
              number: 80

Apply the Ingress:

kubectl apply -f echo-ingress.yaml

To test locally, map echo.local to the address where the ingress controller is reachable; if it listens on localhost, add an entry to /etc/hosts:

echo "127.0.0.1 echo.local" | sudo tee -a /etc/hosts

Step 4: Testing the Rate Limits

Use curl to send multiple requests and observe the rate limits in action.

for i in {1..20}; do curl -s -o /dev/null -w "%{http_code}\n" http://echo.local; done
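To see at a glance how many requests were served versus throttled, pipe the loop's output through sort and uniq. The snippet below is self-contained for illustration: the printf stands in for a real run, with five 200s and three 503s as assumed sample data.

```shell
# Tally status codes from a rate-limit test. In a real run, replace the
# printf with the curl loop above; the sample mimics a run in which
# 5 requests were served (200) and 3 were rejected (503).
printf '200\n200\n200\n503\n200\n503\n200\n503\n' | sort | uniq -c | sort -rn
```

On the sample data this prints the count of 200s first (5), then the count of throttled responses (3), most frequent first.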

Once the limit is exceeded, rejected requests receive 503 Service Temporarily Unavailable responses by default; the controller returns 429 Too Many Requests only if its limit-req-status-code option is set accordingly.
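Which status code you see depends on the controller's global configuration: ingress-nginx rejects over-limit requests with 503 by default, and the limit-req-status-code ConfigMap option switches it to the conventional 429. A sketch of that change; the ConfigMap name below is an assumption derived from the Helm release in Step 1, so verify it with kubectl -n kube-system get configmap before applying.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Name is derived from the Helm release; this value is an assumption
  # based on the install command in Step 1 -- verify before applying.
  name: nginx-ingress-ingress-nginx-controller
  namespace: kube-system
data:
  limit-req-status-code: "429"    # status for requests rejected by limit-rps
  limit-conn-status-code: "429"   # status for requests rejected by limit-connections
```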

Alternatively, use hey to simulate a load test:

hey -n 100 -c 10 http://echo.local

NGINX will enforce the limits defined in the Ingress annotations.
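As a rough sanity check on what a load test should report: a single client can expect about rps × duration + burst accepted requests over a sustained run. The figures below assume the limits configured above (5 rps with a burst allowance of 10) and an assumed 10-second test duration.

```shell
# Back-of-the-envelope ceiling on accepted requests during a load test:
# sustained rate (rps * seconds) plus the one-time burst allowance.
rps=5; duration=10; burst=10
echo $(( rps * duration + burst ))
```

With these assumptions the ceiling comes out to 60 accepted requests; everything beyond that in the same window should be rejected.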

Step 5: Monitoring Rate Limiting Logs

To verify that rate limiting is working, check the logs of the NGINX Ingress Controller:

kubectl logs -n kube-system -l app.kubernetes.io/name=ingress-nginx

Look for access-log entries carrying the rate-limit status code (503 by default, or 429 if limit-req-status-code is set).
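To isolate throttled requests in the log output, filter on the status-code field, which directly follows the quoted request line in the controller's default access-log format. The two printf lines below are simplified sample log entries standing in for real controller output; in practice, pipe the kubectl logs command above into the grep.

```shell
# Keep only access-log lines whose status code indicates rate limiting.
# Real usage:
#   kubectl logs -n kube-system -l app.kubernetes.io/name=ingress-nginx | grep -E '" (429|503) '
printf '%s\n' \
  '10.42.0.1 - - "GET / HTTP/1.1" 200 -' \
  '10.42.0.1 - - "GET / HTTP/1.1" 503 -' |
grep -E '" (429|503) '
```

On the sample data only the 503 line survives the filter.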

Conclusion

Implementing rate limiting in Kubernetes with NGINX Ingress is a powerful way to protect APIs from abuse while ensuring fair resource usage. By leveraging NGINX annotations, we can dynamically control:

✅ Request rates
✅ Burst handling
✅ Concurrent connections

This setup is essential for production-grade applications, helping to absorb traffic spikes, mitigate DoS attacks, and maintain system stability.

Have you implemented rate limiting in your Kubernetes clusters? Share your experience in the comments!👇
