Building Event-Driven Architectures with Kubernetes and NATS

Introduction

Modern cloud-native applications demand scalability, flexibility, and resilience. Traditional request-response communication patterns often lead to tight coupling between services, making them hard to scale independently. Event-driven architectures (EDA) solve this by enabling asynchronous, loosely coupled communication between microservices.

In this article, we will explore how to build an event-driven system using NATS (a lightweight, high-performance messaging system) on Kubernetes. We will:

  • Deploy a NATS messaging broker
  • Create a publisher service that emits events
  • Develop a subscriber service that listens and processes events
  • Demonstrate event-driven communication with Kubernetes

Prerequisites

Before we begin, ensure you have the following:

  • A running Kubernetes cluster (Minikube, k3s, or a self-managed cluster)
  • kubectl installed and configured
  • Helm installed for deploying NATS
  • Docker installed for building container images

Step 1: Deploy NATS on Kubernetes

We will use Helm to deploy NATS.

Install NATS using Helm

helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update
helm install nats nats/nats --namespace default

Verify that the NATS pods are running:

kubectl get pods -l app.kubernetes.io/name=nats
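The Helm chart also creates a ClusterIP Service that clients use to reach the broker on port 4222. Assuming the default release name nats, confirm it with:

kubectl get svc nats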

Step 2: Create a Publisher Service

Our publisher will send messages to NATS on a specific subject.

Publisher Code (publisher.py)

import asyncio
import os

import nats

# Read the server URL from the NATS_SERVER env var set in the Deployment,
# falling back to the in-cluster service DNS name.
NATS_URL = os.getenv("NATS_SERVER", "nats://nats.default.svc.cluster.local:4222")

async def main():
    nc = await nats.connect(NATS_URL)
    await nc.publish("events.data", b"Hello, this is an event message!")
    print("Message sent!")
    await nc.close()

if __name__ == "__main__":
    asyncio.run(main())

Dockerfile for Publisher

FROM python:3.8
WORKDIR /app
COPY publisher.py .
RUN pip install nats-py
CMD ["python", "publisher.py"]

Build the Image

docker build -t publisher:latest .

If your cluster cannot pull local images (for example, on Minikube), load the image with minikube image load publisher:latest or push it to a registry the cluster can reach.

Publisher Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: publisher
spec:
  replicas: 1
  selector:
    matchLabels:
      app: publisher
  template:
    metadata:
      labels:
        app: publisher
    spec:
      containers:
        - name: publisher
          image: publisher:latest
          env:
            - name: NATS_SERVER
              value: "nats://nats.default.svc.cluster.local:4222"

Deploy the publisher:

kubectl apply -f publisher-deployment.yaml

Step 3: Create a Subscriber Service

Our subscriber listens to the events.data subject and processes messages.

Subscriber Code (subscriber.py)

import asyncio
import os

import nats

# Read the server URL from the NATS_SERVER env var set in the Deployment,
# falling back to the in-cluster service DNS name.
NATS_URL = os.getenv("NATS_SERVER", "nats://nats.default.svc.cluster.local:4222")

async def message_handler(msg):
    subject = msg.subject
    data = msg.data.decode()
    print(f"Received message on {subject}: {data}")

async def main():
    nc = await nats.connect(NATS_URL)
    await nc.subscribe("events.data", cb=message_handler)
    print("Listening for events...")
    while True:
        await asyncio.sleep(1)

if __name__ == "__main__":
    asyncio.run(main())

Dockerfile for Subscriber

FROM python:3.8
WORKDIR /app
COPY subscriber.py .
RUN pip install nats-py
CMD ["python", "subscriber.py"]

Build the Image

docker build -t subscriber:latest .

As with the publisher, load the image into the cluster (for example, minikube image load subscriber:latest) or push it to a registry the cluster can reach.

Subscriber Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: subscriber
spec:
  replicas: 1
  selector:
    matchLabels:
      app: subscriber
  template:
    metadata:
      labels:
        app: subscriber
    spec:
      containers:
        - name: subscriber
          image: subscriber:latest
          env:
            - name: NATS_SERVER
              value: "nats://nats.default.svc.cluster.local:4222"

Deploy the subscriber:

kubectl apply -f subscriber-deployment.yaml

Step 4: Test the Event-Driven Architecture

Once all components are deployed, check logs for event propagation.

  1. Check the subscriber logs:
kubectl logs -l app=subscriber
  2. Trigger the publisher again. Deleting the publisher pod makes the Deployment recreate it, which re-runs the publish script:
kubectl delete pod -l app=publisher
  3. Observe the subscriber receiving events. If everything is set up correctly, the subscriber should print:
Received message on events.data: Hello, this is an event message!
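Core NATS pub/sub is fire-and-forget: if the subscriber starts after the publisher has already sent its message, that event is simply lost. For durable delivery you can use JetStream, NATS's built-in persistence layer. The sketch below uses nats-py and assumes the Helm release was deployed with JetStream enabled; the stream name EVENTS is just an illustration.

import asyncio
import os

import nats

NATS_URL = os.getenv("NATS_SERVER", "nats://nats.default.svc.cluster.local:4222")

async def main():
    nc = await nats.connect(NATS_URL)
    js = nc.jetstream()
    # Create (or reuse) a stream that captures every subject under "events."
    await js.add_stream(name="EVENTS", subjects=["events.*"])
    # Messages are now persisted until consumed, surviving subscriber restarts
    ack = await js.publish("events.data", b"durable event")
    print(f"Stored in stream {ack.stream} at sequence {ack.seq}")
    await nc.close()

if __name__ == "__main__":
    asyncio.run(main())

A subscriber can then attach a durable consumer, for example js.subscribe("events.data", durable="worker", cb=message_handler), and pick up any messages it missed while offline.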

Conclusion

We successfully built an event-driven system using Kubernetes and NATS. This architecture allows microservices to communicate asynchronously, improving scalability, resilience, and maintainability.

Key takeaways:

  • NATS simplifies pub-sub messaging in Kubernetes.
  • Event-driven patterns decouple services and improve scalability.
  • Kubernetes provides a flexible infrastructure to deploy and manage such systems.

This architecture can be extended with multiple subscribers, durable streams, and event filtering for more advanced use cases. If you have any questions let me know in the comments!👇

Practical gRPC Communication Between Kubernetes Services

Introduction

Microservices architectures require efficient communication. While REST APIs are widely used, gRPC is a better alternative when high performance, streaming capabilities, and strict API contracts are required.

In this guide, we’ll set up gRPC communication between two services in Kubernetes:

  1. grpc-server → A gRPC server that provides a simple API.
  2. grpc-client → A client that interacts with the gRPC server.

This tutorial covers everything from scratch, including .proto files, Docker images, Kubernetes manifests, and testing.

Prerequisites

  • Kubernetes cluster (Minikube, Kind, or self-hosted)
  • kubectl installed
  • Docker installed
  • Basic knowledge of gRPC and Protobuf

Step 1: Define the gRPC Service Using Protocol Buffers

Create a file called service.proto:

syntax = "proto3";

package grpcservice;

// Define the gRPC Service
service Greeter {
    rpc SayHello (HelloRequest) returns (HelloReply);
}

// Define Request message
message HelloRequest {
    string name = 1;
}

// Define Response message
message HelloReply {
    string message = 1;
}
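The server and client code below imports service_pb2 and service_pb2_grpc, which are generated from this .proto file. With grpcio-tools installed locally, generate them before building the images:

pip install grpcio-tools
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. service.proto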

Step 2: Implement the gRPC Server and Client

gRPC Server (Python)

Create a file called server.py:

import grpc
from concurrent import futures
import time
import service_pb2
import service_pb2_grpc

class GreeterServicer(service_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        return service_pb2.HelloReply(message=f"Hello, {request.name}!")

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    service_pb2_grpc.add_GreeterServicer_to_server(GreeterServicer(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    print("gRPC Server is running on port 50051...")
    server.wait_for_termination()

if __name__ == "__main__":
    serve()

gRPC Client (Python)

Create a file called client.py:

import grpc
import service_pb2
import service_pb2_grpc

def run():
    channel = grpc.insecure_channel('grpc-server:50051')
    stub = service_pb2_grpc.GreeterStub(channel)
    response = stub.SayHello(service_pb2.HelloRequest(name="Kubernetes"))
    print("Server response:", response.message)

if __name__ == "__main__":
    run()

Step 3: Create Docker Images

Create a Dockerfile for both the server and client.

Dockerfile for gRPC Server (Dockerfile.server)

FROM python:3.8
WORKDIR /app
COPY server.py service_pb2.py service_pb2_grpc.py ./
RUN pip install grpcio grpcio-tools
CMD ["python", "server.py"]

Dockerfile for gRPC Client (Dockerfile.client)

FROM python:3.8
WORKDIR /app
COPY client.py service_pb2.py service_pb2_grpc.py ./
RUN pip install grpcio grpcio-tools
CMD ["python", "client.py"]

Build and Push Images

Run the following commands:

docker build -t mygrpc-server:latest -f Dockerfile.server .
docker build -t mygrpc-client:latest -f Dockerfile.client .

If your cluster cannot pull local images (for example, on Minikube), load them with minikube image load or push them to a registry the cluster can reach.

Step 4: Deploy gRPC Services in Kubernetes

Deployment and Service for gRPC Server

Create grpc-server-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grpc-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grpc-server
  template:
    metadata:
      labels:
        app: grpc-server
    spec:
      containers:
        - name: grpc-server
          image: mygrpc-server:latest
          ports:
            - containerPort: 50051
---
apiVersion: v1
kind: Service
metadata:
  name: grpc-server
spec:
  selector:
    app: grpc-server
  ports:
    - protocol: TCP
      port: 50051
      targetPort: 50051

Deployment for gRPC Client

Create grpc-client-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grpc-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grpc-client
  template:
    metadata:
      labels:
        app: grpc-client
    spec:
      containers:
        - name: grpc-client
          image: mygrpc-client:latest

Step 5: Apply the Manifests

Deploy everything to Kubernetes:

kubectl apply -f grpc-server-deployment.yaml
kubectl apply -f grpc-client-deployment.yaml

Check the status:

kubectl get pods

Step 6: Testing gRPC Communication

To see if the client successfully communicates with the server:

kubectl logs -l app=grpc-client

If everything works, you should see an output like:

Server response: Hello, Kubernetes!

Step 7: Exposing gRPC Service Externally (Optional)

If you want to expose the gRPC service externally using an Ingress, create grpc-ingress.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-ingress
spec:
  rules:
  - host: grpc.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grpc-server
            port:
              number: 50051
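Note that gRPC runs over HTTP/2, so a plain HTTP rule is usually not enough on its own. If you are using the NGINX Ingress Controller, you would also mark the backend as gRPC and terminate TLS at the Ingress (NGINX only serves HTTP/2 to clients over TLS). A sketch of the extra metadata, assuming the NGINX controller:

metadata:
  name: grpc-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"

Other ingress controllers have their own equivalents, so check the documentation for whichever one your cluster runs.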

Apply the ingress:

kubectl apply -f grpc-ingress.yaml

Conclusion

gRPC on Kubernetes ensures fast, efficient, and scalable communication.
We built a gRPC server and client, deployed them on Kubernetes, and established seamless service-to-service communication.
This setup is ideal for high-performance microservices architectures.

Are you using gRPC in Kubernetes? Share your experience in the comments!👇

Serverless on Kubernetes: Setting Up Knative Serving

Introduction

In modern cloud-native environments, developers seek the best of both worlds: the flexibility of Kubernetes and the simplicity of serverless computing. Knative Serving brings serverless capabilities to Kubernetes, enabling auto-scaling, scale-to-zero, and request-driven execution.

Why Knative?

  • Auto-scaling – Scale pods based on incoming traffic.
  • Scale-to-zero – When no traffic exists, Knative frees up resources.
  • Traffic Splitting – Deploy multiple versions and roll out updates safely.
  • Event-driven – Respond dynamically to requests without managing infra manually.

In this guide, we’ll install Knative Serving on Kubernetes and deploy a simple serverless application.

Prerequisites

Ensure you have:
 ✅ A running Kubernetes cluster (Minikube, K3s, or any managed Kubernetes).
 ✅ kubectl installed and configured.
 ✅ A valid domain name (or use a local DNS setup like sslip.io).

Step 1: Installing Knative Serving

Install the Required CRDs

Knative requires custom resources for managing serving components. Apply them to your cluster.

kubectl apply -f https://github.com/knative/serving/releases/latest/download/serving-crds.yaml

Install the Knative Core Components

kubectl apply -f https://github.com/knative/serving/releases/latest/download/serving-core.yaml

Install a Networking Layer

Knative requires an ingress to route traffic. We’ll use Kourier (a lightweight option).

kubectl apply -f https://github.com/knative/net-kourier/releases/latest/download/kourier.yaml
kubectl patch configmap/config-network --namespace knative-serving --type merge --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'

Verify Installation

kubectl get pods -n knative-serving

All Knative components should be in Running state.
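If you don't have a real domain, Knative ships an optional job that configures sslip.io magic DNS, so each service gets a resolvable URL out of the box:

kubectl apply -f https://github.com/knative/serving/releases/latest/download/serving-default-domain.yaml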

Step 2: Deploying a Serverless Application

We’ll deploy a simple Hello World application using Knative Serving.

Create a Knative Service

Define a Knative Service resource (serving.knative.dev/v1) for our application.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-world
  namespace: default
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "Knative on Kubernetes"

Apply the manifest:

kubectl apply -f hello-world.yaml

Step 3: Testing the Serverless Application

Get the External URL

Run the following command to get the app URL:

kubectl get ksvc hello-world

You should see an output like:

NAME          URL                                      LATESTCREATED   LATESTREADY
hello-world   http://hello-world.default.example.com   hello-world-00001   hello-world-00001

To test it:

curl http://hello-world.default.example.com

You should see “Hello Knative on Kubernetes!”

Step 4: Auto-Scaling & Scale-to-Zero in Action

Knative automatically scales up when there’s traffic and scales down to zero when idle.

Send multiple requests to trigger scaling:

hey -n 100 -c 10 http://hello-world.default.example.com

Watch the pods scaling:

kubectl get pods -w

Wait a few minutes and check again:

kubectl get pods

If no traffic exists, the app scales to zero, freeing up cluster resources!
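Traffic splitting, mentioned earlier, builds on the same Service resource: every change to the template creates a new revision, and the traffic section divides requests between revisions. A sketch, assuming the first revision kept the default name hello-world-00001:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-world
  namespace: default
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "v2"
  traffic:
    - revisionName: hello-world-00001   # previous revision keeps 80% of traffic
      percent: 80
    - latestRevision: true              # the new revision receives 20%
      percent: 20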

Conclusion

With Knative Serving, we have transformed Kubernetes into a serverless powerhouse!

  • Deployed request-driven applications
  • Enabled automatic scaling and scale-to-zero
  • Simplified service deployment with the Knative Service resource

Knative gives us the best of Kubernetes and serverless—scalability, flexibility, and resource efficiency. Now, you can deploy event-driven applications without worrying about infrastructure overhead.

Follow for more Kubernetes and cloud-native insights! Drop your thoughts in the comments!👇

Implementing the Circuit Breaker Pattern in Kubernetes

Introduction

In a microservices architecture, services communicate with each other over the network, which introduces latency, failures, and timeouts. If one service fails, it can cause cascading failures, leading to a complete system outage. The Circuit Breaker Pattern helps prevent these failures from propagating, ensuring system resilience.

In this blog, we’ll set up circuit breaking in Kubernetes using Istio, implement failure handling, and demonstrate how to recover from failures gracefully.

Why Circuit Breakers?

The Problem

  • Unstable services: If a dependent service is slow or failing, all requests pile up, increasing resource consumption.
  • Cascading failures: A single failing service can bring down the entire system.
  • Poor user experience: Without intelligent request handling, users experience timeouts and failures.

The Solution: Circuit Breaker Pattern

  • Detects when a service is slow or failing and temporarily stops sending requests.
  • Prevents resource exhaustion by limiting concurrent requests.
  • Automatically recovers once the service is stable.

Step 1: Setting Up Circuit Breaking in Kubernetes

Prerequisites

  • A running Kubernetes cluster (Minikube, kind, or any managed K8s).
  • Istio Service Mesh installed.

Step 2: Deploying Sample Microservices

We will deploy two microservices:

  • Product Service (simulates a reliable service).
  • Order Service (calls the Product Service, sometimes failing).

Deploy the Product Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: product-service
  template:
    metadata:
      labels:
        app: product-service
    spec:
      containers:
      - name: product-service
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: product-service
spec:
  selector:
    app: product-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

Deploy the Order Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: httpd
        ports:
        - containerPort: 80
        env:
        - name: PRODUCT_SERVICE_URL
          value: "http://product-service"
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  selector:
    app: order-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

Step 3: Enabling Circuit Breaking with Istio

Now, let’s limit requests to the Product Service using Istio’s DestinationRule.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: product-service-circuit-breaker
spec:
  host: product-service
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutive5xxErrors: 2
      interval: 5s
      baseEjectionTime: 30s

What’s Happening Here?

  • Limits max requests per connection to prevent overload.
  • If Istio detects 2 consecutive 5xx errors, it ejects the service for 30 seconds.
  • Prevents an unhealthy service from taking down dependent services.

Step 4: Testing the Circuit Breaker

To test the circuit breaker, simulate failures in the Product Service by sending multiple requests:

kubectl exec -it $(kubectl get pod -l app=order-service -o jsonpath='{.items[0].metadata.name}') -- curl -X GET http://product-service

Now, if the Product Service fails multiple times, Istio stops sending requests temporarily, preventing further failures.
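A single request will rarely trip the breaker. Because http1MaxPendingRequests is set to 1, firing concurrent requests causes overflow, which Istio reports as 503 responses. A rough load loop, assuming curl is available in the order-service image:

ORDER_POD=$(kubectl get pod -l app=order-service -o jsonpath='{.items[0].metadata.name}')
for i in $(seq 1 20); do
  kubectl exec "$ORDER_POD" -- curl -s -o /dev/null -w "%{http_code}\n" http://product-service &
done
wait

You should see a mix of 200s and 503s once the connection pool limits are exceeded.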

Conclusion

Circuit breakers are essential for building fault-tolerant microservices:

  • They prevent cascading failures by intelligently rejecting requests.
  • They enhance system resilience by allowing only healthy services to process traffic.
  • They recover automatically once services are back online.

Using Istio’s built-in circuit breaking, Kubernetes workloads can self-heal and prevent system-wide outages!

Would you use Circuit Breakers in your production environment? Let’s discuss! 👇

Building a Microservices Architecture with Kubernetes: A Complete Example

Introduction

Modern applications demand scalability, flexibility, and resilience. Microservices architecture allows teams to break down monolithic applications into smaller, independent services that can be deployed, scaled, and managed separately.

In this blog, we’ll build a complete microservices-based application on Kubernetes, covering:

  • Defining multiple microservices
  • Exposing them via Kubernetes Services
  • Managing inter-service communication
  • Deploying and scaling them efficiently

Step 1: Define Our Microservices

For this example, we’ll create two services:

  • Product Service: Handles product details.
  • Order Service: Manages order placements and communicates with the Product service.

Deployment for Product Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
  labels:
    app: product-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: product-service
  template:
    metadata:
      labels:
        app: product-service
    spec:
      containers:
      - name: product-service
        image: nginx:latest
        ports:
        - containerPort: 80

Service for Product Service

apiVersion: v1
kind: Service
metadata:
  name: product-service
spec:
  selector:
    app: product-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP

Deployment for Order Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
      - name: order-service
        image: httpd:latest
        env:
        - name: PRODUCT_SERVICE_URL
          value: "http://product-service"
        ports:
        - containerPort: 80

Service for Order Service

apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  selector:
    app: order-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP

Step 2: Deploy and Verify

Apply all YAML files:

kubectl apply -f product-service.yaml
kubectl apply -f order-service.yaml

Check if pods are running:

kubectl get pods

Verify services:

kubectl get svc
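To confirm that inter-service communication works through cluster DNS, call the Product Service from inside an Order Service pod (this assumes curl is available in the httpd image; if it isn't, run a temporary pod from a curl-capable image instead):

ORDER_POD=$(kubectl get pod -l app=order-service -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$ORDER_POD" -- curl -s http://product-service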

Step 3: Expose Services to External Users

To make the services accessible externally, use an Ingress resource.

Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: microservices-ingress
spec:
  rules:
  - host: myapp.local
    http:
      paths:
      - path: /products
        pathType: Prefix
        backend:
          service:
            name: product-service
            port:
              number: 80
      - path: /orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 80

Apply it:

kubectl apply -f ingress.yaml
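To resolve myapp.local locally, point it at your ingress controller's address and send a request to each route. On Minikube this is typically the output of minikube ip, and it assumes an ingress controller (for example, the Minikube ingress addon) is installed:

echo "$(minikube ip) myapp.local" | sudo tee -a /etc/hosts
curl http://myapp.local/products
curl http://myapp.local/orders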

Step 4: Scaling Microservices

Need to scale services? Just increase the replica count!

kubectl scale deployment product-service --replicas=5
kubectl scale deployment order-service --replicas=5

Verify scaling:

kubectl get deployments

Step 5: Observability and Logging

To monitor microservices performance, use Prometheus and Grafana for metrics and ELK Stack for centralized logging.

Example: View the logs of a pod

kubectl logs -f <pod-name>

Conclusion

Microservices architecture, combined with Kubernetes, enables scalable, resilient, and manageable applications. By breaking monoliths into independent services, we:
 ✅ Improve scalability and fault tolerance
 ✅ Enable faster deployments and updates
 ✅ Simplify inter-service communication with Kubernetes Services

Start deploying microservices today and scale your applications like a pro! 

What challenges have you faced while working with microservices on Kubernetes? Let’s discuss in the comments!👇

Kubernetes Pod Disruption Budgets: Ensuring Application Availability

Introduction

When managing Kubernetes clusters, rolling updates, node drains, and scaling events can cause temporary downtime for applications. In a production environment, even a brief outage can impact users.

This is where Pod Disruption Budgets (PDBs) come in!

A Pod Disruption Budget ensures that a minimum number of pods remain available during voluntary disruptions like:
 ✅ Node upgrades
 ✅ Cluster maintenance
 ✅ Manual pod evictions

By implementing PDBs, we can prevent downtime while still allowing controlled disruptions for cluster maintenance. Let’s see how to build a highly available application setup using PDBs in Kubernetes.

Step 1: Deploying a Sample Application

Let’s start with a simple Nginx deployment with three replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

Apply the deployment:

kubectl apply -f nginx-deployment.yaml

Check if all pods are running:

kubectl get pods -l app=nginx

Step 2: Creating a Pod Disruption Budget (PDB)

Now, let’s create a PDB to ensure that at least one pod is always running during disruptions.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
  namespace: default
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: nginx

Apply the PDB:

kubectl apply -f nginx-pdb.yaml

Verify the PDB:

kubectl get poddisruptionbudget

Expected output:

NAME        MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nginx-pdb   1               N/A               2                     10s

This means at least 1 pod must always be running, and up to 2 pods can be disrupted at a time.
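A PDB can also be written with maxUnavailable instead of minAvailable. For example, to allow at most one nginx pod to be disrupted at a time:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb-strict
  namespace: default
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: nginx

Use only one of the two fields per PDB, and keep a single PDB per set of pods.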

Step 3: Testing the Pod Disruption Budget

Let’s try to evict a pod and see how the PDB enforces availability:

kubectl drain <node-name> --ignore-daemonsets --force

If this eviction violates the PDB, Kubernetes will block the eviction to maintain the availability constraint.

Note that PDBs only guard voluntary disruptions that go through the Eviction API (such as kubectl drain or the descheduler). A direct deletion bypasses the budget and is not blocked:

kubectl delete pod <pod-name> --grace-period=0 --force

To exercise the budget itself, drain a node or use the Eviction API: Kubernetes will refuse any eviction that would leave fewer than minAvailable pods running.

Conclusion

Kubernetes Pod Disruption Budgets help maintain application availability during voluntary disruptions:

  • They ensure that a minimum number of pods always remains available.
  • They are especially useful for high-availability applications and stateful workloads like databases.

With PDBs, you can perform cluster upgrades and maintenance without worrying about breaking your application’s availability! 

Would you use PDBs in your setup? Let me know your thoughts in the comments! 👇

Vertical Pod Autoscaling: Optimizing Resource Allocation

Introduction

Efficient resource allocation is crucial for maintaining performance and cost-effectiveness in Kubernetes. Traditional resource allocation requires developers to manually specify CPU and memory limits, often leading to over-provisioning or under-provisioning. The Vertical Pod Autoscaler (VPA) solves this issue by dynamically adjusting resource requests based on actual usage, ensuring that workloads run efficiently.

In this blog post, we will explore:

  • What is Vertical Pod Autoscaler (VPA)?
  • How does VPA work?
  • Step-by-step guide to implementing VPA in Kubernetes
  • YAML configurations and commands
  • Final thoughts on using VPA for optimal resource management

What is Vertical Pod Autoscaler (VPA)?

Vertical Pod Autoscaler (VPA) is a Kubernetes component that automatically adjusts the resource requests (CPU and memory) of pods. It continuously monitors the actual resource usage and updates the resource requests accordingly. This prevents over-provisioning (which leads to wasted resources) and under-provisioning (which can cause application crashes due to resource exhaustion).

Key Components of VPA:

  • Recommender – Analyzes past and current resource usage and provides recommendations for resource allocation.
  • Updater – Ensures that pods are restarted when their resource requirements deviate significantly from the recommended values.
  • Admission Controller – Modifies new pod resource requests based on the latest recommendations.

Deploying Vertical Pod Autoscaler in Kubernetes

Step 1: Install VPA in Your Cluster

To install VPA, clone the official Kubernetes autoscaler repository:

git clone https://github.com/kubernetes/autoscaler.git

Change to the VPA directory:

cd autoscaler/vertical-pod-autoscaler/

Deploy VPA components using the provided script:

./hack/vpa-up.sh

This command installs the necessary components into your Kubernetes cluster.

Step 2: Verify VPA Installation

After installation, check that VPA components are running:

kubectl get pods -n kube-system | grep vpa

Expected output:

vpa-admission-controller-xxxx Running
vpa-recommender-xxxx Running
vpa-updater-xxxx Running

Applying VPA to a Sample Deployment

Step 3: Deploy a Sample Application

Create a simple Nginx deployment without predefined CPU and memory requests.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: sample-container
        image: nginx

Apply the deployment:

kubectl apply -f sample-deployment.yaml

Step 4: Deploy a VPA Resource

Create a VPA resource to manage the sample deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       sample-app
  updatePolicy:
    updateMode: "Auto"

Apply the VPA configuration:

kubectl apply -f sample-vpa.yaml

Step 5: Monitor VPA Recommendations

Check the resource recommendations given by VPA:

kubectl describe vpa sample-app-vpa

This will show the recommended CPU and memory requests based on actual usage patterns.
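You can also bound what the Updater is allowed to apply by adding a resourcePolicy, so recommendations never drop below or exceed limits you choose. A sketch with illustrative bounds:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       sample-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"        # applies to all containers in the pod
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "1"
          memory: 1Gi

Keep in mind that in Auto mode the Updater evicts pods to apply new requests, so pair VPA with a Pod Disruption Budget for availability-sensitive workloads.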

Conclusion

Vertical Pod Autoscaler (VPA) ensures that Kubernetes workloads receive the right amount of resources, eliminating the guesswork involved in manual resource allocation. By dynamically adjusting CPU and memory requests, VPA enhances performance, reduces infrastructure costs, and prevents application failures due to resource starvation.

If you’re managing workloads that have fluctuating resource demands, integrating VPA into your Kubernetes setup can significantly improve cluster efficiency.

Start using VPA today and take your Kubernetes resource management to the next level! Drop your thoughts in the comments! 👇

Implementing Rate Limiting in Kubernetes with NGINX Ingress

Introduction

In modern cloud-native applications, APIs are critical components that need to be protected from excessive requests to prevent abuse and ensure fair resource distribution. Rate limiting helps safeguard services from malicious attacks, accidental overloads, and unfair resource consumption.

In this post, we’ll explore how to implement rate limiting in Kubernetes using NGINX Ingress Controller annotations.

Why Rate Limiting Matters

  • Prevents API abuse – Stops excessive requests from a single user.
  • Enhances reliability – Ensures fair usage of backend services.
  • Improves security – Mitigates potential DoS (Denial of Service) attacks.
  • Optimizes performance – Avoids unnecessary overloading of backend applications.

Prerequisites

Before implementing rate limiting, ensure you have the following:

  • A running Kubernetes cluster (Minikube, RKE2, or self-managed).
  • NGINX Ingress Controller installed.
  • An existing application exposed via an Ingress resource.

Step 1: Deploy the NGINX Ingress Controller

If you haven’t already installed the NGINX Ingress Controller, deploy it using Helm:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx \
    --namespace kube-system

Verify the deployment:

kubectl get pods -n kube-system | grep nginx-ingress

Once running, proceed to set up rate limiting.

Step 2: Deploy a Sample API

For demonstration purposes, let’s deploy a simple echo server as our backend API.

Deploy the API Pod & Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-server
  labels:
    app: echo-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: echo-server
  template:
    metadata:
      labels:
        app: echo-server
    spec:
      containers:
      - name: echo-server
        image: k8s.gcr.io/echoserver:1.10
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: echo-server
spec:
  selector:
    app: echo-server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP

Apply the deployment and service:

kubectl apply -f echo-server.yaml

Step 3: Configure Rate Limiting with an Ingress Resource

Create an Ingress Resource with Rate Limiting

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-ingress
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "5"
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "2"
    nginx.ingress.kubernetes.io/limit-connections: "20"
spec:
  ingressClassName: nginx
  rules:
  - host: echo.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo-server
            port:
              number: 80
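Here, limit-rps caps each client IP at 5 requests per second, and the burst multiplier allows short spikes of up to 2x that rate before requests are rejected. If trusted sources such as internal networks or health checkers should be exempt, the controller also supports a whitelist annotation that takes a comma-separated list of CIDRs (the range below is only an illustration):

nginx.ingress.kubernetes.io/limit-whitelist: "10.0.0.0/8"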

Apply the Ingress:

kubectl apply -f echo-ingress.yaml

To test locally, add an entry to /etc/hosts:

echo "127.0.0.1 echo.local" | sudo tee -a /etc/hosts

Step 4: Testing the Rate Limits

Use curl to send multiple requests and observe the rate limits in action.

for i in {1..20}; do curl -s -o /dev/null -w "%{http_code}\n" http://echo.local; done

If the rate limit is exceeded, you will start receiving 429 Too Many Requests responses.

Alternatively, use hey to simulate a load test:

hey -n 100 -c 10 http://echo.local

NGINX will enforce the limits defined in the Ingress annotations.

Step 5: Monitoring Rate Limiting Logs

To verify that rate limiting is working, check the logs of the NGINX Ingress Controller:

kubectl logs -n kube-system -l app.kubernetes.io/name=ingress-nginx

Look for logs indicating 429 Too Many Requests responses.

Conclusion

Implementing rate limiting in Kubernetes with NGINX Ingress is a powerful way to protect APIs from abuse while ensuring fair resource usage. By leveraging NGINX annotations, we can dynamically control:

✅ Request rates
✅ Burst handling
✅ Concurrent connections

This setup is essential for production-grade applications, preventing DDoS attacks, and maintaining system stability.

Have you implemented rate limiting in your Kubernetes clusters? Share your experience in the comments!👇

Setting Up Cluster Autoscaler in Minikube for Development Testing

Introduction

Kubernetes Cluster Autoscaler automatically adjusts the number of nodes in your cluster based on pending workloads. While this typically requires a cloud provider in production, Minikube provides a way to simulate autoscaling for development and testing.

In this guide, we’ll configure Cluster Autoscaler on Minikube, simulate scaling behaviors, and observe how it increases node capacity when needed.

The Problem: Autoscaling in Development Environments

In production, Kubernetes clusters dynamically scale nodes to handle workload spikes.
In local development, Minikube runs a single node by default, making it challenging to test Cluster Autoscaler.
Solution: Use Minikube’s multi-node feature and the Cluster Autoscaler to simulate real-world autoscaling scenarios.

Step 1: Start Minikube with Multiple Nodes

Since Minikube doesn’t support real autoscaling, we manually start it with multiple nodes to allow Cluster Autoscaler to scale between them.

minikube start --nodes 2

Verify the nodes are running:

kubectl get nodes

Step 2: Install Metrics Server

Cluster Autoscaler relies on resource metrics to make scaling decisions. Install the Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Verify that the Metrics Server is running:

kubectl get deployment metrics-server -n kube-system

Step 3: Deploy Cluster Autoscaler

Now, deploy the Cluster Autoscaler to monitor and scale nodes.

Cluster Autoscaler Deployment YAML

Create a file called cluster-autoscaler.yaml and add:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.27.0
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=minikube
            - --skip-nodes-with-local-storage=false
            - --skip-nodes-with-system-pods=false
          resources:
            requests:
              cpu: 100m
              memory: 300Mi
            limits:
              cpu: 500m
              memory: 500Mi

Apply the deployment:

kubectl apply -f cluster-autoscaler.yaml

Check logs to ensure it’s running:

kubectl logs -f deployment/cluster-autoscaler -n kube-system

Step 4: Create a Workload that Triggers Scaling

Now, deploy a workload that requires more resources than currently available, forcing the Cluster Autoscaler to scale up.

Resource-Intensive Deployment YAML

Create a file high-memory-app.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-memory-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: high-memory-app
  template:
    metadata:
      labels:
        app: high-memory-app
    spec:
      containers:
        - name: stress
          image: polinux/stress
          command: ["stress"]
          args: ["--vm", "2", "--vm-bytes", "300M", "--timeout", "60s"]  # keeps usage below the 800Mi limit
          resources:
            requests:
              memory: "600Mi"
              cpu: "250m"
            limits:
              memory: "800Mi"
              cpu: "500m"

Apply the deployment:

kubectl apply -f high-memory-app.yaml

Check if the pods are pending:

kubectl get pods -o wide

If you see pending pods, it means the Cluster Autoscaler should trigger node scaling.
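You can confirm why the pods are stuck by inspecting scheduling events; pending pods should show FailedScheduling events complaining about insufficient memory:

kubectl describe pod -l app=high-memory-app | grep -A5 Events
kubectl get events --field-selector reason=FailedScheduling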

Observing Autoscaler in Action

Now, let’s check how the autoscaler responds:

kubectl get nodes
kubectl get pods -A
kubectl logs -f deployment/cluster-autoscaler -n kube-system

You should see the Cluster Autoscaler increasing the node count to accommodate the pending pods. Once the workload decreases, it should scale down unused nodes.

Why Does This Matter?

  • Understand autoscaler behavior before deploying to production
  • Validate custom scaling policies in a local development setup
  • Optimize resource allocation for cost and performance efficiency

Even though Minikube doesn’t create new cloud nodes dynamically, this method helps developers test scaling triggers and behaviors before running on real cloud environments.

Conclusion: Build Smarter Autoscaling Strategies

Testing Cluster Autoscaler in Minikube provides valuable insights into Kubernetes scaling before moving to production. If you’re developing autoscaling-sensitive applications, mastering this setup ensures better efficiency, cost savings, and resilience.

Have you tested autoscaling in Minikube? Drop your thoughts in the comments!👇

Implementing Horizontal Pod Autoscaling Based on Custom Metrics

Introduction

Kubernetes provides Horizontal Pod Autoscaling (HPA) based on CPU and memory usage. However, many applications require scaling based on custom business metrics, such as:

✅ Request throughput (e.g., HTTP requests per second)
✅ Queue length in message brokers (e.g., Kafka, RabbitMQ)
✅ Database load (e.g., active connections)

In this guide, we will configure HPA using custom metrics from Prometheus and expose them using the Prometheus Adapter.

Prerequisites

  • A running Kubernetes cluster
  • Prometheus installed for metric collection
  • Prometheus Adapter for exposing metrics

Step 1: Deploy Prometheus in Kubernetes

We use the kube-prometheus-stack Helm chart to install Prometheus:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Verify the installation:

kubectl get pods -n monitoring

Step 2: Deploy an Application with Custom Metrics

We will deploy an NGINX application that exposes custom HTTP request metrics.

Create the Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: "500m"
            memory: "256Mi"
          requests:
            cpu: "250m"
            memory: "128Mi"

Apply it:

kubectl apply -f nginx-deployment.yaml

Expose the Application

Create a service to expose NGINX:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP

Apply it:

kubectl apply -f nginx-service.yaml

Step 3: Configure Prometheus to Scrape Custom Metrics

Prometheus needs a scrape target that actually exposes the custom metric. Plain NGINX does not expose Prometheus metrics on its own, so in practice you would add an exporter sidecar (for example, nginx-prometheus-exporter) or instrument your application to expose http_requests_total. A static scrape configuration for the service would look like this:

scrape_configs:
  - job_name: "nginx"
    static_configs:
      - targets: ["nginx-service.default.svc.cluster.local:80"]

With the kube-prometheus-stack chart, however, scrape targets are normally added through a ServiceMonitor resource or the chart's additionalScrapeConfigs value rather than by editing and applying prometheus.yaml directly (see the sketch below).
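A minimal ServiceMonitor sketch, assuming the NGINX pods expose metrics on a Service port named metrics, the Service is labeled app: nginx, and the kube-prometheus-stack release is called prometheus (its default selector only picks up ServiceMonitors labeled with the release name):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx
  namespace: monitoring
  labels:
    release: prometheus        # must match the chart's serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames: ["default"]
  selector:
    matchLabels:
      app: nginx               # assumes the Service carries this label
  endpoints:
    - port: metrics            # assumes a named metrics port on the Service
      interval: 15s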

Verify the metrics in Prometheus UI:

kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090

Open http://localhost:9090, and search for http_requests_total.

Step 4: Install Prometheus Adapter

Prometheus Adapter exposes custom metrics for Kubernetes autoscalers. Install it using Helm:

helm install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring
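Out of the box the adapter only exposes the metrics its rules select, so http_requests_total will not appear until a rule matches it. A minimal sketch of an external-metric rule passed as Helm values (this assumes the chart's rules.external key and that the series carries a namespace label; adjust the query to your metric):

# adapter-values.yaml
rules:
  external:
    - seriesQuery: 'http_requests_total{namespace!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        matches: "http_requests_total"
        as: "http_requests_total"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m]))'

Apply it with helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter --namespace monitoring -f adapter-values.yaml.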

Verify the installation:

kubectl get pods -n monitoring | grep prometheus-adapter

Check if custom metrics are available:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Step 5: Create Horizontal Pod Autoscaler (HPA)

We now create an HPA that scales NGINX based on request rate.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: http_requests_total
      target:
        type: Value
        value: 100

Apply it:

kubectl apply -f nginx-hpa.yaml

Check HPA status:

kubectl get hpa nginx-hpa

Step 6: Load Test and Observe Scaling

Use hey or wrk to simulate traffic. The service DNS name only resolves inside the cluster, so either run the load generator from a pod in the cluster or port-forward the service first:

kubectl port-forward svc/nginx-service 8080:80
hey -n 1000 -c 50 http://localhost:8080

Check if new pods are created:

kubectl get pods

Conclusion

By integrating Prometheus Adapter with Kubernetes HPA, we can scale applications based on business-specific metrics like request rates, queue lengths, or latency. This approach ensures better resource efficiency and application performance in cloud-native environments.

If you’re working with Kubernetes, stop relying only on CPU-based autoscaling! Custom metrics give you precision and efficiency. Drop your thoughts in the comments! 👇