Managing Kubernetes Costs: Practical Techniques for Resource Optimization

Introduction

Kubernetes provides powerful scalability, but if not managed properly, cloud costs can quickly escalate. To maintain cost efficiency while ensuring performance, organizations must monitor and optimize their Kubernetes workloads effectively.

This guide covers practical techniques with YAML configurations and commands to help you reduce Kubernetes costs, right-size workloads, and optimize scaling.

Step 1: Monitor Costs and Resource Utilization

Enable Kubernetes Resource Requests and Limits

Resource requests tell the scheduler how much CPU and memory to reserve for a workload, while limits cap what a container may actually consume. Setting both prevents over-provisioning and keeps noisy neighbors from wasting node capacity.

Resource Requests and Limits Example (deployment.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-app
  namespace: cost-optimization
spec:
  replicas: 3
  selector:
    matchLabels:
      app: optimized-app
  template:
    metadata:
      labels:
        app: optimized-app
    spec:
      containers:
      - name: app
        image: myapp:latest
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

Apply the deployment:

kubectl apply -f deployment.yaml
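
Once the pods are running, compare the configured requests against actual usage (this requires the metrics-server to be installed):

kubectl top pod -n cost-optimization

If actual usage sits well below the requests, lower them; if containers are throttled or OOM-killed, raise the limits.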

Step 2: Optimize Kubernetes Workloads

Right-Size Pods and Nodes with Auto-Scaling

Enable Horizontal Pod Autoscaler (HPA)

Automatically scale the number of pods based on CPU usage. The HPA reads utilization from the metrics-server, so make sure it is installed in your cluster.

kubectl autoscale deployment optimized-app -n cost-optimization --cpu-percent=50 --min=2 --max=10
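
The same autoscaler can be kept in version control as a manifest; here is a declarative equivalent using the autoscaling/v2 API (the HPA name is illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: optimized-app-hpa
  namespace: cost-optimization
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: optimized-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50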

Enable Vertical Pod Autoscaler (VPA)

Dynamically adjust pod resource requests based on observed usage. Note that VPA is not built into Kubernetes; install it from the kubernetes/autoscaler project first. Also avoid running VPA in Auto mode alongside an HPA that scales on the same CPU metric, as the two will fight each other.

VPA Configuration (vpa.yaml)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: optimized-app-vpa
  namespace: cost-optimization
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: optimized-app
  updatePolicy:
    updateMode: "Auto"

Apply VPA:

kubectl apply -f vpa.yaml
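
After the recommender has collected some usage data, inspect what VPA suggests (vpa is the short name for the VerticalPodAutoscaler resource):

kubectl describe vpa optimized-app-vpa -n cost-optimization

Look for the Recommendation section, which lists lower-bound, target, and upper-bound values per container.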

Step 3: Implement Cost-Effective Scaling Strategies

Enable Cluster Autoscaler

Ensure your cluster scales up/down based on workload demand.

Cluster Autoscaler is not part of Kubernetes itself; it runs as a deployment configured for your cloud provider. On managed services such as EKS, GKE, or AKS, enable it through the provider's node-pool autoscaling settings or install it with Helm. (Single-node local clusters like Minikube have no node pool to scale, so this step applies to cloud clusters.)
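
As an illustration, a Helm-based install on AWS might look like the following. This is a sketch that assumes your auto-scaling groups carry the standard cluster-autoscaler auto-discovery tags; fill in the placeholders for your cluster:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=<CLUSTER_NAME> \
  --set awsRegion=<AWS_REGION>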

Step 4: Optimize Storage and Networking Costs

Use Efficient Storage Classes

Select appropriate StorageClasses to avoid overpaying for unnecessary IOPS or performance levels.

Example of a cost-effective StorageClass (storage.yaml)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cost-efficient-storage
provisioner: ebs.csi.aws.com  # gp3 requires the EBS CSI driver; the in-tree kubernetes.io/aws-ebs provisioner does not support it
parameters:
  type: gp3  # gp3 is cheaper than gp2 at equivalent baseline performance

Apply the StorageClass:

kubectl apply -f storage.yaml
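
Workloads opt into the class through their PersistentVolumeClaims; a hypothetical claim using it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: cost-optimization
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: cost-efficient-storage
  resources:
    requests:
      storage: 10Gi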

Step 5: Remove Unused Resources

Identify and Delete Unused Resources

Find and remove unused PersistentVolumes, load balancer Services, and idle pods; each of these can quietly accrue cloud charges.

List unused Persistent Volumes:

kubectl get pv | grep "Released"
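
To clean them up in bulk, the following sketch filters on the STATUS column (the fifth field in the default output); review the list before running the delete, since removing a volume can destroy data depending on its reclaim policy:

kubectl get pv --no-headers | awk '$5 == "Released" {print $1}' | xargs -r kubectl delete pv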

Delete unused load balancer Services (removing a Service of type LoadBalancer also releases the cloud load balancer behind it):

kubectl delete svc <service-name>

Conclusion

By implementing resource requests and limits, auto-scaling, cost-efficient storage, and monitoring unused resources, you can significantly reduce Kubernetes costs without sacrificing performance.

Start optimizing your Kubernetes costs today and take control of your cloud expenses!

What strategies do you use for Kubernetes cost optimization? Let’s discuss in the comments! 👇

Building a Complete Developer Platform on Kubernetes

Introduction

A Developer Platform on Kubernetes empowers developers with self-service deployments, while platform teams maintain security, governance, and operational stability.

In this guide, we will:

  • Set up a Kubernetes cluster
  • Create namespaces for teams
  • Implement RBAC for security
  • Enable GitOps-based deployments
  • Set up monitoring and logging

Step 1: Set Up a Kubernetes Cluster

For local testing, use Minikube or kind:

minikube start --cpus 4 --memory 8g

For production, use a managed Kubernetes service (EKS, GKE, AKS) or a self-managed cluster.

Ensure kubectl and helm are installed:

kubectl version --client
helm version

Step 2: Create Namespaces for Teams

We create isolated namespaces for different teams to deploy applications.

Namespace YAML

apiVersion: v1
kind: Namespace
metadata:
  name: dev-team
---
apiVersion: v1
kind: Namespace
metadata:
  name: qa-team

Apply Namespace Configuration

kubectl apply -f namespaces.yaml
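
Since the platform team is also responsible for governance, it is worth pairing each namespace with a ResourceQuota so no single team can consume the whole cluster. A sketch with illustrative numbers:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-team-quota
  namespace: dev-team
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"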

Step 3: Implement Role-Based Access Control (RBAC)

RBAC ensures developers have limited access to their namespaces.

RBAC YAML

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev-team
  name: developer-role
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list", "create", "update", "delete"]
  - apiGroups: ["apps"]   # Deployments live in the apps API group, not the core group
    resources: ["deployments"]
    verbs: ["get", "list", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev-team
  name: developer-binding
subjects:
  - kind: User
    name: dev-user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-role
  apiGroup: rbac.authorization.k8s.io

Apply RBAC Configuration

kubectl apply -f rbac.yaml
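
Verify the binding behaves as intended with kubectl auth can-i:

kubectl auth can-i create deployments -n dev-team --as dev-user   # yes
kubectl auth can-i create deployments -n qa-team --as dev-user    # no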

Step 4: Deploy an Application

Developers can now deploy applications in their namespace.

Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  namespace: dev-team
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
        - name: sample-app
          image: nginx
          ports:
            - containerPort: 80

Apply Deployment

kubectl apply -f deployment.yaml

Step 5: Set Up GitOps for Continuous Deployment

We use ArgoCD for GitOps-based automated deployments.

Install ArgoCD

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
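
With ArgoCD running, each team's deployments can be driven from Git through an Application resource. A sketch, where the repoURL and path are placeholders for your own manifests repository:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-team-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<org>/<repo>.git
    targetRevision: main
    path: manifests/dev-team
  destination:
    server: https://kubernetes.default.svc
    namespace: dev-team
  syncPolicy:
    automated:
      prune: true
      selfHeal: true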

Step 6: Configure Monitoring & Logging

To enable observability, we set up Prometheus & Grafana.

Install Prometheus & Grafana using Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace

Access Grafana

kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring

Login credentials:

  • Username: admin
  • Password: Run this command to get it:
kubectl get secret --namespace monitoring monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

Conclusion

  • Developers get self-service access to deploy apps.
  • RBAC ensures security and access control.
  • GitOps with ArgoCD enables automated deployments.
  • Monitoring stack ensures observability.

By building this Internal Developer Platform on Kubernetes, we empower developers while maintaining operational control.

How do you manage developer workflows in Kubernetes? Drop your thoughts below! 👇

Blue-Green Deployments in Kubernetes: A Step-by-Step Guide

Introduction

In today’s fast-paced DevOps world, zero-downtime deployments are a necessity. Traditional deployments often result in downtime, failed updates, or rollback complexities. Enter Blue-Green Deployments, a strategy that eliminates these risks by maintaining two production environments—one active (Blue) and one staged (Green).

In this post, we’ll walk through how to implement a Blue-Green Deployment in Kubernetes while ensuring:
✅ Zero-downtime application updates
✅ Instant rollback in case of failure
✅ Seamless traffic switching between versions

To keep the walkthrough self-contained, we’ll build the image locally and run it in Minikube rather than pulling from an external registry.

How Blue-Green Deployment Works

  • Blue (Active Version): The currently running stable application version.
  • Green (New Version): The newly deployed application version, tested before switching live traffic.
  • Traffic Switching: Once the Green version is verified, traffic is routed to it, making it the new Blue version.
  • Rollback Option: If issues arise, traffic can be switched back to the previous version instantly.

Setting Up Blue-Green Deployment in Kubernetes

Step 1: Build the Application Image Locally

We’ll build directly against Minikube’s Docker daemon so the cluster can run the image without a registry.

First, create a simple Flask-based web app to simulate different versions:

Create app.py

from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello from the Blue Version!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Create Dockerfile

FROM python:3.8-slim
WORKDIR /app
COPY app.py .
RUN pip install flask
CMD ["python", "app.py"]

Build and Load the Image into Minikube

eval $(minikube docker-env)
docker build -t myapp:blue .

Step 2: Deploy the Blue Version

Create blue-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: blue-app
  labels:
    app: myapp
    version: blue
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:blue
        ports:
        - containerPort: 5000

Apply the Deployment

kubectl apply -f blue-deployment.yaml

Create service.yaml to expose the Blue version. Note that the selector pins traffic to version: blue; without that label, the Service would load-balance across both versions once Green is deployed.

apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
    version: blue
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
  type: LoadBalancer

Apply the Service

kubectl apply -f service.yaml
minikube service myapp-service

Now open the URL that Minikube prints, and you should see: "Hello from the Blue Version!"

Step 3: Deploy the Green Version

Modify app.py to change the response message:

Modify app.py

@app.route("/")
def home():
    return "Hello from the Green Version!"

Rebuild and Load the Green Version

docker build -t myapp:green .

Create green-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: green-app
  labels:
    app: myapp
    version: green
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:green
        ports:
        - containerPort: 5000

Apply Green Deployment

kubectl apply -f green-deployment.yaml

At this point, both Blue (old version) and Green (new version) exist, but traffic still flows to Blue.
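
You can confirm the two versions are running side by side:

kubectl get pods -l app=myapp --show-labels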

Step 4: Switch Traffic to the Green Version

To shift traffic, modify the Service Selector to point to the Green version instead of Blue.

Update service.yaml

spec:
  selector:
    app: myapp
    version: green

Apply the Service Update

kubectl apply -f service.yaml
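
Alternatively, patch the selector in place instead of editing the file; the same one-liner with version: blue also performs the rollback in Step 5:

kubectl patch service myapp-service -p '{"spec":{"selector":{"app":"myapp","version":"green"}}}'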

Now the same URL returns: "Hello from the Green Version!"

Traffic has successfully switched from Blue to Green!

Step 5: Rollback to Blue

If issues arise in the Green version, simply change the Service selector back:

Modify service.yaml again

spec:
  selector:
    app: myapp
    version: blue

Apply the Rollback

kubectl apply -f service.yaml

Now, traffic is back to the Blue Version instantly—without redeploying anything!

Conclusion

Blue-Green Deployments in Kubernetes offer a seamless way to update applications with zero downtime. The ability to instantly switch traffic between versions ensures quick rollbacks, reducing deployment risks.

✅ No downtime during deployments
✅ Easy rollback if issues arise
✅ No impact on end-users

How do you handle deployments in Kubernetes? Let me know in the comments!👇

Implementing Chaos Engineering in Kubernetes with Chaos Mesh

Introduction

In distributed systems, failures are inevitable. Chaos Engineering is a proactive approach to testing system resilience by introducing controlled disruptions. Chaos Mesh is an open-source Chaos Engineering platform specifically designed for Kubernetes, enabling the simulation of various faults like pod crashes, network delays, and CPU stress.

This blog post will walk you through:

  • Installing Chaos Mesh on your Kubernetes cluster
  • Deploying a sample application
  • Executing chaos experiments to test system resilience
  • Observing and understanding system behavior under stress

Installing Chaos Mesh

Chaos Mesh can be installed using Helm, a package manager for Kubernetes.

Prerequisites

  • A running Kubernetes cluster
  • Helm installed on your local machine

Step 1: Add the Chaos Mesh Helm Repository

Add the official Chaos Mesh Helm repository:

helm repo add chaos-mesh https://charts.chaos-mesh.org

Step 2: Create the Chaos Mesh Namespace

It’s recommended to install Chaos Mesh in a dedicated namespace:

kubectl create namespace chaos-mesh

Step 3: Install Chaos Mesh

Install Chaos Mesh using Helm:

helm install chaos-mesh chaos-mesh/chaos-mesh -n chaos-mesh

This command deploys Chaos Mesh components, including the controller manager and Chaos Dashboard, into your Kubernetes cluster.
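
To reach the Chaos Dashboard locally, forward its port and open http://localhost:2333 (the service name and port below are the chart defaults; adjust if you customized the install):

kubectl port-forward -n chaos-mesh svc/chaos-dashboard 2333:2333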

Step 4: Verify the Installation

Check the status of the Chaos Mesh pods:

kubectl get pods -n chaos-mesh

All pods should be in the Running state.

For more detailed installation instructions and configurations, refer to the official Chaos Mesh documentation.

Deploying a Sample Application

To demonstrate Chaos Mesh’s capabilities, we’ll deploy a simple Nginx application.

Step 1: Create the Deployment YAML

Create a file named nginx-deployment.yaml with the following content:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80

Step 2: Deploy the Application

Apply the deployment to your Kubernetes cluster:

kubectl apply -f nginx-deployment.yaml

Step 3: Verify the Deployment

Ensure that the Nginx pods are running:

kubectl get pods -l app=nginx

You should see three running Nginx pods.

Running Chaos Experiments

With Chaos Mesh installed and a sample application deployed, we can now introduce controlled faults to observe how the system responds.

Experiment 1: Pod Failure

This experiment terminates one randomly chosen Nginx pod. It runs once when applied; in Chaos Mesh 2.x, recurring experiments are defined with the separate Schedule resource rather than an inline scheduler field.

Step 1: Create the PodChaos YAML

Create a file named pod-failure.yaml with the following content:

apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure
  namespace: chaos-mesh
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: nginx

Step 2: Apply the Chaos Experiment

Apply the experiment to your cluster:

kubectl apply -f pod-failure.yaml

Step 3: Monitor the Experiment

Observe the behavior of the Nginx pods:

kubectl get pods -l app=nginx -w

You should see the killed pod terminate and the Deployment immediately create a replacement to restore the desired replica count.

For more details on simulating pod faults, refer to the Chaos Mesh documentation.

Experiment 2: Network Delay

This experiment injects 200ms of network latency into the Nginx pods for 30 seconds.

Step 1: Create the NetworkChaos YAML

Create a file named network-delay.yaml with the following content:

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay
  namespace: chaos-mesh
spec:
  action: delay
  mode: all
  selector:
    namespaces:
      - default
    labelSelectors:
      app: nginx
  delay:
    latency: "200ms"
    correlation: "25"
    jitter: "50ms"
  duration: "30s"

Step 2: Apply the Chaos Experiment

Apply the experiment to your cluster:

kubectl apply -f network-delay.yaml
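
Step 3: Observe the Delay

One way to see the latency from inside the cluster is to make requests from a throwaway pod; the pod IP placeholder below comes from the first command's output:

kubectl get pods -l app=nginx -o wide
kubectl run latency-test --rm -it --restart=Never --image=busybox -- wget -qO- http://<NGINX_POD_IP>

While the experiment is active, responses from the delayed pods should arrive roughly 200ms slower than usual.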

Conclusion

Chaos Engineering is essential for improving the resilience of Kubernetes applications. With Chaos Mesh, you can simulate real-world failures in a controlled environment, ensuring that your applications can withstand unexpected disruptions.

By implementing Chaos Mesh and running experiments like pod failures and network delays, teams can proactively identify weaknesses and enhance system stability.

Start incorporating Chaos Engineering into your Kubernetes workflow today and build systems that are truly resilient!

What’s your experience with Chaos Engineering? Drop your thoughts below!👇

Building a Self-Healing Application on Kubernetes

Introduction

In traditional application deployments, when a service crashes, it often requires manual intervention to restart, debug, and restore functionality. This leads to downtime, frustrated users, and operational overhead.

What if your application could automatically recover from failures?

With Kubernetes’ self-healing mechanisms, we can ensure applications restart automatically when they fail—without human intervention.

The Solution: Kubernetes Self-healing Mechanisms

Kubernetes provides several built-in mechanisms to maintain high availability and auto-recovery:

  • Pod Restart Policies → Automatically restarts failed containers.
  • ReplicaSets → Ensures a specified number of pod replicas are always running.
  • Node Failure Recovery → Reschedules pods to healthy nodes if a node crashes.
  • Persistent Storage (Optional) → Ensures data persists even when pods restart.

Step 1: Creating a Simple Web Application

We’ll create a basic Python Flask application that runs inside a Kubernetes pod. This will simulate a real-world web service.

Create a Python Web App

Create a file named app.py:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello, Kubernetes! Your app is self-healing."

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Step 2: Creating a Docker Image
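
First, create a Dockerfile next to app.py; the same minimal one from the Blue-Green walkthrough works here:

FROM python:3.8-slim
WORKDIR /app
COPY app.py .
RUN pip install flask
CMD ["python", "app.py"]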

Build the Docker Image Inside Minikube

Set Minikube’s Docker environment:

eval $(minikube docker-env)

Now, build the image inside Minikube:

docker build -t self-healing-app:v1 .

Verify that the image exists:

docker images | grep self-healing-app

Step 3: Deploying to Kubernetes

Now, let’s create Kubernetes resources.

Create a Deployment

Create a file deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: self-healing-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: self-healing-app
  template:
    metadata:
      labels:
        app: self-healing-app
    spec:
      restartPolicy: Always
      containers:
      - name: self-healing-container
        image: self-healing-app:v1
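The restart policy recovers a container whose process exits; to also recover from a process that hangs without exiting, you can add a liveness probe under the container spec. A sketch of the extra fields, probing the Flask route on port 5000:

        livenessProbe:
          httpGet:
            path: /
            port: 5000
          initialDelaySeconds: 5
          periodSeconds: 10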

Apply the deployment:

kubectl apply -f deployment.yaml

Verify the running pods:

kubectl get pods

Step 4: Exposing the Application

To access the application, expose it as a Kubernetes Service.

Create service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: self-healing-service
spec:
  type: NodePort
  selector:
    app: self-healing-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000

Apply the service:

kubectl apply -f service.yaml

Find the external access URL:

minikube service self-healing-service --url

Access your app in a browser or via curl:

curl <MINIKUBE_SERVICE_URL>

Step 5: Simulating Failures

Kill the Running Pods

Run:

kubectl delete pod -l app=self-healing-app

The ReplicaSet will automatically recreate the pods within seconds.
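
Watch the replacements appear in real time:

kubectl get pods -l app=self-healing-app -w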

Simulate a Node Failure

If you’re using a multi-node cluster, cordon and drain a node:

kubectl cordon <NODE_NAME>
kubectl drain <NODE_NAME> --ignore-daemonsets --force

Pods will automatically be rescheduled on healthy nodes.

Conclusion

By leveraging Kubernetes’ built-in self-healing features, we’ve created a system that:

  • Automatically recovers from failures without manual intervention.
  • Ensures high availability using multiple replicas.
  • Prevents downtime, keeping the application running smoothly.

This approach reduces operational overhead and enhances reliability. Let me know if you have any questions in the comments!👇