Building a Self-healing Application on Kubernetes

Introduction

In traditional application deployments, when a service crashes, it often requires manual intervention to restart, debug, and restore functionality. This leads to downtime, frustrated users, and operational overhead.

What if your application could automatically recover from failures?

With Kubernetes’ self-healing mechanisms, we can ensure applications restart automatically when they fail—without human intervention.

The Solution: Kubernetes Self-healing Mechanisms

Kubernetes provides several built-in mechanisms to maintain high availability and auto-recovery:

Pod Restart Policies → The kubelet automatically restarts failed containers inside a pod.
ReplicaSets → Ensure a specified number of pod replicas are always running.
Node Failure Recovery → Replaces pods from a failed node with new pods on healthy nodes, provided a controller such as a Deployment manages them.
Persistent Storage (Optional) → Ensures data persists even when pods restart.
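
To make the ReplicaSet idea concrete, here is a minimal sketch of a reconciliation loop: compare the desired replica count with what actually exists and correct the difference. ToyReplicaSet and its pod naming are hypothetical and exist only for illustration; the real controller works against the Kubernetes API and handles far more cases.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyReplicaSet:
    """A toy stand-in for a ReplicaSet: a desired count plus the pods that exist."""
    desired: int
    pods: set[str] = field(default_factory=set)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def reconcile(self) -> list[str]:
        """Create or delete pods until the actual count matches the desired count."""
        actions = []
        while len(self.pods) < self.desired:
            name = f"pod-{next(self._ids)}"  # hypothetical naming scheme
            self.pods.add(name)
            actions.append(f"create {name}")
        while len(self.pods) > self.desired:
            name = sorted(self.pods)[-1]
            self.pods.remove(name)
            actions.append(f"delete {name}")
        return actions


rs = ToyReplicaSet(desired=2)
print(rs.reconcile())      # → ['create pod-1', 'create pod-2']
rs.pods.discard("pod-1")   # simulate a pod failure
print(rs.reconcile())      # → ['create pod-3']
```

Note that the replacement is a new pod with a new name, not the old pod brought back. That matches what you see in kubectl get pods after a pod is deleted.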

Step 1: Creating a Simple Web Application

We’ll create a basic Python Flask application that runs inside a Kubernetes pod. This will simulate a real-world web service.

Create a Python Web App

Create a file named app.py:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello, Kubernetes! Your app is self-healing."

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Step 2: Creating a Docker Image

Build the Docker Image Inside Minikube

Set Minikube’s Docker environment:

eval $(minikube docker-env)

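Now build the image inside Minikube. The command expects a Dockerfile in the current directory, next to app.py, that installs Flask and starts the app. These steps do not show one, so create it before building: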

docker build -t self-healing-app:v1 .

Verify that the image exists:

docker images | grep self-healing-app

Step 3: Deploying to Kubernetes

Now, let’s create Kubernetes resources.

Create a Deployment

Create a file deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: self-healing-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: self-healing-app
  template:
    metadata:
      labels:
        app: self-healing-app
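    spec:
      restartPolicy: Always  # the default, and the only value a Deployment accepts
      containers:
      - name: self-healing-container
        image: self-healing-app:v1
        imagePullPolicy: IfNotPresent  # use the image built inside Minikube
        ports:
        - containerPort: 5000  # port the Flask app listens on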
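
With restartPolicy: Always, the kubelet restarts a crashed container in place, but not instantly every time. The Kubernetes documentation describes an exponential back-off (10s, 20s, 40s, and so on) capped at five minutes, which is what shows up as CrashLoopBackOff in kubectl get pods. The sketch below only reproduces that arithmetic; restart_delays and its base and cap parameters are hypothetical names chosen for illustration, and real timings may differ by Kubernetes version and configuration.

```python
def restart_delays(restarts: int, base: float = 10.0, cap: float = 300.0) -> list[float]:
    """Return the back-off delay in seconds before each of the first `restarts` restarts.

    Assumes a delay that doubles from `base` and never exceeds `cap`.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(restart_delays(7))  # → [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

In practice this means a container that keeps crashing recovers more and more slowly, so repeated restarts are a signal to look at the logs rather than a fix in themselves.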

Apply the deployment:

kubectl apply -f deployment.yaml

Verify the running pods:

kubectl get pods
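
If you would rather check pod health from a script than read the table by eye, the JSON output of kubectl can be summarized as below. This is a rough sketch: summarize_pods and fetch_pods are hypothetical helper names, and the code assumes kubectl is on your PATH and pointed at the Minikube cluster.

```python
import json
import subprocess


def summarize_pods(pod_list: dict) -> dict[str, dict]:
    """Map each pod name to its phase, readiness, and total container restarts."""
    summary = {}
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        summary[pod["metadata"]["name"]] = {
            "phase": status.get("phase", "Unknown"),
            # A pod with no reported containers yet is treated as not ready.
            "ready": bool(containers) and all(c.get("ready", False) for c in containers),
            "restarts": sum(c.get("restartCount", 0) for c in containers),
        }
    return summary


def fetch_pods(label: str = "app=self-healing-app") -> dict:
    """Run `kubectl get pods -l <label> -o json` and return the parsed output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-l", label, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling summarize_pods(fetch_pods()) should list both replicas. A rising restarts value means the kubelet has been restarting a container, while a changed pod name means the ReplicaSet created a replacement.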

Step 4: Exposing the Application

To access the application, expose it as a Kubernetes Service.

Create service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: self-healing-service
spec:
  type: NodePort
  selector:
    app: self-healing-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000

Apply the service:

kubectl apply -f service.yaml

Find the external access URL:

minikube service self-healing-service --url

Access your app in a browser or via curl:

curl <MINIKUBE_SERVICE_URL>
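
A single curl only tells you the app is up right now. To see how the service behaves while pods are failing, a small probe loop can help. The sketch below uses only the standard library; http_ok and availability are hypothetical helpers, and the URL you pass in is whatever minikube service self-healing-service --url printed.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def availability(url: str, checks: int, interval: float = 1.0,
                 probe: Callable[[str], bool] = http_ok) -> float:
    """Probe `url` `checks` times and return the fraction of successful probes."""
    if checks < 1:
        raise ValueError("checks must be at least 1")
    successes = 0
    for i in range(checks):
        if probe(url):
            successes += 1
        if i < checks - 1:
            time.sleep(interval)  # wait between probes, but not after the last one
    return successes / checks
```

For example, running availability(url, checks=30) in one terminal while you delete pods in another gives a rough measure of how much of that window the service stayed reachable.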

Step 5: Simulating Failures

Kill a Running Pod

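Run the following, which deletes every pod labeled app=self-healing-app (both replicas, not just one):

kubectl delete pod -l app=self-healing-app

The Deployment’s ReplicaSet notices the missing pods and creates replacements within seconds; run kubectl get pods to see the new pod names. Because both replicas are deleted at once, expect a brief gap in service. To remove only one pod, pass its name to kubectl delete pod instead of the label selector.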

Simulate a Node Failure

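If you’re using a multi-node cluster, cordon and drain a node. This evicts its pods gracefully, so it approximates a node failure rather than reproducing a sudden crash: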

kubectl cordon <NODE_NAME>
kubectl drain <NODE_NAME> --ignore-daemonsets --force

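The evicted pods are replaced by new pods scheduled on the remaining healthy nodes. When you are done, run kubectl uncordon <NODE_NAME> to make the node schedulable again.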
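
The effect of a drain can be pictured with a toy model: the node stops accepting pods, its pods are evicted, and each one lands on whichever schedulable node is least loaded. This is a simplified sketch, not how the real scheduler decides; drain here is a hypothetical function, the node and pod names are made up, and real replacement pods get new names.

```python
from typing import Optional


def drain(assignments: dict[str, str], schedulable: set[str],
          node: str) -> dict[str, Optional[str]]:
    """Return pod -> node assignments after cordoning `node` and evicting its pods.

    Evicted pods go to the remaining schedulable node with the fewest pods.
    A pod maps to None (think: Pending) when no schedulable node is left.
    """
    remaining = sorted(schedulable - {node})  # cordon: nothing new lands on `node`
    result: dict[str, Optional[str]] = {
        pod: n for pod, n in assignments.items() if n != node
    }
    load = {n: 0 for n in remaining}
    for n in result.values():
        if n in load:
            load[n] += 1
    for pod in sorted(p for p, n in assignments.items() if n == node):
        if not remaining:
            result[pod] = None
            continue
        target = min(remaining, key=lambda n: load[n])
        result[pod] = target
        load[target] += 1
    return result


before = {"web-1": "node-a", "web-2": "node-b"}
print(drain(before, {"node-a", "node-b", "node-c"}, "node-a"))
# → {'web-2': 'node-b', 'web-1': 'node-c'}
print(drain({"web-1": "node-a"}, {"node-a"}, "node-a"))
# → {'web-1': None}
```

The second call shows why this step needs a multi-node cluster: on a single-node Minikube there is nowhere for the evicted pods to go.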

Conclusion

By leveraging Kubernetes’ built-in self-healing features, we’ve created a system that:

Automatically recovers from pod and container failures without manual intervention.
Maintains availability by running multiple replicas.
Reduces downtime by replacing failed pods within seconds.

This approach reduces operational overhead and enhances reliability. Let me know if you have any questions in the comments!
