Creating Custom Grafana Dashboards for Kubernetes Resource Monitoring

Introduction

In modern DevOps workflows, monitoring Kubernetes clusters is crucial to ensure optimal performance, resource allocation, and overall system health. While tools like Prometheus and Grafana provide powerful insights, default dashboards may not always meet the needs of different teams.

In this post, I’ll walk you through the process of creating custom Grafana dashboards to monitor Kubernetes resources, making monitoring data more accessible and actionable for different stakeholders.

Why Custom Dashboards?

A one-size-fits-all dashboard doesn’t always work in dynamic environments. Different teams require different levels of detail:

  • Developers might want insights into application performance and error rates.
  • SREs and Ops teams need deep infrastructure metrics like CPU, memory, and pod statuses.
  • Management and Business teams may prefer high-level overviews of system health.

By creating role-specific visualizations, we can provide each team with the data they need.

Setting Up Grafana for Kubernetes Monitoring

Step 1: Install Prometheus and Grafana in Kubernetes

If you haven’t already installed Prometheus and Grafana, you can deploy them using Helm:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

After installation, forward the Grafana service to access the UI:

kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring

Now, open http://localhost:3000 and log in with:

  • Username: admin
  • Password: Retrieve it using:
kubectl get secret --namespace monitoring monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

Step 2: Configure Prometheus as the Data Source

Once inside Grafana:

  1. Go to Configuration → Data Sources.
  2. Click Add data source and select Prometheus.
  3. Set the URL to the in-cluster Prometheus service, e.g. http://monitoring-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090 (the exact service name depends on the chart version; check with kubectl get svc -n monitoring).
  4. Click Save & Test to verify the connection.

Now, Grafana can fetch Kubernetes metrics from Prometheus.
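
If you prefer to manage the data source as code instead of clicking through the UI, Grafana can also load it from a provisioning file. Here is a minimal sketch, assuming the kube-prometheus-stack release is named monitoring (verify the actual service name with kubectl get svc -n monitoring):

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # assumption: service created by the kube-prometheus-stack chart for release "monitoring"
    url: http://monitoring-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
    isDefault: true

Placed under /etc/grafana/provisioning/datasources/ (for the Helm-deployed Grafana, via the chart's values), this registers the data source at startup.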

Step 3: Creating a Custom Dashboard

We’ll manually create a Kubernetes Resource Monitoring dashboard in Grafana.

Adding a CPU Usage Panel

  1. Go to Dashboards → New Dashboard.
  2. Click Add a New Panel.
  3. Under Query, select Prometheus as the data source.
  4. Enter the following PromQL query to monitor CPU usage per namespace:
sum(rate(container_cpu_usage_seconds_total{namespace!='', container!=''}[5m])) by (namespace)
  5. In the Legend format field, enter {{namespace}} to label each series by namespace.
  6. Click Apply.

Adding a Memory Usage Panel

  1. Add another panel in the same dashboard.
  2. Use the following PromQL query to monitor Memory usage per namespace:
sum(container_memory_usage_bytes{namespace!='', container!=''}) by (namespace)
  3. Set the Legend format to {{namespace}}.
  4. Click Apply.

Saving the Dashboard

  1. Click Save Dashboard.
  2. Enter a name like Kubernetes Resource Monitoring.
  3. Click Save.

Step 4: Viewing the Dashboard

Once saved, the dashboard should display real-time CPU and memory usage graphs categorized by namespaces.

  • SREs can track high CPU-consuming namespaces to optimize resource allocation.
  • Developers can monitor application memory usage to debug performance issues.
  • Managers can get an overview of cluster health at a glance.

By creating custom visualizations, we make Kubernetes monitoring more actionable and role-specific.
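
If you would rather keep this dashboard under version control, the kube-prometheus-stack Grafana runs a dashboard sidecar that imports any ConfigMap carrying the grafana_dashboard label (the label name and value are chart settings, so check your values). The sketch below is a trimmed, hypothetical example; in practice, export the dashboard JSON from Grafana (Dashboard → Share → Export) and paste it into the data field:

apiVersion: v1
kind: ConfigMap
metadata:
  name: k8s-resource-monitoring-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # label watched by the chart's dashboard sidecar
data:
  k8s-resource-monitoring.json: |
    {
      "title": "Kubernetes Resource Monitoring (as code)",
      "uid": "k8s-resource-monitoring",
      "panels": [
        {
          "type": "timeseries",
          "title": "CPU usage by namespace",
          "gridPos": { "h": 8, "w": 24, "x": 0, "y": 0 },
          "targets": [
            {
              "expr": "sum(rate(container_cpu_usage_seconds_total{namespace!='', container!=''}[5m])) by (namespace)",
              "legendFormat": "{{namespace}}"
            }
          ]
        }
      ]
    }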

Conclusion

In this post, we explored how to create a custom Kubernetes monitoring dashboard in Grafana. By leveraging Prometheus metrics, we designed role-specific panels for CPU and memory usage, making monitoring more insightful and efficient.

Stay tuned for more Kubernetes insights! If you found this helpful, share your thoughts in the comments.👇

Practical Kubernetes Tracing with Jaeger

Introduction

In modern microservices architectures, debugging performance issues can be challenging. Requests often travel across multiple services, making it difficult to identify bottlenecks. Jaeger, an open-source distributed tracing system, helps solve this problem by providing end-to-end request tracing across services.

In this blog post, we will explore how to:
✅ Deploy Jaeger in Kubernetes
✅ Set up distributed tracing without building custom images
✅ Use an OpenTelemetry-enabled NGINX for tracing

Step 1: Deploying Jaeger in Kubernetes

The easiest way to deploy Jaeger in Kubernetes is by using Helm.

Installing Jaeger Using Helm

To install Jaeger in the observability namespace, run:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
helm install jaeger jaegertracing/jaeger \
  --namespace observability --create-namespace \
  --set query.service.httpPort=16686

Verify the Deployment

Check if Jaeger is running:

kubectl get pods -n observability
kubectl get svc -n observability

You should see services like jaeger-collector and jaeger-query.

Step 2: Deploying an NGINX-Based Application with OpenTelemetry

Instead of building a fully custom application image, we run an NGINX container that is configured with an OpenTelemetry (OTLP) endpoint so traces can be shipped to Jaeger. Note that the plain nginx:latest image does not emit traces on its own; you need an NGINX build that includes the OpenTelemetry module (see the hedged configuration sketch after the manifest below).

Creating the Deployment

Here’s the YAML configuration for an NGINX service that integrates with Jaeger:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-tracing
  namespace: observability
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-tracing
  template:
    metadata:
      labels:
        app: nginx-tracing
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://jaeger-collector:4317"
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-tracing
  namespace: observability
spec:
  selector:
    app: nginx-tracing
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
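
As noted above, the OTEL_EXPORTER_OTLP_ENDPOINT variable only tells an instrumented process where to send spans. One option is an NGINX build that ships the ngx_otel_module, enabled via a ConfigMap mounted over the default configuration. The following is a hedged sketch under that assumption (module path and image variant depend on your NGINX build):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-otel-conf
  namespace: observability
data:
  nginx.conf: |
    # assumes an NGINX build that includes ngx_otel_module (e.g. an -otel image variant)
    load_module modules/ngx_otel_module.so;
    events {}
    http {
      otel_exporter {
        endpoint jaeger-collector:4317;   # OTLP gRPC port of the Jaeger collector
      }
      otel_service_name nginx-tracing;
      otel_trace on;
      server {
        listen 80;
        location / {
          return 200 "traced request\n";
        }
      }
    }

Mount this ConfigMap at /etc/nginx/nginx.conf in the Deployment above (or switch to an image that already has tracing enabled) so that each incoming request produces a span.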

Deploying the Application

Apply the deployment:

kubectl apply -f nginx-tracing.yaml

Step 3: Accessing the Application

To expose the NGINX service locally, run:

kubectl port-forward svc/nginx-tracing 8080:80 -n observability

Now, visit http://localhost:8080 in your browser.

Step 4: Viewing Traces in Jaeger

To access the Jaeger UI, forward the query service port:

kubectl port-forward svc/jaeger-query 16686:16686 -n observability

Now, open http://localhost:16686 and search for traces from NGINX.

Conclusion

In this guide, we:
✅ Deployed Jaeger using Helm for distributed tracing.
✅ Used an OpenTelemetry-enabled NGINX image to send traces without building custom images.
✅ Accessed the Jaeger UI to visualize trace data.

Why is tracing important in your Kubernetes setup? Share your thoughts below!👇

Building a Comprehensive Logging Stack with Loki and Grafana

Logs are the backbone of observability in microservices. But traditional logging systems can be complex, expensive, and inefficient at handling high-volume logs. This is where Grafana Loki comes in!

Loki is a lightweight, cost-effective logging solution designed to work seamlessly with Grafana. Unlike Elasticsearch-based solutions, Loki indexes only label metadata rather than the full log content, which keeps ingestion cheap and makes it easier to operate at scale in Kubernetes environments.

What we will achieve in this guide:

✅ Deploy Loki for log aggregation
✅ Install Promtail for log collection
✅ Visualize logs in Grafana
✅ Enable log queries for efficient debugging 

Let’s get started! 

Deploying Loki with Helm

The easiest way to install Loki in Kubernetes is via Helm, which automates resource creation and configuration.

Step 1: Add the Grafana Helm Repository

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Step 2: Install Loki in Kubernetes

helm install loki grafana/loki-stack -n logging --create-namespace

This command deploys:
✅ Loki (log aggregator)
✅ Promtail (log forwarder, running as a DaemonSet on each node)

The loki-stack chart does not enable Grafana by default; we install it separately below, or you can bundle it by enabling it in the chart values (see the hedged values sketch after this note).
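
A minimal, hedged values.yaml sketch for enabling the bundled Grafana follows; chart defaults vary between versions, so verify with helm show values grafana/loki-stack:

loki:
  enabled: true
promtail:
  enabled: true
grafana:
  enabled: true        # bundle Grafana instead of installing it separately
  sidecar:
    datasources:
      enabled: true    # auto-register Loki as a Grafana data source

Pass it to the install command above with -f values.yaml.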

Verify that the pods are running:

kubectl get pods -n logging

Configuring Promtail for Log Collection

Promtail collects logs from Kubernetes nodes and sends them to Loki. Let’s configure it properly.

promtail-config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: logging
data:
  promtail.yaml: |
    server:
      http_listen_port: 3101
      grpc_listen_port: 9095
    positions:
      filename: /var/log/positions.yaml
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
      - job_name: kubernetes-pods
        pipeline_stages:
          - cri: {}
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # keep only pods that carry an "app" label
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: .+
          # expose namespace, pod and app as Loki labels so they can be queried later
          - source_labels: [__meta_kubernetes_namespace]
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_label_app]
            target_label: app
          # point Promtail at the container log files on the node
          - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
            separator: /
            target_label: __path__
            replacement: /var/log/pods/*$1/*.log

Apply the Promtail Configuration

kubectl apply -f promtail-config.yaml

This config scrapes logs from Kubernetes pods and sends them to Loki for indexing, exposing namespace, pod, and app labels for querying. Note that the Promtail deployed by the loki-stack chart already ships with a working configuration; use a standalone ConfigMap like this one only if you manage Promtail yourself or override the chart's default config through its values.

Deploying Grafana for Log Visualization

Grafana provides a user-friendly dashboard to analyze logs efficiently.

Step 1: Install Grafana via Helm

helm install grafana grafana/grafana -n logging

Step 2: Access Grafana

kubectl port-forward svc/grafana -n logging 3000:80

Now, open http://localhost:3000 in your browser.

  • Username: admin
  • Password: Retrieve using:
kubectl get secret -n logging grafana -o jsonpath="{.data.admin-password}" | base64 --decode

Connecting Loki as a Data Source in Grafana

Once inside Grafana:

  1. Navigate to Configuration → Data Sources
  2. Click Add Data Source
  3. Select Loki
  4. Set the URL to http://loki:3100
  5. Click Save & Test

Now, Grafana can query logs directly from Loki! 
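
If you prefer to manage this as code, and your Grafana was installed from the grafana Helm chart with the datasource sidecar enabled (sidecar.datasources.enabled=true), a ConfigMap like the hedged sketch below is picked up automatically; the grafana_datasource label is the chart's default, but check your values:

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: logging
  labels:
    grafana_datasource: "1"   # label watched by the grafana chart's datasource sidecar
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki:3100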

Querying and Analyzing Logs

Grafana allows you to filter logs with powerful queries. Here are some common ones:

View all logs for a specific namespace

{namespace="myapp"}

Filter logs from a specific pod

{pod="myapp-56c8d9df6d-p7tkg"}

Search logs for errors

{app="myapp"} |= "error"

LogQL (Loki Query Language) enables efficient log analysis, making debugging easier.

Verifying the Setup

Check the status of your Loki stack:

kubectl get pods -n logging

If everything is running, you successfully deployed a scalable logging system for Kubernetes! 

Conclusion: Why Use Loki for Logging?

By implementing Loki with Grafana, we achieved:
✅ Centralized logging for Kubernetes workloads
✅ Lightweight and cost-effective log storage
✅ Powerful query capabilities with LogQL
✅ Seamless integration with Grafana dashboards

Unlike traditional logging stacks (like ELK), Loki avoids heavy full-text indexing, which reduces storage and operational costs while keeping label-based queries fast.

Let me know if you have any questions in the comments!👇

Creating Custom Prometheus Exporters for Your Applications

Introduction

In modern cloud-native environments, monitoring is a critical aspect of maintaining application reliability and performance. Prometheus is a popular monitoring system, but its built-in exporters may not cover custom business logic or application-specific metrics.

In this guide, we will build a custom Prometheus exporter for a sample application, package it into a Docker container, and deploy it in Kubernetes. By the end of this tutorial, you’ll have a fully functional custom monitoring setup for your application.

Why Custom Prometheus Exporters?

Prometheus exporters are essential for collecting and exposing application-specific metrics. While standard exporters cover databases, queues, and system metrics, custom exporters allow you to:

✅ Track business-specific metrics (e.g., user activity, sales data)
✅ Gain real-time insights into application performance
✅ Enable custom alerting based on key performance indicators

Building a Custom Prometheus Exporter

We will create a simple Python-based Prometheus exporter that exposes custom application metrics over an HTTP endpoint.

Step 1: Writing the Python Exporter

First, let’s create a simple Python script using the prometheus_client library.

Create exporter.py with the following content:

from prometheus_client import start_http_server, Counter
import time
import random

# Define a custom metric
REQUEST_COUNT = Counter("custom_app_requests_total", "Total number of processed requests")

def process_request():
    """Simulate request processing"""
    time.sleep(random.uniform(0.5, 2.0))  # Simulate latency
    REQUEST_COUNT.inc()  # Increment counter

if __name__ == "__main__":
    start_http_server(8000)  # Expose metrics on port 8000
    print("Custom Prometheus Exporter running on port 8000...")

    while True:
        process_request()

This script exposes a custom counter metric custom_app_requests_total, which simulates incoming application requests.

Step 2: Building and Pushing the Docker Image

Now, let’s containerize our exporter for easy deployment.

Create a Dockerfile:

FROM python:3.9-slim
WORKDIR /app
COPY exporter.py /app/
RUN pip install prometheus_client
CMD ["python", "exporter.py"]

Build and push the image:

docker build -t myrepo/myapp-prometheus-exporter:latest .
docker push myrepo/myapp-prometheus-exporter:latest

Deploying in Kubernetes

Step 3: Kubernetes Deployment

To deploy our custom exporter in Kubernetes, we create a Deployment and Service.

Create exporter-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-prometheus-exporter
  labels:
    app: myapp-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp-exporter
  template:
    metadata:
      labels:
        app: myapp-exporter
    spec:
      containers:
        - name: myapp-exporter
          image: myrepo/myapp-prometheus-exporter:latest
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-exporter-service
  labels:
    app: myapp-exporter      # the ServiceMonitor selects Services by this label
spec:
  selector:
    app: myapp-exporter
  ports:
    - name: metrics          # named port referenced by the ServiceMonitor
      protocol: TCP
      port: 8000
      targetPort: 8000

Apply the deployment:

kubectl apply -f exporter-deployment.yaml

Step 4: Configuring Prometheus to Scrape Custom Metrics

Next, we need to tell Prometheus to collect metrics from our exporter.

Create service-monitor.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-exporter-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: myapp-exporter
  endpoints:
    - port: metrics   # references the Service port name, not the port number
      interval: 10s

Apply the ServiceMonitor:

kubectl apply -f service-monitor.yaml

Verifying the Setup

Step 5: Checking Metrics Collection

Check if the exporter is running:

kubectl get pods -l app=myapp-exporter

Port forward and test the metrics endpoint:

kubectl port-forward svc/myapp-exporter-service 8000:8000
curl http://localhost:8000/metrics

Check if Prometheus is scraping the exporter. The service name depends on your installation; with the kube-prometheus-stack chart, the Operator-managed instance is typically reachable via the prometheus-operated service in the monitoring namespace:

kubectl port-forward svc/prometheus-operated -n monitoring 9090:9090

Now, open http://localhost:9090 and search for custom_app_requests_total.
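
With the metric flowing into Prometheus, you can also alert on it. The following PrometheusRule is a hedged sketch: the release: prometheus label and the threshold are assumptions, so match the label to whatever your Prometheus ruleSelector expects:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-exporter-alerts
  labels:
    release: prometheus   # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: myapp.rules
      rules:
        - alert: MyAppRequestRateLow
          expr: rate(custom_app_requests_total[5m]) < 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Request rate for myapp dropped below 0.1 req/s for 10 minutes"

Apply it with kubectl apply -f prometheus-rule.yaml and the alert will appear under Alerts in the Prometheus UI.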

Conclusion

Building a custom Prometheus exporter enables deep observability for your application. By following these steps, we have:

✅ Created a Python-based Prometheus exporter
✅ Containerized it using Docker
✅ Deployed it in Kubernetes
✅ Integrated it with Prometheus using ServiceMonitor

This setup ensures that we collect meaningful application metrics, which can be visualized in Grafana dashboards and used for proactive monitoring and alerting.

Are you using custom Prometheus exporters in your projects? Let’s discuss in the comments!👇

Implementing the Prometheus Operator: A Complete Guide to Kubernetes Monitoring

Monitoring is the backbone of any reliable Kubernetes cluster. It ensures visibility into resource usage, application health, and potential failures. Instead of manually deploying and managing Prometheus, the Prometheus Operator simplifies and automates monitoring with custom resource definitions (CRDs).

In this guide, we will:
✅ Deploy the Prometheus Operator in Kubernetes
✅ Configure Prometheus, Alertmanager, and Grafana
✅ Set up automated service discovery using ServiceMonitor
✅ Enable alerts and notifications

Let’s get started!

Installing the Prometheus Operator Using Helm

The fastest and most efficient way to deploy the Prometheus stack is through Helm.

Step 1: Add the Prometheus Helm Repository

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Step 2: Install the Prometheus Operator

helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

This command installs:
✅ Prometheus
✅ Alertmanager
✅ Grafana
✅ Node Exporters

Verify the installation:

kubectl get pods -n monitoring

Deploying a Prometheus Instance

Now, we will define a Prometheus Custom Resource to manage our Prometheus deployment.

prometheus.yaml

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-instance
  namespace: monitoring
spec:
  replicas: 2
  serviceMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
      cpu: 200m

Apply the Prometheus Instance

kubectl apply -f prometheus.yaml

The Operator reconciles this resource and creates a two-replica, highly available Prometheus StatefulSet with the requested resources. For the instance to actually scrape targets, it also needs a service account with read access to pods, services, and endpoints; a minimal, hedged RBAC sketch follows.
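
The names below (prometheus-instance) are illustrative; adjust them to your conventions and add serviceAccountName: prometheus-instance to the Prometheus spec above so the instance runs under this account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-instance
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-instance
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-instance
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-instance
subjects:
  - kind: ServiceAccount
    name: prometheus-instance
    namespace: monitoring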

Configuring Service Discovery with ServiceMonitor

Prometheus requires service discovery to scrape metrics from your applications. The ServiceMonitor CRD makes this process seamless.

servicemonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s

Apply the ServiceMonitor Configuration

kubectl apply -f servicemonitor.yaml

This ensures that Prometheus automatically discovers and scrapes metrics from Services labeled app: myapp that expose a port named metrics. By default a ServiceMonitor only selects Services in its own namespace; add a namespaceSelector to the spec to watch other namespaces. A hedged example of a matching Service follows.
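
The names and port numbers below are assumptions; the essential parts are the app: myapp label and the port named metrics:

apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: monitoring        # assumption: same namespace as the ServiceMonitor
  labels:
    app: myapp                 # matched by the ServiceMonitor selector
spec:
  selector:
    app: myapp
  ports:
    - name: metrics            # referenced by "port: metrics" in the ServiceMonitor
      port: 8080
      targetPort: 8080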

Setting Up Alerting with Alertmanager

Alertmanager handles alerts generated by Prometheus and routes them to email, Slack, PagerDuty, etc.

alertmanager.yaml

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager-instance
  namespace: monitoring
spec:
  replicas: 2

Apply the Alertmanager Configuration

kubectl apply -f alertmanager.yaml

This deploys a two-replica Alertmanager, but it still needs routing and receiver configuration before it can notify anyone. The Operator reads that configuration from a Secret named alertmanager-<name> containing an alertmanager.yaml key.
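
Here is a hedged sketch of that Secret; the receiver is a placeholder, so fill in your Slack, email, or PagerDuty settings:

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-alertmanager-instance   # Operator convention: alertmanager-<Alertmanager name>
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    route:
      receiver: "default"
      group_by: ["alertname"]
    receivers:
      - name: "default"
        # placeholder receiver -- add slack_configs, email_configs, etc. here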

Accessing Grafana Dashboards

Grafana provides real-time visualization for Prometheus metrics. It is already included in the Prometheus Operator stack.

Access Grafana using port-forwarding

kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80

Open http://localhost:3000 in your browser.
Username: admin
Password: Retrieve using:

kubectl get secret -n monitoring prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 --decode

Now, you can import dashboards and start visualizing Kubernetes metrics!

Verifying the Monitoring Stack

To ensure everything is running smoothly, check the pods:

kubectl get pods -n monitoring

If you see Prometheus, Alertmanager, and Grafana running, congratulations! You now have a fully automated Kubernetes monitoring stack.

Conclusion: Why Use the Prometheus Operator?

By using the Prometheus Operator, we achieved:
✅ Simplified monitoring stack deployment
✅ Automated service discovery for metrics collection
✅ Centralized alerting with Alertmanager
✅ Interactive dashboards with Grafana 

With this setup, you can scale, extend, and customize monitoring based on your infrastructure needs.

Let me know if you have any questions in the comments! 👇