
Build a Complete Kubernetes Monitoring Stack from Scratch (2026)

Step-by-step project walkthrough: set up Prometheus, Grafana, Loki, and AlertManager on Kubernetes using Helm. Real configs, real dashboards, production-ready.

DevOpsBoys · Mar 31, 2026 · 5 min read

Logging, metrics, and alerts on Kubernetes — this project walkthrough sets up the full observability stack from scratch. By the end you'll have:

  • Prometheus collecting cluster and app metrics
  • Grafana with pre-built dashboards
  • Loki for log aggregation
  • AlertManager sending alerts to Slack

This is exactly what a production cluster needs. Let's build it.


What You Need

  • A running Kubernetes cluster (Minikube, k3s, or a cloud cluster)
  • kubectl configured
  • helm v3 installed
  • A Slack webhook URL (for alerts — optional but recommended)

Step 1: Set Up Namespaces and Helm Repos

bash
# Create a dedicated namespace
kubectl create namespace monitoring
 
# Add the Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
 
# Add Grafana repo (for Loki)
helm repo add grafana https://grafana.github.io/helm-charts
 
helm repo update

Step 2: Install kube-prometheus-stack

The kube-prometheus-stack chart installs Prometheus, Grafana, AlertManager, node-exporter, and kube-state-metrics in one shot.

Create a values file:

yaml
# monitoring-values.yaml
grafana:
  enabled: true
  adminPassword: "devopsboys-admin"  # Change this!
  persistence:
    enabled: true
    size: 5Gi
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: "default"
          orgId: 1
          folder: ""
          type: file
          options:
            path: /var/lib/grafana/dashboards/default
 
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
 
alertmanager:
  enabled: true
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 2Gi
 
nodeExporter:
  enabled: true
 
kubeStateMetrics:
  enabled: true

Install it:

bash
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values monitoring-values.yaml \
  --wait

Check everything came up:

bash
kubectl get pods -n monitoring
 
# NAME                                                      READY   STATUS    RESTARTS
# monitoring-grafana-xxx                                    3/3     Running   0
# monitoring-kube-prometheus-operator-xxx                   1/1     Running   0
# prometheus-monitoring-kube-prometheus-prometheus-0        2/2     Running   0
# alertmanager-monitoring-kube-prometheus-alertmanager-0    2/2     Running   0
# monitoring-prometheus-node-exporter-xxx                   1/1     Running   0
# monitoring-kube-state-metrics-xxx                         1/1     Running   0

Step 3: Access Grafana

bash
# Port-forward to access Grafana locally
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring

Open http://localhost:3000 and log in with admin / devopsboys-admin (the adminPassword from your values file).
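If you skip adminPassword in the values file, the chart generates one for you; you can read it back from the release's Secret (the Secret name below assumes the release is called monitoring):

```shell
kubectl get secret monitoring-grafana -n monitoring \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```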

You'll already have dashboards pre-installed:

  • Kubernetes / Compute Resources / Cluster — CPU/memory overview
  • Kubernetes / Networking — network traffic
  • Node Exporter Full — detailed node metrics

Step 4: Configure AlertManager with Slack

Create a Slack webhook at api.slack.com/apps. Then override the operator-managed AlertManager config by applying a Secret:

yaml
# alertmanager-config.yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-monitoring-kube-prometheus-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
      slack_api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
 
    route:
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'slack-notifications'
      routes:
        - match:
            severity: critical
          receiver: 'slack-critical'
 
    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - channel: '#devops-alerts'
            title: '{{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
 
      - name: 'slack-critical'
        slack_configs:
          - channel: '#devops-critical'
            title: '🚨 CRITICAL: {{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

Apply it:

bash
kubectl apply -f alertmanager-config.yaml

Step 5: Install Loki for Log Aggregation

Loki collects logs from all pods and makes them queryable in Grafana.

yaml
# loki-values.yaml
loki:
  enabled: true
  persistence:
    enabled: true
    size: 10Gi
 
promtail:
  enabled: true
  config:
    clients:
      - url: http://loki:3100/loki/api/v1/push

Install it:

bash
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --values loki-values.yaml \
  --set grafana.enabled=false \
  --wait

Add Loki as a datasource in Grafana:

  1. Go to Configuration → Data Sources → Add data source
  2. Select Loki
  3. URL: http://loki:3100
  4. Save & Test

Now, in Grafana's Explore tab, you can query logs:

{namespace="default"} |= "error"
{app="my-app"} | json | level="error"
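Under the hood, the | json stage parses each log line as JSON and the label matcher keeps lines whose extracted fields match. A rough standard-library sketch of that pipeline stage (an illustration, not Loki's actual code):

```python
import json

def logql_json_filter(lines, **wanted):
    """Sketch of LogQL's `| json | key="value"` stage: parse each log line
    as JSON, then keep lines whose extracted fields match all matchers."""
    out = []
    for line in lines:
        try:
            fields = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON lines drop out of the pipeline
        if all(fields.get(k) == v for k, v in wanted.items()):
            out.append(line)
    return out
```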

Step 6: Add Custom Application Metrics

For your own app to expose metrics to Prometheus, add a /metrics endpoint. If you're using Python:

python
from prometheus_client import start_http_server, Counter, Histogram
 
# The 'status' label is needed by the error-rate queries and alerts below
REQUEST_COUNT = Counter('app_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
REQUEST_LATENCY = Histogram('app_request_latency_seconds', 'Request latency')
 
start_http_server(8080)  # Serves the metrics endpoint on port 8080
 
# Instrument your request handler, e.g.:
# REQUEST_COUNT.labels(method='GET', endpoint='/api', status='200').inc()
# with REQUEST_LATENCY.time():
#     handle_request()
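prometheus_client renders the scrape output for you; to make it less magical, here's a hand-rolled sketch of the text exposition format Prometheus reads from /metrics (the sample values are made up):

```python
def exposition(name, help_text, metric_type, samples):
    """Render one metric family in Prometheus' text exposition format.
    samples is a list of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

print(exposition(
    "app_requests_total", "Total requests", "counter",
    [({"method": "GET", "endpoint": "/api", "status": "200"}, 42)],
))
```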

Then tell Prometheus to scrape it with a ServiceMonitor:

yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: monitoring
  labels:
    release: monitoring  # Must match kube-prometheus-stack's label selector
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
  namespaceSelector:
    matchNames:
      - default

Apply it:

bash
kubectl apply -f servicemonitor.yaml
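The ServiceMonitor above selects a Service labeled app: my-app that exposes a port named metrics. If your app doesn't have one yet, a minimal sketch (all names here are assumptions chosen to match the ServiceMonitor):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app
spec:
  selector:
    app: my-app
  ports:
    - name: metrics   # must match the ServiceMonitor's endpoint port
      port: 8080
      targetPort: 8080
```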

Verify Prometheus picks it up:

bash
kubectl port-forward svc/monitoring-kube-prometheus-prometheus 9090:9090 -n monitoring
# Open http://localhost:9090/targets — your app should appear
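The /targets page is backed by a JSON API at /api/v1/targets, so with the port-forward running you can script the same check. A small standard-library sketch (field names follow the Prometheus HTTP API):

```python
import json
from urllib.request import urlopen

def unhealthy(payload):
    """List (job, lastError) for targets Prometheus cannot scrape,
    given a decoded /api/v1/targets response."""
    return [
        (t["labels"].get("job", "?"), t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t["health"] != "up"
    ]

def check_targets(prom_url="http://localhost:9090"):
    """Fetch the live target list via the port-forward and filter it."""
    with urlopen(f"{prom_url}/api/v1/targets") as resp:
        return unhealthy(json.load(resp))
```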

Step 7: Create a Custom Grafana Dashboard

In Grafana, go to Dashboards → New Dashboard → Add Panel.

Panel 1: Request rate

sum(rate(app_requests_total[5m])) by (endpoint)

Panel 2: Error rate

sum(rate(app_requests_total{status=~"5.."}[5m])) / sum(rate(app_requests_total[5m]))

Panel 3: p99 latency

histogram_quantile(0.99, sum(rate(app_request_latency_seconds_bucket[5m])) by (le))
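histogram_quantile works on cumulative bucket counts and linearly interpolates inside the bucket where the target rank falls. A simplified sketch of that calculation (Prometheus's real implementation operates on rate()-adjusted counts and handles more edge cases):

```python
import math

def histogram_quantile(q, buckets):
    """buckets: sorted (upper_bound, cumulative_count) pairs, last bound +Inf."""
    total = buckets[-1][1]
    rank = q * total
    lower, below = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return lower  # clamp to the highest finite bucket bound
            # linear interpolation within this bucket
            return lower + (upper - lower) * (rank - below) / (count - below)
        lower, below = upper, count

# e.g. 100 observations: 50 within 0.1s, 80 within 0.5s, 95 within 1s
print(histogram_quantile(0.9, [(0.1, 50), (0.5, 80), (1.0, 95), (math.inf, 100)]))
```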

Save the dashboard and export the JSON — commit it to your Git repo. On the next cluster, import it directly.


Step 8: Set Up Useful Alerts

Add a PrometheusRule with a few common alerts:

yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: app.rules
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(app_requests_total{status=~"5.."}[5m]))
            / sum(rate(app_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value | humanizePercentage }} over 5 minutes"
 
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"

Apply it:

bash
kubectl apply -f alerts.yaml

Full Stack Summary

┌─────────────────────────────────────────────┐
│              Your Kubernetes Cluster        │
│                                             │
│  App Pods ──metrics──► Prometheus           │
│  App Pods ──logs────► Loki (via Promtail)   │
│  Nodes ──────────────► Node Exporter        │
│  K8s API ────────────► kube-state-metrics   │
│                                             │
│  Prometheus ──────► AlertManager ──► Slack  │
│  Prometheus + Loki ──► Grafana Dashboards   │
└─────────────────────────────────────────────┘


This stack takes about an hour to set up the first time. Once it's running, you'll have full visibility into every pod, node, and application in your cluster. Commit your values files to Git, and you can spin up the same observability stack on any new cluster in minutes.
