Build a Complete Kubernetes Monitoring Stack from Scratch (2026)
Step-by-step project walkthrough: set up Prometheus, Grafana, Loki, and AlertManager on Kubernetes using Helm. Real configs, real dashboards, production-ready.
Logging, metrics, and alerts on Kubernetes — this project walkthrough sets up the full observability stack from scratch. By the end you'll have:
- Prometheus collecting cluster and app metrics
- Grafana with pre-built dashboards
- Loki for log aggregation
- AlertManager sending alerts to Slack
This is exactly what a production cluster needs. Let's build it.
What You Need
- A running Kubernetes cluster (Minikube, k3s, or a cloud cluster)
- kubectl configured
- helm v3 installed
- A Slack webhook URL (for alerts — optional but recommended)
Step 1: Set Up Namespaces and Helm Repos
```bash
# Create a dedicated namespace
kubectl create namespace monitoring

# Add the Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# Add the Grafana repo (for Loki)
helm repo add grafana https://grafana.github.io/helm-charts

helm repo update
```

Step 2: Install kube-prometheus-stack
The kube-prometheus-stack chart installs Prometheus, Grafana, AlertManager, the Prometheus Operator, node-exporter, and kube-state-metrics in one shot.
Create a values file:
```yaml
# monitoring-values.yaml
grafana:
  enabled: true
  adminPassword: "devopsboys-admin"  # Change this!
  persistence:
    enabled: true
    size: 5Gi
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: "default"
          orgId: 1
          folder: ""
          type: file
          options:
            path: /var/lib/grafana/dashboards/default

prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi

alertmanager:
  enabled: true
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 2Gi

nodeExporter:
  enabled: true

kubeStateMetrics:
  enabled: true
```

Install it:
```bash
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values monitoring-values.yaml \
  --wait
```

Check that everything came up:
```bash
kubectl get pods -n monitoring
# NAME                                        READY   STATUS    RESTARTS
# monitoring-grafana-xxx                      3/3     Running   0
# monitoring-kube-prometheus-prometheus-xxx   2/2     Running   0
# monitoring-kube-alertmanager-xxx            2/2     Running   0
# monitoring-prometheus-node-exporter-xxx     1/1     Running   0
# monitoring-kube-state-metrics-xxx           1/1     Running   0
```

Step 3: Access Grafana
```bash
# Port-forward to access Grafana locally
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring
```

Open http://localhost:3000 and log in with admin / devopsboys-admin.
You'll already have dashboards pre-installed:
- Kubernetes / Compute Resources / Cluster — CPU/memory overview
- Kubernetes / Networking — network traffic
- Node Exporter Full — detailed node metrics
Step 4: Configure AlertManager with Slack
Create a Slack incoming webhook at api.slack.com/apps, then create the AlertManager config:
```yaml
# alertmanager-config.yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-monitoring-kube-prometheus-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
      slack_api_url: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
    route:
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'slack-notifications'
      routes:
        - match:
            severity: critical
          receiver: 'slack-critical'
    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - channel: '#devops-alerts'
            title: '{{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
      - name: 'slack-critical'
        slack_configs:
          - channel: '#devops-critical'
            title: '🚨 CRITICAL: {{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
```

Apply it:

```bash
kubectl apply -f alertmanager-config.yaml
```

Step 5: Install Loki for Log Aggregation
Loki collects logs from all pods and makes them queryable in Grafana.
```yaml
# loki-values.yaml
loki:
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  auth_enabled: false

promtail:
  enabled: true
  config:
    clients:
      - url: http://loki-gateway/loki/api/v1/push
```

```bash
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --values loki-values.yaml \
  --set grafana.enabled=false \
  --wait
```

Add Loki as a data source in Grafana:
- Go to Configuration → Data Sources → Add data source
- Select Loki
- URL: http://loki:3100
- Save & Test
Now, in the Explore tab, you can query logs:

```
{namespace="default"} |= "error"
{app="my-app"} | json | level="error"
```
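Conceptually, a line filter like `|= "error"` is a substring match over every line in the selected streams, and `| json` parses each line so filters can run on extracted fields. A toy Python sketch of those semantics (not Loki's actual implementation; the log data and helper names here are made up for illustration):

```python
import json

# Toy log store: (stream labels, log line) pairs, roughly how Loki indexes logs.
logs = [
    ({"namespace": "default", "app": "my-app"}, '{"level": "error", "msg": "db timeout"}'),
    ({"namespace": "default", "app": "my-app"}, '{"level": "info", "msg": "request ok"}'),
    ({"namespace": "kube-system", "app": "dns"}, '{"level": "error", "msg": "lookup failed"}'),
]

def select(stream_labels):
    """Stream selector, e.g. {namespace="default"}: match on indexed labels."""
    return [line for labels, line in logs
            if all(labels.get(k) == v for k, v in stream_labels.items())]

# {namespace="default"} |= "error"  ->  substring filter over selected lines
matches = [line for line in select({"namespace": "default"}) if "error" in line]

# {app="my-app"} | json | level="error"  ->  parse each line, filter on a field
json_matches = [line for line in select({"app": "my-app"})
                if json.loads(line).get("level") == "error"]

print(len(matches))       # 1
print(len(json_matches))  # 1
```

The key design point this illustrates: only stream labels are indexed, while line filters and parsers scan log content at query time, which is why Loki stays cheap to ingest but benefits from tight label selectors.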
Step 6: Add Custom Application Metrics
For your own app to expose metrics to Prometheus, add a /metrics endpoint. If you're using Python:
```python
from prometheus_client import start_http_server, Counter, Histogram

# Include a "status" label so error-rate queries filtering on 5xx responses work
REQUEST_COUNT = Counter('app_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
REQUEST_LATENCY = Histogram('app_request_latency_seconds', 'Request latency')

start_http_server(8080)  # Exposes /metrics on port 8080
```

Then tell Prometheus to scrape it with a ServiceMonitor:
```yaml
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: monitoring
  labels:
    release: monitoring  # Must match kube-prometheus-stack's label selector
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
  namespaceSelector:
    matchNames:
      - default
```

```bash
kubectl apply -f servicemonitor.yaml
```

Verify Prometheus picks it up:

```bash
kubectl port-forward svc/monitoring-kube-prometheus-prometheus 9090:9090 -n monitoring
# Open http://localhost:9090/targets — your app should appear
```

Step 7: Create a Custom Grafana Dashboard
In Grafana, go to Dashboards → New Dashboard → Add Panel.
Panel 1: Request rate

```
sum(rate(app_requests_total[5m])) by (endpoint)
```

Panel 2: Error rate

```
sum(rate(app_requests_total{status=~"5.."}[5m])) / sum(rate(app_requests_total[5m]))
```

Panel 3: p99 latency

```
histogram_quantile(0.99, sum(rate(app_request_latency_seconds_bucket[5m])) by (le))
```
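`histogram_quantile` estimates the quantile by finding the first cumulative bucket that crosses the target rank, then interpolating linearly inside it. A simplified stand-alone sketch of that math (not Prometheus's actual implementation; real inputs come from `rate()` and this skips edge cases, and the bucket counts below are made up):

```python
def histogram_quantile(q, buckets):
    """Approximate quantile q from cumulative Prometheus-style buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by bound,
    ending with (float('inf'), total_count), the +Inf bucket.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # cannot interpolate into the +Inf bucket
            # Linear interpolation: assume observations spread evenly in the bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# le="0.1": 50 requests, le="0.5": 90, le="1": 99, le="+Inf": 100
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (float("inf"), 100)]
print(histogram_quantile(0.99, buckets))  # ~1.0, so p99 latency is about 1s
```

This is also why bucket boundaries matter: the estimate can never be more precise than the bucket the quantile lands in.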
Save the dashboard and export the JSON — commit it to your Git repo. On the next cluster, import it directly.
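All three panels lean on `rate()`, which is roughly the per-second increase of a counter over the window; the HighErrorRate alert in Step 8 reuses the same ratio with a 5% threshold. A toy illustration of that arithmetic (ignoring the counter resets and extrapolation Prometheus actually handles; the sample numbers are invented):

```python
def rate(samples):
    """Per-second increase between the first and last (timestamp, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# app_requests_total sampled over a 5-minute window (timestamps in seconds)
total_rate = rate([(0, 1000), (300, 1600)])   # 2.0 requests/s
error_rate = rate([(0, 40), (300, 70)])       # 0.1 errors/s

error_ratio = error_rate / total_rate         # Panel 2's expression
print(error_ratio)                            # 0.05
print(error_ratio > 0.05)                     # False, so HighErrorRate stays quiet
```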
Step 8: Set Up Useful Alerts
Add a PrometheusRule for common alerts:
```yaml
# alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: app.rules
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(app_requests_total{status=~"5.."}[5m]))
              / sum(rate(app_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value | humanizePercentage }} over 5 minutes"
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"
```

```bash
kubectl apply -f alerts.yaml
```

Full Stack Summary
```
┌─────────────────────────────────────────────┐
│           Your Kubernetes Cluster           │
│                                             │
│  App Pods ──metrics──► Prometheus           │
│  App Pods ──logs────► Loki (via Promtail)   │
│  Nodes ──────────────► Node Exporter        │
│  K8s API ────────────► kube-state-metrics   │
│                                             │
│  Prometheus ──────► AlertManager ──► Slack  │
│  Prometheus + Loki ──► Grafana Dashboards   │
└─────────────────────────────────────────────┘
```
Resources to Go Deeper
- KodeKloud Kubernetes Monitoring Course — Hands-on labs for Prometheus, Grafana, and alerting in Kubernetes
- Grafana Cloud Free Tier — Host dashboards without running your own Grafana
- DigitalOcean Kubernetes $200 Credit — Run a real cluster to practice this project
This stack takes about an hour to set up the first time. Once it's running, you'll have full visibility into every pod, node, and application in your cluster. Commit your values files to Git, and you can spin up the same observability stack on any new cluster in minutes.