
How to Set Up Prometheus Alertmanager from Scratch (2026)

Step-by-step guide to setting up Prometheus Alertmanager for Kubernetes monitoring. Covers installation, alert rules, routing, Slack/PagerDuty integration, silencing, and production best practices.

DevOpsBoys · Mar 29, 2026 · 6 min read

Monitoring without alerting is just watching dashboards and hoping someone notices when things break. Prometheus collects metrics beautifully, but without Alertmanager, those metrics are passive data — they don't wake anyone up at 3 AM when the database is running out of disk.

This guide takes you from zero alerts to a production-ready alerting pipeline with Prometheus Alertmanager, including Slack notifications, PagerDuty integration, and smart routing.

Architecture Overview

Prometheus → evaluates alert rules → fires alerts → Alertmanager
                                                         │
                                    ┌────────────────────┤
                                    ↓                    ↓
                               Route: critical      Route: warning
                                    ↓                    ↓
                              PagerDuty            Slack #alerts
                              (pages oncall)       (informational)

Prometheus evaluates alert rules against metrics and sends firing alerts to Alertmanager. Alertmanager handles:

  • Routing — which alerts go where
  • Grouping — batching related alerts
  • Inhibition — suppressing less important alerts when critical ones fire
  • Silencing — temporarily muting alerts during maintenance
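
All four behaviors live in a single Alertmanager configuration. As a conceptual sketch (receiver names, matchers, and label values here are placeholders, not the config we build later), the same ideas look like this in a raw alertmanager.yml:

```yaml
# Minimal alertmanager.yml sketch — illustrative only
route:                                  # Routing: a tree of routes matched on labels
  receiver: team-default
  group_by: ['alertname', 'namespace']  # Grouping: batch alerts sharing these labels
  routes:
    - matchers: ['severity="critical"']
      receiver: oncall
inhibit_rules:                          # Inhibition: critical mutes matching warnings
  - source_matchers: ['severity="critical"']
    target_matchers: ['severity="warning"']
    equal: ['alertname', 'namespace']
receivers:
  - name: team-default
  - name: oncall
# Silencing has no config section — silences are created at runtime via the UI or API.
```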

Step 1: Install with kube-prometheus-stack

The easiest way to get Prometheus + Alertmanager running:

bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
 
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set alertmanager.enabled=true \
  --set grafana.enabled=true
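
The same settings can live in a values file instead of --set flags, which is easier to version-control. A sketch of a minimal values.yaml (the replicas override is an optional assumption, not a chart requirement):

```yaml
# values.yaml — pass with: helm install ... -f values.yaml
alertmanager:
  enabled: true
  alertmanagerSpec:
    replicas: 2   # run an HA pair; Alertmanager instances gossip to deduplicate notifications
grafana:
  enabled: true
```

Then install with `helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace -f values.yaml`.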

Verify:

bash
kubectl get pods -n monitoring
NAME                                                      READY   STATUS
alertmanager-monitoring-kube-prometheus-alertmanager-0    2/2     Running
prometheus-monitoring-kube-prometheus-prometheus-0        2/2     Running
monitoring-grafana-6b4d7f8c9-x2k4m                        3/3     Running
monitoring-kube-prometheus-operator-5f9d8b7c6-9j3kl       1/1     Running

Access the Alertmanager UI:

bash
kubectl port-forward -n monitoring svc/alertmanager-monitoring-kube-prometheus-alertmanager 9093:9093
# Open http://localhost:9093

Step 2: Write Alert Rules

Alert rules define when to fire. Create a PrometheusRule resource:

yaml
# alert-rules.yaml
apiVersion: monitoring.coreos.io/v1
kind: PrometheusRule
metadata:
  name: application-alerts
  namespace: monitoring
  labels:
    release: monitoring  # Must match Prometheus operator selector
spec:
  groups:
    - name: application.rules
      rules:
        # High error rate
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
            /
            sum(rate(http_requests_total[5m])) by (service)
            > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High 5xx error rate on {{ $labels.service }}"
            description: "{{ $labels.service }} has {{ $value | humanizePercentage }} error rate (>5%) for 5 minutes."
            runbook_url: "https://wiki.internal/runbooks/high-error-rate"
 
        # High latency
        - alert: HighLatencyP99
          expr: |
            histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
            > 2.0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "P99 latency >2s on {{ $labels.service }}"
            description: "{{ $labels.service }} P99 latency is {{ $value | humanizeDuration }}."
 
        # Pod memory approaching limit
        - alert: PodMemoryHighUsage
          expr: |
            container_memory_working_set_bytes{container!=""}
            /
            container_spec_memory_limit_bytes{container!=""}
            > 0.85
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} using >85% memory limit"
            description: "{{ $labels.pod }} in {{ $labels.namespace }} is at {{ $value | humanizePercentage }} of memory limit."
 
        # Disk filling up
        - alert: DiskSpaceRunningLow
          expr: |
            (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})
            < 0.15
          for: 30m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} disk <15% free"
            description: "Root filesystem on {{ $labels.instance }} has only {{ $value | humanizePercentage }} free."
 
    - name: kubernetes.rules
      rules:
        # Pod crash loop
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"
            description: "{{ $labels.pod }} in {{ $labels.namespace }} has restarted {{ $value | humanize }} times in 15 minutes."
 
        # Deployment replica mismatch
        - alert: DeploymentReplicasMismatch
          expr: |
            kube_deployment_spec_replicas != kube_deployment_status_replicas_available
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Deployment {{ $labels.deployment }} replica mismatch"
            description: "{{ $labels.deployment }} in {{ $labels.namespace }} has had fewer available replicas than the {{ $value }} desired for 15 minutes."
 
        # Node not ready
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is not ready"

bash
kubectl apply -f alert-rules.yaml

Verify rules are loaded:

bash
kubectl port-forward -n monitoring svc/prometheus-monitoring-kube-prometheus-prometheus 9090:9090
# Check http://localhost:9090/rules

Step 3: Configure Alertmanager Routing

Now configure where alerts go. Create an Alertmanager config:

yaml
# alertmanager-config.yaml
apiVersion: monitoring.coreos.io/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alert-routing
  namespace: monitoring
  labels:
    release: monitoring
spec:
  route:
    groupBy: ['alertname', 'namespace', 'service']
    groupWait: 30s       # Wait 30s to batch initial alerts
    groupInterval: 5m    # Wait 5m between batched notifications
    repeatInterval: 4h   # Resend every 4h if still firing
    receiver: slack-warnings
    routes:
      # Critical alerts → PagerDuty (page someone)
      - matchers:
          - name: severity
            value: critical
        receiver: pagerduty-critical
        repeatInterval: 1h
 
      # Warning alerts → Slack (informational)
      - matchers:
          - name: severity
            value: warning
        receiver: slack-warnings
 
  receivers:
    - name: slack-warnings
      slackConfigs:
        - apiURL:
            name: slack-webhook-url
            key: url
          channel: '#alerts-warning'
          sendResolved: true
          title: '{{ if eq .Status "firing" }}🔥{{ else }}✅{{ end }} {{ .CommonLabels.alertname }}'
          text: |
            *Status:* {{ .Status | toUpper }}
            *Alert:* {{ .CommonLabels.alertname }}
            *Severity:* {{ .CommonLabels.severity }}
            {{ range .Alerts }}
            *Description:* {{ .Annotations.description }}
            *Details:*
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* {{ .Value }}
            {{ end }}
            {{ end }}
 
    - name: pagerduty-critical
      pagerdutyConfigs:
        - routingKey:
            name: pagerduty-routing-key
            key: key
          severity: '{{ if eq .CommonLabels.severity "critical" }}critical{{ else }}warning{{ end }}'
          description: '{{ .CommonLabels.alertname }}: {{ .CommonAnnotations.summary }}'
          details:
            - key: namespace
              value: '{{ .CommonLabels.namespace }}'
            - key: service
              value: '{{ .CommonLabels.service }}'

Create the secrets:

bash
# Slack webhook
kubectl create secret generic slack-webhook-url \
  -n monitoring \
  --from-literal=url='https://hooks.slack.com/services/T00/B00/xxx'
 
# PagerDuty routing key
kubectl create secret generic pagerduty-routing-key \
  -n monitoring \
  --from-literal=key='your-pagerduty-integration-key'

bash
kubectl apply -f alertmanager-config.yaml

Step 4: Test Your Alerts

Fire a Test Alert

bash
# Send a test alert directly to Alertmanager
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {
      "alertname": "TestAlert",
      "severity": "warning",
      "service": "test-service"
    },
    "annotations": {
      "summary": "This is a test alert",
      "description": "Testing the alerting pipeline"
    },
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
  }]'

Check Slack — you should see the notification within 30 seconds.

Trigger a Real Alert

Create a pod that will crash-loop:

bash
kubectl run crasher --image=busybox --restart=Always -- /bin/sh -c "exit 1"
# Wait 15 minutes for PodCrashLooping alert to fire
# Then clean up:
kubectl delete pod crasher

Step 5: Silencing and Inhibition

Silence Alerts During Maintenance

Through the Alertmanager UI (port 9093), or via API:

bash
# Silence all alerts for a specific namespace for 2 hours
curl -X POST http://localhost:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{
    "matchers": [
      {"name": "namespace", "value": "staging", "isRegex": false}
    ],
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
    "endsAt": "'$(date -u -d "+2 hours" +%Y-%m-%dT%H:%M:%SZ)'",
    "createdBy": "admin",
    "comment": "Staging maintenance window"
  }'
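
Note that `-d "+2 hours"` is GNU date syntax; BSD/macOS date uses `-v+2H` instead. A small portable sketch for computing the two timestamps:

```shell
# RFC 3339 UTC timestamps for a 2-hour silence window
START=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# Try GNU date syntax first, fall back to BSD/macOS syntax
END=$(date -u -d '+2 hours' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
  || date -u -v+2H +%Y-%m-%dT%H:%M:%SZ)
echo "silencing from $START to $END"
```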

Inhibition Rules

Suppress warning alerts when critical alerts are already firing for the same service:

yaml
# In Alertmanager config
inhibitRules:
  - sourceMatch:
      - name: severity
        value: critical
    targetMatch:
      - name: severity
        value: warning
    equal: ['alertname', 'namespace', 'service']

If HighErrorRate is critical and firing, don't also send the HighLatencyP99 warning for the same service — the latency is probably caused by the errors.

Alert Rule Best Practices

1. Always Use for Duration

Never alert on instantaneous spikes:

yaml
# Bad: fires the moment a single scrape crosses the threshold
- alert: HighCPU
  expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
 
# Good: same expression, but must sustain for 5 minutes
- alert: HighCPU
  expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
  for: 5m
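
Rules like this can also be unit-tested offline with `promtool test rules`, before they ever reach a cluster. A sketch, assuming the HighCPU rule is saved in a hypothetical cpu-rules.yaml (the input series values are invented for illustration):

```yaml
# cpu-rules-test.yaml — run with: promtool test rules cpu-rules-test.yaml
rule_files:
  - cpu-rules.yaml          # contains the HighCPU rule above
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'node_cpu_seconds_total{mode="user", instance="node1"}'
        values: '0+60x20'   # counter grows 60s per minute => CPU 100% busy
      - series: 'node_cpu_seconds_total{mode="idle", instance="node1"}'
        values: '0+0x20'    # idle counter flat => 0% idle
    alert_rule_test:
      - eval_time: 12m      # well past the 5m "for" window
        alertname: HighCPU
        exp_alerts:
          - exp_labels:
              instance: node1
```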

2. Include Runbook URLs

Every alert should link to a runbook:

yaml
annotations:
  runbook_url: "https://wiki.internal/runbooks/{{ $labels.alertname }}"

3. Use Meaningful Labels

yaml
labels:
  severity: critical    # For routing decisions
  team: platform        # For ownership
  sla_tier: tier1       # For priority

4. Don't Alert on Both Symptoms and Causes

If you alert on "database connection pool exhausted" (cause), don't also alert on "API latency high" (symptom). Use inhibition rules.
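
That pairing can be encoded in the same camelCase form shown in Step 5 (both alert names here are hypothetical):

```yaml
inhibitRules:
  - sourceMatch:
      - name: alertname
        value: DBConnectionPoolExhausted   # hypothetical cause alert
    targetMatch:
      - name: alertname
        value: APIHighLatency              # hypothetical symptom alert
    equal: ['namespace', 'service']
```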

Quick Reference

bash
# Check firing alerts
kubectl port-forward -n monitoring svc/alertmanager-monitoring-kube-prometheus-alertmanager 9093:9093
# Visit http://localhost:9093/#/alerts
 
# Check alert rules in Prometheus
kubectl port-forward -n monitoring svc/prometheus-monitoring-kube-prometheus-prometheus 9090:9090
# Visit http://localhost:9090/rules
 
# View Alertmanager config
kubectl get secret alertmanager-monitoring-kube-prometheus-alertmanager -n monitoring -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
 
# Check Alertmanager logs
kubectl logs -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager-0 -c alertmanager

For mastering Prometheus and Kubernetes monitoring, KodeKloud's observability courses provide hands-on labs where you can practice building alert rules and testing alerting pipelines in real clusters.


A dashboard without alerts is just television. Make your monitoring actionable.
