How to Set Up Prometheus Alertmanager from Scratch (2026)
Step-by-step guide to setting up Prometheus Alertmanager for Kubernetes monitoring. Covers installation, alert rules, routing, Slack/PagerDuty integration, silencing, and production best practices.
Monitoring without alerting is just watching dashboards and hoping someone notices when things break. Prometheus collects metrics beautifully, but without Alertmanager, those metrics are passive data — they don't wake anyone up at 3 AM when the database is running out of disk.
This guide takes you from zero alerts to a production-ready alerting pipeline with Prometheus Alertmanager, including Slack notifications, PagerDuty integration, and smart routing.
Architecture Overview
```
Prometheus → evaluates alert rules → fires alerts → Alertmanager
                                                         │
                                    ┌────────────────────┤
                                    ↓                    ↓
                             Route: critical       Route: warning
                                    ↓                    ↓
                               PagerDuty           Slack #alerts
                            (pages oncall)        (informational)
```
Prometheus evaluates alert rules against metrics and sends firing alerts to Alertmanager. Alertmanager handles:
- Routing — which alerts go where
- Grouping — batching related alerts
- Inhibition — suppressing less important alerts when critical ones fire
- Silencing — temporarily muting alerts during maintenance
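To make the first three of these concrete, here is a minimal standalone `alertmanager.yml` sketch (illustrative only — receiver names and timings are placeholders; Step 3 shows the Kubernetes-native equivalent using the AlertmanagerConfig CRD):

```yaml
# Minimal alertmanager.yml sketch — receiver names are placeholders
route:
  receiver: default            # Routing: fallback receiver
  group_by: ['alertname']      # Grouping: batch alerts with the same name
  group_wait: 30s
  routes:
    - matchers: ['severity="critical"']
      receiver: oncall         # Routing: critical alerts go to the pager
inhibit_rules:
  - source_matchers: ['severity="critical"']   # Inhibition: a firing critical alert...
    target_matchers: ['severity="warning"']    # ...mutes matching warnings
    equal: ['namespace']
receivers:
  - name: default
  - name: oncall
# Silencing has no config section — silences are created at runtime
# via the Alertmanager UI or API (see Step 5).
```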
Step 1: Install with kube-prometheus-stack
The easiest way to get Prometheus + Alertmanager running:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set alertmanager.enabled=true \
  --set grafana.enabled=true
```

Verify:
```bash
kubectl get pods -n monitoring
```

```
NAME                                                     READY   STATUS
alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running
prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running
monitoring-grafana-6b4d7f8c9-x2k4m                       3/3     Running
monitoring-kube-prometheus-operator-5f9d8b7c6-9j3kl      1/1     Running
```
Access the Alertmanager UI:
```bash
kubectl port-forward -n monitoring svc/alertmanager-monitoring-kube-prometheus-alertmanager 9093:9093
# Open http://localhost:9093
```

Step 2: Write Alert Rules
Alert rules define when to fire. Create a PrometheusRule resource:
```yaml
# alert-rules.yaml
apiVersion: monitoring.coreos.io/v1
kind: PrometheusRule
metadata:
  name: application-alerts
  namespace: monitoring
  labels:
    release: monitoring  # Must match the Prometheus operator's rule selector
spec:
  groups:
    - name: application.rules
      rules:
        # High error rate
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
              /
            sum(rate(http_requests_total[5m])) by (service)
              > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High 5xx error rate on {{ $labels.service }}"
            description: "{{ $labels.service }} has {{ $value | humanizePercentage }} error rate (>5%) for 5 minutes."
            runbook_url: "https://wiki.internal/runbooks/high-error-rate"

        # High latency
        - alert: HighLatencyP99
          expr: |
            histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)) > 2.0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "P99 latency >2s on {{ $labels.service }}"
            description: "{{ $labels.service }} P99 latency is {{ $value | humanizeDuration }}."

        # Pod memory approaching limit
        - alert: PodMemoryHighUsage
          expr: |
            container_memory_working_set_bytes{container!=""}
              /
            container_spec_memory_limit_bytes{container!=""}
              > 0.85
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} using >85% of its memory limit"
            description: "{{ $labels.pod }} in {{ $labels.namespace }} is at {{ $value | humanizePercentage }} of its memory limit."

        # Disk filling up
        - alert: DiskSpaceRunningLow
          expr: |
            (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.15
          for: 30m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} disk <15% free"
            description: "Root filesystem on {{ $labels.instance }} has only {{ $value | humanizePercentage }} free."

    - name: kubernetes.rules
      rules:
        # Pod crash loop
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"
            description: "{{ $labels.pod }} in {{ $labels.namespace }} has restarted {{ $value | humanize }} times in 15 minutes."

        # Deployment replica mismatch
        - alert: DeploymentReplicasMismatch
          expr: |
            kube_deployment_spec_replicas != kube_deployment_status_available_replicas
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Deployment {{ $labels.deployment }} replica mismatch"
            description: "{{ $labels.deployment }} in {{ $labels.namespace }} has had fewer available replicas than its desired count ({{ $value }}) for 15 minutes."

        # Node not ready
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready",status="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is not ready"
```

Apply it, then verify the rules are loaded:

```bash
kubectl apply -f alert-rules.yaml

kubectl port-forward -n monitoring svc/prometheus-monitoring-kube-prometheus-prometheus 9090:9090
# Check http://localhost:9090/rules
```

Step 3: Configure Alertmanager Routing
Now configure where alerts go. Create an Alertmanager config:
```yaml
# alertmanager-config.yaml
apiVersion: monitoring.coreos.io/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alert-routing
  namespace: monitoring
  labels:
    release: monitoring
spec:
  route:
    groupBy: ['alertname', 'namespace', 'service']
    groupWait: 30s       # Wait 30s to batch initial alerts
    groupInterval: 5m    # Wait 5m between batched notifications
    repeatInterval: 4h   # Resend every 4h if still firing
    receiver: slack-warnings
    routes:
      # Critical alerts → PagerDuty (page someone)
      - matchers:
          - name: severity
            value: critical
        receiver: pagerduty-critical
        repeatInterval: 1h
      # Warning alerts → Slack (informational)
      - matchers:
          - name: severity
            value: warning
        receiver: slack-warnings
  receivers:
    - name: slack-warnings
      slackConfigs:
        - apiURL:
            name: slack-webhook-url
            key: url
          channel: '#alerts-warning'
          sendResolved: true
          title: '{{ if eq .Status "firing" }}🔥{{ else }}✅{{ end }} {{ .CommonLabels.alertname }}'
          text: |
            *Status:* {{ .Status | toUpper }}
            *Alert:* {{ .CommonLabels.alertname }}
            *Severity:* {{ .CommonLabels.severity }}
            {{ range .Alerts }}
            *Description:* {{ .Annotations.description }}
            *Details:*
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* {{ .Value }}
            {{ end }}
            {{ end }}
    - name: pagerduty-critical
      pagerdutyConfigs:
        - routingKey:
            name: pagerduty-routing-key
            key: key
          severity: '{{ if eq .CommonLabels.severity "critical" }}critical{{ else }}warning{{ end }}'
          description: '{{ .CommonLabels.alertname }}: {{ .CommonAnnotations.summary }}'
          details:
            - key: namespace
              value: '{{ .CommonLabels.namespace }}'
            - key: service
              value: '{{ .CommonLabels.service }}'
```

Create the secrets the receivers reference, then apply the config:

```bash
# Slack webhook
kubectl create secret generic slack-webhook-url \
  -n monitoring \
  --from-literal=url='https://hooks.slack.com/services/T00/B00/xxx'

# PagerDuty routing key
kubectl create secret generic pagerduty-routing-key \
  -n monitoring \
  --from-literal=key='your-pagerduty-integration-key'

kubectl apply -f alertmanager-config.yaml
```

Step 4: Test Your Alerts
Fire a Test Alert
```bash
# Send a test alert directly to Alertmanager
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{
    "labels": {
      "alertname": "TestAlert",
      "severity": "warning",
      "service": "test-service"
    },
    "annotations": {
      "summary": "This is a test alert",
      "description": "Testing the alerting pipeline"
    },
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
  }]'
```

Check Slack — you should see the notification within about 30 seconds, once the `groupWait` window elapses.
Trigger a Real Alert
Create a pod that will crash-loop:
```bash
kubectl run crasher --image=busybox --restart=Always -- /bin/sh -c "exit 1"

# Wait 15 minutes for the PodCrashLooping alert to fire
# Then clean up:
kubectl delete pod crasher
```

Step 5: Silencing and Inhibition
Silence Alerts During Maintenance
Through the Alertmanager UI (port 9093), or via API:
```bash
# Silence all alerts for a specific namespace for 2 hours
# (date -d is GNU date; on macOS use: date -u -v+2H +%Y-%m-%dT%H:%M:%SZ)
curl -X POST http://localhost:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{
    "matchers": [
      {"name": "namespace", "value": "staging", "isRegex": false}
    ],
    "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
    "endsAt": "'$(date -u -d "+2 hours" +%Y-%m-%dT%H:%M:%SZ)'",
    "createdBy": "admin",
    "comment": "Staging maintenance window"
  }'
```

Inhibition Rules
Suppress warning alerts when critical alerts are already firing for the same service:
```yaml
# In the Alertmanager config. Note: 'alertname' must NOT be in 'equal',
# or a critical alert could only inhibit warnings with the same name.
inhibitRules:
  - sourceMatch:
      - name: severity
        value: critical
    targetMatch:
      - name: severity
        value: warning
    equal: ['namespace', 'service']
```

If HighErrorRate is critical and firing, don't also send the HighLatencyP99 warning for the same service — the latency is probably caused by the errors.
Alert Rule Best Practices
1. Always Use for Duration
Never alert on instantaneous spikes:
```yaml
# Bad: no `for` clause — can fire on a single scrape spike
- alert: HighCPU
  expr: avg(rate(node_cpu_seconds_total{mode!="idle"}[1m])) by (instance) > 0.9

# Good: must sustain for 5 minutes
- alert: HighCPU
  expr: avg(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance) > 0.9
  for: 5m
```

2. Include Runbook URLs
Every alert should link to a runbook:
```yaml
annotations:
  runbook_url: "https://wiki.internal/runbooks/{{ $labels.alertname }}"
```

3. Use Meaningful Labels
```yaml
labels:
  severity: critical  # For routing decisions
  team: platform      # For ownership
  sla_tier: tier1     # For priority
```

4. Don't Alert on Symptoms and Causes
If you alert on "database connection pool exhausted" (cause), don't also alert on "API latency high" (symptom). Use inhibition rules.
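As a sketch of this pattern (the alert names here are hypothetical — adapt them to your own rules), an inhibition rule for that cause/symptom pair might look like:

```yaml
# Hypothetical alert names: DBConnectionPoolExhausted (cause),
# HighLatencyP99 (symptom). When the cause fires, mute the symptom
# for the same namespace and service.
inhibitRules:
  - sourceMatch:
      - name: alertname
        value: DBConnectionPoolExhausted
    targetMatch:
      - name: alertname
        value: HighLatencyP99
    equal: ['namespace', 'service']
```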
Quick Reference
```bash
# Check firing alerts
kubectl port-forward -n monitoring svc/alertmanager-monitoring-kube-prometheus-alertmanager 9093:9093
# Visit http://localhost:9093/#/alerts

# Check alert rules in Prometheus
kubectl port-forward -n monitoring svc/prometheus-monitoring-kube-prometheus-prometheus 9090:9090
# Visit http://localhost:9090/rules

# View the Alertmanager config
kubectl get secret alertmanager-monitoring-kube-prometheus-alertmanager -n monitoring -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d

# Check Alertmanager logs
kubectl logs -n monitoring alertmanager-monitoring-kube-prometheus-alertmanager-0 -c alertmanager
```

For mastering Prometheus and Kubernetes monitoring, KodeKloud's observability courses provide hands-on labs where you can practice building alert rules and testing alerting pipelines in real clusters.
A dashboard without alerts is just television. Make your monitoring actionable.