Prometheus Alerts Not Firing: Every Cause and Fix
Your Prometheus alert should have fired 30 minutes ago but nothing happened. Here's every reason alerts silently fail — routing, inhibition, receivers, and rule syntax.
You wrote an alert rule. The condition is clearly met. But no notification came. No PagerDuty, no Slack, nothing.
This is one of the most frustrating problems in observability — silent alert failures. Here's every cause and how to diagnose each one.
The Alert Pipeline
Before debugging, understand the full path an alert takes:
Prometheus Rule → PENDING → FIRING
↓
Alertmanager
↓
Route matching → Receiver
↓
Slack / PagerDuty / Email
Failure can happen at any stage. Work through them in order.
Step 1 — Is the Alert Even Firing in Prometheus?
Go to your Prometheus UI → Alerts tab.
Three states:
- Inactive — condition not met (check your PromQL)
- Pending — condition met but waiting for the for duration to elapse
- Firing — alert is firing; Alertmanager should have received it
# Port-forward Prometheus UI locally
kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring

If Inactive: your PromQL expression is wrong. Test it in the Graph tab:
# Example alert rule
- alert: HighErrorRate
  expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
  for: 5m

Test the expression rate(http_requests_total{status=~"5.."}[5m]) > 0.05 directly in the Graph tab. If it returns no results, the condition isn't met or the metric doesn't exist.
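You can also evaluate the expression from the command line against the Prometheus HTTP API (this assumes the port-forward above is still running):
# Evaluate the alert expression via the query API; an empty result means the condition isn't met
curl -s http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(http_requests_total{status=~"5.."}[5m]) > 0.05' | jq '.data.result'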
# Check if metric exists
curl -s http://localhost:9090/api/v1/label/__name__/values | jq '.data[]' | grep http_requests

Step 2 — Alert Is Pending But Never Fires
If an alert is stuck in Pending, the for duration hasn't elapsed yet. This is normal — it's the grace period to avoid flapping alerts.
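You can see how long an alert has been in Pending by querying the Prometheus alerts API (again assuming the local port-forward):
# List pending alerts and when they became active
curl -s http://localhost:9090/api/v1/alerts | \
  jq '.data.alerts[] | select(.state == "pending") | {alertname: .labels.alertname, activeAt: .activeAt}'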
Problem: with for: 1h, the condition must stay true continuously for a full hour before the alert fires. If the condition briefly recovers and then re-triggers, the clock resets.
Fix: Reduce the for duration for critical alerts:
- alert: PodCrashLoopBackOff
  expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
  for: 5m # Not 1h
  labels:
    severity: critical
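If you want to verify the for behaviour without waiting on a live cluster, promtool can unit-test alert rules against synthetic series. This is only a sketch: the file names, pod label, and values below are made up, and exp_labels must list every label your alert actually carries.
# rules_test.yaml (file and label names are examples)
rule_files:
  - alert-rules.yaml # the file containing the PodCrashLoopBackOff rule
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # condition stays true for 10 minutes
      - series: 'kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", pod="api-0"}'
        values: '1x10'
    alert_rule_test:
      - eval_time: 6m # one minute past the 5m "for" window
        alertname: PodCrashLoopBackOff
        exp_alerts:
          - exp_labels:
              severity: critical
              reason: CrashLoopBackOff
              pod: api-0
Run it with promtool test rules rules_test.yaml; the test fails if the alert is not firing at the 6-minute mark.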
Step 3 — Alert Is Firing But Alertmanager Didn't Receive It

Check Alertmanager is receiving alerts from Prometheus:
# Check Alertmanager targets in Prometheus
# Go to: Prometheus UI → Status → Targets
# Look for alertmanager target — it should be UP
# Or check via API
curl http://localhost:9090/api/v1/alertmanagers
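A healthy setup lists your Alertmanager under activeAlertmanagers; anything under droppedAlertmanagers usually points at a service-discovery or relabeling problem:
# Filter the response to the active and dropped Alertmanager endpoints
curl -s http://localhost:9090/api/v1/alertmanagers | jq '.data.activeAlertmanagers, .data.droppedAlertmanagers'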
Common issue: Alertmanager URL misconfigured in Prometheus config:
# prometheus.yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093 # Must match the service name and port
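If the config is templated or managed by an operator, check the alerting block Prometheus actually loaded rather than the file you think it loaded:
# Dump the running configuration and inspect the alerting section
curl -s http://localhost:9090/api/v1/status/config | jq -r '.data.yaml' | grep -A 6 'alerting:'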
# Verify alertmanager service
kubectl get svc -n monitoring | grep alertmanager

Step 4 — Alertmanager Received Alert But No Notification
Check the Alertmanager UI:
kubectl port-forward svc/alertmanager-operated 9093:9093 -n monitoring

Go to http://localhost:9093. Under Alerts, you should see incoming alerts.
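You can also list what Alertmanager currently holds from the command line. If your alert is missing here, the problem is upstream (Prometheus or the connection to Alertmanager), not routing:
# List alerts Alertmanager has received, with their current state
curl -s http://localhost:9093/api/v2/alerts | jq '.[] | {alertname: .labels.alertname, state: .status.state}'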
Cause A: Route Not Matching
Your alert labels don't match any route's match conditions.
# alertmanager.yaml
route:
  receiver: 'null' # Default = drop everything
  routes:
    - match:
        severity: critical # Only routes alerts with severity=critical
      receiver: pagerduty

If your alert doesn't have severity: critical, it goes to the null receiver (dropped).
Fix: Check your alert labels match the route:
# In your Prometheus rule:
labels:
  severity: critical # Must match route's match condition

# In alertmanager route:
routes:
  - match:
      severity: critical
    receiver: pagerduty
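A quick way to check routing without firing a real alert is amtool, which can evaluate a label set against your routing tree. The config path below is a placeholder; point it at your actual alertmanager.yaml:
# Print the receiver a given label set would be routed to
amtool config routes test --config.file=alertmanager.yaml \
  alertname=HighErrorRate severity=critical
# Output is the matching receiver; 'null' means the alert would be dropped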
Cause B: Alert Is Silenced

Check for active silences in the Alertmanager UI under Silences. Someone may have silenced the alert (common after an incident).
# List silences via API
curl http://localhost:9093/api/v2/silences | jq '.[] | select(.status.state == "active")'
# Expire a specific silence by ID (use with caution)
curl -X DELETE http://localhost:9093/api/v2/silence/<silence-id>
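amtool gives a more readable view of the same silences, including their matchers, so you can see whether one covers your alert's labels:
# List active silences and their matchers
amtool silence query --alertmanager.url=http://localhost:9093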
Cause C: Alert Is Inhibited

Inhibition rules suppress alerts when a higher-severity alert is firing.
# alertmanager.yaml
inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning # Suppresses matching warning alerts while a critical alert fires
    equal: ['alertname', 'cluster']

If a critical alert with the same alertname and cluster labels is firing, the matching warning alerts are suppressed.
Fix: Check if any inhibition rules are matching:
# In Alertmanager UI → Alerts tab
# Inhibited alerts show with a yellow indicator
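The API shows the same thing: in the v2 alert status, inhibitedBy lists the fingerprints of the alerts doing the suppressing:
# Show alerts that are currently inhibited and what inhibits them
curl -s http://localhost:9093/api/v2/alerts | \
  jq '.[] | select(.status.inhibitedBy | length > 0) | {alertname: .labels.alertname, inhibitedBy: .status.inhibitedBy}'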
Cause D: Receiver Misconfigured

Your route matches but the receiver has a bad config (wrong webhook URL, wrong Slack channel).
# Check Alertmanager logs for errors
kubectl logs -n monitoring -l app.kubernetes.io/name=alertmanager | grep -i error
# Common errors:
# "connection refused" → wrong webhook URL
# "invalid_auth" → wrong Slack token
# "channel not found" → wrong Slack channel nameFix — Test your Slack webhook manually:
curl -X POST -H 'Content-type: application/json' \
--data '{"text":"Test alert from Alertmanager"}' \
https://hooks.slack.com/services/YOUR/WEBHOOK/URLStep 5 — Group Wait / Group Interval Delaying Alerts
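Before reloading a changed Alertmanager config, validate it; amtool catches malformed receivers and routes (the file path is a placeholder):
# Validate alertmanager.yaml before reloading
amtool check-config alertmanager.yaml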
Step 5 — Group Wait / Group Interval Delaying Alerts

Alertmanager batches alerts before sending:
route:
  group_wait: 30s # Wait 30s before sending first notification
  group_interval: 5m # Wait 5m between notifications for same group
  repeat_interval: 4h # Repeat notification every 4h if still firing

If you just fired an alert, group_wait: 30s means you wait 30 seconds. If another alert fires in those 30 seconds, they're batched together.
For critical alerts, reduce group_wait:
routes:
  - match:
      severity: critical
    receiver: pagerduty
    group_wait: 0s # Send immediately
    group_interval: 1m

Full Diagnostic Checklist
# 1. Alert visible in Prometheus UI → Alerts tab?
kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
# 2. Alertmanager receiving alerts?
curl http://localhost:9090/api/v1/alertmanagers
# 3. Alert visible in Alertmanager UI?
kubectl port-forward svc/alertmanager-operated 9093:9093 -n monitoring
# 4. Any active silences?
curl http://localhost:9093/api/v2/silences
# 5. Alertmanager logs showing errors?
kubectl logs -n monitoring -l app.kubernetes.io/name=alertmanager --tail=50
# 6. Receiver config valid? (send test notification)
curl -H "Content-Type: application/json" -d \
'[{"labels":{"alertname":"TestAlert","severity":"critical"}}]' \
http://localhost:9093/api/v1/alertsPrevention: Test Your Alert Pipeline
Never trust alert configs without testing them. Add an always-firing test alert:
- alert: AlertmanagerPipelineTest
  expr: vector(1) # Always true
  for: 1m
  labels:
    severity: info
  annotations:
    summary: "Test alert — Alertmanager pipeline is working"

If you get this notification, your entire pipeline is healthy.
For hands-on Prometheus and Alertmanager labs, KodeKloud has dedicated monitoring courses with real cluster exercises.