
Prometheus Alerts Not Firing: Every Cause and Fix

Your Prometheus alert should have fired 30 minutes ago but nothing happened. Here's every reason alerts silently fail — routing, inhibition, receivers, and rule syntax.

DevOpsBoys · May 11, 2026 · 5 min read

You wrote an alert rule. The condition is clearly met. But no notification came. No PagerDuty, no Slack, nothing.

This is one of the most frustrating problems in observability — silent alert failures. Here's every cause and how to diagnose each one.


The Alert Pipeline

Before debugging, understand the full path an alert takes:

Prometheus Rule → PENDING → FIRING
                               ↓
                        Alertmanager
                               ↓
                    Route matching → Receiver
                               ↓
                    Slack / PagerDuty / Email

Failure can happen at any stage. Work through them in order.


Step 1 — Is the Alert Even Firing in Prometheus?

Go to your Prometheus UI → Alerts tab.

Three states:

  • Inactive — condition not met (check your PromQL)
  • Pending — condition met, but the for duration hasn't elapsed yet
  • Firing — alert is firing, Alertmanager should have received it
bash
# Port-forward Prometheus UI locally
kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring

If Inactive: your PromQL expression is wrong. Test it in the Graph tab:

promql
# Example alert rule
- alert: HighErrorRate
  expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
  for: 5m

Test the expression rate(http_requests_total{status=~"5.."}[5m]) > 0.05 directly in the Graph tab. If it returns no results, the condition isn't met or the metric doesn't exist.

bash
# Check if metric exists
curl -s http://localhost:9090/api/v1/label/__name__/values | jq '.data[]' | grep http_requests
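If the metric exists and the expression returns data but the rule still never shows up under the Alerts tab, the rule file itself may be malformed or not loaded. You can validate it with promtool, assuming your rules live in a file named rules.yml (adjust the path to match your setup):

bash
# Validate rule file syntax (catches YAML and PromQL errors)
promtool check rules rules.yml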

Step 2 — Alert Is Pending But Never Fires

If an alert is stuck in Pending, the for duration hasn't elapsed yet. This is normal — it's the grace period to avoid flapping alerts.

Problem: with for: 1h, the alert fires only after the condition has held continuously for a full hour. If the condition briefly recovers and re-triggers, the clock resets to zero.

Fix: Reduce the for duration for critical alerts:

yaml
- alert: PodCrashLoopBackOff
  expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
  for: 5m      # Not 1h
  labels:
    severity: critical
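You can verify the for behavior offline with promtool's rule unit tests. A minimal sketch, assuming the rule above is saved in rules.yml (the test file name tests.yml is arbitrary):

yaml
# tests.yml (run with: promtool test rules tests.yml)
rule_files:
  - rules.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Condition held continuously from t=0
      - series: 'kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}'
        values: '1x10'
    alert_rule_test:
      - eval_time: 6m       # Past the 5m for duration, so the alert fires
        alertname: PodCrashLoopBackOff
        exp_alerts:
          - exp_labels:
              reason: CrashLoopBackOff
              severity: critical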

Step 3 — Alert Is Firing But Alertmanager Didn't Receive It

Check Alertmanager is receiving alerts from Prometheus:

bash
# Check Alertmanager targets in Prometheus
# Go to: Prometheus UI → Status → Targets
# Look for alertmanager target — it should be UP
 
# Or check via API
curl http://localhost:9090/api/v1/alertmanagers

Common issue: Alertmanager URL misconfigured in Prometheus config:

yaml
# prometheus.yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093   # Must match the service name and port
bash
# Verify alertmanager service
kubectl get svc -n monitoring | grep alertmanager
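If the target is DOWN, a common culprit is that the hostname in the alerting block doesn't resolve inside the cluster. A quick sketch using a throwaway pod to test DNS (the pod name and image here are arbitrary):

bash
# Resolve the Alertmanager service from inside the cluster
kubectl run dns-test --rm -it --restart=Never --image=busybox -- \
  nslookup alertmanager.monitoring.svc.cluster.local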

Step 4 — Alertmanager Received Alert But No Notification

Check the Alertmanager UI:

bash
kubectl port-forward svc/alertmanager-operated 9093:9093 -n monitoring

Go to http://localhost:9093. Under Alerts, you should see incoming alerts.
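You can also list what Alertmanager currently holds via its v2 API. If your alert appears here, Prometheus delivered it and the problem is downstream (routing, silences, inhibition, or the receiver):

bash
# List alerts currently known to Alertmanager
curl -s http://localhost:9093/api/v2/alerts | jq '.[].labels.alertname'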

Cause A: Route Not Matching

Your alert labels don't match any route's match conditions.

yaml
# alertmanager.yaml
route:
  receiver: 'null'         # Default: a receiver with no integrations, so alerts are dropped
  routes:
    - match:
        severity: critical   # Only routes alerts with severity=critical
      receiver: pagerduty

If your alert doesn't have severity: critical, it goes to the null receiver (dropped).

Fix: Check your alert labels match the route:

yaml
# In your Prometheus rule:
labels:
  severity: critical        # Must match route's match condition
 
# In alertmanager route:
routes:
  - match:
      severity: critical
    receiver: pagerduty
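Rather than eyeballing the routing tree, you can ask amtool which receiver a given label set would reach. A sketch, assuming your config is saved as alertmanager.yaml:

bash
# Show which receiver an alert with these labels routes to
amtool config routes test --config.file=alertmanager.yaml severity=critical
# Expected: pagerduty
# With no matching labels it falls through to the default receiver: null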

Cause B: Alert Is Silenced

Check for active silences in the Alertmanager UI under Silences. Someone may have silenced the alert (common after an incident).

bash
# List silences via API
curl http://localhost:9093/api/v2/silences | jq '.[] | select(.status.state == "active")'
 
# Expire a specific silence by ID (use with caution)
curl -X DELETE http://localhost:9093/api/v2/silence/<silence-id>
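amtool offers the same operations with less typing (assuming the port-forward from above is still running):

bash
# List active silences
amtool silence query --alertmanager.url=http://localhost:9093
 
# Expire a specific silence
amtool silence expire <silence-id> --alertmanager.url=http://localhost:9093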

Cause C: Alert Is Inhibited

Inhibition rules suppress alerts when a higher-severity alert is firing.

yaml
# alertmanager.yaml
inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning     # Suppresses all warnings when a critical alert fires
    equal: ['alertname', 'cluster']

If a critical alert is firing for the same cluster, all warning alerts are suppressed.

Fix: Check if any inhibition rules are matching:

bash
# In Alertmanager UI → Alerts tab
# Inhibited alerts show with a yellow indicator
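The v2 API exposes this too; each alert's status records which alerts are inhibiting it:

bash
# Show alerts currently suppressed by an inhibition rule
curl -s http://localhost:9093/api/v2/alerts | \
  jq '.[] | select(.status.inhibitedBy | length > 0) | .labels.alertname'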

Cause D: Receiver Misconfigured

Your route matches but the receiver has a bad config (wrong webhook URL, wrong Slack channel).

bash
# Check Alertmanager logs for errors
kubectl logs -n monitoring -l app.kubernetes.io/name=alertmanager | grep -i error
 
# Common errors:
# "connection refused" → wrong webhook URL
# "invalid_auth" → wrong Slack token
# "channel not found" → wrong Slack channel name

Fix — Test your Slack webhook manually:

bash
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Test alert from Alertmanager"}' \
  https://hooks.slack.com/services/YOUR/WEBHOOK/URL
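After fixing the receiver, validate the config before reloading it. amtool catches syntax and schema errors, assuming your config file is alertmanager.yaml:

bash
# Validate Alertmanager config before applying it
amtool check-config alertmanager.yaml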

Step 5 — Group Wait / Group Interval Delaying Alerts

Alertmanager batches alerts before sending:

yaml
route:
  group_wait: 30s        # Wait 30s before sending first notification
  group_interval: 5m     # Wait 5m between notifications for same group
  repeat_interval: 4h    # Repeat notification every 4h if still firing

When a new alert group forms, group_wait: 30s delays the first notification by 30 seconds; any other alerts that fire within that window are batched into the same notification.

For critical alerts, reduce group_wait:

yaml
routes:
  - match:
      severity: critical
    receiver: pagerduty
    group_wait: 0s        # Send immediately
    group_interval: 1m

Full Diagnostic Checklist

bash
# 1. Alert visible in Prometheus UI → Alerts tab?
kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
 
# 2. Alertmanager receiving alerts?
curl http://localhost:9090/api/v1/alertmanagers
 
# 3. Alert visible in Alertmanager UI?
kubectl port-forward svc/alertmanager-operated 9093:9093 -n monitoring
 
# 4. Any active silences?
curl http://localhost:9093/api/v2/silences
 
# 5. Alertmanager logs showing errors?
kubectl logs -n monitoring -l app.kubernetes.io/name=alertmanager --tail=50
 
# 6. Receiver config valid? (send test notification)
curl -H "Content-Type: application/json" -d \
  '[{"labels":{"alertname":"TestAlert","severity":"critical"}}]' \
  http://localhost:9093/api/v2/alerts   # use v2; the v1 alerts API was removed in recent versions
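As an alternative to raw curl for step 6, amtool can inject a synthetic alert end to end:

bash
# Fire a synthetic alert through the full pipeline
amtool alert add alertname=TestAlert severity=critical \
  --alertmanager.url=http://localhost:9093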

Prevention: Test Your Alert Pipeline

Never trust alert configs without testing them. Add an always-firing test alert:

yaml
- alert: AlertmanagerPipelineTest
  expr: vector(1)      # Always returns 1, so the alert always fires
  for: 1m
  labels:
    severity: info
  annotations:
    summary: "Test alert — Alertmanager pipeline is working"

If you get this notification, your entire pipeline is healthy.
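To keep the test alert from paging anyone, route it to a low-noise receiver. A sketch, assuming a Slack receiver named slack-pipeline-tests is defined in your config:

yaml
# alertmanager.yaml
routes:
  - match:
      alertname: AlertmanagerPipelineTest
    receiver: slack-pipeline-tests
    repeat_interval: 24h     # One heartbeat notification per day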

For hands-on Prometheus and Alertmanager labs, KodeKloud has dedicated monitoring courses with real cluster exercises.
