Prometheus AlertManager Silence Not Working? Here's Why (and the Fix)

You created a silence in AlertManager but alerts are still firing. Here are the 6 most common reasons silences fail and exactly how to fix each one.

You're in the middle of a planned maintenance window. You created an AlertManager silence to suppress alerts for the next 2 hours. Ten minutes later, Slack notifications keep arriving. PagerDuty is firing. Your phone won't stop buzzing.

This is maddening. Here's what's actually going wrong.

How AlertManager Silences Work (What Most Engineers Miss)

A silence in AlertManager works by matching labels. Before sending a notification, AlertManager checks whether the alert's labels satisfy every matcher of any active silence. If they do, the notification is suppressed. If not, it goes out.

The critical point: a silence doesn't silence an alerting rule. It silences specific label combinations.

If your alert has the label severity="critical" and your silence only matches severity="warning", the alert will still notify. Every matcher in the silence must match, and an = matcher compares the whole string: no partial match, no wildcard unless you use a regex matcher (=~).

This single misunderstanding is behind most silence failures.
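To make the matching rule concrete, here is a minimal shell sketch. It is a simplification, not AlertManager's implementation: it handles only equality matchers and assumes label values contain no spaces. The helper name silence_matches is hypothetical.

```bash
# Sketch of silence matching for "=" matchers only (a simplification).
# Succeeds only if EVERY matcher appears verbatim among the alert's labels.
# silence_matches is a hypothetical helper, not part of AlertManager.
silence_matches() {
  alert_labels="$1"   # space-separated name=value pairs on the alert
  matchers="$2"       # space-separated name=value pairs in the silence
  for matcher in $matchers; do
    found=no
    for label in $alert_labels; do
      if [ "$label" = "$matcher" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "no match: $matcher"
      return 1
    fi
  done
  echo "silenced"
}

alert="alertname=HighCPU severity=critical env=production"
# The silence does not need to list every label the alert carries
silence_matches "$alert" "alertname=HighCPU env=production"
# A single differing value is enough to break the match
silence_matches "$alert" "alertname=HighCPU env=prod" || true
```

The first call reports the alert as silenced even though the silence never mentions severity. The second fails on env=prod alone.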

Reason 1: Label Mismatch

Symptom: You created a silence but alerts for the same situation keep firing.

How to diagnose:

bash

# Check what labels your firing alert actually has
curl http://alertmanager:9093/api/v2/alerts | python3 -m json.tool | grep -A 20 "your-alert-name"

Compare those labels against your silence configuration:

bash

# List active silences and their matchers
curl http://alertmanager:9093/api/v2/silences | python3 -m json.tool

Common mistake: Your alert fires with env="production" but your silence has env="prod". Different strings. No match.

Fix: Get the exact label values from the firing alert first, then create the silence:

bash

# See the exact alert with all labels
amtool --alertmanager.url=http://alertmanager:9093 alert query alertname="HighCPU"

Then create the silence matching exactly those labels:

bash

amtool --alertmanager.url=http://alertmanager:9093 silence add \
  alertname="HighCPU" \
  env="production" \
  cluster="us-east-1" \
  --duration="2h" \
  --comment="Planned maintenance window"

Reason 2: Silence Expired (Time Zone Issue)

Symptom: The silence shows as pending or expired in the UI when you expected it to be active, or it expired earlier than expected.

AlertManager silences use UTC internally. If you meant "14:00 to 16:00" in IST (UTC+5:30) but the timestamps were submitted as 14:00 to 16:00 UTC, the silence does not start until 19:30 IST, five and a half hours late.

Fix: Always specify time in UTC or use duration-based silences:

bash

# Duration-based is safest — no timezone ambiguity
amtool silence add alertname="TargetDown" \
  --duration="3h" \
  --comment="Maintenance"
 
# If you need specific times, use UTC explicitly
amtool silence add alertname="TargetDown" \
  --starts-at="2026-06-13T08:30:00Z" \
  --ends-at="2026-06-13T10:30:00Z" \
  --comment="Maintenance UTC"
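If you start from a local time, one way to get the UTC timestamps is to let date do the arithmetic. This is a sketch that assumes GNU date (the -d flag behaves differently on BSD and macOS); to_utc is a hypothetical helper name.

```bash
# Convert a timestamp carrying an explicit UTC offset into the
# UTC form used by --starts-at and --ends-at. Assumes GNU date.
to_utc() {
  date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
}

# 14:00 to 16:00 IST (UTC+5:30)
to_utc "2026-06-13T14:00:00+05:30"   # prints 2026-06-13T08:30:00Z
to_utc "2026-06-13T16:00:00+05:30"   # prints 2026-06-13T10:30:00Z
```

Writing the offset into the input string avoids depending on the machine's configured time zone.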

Reason 3: Regex Matcher Not Working as Expected

AlertManager silences support four matcher types: = (exact match), != (negated exact match), =~ (regex match), and !~ (negated regex match).

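Regex matchers are fully anchored: the pattern must match the entire label value, not just part of it. If you used the UI and created a regex matcher with the value "node", thinking it would match any alert with "node" in the name, it only matches an alert named exactly "node".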

Symptom: Your regex silence matches nothing or too little.

Fix: Understand how AlertManager regex works:

yaml

# This matches: "NodeDown", "NodeHighCPU", "NodeMemory"
matchers:
  - name: alertname
    value: "Node.*"
    isRegex: true
 
# WRONG: value "Node" with isRegex: true does NOT match "NodeDown". The regex is
# anchored at both ends, so "Node" matches only the exact string "Node"

Test your regex before applying:

bash

# AlertManager has no dedicated test endpoint; preview a matcher by
# querying the alerts it would select
amtool --alertmanager.url=http://alertmanager:9093 alert query 'alertname=~"Node.*"'
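Because regex matchers must cover the whole label value, you can approximate the check locally before touching AlertManager. The sketch below uses grep -E, which agrees with AlertManager's Go regex syntax only for simple patterns like these; am_regex_match is a hypothetical helper name.

```bash
# Approximate AlertManager's anchored regex matching with grep -E.
# The pattern is wrapped in ^(...)$ so it must match the entire value.
am_regex_match() {
  printf '%s\n' "$1" | grep -Eq "^($2)\$"
}

for pattern in 'Node' 'Node.*' '.*node.*' '.*Down'; do
  if am_regex_match "NodeDown" "$pattern"; then
    echo "$pattern matches NodeDown"
  else
    echo "$pattern does not match NodeDown"
  fi
done
```

Here "Node" fails because it does not cover the full value, and ".*node.*" fails because matching is case-sensitive.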

Reason 4: Multiple AlertManager Instances Out of Sync

If you're running AlertManager in HA mode (multiple replicas), silences created on one instance may not have propagated to others yet.

Symptom: You see the silence in the AlertManager UI (which may be hitting instance A) but alerts from instance B still fire.

How to verify:

bash

# Check cluster peers
curl http://alertmanager-0:9093/api/v2/status | python3 -m json.tool | grep -A 10 "peers"
curl http://alertmanager-1:9093/api/v2/status | python3 -m json.tool | grep -A 10 "peers"

Fix: AlertManager HA uses a gossip protocol (memberlist). If peers aren't listed, your instances aren't communicating:

yaml

# Example HA AlertManager values for Kubernetes (key names vary by Helm chart or operator)
alertmanager:
  replicas: 3
  storage:
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
  
  # This is critical — tells instances how to find each other
  additionalPeers:
    - alertmanager-0.alertmanager:9094
    - alertmanager-1.alertmanager:9094
    - alertmanager-2.alertmanager:9094

The headless service must expose port 9094 (the gossip port, over both TCP and UDP), not just 9093 (the HTTP port).
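To confirm that a silence has actually propagated, you can ask each replica for it by ID. This is a sketch: the replica URLs follow the naming used in this article, the silence ID is a placeholder, and fetch_silence and check_silence_everywhere are hypothetical helper names.

```bash
# fetch_silence URL ID: succeeds if that instance returns the silence.
# curl -f turns an HTTP error (such as 404) into a non-zero exit code.
fetch_silence() {
  curl -sf "$1/api/v2/silence/$2" >/dev/null
}

# check_silence_everywhere ID URL...: reports every instance and
# returns non-zero if any instance does not know the silence.
check_silence_everywhere() {
  id="$1"
  shift
  missing=0
  for url in "$@"; do
    if fetch_silence "$url" "$id"; then
      echo "OK       $url"
    else
      echo "MISSING  $url"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (replace SILENCE_ID with the ID printed by amtool silence add):
# check_silence_everywhere SILENCE_ID \
#   http://alertmanager-0:9093 http://alertmanager-1:9093 http://alertmanager-2:9093
```

A MISSING line for one replica points to a gossip problem rather than a matcher problem.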

Reason 5: Alert Has Already Been Routed Before Silence

AlertManager checks silences when it is about to send a notification. If the notification for an alert went out before you created the silence, or is already being sent, the silence cannot recall it.

Symptom: You create a silence but the current alert notification still fires. Future firings of the same alert are silenced.

This is expected behavior. The silence doesn't retroactively cancel notifications that are already queued.

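Fix: A notification that has already been sent cannot be recalled, so acknowledge it in the receiving tool (for example PagerDuty). amtool has no command to force-resolve an alert. Instead, confirm that the silence now covers the alert so that repeat notifications are suppressed:

bash

# Confirm the alert is now silenced (it should appear in this list)
amtool --alertmanager.url=http://alertmanager:9093 alert query --silenced \
  alertname="HighCPU" env="production"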

Reason 6: Silence Created via API but Missing Required Fields

Programmatic silence creation (via CI/CD, scripts, or Terraform) sometimes fails silently if required fields are missing.

bash

# WRONG: Missing 'matchers' field
curl -X POST http://alertmanager:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{"startsAt": "2026-06-13T10:00:00Z", "endsAt": "2026-06-13T12:00:00Z", "comment": "test"}'
 
# RIGHT: Include matchers
curl -X POST http://alertmanager:9093/api/v2/silences \
  -H "Content-Type: application/json" \
  -d '{
    "matchers": [
      {"name": "alertname", "value": "HighCPU", "isRegex": false},
      {"name": "env", "value": "production", "isRegex": false}
    ],
    "startsAt": "2026-06-13T10:00:00Z",
    "endsAt": "2026-06-13T12:00:00Z",
    "createdBy": "deployment-automation",
    "comment": "Deploy maintenance window"
  }'

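The API rejects a request with missing or empty matchers with a 4xx error, but plain curl still exits 0 on HTTP errors, so a script that ignores the status code carries on as if the silence existed. Always verify: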

bash

# Verify silence was created correctly
curl http://alertmanager:9093/api/v2/silences | python3 -m json.tool | grep -A 20 "your-comment"
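In automation it is safer to check the HTTP status at creation time than to grep afterwards. The following sketch wraps the POST and fails loudly on anything other than 200; post_silence is a hypothetical helper name, and the payload is whatever JSON your script builds.

```bash
# post_silence BASE_URL JSON_PAYLOAD
# Prints the response body on HTTP 200, otherwise reports the
# status on stderr and returns non-zero so the pipeline stops.
post_silence() {
  # -w appends the status code on its own line after the body
  response=$(curl -sS -w '\n%{http_code}' -X POST "$1/api/v2/silences" \
    -H "Content-Type: application/json" -d "$2")
  status=$(printf '%s\n' "$response" | tail -n 1)
  body=$(printf '%s\n' "$response" | sed '$d')
  if [ "$status" != "200" ]; then
    echo "silence creation failed: HTTP $status: $body" >&2
    return 1
  fi
  printf '%s\n' "$body"
}

# Usage, with the payload stored in a file of your choosing:
# post_silence http://alertmanager:9093 "$(cat silence.json)" || exit 1
```

On success the body is a small JSON object containing the new silenceID, which you can keep to expire the silence once the deploy finishes.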

Quick Diagnosis Checklist

When your silence isn't working:

1. Get exact labels from the firing alert: amtool alert query
2. List active silences: amtool silence query
3. Compare labels: every label in the silence must match exactly
4. Check time zones: is your silence actually active right now in UTC?
5. Check HA sync: hit each AlertManager instance directly and verify the silence exists on all
6. Check if alert predates silence: current notification may already be in flight

The new-silence form in the AlertManager UI has a "Preview Alerts" button. Use it before your next maintenance window to verify your silence actually matches the alerts you think it does.
