Prometheus AlertManager Silence Not Working? Here's Why (and the Fix)
You created a silence in AlertManager but alerts are still firing. Here are the 6 most common reasons silences fail and exactly how to fix each one.
You're in the middle of a planned maintenance window. You created an AlertManager silence to suppress alerts for the next 2 hours. Ten minutes later, Slack alerts are still coming in, PagerDuty is still paging, and your phone won't stop buzzing.
This is maddening. Here's what's actually going wrong.
How AlertManager Silences Work (What Most Engineers Miss)
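A silence in AlertManager works by matching labels. Before AlertManager sends a notification for an alert, it checks: do this alert's labels satisfy every matcher of some active silence? If yes, the notification is suppressed (the alert still appears in AlertManager, marked as suppressed). If no, the notification goes out.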
The critical point: a silence doesn't silence an alerting rule. It silences specific label combinations.
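If your alert has the label severity="critical" and your silence only matches severity="warning", the alert will still fire. Every matcher in the silence must match the alert's label value exactly, character for character. There's no partial match and no wildcard unless you explicitly use a regex matcher. Labels the silence doesn't mention are ignored, so the alert is free to carry extra labels.
This single misunderstanding is behind most silence failures.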
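To make the rule concrete, here is a minimal Python sketch of how equality matchers are evaluated. It is a simplified model, not AlertManager's code: it assumes only = matchers, and it mirrors the behavior that a label missing from the alert compares as an empty string. The function name silence_matches is hypothetical.

```python
# Simplified model of AlertManager silence matching (assumption: "=" matchers only;
# the real implementation also supports "!=", "=~" and "!~").

def silence_matches(alert_labels: dict[str, str], matchers: list[tuple[str, str]]) -> bool:
    """Return True if every (name, value) matcher equals the alert's label value.

    Alert labels the silence doesn't mention are ignored; a label missing
    from the alert is compared as the empty string.
    """
    return all(alert_labels.get(name, "") == value for name, value in matchers)


alert = {"alertname": "HighCPU", "env": "production", "cluster": "us-east-1"}

print(silence_matches(alert, [("alertname", "HighCPU"), ("env", "production")]))  # True
print(silence_matches(alert, [("alertname", "HighCPU"), ("env", "prod")]))        # False
print(silence_matches(alert, [("severity", "warning")]))                          # False
```

Note that the first silence matches even though the alert also carries a cluster label: extra labels never prevent a match, but a single differing value always does.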
Reason 1: Label Mismatch
Symptom: You created a silence but alerts for the same situation keep firing.
How to diagnose:
# Check what labels your firing alert actually has
curl http://alertmanager:9093/api/v2/alerts | python3 -m json.tool | grep -A 20 "your-alert-name"
Compare those labels against your silence configuration:
# List active silences and their matchers
curl http://alertmanager:9093/api/v2/silences | python3 -m json.tool
Common mistake: Your alert fires with env="production" but your silence has env="prod". Different strings. No match.
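Retyping label values by hand is how "prod" versus "production" mistakes happen. Below is a small Python sketch that copies the exact values out of the /api/v2/alerts JSON into shell-quoted amtool matchers. The sample payload and the helper name amtool_matchers are hypothetical, and only the labels field of each alert is used.

```python
import json
import shlex

# Hypothetical, trimmed excerpt of GET /api/v2/alerts output; only "labels" is read.
ALERTS_JSON = """
[
  {"labels": {"alertname": "HighCPU", "env": "production",
              "cluster": "us-east-1", "instance": "10.0.0.5:9100"}}
]
"""


def amtool_matchers(alerts: list[dict], alertname: str, keys: list[str]) -> list[str]:
    """Copy exact label values from the first alert named `alertname` into
    shell-quoted amtool matchers such as 'env="production"'."""
    for alert in alerts:
        labels = alert["labels"]
        if labels.get("alertname") == alertname:
            # json.dumps wraps the value in double quotes and escapes any quotes inside it
            return [shlex.quote(f"{key}={json.dumps(labels[key])}") for key in keys]
    raise LookupError(f"no firing alert named {alertname!r}")


alerts = json.loads(ALERTS_JSON)
print(" ".join(amtool_matchers(alerts, "HighCPU", ["alertname", "env", "cluster"])))
# 'alertname="HighCPU"' 'env="production"' 'cluster="us-east-1"'
```

To use real data, save the output of curl -s http://alertmanager:9093/api/v2/alerts to a file and load it with json.load instead of the sample string, then paste the printed matchers into amtool silence add.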
Fix: Get the exact label values from the firing alert first, then create the silence:
# See the exact alert with all labels
amtool --alertmanager.url=http://alertmanager:9093 --output=extended alert query alertname="HighCPU"
Then create the silence matching exactly those labels:
amtool --alertmanager.url=http://alertmanager:9093 silence add \
alertname="HighCPU" \
env="production" \
cluster="us-east-1" \
--duration="2h" \
--comment="Planned maintenance window"
Reason 2: Silence Expired (Time Zone Issue)
Symptom: The silence shows as pending (not started yet) or expired in the UI while alerts keep firing. Or it expired earlier than you expected.
AlertManager compares silence start and end times as absolute instants (RFC3339 timestamps). If you meant 14:00 to 16:00 IST (UTC+5:30) but the times were entered or interpreted as UTC, the silence actually runs from 19:30 to 21:30 IST, hours after your maintenance window ends.
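Converting the intended local window to UTC before creating the silence removes the guesswork. Here is a Python sketch using only the standard library. The helper name to_utc_rfc3339 is hypothetical, and it deliberately rejects naive datetimes so a missing time zone fails loudly instead of quietly being treated as UTC.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving, so a fixed
# timezone object is enough here (zoneinfo.ZoneInfo("Asia/Kolkata") also works).
IST = timezone(timedelta(hours=5, minutes=30))


def to_utc_rfc3339(local: datetime) -> str:
    """Convert an offset-aware datetime to a UTC timestamp ending in 'Z'."""
    if local.tzinfo is None:
        raise ValueError("naive datetime: attach the intended time zone first")
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start = to_utc_rfc3339(datetime(2026, 6, 13, 14, 0, tzinfo=IST))
end = to_utc_rfc3339(datetime(2026, 6, 13, 16, 0, tzinfo=IST))
print(start, end)  # 2026-06-13T08:30:00Z 2026-06-13T10:30:00Z
```

The 14:00 to 16:00 IST window comes out as 08:30Z to 10:30Z, which are exactly the explicit UTC times used in the amtool example in this section.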
Fix: Always specify time in UTC or use duration-based silences:
# Duration-based is safest — no timezone ambiguity
amtool silence add alertname="TargetDown" \
--duration="3h" \
--comment="Maintenance"
# If you need specific times, use RFC3339 with an explicit offset (Z = UTC;
# an offset such as 2026-06-13T14:00:00+05:30 is also accepted)
amtool silence add alertname="TargetDown" \
--start="2026-06-13T08:30:00Z" \
--end="2026-06-13T10:30:00Z" \
--comment="Maintenance UTC"
Reason 3: Regex Matcher Not Working as Expected
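AlertManager silences support four matcher operators: = (exact match), != (not equal), =~ (regex match), and !~ (regex does not match). In the API, these are expressed through the isRegex and isEqual fields.
The common trap: regex matchers are fully anchored. AlertManager wraps your pattern as ^(?:pattern)$, so it must match the entire label value. If you created a regex matcher with the value node thinking it would match any alert with "node" in the name, it only matches an alert named exactly node. Regexes are also case-sensitive, so it wouldn't match NodeDown in any case.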
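A quick way to check a pattern offline is to mimic the anchoring in Python with re.fullmatch. This is an approximation: AlertManager uses Go's RE2 engine, which agrees with Python's re for simple patterns like these but not for every construct. The helper name am_regex_matches is hypothetical.

```python
import re


def am_regex_matches(pattern: str, value: str) -> bool:
    """Approximate AlertManager's =~ matcher: the pattern must match the
    whole label value, as if wrapped in ^(?:pattern)$."""
    return re.fullmatch(pattern, value) is not None


print(am_regex_matches("Node", "NodeDown"))          # False: anchored, "Node" matches only "Node"
print(am_regex_matches("Node.*", "NodeDown"))        # True
print(am_regex_matches(".*node.*", "NodeDown"))      # False: matching is case-sensitive
print(am_regex_matches("(?i).*node.*", "NodeDown"))  # True: case-insensitive flag
print(re.search("Node", "NodeDown") is not None)     # True, but unanchored search is NOT what AlertManager does
```

The last line shows why the mistake is so common: most regex tools search for a substring by default, while AlertManager requires a full match.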
Symptom: Your regex silence matches nothing or too little.
Fix: Understand how AlertManager regex works:
# Silence matchers as sent to the API, shown as YAML.
# This matches: "NodeDown", "NodeHighCPU", "NodeMemory"
matchers:
  - name: alertname
    value: "Node.*"
    isRegex: true
# WRONG: value: "Node" would NOT match "NodeDown". The regex is anchored at both ends,
# so "Node" matches only the exact alertname "Node"; the ".*" is what allows a suffix.
Test your regex before applying:
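# There is no dry-run endpoint for silences, but amtool alert query accepts the same
# matcher syntax, so you can see which firing alerts a regex would catch:
amtool --alertmanager.url=http://alertmanager:9093 alert query 'alertname=~Node.*'
# Once the silence exists, every alert it matches lists the silence ID under status.silencedBy:
curl http://alertmanager:9093/api/v2/alerts | python3 -m json.tool | grep -B 2 -A 4 "silencedBy"
Reason 4: Multiple AlertManager Instances Out of Sync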
If you're running AlertManager in HA mode (multiple replicas), silences created on one instance may not have propagated to others yet.
Symptom: You see the silence in the AlertManager UI (which may be hitting instance A) but alerts from instance B still fire.
How to verify:
# Check cluster peers
curl http://alertmanager-0:9093/api/v2/status | python3 -m json.tool | grep -A 10 "peers"
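curl http://alertmanager-1:9093/api/v2/status | python3 -m json.tool | grep -A 10 "peers"
Fix: AlertManager HA replicates silences over a gossip protocol (memberlist). If the peers list is empty or missing an instance, your instances aren't communicating. Each instance has to be told about its peers: the Prometheus Operator (and the kube-prometheus-stack chart) does this automatically for its own replicas, while a hand-rolled StatefulSet needs explicit --cluster.peer flags.
# kube-prometheus-stack values.yaml (the Prometheus Operator manages peering)
alertmanager:
  alertmanagerSpec:
    replicas: 3
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
    # No peer list needed: replicas find each other through the operator's
    # alertmanager-operated headless Service. additionalPeers is only for
    # extra Alertmanagers outside this StatefulSet.

# Hand-rolled StatefulSet "alertmanager" with headless Service "alertmanager".
# This is critical: it tells instances how to find each other.
args:
  - --cluster.listen-address=0.0.0.0:9094
  - --cluster.peer=alertmanager-0.alertmanager:9094
  - --cluster.peer=alertmanager-1.alertmanager:9094
  - --cluster.peer=alertmanager-2.alertmanager:9094
Gossip runs on port 9094 over both TCP and UDP, separate from the 9093 HTTP port. List 9094 on the headless Service for both protocols, and make sure nothing (such as a NetworkPolicy) blocks it between the pods.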
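Once gossip works, every instance should report the same active silences. Here is a Python sketch that fetches /api/v2/silences from each instance and lists the active silence IDs each one is missing. The instance hostnames and helper names are assumptions, and the sample data is hypothetical and trimmed to the id and status.state fields the code reads.

```python
import json
import urllib.request


def fetch_silences(base_url: str) -> list[dict]:
    """GET /api/v2/silences from a single AlertManager instance."""
    with urllib.request.urlopen(f"{base_url}/api/v2/silences", timeout=5) as resp:
        return json.load(resp)


def missing_active_silences(silences_by_instance: dict[str, list[dict]]) -> dict[str, set[str]]:
    """For each instance, return IDs of silences that are active on some
    instance but absent (or not active) on this one."""
    active = {
        name: {s["id"] for s in silences if s["status"]["state"] == "active"}
        for name, silences in silences_by_instance.items()
    }
    everywhere = set().union(*active.values())
    return {name: everywhere - ids for name, ids in active.items()}


# Hypothetical, trimmed API responses from two replicas:
sample = {
    "alertmanager-0": [{"id": "a1", "status": {"state": "active"}}],
    "alertmanager-1": [],
}
print(missing_active_silences(sample))  # {'alertmanager-0': set(), 'alertmanager-1': {'a1'}}

# Against a real cluster (hostnames are assumptions):
# urls = [f"http://alertmanager-{i}.alertmanager:9093" for i in range(3)]
# print(missing_active_silences({url: fetch_silences(url) for url in urls}))
```

Any non-empty set points at an instance that will keep sending notifications for alerts the others consider silenced.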
Reason 5: Alert Has Already Been Routed Before Silence
AlertManager checks silences when an alert group is flushed, right before a notification is sent. A notification that went out (or was already being sent) before you created the silence cannot be recalled.
Symptom: You create a silence but still receive the notification that was already in flight. Later notifications for the same alert, including repeats and new firings, are suppressed.
This is expected behavior. A silence doesn't retroactively cancel notifications that have already been sent.
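Because a silence only helps if it is in place before notifications go out, it's worth checking its state before relying on it. AlertManager reports a silence as pending, active, or expired based on its startsAt and endsAt. Here is a Python sketch of that classification; the helper name silence_state is hypothetical, and treating the exact boundary instants as active is an assumption.

```python
from datetime import datetime, timezone


def silence_state(starts_at: str, ends_at: str, now: datetime) -> str:
    """Classify a silence as pending (before startsAt), expired (after endsAt),
    or active (in between). Assumption: boundary instants count as active."""
    start = datetime.fromisoformat(starts_at.replace("Z", "+00:00"))
    end = datetime.fromisoformat(ends_at.replace("Z", "+00:00"))
    if now < start:
        return "pending"
    if now > end:
        return "expired"
    return "active"


window = ("2026-06-13T08:30:00Z", "2026-06-13T10:30:00Z")
print(silence_state(*window, datetime(2026, 6, 13, 8, 0, tzinfo=timezone.utc)))   # pending
print(silence_state(*window, datetime(2026, 6, 13, 9, 0, tzinfo=timezone.utc)))   # active
print(silence_state(*window, datetime(2026, 6, 13, 11, 0, tzinfo=timezone.utc)))  # expired
```

A silence created a day ahead simply sits in the pending state and starts suppressing notifications at startsAt, with no race against an alert that is about to fire.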
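Fix: There's no way to recall a notification that has already been delivered. For planned work, create the silence before the window starts: a silence with a future start time sits in the pending state and activates on schedule. Don't try to "resolve" the alert by hand either, because Prometheus keeps re-sending firing alerts to AlertManager and it comes straight back.
# Schedule the silence ahead of the maintenance window
amtool silence add alertname="HighCPU" env="production" \
--start="2026-06-13T08:30:00Z" --end="2026-06-13T10:30:00Z" \
--comment="Planned maintenance window"
Reason 6: Silence Created via API but Missing Required Fields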
Programmatic silence creation (via CI/CD, scripts, or Terraform) sometimes fails silently if required fields are missing.
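In scripts, the safest pattern is to validate the body before sending it and to use an HTTP client that treats error responses as failures. Here is a Python sketch using the standard library: urllib.request.urlopen raises HTTPError for 4xx and 5xx responses, unlike plain curl. The helper names and the REQUIRED tuple are assumptions based on the fields in the request shown below.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: the fields used in the POST /api/v2/silences request in this section.
REQUIRED = ("matchers", "startsAt", "endsAt", "createdBy", "comment")


def build_silence(labels: dict[str, str], hours: float, created_by: str, comment: str) -> dict:
    """Build a silence body with exact-match matchers, starting now (UTC)."""
    now = datetime.now(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "matchers": [{"name": k, "value": v, "isRegex": False} for k, v in labels.items()],
        "startsAt": now.strftime(fmt),
        "endsAt": (now + timedelta(hours=hours)).strftime(fmt),
        "createdBy": created_by,
        "comment": comment,
    }


def validate_silence(body: dict) -> None:
    """Fail loudly before sending if a required field is missing or empty."""
    missing = [field for field in REQUIRED if not body.get(field)]
    if missing:
        raise ValueError(f"silence is missing required fields: {missing}")


def post_silence(base_url: str, body: dict) -> str:
    """POST the silence and return its ID; urlopen raises HTTPError on a rejected request."""
    validate_silence(body)
    req = urllib.request.Request(
        f"{base_url}/api/v2/silences",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["silenceID"]


body = build_silence({"alertname": "HighCPU", "env": "production"}, 2,
                     "deployment-automation", "Deploy maintenance window")
validate_silence(body)  # passes; an empty "matchers" list would raise ValueError
# silence_id = post_silence("http://alertmanager:9093", body)
```

Validating locally catches the mistake before any request is made, and the HTTPError from urlopen makes the job fail if AlertManager rejects the silence anyway.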
# WRONG: Missing 'matchers' (and 'createdBy', which is also required)
curl -X POST http://alertmanager:9093/api/v2/silences \
-H "Content-Type: application/json" \
-d '{"startsAt": "2026-06-13T10:00:00Z", "endsAt": "2026-06-13T12:00:00Z", "comment": "test"}'
# RIGHT: Include matchers
curl -X POST http://alertmanager:9093/api/v2/silences \
-H "Content-Type: application/json" \
-d '{
"matchers": [
{"name": "alertname", "value": "HighCPU", "isRegex": false},
{"name": "env", "value": "production", "isRegex": false}
],
"startsAt": "2026-06-13T10:00:00Z",
"endsAt": "2026-06-13T12:00:00Z",
"createdBy": "deployment-automation",
"comment": "Deploy maintenance window"
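}'
AlertManager rejects the first request with a 4xx validation error, because matchers and createdBy are required fields. The catch is that plain curl still exits with status 0 on an HTTP error, so a CI job or script that never inspects the response carries on as if the silence exists. Use curl --fail (or check the status code) and look for the silenceID in the response. Always verify: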
# Verify silence was created correctly
curl http://alertmanager:9093/api/v2/silences | python3 -m json.tool | grep -A 20 "your-comment"
Quick Diagnosis Checklist
When your silence isn't working:
- Get exact labels from the firing alert: amtool --output=extended alert query
- List active silences: amtool silence query
- Compare labels: every matcher in the silence must match the alert's label value exactly (extra labels on the alert are fine)
- Check time zones: is your silence actually active right now, in UTC?
- Check HA sync: hit each AlertManager instance directly and verify the silence exists on all of them
- Check whether the alert predates the silence: a notification that was already sent can't be recalled
Before your next maintenance window, use the "Preview Alerts" button on the AlertManager UI's silence form to confirm your matchers actually catch the alerts you think they do.