Prometheus Targets Showing 'Down' — Every Cause and Fix (2026)
Your Prometheus /targets page shows red. Services are running but Prometheus can't scrape them. Here's every reason this happens — wrong port, NetworkPolicy blocks, ServiceMonitor label mismatch, auth — and exactly how to fix each one.
You open Prometheus at /targets and see red — targets marked as DOWN. Your pods are running, your app looks fine, but Prometheus can't scrape metrics.
Here's every cause and the exact fix.
Step Zero: Read the Error Message
Go to http://prometheus:9090/targets. Every DOWN target shows an error column. Read it — it tells you 90% of what you need to know before debugging anything else.
Common errors you'll see:
connection refused
context deadline exceeded
401 Unauthorized
x509: certificate signed by unknown authority
dial tcp: no such host
Each one maps to a specific cause below.
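The same error text is exposed by the Prometheus HTTP API at /api/v1/targets, which helps when there are too many targets to scan by eye. Here is a minimal Python sketch, assuming the response has already been fetched and parsed into a dict. The function name `down_targets` and the sample payload are illustrative, not part of Prometheus.

```python
from typing import Dict, List, Tuple


def down_targets(payload: Dict) -> List[Tuple[str, str, str]]:
    """Return (job, scrape_url, last_error) for every target whose health is not 'up'."""
    rows = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            rows.append((
                target.get("labels", {}).get("job", ""),
                target.get("scrapeUrl", ""),
                target.get("lastError", ""),
            ))
    return rows


# Hypothetical sample shaped like a /api/v1/targets response.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "my-app"}, "scrapeUrl": "http://10.0.0.5:9090/metrics",
             "health": "down", "lastError": "connection refused"},
            {"labels": {"job": "node-exporter"}, "scrapeUrl": "http://10.0.0.9:9100/metrics",
             "health": "up", "lastError": ""},
        ]
    },
}

for job, url, error in down_targets(sample):
    print(f"{job}\t{url}\t{error}")
```

In practice the payload would come from `json.load()` on the API response rather than from a hard-coded dict.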
Case 1: Wrong Port or Path in Scrape Config
Your app exposes metrics on :8080/metrics but the scrape config points to :9090 or /.
Error: connection refused or 404
Check it:
# Test the actual metrics endpoint directly from inside the cluster
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://my-app-service.production.svc.cluster.local:8080/metrics | head -5
Fix in prometheus.yml:
scrape_configs:
  - job_name: 'my-app'
    static_configs:
      - targets: ['my-app-service.production.svc.cluster.local:8080']
    metrics_path: '/metrics'
Fix in ServiceMonitor (Prometheus Operator):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: metrics   # must match the port NAME in your Service
      path: /metrics
      interval: 30s
And in your Service, name the port:
ports:
  - name: metrics   # this name must match ServiceMonitor.endpoints.port
    port: 8080
    targetPort: 8080
Case 2: NetworkPolicy Blocking Prometheus
Prometheus lives in the monitoring namespace. Your app is in production. A NetworkPolicy is blocking ingress from monitoring to your pod.
Error: context deadline exceeded (timeout — silent drop)
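A dropped packet looks different from a closed port at the TCP level, which is why this case shows a timeout where Case 1 shows connection refused. The sketch below makes that distinction explicit. It assumes a debug pod with Python is available in the monitoring namespace, and `probe_tcp` is an illustrative name.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'.

    Other failures (for example DNS errors) are left to propagate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered and rejected the connection: nothing listens on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: packets are probably being dropped.
        return "timeout"
```

Calling `probe_tcp("my-app-service.production.svc.cluster.local", 8080)` from that pod and getting `"timeout"` would point toward a NetworkPolicy or firewall. A result of `"refused"` would point back to the port settings in Case 1.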
Check it:
# Test directly from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  wget --timeout=5 -qO- http://my-app-service.production.svc.cluster.local:8080/metrics
# Check if NetworkPolicy exists
kubectl get networkpolicies -n production
Fix (allow Prometheus to scrape):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080
Case 3: ServiceMonitor Label Mismatch (Prometheus Operator)
Prometheus Operator uses label selectors to discover ServiceMonitors. If your ServiceMonitor doesn't have the right label, Prometheus never picks it up — the target simply doesn't appear in /targets at all.
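The matching rule itself is simple: every key/value pair under matchLabels must be present on the ServiceMonitor. Below is a minimal sketch of that rule in Python. It covers matchLabels only (not matchExpressions), and the function name is illustrative rather than anything the Operator exposes.

```python
from typing import Dict


def matches_match_labels(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True when every key/value pair in the selector is also present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


selector = {"release": "prometheus"}

# ServiceMonitor without the label: not selected.
print(matches_match_labels(selector, {"app": "my-app"}))  # False

# ServiceMonitor with the label: selected. Extra labels do not matter.
print(matches_match_labels(selector, {"app": "my-app", "release": "prometheus"}))  # True
```

An empty selector has no pairs to check, so it matches every object.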
Check it:
# What label selector does your Prometheus resource use?
kubectl get prometheus -n monitoring -o yaml | grep -A 5 serviceMonitorSelector
# serviceMonitorSelector:
# matchLabels:
# release: prometheus ← your ServiceMonitor needs this label
# Does your ServiceMonitor have it?
kubectl get servicemonitor my-app -n monitoring --show-labels
Fix (add the label):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: prometheus   # must match prometheus.serviceMonitorSelector
Also check serviceMonitorNamespaceSelector. By default, Prometheus Operator only discovers ServiceMonitors in its own namespace:
# In your Prometheus resource: allow all namespaces
spec:
  serviceMonitorNamespaceSelector: {}   # empty = watch all namespaces
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
Case 4: Basic Auth or TLS on the Metrics Endpoint
Your app requires authentication to access /metrics, but Prometheus has no credentials configured.
Error: 401 Unauthorized or x509: certificate signed by unknown authority
Fix for basic auth:
# Create a secret with credentials
kubectl create secret generic metrics-auth \
  --from-literal=username=prometheus \
  --from-literal=password=strongpassword \
  -n monitoring
# Reference in ServiceMonitor
spec:
  endpoints:
    - port: metrics
      basicAuth:
        username:
          name: metrics-auth
          key: username
        password:
          name: metrics-auth
          key: password
Fix for self-signed TLS (internal cluster):
spec:
  endpoints:
    - port: metrics
      scheme: https
      tlsConfig:
        insecureSkipVerify: true   # acceptable for internal cluster traffic
Case 5: No Endpoints (Pod Not Ready)
Prometheus resolves a Service to its endpoints. If no pods pass the readinessProbe, the Service has zero endpoints — nothing to scrape.
Error: connection refused or target appears with 0/0 endpoints
Check it:
kubectl get endpoints my-app-service -n production
# NAME             ENDPOINTS   AGE
# my-app-service   <none>      10m   ← zero endpoints
# Why? Check pod readiness
kubectl get pods -n production -l app=my-app
kubectl describe pod my-app-abc -n production | grep -A 5 "Readiness\|Ready"
Fix the readinessProbe failure; Prometheus can't scrape what has no endpoints.
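The ready/not-ready split is also visible in the Endpoints object itself: ready pods are listed under `addresses` and unready ones under `notReadyAddresses`. A small Python sketch can count both. It assumes the dict comes from `kubectl get endpoints my-app-service -n production -o json`; the helper name and sample data are illustrative.

```python
from typing import Dict, Tuple


def count_addresses(endpoints: Dict) -> Tuple[int, int]:
    """Return (ready, not_ready) address counts from an Endpoints object."""
    ready = 0
    not_ready = 0
    for subset in endpoints.get("subsets") or []:
        ready += len(subset.get("addresses") or [])
        not_ready += len(subset.get("notReadyAddresses") or [])
    return ready, not_ready


# Hypothetical Endpoints object: one pod exists but fails its readinessProbe.
sample = {
    "kind": "Endpoints",
    "metadata": {"name": "my-app-service", "namespace": "production"},
    "subsets": [
        {"notReadyAddresses": [{"ip": "10.0.1.12"}], "ports": [{"name": "metrics", "port": 8080}]},
    ],
}

ready, not_ready = count_addresses(sample)
print(f"ready={ready} not_ready={not_ready}")  # ready=0 not_ready=1
```

Zero ready with a non-zero not-ready count points at the readinessProbe. Zero for both suggests the Service selector matches no pods at all.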
Case 6: DNS Resolution Failure
Error: dial tcp: lookup my-app.production.svc.cluster.local on 10.96.0.10:53: no such host
# Test DNS from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  nslookup my-app.production.svc.cluster.local
# Does the Service actually exist?
kubectl get svc -n production my-app-service
Common causes: a typo in the service name in the scrape config, a deleted Service, or the wrong namespace in the target.
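Building the hostname from its parts makes name and namespace typos easier to spot, since a Service resolves as `<service>.<namespace>.svc.<cluster domain>`. The sketch below assumes the common `cluster.local` domain. Both helper names are illustrative, and the resolver is injectable only so the check can be exercised without a cluster.

```python
import socket


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolves(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """True when the hostname resolves, False on a DNS lookup failure."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        return False


print(service_fqdn("my-app-service", "production"))
# my-app-service.production.svc.cluster.local
```

Run inside the cluster, `resolves(service_fqdn("my-app", "production"))` returning False would match the lookup error shown above.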
Case 7: Scrape Timeout
Prometheus scrapes but the /metrics endpoint takes too long to respond (default timeout: 10s).
Error: context deadline exceeded
Check it:
time curl http://my-app:8080/metrics
Fix (increase scrapeTimeout):
spec:
  endpoints:
    - port: metrics
      interval: 60s
      scrapeTimeout: 30s   # must not exceed interval
Long-term fix: don't compute metrics on every scrape. Use a metrics registry that caches collected values and serves them instantly.
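One way to picture the caching approach is to move the slow computation onto a background refresher, so the scrape path only reads the last stored value. The following is a standard-library Python sketch of that idea, not the API of any particular metrics client. `CachedGauge` and `start_refresher` are hypothetical names.

```python
import threading
import time
from typing import Callable, Optional


class CachedGauge:
    """Stores the last computed value so reads never trigger the slow computation."""

    def __init__(self, compute: Callable[[], float]) -> None:
        self._compute = compute
        self._lock = threading.Lock()
        self._value: Optional[float] = None

    def refresh(self) -> None:
        value = self._compute()  # slow work happens here, off the scrape path
        with self._lock:
            self._value = value

    def read(self) -> Optional[float]:
        # Called by the /metrics handler: returns immediately, None until the first refresh.
        with self._lock:
            return self._value


def start_refresher(gauge: CachedGauge, period_seconds: float) -> threading.Thread:
    """Refresh the gauge forever in a daemon thread, once per period."""
    def loop() -> None:
        while True:
            gauge.refresh()
            time.sleep(period_seconds)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The trade-off is staleness: a scrape may see a value up to one refresh period old, so the period should be chosen relative to the scrape interval.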
Case 8: Node Exporter DOWN on Some Nodes
Node Exporter runs as a DaemonSet on every node via hostNetwork: true. Some nodes show DOWN.
Check it:
# Get the node IP from the target label in Prometheus UI
# Then test from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://NODE_IP:9100/metrics | head -3
Fix (AWS): The EC2 security group for worker nodes must allow TCP 9100 inbound from the Prometheus pod subnet or security group. Node Exporter binds to the node's real IP, so the security group must explicitly allow it.
Quick Debug Flowchart
Target is DOWN
│
▼
Read error in /targets page
│
├── "connection refused" → wrong port/path
├── "context deadline" → NetworkPolicy or slow endpoint
├── "401 Unauthorized" → add basicAuth config
├── "no such host" → DNS issue or wrong service name
├── "x509 certificate" → TLS config missing
└── target not in list at all → ServiceMonitor label mismatch
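The flowchart above can be sketched as a lookup from error text to likely cause. This Python version is illustrative only: the substrings and causes are taken from the flowchart, and `likely_cause` is a hypothetical helper name.

```python
from typing import List, Tuple

# (substring to look for, likely cause), in the same order as the flowchart.
RULES: List[Tuple[str, str]] = [
    ("connection refused", "wrong port/path"),
    ("context deadline", "NetworkPolicy or slow endpoint"),
    ("401 unauthorized", "add basicAuth config"),
    ("no such host", "DNS issue or wrong service name"),
    ("x509", "TLS config missing"),
]


def likely_cause(last_error: str) -> str:
    """Map a target's error text to the first matching cause."""
    lowered = last_error.lower()
    for needle, cause in RULES:
        if needle in lowered:
            return cause
    return "unrecognized error: check the Prometheus logs"


print(likely_cause("context deadline exceeded"))
# NetworkPolicy or slow endpoint
print(likely_cause("x509: certificate signed by unknown authority"))
# TLS config missing
```

A target that is missing from the list entirely has no error text to match, so that branch of the flowchart still has to be checked by hand.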
Useful Debug Commands
# Check if Prometheus loaded config correctly
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://localhost:9090/api/v1/status/config | python3 -m json.tool
# Reload config without restarting (requires Prometheus to run with --web.enable-lifecycle)
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- --post-data='' http://localhost:9090/-/reload
# Check Prometheus logs
kubectl logs -n monitoring prometheus-0 --tail=50 | grep -iE "error|failed|scrape"
# List all targets and their health via API
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- 'http://localhost:9090/api/v1/targets?state=active' \
  | python3 -c "import sys,json; [print(t['labels'].get('job'), t['health'], t.get('lastError','')) for t in json.load(sys.stdin)['data']['activeTargets']]"
| Error | Cause | Fix |
|---|---|---|
| connection refused | Wrong port/path | Fix scrape config or ServiceMonitor port name |
| context deadline exceeded | NetworkPolicy or slow app | Allow monitoring namespace ingress, or raise scrapeTimeout |
| Target missing entirely | ServiceMonitor label mismatch | Add release: prometheus label |
| 401 Unauthorized | Auth required | Add basicAuth to ServiceMonitor |
| no such host | Wrong service name/DNS | Fix target hostname in config |
| Node exporter down | EC2 SG blocks port 9100 | Open port 9100 in security group |