Prometheus Targets Showing 'Down' — Every Cause and Fix (2026)
Your Prometheus /targets page shows red. Services are running but Prometheus can't scrape them. Here's every reason this happens — wrong port, NetworkPolicy blocks, ServiceMonitor label mismatch, auth — and exactly how to fix each one.
You open Prometheus at /targets and see red — targets marked as DOWN. Your pods are running, your app looks fine, but Prometheus can't scrape metrics.
Here's every cause and the exact fix.
Step Zero: Read the Error Message
Go to http://prometheus:9090/targets. Every DOWN target shows an error column. Read it — it tells you 90% of what you need to know before debugging anything else.
Common errors you'll see:
connection refused
context deadline exceeded
401 Unauthorized
x509: certificate signed by unknown authority
dial tcp: no such host
Each one maps to a specific cause below.
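When many targets are red at once, it can help to pull the error strings programmatically instead of eyeballing the UI. A minimal sketch against the targets API — `summarize_down` and `fetch_targets` are local helper names, not part of any Prometheus client library, and the URL assumes a port-forwarded Prometheus on localhost:9090:

```python
# Sketch: list DOWN targets and their errors from the Prometheus targets API.
# summarize_down / fetch_targets are illustrative helpers, not library calls.
import json
from urllib.request import urlopen

def summarize_down(targets_json: dict) -> list:
    """Return (job, lastError) pairs for every active target that is not up."""
    return [
        (t["labels"].get("job", "?"), t.get("lastError", ""))
        for t in targets_json["data"]["activeTargets"]
        if t.get("health") != "up"
    ]

def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch active targets (assumes kubectl port-forward to Prometheus)."""
    with urlopen(f"{base_url}/api/v1/targets?state=active") as resp:
        return json.load(resp)

# Usage: for job, err in summarize_down(fetch_targets()): print(job, err)
```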
Case 1: Wrong Port or Path in Scrape Config
Your app exposes metrics on :8080/metrics but the scrape config points to :9090 or /.
Error: connection refused or 404
Check it:
# Test the actual metrics endpoint directly from inside the cluster
kubectl exec -n monitoring prometheus-0 -- \
wget -qO- http://my-app.production.svc.cluster.local:8080/metrics | head -5

Fix in prometheus.yml:
scrape_configs:
- job_name: 'my-app'
static_configs:
- targets: ['my-app-service.production.svc.cluster.local:8080']
metrics_path: '/metrics'

Fix in ServiceMonitor (Prometheus Operator):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: my-app
namespace: monitoring
spec:
selector:
matchLabels:
app: my-app
namespaceSelector:
matchNames:
- production
endpoints:
- port: metrics # must match the port NAME in your Service
path: /metrics
interval: 30s

And in your Service, name the port:
ports:
- name: metrics # this name must match ServiceMonitor.endpoints.port
port: 8080
targetPort: 8080

Case 2: NetworkPolicy Blocking Prometheus
Prometheus lives in the monitoring namespace. Your app is in production. A NetworkPolicy is blocking ingress from monitoring to your pod.
Error: context deadline exceeded (timeout — silent drop)
Check it:
# Test directly from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
wget --timeout=5 -qO- http://my-app.production.svc.cluster.local:8080/metrics
# Check if NetworkPolicy exists
kubectl get networkpolicies -n production

Fix — allow Prometheus to scrape:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-prometheus-scrape
namespace: production
spec:
podSelector:
matchLabels:
app: my-app
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: monitoring
ports:
- protocol: TCP
port: 8080

Case 3: ServiceMonitor Label Mismatch (Prometheus Operator)
Prometheus Operator uses label selectors to discover ServiceMonitors. If your ServiceMonitor doesn't have the right label, Prometheus never picks it up — the target simply doesn't appear in /targets at all.
Check it:
# What label selector does your Prometheus resource use?
kubectl get prometheus -n monitoring -o yaml | grep -A 5 serviceMonitorSelector
# serviceMonitorSelector:
# matchLabels:
# release: prometheus ← your ServiceMonitor needs this label
# Does your ServiceMonitor have it?
kubectl get servicemonitor my-app -n monitoring --show-labels

Fix — add the label:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: my-app
namespace: monitoring
labels:
release: prometheus # must match prometheus.serviceMonitorSelector

Also check serviceMonitorNamespaceSelector — by default Prometheus Operator only discovers ServiceMonitors in its own namespace:
# In your Prometheus resource — allow all namespaces
spec:
serviceMonitorNamespaceSelector: {} # empty = watch all namespaces
serviceMonitorSelector:
matchLabels:
release: prometheus

Case 4: Basic Auth or TLS on the Metrics Endpoint
Your app requires authentication to access /metrics, but Prometheus has no credentials configured.
Error: 401 Unauthorized or x509: certificate signed by unknown authority
Fix for basic auth:
# Create a secret with credentials
kubectl create secret generic metrics-auth \
--from-literal=username=prometheus \
--from-literal=password=strongpassword \
-n monitoring

# Reference the secret in your ServiceMonitor
spec:
endpoints:
- port: metrics
basicAuth:
username:
name: metrics-auth
key: username
password:
name: metrics-auth
key: password

Fix for self-signed TLS (internal cluster):
spec:
endpoints:
- port: metrics
scheme: https
tlsConfig:
insecureSkipVerify: true # acceptable for internal cluster traffic

Case 5: No Endpoints — Pod Not Ready
Prometheus resolves a Service to its endpoints. If no pods pass the readinessProbe, the Service has zero endpoints — nothing to scrape.
Error: connection refused or target appears with 0/0 endpoints
Check it:
kubectl get endpoints my-app-service -n production
# NAME ENDPOINTS AGE
# my-app-service <none> 10m ← zero endpoints
# Why? Check pod readiness
kubectl get pods -n production -l app=my-app
kubectl describe pod my-app-abc -n production | grep -A 5 "Readiness\|Ready"

Fix the readinessProbe failure — Prometheus can't scrape what has no endpoints.
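If the probe itself is misconfigured (wrong port or path), a typical shape looks like this — the `/healthz` path, port, and thresholds here are illustrative and must match your app's actual health endpoint:

```yaml
# Illustrative readinessProbe — point it at your app's real health endpoint
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```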
Case 6: DNS Resolution Failure
Error: dial tcp: lookup my-app.production.svc.cluster.local on 10.96.0.10:53: no such host
# Test DNS from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
nslookup my-app.production.svc.cluster.local
# Does the Service actually exist?
kubectl get svc -n production my-app-service

Common causes: a typo in the service name in the scrape config, the Service was deleted, or the wrong namespace in the target.
Case 7: Scrape Timeout
Prometheus scrapes but the /metrics endpoint takes too long to respond (default timeout: 10s).
Error: context deadline exceeded
Check it:
time curl http://my-app:8080/metrics

Fix — increase scrapeTimeout:
spec:
endpoints:
- port: metrics
interval: 60s
scrapeTimeout: 30s # must be < interval

Long-term fix: don't compute metrics on every scrape. Use a metrics registry that caches collected values and serves them instantly.
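One way to sketch that pattern: compute the expensive value on a background loop and have the scrape handler read only the cached result. All names here (`collect_expensive_metric`, `refresh_loop`, `CACHE`) are illustrative, not from a metrics library:

```python
# Sketch: compute an expensive metric off the scrape path, serve from a cache.
# Names are illustrative; in practice you'd set a Gauge in your metrics library.
import threading
import time

CACHE = {"value": 0.0, "updated": 0.0}

def collect_expensive_metric() -> float:
    """Stand-in for a slow computation (DB count, directory walk, etc.)."""
    time.sleep(0.01)  # simulate work
    return 42.0

def refresh_loop(interval: float) -> None:
    """Recompute the metric every `interval` seconds, off the scrape path."""
    while True:
        CACHE["value"] = collect_expensive_metric()
        CACHE["updated"] = time.time()
        time.sleep(interval)

def metrics_handler() -> str:
    """Called on every scrape — returns instantly from the cache."""
    return f'my_expensive_metric {CACHE["value"]}\n'

# Start the background refresher; scrapes never wait on the computation.
threading.Thread(target=refresh_loop, args=(30.0,), daemon=True).start()
```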
Case 8: Node Exporter DOWN on Some Nodes
Node Exporter runs as a DaemonSet on every node via hostNetwork: true. Some nodes show DOWN.
Check it:
# Get the node IP from the target label in Prometheus UI
# Then test from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
wget -qO- http://NODE_IP:9100/metrics | head -3

Fix (AWS): the EC2 security group for the worker nodes must allow TCP 9100 inbound from the Prometheus pod subnet or security group. Node Exporter binds to the node's real IP via hostNetwork, so the security group must explicitly allow the traffic.
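A sketch of that fix with the AWS CLI — `sg-worker-nodes` and `sg-prometheus` are placeholders for your actual worker-node and Prometheus security group IDs:

```shell
# Allow the Prometheus nodes' security group to reach node_exporter on 9100.
# sg-worker-nodes / sg-prometheus are placeholders — substitute your real IDs.
aws ec2 authorize-security-group-ingress \
  --group-id sg-worker-nodes \
  --protocol tcp \
  --port 9100 \
  --source-group sg-prometheus
```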
Quick Debug Flowchart
Target is DOWN
│
▼
Read error in /targets page
│
├── "connection refused" → wrong port/path
├── "context deadline" → NetworkPolicy or slow endpoint
├── "401 Unauthorized" → add basicAuth config
├── "no such host" → DNS issue or wrong service name
├── "x509 certificate" → TLS config missing
└── target not in list at all → ServiceMonitor label mismatch
Useful Debug Commands
# Check if Prometheus loaded config correctly
kubectl exec -n monitoring prometheus-0 -- \
wget -qO- http://localhost:9090/api/v1/status/config | python3 -m json.tool
# Reload config without restarting (requires Prometheus to run with --web.enable-lifecycle)
kubectl exec -n monitoring prometheus-0 -- \
wget -qO- --post-data='' http://localhost:9090/-/reload
# Check Prometheus logs
kubectl logs -n monitoring prometheus-0 --tail=50 | grep -iE "error|failed|scrape"
# List all targets and their health via API
kubectl exec -n monitoring prometheus-0 -- \
wget -qO- 'http://localhost:9090/api/v1/targets?state=active' \
| python3 -c "import sys,json; [print(t['labels'].get('job'), t['health'], t.get('lastError','')) for t in json.load(sys.stdin)['data']['activeTargets']]"| Error | Cause | Fix |
|---|---|---|
| connection refused | Wrong port/path | Fix scrape config or ServiceMonitor port name |
| context deadline exceeded | NetworkPolicy or slow app | Allow monitoring namespace ingress |
| Target missing entirely | ServiceMonitor label mismatch | Add release: prometheus label |
| 401 Unauthorized | Auth required | Add basicAuth to ServiceMonitor |
| no such host | Wrong service name/DNS | Fix target hostname in config |
| Node exporter down | EC2 SG blocks port 9100 | Open port 9100 in security group |
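One more check worth running before a reload: validate the config file itself with promtool, which ships in the Prometheus image. The config path below is the common default and may differ in your install (Prometheus Operator writes it elsewhere):

```shell
# Validate prometheus.yml before reloading (path varies by installation)
kubectl exec -n monitoring prometheus-0 -- \
  promtool check config /etc/prometheus/prometheus.yml
```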
Related: Prometheus + Grafana Monitoring Guide | How to Set Up Prometheus Alertmanager