Prometheus Targets Showing 'Down' — Every Cause and Fix (2026)

Your Prometheus /targets page shows red. Services are running, but Prometheus can't scrape them. Here's every common reason this happens (wrong port, NetworkPolicy blocks, ServiceMonitor label mismatch, auth) and exactly how to fix each one.

You open Prometheus at /targets and see red: targets marked DOWN. Your pods are running and your app looks fine, but Prometheus can't scrape its metrics.

Here's every cause and the exact fix.

Step Zero: Read the Error Message

Go to http://prometheus:9090/targets. Every DOWN target shows its last scrape error next to it. Read that error before debugging anything else: in most cases it points straight at the cause.

Common errors you'll see:

connection refused
context deadline exceeded
401 Unauthorized
x509: certificate signed by unknown authority
dial tcp: no such host

Each one maps to a specific cause below.

Case 1: Wrong Port or Path in Scrape Config

Your app exposes metrics on :8080/metrics, but the scrape config points to a different port (say :9090) or a different path (say /).

Error: connection refused (wrong port) or server returned HTTP status 404 Not Found (wrong path)

Check it:

bash

# Test the actual metrics endpoint directly from inside the cluster
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://my-app-service.production.svc.cluster.local:8080/metrics | head -5

Fix in prometheus.yml:

yaml

scrape_configs:
  - job_name: 'my-app'
    static_configs:
      - targets: ['my-app-service.production.svc.cluster.local:8080']
    metrics_path: '/metrics'

Fix in ServiceMonitor (Prometheus Operator):

yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: metrics        # must match the port NAME in your Service
      path: /metrics
      interval: 30s

And in your Service, name the port:

yaml

ports:
  - name: metrics          # this name must match ServiceMonitor.endpoints.port
    port: 8080
    targetPort: 8080
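Because the ServiceMonitor refers to the Service port by name, a one-character mismatch is enough to break discovery. The sketch below compares the two, assuming you exported both objects as JSON with `kubectl get svc my-app-service -n production -o json > svc.json` and `kubectl get servicemonitor my-app -n monitoring -o json > sm.json`. The script and function names are hypothetical; this is not part of any Prometheus tooling.

```python
import json
import sys


def missing_port_names(service: dict, service_monitor: dict) -> list[str]:
    """Return ServiceMonitor endpoint port names that the Service does not define."""
    # Named ports declared on the Service; unnamed ports cannot be referenced by name.
    defined = {p["name"] for p in service.get("spec", {}).get("ports", []) if p.get("name")}
    missing = []
    for endpoint in service_monitor.get("spec", {}).get("endpoints", []):
        port = endpoint.get("port")
        # Endpoints that set targetPort instead of port are not checked in this sketch.
        if port is not None and port not in defined:
            missing.append(port)
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_ports.py svc.json sm.json
    with open(sys.argv[1]) as svc_file, open(sys.argv[2]) as sm_file:
        problems = missing_port_names(json.load(svc_file), json.load(sm_file))
    print("missing port names:", problems or "none")
    sys.exit(1 if problems else 0)
```

Run it as `python3 check_ports.py svc.json sm.json`; it exits non-zero when a ServiceMonitor endpoint names a port the Service does not have.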

Case 2: NetworkPolicy Blocking Prometheus

Prometheus lives in the monitoring namespace. Your app is in production. A NetworkPolicy is blocking ingress from monitoring to your pod.

Error: context deadline exceeded (a timeout: the policy silently drops the packets, so nothing ever answers)

Check it:

bash

# Test directly from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  wget --timeout=5 -qO- http://my-app-service.production.svc.cluster.local:8080/metrics

# Check if NetworkPolicy exists
kubectl get networkpolicies -n production
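The two errors differ for a reason: a closed port answers at once with a TCP reset (`connection refused`), while a policy that drops packets sends nothing back, so the client waits until its deadline. The stdlib sketch below makes that distinction explicit. It assumes a debug pod with Python in the monitoring namespace (so the NetworkPolicy treats it like Prometheus); the `probe` helper is hypothetical, and some CNI plugins may reject rather than drop, which would show up as `refused` instead.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout, dns-error or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered with a reset: the host is reachable but nothing listens there.
        return "refused"
    except socket.timeout:
        # No answer before the deadline: typical of a policy that silently drops packets.
        return "timeout"
    except socket.gaierror:
        # The name did not resolve at all (see the DNS case).
        return "dns-error"
    except OSError as exc:
        # Anything else, e.g. "no route to host".
        return f"error: {exc}"
```

For example, `probe("my-app-service.production.svc.cluster.local", 8080)` returning `timeout` while the app pod itself is healthy points at the NetworkPolicy rather than the app.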

Fix — allow Prometheus to scrape:

yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080

Case 3: ServiceMonitor Label Mismatch (Prometheus Operator)

Prometheus Operator uses label selectors to discover ServiceMonitors. If your ServiceMonitor doesn't have the right label, Prometheus never picks it up — the target simply doesn't appear in /targets at all.

Check it:

bash

# What label selector does your Prometheus resource use?
kubectl get prometheus -n monitoring -o yaml | grep -A 5 serviceMonitorSelector
# serviceMonitorSelector:
#   matchLabels:
#     release: prometheus   ← your ServiceMonitor needs this label
 
# Does your ServiceMonitor have it?
kubectl get servicemonitor my-app -n monitoring --show-labels

Fix — add the label:

yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: prometheus    # must match prometheus.serviceMonitorSelector

Also check serviceMonitorNamespaceSelector: when it is not set, Prometheus Operator only discovers ServiceMonitors in the same namespace as the Prometheus resource. Set it to an empty selector to watch every namespace:

yaml

# In your Prometheus resource — allow all namespaces
spec:
  serviceMonitorNamespaceSelector: {}   # empty = watch all namespaces
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
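Both selectors use ordinary Kubernetes label-selector matching: every key/value pair in matchLabels must be present on the object, and an empty selector `{}` matches everything. The hypothetical helper below models only that matchLabels case. An absent selector means something different per field (an unset serviceMonitorSelector selects no ServiceMonitors, while an unset serviceMonitorNamespaceSelector means the Prometheus resource's own namespace), so the sketch leaves that case to the caller.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Evaluate a label selector that only uses matchLabels.

    An empty selector ({}) matches every object. matchExpressions are not
    supported in this sketch, and an absent (null) selector must be handled
    by the caller because its meaning depends on the field.
    """
    if "matchExpressions" in selector:
        raise NotImplementedError("matchExpressions are not handled in this sketch")
    required = selector.get("matchLabels") or {}
    # Every required key must be present on the object with exactly the same value.
    return all(labels.get(key) == value for key, value in required.items())
```

For example, `selector_matches({"matchLabels": {"release": "prometheus"}}, {"app": "my-app"})` is `False`, which is exactly the "target never appears" situation; the labels to pass in are the `metadata.labels` of `kubectl get servicemonitor my-app -n monitoring -o json`.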

Case 4: Basic Auth or TLS on the Metrics Endpoint

Your app requires authentication to access /metrics, but Prometheus has no credentials configured.

Error: 401 Unauthorized or x509: certificate signed by unknown authority

Fix for basic auth:

bash

# Create a secret with credentials
kubectl create secret generic metrics-auth \
  --from-literal=username=prometheus \
  --from-literal=password=strongpassword \
  -n monitoring

yaml

# Reference in ServiceMonitor
spec:
  endpoints:
    - port: metrics
      basicAuth:
        username:
          name: metrics-auth
          key: username
        password:
          name: metrics-auth
          key: password
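Before wiring the Secret into the ServiceMonitor, it is worth checking that the endpoint accepts those credentials at all. HTTP Basic auth is just an `Authorization: Basic <base64(username:password)>` header, so a short stdlib sketch can send the same thing Prometheus would. The URL and credentials are the placeholders from this section, and both helper names are hypothetical.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_metrics_status(url: str, username: str, password: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the metrics endpoint answers with for these credentials."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # A 401 here means the credentials or the auth scheme are wrong.
        return exc.code
```

A 200 from `fetch_metrics_status("http://my-app-service.production.svc.cluster.local:8080/metrics", "prometheus", "strongpassword")` but a 401 in /targets suggests the problem is in the ServiceMonitor's Secret reference, for example a Secret that lives in a different namespace from the ServiceMonitor.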

Fix for self-signed TLS (internal cluster):

yaml

spec:
  endpoints:
    - port: metrics
      scheme: https
      tlsConfig:
        insecureSkipVerify: true   # skips certificate checks; only for trusted internal traffic

Case 5: No Endpoints — Pod Not Ready

Prometheus discovers targets through a Service's endpoints. If no pod passes its readinessProbe, the Service has no ready endpoints, so there is nothing to scrape.

Error: connection refused when scraping the Service address, or the job shows 0/0 up in /targets

Check it:

bash

kubectl get endpoints my-app-service -n production
# NAME             ENDPOINTS   AGE
# my-app-service   <none>      10m   ← zero endpoints
 
# Why? Check pod readiness
kubectl get pods -n production -l app=my-app
kubectl describe pod my-app-abc -n production | grep -A 5 "Readiness\|Ready"

Fix whatever makes the readinessProbe fail. Once the pods are Ready, the Service gets endpoints and Prometheus has something to scrape.
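An Endpoints object keeps ready pods under `addresses` and pods failing their readiness probe under `notReadyAddresses`, so counting the two tells you whether the Service selects no pods at all or selects pods that are simply not Ready. A sketch, assuming the input is saved with `kubectl get endpoints my-app-service -n production -o json > ep.json`; the script and function names are hypothetical.

```python
import json
import sys


def count_endpoint_addresses(endpoints: dict) -> tuple[int, int]:
    """Return (ready, not_ready) address counts from a v1 Endpoints object."""
    ready = not_ready = 0
    # An Endpoints object with no matching pods may have no "subsets" key at all.
    for subset in endpoints.get("subsets") or []:
        ready += len(subset.get("addresses") or [])
        not_ready += len(subset.get("notReadyAddresses") or [])
    return ready, not_ready


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 count_endpoints.py ep.json
    with open(sys.argv[1]) as f:
        ready, not_ready = count_endpoint_addresses(json.load(f))
    print(f"ready={ready} not_ready={not_ready}")
```

`ready=0 not_ready=0` usually means the Service selector matches no pods; `ready=0` with a non-zero `not_ready` means the pods exist but fail readiness.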

Case 6: DNS Resolution Failure

Error: dial tcp: lookup my-app.production.svc.cluster.local on 10.96.0.10:53: no such host

bash

# Test DNS from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  nslookup my-app.production.svc.cluster.local
 
# Does the Service actually exist?
kubectl get svc -n production my-app-service

Common causes: typo in service name in scrape config, Service was deleted, wrong namespace in target.
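In-cluster Service names follow `<service>.<namespace>.svc.<cluster-domain>`, and the cluster domain is `cluster.local` on most clusters (an assumption worth checking on yours). The sketch builds that name and resolves it through the pod's own DNS configuration, which is a quick way to tell a typo from a missing Service. Helper names are hypothetical.

```python
import socket


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(host: str) -> list[str]:
    """Return the IP addresses host resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The equivalent of Prometheus' "no such host".
        return []
    # Each entry's sockaddr starts with the IP; de-duplicate while keeping order.
    return list(dict.fromkeys(info[4][0] for info in infos))
```

An empty list from `resolve(service_fqdn("my-app-service", "production"))` matches the `no such host` error; a non-empty one means DNS is fine and the scrape config is probably pointing at a different name.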

Case 7: Scrape Timeout

Prometheus reaches the app, but /metrics takes longer to respond than the scrape timeout (10s by default).

Error: context deadline exceeded

Check it:

bash

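# Run from a pod inside the cluster that has curl
time curl -s -o /dev/null http://my-app-service.production.svc.cluster.local:8080/metrics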

Fix — increase scrapeTimeout:

yaml

spec:
  endpoints:
    - port: metrics
      interval: 60s
      scrapeTimeout: 30s   # must not exceed interval
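Prometheus refuses a configuration whose scrape timeout is longer than its scrape interval, so it can help to validate values like the ones above before applying them. A minimal sketch, assuming Prometheus-style durations built from `ms`, `s`, `m`, `h`, `d`, `w` and `y` (with `1y` = 365d), including compound forms such as `1m30s`. The helpers are hypothetical and, unlike Prometheus, accept units in any order.

```python
import re

# Milliseconds per unit; Prometheus treats 1y as 365d.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000, "y": 31_536_000_000}
# "ms" must be tried before "m" so that "100ms" is not read as "100m" + "s".
_TOKEN = re.compile(r"(\d+)(ms|y|w|d|h|m|s)")


def parse_duration_ms(text: str) -> int:
    """Parse a duration such as '30s' or '1m30s' into milliseconds."""
    if text == "0":
        return 0
    if not re.fullmatch(r"(?:\d+(?:ms|y|w|d|h|m|s))+", text):
        raise ValueError(f"not a valid duration: {text!r}")
    return sum(int(n) * _UNIT_MS[unit] for n, unit in _TOKEN.findall(text))


def check_scrape_timing(interval: str, scrape_timeout: str) -> None:
    """Raise ValueError if the scrape timeout is longer than the interval."""
    if parse_duration_ms(scrape_timeout) > parse_duration_ms(interval):
        raise ValueError(f"scrapeTimeout {scrape_timeout} exceeds interval {interval}")
```

With the values above, `check_scrape_timing("60s", "30s")` passes silently, while `check_scrape_timing("30s", "1m")` raises.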

Long-term fix: don't compute expensive metrics on every scrape. Collect them in the background (or update them as events happen) and have /metrics serve the cached values, so a scrape returns immediately.
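One way to apply that advice with only the standard library is to do the slow collection on a background thread and have the /metrics handler return the last snapshot. The sketch assumes a hypothetical `collect()` function that returns text in the Prometheus exposition format and a 15-second refresh period. The official client libraries have their own collector mechanisms, so treat this as an illustration of the idea rather than a recommended implementation.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CachedMetrics:
    """Holds the last rendered /metrics body; refreshed off the request path."""

    def __init__(self, collect, refresh_seconds: float = 15.0):
        self._collect = collect              # hypothetical slow function returning exposition text
        self._refresh_seconds = refresh_seconds
        self._lock = threading.Lock()
        self._body = collect()               # first snapshot taken synchronously at startup

    def snapshot(self) -> str:
        with self._lock:
            return self._body

    def _loop(self) -> None:
        while True:
            time.sleep(self._refresh_seconds)
            try:
                body = self._collect()       # the slow work happens here, not during a scrape
            except Exception:
                continue                     # keep serving the previous snapshot if a refresh fails
            with self._lock:
                self._body = body

    def start(self) -> None:
        threading.Thread(target=self._loop, daemon=True).start()


def make_handler(cache: CachedMetrics):
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = cache.snapshot().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler
```

Wiring it up would look like `cache = CachedMetrics(collect)`, `cache.start()`, then `ThreadingHTTPServer(("", 8080), make_handler(cache)).serve_forever()`. A scrape then costs one lock and one copy however slow `collect()` is; the trade-off is that values can be up to one refresh period old.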

Case 8: Node Exporter DOWN on Some Nodes

Node Exporter runs as a DaemonSet, one pod per node, usually with hostNetwork: true so it listens on the node's own IP at port 9100. Some nodes show DOWN.

Check it:

bash

# Get the node IP from the target label in Prometheus UI
# Then test from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://NODE_IP:9100/metrics | head -3

Fix (AWS): because Node Exporter listens on the node's own IP, traffic to it is filtered by the worker nodes' EC2 security group. That security group must explicitly allow inbound TCP 9100 from the Prometheus pods' subnet or security group.
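With many nodes, checking one IP at a time gets tedious. The sketch below tries TCP port 9100 on a list of node IPs in parallel and returns the ones that do not accept a connection. It assumes a debug pod with Python on the same network path as Prometheus and node IPs taken from `kubectl get nodes -o wide`; the function names are hypothetical. A successful connect only proves the port is reachable, which is precisely what a restrictive security group breaks.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def port_open(ip: str, port: int = 9100, timeout: float = 3.0) -> bool:
    """True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused, timed out and unreachable alike.
        return False


def unreachable_nodes(node_ips: list[str], port: int = 9100) -> list[str]:
    """Return the node IPs on which the port does not accept connections."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(lambda ip: port_open(ip, port), node_ips))
    return [ip for ip, ok in zip(node_ips, results) if not ok]
```

For example, `unreachable_nodes(["10.0.1.12", "10.0.2.34"])` (placeholder IPs) returns the subset to compare against the DOWN targets in /targets.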

Quick Debug Flowchart

Target is DOWN
      │
      ▼
Read error in /targets page
      │
      ├── "connection refused"      → wrong port/path, or no ready endpoints
      ├── "context deadline"        → NetworkPolicy or slow endpoint
      ├── "401 Unauthorized"        → add basicAuth config
      ├── "no such host"            → DNS issue or wrong service name
      ├── "x509 certificate"        → TLS config missing
      └── target not in list at all → ServiceMonitor label mismatch
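The same triage can be scripted, for example to annotate the `lastError` values returned by the targets API. This is a minimal sketch of the flowchart as an ordered lookup table. The fragments are the error strings quoted in this article (plus the 404 from Case 1), matching is a plain substring test, a real error may match none of them, and the names are hypothetical. A target missing from the list entirely has no `lastError`, so that branch cannot be detected this way.

```python
# Ordered (substring, likely cause) pairs taken from the flowchart above.
# The first matching fragment wins, so more specific fragments go first.
ERROR_HINTS = [
    ("connection refused", "wrong port/path, or no ready endpoints"),
    ("context deadline exceeded", "NetworkPolicy drop or slow /metrics endpoint"),
    ("401 unauthorized", "endpoint needs auth: add basicAuth"),
    ("no such host", "DNS issue or wrong Service name"),
    ("x509", "TLS config missing or certificate not trusted"),
    ("404 not found", "wrong metrics path"),
]


def likely_cause(last_error: str) -> str:
    """Map a target's lastError string to the most likely cause."""
    lowered = last_error.lower()
    for fragment, cause in ERROR_HINTS:
        if fragment in lowered:
            return cause
    return "unknown: read the full error text"
```

Matching is case-insensitive because Prometheus reports, for example, `server returned HTTP status 401 Unauthorized` with mixed case.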

Useful Debug Commands

bash

# Check if Prometheus loaded config correctly
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://localhost:9090/api/v1/status/config | python3 -m json.tool
 
# Reload config without restarting (requires Prometheus to run with --web.enable-lifecycle)
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- --post-data='' http://localhost:9090/-/reload
 
# Check Prometheus logs
kubectl logs -n monitoring prometheus-0 --tail=50 | grep -iE "error|failed|scrape"
 
# List all targets and their health via API
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- 'http://localhost:9090/api/v1/targets?state=active' \
  | python3 -c "import sys,json; [print(t['labels'].get('job'), t['health'], t.get('lastError','')) for t in json.load(sys.stdin)['data']['activeTargets']]"
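The one-liner above prints every active target. Written out as a small script it is easier to extend, for instance to list only targets that are not up, together with the URL Prometheus actually scraped. The sketch assumes the API response was saved first with `kubectl exec -n monitoring prometheus-0 -- wget -qO- 'http://localhost:9090/api/v1/targets?state=active' > targets.json`. It reads the `labels`, `health`, `scrapeUrl` and `lastError` fields of each entry in `data.activeTargets`, and the script name is hypothetical.

```python
import json
import sys


def unhealthy_targets(api_response: dict) -> list[tuple[str, str, str]]:
    """Return (job, scrapeUrl, lastError) for every active target whose health is not 'up'.

    This includes health 'unknown', which targets report before their first scrape.
    """
    rows = []
    for target in api_response["data"]["activeTargets"]:
        if target.get("health") != "up":
            rows.append((
                target.get("labels", {}).get("job", "?"),
                target.get("scrapeUrl", ""),
                target.get("lastError", ""),
            ))
    return rows


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 down_targets.py targets.json
    with open(sys.argv[1]) as f:
        for job, url, error in unhealthy_targets(json.load(f)):
            print(f"{job}\t{url}\t{error}")
```

Seeing the full `scrapeUrl` next to the error often exposes a wrong port or path immediately.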

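| Error | Likely cause | Fix |
|---|---|---|
| connection refused | Wrong port/path, or no ready endpoints | Fix the scrape config or ServiceMonitor port name; fix the readinessProbe |
| context deadline exceeded | NetworkPolicy or slow app | Allow ingress from the monitoring namespace, or raise scrapeTimeout |
| Target missing entirely | ServiceMonitor label mismatch | Add the label serviceMonitorSelector expects (e.g. release: prometheus) |
| 401 Unauthorized | Auth required | Add basicAuth to the ServiceMonitor |
| x509: certificate signed by unknown authority | TLS not configured | Set scheme: https and tlsConfig in the ServiceMonitor |
| no such host | Wrong Service name or DNS | Fix the target hostname in the config |
| Node Exporter down | EC2 security group blocks port 9100 | Allow inbound TCP 9100 in the node security group |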
