
Prometheus Targets Showing 'Down' — Every Cause and Fix (2026)

Your Prometheus /targets page shows red. Services are running but Prometheus can't scrape them. Here's every reason this happens — wrong port, NetworkPolicy blocks, ServiceMonitor label mismatch, auth — and exactly how to fix each one.

DevOpsBoys · Apr 10, 2026 · 5 min read

You open Prometheus at /targets and see red — targets marked as DOWN. Your pods are running, your app looks fine, but Prometheus can't scrape metrics.

Here's every cause and the exact fix.


Step Zero: Read the Error Message

Go to http://prometheus:9090/targets. Every DOWN target shows an error column. Read it — it tells you 90% of what you need to know before debugging anything else.

Common errors you'll see:

connection refused
context deadline exceeded
401 Unauthorized
x509: certificate signed by unknown authority
dial tcp: no such host

Each one maps to a specific cause below.
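That mapping is mechanical enough to script. A minimal sketch in Python, assuming you already have the JSON from /api/v1/targets in hand (the sample target below is made up; the field names match the API response):

```python
# Substring -> likely cause, based on the error strings Prometheus reports.
LIKELY_CAUSE = [
    ("connection refused", "wrong port/path in scrape config"),
    ("context deadline exceeded", "NetworkPolicy block or slow /metrics endpoint"),
    ("401", "basic auth required but not configured"),
    ("x509", "TLS config missing or self-signed certificate"),
    ("no such host", "DNS failure: typo in service name or wrong namespace"),
]

def diagnose(last_error: str) -> str:
    """Map a target's lastError string to the most likely cause."""
    for needle, cause in LIKELY_CAUSE:
        if needle in last_error:
            return cause
    return "unknown: check Prometheus logs"

# Hypothetical excerpt of the /api/v1/targets activeTargets array
targets = [
    {"labels": {"job": "my-app"}, "health": "down",
     "lastError": 'Get "http://my-app:8080/metrics": connection refused'},
]
for t in targets:
    if t["health"] == "down":
        print(t["labels"]["job"], "->", diagnose(t["lastError"]))
```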


Case 1: Wrong Port or Path in Scrape Config

Your app exposes metrics on :8080/metrics but the scrape config points to :9090 or /.

Error: connection refused or 404

Check it:

bash
# Test the actual metrics endpoint directly from inside the cluster
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://my-app.production.svc.cluster.local:8080/metrics | head -5

Fix in prometheus.yml:

yaml
scrape_configs:
  - job_name: 'my-app'
    static_configs:
      - targets: ['my-app-service.production.svc.cluster.local:8080']
    metrics_path: '/metrics'

Fix in ServiceMonitor (Prometheus Operator):

yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: metrics        # must match the port NAME in your Service
      path: /metrics
      interval: 30s

And in your Service, name the port:

yaml
ports:
  - name: metrics          # this name must match ServiceMonitor.endpoints.port
    port: 8080
    targetPort: 8080

Case 2: NetworkPolicy Blocking Prometheus

Prometheus lives in the monitoring namespace. Your app is in production. A NetworkPolicy is blocking ingress from monitoring to your pod.

Error: context deadline exceeded (timeout — silent drop)

Check it:

bash
# Test directly from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  wget --timeout=5 -qO- http://my-app.production.svc.cluster.local:8080/metrics
 
# Check if NetworkPolicy exists
kubectl get networkpolicies -n production

Fix — allow Prometheus to scrape:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080
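To reason about whether a policy admits the scrape, the check boils down to: some ingress rule must match both the scraper's namespace labels and the port. A simplified sketch that only handles the namespaceSelector-plus-ports shape shown above (real NetworkPolicy semantics have more cases, e.g. podSelector peers):

```python
def allows_scrape(policy, ns_labels, port):
    """True if any ingress rule matches the scraper's namespace labels and the port."""
    for rule in policy["spec"].get("ingress", []):
        ns_ok = any(
            peer.get("namespaceSelector", {}).get("matchLabels", {}).items() <= ns_labels.items()
            for peer in rule.get("from", [])
        )
        port_ok = any(p.get("port") == port for p in rule.get("ports", []))
        if ns_ok and port_ok:
            return True
    return False

# The ingress section of the policy above, as a dict
policy = {"spec": {"ingress": [{
    "from": [{"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "monitoring"}}}],
    "ports": [{"protocol": "TCP", "port": 8080}],
}]}}
print(allows_scrape(policy, {"kubernetes.io/metadata.name": "monitoring"}, 8080))  # True
```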

Case 3: ServiceMonitor Label Mismatch (Prometheus Operator)

Prometheus Operator uses label selectors to discover ServiceMonitors. If your ServiceMonitor doesn't have the right label, Prometheus never picks it up — the target simply doesn't appear in /targets at all.

Check it:

bash
# What label selector does your Prometheus resource use?
kubectl get prometheus -n monitoring -o yaml | grep -A 5 serviceMonitorSelector
# serviceMonitorSelector:
#   matchLabels:
#     release: prometheus   ← your ServiceMonitor needs this label
 
# Does your ServiceMonitor have it?
kubectl get servicemonitor my-app -n monitoring --show-labels

Fix — add the label:

yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: prometheus    # must match prometheus.serviceMonitorSelector

Also check serviceMonitorNamespaceSelector — by default Prometheus Operator only discovers ServiceMonitors in its own namespace:

yaml
# In your Prometheus resource — allow all namespaces
spec:
  serviceMonitorNamespaceSelector: {}   # empty = watch all namespaces
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
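The matching rule itself is plain subset logic: every pair in serviceMonitorSelector.matchLabels must appear among the ServiceMonitor's labels. A sketch:

```python
def selector_matches(match_labels, object_labels):
    """Operator picks up a ServiceMonitor when all matchLabels pairs appear on it."""
    return match_labels.items() <= object_labels.items()

prometheus_selector = {"release": "prometheus"}
print(selector_matches(prometheus_selector, {"release": "prometheus", "team": "payments"}))  # True
print(selector_matches(prometheus_selector, {"app": "my-app"}))  # False: target never appears
```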

Case 4: Basic Auth or TLS on the Metrics Endpoint

Your app requires authentication to access /metrics, but Prometheus has no credentials configured.

Error: 401 Unauthorized or x509: certificate signed by unknown authority

Fix for basic auth:

bash
# Create a secret with credentials
kubectl create secret generic metrics-auth \
  --from-literal=username=prometheus \
  --from-literal=password=strongpassword \
  -n monitoring

yaml
# Reference in ServiceMonitor
spec:
  endpoints:
    - port: metrics
      basicAuth:
        username:
          name: metrics-auth
          key: username
        password:
          name: metrics-auth
          key: password

Fix for self-signed TLS (internal cluster):

yaml
spec:
  endpoints:
    - port: metrics
      scheme: https
      tlsConfig:
        insecureSkipVerify: true   # acceptable for internal cluster traffic
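If skipping verification is not acceptable, the ServiceMonitor's tlsConfig can instead point at the CA certificate stored in a Secret (the Secret name metrics-ca, the key ca.crt, and the serverName value below are assumptions for illustration):

```yaml
spec:
  endpoints:
    - port: metrics
      scheme: https
      tlsConfig:
        ca:
          secret:
            name: metrics-ca     # assumed Secret holding the CA certificate
            key: ca.crt
        serverName: my-app.production.svc   # must match a SAN on the app's cert
```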

Case 5: No Endpoints — Pod Not Ready

Prometheus resolves a Service to its endpoints. If no pods pass the readinessProbe, the Service has zero endpoints — nothing to scrape.

Error: connection refused or target appears with 0/0 endpoints

Check it:

bash
kubectl get endpoints my-app-service -n production
# NAME             ENDPOINTS   AGE
# my-app-service   <none>      10m   ← zero endpoints
 
# Why? Check pod readiness
kubectl get pods -n production -l app=my-app
kubectl describe pod my-app-abc -n production | grep -A 5 "Readiness\|Ready"

Fix the readinessProbe failure — Prometheus can't scrape what has no endpoints.
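For reference, a readinessProbe typically looks like this; the /healthz path and the timings are placeholders for whatever your app actually exposes:

```yaml
readinessProbe:
  httpGet:
    path: /healthz        # assumed health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```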


Case 6: DNS Resolution Failure

Error: dial tcp: lookup my-app.production.svc.cluster.local on 10.96.0.10:53: no such host

bash
# Test DNS from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  nslookup my-app.production.svc.cluster.local
 
# Does the Service actually exist?
kubectl get svc -n production my-app-service

Common causes: typo in service name in scrape config, Service was deleted, wrong namespace in target.
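One cheap sanity check for the typo case is to validate the hostname's shape before suspecting cluster DNS. A sketch that only covers the standard service.namespace.svc.cluster.local form:

```python
import re

FQDN = re.compile(r"^(?P<svc>[a-z0-9-]+)\.(?P<ns>[a-z0-9-]+)\.svc\.cluster\.local$")

def parse_target(host):
    """Split a cluster-local hostname into (service, namespace), or None if malformed."""
    m = FQDN.match(host)
    return (m["svc"], m["ns"]) if m else None

print(parse_target("my-app.production.svc.cluster.local"))   # ('my-app', 'production')
print(parse_target("my-app.production.svc.clusterlocal"))    # None: typo
```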


Case 7: Scrape Timeout

Prometheus scrapes but the /metrics endpoint takes too long to respond (default timeout: 10s).

Error: context deadline exceeded

Check it:

bash
time curl http://my-app:8080/metrics

Fix — increase scrapeTimeout:

yaml
spec:
  endpoints:
    - port: metrics
      interval: 60s
      scrapeTimeout: 30s   # must not exceed the interval

Long-term fix: don't compute metrics on every scrape. Use a metrics registry that caches collected values and serves them instantly.
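With the official Python client, for instance, that pattern looks like this: a background thread refreshes a Gauge on its own schedule, and a scrape only serializes the cached value (queue_depth here is a stand-in for your expensive computation):

```python
import threading
import time

from prometheus_client import Gauge, start_http_server

QUEUE_DEPTH = Gauge("myapp_queue_depth", "Jobs waiting in the queue")

def queue_depth() -> int:
    # Stand-in for your expensive computation (DB query, directory scan, ...)
    return 42

def refresh_loop(period_s: float = 15.0) -> None:
    # Recompute on a schedule, off the scrape path; a scrape of /metrics
    # only serializes the last cached value, so it returns instantly.
    while True:
        QUEUE_DEPTH.set(queue_depth())
        time.sleep(period_s)

def main() -> None:
    threading.Thread(target=refresh_loop, daemon=True).start()
    start_http_server(8080)  # expose /metrics on :8080
```

Call main() at startup; the scrape handler itself never touches the expensive path.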


Case 8: Node Exporter DOWN on Some Nodes

Node Exporter runs as a DaemonSet with hostNetwork: true, binding port 9100 on each node's real IP. Some nodes show DOWN while others are fine.

Check it:

bash
# Get the node IP from the target label in Prometheus UI
# Then test from Prometheus pod
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://NODE_IP:9100/metrics | head -3

Fix (AWS): The EC2 security group for worker nodes must allow TCP 9100 inbound from the Prometheus pod subnet or security group. Node Exporter binds to the node's real IP — Security Groups must explicitly allow it.


Quick Debug Flowchart

Target is DOWN
      │
      ▼
Read error in /targets page
      │
      ├── "connection refused"      → wrong port/path
      ├── "context deadline"        → NetworkPolicy or slow endpoint  
      ├── "401 Unauthorized"        → add basicAuth config
      ├── "no such host"            → DNS issue or wrong service name
      ├── "x509 certificate"        → TLS config missing
      └── target not in list at all → ServiceMonitor label mismatch

Useful Debug Commands

bash
# Check if Prometheus loaded config correctly
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- http://localhost:9090/api/v1/status/config | python3 -m json.tool
 
# Reload config without restarting
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- --post-data='' http://localhost:9090/-/reload
 
# Check Prometheus logs
kubectl logs -n monitoring prometheus-0 --tail=50 | grep -iE "error|failed|scrape"
 
# List all targets and their health via API
kubectl exec -n monitoring prometheus-0 -- \
  wget -qO- 'http://localhost:9090/api/v1/targets?state=active' \
  | python3 -c "import sys,json; [print(t['labels'].get('job'), t['health'], t.get('lastError','')) for t in json.load(sys.stdin)['data']['activeTargets']]"

| Error | Cause | Fix |
|-------|-------|-----|
| connection refused | Wrong port/path | Fix scrape config or ServiceMonitor port name |
| context deadline exceeded | NetworkPolicy or slow app | Allow monitoring namespace ingress |
| Target missing entirely | ServiceMonitor label mismatch | Add release: prometheus label |
| 401 Unauthorized | Auth required | Add basicAuth to ServiceMonitor |
| no such host | Wrong service name/DNS | Fix target hostname in config |
| Node exporter down | EC2 SG blocks port 9100 | Open port 9100 in security group |

Related: Prometheus + Grafana Monitoring Guide | How to Set Up Prometheus Alertmanager
