
Kubernetes DNS Resolution Failures — How to Fix CoreDNS Issues

Fix Kubernetes DNS resolution failures caused by CoreDNS misconfigurations, ndots issues, and pod DNS policies. Real troubleshooting scenarios with step-by-step solutions.

DevOpsBoys · Mar 26, 2026 · 5 min read

Your pods can't resolve any service names. curl my-service.default.svc.cluster.local times out. External DNS lookups fail too. You check the logs and see NXDOMAIN responses everywhere. Welcome to one of the most common — and most frustrating — Kubernetes networking issues: CoreDNS resolution failures.

DNS is the backbone of service discovery in Kubernetes. When it breaks, everything breaks. Let's fix it.

How Kubernetes DNS Works

Before debugging, you need to understand the flow:

  1. Pod makes a DNS query (e.g., my-service)
  2. Query goes to the pod's /etc/resolv.conf nameserver (usually CoreDNS ClusterIP — 10.96.0.10)
  3. CoreDNS resolves the query using its configured plugins
  4. Response returns to the pod

Every pod gets a /etc/resolv.conf that looks like this:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

That ndots:5 setting is where most problems start.

Problem 1: The ndots Trap — Slow or Failed External DNS

The Symptom

Resolving external domains (like api.github.com) takes 5-10 seconds instead of milliseconds. Or it fails entirely with NXDOMAIN.

Why It Happens

The ndots:5 default means: if a name contains fewer than 5 dots, the resolver appends each search domain before trying the name as-is. So when your pod queries api.github.com (2 dots, fewer than 5), it actually issues:

api.github.com.default.svc.cluster.local  → NXDOMAIN
api.github.com.svc.cluster.local          → NXDOMAIN
api.github.com.cluster.local              → NXDOMAIN
api.github.com.                           → SUCCESS (finally!)

That's 4 DNS lookups instead of 1, and each lookup is typically doubled into separate A and AAAA queries. Under load, this multiplies into thousands of unnecessary queries hitting CoreDNS.
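You can see the expansion logic for yourself without a cluster. This is a rough local sketch of the resolver's search-list algorithm; the expand_queries function is purely illustrative, not a real tool:

```shell
# Sketch of the resolver's search-list expansion (illustrative only).
# Given a name, the ndots threshold, and the search domains, print the
# queries in the order the resolver would try them.
expand_queries() {
  name=$1; ndots=$2; shift 2
  # Count the dots in the queried name
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -lt "$ndots" ]; then
    # Fewer dots than ndots: try every search domain first
    for domain in "$@"; do
      echo "$name.$domain"
    done
  fi
  echo "$name."   # finally, the absolute name
}

expand_queries api.github.com 5 \
  default.svc.cluster.local svc.cluster.local cluster.local
# api.github.com.default.svc.cluster.local
# api.github.com.svc.cluster.local
# api.github.com.cluster.local
# api.github.com.
```

Note that a trailing dot makes the dot count irrelevant: the name is treated as absolute and the search list is skipped entirely, which is exactly why Option 1 below works.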

The Fix

Option 1: Add a trailing dot to FQDNs in your code

yaml
# In your app config or environment variable
API_ENDPOINT: "api.github.com."  # trailing dot = absolute name, skips search

Option 2: Lower ndots in the pod spec

yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
  containers:
    - name: app
      image: my-app:latest

Option 3: Set dnsPolicy for external-heavy workloads

yaml
spec:
  dnsPolicy: Default  # Uses node's DNS instead of cluster DNS
⚠️ Warning: dnsPolicy: Default means the pod can't resolve cluster services. Only use this for pods that exclusively talk to external services.

Problem 2: CoreDNS Pods Are CrashLoopBackOff

The Symptom

bash
$ kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS             RESTARTS   AGE
coredns-5644d7b6d9-abcde   0/1     CrashLoopBackOff   15         1h
coredns-5644d7b6d9-fghij   0/1     CrashLoopBackOff   15         1h

Check the Logs

bash
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

Common error messages:

"Loop detected" — CoreDNS is forwarding to itself:

[FATAL] plugin/loop: Loop (127.0.0.1:53 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting

This happens when the node's /etc/resolv.conf points to 127.0.0.1 (common with systemd-resolved on Ubuntu). CoreDNS forwards to the node's nameserver → which is itself → infinite loop.

Fix: Update CoreDNS Forward Config

bash
kubectl edit configmap coredns -n kube-system

Change the forward directive from:

forward . /etc/resolv.conf

To your actual upstream DNS:

forward . 8.8.8.8 8.8.4.4 {
    max_concurrent 1000
}

Or if you want to use your cloud provider's DNS:

# AWS VPC DNS
forward . 169.254.169.253

# GCP metadata DNS
forward . 169.254.169.254
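On systemd-resolved hosts there's also a middle ground: keep using the node's real upstreams without hardcoding IPs. systemd-resolved writes the actual upstream servers to /run/systemd/resolve/resolv.conf, while the stub file at /etc/resolv.conf only lists 127.0.0.53. Pointing forward at the real file avoids the loop; the path assumes a standard systemd-resolved setup:

```
# Forward to the real upstream servers, bypassing the 127.0.0.53 stub
forward . /run/systemd/resolve/resolv.conf
```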

Then restart CoreDNS:

bash
kubectl rollout restart deployment coredns -n kube-system

Problem 3: DNS Lookup Works from Some Pods But Not Others

The Symptom

Pod A can resolve my-service.default.svc.cluster.local fine. Pod B in the same namespace gets NXDOMAIN.

Debug Steps

Step 1: Check the failing pod's resolv.conf

bash
kubectl exec -it pod-b -- cat /etc/resolv.conf

If the nameserver is wrong or search domains are missing, check the pod's dnsPolicy:

bash
kubectl get pod pod-b -o jsonpath='{.spec.dnsPolicy}'

  • ClusterFirst (default) — uses CoreDNS. This is what you want.
  • Default — uses the node's DNS. Won't resolve cluster services.
  • None — uses only what's in dnsConfig. Easy to misconfigure.
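For reference, here's what a complete dnsPolicy: None setup looks like when you still want cluster DNS to work. This is a sketch that assumes the common 10.96.0.10 CoreDNS ClusterIP and a pod in the default namespace; check kubectl get svc kube-dns -n kube-system for your cluster's actual IP:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-pod
spec:
  dnsPolicy: None        # kubelet injects nothing; dnsConfig below is all you get
  dnsConfig:
    nameservers:
      - 10.96.0.10       # CoreDNS ClusterIP (adjust for your cluster)
    searches:
      - default.svc.cluster.local
      - svc.cluster.local
      - cluster.local
    options:
      - name: ndots
        value: "5"
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
```

Forget the searches list or the nameserver and you get exactly the "works in pod A, fails in pod B" symptom described above.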

Step 2: Test DNS directly from the pod

bash
kubectl exec -it pod-b -- nslookup my-service.default.svc.cluster.local
# or
kubectl exec -it pod-b -- dig my-service.default.svc.cluster.local

If the container doesn't have DNS tools:

bash
kubectl run dns-debug --image=busybox:1.36 --rm -it --restart=Never -- nslookup my-service.default.svc.cluster.local

Step 3: Check if the service actually exists

bash
kubectl get svc my-service -n default
kubectl get endpoints my-service -n default

No endpoints? The service selector doesn't match any pods.
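A selector mismatch is easy to miss because both objects look healthy on their own. A hypothetical example of the failure mode, where the Service selects app: web but the pod is labeled app: webapp, so the endpoints list stays empty:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: web          # selects nothing...
  ports:
    - port: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: webapp       # ...because the pod's label doesn't match
spec:
  containers:
    - name: app
      image: my-app:latest
```

One subtlety: for a normal ClusterIP service the DNS name still resolves (to the ClusterIP) even with zero endpoints; connections just go nowhere. It's headless services that actually return no records when the endpoints list is empty.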

Problem 4: DNS Timeouts Under Load

The Symptom

DNS works fine normally but starts timing out during peak traffic. Pods get i/o timeout or no such host errors intermittently.

Why It Happens

A default installation typically runs only 2 CoreDNS replicas. If hundreds of pods are all making DNS queries (each one multiplied by ndots search expansion), those 2 replicas get overwhelmed.

Fix: Scale CoreDNS

bash
# Quick fix: increase replicas
kubectl scale deployment coredns -n kube-system --replicas=5

Better fix — use the dns-autoscaler addon:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: autoscaler
          image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.9
          command:
            - /cluster-proportional-autoscaler
            - --namespace=kube-system
            - --configmap=dns-autoscaler
            - --target=deployment/coredns
            - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"min":2,"max":10,"preventSinglePointFailure":true}}
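The linear mode computes replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), clamped to [min, max]. A quick local sketch of that arithmetic using the parameters above; the dns_replicas function is illustrative, not part of the autoscaler:

```shell
# Sketch of the cluster-proportional-autoscaler "linear" formula with
# coresPerReplica=256, nodesPerReplica=16, min=2, max=10.
dns_replicas() {
  cores=$1; nodes=$2
  by_cores=$(( (cores + 255) / 256 ))   # ceil(cores / 256)
  by_nodes=$(( (nodes + 15) / 16 ))     # ceil(nodes / 16)
  if [ "$by_cores" -gt "$by_nodes" ]; then
    replicas=$by_cores
  else
    replicas=$by_nodes
  fi
  [ "$replicas" -lt 2 ] && replicas=2   # clamp to min
  [ "$replicas" -gt 10 ] && replicas=10 # clamp to max
  echo "$replicas"
}

dns_replicas 64 40    # 64 cores, 40 nodes -> prints 3
```

So a 40-node cluster gets 3 replicas, a tiny cluster never drops below 2, and very large clusters cap out at 10.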

Also, add CoreDNS caching to reduce upstream query load:

bash
kubectl edit configmap coredns -n kube-system

Add the cache plugin inside the server block:

.:53 {
    cache 300  # Cache responses for 5 minutes
    ...
}

Problem 5: Custom Domain Resolution Fails

The Symptom

You need pods to resolve internal corporate domains (e.g., api.internal.company.com) that are hosted on a private DNS server, but CoreDNS returns NXDOMAIN.

Fix: Add Conditional Forwarding

Edit the CoreDNS ConfigMap to forward specific domains to your internal DNS:

bash
kubectl edit configmap coredns -n kube-system

.:53 {
    errors
    health {
      lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
      ttl 30
    }
    prometheus :9153
    forward . 8.8.8.8 8.8.4.4
    cache 30
    loop
    reload
    loadbalance
}

# Conditional forwarding for internal domains
internal.company.com:53 {
    errors
    cache 30
    forward . 10.0.0.2 10.0.0.3  # Your internal DNS servers
}

Monitoring CoreDNS

CoreDNS exposes Prometheus metrics on port 9153. Set up monitoring to catch issues before they become outages:

yaml
# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.io/v1
kind: ServiceMonitor
metadata:
  name: coredns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns
  endpoints:
    - port: metrics
      interval: 15s

Key metrics to alert on:

  • coredns_dns_requests_total — total query rate
  • coredns_dns_responses_total{rcode="SERVFAIL"} — server failures
  • coredns_dns_responses_total{rcode="NXDOMAIN"} — high NXDOMAIN rate might indicate misconfiguration
  • coredns_forward_responses_total{rcode="SERVFAIL"} — upstream DNS failures
  • coredns_panics_total — CoreDNS crashes
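With those metrics scraped, the alerts can be expressed as a PrometheusRule. This is a sketch that assumes the Prometheus Operator is installed; the thresholds are starting points to tune, not recommendations:

```yaml
apiVersion: monitoring.coreos.io/v1
kind: PrometheusRule
metadata:
  name: coredns-alerts
  namespace: kube-system
spec:
  groups:
    - name: coredns
      rules:
        - alert: CoreDNSServfailHigh
          # More than 1% of responses are SERVFAIL over 5 minutes
          expr: |
            sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
              / sum(rate(coredns_dns_responses_total[5m])) > 0.01
          for: 5m
          labels:
            severity: warning
        - alert: CoreDNSPanics
          # Any panic is worth waking someone up for
          expr: increase(coredns_panics_total[5m]) > 0
          labels:
            severity: critical
```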

Quick Reference: DNS Debug Checklist

bash
# 1. Check CoreDNS is running
kubectl get pods -n kube-system -l k8s-app=kube-dns
 
# 2. Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
 
# 3. Check pod's DNS config
kubectl exec -it <pod> -- cat /etc/resolv.conf
 
# 4. Test DNS from a debug pod
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
 
# 5. Check CoreDNS ConfigMap
kubectl get configmap coredns -n kube-system -o yaml
 
# 6. Check CoreDNS metrics
kubectl port-forward -n kube-system svc/kube-dns 9153:9153
curl localhost:9153/metrics | grep coredns_dns_responses_total

Wrapping Up

DNS issues in Kubernetes are almost always one of these five problems: ndots misconfiguration, CoreDNS loop detection, wrong dnsPolicy, capacity limits, or missing conditional forwarding. The debug checklist above will get you to the root cause in under 5 minutes.

If you're looking to deepen your Kubernetes networking knowledge, KodeKloud's hands-on labs are excellent for practicing these scenarios in a real cluster environment.


Found this helpful? Share it with your team — DNS issues hit everyone eventually.
