
Kubernetes DNS Not Working: How to Fix CoreDNS Failures in Production

Pods can't resolve hostnames? Getting NXDOMAIN or 'no such host' errors? Here's how to diagnose and fix CoreDNS issues in Kubernetes step by step.

DevOpsBoys · Mar 18, 2026 · 5 min read

It starts with a simple error: dial tcp: lookup my-service.default.svc.cluster.local: no such host. Your pods can't talk to each other. Services are unreachable. Everything was working five minutes ago.

DNS failures in Kubernetes are sneaky because they break everything. Every service-to-service call, every external API request, every database connection — all of them depend on DNS working correctly.

Let me walk you through exactly how to find the problem and fix it.

Understanding How Kubernetes DNS Works

Every Kubernetes cluster runs CoreDNS as the cluster DNS server. When a pod needs to resolve a hostname, here's what happens:

  1. Pod sends DNS query to the cluster DNS service (usually 10.96.0.10)
  2. CoreDNS receives the query and looks up the answer
  3. For internal services: CoreDNS checks the Kubernetes API for matching Service objects
  4. For external domains: CoreDNS forwards the query to upstream DNS servers

When any part of this chain breaks, your pods lose the ability to resolve names.
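The cluster DNS address pods use comes from a Service in kube-system — still named kube-dns for backward compatibility, even though CoreDNS now backs it. You can confirm the IP your pods should be pointing at (10.96.0.10 is just the common kubeadm default):

```bash
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
```

Whatever this prints should match the nameserver line inside your pods' /etc/resolv.conf.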

Step 1 — Verify DNS Is Actually Broken

Don't assume DNS is the problem. Confirm it first:

bash
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default

If DNS is working, you'll see:

Server:    10.96.0.10
Address:   10.96.0.10:53
Name:      kubernetes.default.svc.cluster.local
Address:   10.96.0.1

If you get connection timed out or NXDOMAIN, DNS is broken.

Also test external resolution:

bash
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup google.com

If internal works but external fails, the issue is upstream DNS forwarding. If both fail, CoreDNS itself is the problem.

Step 2 — Check CoreDNS Pod Status

bash
kubectl get pods -n kube-system -l k8s-app=kube-dns

You should see CoreDNS pods in Running state:

NAME                       READY   STATUS    RESTARTS   AGE
coredns-5d78c9869d-abc12   1/1     Running   0          5d
coredns-5d78c9869d-def34   1/1     Running   0          5d

If pods are CrashLoopBackOff or OOMKilled, that's your problem.

Check the logs:

bash
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

Common error patterns:

  • plugin/forward: no nameservers found — upstream DNS config is wrong
  • HINFO: read udp ... i/o timeout — network policy blocking DNS traffic
  • OOMKilled — CoreDNS needs more memory
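If the pods show restarts, this jsonpath query surfaces the last termination reason (e.g. OOMKilled) for each CoreDNS container, which the plain pod listing hides:

```bash
kubectl get pods -n kube-system -l k8s-app=kube-dns \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
```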

Step 3 — Check CoreDNS ConfigMap

bash
kubectl get configmap coredns -n kube-system -o yaml

A healthy Corefile looks like this:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

Check for these common misconfigurations:

  • Missing kubernetes plugin — internal names won't resolve
  • Wrong forward target — external names won't resolve
  • Missing loop detection — CoreDNS can get stuck in a DNS loop and crash
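If the forward target is the problem, one common fix is to point forward at explicit upstream servers instead of the node's /etc/resolv.conf. A sketch — the public resolvers here are placeholders, substitute your own upstreams:

```
forward . 8.8.8.8 1.1.1.1 {
    max_concurrent 1000
}
```

With the reload plugin enabled (as in the Corefile above), CoreDNS picks up ConfigMap changes automatically after a short delay; otherwise run kubectl -n kube-system rollout restart deployment coredns.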

Step 4 — Fix the ndots Problem

This is the #1 cause of slow DNS resolution that people miss. Check your pod's /etc/resolv.conf:

bash
kubectl exec -it your-pod -- cat /etc/resolv.conf

You'll see something like:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

That ndots:5 means any hostname with fewer than 5 dots gets the search domains appended first. So when your app calls api.example.com (2 dots), the pod's resolver tries:

  1. api.example.com.default.svc.cluster.local — fails
  2. api.example.com.svc.cluster.local — fails
  3. api.example.com.cluster.local — fails
  4. api.example.com — finally works

That's 3 unnecessary DNS lookups — often doubled, since A and AAAA records are queried separately — for every external call. Under load, this hammers CoreDNS.
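The expansion above can be emulated locally. This is a simplified sketch of the resolver's search-list behavior, assuming the search domains from the resolv.conf shown earlier (real glibc also falls back to the search list when the literal name fails):

```shell
# Emulate search-list expansion: names with fewer dots than ndots
# get the search domains appended before the literal name is tried.
expand_search() {
  name=$1; ndots=${2:-5}
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')
  if [ "$dots" -lt "$ndots" ]; then
    for d in default.svc.cluster.local svc.cluster.local cluster.local; do
      printf '%s.%s\n' "$name" "$d"
    done
  fi
  printf '%s\n' "$name"    # literal name tried last (or first, if ndots is met)
}

expand_search api.example.com 5   # four lookups before the real answer
expand_search api.example.com 2   # one lookup: 2 dots meets ndots:2
```

This is exactly why dropping ndots to 2 (or using a trailing dot) removes the wasted queries for typical external hostnames.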

Fix it by setting ndots in your pod spec:

yaml
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"

Or append a trailing dot to external hostnames in your app config: api.example.com. — the trailing dot tells the resolver it's already a fully qualified domain name.

Step 5 — Fix CoreDNS OOMKilled

If CoreDNS is getting OOMKilled in large clusters, increase its resources:

bash
kubectl edit deployment coredns -n kube-system

Update the resources:

yaml
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"

For clusters with 100+ nodes, you might need 512Mi or more. Also consider enabling the autopath plugin to reduce the number of DNS queries CoreDNS handles.
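Autopath makes CoreDNS follow the search path server-side, collapsing the ndots expansion from Step 4 into a single client query. A sketch of the relevant Corefile changes — note that autopath @kubernetes requires pods verified, which itself increases memory usage because CoreDNS must watch all pod objects:

```
kubernetes cluster.local in-addr.arpa ip6.arpa {
    pods verified
    fallthrough in-addr.arpa ip6.arpa
    ttl 30
}
autopath @kubernetes
```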

Step 6 — Check Network Policies

If you have NetworkPolicies in your cluster, they might be blocking DNS traffic. CoreDNS needs:

  • UDP port 53 — standard DNS queries
  • TCP port 53 — large DNS responses and zone transfers

Make sure your NetworkPolicy allows egress to kube-dns:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: your-namespace
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
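With the policy applied, a quick way to confirm it unblocked DNS is to rerun the Step 1 test from inside the affected namespace (your-namespace here matches the example policy):

```bash
kubectl run dns-test -n your-namespace --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
```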

Step 7 — Scale CoreDNS for Large Clusters

Default CoreDNS deployment runs 2 replicas. For large clusters, use the DNS autoscaler:

bash
kubectl get deployment dns-autoscaler -n kube-system

If it doesn't exist, create one:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      containers:
      - name: autoscaler
        image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.9
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=dns-autoscaler
        - --target=deployment/coredns
        - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"min":2}}

This scales CoreDNS replicas with cluster size — roughly one replica per 16 nodes (or per 256 cores, whichever yields more), with a floor of 2 replicas.
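The autoscaler's linear mode takes the maximum of the two ratios and applies the floor. A sketch of the arithmetic with the parameters above (ceiling division via integer math):

```shell
# replicas = max(ceil(nodes/16), ceil(cores/256), min=2),
# matching the linear params in the autoscaler example above.
dns_replicas() {
  nodes=$1; cores=$2
  by_nodes=$(( (nodes + 15) / 16 ))     # ceil(nodes / nodesPerReplica)
  by_cores=$(( (cores + 255) / 256 ))   # ceil(cores / coresPerReplica)
  r=$by_nodes
  [ "$by_cores" -gt "$r" ] && r=$by_cores
  [ "$r" -lt 2 ] && r=2                 # enforce min:2
  echo "$r"
}

dns_replicas 100 800   # 100 nodes, 800 cores -> 7 replicas
dns_replicas 10 40     # small cluster -> floor of 2
```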

Quick Diagnostic Cheatsheet

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| All DNS fails | CoreDNS pods down | Check pod status, restart |
| Internal fails, external works | kubernetes plugin misconfigured | Check Corefile |
| External fails, internal works | Forward config wrong | Fix upstream DNS in Corefile |
| DNS slow but works | ndots:5 causing extra queries | Set ndots:2 or use trailing dot |
| DNS fails intermittently | CoreDNS OOM or overloaded | Scale replicas, increase memory |
| DNS fails for specific pods | NetworkPolicy blocking | Add DNS egress rule |

Monitoring DNS Health

Add this Prometheus alert to catch DNS issues early:

yaml
- alert: CoreDNSDown
  expr: absent(up{job="coredns"} == 1)
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "CoreDNS is not responding"
 
- alert: CoreDNSLatencyHigh
  expr: histogram_quantile(0.99, rate(coredns_dns_request_duration_seconds_bucket[5m])) > 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "CoreDNS P99 latency above 500ms"

Wrapping Up

DNS issues in Kubernetes are frustrating because they affect everything. But debugging them is systematic — follow this checklist:

  1. Confirm DNS is actually broken with nslookup from a test pod
  2. Check CoreDNS pods are running and healthy
  3. Verify the Corefile configuration
  4. Fix ndots for faster external resolution
  5. Scale CoreDNS for large clusters
  6. Ensure NetworkPolicies allow DNS traffic

Want to master Kubernetes networking and troubleshooting with hands-on labs? The KodeKloud Kubernetes course covers CoreDNS, network policies, and real production debugging scenarios. If you're running workloads on cloud, DigitalOcean's managed Kubernetes handles CoreDNS scaling automatically.
