Kubernetes DNS Not Working: How to Fix CoreDNS Failures in Production
Pods can't resolve hostnames? Getting NXDOMAIN or 'no such host' errors? Here's how to diagnose and fix CoreDNS issues in Kubernetes step by step.
It starts with a simple error: dial tcp: lookup my-service.default.svc.cluster.local: no such host. Your pods can't talk to each other. Services are unreachable. Everything was working five minutes ago.
DNS failures in Kubernetes are sneaky because they break everything. Every service-to-service call, every external API request, every database connection — all of them depend on DNS working correctly.
Let me walk you through exactly how to find the problem and fix it.
Understanding How Kubernetes DNS Works
Every Kubernetes cluster runs CoreDNS as the cluster DNS server. When a pod needs to resolve a hostname, here's what happens:
- Pod sends the DNS query to the cluster DNS service (usually 10.96.0.10)
- CoreDNS receives the query and looks up the answer
- For internal services: CoreDNS checks the Kubernetes API for matching Service objects
- For external domains: CoreDNS forwards the query to upstream DNS servers
When any part of this chain breaks, your pods lose the ability to resolve names.
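As a rough illustration, the routing decision CoreDNS makes can be sketched in shell: names under the cluster domain are answered from Kubernetes API records, everything else is forwarded upstream. This is a toy model of the logic, not CoreDNS code, and it assumes the default cluster domain cluster.local:

```shell
# Toy model of CoreDNS routing: cluster-domain names go to the
# kubernetes plugin, everything else is forwarded upstream.
CLUSTER_DOMAIN="cluster.local"

route_query() {
  local name="$1"
  case "$name" in
    *."$CLUSTER_DOMAIN"|*."$CLUSTER_DOMAIN".)
      echo "kubernetes-plugin"   # answered from Service/Pod records
      ;;
    *)
      echo "forward-upstream"    # sent to the servers in /etc/resolv.conf
      ;;
  esac
}

route_query "my-service.default.svc.cluster.local"  # kubernetes-plugin
route_query "google.com"                            # forward-upstream
```

Real CoreDNS decides this per zone in the Corefile, which is why a broken `kubernetes` stanza breaks only internal names while a broken `forward` stanza breaks only external ones.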
Step 1 — Verify DNS Is Actually Broken
Don't assume DNS is the problem. Confirm it first:
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
If DNS is working, you'll see:
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
If you get connection timed out or NXDOMAIN, DNS is broken.
Also test external resolution:
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup google.com
If internal works but external fails, the issue is upstream DNS forwarding. If both fail, CoreDNS itself is the problem.
Step 2 — Check CoreDNS Pod Status
kubectl get pods -n kube-system -l k8s-app=kube-dns
You should see CoreDNS pods in Running state:
NAME READY STATUS RESTARTS AGE
coredns-5d78c9869d-abc12 1/1 Running 0 5d
coredns-5d78c9869d-def34 1/1 Running 0 5d
If pods are CrashLoopBackOff or OOMKilled, that's your problem.
Check the logs:
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
Common error patterns:
- plugin/forward: no nameservers found — upstream DNS config is wrong
- HINFO: read udp ... i/o timeout — network policy blocking DNS traffic
- OOMKilled — CoreDNS needs more memory
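A quick way to triage log output against those signatures is a small grep helper (the pattern list below is mine and not exhaustive; the sample log lines are illustrative):

```shell
# Scan CoreDNS log output for the known failure signatures above.
triage_coredns_logs() {
  grep -E -o 'no nameservers found|i/o timeout|OOMKilled' | sort -u
}

# Demo on sample log lines; in a real cluster you would pipe in
# `kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200` instead.
printf '%s\n' \
  '[ERROR] plugin/forward: no nameservers found' \
  '[ERROR] plugin/errors: read udp 10.244.0.5:53: i/o timeout' \
  | triage_coredns_logs
```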
Step 3 — Check CoreDNS ConfigMap
kubectl get configmap coredns -n kube-system -o yaml
A healthy Corefile looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
Check for these common misconfigurations:
- Missing kubernetes plugin — internal names won't resolve
- Wrong forward target — external names won't resolve
- Missing loop detection — CoreDNS can get stuck in a DNS loop and crash
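One hedged way to sanity-check those three items is to pull the Corefile and grep for the expected plugin names. This is a heuristic, not a real Corefile parser:

```shell
# Heuristic Corefile check: verify the kubernetes, forward, and loop
# plugins all appear. Reads the Corefile from stdin.
check_corefile() {
  local corefile missing=""
  corefile=$(cat)
  for plugin in kubernetes forward loop; do
    echo "$corefile" | grep -qw "$plugin" || missing="$missing $plugin"
  done
  if [ -n "$missing" ]; then
    echo "MISSING:$missing"
    return 1
  fi
  echo "OK"
}

# Demo on a minimal Corefile. Against a live cluster you would pipe in:
#   kubectl -n kube-system get cm coredns -o jsonpath='{.data.Corefile}'
check_corefile <<'EOF'
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa
    forward . /etc/resolv.conf
    loop
}
EOF
```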
Step 4 — Fix the ndots Problem
This is the #1 cause of slow DNS resolution that people miss. Check your pod's /etc/resolv.conf:
kubectl exec -it your-pod -- cat /etc/resolv.conf
You'll see something like:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
That ndots:5 means any hostname with fewer than 5 dots gets the search domains appended first. So when your app calls api.example.com (2 dots), Kubernetes DNS tries:
- api.example.com.default.svc.cluster.local — fails
- api.example.com.svc.cluster.local — fails
- api.example.com.cluster.local — fails
- api.example.com — finally works
That's three failed lookups before every successful external call — and each failed name usually fires separate A and AAAA queries, so the waste doubles. Under load, this hammers CoreDNS.
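You can reproduce that query order offline. This small function mimics the resolver's search-list expansion in simplified form (it only models ndots and the search domains, and treats names ending in a dot as already absolute):

```shell
# Simplified model of resolv.conf search-list expansion: if the name has
# fewer than $ndots dots, try each search domain first, then the name
# as-is. A trailing dot marks the name as already fully qualified.
expand_query() {
  local name="$1" ndots="$2"; shift 2
  local search_domains=("$@")
  case "$name" in *.) echo "${name%.}"; return ;; esac
  local dots="${name//[^.]/}"
  if [ "${#dots}" -lt "$ndots" ]; then
    for d in "${search_domains[@]}"; do
      echo "$name.$d"
    done
  fi
  echo "$name"
}

# api.example.com has 2 dots, so with ndots:5 the search list is tried first:
expand_query api.example.com 5 default.svc.cluster.local svc.cluster.local cluster.local
```

Run it with ndots 2, or with a trailing dot on the name, and the search-domain candidates disappear — which is exactly why both fixes below work.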
Fix it by setting ndots in your pod spec:
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"
Or append a trailing dot to external hostnames in your app config: api.example.com. — the trailing dot tells the resolver it's already a fully qualified domain name.
Step 5 — Fix CoreDNS OOMKilled
If CoreDNS is getting OOMKilled in large clusters, increase its resources:
kubectl edit deployment coredns -n kube-system
Update the resources:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"
For clusters with 100+ nodes, you might need 512Mi or more. Also consider enabling the autopath plugin to reduce the number of DNS queries CoreDNS handles.
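If you try autopath, note that per the upstream plugin docs it requires the kubernetes plugin to run in pods verified mode (which uses more memory, since CoreDNS must watch all pods). A sketch of the relevant Corefile fragment — verify the details against your CoreDNS version:

```
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified        # autopath needs verified pod records
        fallthrough in-addr.arpa ip6.arpa
    }
    autopath @kubernetes     # answer search-path expansions server-side
    forward . /etc/resolv.conf
    cache 30
}
```

With autopath enabled, CoreDNS short-circuits the ndots search-path dance from Step 4 on the server side, returning the final answer on the first query.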
Step 6 — Check Network Policies
If you have NetworkPolicies in your cluster, they might be blocking DNS traffic. CoreDNS needs:
- UDP port 53 — standard DNS queries
- TCP port 53 — large DNS responses and zone transfers
Make sure your NetworkPolicy allows egress to kube-dns:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: your-namespace
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
Step 7 — Scale CoreDNS for Large Clusters
Default CoreDNS deployment runs 2 replicas. For large clusters, use the DNS autoscaler:
kubectl get deployment dns-autoscaler -n kube-system
If it doesn't exist, create one:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      containers:
      - name: autoscaler
        image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.9
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=dns-autoscaler
        - --target=deployment/coredns
        - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"min":2}}
This scales CoreDNS replicas in proportion to cluster size — roughly 1 replica per 16 nodes (or per 256 cores, whichever yields more), never fewer than 2.
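To my understanding, the autoscaler's linear mode computes replicas as the larger of cores÷coresPerReplica and nodes÷nodesPerReplica, rounded up and floored at min. A quick shell rendition of that formula, hard-coding the default-params above:

```shell
# Replica count produced by cluster-proportional-autoscaler's "linear"
# mode with coresPerReplica=256, nodesPerReplica=16, min=2:
#   max(ceil(cores/256), ceil(nodes/16), 2)
coredns_replicas() {
  local cores="$1" nodes="$2" cores_per=256 nodes_per=16 min=2
  local by_cores=$(( (cores + cores_per - 1) / cores_per ))  # ceil division
  local by_nodes=$(( (nodes + nodes_per - 1) / nodes_per ))
  local n=$(( by_cores > by_nodes ? by_cores : by_nodes ))
  echo $(( n > min ? n : min ))
}

coredns_replicas 400 100   # a 100-node, 400-core cluster -> 7 replicas
```

So a 100-node cluster gets 7 CoreDNS replicas from the node count, while a small but core-dense cluster scales on cores instead.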
Quick Diagnostic Cheatsheet
| Symptom | Likely Cause | Fix |
|---|---|---|
| All DNS fails | CoreDNS pods down | Check pod status, restart |
| Internal fails, external works | Kubernetes plugin misconfigured | Check Corefile |
| External fails, internal works | Forward config wrong | Fix upstream DNS in Corefile |
| DNS slow but works | ndots:5 causing extra queries | Set ndots:2 or use trailing dot |
| DNS fails intermittently | CoreDNS OOM or overloaded | Scale replicas, increase memory |
| DNS fails for specific pods | NetworkPolicy blocking | Add DNS egress rule |
Monitoring DNS Health
Add this Prometheus alert to catch DNS issues early:
- alert: CoreDNSDown
  expr: absent(up{job="coredns"} == 1)
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "CoreDNS is not responding"
- alert: CoreDNSLatencyHigh
  expr: histogram_quantile(0.99, rate(coredns_dns_request_duration_seconds_bucket[5m])) > 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "CoreDNS P99 latency above 500ms"
Wrapping Up
DNS issues in Kubernetes are frustrating because they affect everything. But they're systematic — follow this checklist:
- Confirm DNS is actually broken with nslookup from a test pod
- Check CoreDNS pods are running and healthy
- Verify the Corefile configuration
- Fix ndots for faster external resolution
- Scale CoreDNS for large clusters
- Ensure NetworkPolicies allow DNS traffic
Want to master Kubernetes networking and troubleshooting with hands-on labs? The KodeKloud Kubernetes course covers CoreDNS, network policies, and real production debugging scenarios. If you're running workloads on cloud, DigitalOcean's managed Kubernetes handles CoreDNS scaling automatically.