Kubernetes DNS Not Working: How to Fix CoreDNS Failures in Production
Pods can't resolve hostnames? Getting NXDOMAIN or 'no such host' errors? Here's how to diagnose and fix CoreDNS issues in Kubernetes step by step.
It starts with a simple error: dial tcp: lookup my-service.default.svc.cluster.local: no such host. Your pods can't talk to each other. Services are unreachable. Everything was working five minutes ago.
DNS failures in Kubernetes are sneaky because they break everything. Every service-to-service call, every external API request, every database connection — all of them depend on DNS working correctly.
Let me walk you through exactly how to find the problem and fix it.
Understanding How Kubernetes DNS Works
Every Kubernetes cluster runs CoreDNS as the cluster DNS server. When a pod needs to resolve a hostname, here's what happens:
- Pod sends the DNS query to the cluster DNS service (usually 10.96.0.10)
- CoreDNS receives the query and looks up the answer
- For internal services: CoreDNS checks the Kubernetes API for matching Service objects
- For external domains: CoreDNS forwards the query to upstream DNS servers
When any part of this chain breaks, your pods lose the ability to resolve names.
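As a rough illustration, the routing decision CoreDNS makes can be sketched in shell: names under the cluster domain are answered from Kubernetes API records, everything else is forwarded upstream. This is a toy model of the logic, not CoreDNS code, and it assumes the default cluster domain cluster.local:

```shell
# Toy model of CoreDNS routing: cluster-domain names go to the
# kubernetes plugin, everything else is forwarded upstream.
CLUSTER_DOMAIN="cluster.local"

route_query() {
  local name="$1"
  case "$name" in
    *."$CLUSTER_DOMAIN"|*."$CLUSTER_DOMAIN".)
      echo "kubernetes-plugin"   # answered from Service/Pod records
      ;;
    *)
      echo "forward-upstream"    # sent to the servers in /etc/resolv.conf
      ;;
  esac
}

route_query "my-service.default.svc.cluster.local"  # kubernetes-plugin
route_query "google.com"                            # forward-upstream
```

Real CoreDNS decides this per zone in the Corefile, which is why a broken `kubernetes` stanza breaks only internal names while a broken `forward` stanza breaks only external ones.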
Step 1 — Verify DNS Is Actually Broken
Don't assume DNS is the problem. Confirm it first:
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
If DNS is working, you'll see:
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
If you get connection timed out or NXDOMAIN, DNS is broken.
Also test external resolution:
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup google.com
If internal works but external fails, the issue is upstream DNS forwarding. If both fail, CoreDNS itself is the problem.
Step 2 — Check CoreDNS Pod Status
kubectl get pods -n kube-system -l k8s-app=kube-dns
You should see CoreDNS pods in Running state:
NAME READY STATUS RESTARTS AGE
coredns-5d78c9869d-abc12 1/1 Running 0 5d
coredns-5d78c9869d-def34 1/1 Running 0 5d
If pods are CrashLoopBackOff or OOMKilled, that's your problem.
Check the logs:
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
Common error patterns:
- plugin/forward: no nameservers found — upstream DNS config is wrong
- HINFO: read udp ... i/o timeout — network policy blocking DNS traffic
- OOMKilled — CoreDNS needs more memory
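A quick way to triage log output against those signatures is a small grep helper (the pattern list below is mine and not exhaustive; the sample log lines are illustrative):

```shell
# Scan CoreDNS log output for the known failure signatures above.
triage_coredns_logs() {
  grep -E -o 'no nameservers found|i/o timeout|OOMKilled' | sort -u
}

# Demo on sample log lines; in a real cluster you would pipe in
# `kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200` instead.
printf '%s\n' \
  '[ERROR] plugin/forward: no nameservers found' \
  '[ERROR] plugin/errors: read udp 10.244.0.5:53: i/o timeout' \
  | triage_coredns_logs
```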
Step 3 — Check CoreDNS ConfigMap
kubectl get configmap coredns -n kube-system -o yaml
A healthy Corefile looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
Check for these common misconfigurations:
- Missing kubernetes plugin — internal names won't resolve
- Wrong forward target — external names won't resolve
- Missing loop detection — CoreDNS can get stuck in a DNS loop and crash
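One hedged way to sanity-check those three items is to pull the Corefile and grep for the expected plugin names. This is a heuristic, not a real Corefile parser:

```shell
# Heuristic Corefile check: verify the kubernetes, forward, and loop
# plugins all appear. Reads the Corefile from stdin.
check_corefile() {
  local corefile missing=""
  corefile=$(cat)
  for plugin in kubernetes forward loop; do
    echo "$corefile" | grep -qw "$plugin" || missing="$missing $plugin"
  done
  if [ -n "$missing" ]; then
    echo "MISSING:$missing"
    return 1
  fi
  echo "OK"
}

# Demo on a minimal Corefile. Against a live cluster you would pipe in:
#   kubectl -n kube-system get cm coredns -o jsonpath='{.data.Corefile}'
check_corefile <<'EOF'
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa
    forward . /etc/resolv.conf
    loop
}
EOF
```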
Step 4 — Fix the ndots Problem
This is the #1 cause of slow DNS resolution that people miss. Check your pod's /etc/resolv.conf:
kubectl exec -it your-pod -- cat /etc/resolv.conf
You'll see something like:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
That ndots:5 means any hostname with fewer than 5 dots gets the search domains appended first. So when your app calls api.example.com (2 dots), Kubernetes DNS tries:
- api.example.com.default.svc.cluster.local — fails
- api.example.com.svc.cluster.local — fails
- api.example.com.cluster.local — fails
- api.example.com — finally works
That's three failed lookups before every successful external call — and each failed name usually fires separate A and AAAA queries, so the waste doubles. Under load, this hammers CoreDNS.
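You can reproduce that query order offline. This small function mimics the resolver's search-list expansion in simplified form (it only models ndots and the search domains, and treats names ending in a dot as already absolute):

```shell
# Simplified model of resolv.conf search-list expansion: if the name has
# fewer than $ndots dots, try each search domain first, then the name
# as-is. A trailing dot marks the name as already fully qualified.
expand_query() {
  local name="$1" ndots="$2"; shift 2
  local search_domains=("$@")
  case "$name" in *.) echo "${name%.}"; return ;; esac
  local dots="${name//[^.]/}"
  if [ "${#dots}" -lt "$ndots" ]; then
    for d in "${search_domains[@]}"; do
      echo "$name.$d"
    done
  fi
  echo "$name"
}

# api.example.com has 2 dots, so with ndots:5 the search list is tried first:
expand_query api.example.com 5 default.svc.cluster.local svc.cluster.local cluster.local
```

Run it with ndots 2, or with a trailing dot on the name, and the search-domain candidates disappear — which is exactly why both fixes below work.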
Fix it by setting ndots in your pod spec:
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"
Or append a trailing dot to external hostnames in your app config: api.example.com. — the trailing dot tells the resolver it's already a fully qualified domain name.
Step 5 — Fix CoreDNS OOMKilled
If CoreDNS is getting OOMKilled in large clusters, increase its resources:
kubectl edit deployment coredns -n kube-system
Update the resources:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "200m"
For clusters with 100+ nodes, you might need 512Mi or more. Also consider enabling the autopath plugin to reduce the number of DNS queries CoreDNS handles.
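If you try autopath, note that per the upstream plugin docs it requires the kubernetes plugin to run in pods verified mode (which uses more memory, since CoreDNS must watch all pods). A sketch of the relevant Corefile fragment — verify the details against your CoreDNS version:

```
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified        # autopath needs verified pod records
        fallthrough in-addr.arpa ip6.arpa
    }
    autopath @kubernetes     # answer search-path expansions server-side
    forward . /etc/resolv.conf
    cache 30
}
```

With autopath enabled, CoreDNS short-circuits the ndots search-path dance from Step 4 on the server side, returning the final answer on the first query.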
Step 6 — Check Network Policies
If you have NetworkPolicies in your cluster, they might be blocking DNS traffic. CoreDNS needs:
- UDP port 53 — standard DNS queries
- TCP port 53 — large DNS responses and zone transfers
Make sure your NetworkPolicy allows egress to kube-dns:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: your-namespace
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
Step 7 — Scale CoreDNS for Large Clusters
Default CoreDNS deployment runs 2 replicas. For large clusters, use the DNS autoscaler:
kubectl get deployment dns-autoscaler -n kube-system
If it doesn't exist, create one:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      containers:
      - name: autoscaler
        image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.9
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=dns-autoscaler
        - --target=deployment/coredns
        - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"min":2}}
This scales CoreDNS replicas in proportion to cluster size — roughly 1 replica per 16 nodes (or per 256 cores, whichever yields more), never fewer than 2.
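To my understanding, the autoscaler's linear mode computes replicas as the larger of cores÷coresPerReplica and nodes÷nodesPerReplica, rounded up and floored at min. A quick shell rendition of that formula, hard-coding the default-params above:

```shell
# Replica count produced by cluster-proportional-autoscaler's "linear"
# mode with coresPerReplica=256, nodesPerReplica=16, min=2:
#   max(ceil(cores/256), ceil(nodes/16), 2)
coredns_replicas() {
  local cores="$1" nodes="$2" cores_per=256 nodes_per=16 min=2
  local by_cores=$(( (cores + cores_per - 1) / cores_per ))  # ceil division
  local by_nodes=$(( (nodes + nodes_per - 1) / nodes_per ))
  local n=$(( by_cores > by_nodes ? by_cores : by_nodes ))
  echo $(( n > min ? n : min ))
}

coredns_replicas 400 100   # a 100-node, 400-core cluster -> 7 replicas
```

So a 100-node cluster gets 7 CoreDNS replicas from the node count, while a small but core-dense cluster scales on cores instead.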
Quick Diagnostic Cheatsheet
| Symptom | Likely Cause | Fix |
|---|---|---|
| All DNS fails | CoreDNS pods down | Check pod status, restart |
| Internal fails, external works | Kubernetes plugin misconfigured | Check Corefile |
| External fails, internal works | Forward config wrong | Fix upstream DNS in Corefile |
| DNS slow but works | ndots:5 causing extra queries | Set ndots:2 or use trailing dot |
| DNS fails intermittently | CoreDNS OOM or overloaded | Scale replicas, increase memory |
| DNS fails for specific pods | NetworkPolicy blocking | Add DNS egress rule |
Monitoring DNS Health
Add this Prometheus alert to catch DNS issues early:
- alert: CoreDNSDown
  expr: absent(up{job="coredns"} == 1)
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "CoreDNS is not responding"
- alert: CoreDNSLatencyHigh
  expr: histogram_quantile(0.99, rate(coredns_dns_request_duration_seconds_bucket[5m])) > 0.5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "CoreDNS P99 latency above 500ms"
Wrapping Up
DNS issues in Kubernetes are frustrating because they affect everything. But they're systematic — follow this checklist:
- Confirm DNS is actually broken with nslookup from a test pod
- Check CoreDNS pods are running and healthy
- Verify the Corefile configuration
- Fix ndots for faster external resolution
- Scale CoreDNS for large clusters
- Ensure NetworkPolicies allow DNS traffic
Want to master Kubernetes networking and troubleshooting with hands-on labs? The KodeKloud Kubernetes course covers CoreDNS, network policies, and real production debugging scenarios. If you're running workloads on cloud, DigitalOcean's managed Kubernetes handles CoreDNS scaling automatically.