Kubernetes DNS Resolution Failures — How to Fix CoreDNS Issues
Fix Kubernetes DNS resolution failures caused by CoreDNS misconfigurations, ndots issues, and pod DNS policies. Real troubleshooting scenarios with step-by-step solutions.
Your pods can't resolve any service names. curl my-service.default.svc.cluster.local times out. External DNS lookups fail too. You check the logs and see NXDOMAIN responses everywhere. Welcome to one of the most common — and most frustrating — Kubernetes networking issues: CoreDNS resolution failures.
DNS is the backbone of service discovery in Kubernetes. When it breaks, everything breaks. Let's fix it.
How Kubernetes DNS Works
Before debugging, you need to understand the flow:
- Pod makes a DNS query (e.g., my-service)
- The query goes to the nameserver in the pod's /etc/resolv.conf (usually the CoreDNS ClusterIP, e.g. 10.96.0.10)
- CoreDNS resolves the query using its configured plugins
- The response returns to the pod
Every pod gets a /etc/resolv.conf that looks like this:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
That ndots:5 setting is where most problems start.
Problem 1: The ndots Trap — Slow or Failed External DNS
The Symptom
Resolving external domains (like api.github.com) takes 5-10 seconds instead of milliseconds. Or it fails entirely with NXDOMAIN.
Why It Happens
The ndots:5 default means: if a name has fewer than 5 dots, the resolver appends every search domain before trying the absolute name. So when your pod queries api.github.com (2 dots, fewer than 5), it actually queries:
api.github.com.default.svc.cluster.local → NXDOMAIN
api.github.com.svc.cluster.local → NXDOMAIN
api.github.com.cluster.local → NXDOMAIN
api.github.com. → SUCCESS (finally!)
That's 4 DNS queries instead of 1. Under load, this multiplies into thousands of unnecessary queries hitting CoreDNS.
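You can reproduce this expansion logic outside the cluster. The sketch below is a hypothetical helper (the function name is mine) that mirrors the resolver's search-list behavior for a given name, ndots value, and search list:

```shell
# expand_query: prints the queries a resolver would attempt. Names with
# fewer than ndots dots get each search domain appended before the
# absolute name is tried last.
expand_query() {
  local name=$1 ndots=$2; shift 2
  local only_dots=${name//[!.]/}        # strip everything except the dots
  if (( ${#only_dots} < ndots )); then
    local domain
    for domain in "$@"; do
      printf '%s.%s\n' "$name" "$domain"
    done
  fi
  printf '%s.\n' "$name"                # the absolute name, tried last
}

expand_query api.github.com 5 default.svc.cluster.local svc.cluster.local cluster.local
# → api.github.com.default.svc.cluster.local
#   api.github.com.svc.cluster.local
#   api.github.com.cluster.local
#   api.github.com.
```

Run it again with ndots set to 2 and you get a single query, which is exactly why lowering ndots helps.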
The Fix
Option 1: Add a trailing dot to FQDNs in your code
# In your app config or environment variable
API_ENDPOINT: "api.github.com."   # trailing dot = absolute name, skips the search list

Option 2: Lower ndots in the pod spec
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
  containers:
    - name: app
      image: my-app:latest

Option 3: Set dnsPolicy for external-heavy workloads
spec:
  dnsPolicy: Default   # uses the node's DNS instead of cluster DNS

Warning: dnsPolicy: Default means the pod can't resolve cluster services. Only use this for pods that exclusively talk to external services.
Problem 2: CoreDNS Pods Are CrashLoopBackOff
The Symptom
$ kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS             RESTARTS   AGE
coredns-5644d7b6d9-abcde   0/1     CrashLoopBackOff   15         1h
coredns-5644d7b6d9-fghij   0/1     CrashLoopBackOff   15         1h

Check the Logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

Common error messages:
"Loop detected" — CoreDNS is forwarding to itself:
[FATAL] plugin/loop: Loop (127.0.0.1:53 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting
This happens when the node's /etc/resolv.conf points to 127.0.0.1 (common with systemd-resolved on Ubuntu). CoreDNS forwards to the node's nameserver → which is itself → infinite loop.
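A quick way to check a node for this condition is to look for a loopback nameserver in the file CoreDNS would forward to. A minimal sketch (the helper name and stub path are mine):

```shell
# has_loopback_upstream: succeeds if a resolv.conf-style file lists a
# 127.x.x.x nameserver, i.e. CoreDNS forwarding to it would loop back.
has_loopback_upstream() {
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.' "$1"
}

# Demo against a stub that mimics systemd-resolved's default:
printf 'nameserver 127.0.0.53\n' > /tmp/resolv-stub.conf
if has_loopback_upstream /tmp/resolv-stub.conf; then
  echo "loop risk: upstream nameserver is loopback"
fi
# → loop risk: upstream nameserver is loopback
```

On systemd-resolved hosts, the usual root fix is pointing kubelet's --resolv-conf at /run/systemd/resolve/resolv.conf, the file that contains the real upstream servers; kubeadm-provisioned clusters typically do this automatically.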
Fix: Update CoreDNS Forward Config
kubectl edit configmap coredns -n kube-system

Change the forward directive from:
forward . /etc/resolv.conf
To your actual upstream DNS:
forward . 8.8.8.8 8.8.4.4 {
max_concurrent 1000
}
Or if you want to use your cloud provider's DNS:
# AWS VPC DNS
forward . 169.254.169.253
# GCP metadata DNS
forward . 169.254.169.254
Then restart CoreDNS:
kubectl rollout restart deployment coredns -n kube-system

Problem 3: DNS Lookup Works from Some Pods But Not Others
The Symptom
Pod A can resolve my-service.default.svc.cluster.local fine. Pod B in the same namespace gets NXDOMAIN.
Debug Steps
Step 1: Check the failing pod's resolv.conf
kubectl exec -it pod-b -- cat /etc/resolv.conf

If the nameserver is wrong or search domains are missing, check the pod's dnsPolicy:

kubectl get pod pod-b -o jsonpath='{.spec.dnsPolicy}'

- ClusterFirst (default) — uses CoreDNS. This is what you want.
- Default — uses the node's DNS. Won't resolve cluster services.
- None — uses only what's in dnsConfig. Easy to misconfigure.
Step 2: Test DNS directly from the pod
kubectl exec -it pod-b -- nslookup my-service.default.svc.cluster.local
# or
kubectl exec -it pod-b -- dig my-service.default.svc.cluster.local

If the container doesn't have DNS tools:

kubectl run dns-debug --image=busybox:1.36 --rm -it --restart=Never -- nslookup my-service.default.svc.cluster.local

Step 3: Check if the service actually exists
kubectl get svc my-service -n default
kubectl get endpoints my-service -n default

No endpoints? The service selector doesn't match any pods.
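The underlying rule: endpoints are populated only when every key=value pair in the Service selector is present in a pod's labels. A toy sketch of that subset check (the function and the labels are made up for illustration):

```shell
# selector_matches: succeeds if every key=value pair in $1 (comma-separated
# selector) also appears in $2 (comma-separated pod labels).
selector_matches() {
  local pair
  for pair in $(echo "$1" | tr ',' ' '); do
    case ",$2," in
      *",$pair,"*) ;;            # this pair is present, keep checking
      *) return 1 ;;             # missing pair: no match
    esac
  done
  return 0
}

selector_matches "app=my-service" "app=my-svc,tier=backend" \
  && echo "endpoints populated" || echo "empty endpoints: selector mismatch"
# → empty endpoints: selector mismatch
```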
Problem 4: DNS Timeouts Under Load
The Symptom
DNS works fine normally but starts timing out during peak traffic. Pods get i/o timeout or no such host errors intermittently.
Why It Happens
CoreDNS typically runs only 2 replicas (the kubeadm default). If you have hundreds of pods all making DNS queries, amplified by the ndots search expansion from Problem 1, CoreDNS gets overwhelmed.
Fix: Scale CoreDNS
# Quick fix: increase replicas
kubectl scale deployment coredns -n kube-system --replicas=5

Better fix — use the dns-autoscaler addon:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      containers:
        - name: autoscaler
          image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.9
          command:
            - /cluster-proportional-autoscaler
            - --namespace=kube-system
            - --configmap=dns-autoscaler
            - --target=deployment/coredns
            - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"min":2,"max":10,"preventSinglePointFailure":true}}

Also, add CoreDNS caching to reduce upstream query load:
kubectl edit configmap coredns -n kube-system

.:53 {
    cache 300   # cache responses for up to 5 minutes (capped by the record's TTL)
    ...
}
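For reference, the autoscaler's linear mode computes replicas = max(ceil(cores/coresPerReplica), ceil(nodes/nodesPerReplica)), clamped to [min, max]. A sketch of that formula with the parameters used above (the helper name is mine):

```shell
# replicas_for: the "linear" scaling formula with coresPerReplica=256,
# nodesPerReplica=16, min=2, max=10 (matching the --default-params above).
replicas_for() {
  local cores=$1 nodes=$2
  local by_cores=$(( (cores + 255) / 256 ))   # ceil(cores / 256)
  local by_nodes=$(( (nodes + 15) / 16 ))     # ceil(nodes / 16)
  local r=$(( by_cores > by_nodes ? by_cores : by_nodes ))
  (( r < 2 )) && r=2
  (( r > 10 )) && r=10
  echo "$r"
}

replicas_for 64 40   # 40-node cluster: ceil(40/16) = 3 replicas
replicas_for 8 4     # small cluster: clamped up to the floor of 2
```

So CoreDNS scales with whichever grows faster, node count or total cores, and never drops below 2 replicas.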
Problem 5: Custom Domain Resolution Fails
The Symptom
You need pods to resolve internal corporate domains (e.g., api.internal.company.com) that are hosted on a private DNS server, but CoreDNS returns NXDOMAIN.
Fix: Add Conditional Forwarding
Edit the CoreDNS ConfigMap to forward specific domains to your internal DNS:
kubectl edit configmap coredns -n kube-system

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . 8.8.8.8 8.8.4.4
    cache 30
    loop
    reload
    loadbalance
}

# Conditional forwarding for internal domains
internal.company.com:53 {
    errors
    cache 30
    forward . 10.0.0.2 10.0.0.3   # your internal DNS servers
}
Monitoring CoreDNS
CoreDNS exposes Prometheus metrics on port 9153. Set up monitoring to catch issues before they become outages:
# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: coredns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns
  endpoints:
    - port: metrics
      interval: 15s

Key metrics to alert on:
- coredns_dns_requests_total — total query rate
- coredns_dns_responses_total{rcode="SERVFAIL"} — server failures
- coredns_dns_responses_total{rcode="NXDOMAIN"} — a high NXDOMAIN rate might indicate misconfiguration
- coredns_forward_responses_total{rcode="SERVFAIL"} — upstream DNS failures
- coredns_panics_total — CoreDNS crashes
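As a starting point, an alert on upstream failures might look like this PrometheusRule sketch (the rule name, threshold, and labels are assumptions to tune for your cluster):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: coredns-alerts
  namespace: kube-system
spec:
  groups:
    - name: coredns
      rules:
        - alert: CoreDNSUpstreamFailures
          # more than 1 SERVFAIL/s from upstreams, sustained for 10 minutes
          expr: rate(coredns_forward_responses_total{rcode="SERVFAIL"}[5m]) > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: CoreDNS upstream SERVFAIL rate is elevated
```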
Quick Reference: DNS Debug Checklist
# 1. Check CoreDNS is running
kubectl get pods -n kube-system -l k8s-app=kube-dns
# 2. Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
# 3. Check pod's DNS config
kubectl exec -it <pod> -- cat /etc/resolv.conf
# 4. Test DNS from a debug pod
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
# 5. Check CoreDNS ConfigMap
kubectl get configmap coredns -n kube-system -o yaml
# 6. Check CoreDNS metrics
kubectl port-forward -n kube-system svc/kube-dns 9153:9153
curl localhost:9153/metrics | grep coredns_dns_responses_total

Wrapping Up
DNS issues in Kubernetes are almost always one of these five problems: ndots misconfiguration, CoreDNS loop detection, wrong dnsPolicy, capacity limits, or missing conditional forwarding. The debug checklist above will get you to the root cause in under 5 minutes.
If you're looking to deepen your Kubernetes networking knowledge, KodeKloud's hands-on labs are excellent for practicing these scenarios in a real cluster environment.
Found this helpful? Share it with your team — DNS issues hit everyone eventually.