
Kubernetes DNS Lookup Failing Inside Pods — Fix Guide (2026)

Pod can't resolve service names or external domains? DNS failures inside Kubernetes pods usually trace back to CoreDNS problems, the ndots setting, search domains, or network policies. Here's how to debug and fix each one.

DevOpsBoys · May 1, 2026 · 4 min read

DNS failures inside pods are one of the most frustrating Kubernetes issues because the pod is running fine but can't connect to anything. Here's the systematic fix.


Quick Diagnosis

First, confirm it's a DNS issue:

bash
# Run a debug pod
kubectl run dns-test --image=busybox:1.28 --restart=Never -- sleep 3600
 
# Test DNS resolution
kubectl exec dns-test -- nslookup kubernetes.default
kubectl exec dns-test -- nslookup my-service.my-namespace.svc.cluster.local
kubectl exec dns-test -- nslookup google.com
 
# Check which DNS server the pod is using
kubectl exec dns-test -- cat /etc/resolv.conf

Expected /etc/resolv.conf:

nameserver 10.96.0.10        # CoreDNS ClusterIP
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

If nameserver is wrong, or if nslookup hangs/fails, keep reading.
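
As a quick sanity check, the resolv.conf output can be parsed directly. A small sketch (the sample content is inlined here; in a real cluster, substitute the output of the `cat /etc/resolv.conf` command above):

```shell
# Sample resolv.conf content; replace with the real file's contents in a cluster.
resolv='nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5'

# Extract the first nameserver line
ns=$(echo "$resolv" | awk '/^nameserver/ {print $2; exit}')
echo "nameserver: $ns"

# Confirm the cluster search domains are present
if echo "$resolv" | grep -q 'search .*cluster\.local'; then
  echo "search domains: ok"
else
  echo "search domains: MISSING"
fi
```

If the extracted nameserver doesn't match the kube-dns Service ClusterIP (`kubectl get svc kube-dns -n kube-system`), the kubelet's `--cluster-dns` setting is likely wrong.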


Fix 1: CoreDNS Is Down or Crashing

bash
# Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
 
# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
 
# Restart CoreDNS if it's crashing
kubectl rollout restart deployment/coredns -n kube-system

Common CoreDNS error — "no route to host": This usually means the CoreDNS pods are unhealthy. Check node resources — if nodes are under memory pressure, CoreDNS gets evicted.

bash
# Check CoreDNS resource usage
kubectl top pods -n kube-system -l k8s-app=kube-dns
 
# Check if CoreDNS has enough resources
kubectl describe deployment coredns -n kube-system | grep -A10 "Resources:"

If CoreDNS is OOMKilled, increase its memory limit:

bash
kubectl edit deployment coredns -n kube-system
# Increase memory limit from 170Mi to 300Mi or more
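
If you'd rather avoid an interactive edit, a JSON patch works too. This is a sketch: the container index 0 and the 300Mi value are assumptions, so check your deployment first.

```shell
# JSON patch that bumps the CoreDNS memory limit (assumes the coredns container
# is at index 0 and that 300Mi suits your cluster; verify both before applying).
PATCH='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"300Mi"}]'
echo "$PATCH"

# Apply it (requires cluster access):
# kubectl -n kube-system patch deployment coredns --type=json -p "$PATCH"
```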

Fix 2: Network Policy Blocking DNS

If you have NetworkPolicies, they might be blocking UDP/TCP port 53 to CoreDNS.

bash
# Check if NetworkPolicies exist in your namespace
kubectl get networkpolicy -n your-namespace

Add a DNS egress rule:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: your-namespace
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  # Allow DNS queries to CoreDNS (kube-dns) in kube-system, over both UDP and TCP
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Note that once policyTypes includes Egress, all egress not explicitly allowed is denied, so the selected pods will also need rules for their application traffic.

Fix 3: ndots Setting Causing Slow DNS

options ndots:5 means any name with fewer than 5 dots is first tried with each search domain appended (api.example.com.default.svc.cluster.local, then api.example.com.svc.cluster.local, then api.example.com.cluster.local) before the bare name is queried. For external domains, that's several guaranteed-to-fail lookups before the real one, multiplying latency.

Symptom: Internal service DNS works fine, external DNS is very slow.

Fix — set ndots per pod:

yaml
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"    # reduce from 5 to 2
    - name: single-request-reopen   # glibc option: resend over a fresh socket if parallel A/AAAA replies get dropped

There is no global ndots knob in the CoreDNS ConfigMap: ndots is a client-side resolver option that the kubelet writes into each pod's /etc/resolv.conf. To change behavior cluster-wide, either inject dnsConfig into pod specs (for example via a mutating admission webhook) or enable CoreDNS's autopath plugin, which performs the search-path walk server-side:

bash
kubectl edit configmap coredns -n kube-system
# in the Corefile: add `autopath @kubernetes` and change `pods insecure`
# to `pods verified` (required by autopath)
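
To see where the extra latency comes from, here's a sketch that mimics the resolver's search-path expansion (`expand_queries` is a hypothetical helper; the search list matches the resolv.conf from the Quick Diagnosis section):

```shell
# Print the queries a glibc-style resolver issues for a name, given an ndots
# value and a search list. Mirrors the resolv.conf shown in Quick Diagnosis.
expand_queries() {
  name="$1"; ndots="$2"; shift 2
  # Count the dots in the name
  dots=$(echo "$name" | awk -F. '{print NF-1}')
  if [ "$dots" -lt "$ndots" ]; then
    # Fewer dots than ndots: each search domain is tried before the bare name
    for d in "$@"; do
      echo "query: $name.$d"
    done
  fi
  echo "query: $name"
}

# api.example.com has 2 dots, so with ndots:5 this prints 4 queries;
# with ndots:2 it would go straight to the bare name (1 query).
expand_queries api.example.com 5 \
  default.svc.cluster.local svc.cluster.local cluster.local
```

That's the whole trade-off: lowering ndots to 2 lets typical external names (two or more dots) skip the search walk entirely, while short in-cluster names still resolve through the search domains.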

Fix 4: Wrong Service Name Format

Kubernetes has a specific DNS format. If you're using the wrong format, DNS will fail.

bash
# Within the same namespace: just the service name
curl http://my-service
 
# Cross-namespace: service.namespace
curl http://my-service.other-namespace
 
# Full FQDN (works from anywhere)
curl http://my-service.my-namespace.svc.cluster.local
 
# ExternalName service (maps to external DNS)
curl http://my-external-service  # resolves to external.example.com
bash
# Verify the service exists and has the right name
kubectl get svc -n my-namespace
kubectl get endpoints -n my-namespace my-service
 
# If endpoints are empty, the service selector doesn't match the pods
kubectl describe svc my-service -n my-namespace | grep Selector
kubectl get pods -n my-namespace --show-labels
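
The naming scheme above can be captured in a tiny helper for scripts. A sketch (`svc_fqdn` is a hypothetical shell function, not part of kubectl):

```shell
# Build the full in-cluster DNS name for a Service.
# svc_fqdn is a hypothetical helper; namespace defaults to "default" and the
# cluster domain to "cluster.local", matching a stock cluster.
svc_fqdn() {
  svc="$1"; ns="${2:-default}"; domain="${3:-cluster.local}"
  echo "${svc}.${ns}.svc.${domain}"
}

svc_fqdn my-service my-namespace   # -> my-service.my-namespace.svc.cluster.local
```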

Fix 5: High DNS Load and Cache Misses

bash
# CoreDNS images are distroless (no shell or wget), so port-forward the
# metrics port and query it from your machine instead
kubectl port-forward -n kube-system <coredns-pod> 9153:9153 &
curl -s localhost:9153/metrics | grep coredns_cache

If you see high coredns_cache_misses_total, your DNS is being hammered. Add NodeLocal DNSCache:

bash
# NodeLocal DNSCache reduces DNS load significantly.
# Note: the upstream manifest is a template. Substitute the __PILLAR__DNS__SERVER__,
# __PILLAR__LOCAL__DNS__, and __PILLAR__DNS__DOMAIN__ variables for your cluster
# before applying.
curl -sLO https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
# after substituting the variables:
kubectl apply -f nodelocaldns.yaml

NodeLocal DNSCache runs a caching DNS agent on every node, so most pod queries are answered locally and load on CoreDNS drops substantially.
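
To put the cache counters in context, hits and misses can be turned into a ratio. Sample metric lines are inlined below as an assumption; in a cluster, feed the real `/metrics` output through the same awk instead:

```shell
# Compute a cache hit ratio from CoreDNS Prometheus metrics.
# Sample lines inlined; pipe real /metrics output through the same awk in a cluster.
metrics='coredns_cache_hits_total{server="dns://:53",type="success"} 900
coredns_cache_misses_total{server="dns://:53"} 100'

# Sum the counter values (last field on each matching line)
hits=$(echo "$metrics" | awk '/^coredns_cache_hits_total/ {s+=$NF} END {print s+0}')
misses=$(echo "$metrics" | awk '/^coredns_cache_misses_total/ {s+=$NF} END {print s+0}')

echo "hit ratio: $(( 100 * hits / (hits + misses) ))%"   # -> hit ratio: 90%
```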


Fix 6: CoreDNS ConfigMap Misconfiguration

bash
kubectl get configmap coredns -n kube-system -o yaml

Correct Corefile should look like:

.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}

If forward is pointing to a broken upstream DNS, external DNS will fail. Check that /etc/resolv.conf on the nodes has working DNS servers.
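
To see exactly which upstreams `forward . /etc/resolv.conf` will use, parse the node's resolv.conf the same way CoreDNS does. Sample content is inlined here as an assumption; run this against the real /etc/resolv.conf on a node:

```shell
# List the upstream nameservers CoreDNS's `forward . /etc/resolv.conf` picks up.
# Sample node resolv.conf inlined; use the real file on a node instead.
node_resolv='nameserver 192.168.1.1
nameserver 8.8.8.8'

upstreams=$(echo "$node_resolv" | awk '/^nameserver/ {print $2}')
echo "$upstreams"
```

Each address printed should answer queries from the node (test with `nslookup google.com <address>`); if one is dead, external resolution inside pods stalls or fails.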


Full DNS Debug Script

bash
#!/bin/bash
# Save as dns-debug.sh

NAMESPACE="${1:-default}"
echo "=== CoreDNS Status ==="
kubectl get pods -n kube-system -l k8s-app=kube-dns

echo -e "\n=== CoreDNS Logs (last 20 lines) ==="
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20

echo -e "\n=== DNS Test from debug pod ==="
kubectl run "dns-debug-$$" --image=busybox:1.28 --restart=Never \
  -n "$NAMESPACE" -- sh -c "
  echo '--- /etc/resolv.conf ---'
  cat /etc/resolv.conf
  echo '--- Internal DNS ---'
  nslookup kubernetes.default
  echo '--- External DNS ---'
  nslookup google.com
  " 2>/dev/null

# Wait for the pod to finish, then collect its output and clean up
sleep 5
kubectl logs "dns-debug-$$" -n "$NAMESPACE"
kubectl delete pod "dns-debug-$$" -n "$NAMESPACE" --ignore-not-found

Quick summary:

  • CoreDNS crashing → restart it, check resources
  • NetworkPolicy → allow egress on port 53
  • Slow external DNS → reduce ndots from 5 to 2
  • Wrong service name → use service.namespace.svc.cluster.local
  • High DNS load → deploy NodeLocal DNSCache