Cluster DNS and Service Discovery
Topic 21

Service discovery in Kubernetes is built on DNS. A component called CoreDNS runs in the cluster and resolves Service names to their virtual IPs (ClusterIPs), so a Pod can reach another service by name, such as orders or orders.shop.svc.cluster.local, without ever knowing an IP.

DNS is the glue that makes the flat network usable. Pods come and go and their IPs change; the Service name is stable, and DNS is how everything finds everything else.

CoreDNS in the Cluster

CoreDNS runs as a Deployment, fronted by a Service, and by default the kubelet configures every Pod to use that Service's address as its resolver. When a Pod looks up a Service name, CoreDNS answers with the Service's ClusterIP; from there kube-proxy routes connections to that virtual IP on to a backing Pod. CoreDNS watches the API server, so as Services appear and disappear its records update automatically. It is a critical-path component: when DNS is slow or down, the whole cluster feels broken.
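
One way to see this wiring is to read a Pod's /etc/resolv.conf and look at its nameserver entry. The helper below is a minimal sketch: resolv_nameservers is a hypothetical name, and it only extracts nameserver addresses from resolv.conf-style text on standard input.

```shell
# resolv_nameservers
# Read resolv.conf-style text on stdin and print each nameserver address.
# (Hypothetical helper, for illustration only.)
resolv_nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}
```

Piping the output of kubectl exec <pod> -- cat /etc/resolv.conf into it prints the address the Pod uses as its resolver. On many clusters this matches the ClusterIP of the DNS Service (often named kube-dns in the kube-system namespace), though a setup that adds a node-local DNS cache will show a different address.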

Service DNS Names

Services follow a predictable naming scheme: <service>.<namespace>.svc.cluster.local, where cluster.local is the default cluster domain. Within the same namespace, the short name orders resolves; from another namespace you need orders.shop or the full name. This is also why namespaces are not a network boundary: any Pod can resolve and reach any Service by its fully qualified name unless a NetworkPolicy blocks the traffic.

Resolving a Service by name
# same namespace
curl http://orders/

# another namespace (shop)
curl http://orders.shop/

# fully qualified
curl http://orders.shop.svc.cluster.local/
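
The naming scheme is mechanical enough to build in a script. The function below is a small sketch; svc_fqdn is a hypothetical helper name, and the cluster domain defaults to cluster.local on the assumption that the cluster has not been configured with a different one.

```shell
# svc_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Print the fully qualified DNS name of a Service.
# CLUSTER_DOMAIN defaults to cluster.local (an assumption; clusters can override it).
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}
```

For example, curl "http://$(svc_fqdn orders shop)/" requests the same Service as the fully qualified example above.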

Headless Services and Per-Pod DNS

A normal Service resolves to one virtual IP. A headless Service (clusterIP: None) instead returns the individual Pod IPs as multiple DNS records. Paired with a StatefulSet that names it as its governing Service, it also gives each Pod a stable per-Pod name like db-0.db.shop.svc.cluster.local. This is how clustered systems address specific members rather than a load-balanced pool, which makes it the DNS companion to StatefulSets (Topic 14).
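
Because StatefulSet Pods are named <statefulset>-<ordinal>, the per-Pod DNS names can be listed without querying the cluster. The sketch below assumes ordinals start at 0 and the default cluster.local domain; statefulset_pod_names is a hypothetical helper name.

```shell
# statefulset_pod_names STATEFULSET SERVICE NAMESPACE REPLICAS
# Print the stable per-Pod DNS name of each replica behind a headless Service.
# Assumes ordinals start at 0 and the cluster domain is cluster.local.
statefulset_pod_names() {
  i=0
  while [ "$i" -lt "$4" ]; do
    printf '%s-%s.%s.%s.svc.cluster.local\n' "$1" "$i" "$2" "$3"
    i=$((i + 1))
  done
}
```

Calling statefulset_pod_names db db shop 3 lists the names for db-0 through db-2, the addresses a client would use to reach a specific member.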

Resolution Gotchas

Two issues bite in practice. The ndots option in a Pod's resolv.conf (5 by default) causes any name with fewer dots than that to be tried against each search domain first, so a lookup of an external host can fire several failed queries before succeeding. That is measurable latency at scale, and it is often fixed by writing external hosts as fully qualified names with a trailing dot. And CoreDNS itself must be scaled and monitored: an under-provisioned CoreDNS becomes a cluster-wide bottleneck, since nearly every new connection starts with a name lookup.
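
The search-domain behaviour is easier to reason about when the query order is written out. The function below is a simplified model of a glibc-style stub resolver, not the resolver itself; resolver_candidates is a hypothetical name, and other resolver implementations may order or skip queries differently.

```shell
# resolver_candidates NAME NDOTS DOMAIN...
# Print, one per line, the names a glibc-style resolver would try, in order.
# Simplified model for illustration; real resolvers may differ in detail.
resolver_candidates() {
  name=$1
  ndots=$2
  shift 2

  case $name in
    *.)
      # A trailing dot marks the name as absolute: the search list is skipped.
      printf '%s\n' "$name"
      return 0
      ;;
  esac

  # Count the dots in the name by deleting every other character.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))

  # Enough dots: the name is tried as-is before the search list.
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
  for domain in "$@"; do
    printf '%s.%s.\n' "$name" "$domain"
  done
  # Too few dots: the name as-is is only the last resort.
  if [ "$dots" -lt "$ndots" ]; then
    printf '%s.\n' "$name"
  fi
}
```

With ndots set to 5 and the search list of a Pod in the shop namespace (shop.svc.cluster.local, svc.cluster.local, cluster.local), the name api.example.com yields three search-domain candidates before the real one, while api.example.com. with a trailing dot yields a single query.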

Normal vs headless Service DNS

Normal Service — one DNS name → one virtual IP, load-balanced to Pods. For stateless pools.

Headless Service — one DNS name → the individual Pod IPs, plus stable per-Pod names with a StatefulSet. For addressing specific members.

Common Mistakes
  • Assuming a short Service name resolves across namespaces — it needs the namespace or FQDN.
  • Ignoring ndots latency on external lookups, where short names trigger several failed search-domain queries.
  • Treating namespaces as an isolation boundary: Services in other namespaces still resolve and are reachable, so use NetworkPolicy to restrict traffic.
  • Under-provisioning CoreDNS, turning name resolution into a cluster-wide bottleneck.
  • Expecting per-Pod DNS names from a normal Service; addressing specific Pods requires a headless Service.

Best Practices
  • Use Service names, not IPs, for all in-cluster communication.
  • Use fully qualified names (with a trailing dot) for external hosts to sidestep ndots search-domain latency.
  • Pair StatefulSets with a headless Service when clients must address specific Pods.
  • Scale and monitor CoreDNS as the critical-path component it is.
  • Add NetworkPolicy if you need the namespace boundary to actually restrict traffic.

Related
  • Services: what DNS names resolve to (Topic 08)
  • StatefulSets: rely on headless-Service per-Pod DNS (Topic 14)
  • Cloud private DNS: the external-zone analog

Knowledge Check

What is the fully qualified DNS name of a Service named orders in namespace shop?

  • orders.shop.svc.cluster.local
  • shop.orders.svc.cluster.local
  • orders.cluster.local.shop
  • svc.orders.shop.local

What does a headless Service (clusterIP: None) return from DNS?

  • The individual Pod IPs, enabling stable per-Pod names with a StatefulSet
  • A single stable virtual IP that kube-proxy load-balances across all backing Pods
  • The IP of each node currently hosting a backing Pod
  • Nothing — headless Services are excluded from DNS entirely

Why can external-hostname lookups be slow from inside a Pod?

  • The ndots setting makes short names try several search domains first, firing failed queries before the real one
  • CoreDNS blocks every external lookup by default until an explicit forward zone is added to its Corefile configuration
  • External DNS is disabled by default unless a NetworkPolicy explicitly allows egress to the resolver
  • Pods cache every external record for a full hour, so the very first lookup is delayed
