Topic 21

Cluster DNS and Service Discovery


Service discovery in Kubernetes is DNS. A component called CoreDNS runs in the cluster and resolves Service names to their virtual IPs, so a Pod can reach another service by name — orders or orders.shop.svc.cluster.local — without ever knowing an IP.

It is the glue that makes the flat network usable. Pods come and go and their IPs change; the Service name is stable, and DNS is how everything finds everything else.

CoreDNS in the Cluster

CoreDNS runs as a Deployment, fronted by a Service, and by default the kubelet writes that Service's IP into each Pod's /etc/resolv.conf as its resolver. When a Pod looks up a Service name, CoreDNS answers with the Service's ClusterIP; from there kube-proxy takes over and load-balances the connection to a backing Pod. CoreDNS watches the API server, so as Services appear and disappear its records update automatically. It is a critical-path component: when DNS is slow or down, the whole cluster feels broken.

Service DNS Names

Services follow a predictable naming scheme: <service>.<namespace>.svc.cluster.local, where cluster.local is the default cluster domain. Within the same namespace, the short name orders resolves; from another namespace you need orders.shop or the full name. DNS does not enforce namespace boundaries, which is one reason namespaces are not a network boundary: any Pod can resolve and reach any Service by its qualified name unless a NetworkPolicy blocks the traffic.

Resolving a Service by name

# same namespace
curl http://orders/

# another namespace (shop)
curl http://orders.shop/

# fully qualified
curl http://orders.shop.svc.cluster.local/
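
The naming scheme above can also be sketched in a few lines of Python. This is only an illustration: the helper name service_names is hypothetical, and the cluster domain is assumed to be the default cluster.local.

```python
def service_names(service, namespace, cluster_domain="cluster.local"):
    """Return the three name forms a client can use for a Service.

    Hypothetical helper for illustration; assumes the default cluster domain.
    """
    return {
        # Resolves only from Pods in the same namespace
        "same_namespace": service,
        # Resolves from any namespace via the Pod's search domains
        "cross_namespace": f"{service}.{namespace}",
        # Fully qualified name
        "fqdn": f"{service}.{namespace}.svc.{cluster_domain}",
    }


names = service_names("orders", "shop")
print(names["cross_namespace"])  # orders.shop
print(names["fqdn"])             # orders.shop.svc.cluster.local
```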

Headless Services and Per-Pod DNS

A normal Service resolves to one virtual IP. A headless Service (clusterIP: None) instead returns the individual Pod IPs as multiple DNS records, and, paired with a StatefulSet, gives each Pod a stable per-Pod name like db-0.db.shop.svc.cluster.local. This is how clustered systems address specific members rather than a load-balanced pool — the DNS companion to StatefulSets (Topic 14).
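
As a rough sketch of the per-Pod naming, the snippet below builds the names a StatefulSet's Pods would be expected to get behind a headless Service. The helper statefulset_pod_names is hypothetical, and it assumes the default cluster domain and ordinals that start at 0.

```python
def statefulset_pod_names(statefulset, headless_service, namespace,
                          replicas, cluster_domain="cluster.local"):
    """Build the expected per-Pod DNS names for a StatefulSet.

    Hypothetical helper; assumes Pods are named <statefulset>-<ordinal>
    with ordinals starting at 0, and the default cluster domain.
    """
    suffix = f"{headless_service}.{namespace}.svc.{cluster_domain}"
    return [f"{statefulset}-{i}.{suffix}" for i in range(replicas)]


for name in statefulset_pod_names("db", "db", "shop", replicas=3):
    print(name)
# db-0.db.shop.svc.cluster.local
# db-1.db.shop.svc.cluster.local
# db-2.db.shop.svc.cluster.local
```

A client that needs a specific member, such as a primary replica, would connect to one of these names rather than to the Service name itself.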

Resolution Gotchas

Two issues bite in practice. The ndots setting in a Pod's resolv.conf (5 by default in Kubernetes) causes any name with fewer dots than that to be tried against the search domains first, so a lookup of an external host such as api.example.com can fire several failed queries before succeeding. That is measurable latency at scale, and it is often fixed by writing external hosts as fully qualified names with a trailing dot, which skips the search list. And CoreDNS itself must be scaled and monitored: an under-provisioned CoreDNS becomes a cluster-wide bottleneck, since every connection starts with a name lookup.
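
To make the ndots behaviour concrete, here is a simplified model of the order in which a resolver tries names. It is a sketch, not any resolver's actual implementation: the function candidate_queries is hypothetical, the search list is an assumed typical one for a Pod in the shop namespace, and real resolvers stop at the first name that answers.

```python
# Assumed search list for a Pod in namespace "shop" (illustrative only)
SEARCH = ["shop.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_queries(name, search=SEARCH, ndots=5):
    """List, in order, the names a resolver would try for `name`.

    Simplified model of the search-list rule: a trailing dot means
    "absolute, do not expand"; otherwise names with fewer than `ndots`
    dots go through the search domains before being tried as-is.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


for query in candidate_queries("api.example.com"):
    print(query)
# api.example.com.shop.svc.cluster.local.
# api.example.com.svc.cluster.local.
# api.example.com.cluster.local.
# api.example.com.

print(candidate_queries("api.example.com."))  # ['api.example.com.']
```

In this model the external name only succeeds on the fourth attempt, while the trailing-dot form goes straight to the real query.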

Normal vs headless Service DNS

Normal Service — one DNS name → one virtual IP, load-balanced to Pods. For stateless pools.

Headless Service — one DNS name → the individual Pod IPs, plus stable per-Pod names with a StatefulSet. For addressing specific members.
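
One way to observe the difference from inside a Pod is to ask the system resolver for every address behind a name. The sketch below uses Python's standard socket.getaddrinfo; the helper name resolve_ips is hypothetical, and the Service names mentioned afterwards are the examples from this topic, not names that exist in any particular cluster.

```python
import socket


def resolve_ips(name):
    """Return the sorted, de-duplicated IPs the system resolver gives for `name`."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})


# An IP literal needs no DNS lookup, so this works anywhere
print(resolve_ips("127.0.0.1"))  # ['127.0.0.1']
```

Run inside a Pod, resolve_ips("orders.shop.svc.cluster.local") would be expected to return the single ClusterIP of a normal Service, while the same call against a headless Service name such as db.shop.svc.cluster.local would be expected to return one address per ready Pod.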

Common Mistakes

Assuming a short Service name resolves across namespaces — it needs the namespace or FQDN.
Ignoring ndots latency on external lookups, where names with fewer dots than ndots trigger several failed search-domain queries before the real one.
Treating namespaces as an isolation boundary. Cross-namespace names resolve and connect by default; use NetworkPolicy to restrict traffic.
Under-provisioning CoreDNS, turning name resolution into a cluster-wide bottleneck.
Expecting per-Pod DNS names from a normal Service. Per-Pod addressing needs a headless Service, typically paired with a StatefulSet.

Best Practices

Use Service names, not IPs, for all in-cluster communication.
Use fully qualified names (with a trailing dot) for external hosts to sidestep ndots search-domain latency.
Pair StatefulSets with a headless Service when clients must address specific Pods.
Scale and monitor CoreDNS as the critical-path component it is.
Add NetworkPolicy if you need the namespace boundary to actually restrict traffic.

Related

Services: what DNS names resolve to (Topic 08)
StatefulSets: rely on headless-Service per-Pod DNS (Topic 14)
Cloud private DNS: the external-zone analog

Knowledge Check

What is the fully qualified DNS name of a Service named orders in namespace shop?

orders.shop.svc.cluster.local
shop.orders.svc.cluster.local
orders.cluster.local.shop
svc.orders.shop.local

What does a headless Service (clusterIP: None) return from DNS?

The individual Pod IPs, enabling stable per-Pod names with a StatefulSet
A single stable virtual IP that kube-proxy load-balances across all backing Pods
The IP of each node currently hosting a backing Pod
Nothing — headless Services are excluded from DNS entirely

Why can external-hostname lookups be slow from inside a Pod?

The ndots setting makes short names try several search domains first, firing failed queries before the real one
CoreDNS blocks every external lookup by default until an explicit forward zone is added to its Corefile configuration
External DNS is disabled by default unless a NetworkPolicy explicitly allows egress to the resolver
Pods cache every external record for a full hour, so the very first lookup is delayed
