Cluster DNS and Service Discovery
Service discovery in Kubernetes is DNS. A component called CoreDNS runs in the cluster and resolves Service names to their virtual IPs, so a Pod can reach another service by name — orders or orders.shop.svc.cluster.local — without ever knowing an IP.
It is the glue that makes the flat network usable. Pods come and go and their IPs change; the Service name is stable, and DNS is how everything finds everything else.
CoreDNS in the Cluster
CoreDNS runs as a Deployment, fronted by a Service, and every Pod is configured to use it as its resolver. When a Pod looks up a Service name, CoreDNS answers with the Service's ClusterIP; from there kube-proxy takes over. CoreDNS watches the API server, so as Services appear and disappear its records update automatically. It is a critical-path component — when DNS is slow or down, the whole cluster feels broken.
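To make "every Pod is configured to use it as its resolver" concrete, here is a minimal sketch that parses the kind of resolv.conf a Pod typically receives. The sample contents are illustrative assumptions: the nameserver IP stands in for whatever ClusterIP the DNS Service has in a given cluster, and the helper names `ResolverConfig` and `parse_resolv_conf` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResolverConfig:
    nameservers: list[str] = field(default_factory=list)
    search: list[str] = field(default_factory=list)
    ndots: int = 1  # resolver default when no "options ndots:N" is present


def parse_resolv_conf(text: str) -> ResolverConfig:
    """Parse the subset of resolv.conf directives relevant to cluster DNS."""
    cfg = ResolverConfig()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            cfg.nameservers.append(values[0])
        elif keyword == "search":
            cfg.search = values  # a later "search" line replaces earlier ones
        elif keyword == "options":
            for opt in values:
                if opt.startswith("ndots:"):
                    cfg.ndots = int(opt.split(":", 1)[1])
    return cfg


# Illustrative contents for a Pod in namespace "shop"; the nameserver IP is
# an assumed ClusterIP for the DNS Service and varies by cluster.
SAMPLE = """\
nameserver 10.96.0.10
search shop.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

cfg = parse_resolv_conf(SAMPLE)
print(cfg.nameservers, cfg.search, cfg.ndots)
```

In a real Pod the same function could be fed the contents of /etc/resolv.conf; the nameserver entry should match the ClusterIP of the Service that fronts CoreDNS.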
Service DNS Names
Services follow a predictable naming scheme: <service>.<namespace>.svc.cluster.local. Within the same namespace, the short name orders resolves; from another namespace you need orders.shop or the full name. This is also why namespaces are not a network boundary — any Pod can reach any Service by its fully qualified name unless a NetworkPolicy says otherwise.
# same namespace
curl http://orders/
# another namespace (shop)
curl http://orders.shop/
# fully qualified
curl http://orders.shop.svc.cluster.local/
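The naming scheme can also be expressed as a small helper. This is a sketch, assuming the default cluster domain cluster.local; the function names and the client namespace `billing` are hypothetical and used only for illustration.

```python
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Build <service>.<namespace>.svc.<cluster domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def usable_names(service: str, service_ns: str, client_ns: str,
                 cluster_domain: str = "cluster.local") -> list[str]:
    """Names a Pod in client_ns can use to reach the Service, shortest first."""
    names = [
        f"{service}.{service_ns}",
        service_fqdn(service, service_ns, cluster_domain),
    ]
    # The bare short name only resolves from inside the Service's own namespace.
    if client_ns == service_ns:
        names.insert(0, service)
    return names


print(usable_names("orders", "shop", client_ns="shop"))
print(usable_names("orders", "shop", client_ns="billing"))
```

The second call omits the bare name orders, which reflects the rule that a short name is only resolved relative to the caller's own namespace.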
Headless Services and Per-Pod DNS
A normal Service resolves to one virtual IP. A headless Service (clusterIP: None) instead returns the individual Pod IPs as multiple DNS records, and, paired with a StatefulSet, gives each Pod a stable per-Pod name like db-0.db.shop.svc.cluster.local. This is how clustered systems address specific members rather than a load-balanced pool — the DNS companion to StatefulSets (Topic 14).
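The per-Pod names follow a fixed pattern, which the sketch below generates for a StatefulSet. It assumes the default cluster domain and ordinals starting at 0; `pod_dns_names` is a hypothetical helper, not part of any Kubernetes client library.

```python
def pod_dns_names(statefulset: str, service: str, namespace: str,
                  replicas: int,
                  cluster_domain: str = "cluster.local") -> list[str]:
    """Stable DNS names for StatefulSet Pods behind a headless Service.

    Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# StatefulSet "db" with 3 replicas behind headless Service "db" in "shop".
for name in pod_dns_names("db", "db", "shop", replicas=3):
    print(name)
```

A client of a clustered database could use such a list as its seed or member list, since each name keeps pointing at the same logical member even when the Pod behind it is rescheduled with a new IP.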
Resolution Gotchas
Two issues bite in practice. First, the ndots option in a Pod's resolv.conf (5 by default in Kubernetes) makes the resolver try any name with fewer dots than that against each search domain before trying it as written, so a lookup of an external host can fire several failed queries before succeeding. That is measurable latency at scale, often fixed by using fully qualified names (a trailing dot) for external hosts. Second, CoreDNS itself must be scaled and monitored: an under-provisioned CoreDNS becomes a cluster-wide bottleneck, since nearly every new connection starts with a name lookup.
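The ndots effect is easier to see as a list of the queries a resolver would attempt. The sketch below is a simplified model of glibc-style search-list handling, not an exact reimplementation (other resolvers, such as musl's, differ in detail). The search list shown is what a Pod in namespace shop would typically have, and `api.example.com` is a placeholder external host.

```python
def query_candidates(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the fully qualified names tried, in order, for a lookup.

    Real resolvers stop at the first successful answer; this only lists
    the order of attempts.
    """
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is skipped.
        return [name]
    absolute = name + "."
    via_search = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as written first, then the search list.
        return [absolute] + via_search
    # Too few dots: walk the search list first, then try the name as written.
    return via_search + [absolute]


SEARCH = ["shop.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in query_candidates("api.example.com", SEARCH):
    print(candidate)

print(query_candidates("api.example.com.", SEARCH))
```

With ndots set to 5, the external name has only two dots, so three search-domain queries are attempted and fail before the real one; adding the trailing dot reduces that to a single query.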
Normal Service — one DNS name → one virtual IP, load-balanced to Pods. For stateless pools.
Headless Service — one DNS name → the individual Pod IPs, plus stable per-Pod names with a StatefulSet. For addressing specific members.
Common Mistakes
- Assuming a short Service name resolves across namespaces — it needs the namespace or FQDN.
- Ignoring ndots latency on external lookups, where names with too few dots trigger several failed search-domain queries.
- Treating namespaces as a network isolation boundary: they are not, since any Pod can resolve and reach Services in other namespaces; use NetworkPolicy.
- Under-provisioning CoreDNS, turning name resolution into a cluster-wide bottleneck.
- Relying on Pod DNS names from a normal Service instead of a headless Service for per-Pod addressing.
Best Practices
- Use Service names, not IPs, for all in-cluster communication.
- Use fully qualified names (with a trailing dot) for external hosts to sidestep ndots search-domain latency.
- Pair StatefulSets with a headless Service when clients must address specific Pods.
- Scale and monitor CoreDNS as the critical-path component it is.
- Add NetworkPolicy if you need the namespace boundary to actually restrict traffic.
Knowledge Check
What is the fully qualified DNS name of a Service named orders in namespace shop?
- orders.shop.svc.cluster.local
- shop.orders.svc.cluster.local
- orders.cluster.local.shop
- svc.orders.shop.local
What does a headless Service (clusterIP: None) return from DNS?
- The individual Pod IPs, enabling stable per-Pod names with a StatefulSet
- A single stable virtual IP that kube-proxy load-balances across all backing Pods
- The IP of each node currently hosting a backing Pod
- Nothing — headless Services are excluded from DNS entirely
Why can external-hostname lookups be slow from inside a Pod?
- The ndots setting makes short names try several search domains first, firing failed queries before the real one
- CoreDNS blocks every external lookup by default until an explicit forward zone is added to its Corefile configuration
- External DNS is disabled by default unless a NetworkPolicy explicitly allows egress to the resolver
- Pods cache every external record for a full hour, so the very first lookup is delayed