Topic 32

The Resolution Path

Resolving www.example.com is a walk down the hierarchy, but the client never sees it. Your machine sends one question to its recursive resolver — "give me the address" — and waits for the final answer. Behind that single ask, the resolver does the legwork: it queries the root for who runs .com, queries a .com server for who runs example.com, then queries that authoritative server for the address. One recursive query from you, a chain of iterative referrals from the resolver.

That full walk happens far less often than it sounds, because every step is cached. The first lookup of the day for a name might touch root, TLD, and authoritative servers; the next lookup a second later is answered entirely from the resolver's cache. Caching is why DNS feels instant despite being a multi-hop, multi-server protocol — and why a stale cache is the thing you fight when a record changes.

One recursive ask, three iterative queries behind it:

stub → resolver → root → .com TLD → authoritative → answer

Recursive versus Iterative Queries

The two query modes split the work. A recursive query says "do whatever it takes and bring me the final answer": the resolver must chase the whole chain and may not return a referral. An iterative query says "tell me the best you know": the server answers with either the data or a referral to a server closer to the answer, and the asker follows up itself. Your stub resolver (the OS client) sends one recursive query; the recursive resolver sends a series of iterative queries down the tree, starting at the root.

This is why root and TLD servers stay fast and cheap to run: they only ever answer iterative queries, handing back referrals, never doing recursion for anyone. If every client demanded recursion from the root, the root could not exist at the scale it does. The recursion happens once, at the resolver, and its result is shared across every client behind it.
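
On the wire, the difference between the two modes is a single header bit. The sketch below builds a minimal DNS query by hand to show it; the RD ("recursion desired") bit position and header layout follow RFC 1035, while the function names and the fixed query ID are illustrative assumptions, not part of any library.

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the 16-bit header flags word (RFC 1035)


def build_query(name: str, recursion_desired: bool, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (QTYPE=1, QCLASS=IN)."""
    flags = RD_FLAG if recursion_desired else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def wants_recursion(packet: bytes) -> bool:
    """Read the RD bit back out of a packet's header."""
    (flags,) = struct.unpack("!H", packet[2:4])
    return bool(flags & RD_FLAG)


# What a stub sends to its resolver, and what a resolver sends to root/TLD servers.
recursive = build_query("www.example.com", recursion_desired=True)
iterative = build_query("www.example.com", recursion_desired=False)
print(wants_recursion(recursive), wants_recursion(iterative))
```

The two packets are identical except for that one bit; everything else about the modes is a matter of how the receiving server chooses to respond.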

The Walk, Referral by Referral

Trace a cold lookup. The resolver asks a root server for www.example.com; the root returns the NS records for .com. The resolver asks a .com TLD server; it returns the NS records for example.com. The resolver asks an authoritative server for example.com; it returns the A record with the address. Two referrals, then the answer, and the resolver caches every step along the way.
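
The cold walk can be modeled in a few lines. This is a toy simulation, not a real resolver: the server names (`root`, `tld-com`, `auth-example`), the delegation table, and the address are all made up for illustration, and no packets are sent.

```python
# Toy delegation data: each "server" knows some referrals and some records.
# Every name and the address here are illustrative assumptions.
SERVERS = {
    "root": {"referrals": {"com.": "tld-com"}, "records": {}},
    "tld-com": {"referrals": {"example.com.": "auth-example"}, "records": {}},
    "auth-example": {
        "referrals": {},
        "records": {"www.example.com.": "93.184.216.34"},
    },
}


def iterative_query(server, name):
    """Answer with the best this server knows: data, a referral, or NXDOMAIN."""
    zone = SERVERS[server]
    if name in zone["records"]:
        return ("answer", zone["records"][name])
    for suffix, next_server in zone["referrals"].items():
        if name == suffix or name.endswith("." + suffix):
            return ("referral", next_server)
    return ("nxdomain", None)


def resolve(name, max_hops=10):
    """Walk down from the root, following referrals until an answer or NXDOMAIN."""
    server, path = "root", []
    for _ in range(max_hops):
        path.append(server)
        kind, value = iterative_query(server, name)
        if kind != "referral":
            return kind, value, path
        server = value  # follow the referral one level down
    raise RuntimeError("too many referrals")


kind, address, path = resolve("www.example.com.")
print(path, "->", address)
# prints: ['root', 'tld-com', 'auth-example'] -> 93.184.216.34
```

The path has three entries because the resolver asked three servers: the first two handed back referrals and the third returned the record.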

You can watch the whole walk with dig +trace, which makes dig skip the recursive resolver's recursion and follow the referrals itself from the root, printing each hand-off. It is the single best way to see where a delegation is broken: the trace simply stops at the level whose NS records point nowhere useful.

# +trace makes dig follow the referrals itself, printing each level of the walk
dig +trace www.example.com
# .            NS  a.root-servers.net.        <- ask root
# com.         NS  a.gtld-servers.net.        <- root refers to .com
# example.com. NS  a.iana-servers.net.        <- .com refers to example.com
# www.example.com.  A  93.184.216.34          <- authoritative answer

The Resolver's Cache

Every record the resolver learns is cached for the duration of its TTL, and the cache is keyed at each level. After one client resolves anything under .com, the .com NS records are cached for two days, so the next lookup of any .com name skips the root entirely. Resolve example.com once and a second name under it skips both root and TLD. The walk gets shorter the more the resolver has seen.
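
A minimal sketch of that per-level cache, assuming a hypothetical `ZoneCache` class with an injected clock (`now`, in seconds) so the behavior is easy to follow. The 172800-second TTL is the two days mentioned above; the 3600-second TTL for example.com and the server names are assumptions for illustration.

```python
class ZoneCache:
    """Cache of NS delegations keyed by zone, each with an absolute expiry time."""

    def __init__(self):
        self._entries = {}  # zone -> (nameserver, expires_at)

    def put(self, zone, nameserver, ttl, now):
        self._entries[zone] = (nameserver, now + ttl)

    def closest(self, name, now):
        """Return (zone, nameserver) for the deepest unexpired cached zone
        that encloses name, falling back to the root when nothing is cached."""
        labels = name.rstrip(".").split(".")
        for i in range(len(labels)):
            zone = ".".join(labels[i:]) + "."
            entry = self._entries.get(zone)
            if entry and entry[1] > now:
                return zone, entry[0]
        return ".", "root"


cache = ZoneCache()
cache.put("com.", "tld-com", ttl=172800, now=0)          # learned from the root
cache.put("example.com.", "auth-example", ttl=3600, now=0)  # learned from .com

print(ZoneCache().closest("www.example.com.", now=0))  # cold: start at the root
print(cache.closest("www.example.com.", now=10))       # skips root and TLD
print(cache.closest("mail.other.com.", now=10))        # skips only the root
print(cache.closest("www.example.com.", now=4000))     # example.com expired, .com still cached
```

The lookup starts the walk at the deepest level still in cache, which is why a busy resolver rarely needs to begin at the root.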

This is shared infrastructure working in your favor: a public resolver serving millions of clients keeps the popular branches of the tree hot, so most queries are answered straight from its cache without a single upstream query. The cost is the flip side: a record you just changed may still be served from that cache until its old TTL expires, which the next topic is entirely about.

Negative Caching and NXDOMAIN

Resolvers cache "no" as well as "yes." When an authoritative server answers NXDOMAIN (the name does not exist), the resolver remembers that too, for a duration set by the minimum field of the zone's SOA record (or by the SOA record's own TTL, if that is lower). This negative caching keeps a flood of queries for a typo'd or not-yet-created name from hammering the authoritative servers.

It also burns people. Create a record that someone queried a minute too early, and they keep getting NXDOMAIN until the negative-cache TTL expires, even though the record now exists. Unless you can flush the resolver's cache, how long you wait is dictated by that SOA minimum, not by how fast you added the record. A high negative-cache TTL turns a quick typo correction into a long wait.
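
The timing can be sketched with a small model. `NegativeCache` is a hypothetical class, not a real resolver API, and the SOA values (TTL 3600, minimum 900) are assumed for illustration; the rule that the negative TTL is the lower of the SOA record's TTL and its minimum field comes from RFC 2308.

```python
class NegativeCache:
    """Remembers NXDOMAIN answers until their negative TTL runs out."""

    def __init__(self):
        self._expires = {}  # name -> absolute expiry time in seconds

    def remember_nxdomain(self, name, soa_ttl, soa_minimum, now):
        # RFC 2308: negative TTL is the lower of the SOA TTL and the SOA minimum field.
        self._expires[name] = now + min(soa_ttl, soa_minimum)

    def is_nxdomain(self, name, now):
        expires = self._expires.get(name)
        return expires is not None and now < expires


neg = NegativeCache()

# t=0: a colleague queries a name that does not exist yet.
neg.remember_nxdomain("new.example.com.", soa_ttl=3600, soa_minimum=900, now=0)

# t=60: you create the record on the authoritative server.
# t=120: the resolver still answers NXDOMAIN from its negative cache.
print(neg.is_nxdomain("new.example.com.", now=120))  # True
# t=900: the negative entry has expired, so the next query goes upstream.
print(neg.is_nxdomain("new.example.com.", now=900))  # False
```

Nothing in the model looks at when the record was created; only the expiry computed from the SOA values matters, which is the point of the warning above.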

Recursion vs Iteration

A recursive query demands the final answer: the server must chase the whole chain itself and return the address, not a referral. Your stub resolver sends exactly one of these to its configured recursive resolver. Use it when you want someone else to do the walk.

An iterative query asks for the best a server knows: it returns either the data or a referral to a server closer to the answer, and the asker follows up. Root and TLD servers answer only iteratively, which is what lets them stay simple and survive global query volume.

Common Mistakes

Assuming every lookup hits the root. It does not; once a TLD's NS records are cached, lookups under that TLD skip the root, and once a domain's own NS records are cached, names under it skip the TLD as well.
Ignoring negative-cache TTL after fixing a typo'd record. The NXDOMAIN you got before the fix stays cached for the SOA minimum, so the name keeps appearing broken until that timer expires, not when you save the record.
Pointing an application at an authoritative server expecting recursion. An authoritative-only server answers just for the zones it hosts; for any other name it returns a referral or refuses the query, so the app's lookups fail or stall waiting for a recursion that never comes.
Reading dig's default output and thinking it shows the walk. A plain dig uses your recursive resolver and shows only the final answer; you need dig +trace to see the referrals.
Forgetting that the resolver's cache, not the authoritative record, is what a client sees. The authoritative TTL has already started counting on the resolver from its first fetch, so a "fresh" change can still be served stale.

Best Practices

Debug delegation problems with dig +trace, which follows the referrals itself and stops exactly at the level whose NS records are broken.
Set a deliberately low SOA minimum (negative-cache TTL) while a zone is under active change, so NXDOMAIN answers for not-yet-created names expire quickly instead of pinning for hours.
Point clients at a recursive resolver that serves many users, because its shared cache keeps popular branches hot and answers most queries without leaving the network.
Query the authoritative server directly with dig @ns1.example.com name when a change looks stuck, so you separate "the record is wrong" from "a resolver cache is stale."
Account for negative caching when scheduling a record's creation: create it before anyone queries the name, or expect the NXDOMAIN to linger for the SOA minimum.

Comparable concepts: anycast resolver fleets (8.8.8.8); stub resolver (the OS client)

Knowledge Check

Your stub resolver sends one query and gets the final address back. What query type did it send, and who did the walk?

A recursive query; the recursive resolver did the iterative walk for it
An iterative query, where the stub itself followed each referral up the tree to the answer
A recursive query sent straight to a root server, which answered it
An iterative query; the TLD server recursed on the stub's behalf

You add a record one minute after a colleague queried the name and got NXDOMAIN. They still see NXDOMAIN. What controls how long?

The SOA minimum field, which sets how long NXDOMAIN is negatively cached
The new record's TTL, which began counting down the moment you saved it
A fixed default positive TTL that every resolver everywhere applies to brand-new names
The root server's NS TTL, since the name resolution starts there

After a resolver has served one lookup under .com, why does the next .com lookup skip the root server?

The .com NS records from the root are cached, so it goes straight to the TLD
The resolver permanently hardcodes the TLD's servers into its config after the very first lookup
The root server keeps a session open and pushes future answers
The stub resolver, not the recursive one, takes over the walk
