Topic 34

Caching, TTLs, and Propagation


Every DNS record carries a TTL, a time-to-live in seconds that tells a resolver how long it may serve the answer from cache before asking again. A record with TTL 300 may be cached for up to five minutes; one with TTL 86400 for up to a day. That single number is the main knob you have, and it sets up a direct trade: how fast a change takes effect, against how much query load your authoritative servers carry and how well clients keep resolving if those servers go down.

The most important thing to internalize is that "DNS propagation" is a myth. Nothing propagates: there is no push and no global broadcast of your change. When you edit a record, the authoritative server has the new value instantly; every resolver that already cached the old value keeps serving it until its cached copy expires. A change "propagates" only in the sense that caches age out, one TTL at a time. Understanding this is what lets you cut over a service without a multi-hour outage window.

Figure: How long a cached answer lives, by TTL. TTL 86400: one day. TTL 3600: one hour. TTL 300: five minutes. Each bar runs from "fetched", through the cache window, to "expires, then refetch".

"Propagation" is just this window expiring: the longer the TTL, the longer a stale answer lingers before a resolver refetches.

The TTL Field

TTL is set per record, in the zone, and it governs caching at every resolver that fetches it. When a resolver gets an answer with TTL 3600, it starts a one-hour timer; until that timer expires it answers from cache without contacting the authoritative server again, and the TTL it hands downstream counts down from whatever is left. A short TTL means clients see changes quickly but every expiry triggers a fresh query; a long TTL means fewer queries but slow change.

There is no universally right value — it depends on the record. A stable address that never moves can sit at TTL 86400 to minimize query load. A record you expect to change, or one fronting a service that may fail over, belongs at TTL 300 or lower so a change is visible within minutes. The mistake is using one TTL for everything and discovering it is wrong only mid-migration.

; zone file: the TTL is the second field, here 3600 seconds (one hour)
api   3600   IN   A   203.0.113.10

# shell: dig shows the *remaining* TTL counting down on a cached answer
dig api.example.com
# api.example.com.  3187  IN  A  203.0.113.10   <- 3187s left in cache
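
The countdown in that dig output can be modeled with a few lines of Python. This is only a sketch of the idea, assuming a single-record cache with an injected clock; `ToyResolverCache` and `CachedAnswer` are hypothetical names, and real resolvers are considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    value: str         # e.g. the A record's address
    ttl: int           # TTL received with the answer, in seconds
    fetched_at: float  # clock time when the answer was cached


class ToyResolverCache:
    """Hypothetical cache for illustration; not how any real resolver is built."""

    def __init__(self):
        self._entries = {}

    def store(self, name, value, ttl, now):
        self._entries[name] = CachedAnswer(value, ttl, now)

    def lookup(self, name, now):
        """Return (value, remaining_ttl), or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        remaining = entry.ttl - int(now - entry.fetched_at)
        if remaining <= 0:
            del self._entries[name]  # expired: the caller must refetch
            return None
        return entry.value, remaining


cache = ToyResolverCache()
cache.store("api.example.com.", "203.0.113.10", ttl=3600, now=0)
print(cache.lookup("api.example.com.", now=413))   # 3187 seconds left
print(cache.lookup("api.example.com.", now=3600))  # None: expired, refetch
```

The remaining TTL, not the original one, is what a downstream client would receive, which is why the number in dig keeps shrinking between runs.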

The "Propagation" Myth

When someone says a DNS change "hasn't propagated yet," what is actually happening is that resolvers are still inside the old record's TTL window. The authoritative servers already serve the new value to anyone who asks fresh. The variation people see — works for me, broken for you — is just that different resolvers fetched the record at different times, so their caches expire at different moments.

This reframing changes how you debug. Instead of "waiting for propagation," you query the authoritative server directly to confirm the record is correct at the source, then query the specific resolver a complaining user is on to see whether it is still serving the old cached answer. For a resolver that honors the TTL, the worst-case wait is the TTL that was in effect when the record was last fetched, no matter how distant the resolver. As a later section covers, though, some clients pin records longer than told.
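
The "works for me, broken for you" effect falls out of simple arithmetic. The sketch below assumes three made-up resolvers that all honor the TTL and last fetched the record at different moments before the change; the names and ages are illustrative only.

```python
OLD_TTL = 3600  # TTL on the record before the change, in seconds

# Seconds before the change at which each (hypothetical) resolver last fetched.
last_fetch_age = {
    "resolver-a": 3500,  # fetched long ago, almost expired
    "resolver-b": 1800,
    "resolver-c": 10,    # fetched just before the change
}


def seconds_until_fresh(age_at_change, ttl=OLD_TTL):
    """How long after the change a TTL-honoring resolver keeps the old answer."""
    return max(ttl - age_at_change, 0)


waits = {name: seconds_until_fresh(age) for name, age in last_fetch_age.items()}
for name, wait in sorted(waits.items(), key=lambda item: item[1]):
    print(f"{name}: stale for another {wait}s")

# No TTL-honoring resolver can wait longer than the old TTL itself.
assert max(waits.values()) <= OLD_TTL
```

Here resolver-a picks up the new value after 100 seconds while resolver-c holds the old one for 3590, even though nothing differs between them except when they last asked.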

Pre-Change TTL Lowering

The technique that makes a clean cutover possible is lowering the TTL before the change, not during it. If a record sits at TTL 86400 and you want to move it, first edit only the TTL down to 300 and wait a full 86400 seconds — one old TTL — so every cache has refetched the record and learned the short value. Now change the address: the worst-case stale window is five minutes, not a day.

After the cutover settles and you are confident in the new target, raise the TTL back up to cut query load. The sequence is lower, wait one old TTL, change, verify, raise — and the planning step everyone skips is the wait. Drop the TTL and immediately change the record and you have gained nothing, because the caches still hold the day-long copy from before.
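
The lower, wait, change sequence can be turned into a schedule by working backward from the cutover time. The following is a minimal sketch; `cutover_plan` is a hypothetical helper, the date is arbitrary, and the last timestamp only holds for caches that honor the TTL.

```python
from datetime import datetime, timedelta


def cutover_plan(cutover_at, old_ttl, low_ttl=300):
    """Return the latest safe time for each step of a planned record change."""
    return {
        # Lower the TTL at least one full old TTL before the change.
        "lower_ttl_by": cutover_at - timedelta(seconds=old_ttl),
        "change_record_at": cutover_at,
        # TTL-honoring caches have dropped the old address by this time.
        "old_answer_gone_by": cutover_at + timedelta(seconds=low_ttl),
    }


plan = cutover_plan(datetime(2030, 1, 10, 12, 0), old_ttl=86400)
for step, when in plan.items():
    print(f"{step}: {when:%Y-%m-%d %H:%M}")
```

With a day-long old TTL, the TTL edit has to land by noon the previous day for a noon cutover; lowering it any later leaves some caches holding the long copy.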

Stale Caches and Resolvers That Ignore TTL

TTL is an instruction, not a guarantee. Some resolvers and clients honor it loosely or pin records far longer than told: corporate resolvers that enforce a minimum TTL, browsers and runtimes that hold their own DNS cache with their own timer, and the occasional misconfigured server that just ignores the field. These are the long-tail clients that keep hitting the old address hours after everyone else moved on.

You cannot fix the long tail with DNS alone, so plan for it. Keep the old target alive and serving — even if only redirecting — well past the nominal TTL, because a fraction of traffic will arrive there long after the cutover. Treating the TTL as a hard cutoff and tearing down the old endpoint the instant it expires is how a migration that "worked" still drops a slice of users.
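
One rough way to size that grace period is to model each client population's effective TTL as the larger of the published TTL and whatever floor it enforces. The sketch below is an assumption-laden illustration: the populations and floor values are invented, and some clients hold an address until restart, which no fixed number covers.

```python
def effective_ttl(record_ttl, min_ttl_floor=0):
    """TTL a cache actually applies when it enforces a minimum (0 means it honors the record)."""
    return max(record_ttl, min_ttl_floor)


RECORD_TTL = 300  # the TTL published in the zone

# Hypothetical client populations; the floors are assumptions, not measurements.
floors = {
    "honors the TTL": 0,
    "resolver with a one-hour floor": 3600,
    "runtime that pins for a day": 86400,
}

windows = {label: effective_ttl(RECORD_TTL, floor) for label, floor in floors.items()}
for label, seconds in windows.items():
    print(f"{label}: may use the old address for up to {seconds}s")

# Keep the old target serving at least as long as the slowest population you model.
keep_old_endpoint_for = max(windows.values())
print(f"keep old endpoint alive for at least {keep_old_endpoint_for}s")
```

In practice the better signal is traffic on the old endpoint itself: keep it up until requests there have actually tailed off, and treat a figure like this only as a lower bound.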

Low TTL vs High TTL

Low TTL (300s or less) makes changes visible within minutes and shortens any failover window, but every expiry sends a fresh query. That means more load on your authoritative servers and less cushion if they go down, because caches empty within minutes. Use it for records you expect to change or fail over.

High TTL (a day or more) is cheap and resilient — few queries, and clients keep resolving even during an authoritative outage — but a change takes that long to clear from caches. Use it for stable records, and lower it ahead of any planned move.
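
The load side of the trade-off is easy to put numbers on. The sketch below assumes a resolver busy enough to refetch the moment each cached copy expires, so it gives an upper bound per resolver per record, not a traffic forecast; `max_refetches_per_day` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86400


def max_refetches_per_day(ttl):
    """Upper bound on upstream queries per day from one busy resolver for one record."""
    return SECONDS_PER_DAY // ttl


for ttl in (10, 300, 3600, 86400):
    print(f"TTL {ttl:>5}: up to {max_refetches_per_day(ttl)} queries/day per resolver")
```

Going from a one-day TTL to five minutes multiplies the ceiling from 1 to 288 queries per resolver per day, and a 10-second TTL raises it to 8640, multiplied again by every resolver that serves your users.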

Common Mistakes

Changing an IP that still has a 24-hour TTL cached. Resolvers keep serving the old address for up to a full day, so a fraction of traffic hits the dead host long after you flipped the record.
Lowering the TTL at the same moment you change the record. The short value only helps for future caching; the caches that already hold the day-long copy keep it until it expires, so you saved nothing.
Setting ultra-low TTLs like 10 on everything. Every cache expiry becomes a fresh query, hammering your authoritative servers and making any brief authoritative outage immediately visible to clients.
Trusting that every resolver honors the TTL. Some pin records longer or enforce their own minimum TTL, so a slice of clients keeps using the old answer well past expiry, and DNS gives you no way to force them.
Tearing down the old target the instant the TTL elapses. The long-tail clients still arriving there get connection failures, turning a clean migration into a trickle of errors nobody can explain.

Best Practices

Lower a record's TTL to 300 at least one full old-TTL period before a planned IP change, so caches have refetched the short value and your cutover window shrinks to minutes.
Raise the TTL back up after a migration settles, so stable records stop generating needless queries and keep resolving through an authoritative outage.
Verify a change by querying the authoritative server with dig @ns1.example.com name first, then the user's resolver, so you separate a wrong record from a stale cache.
Keep the old endpoint serving (or redirecting) for hours past the nominal TTL, because some resolvers and clients pin records far longer than instructed.
Match each record's TTL to how often it changes — long for stable addresses, short for anything fronting a failover — rather than applying one default across the zone.

Comparable concepts: CDN edge TTLs, HTTP Cache-Control

Knowledge Check

A record sits at TTL 86400. You drop the TTL to 300 and change the IP in the same edit. Why is the cutover still slow?

Caches already hold the old answer at 86400s, and the short TTL only affects future fetches
The authoritative server must first push the new TTL out to every resolver in the world before the change can apply anywhere
Resolvers always ignore any TTL below 600 seconds for safety
Changing the IP and TTL together is rejected, so neither edit took effect

A user says your DNS change "hasn't propagated." What is actually happening?

Their resolver is still inside the old TTL window and serving its cached copy
The new record is being actively broadcast worldwide and simply hasn't reached their resolver yet
The root servers must approve the record before resolvers can see it
Your authoritative server is still slowly rolling the change out

After a cutover, a small fraction of traffic still hits the old IP hours later. What is the most likely cause?

Long-tail resolvers and clients that pin the record past its TTL
The authoritative server is still returning the old IP to everyone
The negative cache is keeping the old positive answer alive
The NS records have a high TTL, pinning the delegation
