DNS Resolution

DNS resolution turns a name like api.example.com into an IP address an application can connect to. On Linux this is not one lookup against one server. The C library asks the Name Service Switch, which consults sources in a configured order — local files, then DNS, sometimes more — before any packet leaves the host.

The operational consequence: /etc/resolv.conf is rarely the source of truth on a modern Ubuntu box. It is usually a symlink to a file managed by systemd-resolved or NetworkManager. Editing it by hand works until the next network event, then your changes vanish and resolution breaks in ways that look random.

The resolver path through NSS

A program calls getaddrinfo() in glibc. glibc does not go straight to DNS. It reads /etc/nsswitch.conf and walks the hosts: line left to right. On Debian and Ubuntu that line typically reads files dns or, where the systemd-resolved NSS module (libnss-resolve) is installed, files resolve [!UNAVAIL=return] dns.

Order matters. With files first, an entry in /etc/hosts wins over any DNS record, no matter what the authoritative server says. This is why a stale /etc/hosts line silently shadows production DNS, and why dig — which queries DNS directly and ignores NSS — disagrees with what the application actually resolves.

# the hosts line decides source order
grep ^hosts /etc/nsswitch.conf
# hosts: files resolve [!UNAVAIL=return] dns
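
To make the source order explicit, the hosts: line can be broken into its individual sources. The following is a minimal sketch, not an official tool: the function name nss_hosts_sources is hypothetical, and it assumes the usual nsswitch.conf layout of one hosts: line with space-separated sources and optional bracketed action items.

```shell
#!/bin/sh
# Hypothetical helper: print the lookup sources from the hosts: line of an
# nsswitch.conf-style file, one per line, in the order they are listed.
# Bracketed action items such as [!UNAVAIL=return] are skipped.
nss_hosts_sources() {
    conf="${1:-/etc/nsswitch.conf}"
    awk '
        /^[ \t]*hosts:/ {
            sub(/^[ \t]*hosts:/, "")      # drop the database name
            sub(/#.*/, "")                # drop a trailing comment
            gsub(/\[[^]]*\]/, " ")        # drop [STATUS=action] items
            n = split($0, src, /[ \t]+/)
            for (i = 1; i <= n; i++)
                if (src[i] != "") print src[i]
            exit                          # only the first hosts: line counts here
        }
    ' "$conf"
}

# Usage: nss_hosts_sources            (reads /etc/nsswitch.conf)
#        nss_hosts_sources ./copy.conf
```

If files is printed before dns or resolve, an /etc/hosts entry is consulted before any DNS query is made.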

Why /etc/resolv.conf is managed

The classic resolver reads up to three nameserver lines and a search domain list from /etc/resolv.conf. The file still exists, but on most current systems a daemon owns it. Three managers compete for that role: the older resolvconf package, NetworkManager on desktops, and systemd-resolved on Ubuntu Server since 18.04.

Check what owns the file before touching it. If it points into /run, a daemon regenerates it on every link change, lease renewal, or VPN connect. Hand edits to a generated file are overwritten without warning.

# is resolv.conf real or a symlink to a stub?
ls -l /etc/resolv.conf
# -> ../run/systemd/resolve/stub-resolv.conf
# nameserver 127.0.0.53 means systemd-resolved owns DNS
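
The ownership check can be wrapped in a small function. This is a sketch under stated assumptions: the name resolv_conf_owner is hypothetical, and the path patterns reflect the commonly used locations under /run for each manager, which may differ on a given system.

```shell
#!/bin/sh
# Hypothetical helper: guess which manager owns a resolv.conf by looking at
# its symlink target. The path patterns below are assumptions about typical
# layouts, not a guarantee.
resolv_conf_owner() {
    path="${1:-/etc/resolv.conf}"
    if [ ! -L "$path" ]; then
        echo "regular file"
        return 0
    fi
    target=$(readlink "$path")
    case "$target" in
        */systemd/resolve/*) echo "systemd-resolved" ;;
        */NetworkManager/*)  echo "NetworkManager" ;;
        */resolvconf/*)      echo "resolvconf" ;;
        *)                   echo "unknown manager: $target" ;;
    esac
}

# Usage: resolv_conf_owner            (checks /etc/resolv.conf)
```

A result of "regular file" does not prove the file is safe to edit, since some managers rewrite a regular file in place; it only rules out the symlink case.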

systemd-resolved and the stub at 127.0.0.53

systemd-resolved runs a stub resolver listening on 127.0.0.53:53. Applications send queries there; the daemon forwards them to the real upstream servers it learned from DHCP, Netplan, or static config. This indirection enables split DNS: queries for an internal domain route to a corporate server while everything else goes to a public resolver.

DNS is configured per link, not globally. Each interface carries its own nameservers and search domains, and a VPN can register a routing domain so only matching names use its DNS. Set persistent DNS through Netplan rather than poking the daemon: on netplan apply, Netplan renders the configuration for its backend (systemd-networkd or NetworkManager), which passes the per-link servers to systemd-resolved.

# /etc/netplan/01-netcfg.yaml: DNS per interface
network:
  version: 2
  ethernets:
    eth0:
      nameservers:
        addresses: [10.0.0.2, 1.1.1.1]
        search: [corp.example.com]
# sudo netplan apply  (re-renders the backend config that feeds resolved)

Caching, TTLs, and flushing

systemd-resolved caches positive and negative answers, honoring each record's time-to-live. A record with a 300-second TTL stays cached for up to five minutes; a negative answer (NXDOMAIN) is cached too, with a lifetime derived from the zone's SOA record, so a name that did not exist when you first asked stays "missing" until that negative TTL expires. glibc itself does not cache, so without resolved there is no host-level cache unless you run one, such as dnsmasq or unbound.

Flush with resolvectl flush-caches. Restarting the service also clears the cache but drops in-flight queries. Check the cache counters with resolvectl statistics, which reports cache size, hits, and misses, and check per-link state with resolvectl status, which lists the active upstream servers.

# clear the systemd-resolved cache, no restart
resolvectl flush-caches
# confirm upstream servers and per-link DNS
resolvectl status
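
When checking whether a change has had time to expire from caches, it helps to pull the TTL out of a DNS answer. The sketch below assumes the standard answer-section layout printed by dig +noall +answer (name, TTL, class, type, data); the function name answer_ttls is hypothetical, and the address in the usage comment is a documentation placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read dig-style answer lines on stdin and print
# "name type ttl" for each record. Assumes the column order
# name, TTL, class, type, data; lines starting with ";" are ignored.
answer_ttls() {
    awk 'NF >= 5 && $1 !~ /^;/ { print $1, $4, $2 }'
}

# Usage: dig api.example.com A +noall +answer | answer_ttls
# A line such as "api.example.com. 300 IN A 192.0.2.10" (illustrative data)
# would be reported as "api.example.com. A 300".
```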

Debugging tools that agree with the application

Pick the tool that matches the question. resolvectl query goes through systemd-resolved and honors split DNS and search domains, so it reflects what an app sees. getent hosts goes through NSS, so it also respects /etc/hosts and the nsswitch.conf order. dig talks DNS directly and bypasses both — useful for testing a specific server, misleading for "why can my app not resolve this".

Avoid nslookup for resolver debugging on Linux. It ignores NSS and /etc/hosts, sends its own queries straight to a DNS server, and can report a name as resolvable when the application would fail. On Red Hat systems the picture is similar; there, getent ships in glibc-common and dig in bind-utils.

# what the app sees (NSS-aware, respects /etc/hosts)
getent hosts api.example.com
# resolver-aware, honors split DNS + search domains
resolvectl query api.example.com
# raw DNS to one server, ignores NSS
dig @1.1.1.1 api.example.com A +short
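
When getent and dig disagree, a local hosts entry is the first thing to rule out. The following is a rough sketch that looks a name up in an /etc/hosts-style file; the function name hosts_file_entry is hypothetical, and it reports only the first matching line, which is a simplification of what the files source does.

```shell
#!/bin/sh
# Hypothetical helper: print the address of the first line in a hosts-style
# file that lists the given name (as canonical name or alias).
# Prints nothing if the name is not present.
hosts_file_entry() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v want="$name" '
        { sub(/#.*/, "") }                     # strip comments
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(want)) { print $1; exit }
        }
    ' "$file"
}

# Usage: hosts_file_entry api.example.com
# Any output means the files source answers before DNS is asked.
```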

Common Mistakes

  • Editing /etc/resolv.conf directly when it is a symlink into /run. The managing daemon overwrites it on the next link or lease event, and DNS reverts silently.
  • Trusting nslookup output. It bypasses NSS and /etc/hosts, so it can report success while the application fails on the same name.
  • Forgetting /etc/hosts precedence. With files first in nsswitch, a stale local entry shadows production DNS no matter what the authoritative server returns.
  • Ignoring the search domain list. A bare name like db gets each search suffix appended, and the wrong suffix resolves to the wrong host.
  • Mixing up A and AAAA. A host with a broken IPv6 path but a valid AAAA record times out before falling back, even though the IPv4 A record works.
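  • Assuming a TTL of zero means no caching. A zero TTL on a record says nothing about negative answers: systemd-resolved still caches an NXDOMAIN for a lifetime derived from the zone's SOA record, so the name stays missing until that expires.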
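
The search-domain mistake above is easier to reason about with the candidate names written out. This sketch follows the rule described in resolv.conf(5) for the glibc resolver: a name with fewer dots than ndots (default 1) tries the search suffixes first, otherwise the name is tried as-is first. The function name search_candidates is hypothetical, and systemd-resolved applies its own, narrower rules to search domains.

```shell
#!/bin/sh
# Hypothetical helper: list the names a glibc-style resolver would try,
# in order, for a given name, ndots value, and search list.
# Usage: search_candidates NAME NDOTS SUFFIX...
search_candidates() {
    name="$1"
    ndots="$2"
    shift 2
    case "$name" in
        *.) echo "$name"; return 0 ;;   # trailing dot: absolute, no search
    esac
    # Count the dots in the name.
    dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')
    if [ "$dots" -ge "$ndots" ]; then echo "$name"; fi
    for suffix in "$@"; do
        echo "$name.$suffix"
    done
    if [ "$dots" -lt "$ndots" ]; then echo "$name"; fi
}

# Usage: search_candidates db 1 corp.example.com example.com
```

With that usage line, db.corp.example.com is tried first, so a record of that name in the corporate zone wins over any db host in example.com.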

Best Practices

  • Query with resolvectl query <name> to see what the application sees, including split DNS and search-domain expansion.
  • Check the hosts: line in /etc/nsswitch.conf before debugging. The source order explains most "DNS works but the app disagrees" cases.
  • Set DNS through Netplan or systemd-resolved, then run netplan apply. Never hand-edit a generated /etc/resolv.conf.
  • Read the record TTL with dig name +noall +answer, whose answer section prints the live TTL, before assuming a change has propagated.
  • Use getent hosts for NSS-aware lookups when you need the answer that respects /etc/hosts and source order.
  • Flush correctly with resolvectl flush-caches instead of restarting the service, which drops in-flight queries.
  • Confirm per-link state with resolvectl status after a VPN connects, so you know which interface owns which domains.

Comparable tools

  • Windows (ipconfig /flushdns, nslookup)
  • macOS (scutil --dns, dscacheutil)
  • BSD (resolv.conf, no systemd)

Knowledge Check

An application cannot resolve a name, but dig returns the correct address. What is the most likely cause?

  • NSS source order or /etc/hosts is overriding what the app resolves, while dig queries DNS directly and bypasses both
  • The authoritative DNS server for the zone is down, so only dig's locally cached copy of the record is still able to answer the query
  • dig caches answers in a place the application cannot read
  • The record's TTL is too long for the application's own timeout

Why is hand-editing /etc/resolv.conf on a default Ubuntu Server unreliable?

  • It is usually a symlink to a generated stub file that a daemon rewrites on the next link change or DHCP lease
  • The file is read-only and the edit cannot be saved at all
  • glibc ignores the file entirely on modern kernels
  • Only the search and domain lines are honored, while any nameserver directive you add is silently dropped by the resolver library

Which tool reflects what an application actually sees, including /etc/hosts and the nsswitch source order?

  • getent hosts
  • nslookup
  • dig @127.0.0.53
  • host -a

What does systemd-resolved cache that surprises operators expecting a name to resolve immediately after it is created?

  • Negative answers such as NXDOMAIN, held until their own TTL expires
  • Only IPv4 A records, while AAAA records are fetched fresh from the upstream server on every single lookup
  • Records with a TTL of zero, kept indefinitely
  • Nothing — glibc performs all host-level caching
