MTU, Jumbo Frames, and Fragmentation
Topic 10

MTU

The MTU — maximum transmission unit — is the largest payload a link will carry in a single frame, and on standard Ethernet it is 1500 bytes. A packet larger than the MTU of a link it must cross has only two outcomes: it gets fragmented into pieces, or it gets dropped. Neither is free, and the choice between them is the source of some of the most maddening "works for small requests, hangs on large ones" failures in networking.

MTU problems are nasty because they hide. A ping succeeds, a small HTTP GET succeeds, the TCP handshake succeeds — all small packets. Then a large response or a bulk upload silently stalls, because the big packets are exactly the ones that exceed the MTU somewhere on the path and vanish without an error reaching either end. Knowing where the 1500 comes from, and what a tunnel does to it, is what turns these from a multi-hour mystery into a five-minute diagnosis.

The 1500-Byte Default

The 1500-byte Ethernet MTU is the payload available to layer 3 — the IP packet — after the 14-byte Ethernet header. Of that 1500, a standard IPv4 header takes 20 bytes and a TCP header takes another 20, leaving 1460 bytes of application data in a full segment. That 1460 is the default TCP MSS (maximum segment size): MSS is MTU minus the IP and TCP headers, and it is what TCP advertises so the other end never sends a segment too big for the path.
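You can check the MSS arithmetic above by hand. The minimal sketch below assumes IPv4 and TCP headers with no options (20 bytes each). `tcp_mss` is a hypothetical helper name, not a system tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: default TCP MSS for a given MTU.
# Assumes a 20-byte IPv4 header and a 20-byte TCP header (no options).
tcp_mss() {
  local mtu=$1
  local ip_hdr=20 tcp_hdr=20
  echo $(( mtu - ip_hdr - tcp_hdr ))
}

tcp_mss 1500   # standard Ethernet: prints 1460
tcp_mss 9000   # jumbo frames:      prints 8960
```

Real stacks can carry slightly less per segment than this when TCP options such as timestamps are in use, because the options come out of the same 40-byte budget's neighbour, the payload.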

The number is historical. It predates gigabit Ethernet and was a compromise between framing overhead and the cost of a corrupt frame. Today it is the near-universal default, and on the public internet it works as a ceiling rather than a floor. Some paths carry less (PPPoE links, for example, run at 1492), and you cannot assume any path carries more. Anything that needs to fit through arbitrary networks therefore sizes to 1500 or below.

Jumbo Frames

A jumbo frame raises the MTU, typically to 9000 bytes. That cuts the per-packet overhead for bulk transfer roughly sixfold: one header per 9000 bytes instead of one per 1500. The payoff is real for storage traffic (iSCSI, NFS), backups, and east-west data-center flows, where throughput matters and the path is fully under your control.
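The sketch below puts the "roughly sixfold" figure in concrete terms by counting how many full-size TCP segments a bulk transfer needs at each MTU. It uses the same IPv4 assumption (MSS = MTU minus 40). The 1 GB transfer size and the helper name `segments_needed` are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of TCP segments (and so packets and headers)
# needed to move a given number of bytes, assuming MSS = MTU - 40 (IPv4 + TCP).
segments_needed() {
  local bytes=$1 mtu=$2
  local mss=$(( mtu - 40 ))
  # Ceiling division: a partial last segment still costs a full header.
  echo $(( (bytes + mss - 1) / mss ))
}

segments_needed 1000000000 1500   # prints 684932
segments_needed 1000000000 9000   # prints 111608
```

At 9000 bytes the transfer needs about 6.1 times fewer packets. That means fewer headers on the wire and less per-packet processing at each end.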

The catch is that jumbo frames are all-or-nothing across a path. Every switch, NIC, and router on the segment must agree on 9000 bytes; one device still at 1500 will drop the oversized frames, and because the drop is silent it looks like intermittent loss rather than a config mismatch. Enable jumbo frames only on a closed segment where you control every hop — never on a path that touches the internet, where you cannot.

# set and verify an interface MTU (changing it needs root)
sudo ip link set eth0 mtu 9000
ip link show eth0
# eth0: <...> mtu 9000 ...
# check that a full 9000-byte packet crosses the path unfragmented (-M do sets DF)
ping -M do -c 3 -s 8972 10.0.0.5   # 8972 payload + 8 ICMP + 20 IP = 9000
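A single DF-set ping only gives a yes or no for one size. The sketch below runs a binary search over ping sizes to find the largest packet that survives the path. It assumes Linux iputils `ping`, whose `-M do`, `-s`, `-c`, and `-W` options set DF, the payload size, the probe count, and the reply timeout. The function names are hypothetical. A lost reply looks the same as a too-big drop, so treat the result as an estimate and rerun it on a lossy path.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: binary-search the path MTU with DF-set pings.
# Assumes Linux iputils ping; 28 = 20-byte IPv4 header + 8-byte ICMP header.

# Succeeds if an ICMP payload of $2 bytes reaches host $1 without fragmentation.
probe_fits() {
  ping -M do -s "$2" -c 1 -W 1 "$1" >/dev/null 2>&1
}

# Prints the largest IP packet size (payload + 28) that gets through,
# searching up to the MTU given as $2 (default 1500). Assumes tiny pings succeed.
find_path_mtu() {
  local host=$1 max_mtu=${2:-1500}
  local lo=0 hi=$(( max_mtu - 28 )) mid
  while (( lo < hi )); do
    mid=$(( (lo + hi + 1) / 2 ))   # round up so every step shrinks the range
    if probe_fits "$host" "$mid"; then
      lo=$mid                      # this size fits: search higher
    else
      hi=$(( mid - 1 ))            # dropped or rejected: search lower
    fi
  done
  echo $(( lo + 28 ))
}

# Example (needs network access): find_path_mtu 10.0.0.5 9000
```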

IP Fragmentation

When a packet is too big for the next link and fragmentation is allowed, the router splits it into pieces that each fit, and the receiving host — not any router in between — reassembles them. Fragmentation is expensive and discouraged: every fragment carries a full IP header, the loss of any single fragment forces the whole original packet to be retransmitted, and reassembly holds state and memory at the destination, which is an attack surface in itself.
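A small model makes the splitting rules visible. The sketch below assumes IPv4 with a 20-byte header and no options. The hypothetical `fragment_plan` prints the fragments a router would emit. Every fragment except the last carries a multiple of 8 data bytes, because the IPv4 fragment-offset field counts in 8-byte units. The More Fragments (MF) flag is set on every fragment except the last.

```bash
#!/usr/bin/env bash
# Hypothetical model of IPv4 fragmentation: $1 = original packet length
# (header included), $2 = MTU of the next link. Assumes a 20-byte header.
fragment_plan() {
  local total_len=$1 mtu=$2 hdr=20
  local data=$(( total_len - hdr ))          # payload carried by the original packet
  local per_frag=$(( (mtu - hdr) / 8 * 8 ))  # non-final fragments: multiple of 8 bytes
  local offset=0 len mf
  while (( data > 0 )); do
    if (( data > per_frag )); then len=$per_frag mf=1; else len=$data mf=0; fi
    echo "offset=$(( offset / 8 )) len=$len mf=$mf"   # offset in 8-byte units
    offset=$(( offset + len ))
    data=$(( data - len ))
  done
}

fragment_plan 4000 1500
# offset=0 len=1480 mf=1
# offset=185 len=1480 mf=1
# offset=370 len=1020 mf=0
```

Each fragment also carries its own 20-byte header, so the 4000-byte packet becomes 1500 + 1500 + 1040 = 4040 bytes on the wire. Losing any one of the three fragments loses the whole packet.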

The Don't Fragment (DF) bit in the IP header changes the outcome. With DF set, a router that would have to fragment instead drops the packet and sends back an ICMP "fragmentation needed" message telling the sender the link's MTU. Modern TCP sets DF on every packet precisely so it learns the path MTU rather than relying on fragmentation — fragmentation is the legacy fallback, not the default.

A packet too big for the next link: what decides its fate
  • DF bit clear (fragmentation allowed) → Fragment
  • DF set, and the ICMP "fragmentation needed" reply gets back → PMTUD shrinks the packet
  • DF set, but a firewall eats the ICMP reply → Black-hole drop

MTU and Tunnels

Every tunnel wraps the original packet in a fresh outer header, and that overhead comes straight out of the usable MTU. A VXLAN overlay adds about 50 bytes, IPsec roughly 50 to 60 depending on mode and cipher, and GRE 24. On a 1500-byte underlay, the inner packet that fits is therefore only 1450 bytes for VXLAN, about 1440 to 1450 for IPsec, and 1476 for GRE. Send a 1500-byte inner packet into a 1500-byte tunnel and it no longer fits.

This is the most common real-world MTU failure. The fix is to lower the inner MTU to leave room for the encapsulation, or to clamp the TCP MSS on the tunnel so TCP never builds a segment that won't fit once wrapped. The silent-failure trap is a firewall that blocks the ICMP "fragmentation needed" messages: PMTUD then never learns the smaller MTU, the oversized packets keep getting dropped, and you get a black hole where the connection hangs forever with no error — a problem revisited in Chapter 12.
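Working out the inner MTU and a matching MSS clamp before deployment is just subtraction. The minimal sketch below assumes IPv4 inside the tunnel and the approximate overheads quoted above, with IPsec taken at 60 bytes, the upper end of its range. `tunnel_overhead`, `inner_mtu`, and `clamped_mss` are hypothetical names, and `vxlan0` in the comments is a placeholder interface.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: size a tunnel's inner MTU and TCP MSS clamp.
# Assumes IPv4 inside the tunnel (20-byte IP + 20-byte TCP headers).
tunnel_overhead() {
  case $1 in
    vxlan) echo 50 ;;
    gre)   echo 24 ;;
    ipsec) echo 60 ;;   # assumption: upper end of the 50-60 byte range
    *)     echo "unknown tunnel type: $1" >&2; return 1 ;;
  esac
}

inner_mtu()   { echo $(( $1 - $(tunnel_overhead "$2") )); }   # underlay MTU - overhead
clamped_mss() { echo $(( $(inner_mtu "$1" "$2") - 40 )); }    # inner MTU - IP - TCP

for t in vxlan gre ipsec; do
  echo "$t: inner MTU $(inner_mtu 1500 "$t"), MSS $(clamped_mss 1500 "$t")"
done
# vxlan: inner MTU 1450, MSS 1410
# gre: inner MTU 1476, MSS 1436
# ipsec: inner MTU 1440, MSS 1400

# Applying the numbers on Linux might look like (run as root):
#   ip link set dev vxlan0 mtu 1450
#   iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
#            -j TCPMSS --set-mss 1410
```

The commented `ip` and `iptables` lines show one way to apply the result. `iptables` also offers `--clamp-mss-to-pmtu`, which derives the value from the route's MTU instead of using a fixed number.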

Fragmentation vs Path MTU Discovery

Fragmentation is fragment-and-hope: the sender ships a large packet and lets routers chop it to fit each link, reassembled at the destination. It works without any feedback, but it is inefficient, fragile to single-fragment loss, and a denial-of-service vector — which is why it is the legacy fallback, not the default.

Path MTU Discovery is discover-and-size: the sender sets DF on every packet, and any link too small replies with an ICMP "fragmentation needed" carrying its MTU, so the sender shrinks to fit. DF plus PMTUD is the modern default — efficient and fragment-free — but it depends entirely on those ICMP messages getting back, which a careless firewall can break.
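A toy model shows how much PMTUD depends on ICMP. The sketch below describes a path only by its link MTUs. It assumes a sender that retries immediately at whatever MTU an ICMP "fragmentation needed" reply reports. Real TCP also waits on retransmission timers, which is why a black hole shows up as a hang rather than an instant failure. `pmtud_send` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Toy PMTUD model (hypothetical): send a DF-set packet of size $1 across links
# whose MTUs follow. $2 = 1 if ICMP "fragmentation needed" replies reach the
# sender, 0 if a firewall drops them.
pmtud_send() {
  local size=$1 icmp_ok=$2
  shift 2
  local link_mtu shrunk
  while true; do
    shrunk=0
    for link_mtu in "$@"; do
      if (( size > link_mtu )); then
        if (( icmp_ok )); then
          size=$link_mtu   # ICMP reports this link's MTU; sender shrinks and retries
          shrunk=1
          break
        fi
        echo "black hole: dropped at a $link_mtu-byte link, no ICMP"
        return
      fi
    done
    if (( ! shrunk )); then
      echo "delivered at $size bytes"
      return
    fi
  done
}

pmtud_send 1500 1 1500 1450 1500   # prints: delivered at 1450 bytes
pmtud_send 1500 0 1500 1450 1500   # prints: black hole: dropped at a 1450-byte link, no ICMP
```

The loop always ends, because each ICMP reply strictly lowers the packet size.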

Common Mistakes
  • Enabling jumbo frames on only part of a path. One device still at 1500 silently drops every 9000-byte frame, so throughput cratering on bulk transfers looks like random loss rather than the MTU mismatch it is.
  • Running a tunnel without accounting for its overhead. A VXLAN or IPsec wrapper eats 50-plus bytes, so a full 1500-byte inner packet no longer fits the underlay and large flows stall while small ones work fine.
  • Letting a firewall drop ICMP "fragmentation needed" messages. PMTUD goes deaf, the sender never learns the smaller MTU, and oversized packets vanish into a black hole where the connection hangs with no error to point at.
  • Assuming a successful ping proves the path MTU is fine. A default ping sends tiny packets that fit anywhere; the path's real MTU limit only surfaces under large packets, which is why small requests succeed and big ones hang.
  • Lowering an interface MTU on one host without matching the rest of the segment. A unilateral MTU change creates a mismatch that fragments or drops traffic in one direction, producing asymmetric failures that are hard to pin down.
Best Practices
  • Clamp the TCP MSS on tunnel and edge interfaces (MSS = path MTU minus 40) so TCP never builds a segment that won't fit once encapsulated, which sidesteps the whole PMTUD-black-hole class of failure.
  • Compute the effective MTU before deploying a tunnel by subtracting the encapsulation overhead from the underlay MTU, and set the inner interface to that value rather than discovering it after a production stall.
  • Allow ICMP type 3 code 4 ("fragmentation needed"), and for IPv6 the ICMPv6 type 2 "Packet Too Big" message, through every firewall so PMTUD can function. Blocking them is what turns an MTU mismatch into a silent black hole.
  • Enable jumbo frames only on a closed segment where you control every NIC, switch, and router, and verify end to end with a DF-set ping before trusting the path to carry 9000-byte frames.
  • Probe the real path MTU with ping -M do -s and a decreasing size when a connection hangs on large transfers, to find the largest packet that survives end to end before assuming the application is at fault.
Comparable concepts: PMTUD (Topic 68), TCP MSS clamping

Knowledge Check

A connection completes its handshake and small requests, then hangs on a large upload through a tunnel. What is the most likely cause?

  • Tunnel overhead shrank the usable MTU, so only the large packets exceed it and get dropped
  • The destination port is firewalled, blocking the larger request entirely
  • DNS resolution for the destination hostname keeps failing intermittently, but only when the payload being sent is large
  • The link's FCS is rejecting the upload's frames as corrupted

Why are jumbo frames described as all-or-nothing across a path?

  • Any single device still at 1500 silently drops the oversized frames, so the whole path must agree
  • The two endpoints negotiate jumbo-frame support during the TCP handshake and quietly fall back to 1500 if either side declines
  • They are a global internet standard, so every router supports them automatically
  • Routers transparently fragment a jumbo frame down to 1500 wherever needed

What is the role of the Don't Fragment bit in modern TCP?

  • It forces a drop with an ICMP reply when a link is too small, so the sender learns the path MTU
  • It instructs each router along the path to go ahead and fragment the packet, but only when doing so is strictly necessary to fit
  • It marks the packet so its payload cannot be inspected in transit
  • It raises the packet's priority so congested links forward it first