Topic 27

Reliability — ACKs and Retransmission

TCP

TCP's reliability is built from three moving parts, and they are simpler than their reputation. Every byte carries a sequence number, so the receiver can order and de-duplicate. The receiver acknowledges what it has received, so the sender knows what got through. And anything left unacknowledged is retransmitted. That is the whole machine: number it, confirm it, resend what was not confirmed.

What makes reliability feel fast or slow is how quickly TCP notices a loss and resends. Two mechanisms decide that: the retransmission timeout (RTO), which fires when an acknowledgment is overdue, and fast retransmit, which acts on duplicate ACKs before any timer expires. The gap between them is enormous — a timer might be hundreds of milliseconds, fast retransmit recovers in roughly one round trip — and it is exactly the gap a user perceives as a stall.

Number it, confirm it, resend what was not confirmed:
Send seq (bytes 1–5000) → ACK (next = 5001) → Loss detected (3 dup-ACKs / RTO) → Retransmit (resend 1001–2000)

Sequence and Acknowledgment Numbers

TCP numbers bytes, not packets. Each segment's header carries the sequence number of its first byte; the receiver's acknowledgment number is the next byte it expects, which is a cumulative ACK — it confirms everything up to that point and nothing beyond. An ACK of 5001 means "I have bytes 1 through 5000; send me 5001 next," regardless of how many segments those bytes spanned.

Cumulative ACKs are simple but blunt. If bytes 1–1000 and 2001–3000 arrive but 1001–2000 is lost, the receiver can only keep acknowledging 1001; it cannot say "I also have 2001–3000." The sender sees a stuck ACK and, without help, may end up resending everything from 1001 onward, including bytes the receiver already holds. That blind spot is precisely what duplicate ACKs and SACK exist to work around.
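A minimal Python sketch of the receiver side makes the stuck ACK concrete. It assumes received data is tracked as inclusive (start, end) byte ranges and that numbering starts at 1, whereas real TCP starts from a random initial sequence number. The helper name `cumulative_ack` is illustrative, not a kernel API.

```python
def cumulative_ack(received, first_byte=1):
    """Return the next expected byte, given inclusive (start, end) byte ranges."""
    next_expected = first_byte
    # Walk the ranges in order, extending the contiguous prefix.
    for start, end in sorted(received):
        if start > next_expected:
            break  # a gap: nothing past it can be acknowledged cumulatively
        next_expected = max(next_expected, end + 1)
    return next_expected


# All of 1-5000 arrived, split across five 1000-byte segments.
all_in = [(1, 1000), (1001, 2000), (2001, 3000), (3001, 4000), (4001, 5000)]
print(cumulative_ack(all_in))      # 5001

# 1001-2000 was lost; 2001-3000 arrived but cannot be reported.
with_hole = [(1, 1000), (2001, 3000)]
print(cumulative_ack(with_hole))   # 1001
```

However many segments sit beyond the hole, the value returned stays at 1001 until the hole is filled.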

The Retransmission Timeout

The RTO is the fallback: if an ACK for sent data does not arrive within the timeout, TCP assumes the segment was lost and resends it. The timeout is not a constant. TCP continuously estimates the round-trip time and its variance and sets the RTO to the smoothed RTT plus four times the variance (RFC 6298), subject to a floor (1 s in RFC 6298, 200 ms by default on Linux), so a connection on a 10 ms LAN times out far sooner than one on a 200 ms satellite link.
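The estimator can be sketched in Python using the RFC 6298 update rules: a gain of 1/8 for the smoothed RTT, 1/4 for the variance, and a margin of four times the variance. The class name, the assumed 1 ms clock granularity, and the sample RTTs are illustrative. `min_rto` defaults to the RFC's 1 s floor, and the usage passes 0.2 s to mimic Linux's default minimum.

```python
ALPHA = 1 / 8   # gain applied to each new sample in the smoothed RTT
BETA = 1 / 4    # gain applied to each new deviation in the RTT variance
K = 4           # the RTO margin is K times the RTT variance


class RtoEstimator:
    def __init__(self, min_rto=1.0, max_rto=60.0, granularity=0.001):
        self.srtt = None          # smoothed RTT, unset until the first sample
        self.rttvar = None        # RTT variance (mean deviation)
        self.min_rto = min_rto    # RFC 6298 floor is 1 s; Linux uses 200 ms
        self.max_rto = max_rto
        self.granularity = granularity  # assumed 1 ms clock granularity
        self.rto = 1.0            # initial RTO before any measurement

    def on_rtt_sample(self, r):
        if self.srtt is None:
            # The first measurement seeds both estimates.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Update the variance using the old SRTT, then the SRTT itself.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        rto = self.srtt + max(self.granularity, K * self.rttvar)
        # Clamp to the configured floor and ceiling.
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto


lan = RtoEstimator(min_rto=0.2)   # Linux-like 200 ms floor
wan = RtoEstimator(min_rto=0.2)
for sample in (0.010, 0.012, 0.009, 0.011):   # hypothetical LAN RTTs (s)
    lan.on_rtt_sample(sample)
for sample in (0.200, 0.230, 0.190, 0.210):   # hypothetical long-path RTTs (s)
    wan.on_rtt_sample(sample)
print(round(lan.rto, 3), round(wan.rto, 3))   # 0.2 0.407
```

With the 200 ms floor, the LAN connection sits at the minimum, while the jittery 200 ms path settles near 0.41 s. With the RFC's 1 s floor, both would be clamped to 1 s.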

When a retransmission itself is lost, TCP applies exponential backoff, doubling the RTO each time (200 ms, 400 ms, 800 ms, and so on) to avoid hammering a congested path. This is why a connection to a host that has genuinely gone away appears to hang for many seconds, or even minutes, before failing: each doubled timeout stacks on the last. An RTO-driven recovery is the slow path. Seeing many of them means either the loss is severe enough that fast retransmit could not catch it, or too little data was in flight to generate the duplicate ACKs fast retransmit needs.
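The stacking effect is easy to quantify. This sketch assumes Linux-like defaults: a 200 ms initial RTO, doubling on every timeout, a 120 s cap, and `tcp_retries2 = 15` retransmissions before the connection is abandoned. The function name is illustrative. The total, 924.6 s, matches the figure the Linux `ip-sysctl` documentation gives for that default and describes as a lower bound, since the real starting RTO is usually above the floor.

```python
def backoff_schedule(initial_rto=0.2, retries=15, max_rto=120.0):
    """Timeout waited after the original send and after each retransmission."""
    timeouts = []
    rto = initial_rto
    for _ in range(retries + 1):      # original transmission + each retry
        timeouts.append(rto)
        rto = min(rto * 2, max_rto)   # exponential backoff, capped
    return timeouts


schedule = backoff_schedule()
print(schedule[:4])             # [0.2, 0.4, 0.8, 1.6]
print(round(sum(schedule), 1))  # 924.6 seconds, roughly 15 minutes
```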

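```shell
# per-socket retransmit and RTT stats; watch retrans climb on loss
ss -tin
# ... rtt:11.4/4.2 ... cwnd:42 ... retrans:0/137 ...
#   rtt:11.4/4.2    smoothed RTT / RTT variance, in ms
#   cwnd:42         congestion window, in segments
#   retrans:0/137   retransmissions currently outstanding / total for this socket
# aggregate retransmit counters for the whole stack
nstat -az | grep -i retrans
```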
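Raw counters only become meaningful as a rate. Here is a small Python sketch that assumes nstat's usual "name value rate" text layout and the kernel counters `TcpOutSegs` and `TcpRetransSegs`. The sample text is invented for illustration; in practice you would feed in the output of `nstat -az`. The function names are hypothetical.

```python
def parse_nstat(text):
    """Map counter name -> integer value, skipping comment lines like '#kernel'."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        counters[fields[0]] = int(fields[1])
    return counters


def retrans_rate(counters):
    """Fraction of sent segments that were retransmissions."""
    out = counters.get("TcpOutSegs", 0)
    if out == 0:
        return 0.0
    return counters.get("TcpRetransSegs", 0) / out


sample = """#kernel
TcpOutSegs                      200000             0.0
TcpRetransSegs                  300                0.0
"""
print(f"{retrans_rate(parse_nstat(sample)):.2%}")   # 0.15%
```

Because `nstat -a` reports totals since boot, diffing two snapshots taken a minute apart gives a better picture of the current path than the lifetime ratio.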

Fast Retransmit and Duplicate ACKs

Waiting for the RTO is wasteful when only one segment is lost and the rest are arriving. Fast retransmit exploits that. Each segment that arrives after a gap makes the receiver re-send the same cumulative ACK — a duplicate ACK — because it still wants the missing byte. When the sender sees three duplicate ACKs, it treats that as a strong loss signal and retransmits the missing segment immediately, without waiting for the timer.

Three is a deliberate threshold: one or two duplicate ACKs can be caused by simple reordering, where a segment took a different path and arrived late but not lost. Requiring three reduces false retransmissions while still recovering in about one round trip instead of an RTO's hundreds of milliseconds. Fast retransmit is the difference between a glitch you never notice and a visible stall.
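The duplicate-ACK rule and the reordering argument can be simulated together. This sketch makes some simplifying assumptions. Each segment is one numbered unit rather than a byte range. The receiver ACKs every arrival immediately. The sender resends as soon as the third duplicate arrives. The function names are hypothetical.

```python
DUP_ACK_THRESHOLD = 3


def acks_for(arrivals):
    """Cumulative ACK (next expected segment) sent after each arrival."""
    held, next_expected, acks = set(), 1, []
    for seg in arrivals:
        held.add(seg)
        while next_expected in held:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmits(acks):
    """Segments the sender resends when it sees three duplicate ACKs."""
    resent, last_ack, dups = [], None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == DUP_ACK_THRESHOLD:
                resent.append(ack)   # the ACK names the missing segment
        else:
            last_ack, dups = ack, 0  # new data acknowledged: reset the count
    return resent


lost = [1, 3, 4, 5, 6]          # segment 2 dropped
reordered = [1, 3, 2, 4, 5, 6]  # segment 2 merely arrives late
print(acks_for(lost), fast_retransmits(acks_for(lost)))
# [2, 2, 2, 2, 2] [2]
print(acks_for(reordered), fast_retransmits(acks_for(reordered)))
# [2, 2, 4, 5, 6, 7] []
```

The late segment produces only one duplicate ACK, so no retransmission fires. The genuine loss produces three duplicates and triggers a resend of segment 2.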

Selective Acknowledgment

SACK fixes the cumulative ACK's blind spot. With SACK enabled — negotiated in the handshake — the receiver can tell the sender exactly which non-contiguous ranges it already holds: "I have up to 1000, and also 2001 through 5000; I am only missing 1001 through 2000." The sender then retransmits just that one hole instead of everything after it.
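A Python sketch of the exchange described above: the receiver reports the cumulative ACK plus the non-contiguous blocks above it, and the sender derives the holes between them. Ranges are inclusive (start, end) pairs, and the function names are illustrative. The sketch ignores the real option's limit of at most four blocks per ACK (RFC 2018). It also treats only the gaps below the highest SACKed byte as missing.

```python
def merge(ranges):
    """Coalesce overlapping or adjacent inclusive ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ack_and_sack(received, first_byte=1):
    """Receiver side: cumulative ACK plus the out-of-order blocks above it."""
    blocks = merge(received)
    if blocks and blocks[0][0] <= first_byte:
        return blocks[0][1] + 1, blocks[1:]
    return first_byte, blocks


def holes(ack, sack_blocks):
    """Sender side: ranges between the ACK and the SACKed blocks."""
    gaps, cursor = [], ack
    for start, end in sack_blocks:
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    return gaps


received = [(1, 1000), (2001, 3000), (3001, 4000), (4001, 5000)]
ack, sack = ack_and_sack(received)
print(ack, sack)          # 1001 [(2001, 5000)]
print(holes(ack, sack))   # [(1001, 2000)]
```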

On a clean link this rarely matters, but on a lossy or high-bandwidth-delay path it is the difference between a usable connection and a crawling one. Without SACK, several losses in one large window can be repaired only one hole per round trip, or the connection stalls until an RTO forces a go-back-N style resend of the whole window. With SACK, the sender resends only the gaps and keeps the rest of the in-flight data productive. Disabling SACK on a lossy long-fat network is one of the most damaging tunables you can touch.
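The cost gap is simple arithmetic. This sketch compares a go-back-N style resend (everything from the first hole to the end of the window) with SACK's gap-only resend. The numbers are hypothetical: a 100-segment window of 1460-byte segments, with segments 10 and 60 lost.

```python
MSS = 1460   # assumed segment payload size in bytes


def go_back_n_bytes(window, lost):
    """Resend every segment from the first lost one to the end of the window."""
    if not lost:
        return 0
    return (window - min(lost) + 1) * MSS


def sack_bytes(lost):
    """Resend only the segments that were actually lost."""
    return len(lost) * MSS


window, lost = 100, {10, 60}
print(go_back_n_bytes(window, lost))   # 132860 (91 segments)
print(sack_bytes(lost))                # 2920 (2 segments)
```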

Timeout-Based vs Duplicate-ACK Retransmission

Timeout-based retransmission waits for the RTO to expire — the smoothed RTT plus a margin, often hundreds of milliseconds, doubling on repeated loss. It is the safety net that recovers even when ACKs stop entirely, but it is slow and shows up to the user as a hang.

Duplicate-ACK (fast) retransmission fires after three duplicate ACKs, recovering in roughly one round trip without waiting for any timer. It only works when later segments keep arriving to generate the duplicates; combined with SACK it resends only the precise gap, which is why SACK beats plain cumulative ACK on lossy links.

Common Mistakes

Confusing packet loss with high latency. RTO inflation from backoff makes a lossy path look like a frozen one, sending you to debug the application when the real problem is drops on the wire.
Disabling SACK on lossy or high-BDP paths. Without it, multiple losses in a large window are repaired one per round trip or fall back to an RTO and a go-back-N resend of everything after the gap, collapsing throughput exactly where you most need it.
Reading any retransmit count as "the network is down." A small, steady fraction of retransmissions is normal on a busy path. Treating every retransmit as an outage hides the real signal, which is the rate, not the presence.
Setting an aggressive minimum RTO to "recover faster." Too low a floor turns ordinary RTT jitter into spurious retransmissions, adding load to a path that was never actually losing data.
Assuming a cumulative ACK confirms a specific later segment. It only confirms the contiguous prefix; bytes past the first gap are not acknowledged at all until SACK reports them or the hole is filled.

Best Practices

Keep SACK enabled on every path that can be lossy or high-BDP, so each loss costs only its own retransmitted gap rather than a one-hole-per-round-trip repair or a whole window resent.
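Judge retransmissions as a rate, not a raw count: divide TcpRetransSegs by TcpOutSegs from nstat, or watch how fast the retrans counter in ss -ti grows relative to traffic. A rising fraction signals a path problem, while a flat, low fraction is normal.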
Diagnose stalls by separating loss from latency: a stall that lasts about one RTO, or a doubling series of them (200 ms, 400 ms, 800 ms), points at timeout-driven retransmission rather than a slow server.
Leave RTO estimation to the stack rather than pinning a minimum, because the kernel's RTT-and-variance estimate adapts to the path better than a hand-picked floor.
Reach for fast retransmit by keeping data flowing — pipelining enough segments that a loss generates the three duplicate ACKs needed to recover before the timer fires.
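The last practice comes down to counting: a lost segment produces duplicate ACKs only if enough segments arrive behind it. This sketch assumes a single loss, a receiver that ACKs every out-of-order segment immediately, and no new data sent during recovery. It therefore ignores refinements such as Limited Transmit (RFC 3042) and tail loss probes. The function name is hypothetical.

```python
DUP_ACK_THRESHOLD = 3


def recovery_path(in_flight, lost_index):
    """lost_index is the 1-based position of the lost segment in the flight."""
    later_segments = in_flight - lost_index   # each one triggers a duplicate ACK
    if later_segments >= DUP_ACK_THRESHOLD:
        return "fast retransmit"
    return "RTO"


print(recovery_path(10, 2))   # fast retransmit
print(recovery_path(3, 2))    # RTO: only one segment arrives behind the hole
print(recovery_path(10, 9))   # RTO: a loss near the tail of a burst
```

Small writes and losses at the tail of a burst are exactly the cases that fall back to the slow timer.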

Comparable concepts: QUIC (per-stream loss recovery), forward error correction

Knowledge Check

What triggers a fast retransmit, and why is it faster than waiting for the RTO?

Three duplicate ACKs, which let the sender resend the gap in about one RTT
A single duplicate ACK, which on its own immediately proves to the sender that the segment was genuinely lost and must be resent at once
The RTO timer expiring, which then escalates the connection into fast-retransmit mode
A full receive window, which forces the sender to resend its oldest unacknowledged data

On a lossy high-bandwidth path, what does SACK provide over plain cumulative ACKs?

It encrypts the acknowledgments so an attacker cannot forge loss reports
It removes the need for retransmission entirely by reconstructing the lost bytes locally at the receiver
It reports the exact byte ranges already received, so the sender resends only the missing gap
It raises the 64 KB receive-window ceiling so more data can be in flight at once

A connection to a now-unreachable host hangs for many seconds before failing. What explains the delay?

The three-way handshake is stuck endlessly retrying its initial SYN, which is governed by a very long and entirely fixed retry interval
The RTO doubles on each failed retransmission, so backed-off timeouts stack
Fast retransmit keeps resending on every duplicate ACK, looping until a limit is hit
The receiver advertised a zero window, pausing the sender until it reopens
