Topic 67

Latency, Jitter, Loss, and Bufferbloat

Latency

Three numbers describe the quality of a path, and conflating them is the single most common reason an engineer tunes the wrong thing. Latency is delay — how long one packet takes to cross. Jitter is the variation in that delay from packet to packet. Loss is the fraction of packets that never arrive. They are independent: a link can have low latency and high jitter, or zero loss and terrible latency, and each one breaks a different class of application.

The trap that wastes the most time is bufferbloat — oversized router queues that hoard packets rather than dropping them, so latency stays low at idle and explodes the instant the link is loaded. It looks like a fast connection that mysteriously goes slow under use, and it is invisible to any test run on an idle link. Measuring the right number, under load, against the application that actually hurts, is what this topic is about.

Latency, Jitter, and Loss

Latency has a floor set by physics, roughly 5 ms per 1000 km of fiber each way, with queuing and processing delay on top. It bounds anything chatty: a protocol that does ten sequential round trips pays the RTT ten times, so a 100 ms RTT turns a "fast" operation into a one-second one regardless of bandwidth. Jitter is the spread of that latency, usually reported as its standard deviation or, for RTP media, as the smoothed difference in delay between consecutive packets defined in RFC 3550. A path averaging 30 ms with 2 ms jitter is smooth; the same average with 40 ms jitter is unusable for anything real-time.
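A minimal Python sketch of that arithmetic, assuming light travels through fiber at about 200,000 km/s (which is where the 5 ms per 1000 km figure comes from). The function names are illustrative, and the result is only a floor: real fiber routes are longer than the map distance, and queuing adds more on top.

```python
# Assumption: light in fiber covers ~200 km per millisecond (~5 ms per 1000 km one way).
FIBER_KM_PER_MS = 200.0


def propagation_rtt_ms(route_km: float) -> float:
    """Lower bound on round-trip time for a fiber route of route_km each way."""
    one_way_ms = route_km / FIBER_KM_PER_MS
    return 2 * one_way_ms


def sequential_cost_ms(round_trips: int, rtt_ms: float) -> float:
    """Time spent waiting on the network when round trips happen one after another."""
    return round_trips * rtt_ms


if __name__ == "__main__":
    # 1000 km each way: ~5 ms one way, so a ~10 ms RTT before any queuing.
    print(propagation_rtt_ms(1000))     # 10.0
    # Ten sequential round trips at 100 ms RTT cost a full second, at any bandwidth.
    print(sequential_cost_ms(10, 100))  # 1000
```

Bandwidth appears nowhere in `sequential_cost_ms`, which is the point: only fewer round trips or a shorter path reduce it.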

Loss is the cruelest of the three for bulk transfers, because TCP reads loss as congestion and slows down hard. A path with 1% loss does not lose 1% of throughput; it can lose half or more, because every dropped packet has to be retransmitted and each loss event cuts the congestion window. Map each metric to its victim: latency hurts chatty request/response protocols, jitter wrecks real-time media, and loss strangles bulk throughput.

How Loss Crushes TCP Throughput

The relationship is mathematical, not vague. For a single Reno-style TCP flow, throughput is bounded roughly by MSS / (RTT × √p), where p is the packet loss rate, times a constant of about 1.22: the Mathis formula. The square root is the sting: cutting the loss rate by a factor of four only doubles throughput, and raising loss from 0.01% to 1% cuts the ceiling by a factor of ten. Worse, RTT is in the denominator too, so the same loss rate hurts a long-distance link far more than a local one.
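A Python sketch of the Mathis bound, using the commonly quoted constant C = √(3/2) ≈ 1.22 and illustrative inputs (1460-byte MSS, 50 ms RTT). Treat the output as a rough ceiling for one loss-based flow, not a prediction of what a given transfer will achieve.

```python
import math

MATHIS_C = math.sqrt(3 / 2)  # ~1.22, the commonly quoted Mathis constant


def mathis_ceiling_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Rough upper bound, in bits/s, on one loss-based TCP flow's throughput."""
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be between 0 and 1 (exclusive)")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


if __name__ == "__main__":
    for loss in (0.0001, 0.01):  # 0.01% and 1% loss
        ceiling = mathis_ceiling_bps(1460, 0.050, loss)
        # Prints ~28.6 Mbit/s at 0.01% loss and ~2.9 Mbit/s at 1% loss.
        print(f"loss {loss:.2%}: ~{ceiling / 1e6:.1f} Mbit/s")
```

A hundredfold increase in loss cuts the ceiling tenfold, and doubling the RTT halves it, exactly as the square root and the denominator predict.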

# iperf3 over a path with mild loss — note the retransmits
iperf3 -c host -t 10
# [ID] Interval        Transfer   Bitrate        Retr  Cwnd
# [ 5] 0.00-10.0 sec   112 MBytes  94.1 Mbits/sec  318  1.41 MBytes
# on a 1 Gbps link, 318 retransmits point to loss capping the window; bandwidth is not the limit

This is why "add more bandwidth" so often fails to fix a slow transfer. If the bottleneck is loss, a fatter pipe does nothing — the window keeps collapsing on every drop, and the average sits well below the link rate. The fix is to find and fix the loss, or switch to a congestion-control algorithm like BBR that does not treat every drop as a stop sign. Chasing bandwidth for a loss problem is throwing money at the wrong number.
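To see why a fatter pipe does nothing, model a loss-based flow as getting the smaller of link capacity and its Mathis ceiling. This is a simplification assuming Reno-style congestion control with illustrative values (1460-byte MSS, 20 ms RTT); BBR does not follow this model, which is why switching algorithms can help.

```python
import math


def mathis_ceiling_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Rough loss-imposed ceiling for one Reno-style TCP flow, in bits/s."""
    return (mss_bytes * 8 / rtt_s) * (math.sqrt(1.5) / math.sqrt(loss_rate))


def expected_throughput_bps(link_bps: float, mss_bytes: int,
                            rtt_s: float, loss_rate: float) -> float:
    """A loss-based flow gets whichever is smaller: link capacity or the loss ceiling."""
    return min(link_bps, mathis_ceiling_bps(mss_bytes, rtt_s, loss_rate))


if __name__ == "__main__":
    mss, rtt = 1460, 0.020  # illustrative: 1460-byte MSS, 20 ms RTT
    for link_bps, loss in ((1e9, 0.001), (2e9, 0.001), (1e9, 0.0001)):
        rate = expected_throughput_bps(link_bps, mss, rtt, loss)
        print(f"{link_bps / 1e9:.0f} Gbit/s link, {loss:.2%} loss: ~{rate / 1e6:.1f} Mbit/s")
```

Doubling the link from 1 to 2 Gbit/s leaves the result unchanged (about 22.6 Mbit/s), while cutting loss tenfold lifts it to about 71.5 Mbit/s.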

Jitter and Real-Time Media

Real-time media — voice, video, gaming — cares more about jitter than raw latency. A VoIP call at a steady 120 ms one-way delay is perfectly usable; the same call averaging 60 ms but swinging between 20 ms and 180 ms is choppy and broken. The reason is the jitter buffer: the receiver holds arriving packets briefly to smooth out variation, and high jitter forces a deeper buffer that adds delay, or a shallow one that drops late packets as if they were lost.
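A toy model of a fixed-delay jitter buffer in Python, assuming each packet is scheduled to play a fixed playout delay after it was sent, so any packet whose one-way delay exceeds that budget misses its slot. The delay lists are made-up values shaped like the example above (a steady 120 ms versus an average of 60 ms swinging between 20 and 180 ms); real jitter buffers adapt their depth, which this sketch does not.

```python
def late_fraction(one_way_delays_ms: list[float], playout_delay_ms: float) -> float:
    """Share of packets that arrive after their playout slot (effectively lost)."""
    if not one_way_delays_ms:
        raise ValueError("need at least one delay sample")
    late = sum(1 for d in one_way_delays_ms if d > playout_delay_ms)
    return late / len(one_way_delays_ms)


steady = [120.0] * 10
swinging = [20.0, 180.0, 30.0, 20.0, 40.0, 120.0, 20.0, 30.0, 60.0, 80.0]  # mean 60 ms

if __name__ == "__main__":
    print(late_fraction(steady, 130))    # 0.0: a shallow buffer is enough
    print(late_fraction(swinging, 70))   # 0.3: shallow buffer, 30% effectively lost
    print(late_fraction(swinging, 190))  # 0.0: nothing lost, but every packet waits 190 ms
```

The swinging path has the lower average, yet it must either drop 30% of its packets or run a deeper buffer than the steady path ever needs.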

So jitter converts into either added latency or effective loss, whichever the buffer is tuned to trade. This is why a connection that benchmarks well on average bandwidth and average ping can still sound terrible on a call — the average hides the variation, and the variation is exactly what the application cannot absorb. For real-time workloads, measure the spread, not the mean.
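A Python sketch of measuring the spread, comparing two illustrative sample sets (not measurements) that share a 30 ms mean. It reports the standard deviation and a running interarrival-jitter estimate modeled on RFC 3550 section 6.4.1, which smooths the absolute change in transit time between consecutive packets with a gain of 1/16.

```python
import statistics


def rfc3550_jitter(transit_ms: list[float]) -> float:
    """Running interarrival-jitter estimate in the style of RFC 3550, section 6.4.1.

    transit_ms[i] is packet i's arrival time minus its send timestamp. A constant
    clock offset cancels out, because only consecutive differences are used.
    """
    j = 0.0
    for prev, cur in zip(transit_ms, transit_ms[1:]):
        d = abs(cur - prev)
        j += (d - j) / 16  # exponential smoothing with gain 1/16
    return j


smooth = [29, 31, 30, 28, 32, 30, 29, 31]  # illustrative samples, mean 30 ms
bumpy = [5, 70, 10, 65, 5, 55, 15, 15]     # illustrative samples, also mean 30 ms

if __name__ == "__main__":
    for name, samples in (("smooth", smooth), ("bumpy", bumpy)):
        print(name,
              "mean", statistics.mean(samples),
              "stdev", round(statistics.pstdev(samples), 1),
              "jitter", round(rfc3550_jitter(samples), 1))
```

Both paths print the same mean; only the spread columns reveal which one a call can survive.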

Bufferbloat and Active Queue Management

Bufferbloat is what happens when network gear ships with queues sized for throughput instead of latency. When the link saturates, packets pile into a giant buffer instead of being dropped, and TCP — which relies on loss as its congestion signal — never gets told to slow down until the buffer is enormous and full. The result is latency that climbs from 20 ms idle to 500 ms or more under load, while a naive idle ping test shows everything is fine.
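The arithmetic behind that climb is short: a packet joining the back of a full FIFO waits for everything ahead of it to drain at the link rate. A Python sketch with a hypothetical 1 MB buffer shows why the same buffer is harmless on a fast link and ruinous on a slow uplink:

```python
def queue_delay_ms(queued_bytes: int, link_bps: float) -> float:
    """Wait for a packet that joins the back of a FIFO holding queued_bytes.

    The queue drains at the link rate, so the delay is simply bits / rate.
    """
    return queued_bytes * 8 * 1000 / link_bps


if __name__ == "__main__":
    buffer_bytes = 1_000_000  # hypothetical 1 MB buffer in a home router
    print(queue_delay_ms(buffer_bytes, 10e6))  # 10 Mbit/s uplink: 800.0 ms
    print(queue_delay_ms(buffer_bytes, 1e9))   # 1 Gbit/s link: 8.0 ms
```

Because TCP only slows down once that buffer overflows, a loaded slow link sits near the full 800 ms rather than near its idle RTT.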

The fix is active queue management: algorithms like CoDel and fq_codel that watch how long packets sit in the queue and drop or mark them early, signaling TCP to back off before the buffer bloats. CoDel targets a standing queue delay of about 5 ms; fq_codel adds fair queuing so one bulk flow cannot starve an interactive one. The way to catch bufferbloat is to ping continuously while running a heavy download — if the RTT jumps by hundreds of milliseconds under load, the queue is bloated and AQM is the cure.
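A heavily simplified Python sketch of CoDel's drop decision, loosely following the control law in RFC 8289: once packets have spent longer than the 5 ms target in the queue for a whole 100 ms interval, drop one, then keep dropping at gaps that shrink as interval / √count until queue delay falls back under target. The class and method names are hypothetical, and the real algorithm also tracks queue byte counts, ECN marking, and count reuse between dropping episodes, all omitted here. fq_codel applies CoDel separately to each flow queue.

```python
import math

TARGET_S = 0.005    # CoDel's default target standing-queue delay (5 ms)
INTERVAL_S = 0.100  # CoDel's default interval (100 ms)


class CoDelSketch:
    """Simplified CoDel drop decision; call should_drop() once per dequeued packet."""

    def __init__(self) -> None:
        self.first_above: float | None = None  # deadline before dropping may start
        self.dropping = False
        self.count = 0
        self.drop_next = 0.0

    def should_drop(self, now: float, sojourn: float) -> bool:
        """now: dequeue time in seconds; sojourn: how long this packet sat in the queue."""
        if sojourn < TARGET_S:
            # Queue delay is back under target: reset and stop dropping.
            self.first_above = None
            self.dropping = False
            return False
        if self.first_above is None:
            # Delay just went above target: tolerate it for one interval (a burst).
            self.first_above = now + INTERVAL_S
            return False
        if not self.dropping:
            if now >= self.first_above:
                # Above target for a full interval: a standing queue, start dropping.
                self.dropping = True
                self.count = 1
                self.drop_next = now + INTERVAL_S / math.sqrt(self.count)
                return True
            return False
        if now >= self.drop_next:
            # Still above target: drop again, sooner each time.
            self.count += 1
            self.drop_next += INTERVAL_S / math.sqrt(self.count)
            return True
        return False


if __name__ == "__main__":
    codel = CoDelSketch()
    # A standing queue: every packet waits 20 ms, one dequeue per millisecond.
    drops_ms = [i for i in range(400) if codel.should_drop(i / 1000, 0.020)]
    print(drops_ms)  # first drop at 100 ms, then the gaps between drops shrink
```

A short burst that drains within 100 ms never triggers a drop, which is how CoDel tells a harmless burst from a bloated standing queue.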

Latency: constant delay

A fixed delay per packet: ~5 ms per 1000 km of fiber each way, plus queuing. Hurts chatty request/response protocols: 10 sequential round trips at 100 ms RTT cost a full second regardless of bandwidth.

Jitter: variable delay

The spread of latency from packet to packet. An average of 30 ms with 2 ms jitter is smooth; the same average with 40 ms jitter is unusable. Wrecks real-time media: voice, video, gaming.

Loss: dropped packets

Packets that never arrive. TCP reads loss as congestion and cuts hard — 1% loss can cost half your throughput. Strangles bulk transfer.

Latency vs Jitter vs Loss

Latency is the baseline delay; it hurts chatty request/response protocols that pay the full RTT on every sequential exchange, so an operation with many round trips gets slow regardless of bandwidth. Fix it by cutting round trips or moving closer.

Jitter is variation in delay; it wrecks real-time media because the jitter buffer must trade added delay against late-packet drops, turning variation into either latency or effective loss. Fix it by stabilizing the path, not by raising average speed.

Loss is missing packets; it strangles bulk TCP throughput because each drop cuts the congestion window, and the Mathis ceiling falls with the square root of loss. Fix the loss itself — bandwidth does not help when loss is the bottleneck.

Common Mistakes

Measuring latency only at idle. Bufferbloat hides until the link is loaded, so an idle ping of 18 ms can mask a 400 ms latency-under-load that ruins every interactive session the moment a download starts.
Chasing bandwidth for a loss problem. When loss caps the TCP window, a fatter pipe changes nothing; the window keeps collapsing on each drop and the average stays low, so you have paid for capacity that sits unused.
Averaging away jitter. A path with great average latency and terrible variance benchmarks fine and sounds awful on a call, because the mean hides the swings the jitter buffer cannot absorb.
Assuming a "fast" link is low-latency under load. High capacity says nothing about queue behavior; a gigabit link with bloated buffers can show worse latency-under-load than a slower link with proper AQM.
Treating 1% loss as a 1% problem. TCP reads loss as congestion, so 1% loss can cut throughput by half or more — the impact is nonlinear, and dismissing it as minor leaves a transfer running at a fraction of the link.

Best Practices

Measure latency under load by pinging continuously while running a heavy transfer, because that is the only way bufferbloat reveals itself — an idle test always looks clean.
Match the metric to the workload: track loss for bulk transfers, jitter for real-time media, and round-trip latency for chatty protocols, so you tune the number that actually hurts.
Enable fq_codel as the queue discipline on links you control, so AQM signals TCP to back off before the buffer bloats and interactive flows are not starved by bulk ones.
Find and fix loss before adding bandwidth, since the Mathis ceiling falls with the square root of loss and a fatter pipe cannot lift a window that loss keeps collapsing.
Report jitter as a spread, not a mean, when characterizing a path for VoIP or video, because the average latency hides exactly the variation those applications cannot tolerate.
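The first of these practices can be scripted once you have two sets of ping RTTs: one taken on an idle link and one taken while a heavy transfer (for example an iperf3 run) saturates it. A Python sketch, with a hypothetical function name, an assumed 100 ms rule-of-thumb threshold, and illustrative samples rather than real measurements:

```python
import statistics


def latency_under_load(idle_rtts_ms: list[float], loaded_rtts_ms: list[float],
                       bloat_threshold_ms: float = 100.0) -> tuple[float, bool]:
    """Median RTT increase from idle to loaded, and whether it crosses the threshold.

    bloat_threshold_ms is an assumed rule-of-thumb cut-off, not a standard value.
    Medians are used so one stray slow ping does not decide the verdict.
    """
    increase = statistics.median(loaded_rtts_ms) - statistics.median(idle_rtts_ms)
    return increase, increase > bloat_threshold_ms


if __name__ == "__main__":
    idle = [18, 19, 18, 17, 18]         # illustrative samples, not measurements
    loaded = [340, 355, 348, 362, 351]
    print(latency_under_load(idle, loaded))  # (333, True)
```

An idle-only test would see just the 18 ms median; the verdict comes entirely from the loaded samples.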

Comparable concepts: AQM / CoDel (queue management); QoS / DSCP prioritization (Topic 12); Mathis throughput model

Knowledge Check

A transfer over a 1 Gbps link runs at 90 Mbps with frequent TCP retransmits. Adding a second 1 Gbps link in parallel does not help. Why?

Loss is capping the window, and the Mathis ceiling depends on loss rate, not on raw bandwidth
The link is already saturated, so there is no spare capacity to use
Jitter on the path is forcing the receiver to repeatedly discard out-of-order data, which is why the second parallel link makes no measurable difference
DNS latency is being added to every segment in the transfer

A home link pings 18 ms idle but jumps to 350 ms the moment a large upload starts, then recovers when it finishes. What is happening?

Bufferbloat — an oversized queue fills under load, inflating latency until it drains
Sustained packet loss on the path is forcing constant retransmission for the entire duration of the upload, then clearing
A DNS cache miss is delaying each new connection the upload opens
The link's bandwidth is simply too low for the upload size

A VoIP call sounds choppy, yet the connection shows 40 ms average latency and 0% loss. Which metric is the likely culprit?

Jitter, because variation in delay makes the jitter buffer drop or delay late packets
Latency, since a 40 ms average one-way delay is already far too high for any usable real-time audio conversation
Loss, which is silently dropping a large share of the voice packets
Bandwidth, because voice needs more throughput than the link provides
