Topic 65

Connectivity Tools — ping, traceroute, mtr


ping answers "is it reachable and how far," traceroute maps the hops between you and it and shows where latency jumps, and mtr merges the two into a live per-hop view of loss and latency that updates every second. They are the first three commands in almost any investigation, and the most misread, because each one leans on ICMP, which is exactly the traffic that routers deprioritize and rate-limit and that firewalls filter. Take their output literally and you will diagnose problems that are not there.

The skill is not running them; it is reading them with the right suspicion. A router showing 60% loss in the middle of a traceroute is usually a healthy router that simply throttles ICMP packets addressed to itself. A traceroute that looks insane past hop 7 is often showing you a return path you cannot see. Knowing which numbers are real — final-hop latency, end-to-end loss — and which are artifacts is the entire difference between these tools helping and these tools lying.

ping — Reachability, RTT, and Loss

ping sends ICMP echo requests and times the echo replies. It gives you three facts: whether the target answers at all, the round-trip time and its spread (min/avg/max/mdev), and the packet-loss percentage over the run. The mdev figure is your first jitter signal: a tight 0.3 ms spread is a clean path, while a 40 ms spread under the same average means the path is queuing unevenly.

# 50 probes so the loss percentage means something
ping -c 50 -i 0.2 api.example.com
# 64 bytes from 93.184.216.34: icmp_seq=1 ttl=54 time=18.7 ms
# 64 bytes from 93.184.216.34: icmp_seq=2 ttl=54 time=21.3 ms
# --- api.example.com ping statistics ---
# 50 packets transmitted, 49 received, 2% packet loss
# rtt min/avg/max/mdev = 17.9/19.8/63.1/6.4 ms
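The mdev figure in that summary line can be reproduced from the individual reply times. The sketch below assumes mdev is the square root of the mean of squares minus the square of the mean, which is how the iputils ping on Linux appears to compute it; other ping implementations may differ. The function name rtt_stats is hypothetical.

```shell
# rtt_stats: read ping output on stdin, print min/avg/max/mdev in ms.
# Assumption: mdev = sqrt(mean(t^2) - mean(t)^2), as in iputils ping.
rtt_stats() {
  awk '
    {
      # Pull the number out of each "time=18.7" token on a reply line.
      for (i = 1; i <= NF; i++)
        if ($i ~ /^time=/) {
          t = substr($i, 6) + 0
          if (n == 0 || t < min) min = t
          if (n == 0 || t > max) max = t
          sum += t; sumsq += t * t; n++
        }
    }
    END {
      if (n == 0) { print "no replies"; exit }
      avg = sum / n
      var = sumsq / n - avg * avg
      if (var < 0) var = 0   # guard against floating-point rounding
      printf "%.3f/%.3f/%.3f/%.3f\n", min, avg, max, sqrt(var)
    }
  '
}
```

Pipe a run into it, for example ping -c 50 -i 0.2 api.example.com | rtt_stats, and compare the last field with the mdev that ping itself prints.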

The caveat is that ICMP is not application traffic. A host can drop or rate-limit ping while serving HTTP perfectly, so "ping fails" never proves "the service is down" — it proves ICMP to that address is filtered, which is increasingly common. Conversely, a clean ping proves the IP is reachable but says nothing about whether port 443 answers. Ping tests the network layer; it cannot vouch for the transport above it.
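One way to keep the two layers apart is to probe both and read the combination. This is a minimal sketch, not a monitoring tool: the function names reach_verdict and check_host are hypothetical, the ping flags assume Linux iputils (where -W is a per-reply timeout in seconds), and nc is assumed to support -z and -w.

```shell
# reach_verdict ICMP TCP: each argument is "ok" or "fail".
reach_verdict() {
  case "$1/$2" in
    ok/ok)     echo "reachable, port answering" ;;
    fail/ok)   echo "ICMP filtered, service up" ;;
    ok/fail)   echo "IP reachable, port not answering" ;;
    fail/fail) echo "no answer on either, possible outage" ;;
    *)         echo "usage: reach_verdict ok|fail ok|fail" >&2; return 2 ;;
  esac
}

# check_host HOST PORT: run an ICMP probe and a TCP probe, print the verdict.
check_host() {
  icmp=fail; tcp=fail
  ping -c 3 -W 2 "$1" >/dev/null 2>&1 && icmp=ok
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1 && tcp=ok
  reach_verdict "$icmp" "$tcp"
}
```

Calling check_host api.example.com 443 runs both probes; only the fail/fail combination is even a candidate for "down."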

traceroute — Hop Discovery and Its Limits

traceroute maps the path by abusing the TTL field. It sends probes with TTL 1, 2, 3 and so on; each router that decrements TTL to zero replies with an ICMP "time exceeded," revealing its address. Three probes per hop give you three RTT samples, and a * * * line means a hop did not reply — usually because it is configured not to, not because traffic stops there.

# -n skips reverse-DNS so it runs fast; each row is one hop
traceroute -n api.example.com
#  1  10.0.0.1        0.4 ms   0.3 ms   0.4 ms
#  2  100.64.0.1      8.1 ms   8.3 ms   8.0 ms
#  3  203.0.113.9    11.2 ms  10.9 ms  11.4 ms
#  4  * * *                                      <- hop hides, path is fine
#  5  198.51.100.7   42.6 ms  41.9 ms  43.0 ms  <- latency steps up here
#  6  93.184.216.34  43.1 ms  42.8 ms  43.4 ms  <- destination, what matters

Two traps live in this output. First, the RTT shown for a hop is the round trip to that router, and each router's reply may take a different return path, so a mid-path number that jumps and then drops at the next hop is a return-path artifact, not congestion. Second, traceroute shows the forward path only; the return path can be completely different and invisible, which is why, on an asymmetric route, the hop where traceroute shows the delay appearing is not necessarily where the delay accrues.
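The first trap suggests a simple filter: count a latency step only if no later hop drops back below it. The sketch below applies that rule to raw traceroute -n output. The function name sustained_step and the 2 ms default tolerance are assumptions chosen for illustration, and the result is a hint to corroborate, not a verdict.

```shell
# sustained_step [TOLERANCE_MS]: read `traceroute -n` output on stdin and
# print the hop with the largest RTT increase that later hops do not undo.
sustained_step() {
  awk -v tol="${1:-2}" '
    # Rows that start with a hop number; average every sample followed by "ms".
    $1 ~ /^[0-9]+$/ {
      sum = 0; cnt = 0
      for (i = 2; i <= NF; i++)
        if ($i == "ms") { sum += $(i - 1); cnt++ }
      # Silent hops ("* * *") have no samples and are skipped.
      if (cnt > 0) { n++; hop[n] = $1; avg[n] = sum / cnt }
    }
    END {
      best = 0; best_hop = ""
      for (k = 2; k <= n; k++) {
        step = avg[k] - avg[k - 1]
        # A step is kept only if every later hop stays at or above this level.
        held = 1
        for (j = k + 1; j <= n; j++)
          if (avg[j] < avg[k] - tol) held = 0
        if (held && step > best) { best = step; best_hop = hop[k] }
      }
      if (best_hop != "") printf "hop %s +%.1f ms\n", best_hop, best
      else print "no sustained step"
    }
  '
}
```

Run it as traceroute -n api.example.com | sustained_step. A spike that heals at the next hop is ignored, which is the behavior the return-path caveat calls for.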

mtr — Continuous Per-Hop Statistics

mtr is traceroute and ping fused and run in a loop. It probes every hop continuously and shows a live table: loss percentage, sent count, last/avg/best/worst RTT per hop. Because it accumulates hundreds of samples, it separates a transient blip from a persistent problem, and the per-hop loss column points at the culprit hop instead of just confirming the end-to-end number.

# report mode: 100 cycles then print, good for tickets (sample output abridged)
mtr -rwzbc 100 api.example.com
#  Host                       Loss%  Snt   Avg  Best  Wrst
#  1. 10.0.0.1                 0.0%  100   0.4   0.3   1.1
#  2. 100.64.0.1              48.0%  100   8.2   8.0   9.1  <- ICMP rate-limit, fake
#  3. 203.0.113.9              0.0%  100  11.1  10.9  12.4  <- loss vanished: hop 2 was lying
#  6. 93.184.216.34           3.0%  100  43.1  42.6  61.0  <- real loss, persists to the end

The reading rule is the one most people get wrong: loss at a middle hop that does not continue to the hops after it is fake — that router is rate-limiting ICMP to itself while forwarding your traffic fine. Loss that appears at a hop and persists all the way to the destination is real. So you scan the loss column from the bottom up: the last hop's loss is what your application feels, and the first hop where loss starts and never recovers is where it began.
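The bottom-up scan can be written down mechanically. The sketch below reads an mtr report and labels each hop's loss as clean, artifact (a later hop shows 0%), or real (loss never returns to zero through the destination). The function name classify_mtr_loss is hypothetical, and the parsing assumes each hop row starts with a number followed by a dot and contains one Loss% field, as in mtr's report mode.

```shell
# classify_mtr_loss: read an mtr report on stdin, label each hop's loss.
classify_mtr_loss() {
  awk '
    # Hop rows begin with "N." (real mtr prints "N.|--"); take the first
    # field that looks like a percentage as the loss column.
    $1 ~ /^[0-9]+\./ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^[0-9.]+%$/) {
          n++; hop[n] = $1 + 0; loss[n] = $i + 0
          break
        }
    }
    END {
      # Walk from the destination upward, tracking the lowest loss seen
      # at this hop or any hop after it.
      low = loss[n]
      for (k = n; k >= 1; k--) {
        if (loss[k] < low) low = loss[k]
        if (loss[k] == 0)   label[k] = "clean"
        else if (low > 0)   label[k] = "real"
        else                label[k] = "artifact"
      }
      for (k = 1; k <= n; k++)
        printf "hop %d loss %.1f%% %s\n", hop[k], loss[k], label[k]
    }
  '
}
```

Run it as mtr -rwc 100 api.example.com | classify_mtr_loss. The first hop labeled real is where persistent loss begins; the last row is what the application feels.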

Reading the Lies

Three artifacts account for nearly every misdiagnosis with these tools. Routers deprioritize and rate-limit ICMP addressed to themselves, producing scary mid-path loss that does not affect forwarded traffic. Paths are frequently asymmetric, so the latency a hop reports includes a return path you are not measuring and cannot see. And ICMP is filtered at many edges, so a total black hole in a traceroute can be a firewall policy rather than a dead network.

The discipline that survives all three: trust the destination row, not the middle. End-to-end loss and final-hop latency are what the application experiences; intermediate numbers are diagnostic hints to be corroborated, never conclusions. When mid-path loss looks alarming, check whether it persists to the last hop — if it heals, it was an artifact. When a tool says "down," confirm with a transport-level probe before believing it, because filtered ICMP fakes a failure that the actual service does not have.

Just need to know it is reachable and how far (RTT, loss) → ping
Need to know where on the path latency jumps or packets vanish → traceroute
Chasing continuous or intermittent loss and want live per-hop stats over time → mtr

ping vs traceroute vs mtr

ping answers one question — is this IP reachable, at what RTT, with what loss. Reach for it first; it is the cheapest check and tells you whether to bother with the path at all. It says nothing about where a problem is.

traceroute maps the forward hops and shows where latency steps up, but it is a one-shot snapshot and only the forward direction. Use it when ping succeeds end to end but the RTT is high, to see which hop adds the delay.

mtr runs both continuously, accumulating per-hop loss and latency over hundreds of probes. Escalate to it when you need to tell a transient blip from a persistent fault, or pin loss to a specific hop. Escalate ping → traceroute → mtr as the question gets harder.

Common Mistakes

Reading mid-hop loss as the problem. Routers rate-limit ICMP addressed to themselves, so a hop showing 50% loss while every later hop shows 0% is healthy — you have flagged a router that was simply throttling your probes.
Assuming traceroute shows the return path. It maps the forward direction only; an asymmetric return route is invisible, so blaming a forward hop for delay that actually accrued on the way back is a classic dead end.
Concluding "host down" because ping fails. ICMP is filtered at many edges; the IP can be serving HTTPS fine while echo requests are dropped, so a failed ping means "ICMP blocked," not "service dead."
Ignoring that final-hop latency is what matters. A scary RTT spike at hop 4 that settles back down by the destination is a return-path artifact; the number your application feels is the last hop's, not the worst middle hop's.
Trusting a 4-packet ping. A tiny count (the default of four on Windows, or a habitual -c 4 on Linux, where ping otherwise runs until interrupted) makes the loss percentage meaningless: one dropped packet reads as 25% loss. Send at least 50 probes before quoting a loss figure.
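The arithmetic behind the 4-packet warning is just 100 divided by the probe count: that is the smallest non-zero loss a run can report. A small helper makes it concrete; the name loss_step is hypothetical.

```shell
# loss_step COUNT: percentage points that one lost probe adds to the report.
loss_step() {
  awk -v n="$1" 'BEGIN {
    if (n <= 0) exit 1          # a run needs at least one probe
    printf "%.1f\n", 100 / n
  }'
}
```

loss_step 4 prints 25.0 and loss_step 50 prints 2.0, so a single stray drop is a headline in the first run and a footnote in the second.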

Best Practices

Send 50 or more probes with ping -c 50 before quoting loss or RTT, so the percentage is statistically meaningful and one stray drop does not read as a crisis.
Scan an mtr loss column from the destination up, because the last hop's loss is what the application feels and the first hop where loss begins and never recovers is the real culprit.
Treat mid-path loss that heals before the final hop as an ICMP rate-limit artifact, not a fault, and confirm the real story at the destination row.
Confirm a "down" verdict with a transport probe like nc -vz host 443 before believing ping, since filtered ICMP fakes an outage the actual service does not have.
Run mtr from both ends when a path is asymmetric, because each direction can take a different route and only running it from both sides reveals the half that traceroute hides.

Comparable concepts: tcptraceroute (TCP-based path), hping (crafted-probe testing), PathPing (Windows)

Knowledge Check

An mtr run shows 48% loss at hop 2 but 0% at hops 3 through 6. What is the correct reading?

Hop 2 is rate-limiting ICMP to itself; the forwarded path is fine because loss does not persist
Hop 2 is dropping a real 48% of traffic and is the bottleneck
There is a persistent routing loop centered on hop 2 that is bouncing your probes back and forth and inflating the reported loss
The application is experiencing 48% packet loss to the destination

Why can a traceroute mislead you about where end-to-end latency is being added?

Each hop's RTT includes a return path that may differ from the forward path you see
It reports hop numbers but never actual time values, so latency can't be located
A single non-responding hop silently adds its entire probe timeout to the measured round-trip time of every later hop in the path
It prints hops in random order, so the latency steps appear at the wrong rows

A monitoring check pings a server every minute and alerts on failure. The server keeps serving traffic but the check fires. What is the most likely explanation?

ICMP echo to the host is filtered or rate-limited, even though its service ports answer
The application process has crashed outright and is no longer handling any of the requests arriving on its service ports
DNS for the host has stopped resolving for every client on the network
The link is fully saturated, dropping the small ping packets first
