Topic 26

TCP and the Three-Way Handshake

TCP takes IP's best-effort datagrams and hands the application something far more useful: a reliable, ordered byte stream. To do that it has to establish state on both ends before any data moves — each side must agree on where the other's byte numbering begins. That agreement is the three-way handshake: SYN, SYN-ACK, ACK. Three packets, one round trip, and only then can data flow.

The handshake is not ceremony you can ignore. It is why every new TCP connection costs at least one round-trip time before the first byte of payload — 30 ms on a cross-country link, 150 ms across an ocean — which is why connection reuse matters so much. It is also the surface that SYN floods attack and the reason a "closed" socket does not vanish instantly. Understanding these three packets explains connection latency, half-open connections, and a whole category of teardown bugs.

Figure: The three-way handshake, one RTT to ESTABLISHED. SYN (client ISN) → SYN-ACK (server ISN + ack) → ACK (acks server ISN) → ESTABLISHED (both ends synced).

The Byte-Stream Abstraction

TCP presents the application with an unbroken stream of bytes, not packets. You write 10 KB and the kernel may slice it into eight segments; the receiver may get them out of order or with gaps, but your read() only ever returns a contiguous, in-order prefix of the stream. Every byte in the connection has a sequence number, and that numbering is what lets TCP reassemble segments into the original order no matter how the network shuffled them.
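The reassembly idea can be sketched in a few lines. This is a toy model, not how a kernel implements it: the function name and the (seq, payload) tuples are hypothetical, segments are assumed not to overlap, and sequence-number wraparound is ignored.

```python
def reassemble(segments, initial_seq):
    """Return the contiguous in-order prefix and the next expected sequence number.

    segments: iterable of (seq, payload) pairs in arrival order.
    initial_seq: sequence number of the first byte of the stream.
    """
    pending = {}
    for seq, payload in segments:
        # Keep the first copy of each segment; later duplicates are retransmissions.
        pending.setdefault(seq, payload)

    delivered = bytearray()
    next_seq = initial_seq
    # Deliver only while the next expected byte is present; stop at the first gap.
    while next_seq in pending:
        payload = pending.pop(next_seq)
        delivered += payload
        next_seq += len(payload)
    return bytes(delivered), next_seq
```

With segments arriving as (1005, b"world"), (1000, b"hello"), (1015, b"!!") and an initial sequence number of 1000, the sketch delivers b"helloworld" and stops at 1010, because bytes 1010 through 1014 have not arrived yet.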

Because it is a stream and not a sequence of messages, TCP does not preserve write boundaries — two send() calls may arrive as one read(), or one send may split across two reads. The application is responsible for framing: length prefixes, delimiters, or a higher-level protocol. This is the opposite of UDP's datagram boundaries, and forgetting it is a classic source of protocol-parsing bugs.
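One common way to frame messages is a fixed-size length prefix. The sketch below assumes a 4-byte big-endian length header, which is an illustrative wire format rather than anything TCP defines; encode_frame and FrameDecoder are hypothetical names.

```python
import struct

HEADER = struct.Struct("!I")  # assumed format: 4-byte big-endian payload length


def encode_frame(message: bytes) -> bytes:
    """Prefix the payload with its length so the receiver can find the boundary."""
    return HEADER.pack(len(message)) + message


class FrameDecoder:
    """Accumulates arbitrary chunks from recv() and yields complete messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list:
        self._buffer += data
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # payload incomplete; wait for more bytes
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages
```

The decoder returns the same messages no matter how the bytes are chunked: feeding the first three bytes of two back-to-back frames returns nothing, and feeding the remainder returns both messages at once.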

The Three-Way Handshake

The handshake exists to synchronize the two sequence-number spaces. The client picks a random initial sequence number (ISN) and sends a SYN carrying it. The server replies with a SYN-ACK: its own random ISN plus an acknowledgment of the client's. The client sends a final ACK of the server's ISN, and the connection is ESTABLISHED on both ends. The ISNs are randomized, not zero, to make it hard for an off-path attacker to forge segments into the stream.
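The sequence-number arithmetic is easy to model. In the sketch below the dictionaries stand in for segments and the function name is hypothetical; the one protocol fact it relies on is that a SYN consumes one sequence number, so each acknowledgment is the peer's ISN plus one, modulo 2^32.

```python
import secrets

SEQ_MODULUS = 2**32  # sequence numbers are 32-bit and wrap around


def handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as dicts (a toy model, not a packet builder)."""
    if client_isn is None:
        client_isn = secrets.randbits(32)
    if server_isn is None:
        server_isn = secrets.randbits(32)

    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {
        "flags": "SYN-ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_MODULUS,  # acknowledges the client's SYN
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_MODULUS,
        "ack": (server_isn + 1) % SEQ_MODULUS,  # acknowledges the server's SYN
    }
    return syn, syn_ack, ack
```

Calling handshake(1043, 88) gives a SYN-ACK with ack 1044 and a final ACK with ack 89. Passing no arguments picks random ISNs, which is the behavior the paragraph above describes.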

Three packets make one and a half round trips of signaling, but the client can piggyback data on that final ACK, so usable throughput begins after one RTT. That single RTT is unavoidable with classic TCP. QUIC attacks it directly by folding transport and cryptographic setup into one round trip, or zero on resumption; TLS 1.3 trims the TLS handshake that runs on top of TCP but cannot remove TCP's own round trip. The cost is fixed per connection, so the way to beat it is to open fewer connections and keep them alive.

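# the three packets of a connection setup, seen by tcpdump
# note: this filter matches any segment with SYN or ACK set, so later traffic
# on established connections shows up too; add a host or port clause to narrow it
tcpdump -n -i eth0 'tcp[tcpflags] & (tcp-syn|tcp-ack) != 0'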
# client > server: Flags [S],   seq 1043...        (SYN)
# server > client: Flags [S.],  seq 88..., ack 1044 (SYN-ACK)
# client > server: Flags [.],   ack 89...           (ACK -> ESTABLISHED)

The Connection State Machine

TCP is a state machine, and the states are exactly what ss and netstat report. A server socket sits in LISTEN. An arriving SYN moves a half-open connection to SYN-RECV while the server waits for the final ACK; the client side passes through SYN-SENT. Once the handshake completes, both ends reach ESTABLISHED, where data flows. Reading these states off a live socket is how you tell a handshake stuck mid-flight from a connection that completed and then stalled.
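The setup half of the state machine fits in a small transition table. This is a simplified sketch: the event names are hypothetical labels, only the handshake states are covered, and a real listening socket stays in LISTEN while each new connection gets its own SYN-RECV entry.

```python
# (current state, event) -> next state, covering connection setup only
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN-SENT",        # client calls connect() and sends SYN
    ("CLOSED", "passive_open"): "LISTEN",         # server calls listen()
    ("LISTEN", "recv_syn"): "SYN-RECV",           # server answers with SYN-ACK
    ("SYN-SENT", "recv_syn_ack"): "ESTABLISHED",  # client sends the final ACK
    ("SYN-RECV", "recv_ack"): "ESTABLISHED",      # server receives the final ACK
}


def run(state, events):
    """Apply events in order and return the resulting state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state}")
        state = TRANSITIONS[key]
    return state
```

A client that has sent its SYN but received nothing is run("CLOSED", ["active_open"]), which yields SYN-SENT. A server that has seen the SYN but not the final ACK ends in SYN-RECV, the half-open state discussed next.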

The half-open window between SYN and the final ACK is what a SYN flood abuses: an attacker sends SYNs and never completes them, filling the server's backlog of SYN-RECV entries until legitimate clients are refused. SYN cookies defuse this by encoding the connection state into the server's ISN instead of allocating a backlog entry, so the server holds no memory for a half-open connection until the genuine final ACK proves the client is real.
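The stateless trick can be illustrated with a keyed hash. This is a deliberately simplified sketch and not the Linux algorithm: real SYN cookies also pack a coarse timestamp and an MSS index into the 32 bits, and the secret, the counter argument, and both function names here are assumptions made for illustration.

```python
import hashlib
import hmac

SEQ_MODULUS = 2**32
SECRET = b"example-secret"  # hypothetical key; a real server would use a random, rotating secret


def make_cookie(four_tuple, client_isn, counter):
    """Derive the server ISN from connection details instead of storing them."""
    src_ip, src_port, dst_ip, dst_port = four_tuple
    message = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{client_isn}|{counter}".encode()
    digest = hmac.new(SECRET, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")  # truncate to a 32-bit sequence number


def final_ack_is_valid(four_tuple, seq, ack, counter):
    """Recompute the cookie from the final ACK alone; no backlog entry is consulted."""
    client_isn = (seq - 1) % SEQ_MODULUS  # the final ACK's seq is the client ISN + 1
    server_isn = (ack - 1) % SEQ_MODULUS  # and its ack is the server ISN + 1
    return server_isn == make_cookie(four_tuple, client_isn, counter)
```

The server sends make_cookie(...) as its ISN and then forgets the connection. Only a client that actually received the SYN-ACK can echo cookie + 1 in its final ACK, so a spoofed flood never causes any state to be allocated.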

Connection Teardown

Closing is a four-way exchange, because each direction of the stream is shut independently. The side that closes first sends a FIN, the peer ACKs it, and that direction is now half-closed — the peer can still send. When the peer finishes, it sends its own FIN, the first side ACKs, and the connection is fully closed. The four packets reflect that a TCP connection is really two simplex streams glued together.
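A half-close can be observed on a loopback connection. The sketch below assumes a machine where binding to 127.0.0.1 is permitted; half_close_demo and read_until_eof are hypothetical helper names. It uses shutdown(SHUT_WR), which sends a FIN for one direction while leaving the other open.

```python
import socket


def read_until_eof(sock):
    """Read until recv() returns b"", which signals the peer's FIN."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)  # FIN: the client-to-server direction is done
        received = read_until_eof(server)
        # The server-to-client direction is still open after the client's FIN.
        server.sendall(b"reply after peer FIN")
        server.shutdown(socket.SHUT_WR)  # the server's own FIN closes the other direction
        reply = read_until_eof(client)
        return received, reply
    finally:
        client.close()
        server.close()
        listener.close()
```

The server reads the full request, sees end-of-stream, and can still deliver its reply, which is the two-simplex-streams behavior the paragraph describes.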

The side that sent the first FIN — the active closer — does not free its socket immediately. It enters TIME_WAIT for twice the maximum segment lifetime, holding the 5-tuple so that delayed segments from the old connection cannot be mistaken for a new one reusing the same ports. On a busy active closer this state piles up and is a leading cause of port exhaustion, which Topic 30 covers in operational detail.
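A rough budget shows why this matters. The calculation below assumes the common Linux defaults of an ephemeral port range of 32768 to 60999 and a 60-second TIME_WAIT; both are assumptions to verify on the system in question, and the limit applies per combination of source address, destination address, and destination port.

```python
def max_new_connections_per_second(port_low=32768, port_high=60999, time_wait_seconds=60):
    """Upper bound on sustained new connections to one destination when this side closes first.

    Each closed connection pins one local port for time_wait_seconds, so the
    steady-state rate cannot exceed the number of ports divided by that hold time.
    """
    ports = port_high - port_low + 1
    return ports / time_wait_seconds
```

With the assumed defaults that is 28232 ports over 60 seconds, or roughly 470 new connections per second to a single destination before an active closer runs out of local ports.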

TCP Setup vs TLS Setup

The TCP handshake establishes the connection itself — SYN, SYN-ACK, ACK — and costs one round trip before any application byte can be sent. It synchronizes sequence numbers; it does nothing about encryption.

The TLS handshake then runs on top of the established TCP connection to negotiate keys, adding one more round trip in TLS 1.3 or two in TLS 1.2 and earlier. The total connect latency for an HTTPS request is therefore two or three RTTs stacked (one for TCP, one or two for TLS), which is why latency-sensitive services keep connections warm rather than opening fresh ones.
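The stacked cost is simple arithmetic. The sketch below is a back-of-the-envelope model with hypothetical function names; it ignores server processing time, bandwidth, and packet loss, and it assumes each request/response exchange takes one further round trip.

```python
def connect_latency_ms(rtt_ms, tls_round_trips):
    """Setup cost of one fresh connection: one RTT for TCP plus the TLS round trips."""
    return rtt_ms * (1 + tls_round_trips)


def total_latency_ms(rtt_ms, tls_round_trips, requests, reuse):
    """Total time for sequential requests, with or without connection reuse."""
    setups = 1 if reuse else requests
    setup_cost = setups * connect_latency_ms(rtt_ms, tls_round_trips)
    request_cost = requests * rtt_ms  # assumed: one RTT per request/response
    return setup_cost + request_cost
```

On a 150 ms path with TLS 1.3, a fresh connection costs 300 ms before the first request can be sent. Ten sequential requests take 4500 ms with a new connection each time and 1800 ms over one reused connection.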

Common Mistakes

Ignoring the handshake RTT in a latency budget. Every new TCP connection costs one round trip before the first payload byte; on a 150 ms transoceanic path that is 150 ms a connection pool would have amortized away.
Running a public server without SYN-cookie protection. A SYN flood fills the half-open backlog with SYN-RECV entries and refuses real clients, where net.ipv4.tcp_syncookies would have absorbed it.
Reusing a 5-tuple still in TIME_WAIT. A rapid reconnect on the same source and destination ports can collide with the lingering state and fail to bind until the 2×MSL timer expires.
Assuming close() frees the socket instantly. The active closer sits in TIME_WAIT for tens of seconds, so a process that closes thousands of connections fast accumulates state long after the code thinks it is done.
Treating a byte stream like message frames. TCP merges and splits writes, so reading one recv() as one application message corrupts the protocol the moment two messages coalesce into a single read or one message splits across two.

Best Practices

Amortize the handshake with keep-alive and connection pooling on any client that makes repeated requests, so the one-RTT setup cost is paid once rather than per request.
Enable net.ipv4.tcp_syncookies on any internet-facing listener so a SYN flood cannot exhaust the half-open backlog and lock out real clients.
Frame messages explicitly with a length prefix or delimiter, never relying on TCP segment boundaries, since the stream gives no guarantee about where one write ends.
Read socket state with ss -tan when diagnosing connection problems, so you can distinguish a stalled handshake in SYN-SENT from a completed connection that later hung.
Make the peer that handles more connections the passive closer where you can, so TIME_WAIT accumulates on the lower-volume side rather than tying up ports and socket-table entries on the busy one.

Comparable concepts: QUIC (0/1-RTT setup), TLS 1.3 handshake

Knowledge Check

What does the three-way handshake actually accomplish before data can flow?

Each side exchanges and acknowledges a random initial sequence number
It negotiates the encryption keys that will protect the connection's payload
It reserves a dedicated bandwidth path through the routers along the route
It measures the path MTU end to end and locks the maximum segment size for the entire connection's lifetime

A public server's backlog fills with half-open connections during an attack and it stops accepting clients. What mitigates this?

Raising the process file-descriptor limit so the kernel can keep more half-open sockets in the backlog at once
SYN cookies, which encode the state in the ISN until the final ACK
Shortening the TIME_WAIT timer so old connections are reclaimed faster
Increasing the receive window so each connection consumes less backlog

Why does the side that calls close() first not free the socket immediately?

It is still busy flushing all the unsent data out of its send buffer before it is allowed to release the port back
It must wait for the peer to grant permission to reuse the connection's ports
It enters TIME_WAIT for 2×MSL so stale segments cannot pollute a new connection
The four-way close never fully completes, so the socket stays open indefinitely
