Topic 56

L4 vs L7 Load Balancing


A load balancer spreads traffic across a pool of backends, and it does that at one of two layers. An L4 load balancer works at the transport layer: it sees IP addresses and TCP/UDP ports and nothing more. It forwards connections to backends without ever reading the payload, which makes it fast, protocol-agnostic, and blind to content. An L7 load balancer works at the application layer: it terminates the connection, parses the HTTP request, and can route on the path, the Host header, a cookie, or anything else in the request.

That single difference — blind forwarding versus parsing the request — cascades into every other decision. It determines whether the balancer can do path-based routing or canary releases, how TLS is handled, whether the backend sees the real client IP, and how many requests per second one box can push. L4 is a wire; L7 is a reader. Pick the wrong one and you either pay for intelligence you can't use or reach for routing the layer physically cannot do.

L4: transport layer

Forwards by the 5-tuple of source and destination IP, source and destination port, and protocol. Decides once per connection and never reads the payload, so it is fast, protocol-agnostic, and blind to content. Pushes millions of packets/sec and can pass TLS through untouched.

L7: application layer

Terminates the connection and parses the HTTP request. Routes on path, Host, or cookie and balances per request. It is smart but heavier, because it terminates TLS and parses every request before forwarding it. This is the throughput-versus-intelligence tradeoff.

L4 Load Balancing — Transport-Level Forwarding

An L4 balancer makes its decision once per connection, on the first packet, using only the 5-tuple of source IP, source port, destination IP, destination port, and protocol. It picks a backend and pins the whole flow there. Because it never reassembles the byte stream or parses HTTP, it handles any TCP or UDP protocol — Postgres, Redis, gRPC, raw TCP, a game server's UDP — and pushes millions of packets per second on commodity hardware.
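The per-connection decision can be sketched in a few lines of Python. This is an illustration only, assuming a stateless hash of the 5-tuple; real balancers may instead use a connection-tracking table or consistent hashing. The names `FiveTuple` and `pick_backend` and all addresses are hypothetical.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp" or "udp"


def pick_backend(flow: FiveTuple, backends: list[str]) -> str:
    """Hash the 5-tuple so every packet of one flow maps to the same backend."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Example addresses only; nothing here reads the packet payload.
BACKENDS = ["10.0.0.21:5432", "10.0.0.22:5432", "10.0.0.23:5432"]
flow = FiveTuple("203.0.113.7", 51544, "198.51.100.10", 5432, "tcp")
chosen = pick_backend(flow, BACKENDS)
print(chosen)
```

Because the function looks only at addresses, ports, and protocol, it behaves the same for Postgres, Redis, or a UDP game protocol. A plain modulo hash like this remaps most flows when the pool changes size, which is one reason production balancers tend to prefer consistent hashing or per-flow state.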

There are two forwarding styles. In NAT mode the balancer rewrites the destination IP to the chosen backend and the return traffic flows back through it, so it sees both directions. In Direct Server Return (DSR) the balancer rewrites only the inbound path and the backend replies straight to the client, bypassing the balancer on the way out. DSR lets one balancer front enormous outbound bandwidth — a video server pushing 40 Gbps out — because the heavy return traffic never touches it, at the cost of a more complex setup where backends share the balancer's virtual IP.
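The difference between the two styles is easiest to see by following one packet in each direction. The sketch below is a simplified model, not a packet-processing implementation: it assumes the layer-2 flavor of DSR, where the balancer leaves the IP header intact and only steers the frame to the backend (other DSR variants tunnel the packet instead). The `Packet` type, the `next_hop` field standing in for the destination MAC, and all addresses are hypothetical.

```python
from dataclasses import dataclass, replace

VIP = "198.51.100.10"   # virtual IP the clients connect to (example address)
CLIENT = "203.0.113.7"  # example client address


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    next_hop: str  # stands in for the destination MAC: who receives the frame


def nat_inbound(pkt: Packet, backend_ip: str) -> Packet:
    """NAT mode: rewrite the destination IP to the chosen backend."""
    return replace(pkt, dst_ip=backend_ip, next_hop=backend_ip)


def nat_reply(backend_ip: str) -> list[Packet]:
    """NAT mode: the reply passes the balancer, which restores the VIP as source."""
    from_backend = Packet(src_ip=backend_ip, dst_ip=CLIENT, next_hop="balancer")
    to_client = replace(from_backend, src_ip=VIP, next_hop="client")
    return [from_backend, to_client]


def dsr_inbound(pkt: Packet, backend_ip: str) -> Packet:
    """DSR: leave the IP header alone and only steer the frame to the backend."""
    return replace(pkt, next_hop=backend_ip)


def dsr_reply() -> list[Packet]:
    """DSR: the backend also holds the VIP, so it answers the client directly."""
    return [Packet(src_ip=VIP, dst_ip=CLIENT, next_hop="client")]


inbound = Packet(src_ip=CLIENT, dst_ip=VIP, next_hop="balancer")
print(nat_inbound(inbound, "10.0.0.21"), nat_reply("10.0.0.21"))
print(dsr_inbound(inbound, "10.0.0.21"), dsr_reply())
```

In the NAT path the reply has a hop whose receiver is the balancer; in the DSR path it has none, which is why outbound bandwidth stops being the balancer's problem.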

L7 Load Balancing — Application-Aware Routing

An L7 balancer terminates the client connection, reads the complete HTTP request, and then decides where it goes. That lets it route on content: send /api/ to the API pool and /static/ to a cache, route checkout.example.com and blog.example.com to different services from one IP, or send 5% of traffic, or only requests carrying a specific cookie, to a canary build. None of that is possible at L4, because the path and headers live inside bytes an L4 balancer never inspects.
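As a rough illustration of content routing, the following Python sketch takes an already-parsed request and returns a pool name. It is a toy model under stated assumptions: the `Request` type, the pool names (`api`, `cache`, `blog`, `canary`, `app`), and the cookie name `canary` are all hypothetical, and a real balancer would apply such rules from its configuration rather than from application code.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    host: str
    path: str
    cookies: dict[str, str] = field(default_factory=dict)


def route(req: Request) -> str:
    """Return the backend pool name for one parsed HTTP request."""
    if req.cookies.get("canary") == "1":   # opt-in canary by cookie
        return "canary"
    if req.host == "blog.example.com":     # host-based routing
        return "blog"
    if req.path.startswith("/api/"):       # path-based routing
        return "api"
    if req.path.startswith("/static/"):
        return "cache"
    return "app"                           # default pool


print(route(Request("checkout.example.com", "/api/orders")))
print(route(Request("blog.example.com", "/posts/1")))
print(route(Request("checkout.example.com", "/cart", {"canary": "1"})))
```

Every input to `route` comes from inside the HTTP request, which is exactly the data an L4 balancer never reads.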

The price is per-request cost and a hard architectural fact: the balancer is now a full HTTP endpoint. It does TLS work, maintains its own connection pool to the backends, and reassembles every request before forwarding it. It can also add value the backend would otherwise repeat — response compression, request retries, header injection, and a single place to terminate TLS. An HAProxy config makes the content-routing explicit.

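# HAProxy L7: route by path, with active health checks
defaults
    mode http                      # parse HTTP so path-based rules work
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend web
    bind *:443 ssl crt /etc/ssl/site.pem
    use_backend api if { path_beg /api/ }   # requests under /api/ go to the API pool
    default_backend app                     # everything else

backend api
    balance roundrobin
    option httpchk GET /healthz    # active HTTP health check
    server a1 10.0.0.21:8080 check
    server a2 10.0.0.22:8080 check

backend app
    # Placeholder address: the default pool must exist for the config to load
    server w1 10.0.0.31:8080 check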

Connection Termination and the Client IP

An L7 balancer always terminates: the client's TCP and TLS connection ends at the balancer, and a separate connection carries the request to the backend. An L4 balancer in DSR or plain forwarding mode does not terminate — the same TCP connection logically reaches the backend, which is why the backend can see the client's real source IP directly in the IP header.

This is the consequence engineers trip over. Behind an L4 balancer that forwards without rewriting the source address, the backend sees the client IP for free. Behind an L7 balancer the backend sees the balancer's IP, because the connection originated there. The real client IP must be carried in the X-Forwarded-For header or, for non-HTTP TCP, in a PROXY protocol header. Forget to set it at the balancer and read it at the backend, and your access logs, rate limits, and geo rules all key on one address.
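On the backend side, recovering the client address might look like the sketch below. It makes two assumptions: X-Forwarded-For is only honored when the direct peer is a known balancer, and only the human-readable version 1 of the PROXY protocol is handled. The function names and the trusted-proxy set are hypothetical.

```python
def client_ip_from_xff(header: str, trusted_proxies: set[str], peer_ip: str) -> str:
    """Walk X-Forwarded-For right to left, skipping the proxies we trust."""
    if peer_ip not in trusted_proxies:
        return peer_ip  # direct connection: ignore a client-supplied header
    hops = [hop.strip() for hop in header.split(",") if hop.strip()]
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return peer_ip


def parse_proxy_v1(line: bytes) -> tuple[str, int]:
    """Parse a PROXY protocol v1 line and return (client_ip, client_port)."""
    parts = line.rstrip(b"\r\n").decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("unsupported PROXY header")
    _, _, src_ip, _dst_ip, src_port, _dst_port = parts
    return src_ip, int(src_port)


TRUSTED = {"10.0.0.5", "10.0.0.6"}  # example balancer addresses
print(client_ip_from_xff("203.0.113.7, 10.0.0.5", TRUSTED, peer_ip="10.0.0.6"))
print(parse_proxy_v1(b"PROXY TCP4 203.0.113.7 10.0.0.21 51544 5432\r\n"))
```

Reading the header from the right rather than trusting the leftmost entry matters because a client can put anything it likes in X-Forwarded-For; only the entries appended by your own balancers are reliable.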

Choosing the Layer for the Workload

Reach for L4 when you need raw throughput, a non-HTTP protocol, or end-to-end TLS that no middle box should decrypt. A database proxy, a Redis front end, a UDP game backend, or a TLS service you want passed through untouched all belong at L4. Reach for L7 when the value is in the routing: path- and host-based dispatch, canary and A/B splits, response caching, and centralized TLS termination for a fleet of HTTP services.

Many real deployments stack both — an L4 balancer absorbing the raw connection volume and distributing across a tier of L7 balancers that do the content routing. One trap is unique to L4 with long-lived connections: a naive L4 balancer pins a flow to one backend for the connection's entire life, so a single gRPC or HTTP/2 connection that multiplexes thousands of requests lands every one of them on the same backend. The fix is L7 load balancing, which balances per request rather than per connection.
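The pinning effect is simple to reproduce in a toy simulation. The sketch below assumes round-robin selection in both cases and a hypothetical four-node pool; the only thing that differs is whether the choice is made once per connection or once per request.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["b1", "b2", "b3", "b4"]  # hypothetical pool


def per_connection(connections: int, requests_per_conn: int) -> Counter:
    """L4 style: choose once per connection; every request on it follows."""
    load = Counter({b: 0 for b in BACKENDS})
    picker = cycle(BACKENDS)
    for _ in range(connections):
        backend = next(picker)
        load[backend] += requests_per_conn
    return load


def per_request(connections: int, requests_per_conn: int) -> Counter:
    """L7 style: choose for each request, whatever connection carried it."""
    load = Counter({b: 0 for b in BACKENDS})
    picker = cycle(BACKENDS)
    for _ in range(connections * requests_per_conn):
        load[next(picker)] += 1
    return load


l4 = per_connection(connections=1, requests_per_conn=4000)
l7 = per_request(connections=1, requests_per_conn=4000)
print("per connection:", dict(l4))
print("per request:   ", dict(l7))
```

With one multiplexed connection carrying 4000 requests, the per-connection policy places all 4000 on `b1`, while the per-request policy places 1000 on each backend.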

L4 vs L7 Load Balancer

L4 load balancer forwards by IP and port without inspecting content — fast, protocol-agnostic, and able to pass TLS through untouched. Choose it for non-HTTP protocols, maximum throughput, or when no middle box should decrypt the traffic.

L7 load balancer terminates the connection and routes on HTTP path, host, and headers — smart but heavier per request and always terminating. Choose it for content-based routing, canary releases, response caching, and centralized TLS termination across many HTTP services.

Common Mistakes

Reaching for L7 to balance a non-HTTP protocol like Postgres or Redis. An L7 balancer has no parser for it and either fails or falls back to dumb forwarding — use an L4 balancer that needs no application understanding.
Expecting an L4 balancer to do path-based routing. It only sees IP and port, never the URL, so a rule like "send /api/ to pool B" is impossible — that routing requires L7.
Forgetting to forward the client IP behind an L7 balancer. The backend sees the balancer's address because the connection originated there, so without X-Forwarded-For or PROXY protocol every log line shows one IP.
Putting gRPC or HTTP/2 behind a naive connection-level L4 balancer. One long-lived connection multiplexes thousands of requests onto a single backend, leaving the rest of the pool idle while one node melts.
Assuming L7 throughput matches L4. Terminating, parsing, and re-originating every request costs CPU, so an L7 tier sized like an L4 one saturates far sooner under the same packet rate.

Best Practices

Use L4 for non-HTTP traffic and raw throughput, and L7 only where content routing earns its per-request cost — match the layer to whether you need to read the request.
Enable the PROXY protocol on an L4 balancer that proxies or source-NATs connections when the backend needs the client IP, so the real source survives even for non-HTTP TCP that has no X-Forwarded-For.
Put HTTP/2 and gRPC behind an L7 balancer that balances per request, so multiplexed streams on one connection spread across the pool instead of pinning to a single backend.
Stack an L4 balancer in front of an L7 tier for very high connection volumes, letting the L4 layer absorb the packet rate and the L7 layer do the routing.
Use DSR on an L4 balancer for outbound-heavy workloads like video, so return traffic skips the balancer and one box can front tens of gigabits of egress.

Comparable concepts

Cloud NLB vs ALB (managed L4 vs L7)
Service mesh (L7 sidecar, Topic 73)

Knowledge Check

You need to route /api/ requests to one backend pool and everything else to another, from a single IP. Which layer can do this?

L7, since it terminates and parses the request and can route on the URL path
L4, by mapping each path to a distinct destination port it inspects in the header
L4 in DSR mode, which exposes the URL to the balancer on the return path
Neither, because path-based routing always demands a separate IP for each path

A backend behind an L7 balancer logs the balancer's IP as the client. Behind an L4 balancer it logs the real client IP. Why the difference?

L7 re-originates the connection; L4 forwards the same flow and keeps the source IP
L7 deliberately anonymizes the client IP for privacy while L4 simply chooses not to do that
L4 only works with plaintext traffic, so the original source IP always stays visible
L4 injects an X-Forwarded-For header on each request and L7 does not

A single gRPC connection sends thousands of requests, but a naive L4 balancer sends them all to one backend. What fixes the imbalance?

Balance at L7, which distributes per request rather than per connection
Switch the L4 balancer over to round-robin, which then rebalances on each request
Add more backends to the pool so the single pinned connection has more targets
Disable HTTP/2 multiplexing so that every single request opens its own connection
