Topic 57

Algorithms and Health Checks


A load balancer makes two decisions on every request: which backends are eligible, and which eligible one gets this request. Health checks answer the first; the balancing algorithm answers the second. Get the algorithm wrong and you pile uneven load on one node while others idle. Get health checks wrong and you either route traffic to a dead backend or evict a healthy one over a momentary blip.

These two mechanisms are what turn a backend failure from an outage into a non-event. A pool of ten servers with working health checks loses one and shrugs; the same pool with checks hitting the wrong path keeps sending a tenth of all traffic into a black hole. The algorithm and the checks are not tuning details — they are the difference between a balancer that absorbs failure and one that amplifies it.

Balancing Algorithms

Round-robin hands each new request to the next backend in rotation. It is simple and spreads request count evenly, but it is blind to load: if one request costs 10 ms and the next costs 10 seconds, round-robin still deals them out one-for-one and a slow backend keeps getting fed. Weighted round-robin biases the rotation toward bigger machines, which is useful when the pool is not uniform.
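To make the rotation concrete, here is a minimal Python sketch of both variants. The backend names and weights are made up for illustration, and the weighted version uses one common "smooth" interleaving scheme; it is not necessarily how any particular balancer implements its rotation.

```python
from itertools import cycle


def round_robin(backends):
    """Return an endless rotation over backends, ignoring how busy each is."""
    return cycle(backends)


class SmoothWeightedRoundRobin:
    """Weighted rotation that interleaves picks instead of sending bursts.

    Each pick adds every backend's weight to its running score, chooses the
    highest score, then subtracts the total weight from the winner.
    """

    def __init__(self, weights):
        self.weights = dict(weights)  # hypothetical backend name -> weight
        self.current = {name: 0 for name in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


rr = round_robin(["a", "b", "c"])
rr_picks = [next(rr) for _ in range(6)]

wrr = SmoothWeightedRoundRobin({"a": 1, "b": 1, "big": 2})
wrr_picks = [wrr.pick() for _ in range(8)]
```

Over any full cycle the weighted picker gives "big" twice as many requests as "a" or "b", but neither picker looks at how long a request takes, which is the blind spot described above.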

Least-connections sends the next request to whichever backend currently has the fewest open connections, which tracks actual load far better when request costs vary. Consistent hashing maps a key (a client IP, a session ID, a cache key) to a backend so the same key lands on the same backend repeatedly. Crucially, adding or removing one backend in a pool of N remaps only about 1/N of the keys instead of reshuffling all of them. That stability is what makes it the right choice for cache affinity.

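# nginx: least-connections with one weighted node and passive failure checks
upstream app {
    least_conn;
    # max_fails/fail_timeout are passive: 3 failed attempts within 10s
    # take the server out of rotation for 10s
    server 10.0.0.11:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.13:8080 weight=2;   # bigger box; nginx defaults apply for failures
}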
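The selection rule behind least-connections can be sketched in a few lines of Python. This is an illustration of the idea only: the class and backend names are hypothetical, and it ignores weights, which nginx's least_conn also takes into account.

```python
class LeastConnections:
    """Pick the backend with the fewest open connections right now."""

    def __init__(self, backends):
        self.open = {name: 0 for name in backends}  # name -> open connections

    def acquire(self):
        # Ties go to the first backend in insertion order.
        chosen = min(self.open, key=self.open.get)
        self.open[chosen] += 1
        return chosen

    def release(self, name):
        self.open[name] -= 1


lb = LeastConnections(["a", "b", "c"])

slow = lb.acquire()  # a long-running request that is never released here

quick = []
for _ in range(4):
    name = lb.acquire()  # a short request...
    quick.append(name)
    lb.release(name)     # ...that finishes before the next one arrives
```

While the slow request holds its connection open, every short request is steered to a different backend, which is the behavior round-robin cannot provide.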

Requests are cheap and uniform in cost → round-robin

Request cost varies widely (10 ms vs 10 s) → least-connections

Same key should keep hitting the same node (cache affinity) → consistent hashing

Pool is non-uniform, some boxes are bigger → weighted round-robin

Session Affinity

Sticky sessions pin a given client to one backend for the duration of its session, usually keyed on a cookie the balancer sets or on the client IP via a hash. You need it when the backend holds per-user state in memory — an in-process session, an upload assembled across requests — and losing that backend means logging the user out. The cost is real: affinity defeats even balancing, so a few heavy users can hammer one node while the rest of the pool sits idle, and removing that node drops every session pinned to it.

The better answer is usually to remove the reason for stickiness — push session state into Redis or a database so any backend can serve any request — and reserve affinity for cases where you genuinely cannot, like a WebSocket already bound to one backend. Consistent hashing is the affinity mechanism with the least collateral damage, because adding a node reshuffles only a fraction of the keys instead of scrambling every session at once.
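The "only a fraction of the keys" claim can be checked with a small hash ring. The sketch below is one common way to build consistent hashing (a sorted ring with virtual nodes); the node names, the session-key format, and the choice of 100 virtual nodes are assumptions made for the example.

```python
import bisect
import hashlib


def _point(label):
    """Map a string to a stable integer position on the ring."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; each node owns many small arcs (virtual nodes)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def lookup(self, key):
        # First ring entry at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_point(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"session-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.lookup(k) for k in keys}

ring.add("node-e")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)
```

With five nodes after the addition, moved_fraction should land somewhere near one fifth, and every key that moved now belongs to the new node; no session is shuffled between the four existing nodes.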

Health Checks

An active health check has the balancer probe each backend on a schedule — typically an HTTP GET to a readiness path every few seconds — and mark it down after a threshold of consecutive failures. A passive check infers health from real traffic: if a backend returns errors or times out on live requests, the balancer ejects it. Active checks catch a sick backend before a user hits it; passive checks catch failures the probe path misses. Production systems run both.

The thresholds matter as much as the probe. Check too aggressively — say, eject after one failure on a 1-second interval — and a single garbage-collection pause flaps a healthy node out and back, scrambling the pool. Check too leniently — 30-second intervals, five failures to eject — and a dead backend keeps taking traffic for over two minutes. The probe must hit a real readiness endpoint that exercises the app's dependencies, not a path that returns 200 while the database connection behind it is dead.
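The arithmetic behind those two extremes is simple enough to write down. This is a rough upper bound under stated assumptions: the backend dies immediately after passing a probe, and the optional timeout_s models a final probe that hangs for its full timeout before counting as failed. The function name is illustrative.

```python
def worst_case_detection_seconds(interval_s, unhealthy_after, timeout_s=0.0):
    """Rough upper bound on how long a dead backend stays in rotation.

    Assumes the backend fails just after a successful probe, so the
    balancer must wait for `unhealthy_after` further probes to fail.
    """
    return interval_s * unhealthy_after + timeout_s


aggressive = worst_case_detection_seconds(interval_s=1, unhealthy_after=1)
lenient = worst_case_detection_seconds(interval_s=30, unhealthy_after=5)

print(aggressive)  # 1.0 second: one bad probe is enough to eject
print(lenient)     # 150.0 seconds, i.e. two and a half minutes
```

A useful setting sits between the two: fast enough that a dead node is gone within seconds, but with enough consecutive failures required that a single pause does not eject it.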

# a readiness path that actually checks dependencies
GET /healthz HTTP/1.1
Host: 10.0.0.11:8080

# respond 200 only if the DB and cache are reachable;
# a backend that is "listening but broken" must return 503 here
# probe settings: interval 2s · timeout 1s · healthy after 2 · unhealthy after 3
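The "healthy after 2, unhealthy after 3" thresholds amount to a small state machine that only changes state after a run of consecutive contrary results. A minimal sketch follows, with a hypothetical class name and a made-up sequence of probe results.

```python
class HealthTracker:
    """Flip state only after enough consecutive probes disagree with it."""

    def __init__(self, healthy_after=2, unhealthy_after=3):
        self.healthy_after = healthy_after
        self.unhealthy_after = unhealthy_after
        self.healthy = True
        self._streak = 0  # consecutive probes contradicting the current state

    def record(self, probe_ok):
        """Record one probe result and return the backend's current state."""
        if probe_ok == self.healthy:
            self._streak = 0  # result agrees with the state, reset the run
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_after if self.healthy else self.healthy_after
        if self._streak >= needed:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy


tracker = HealthTracker(healthy_after=2, unhealthy_after=3)
probes = [True, False, True, False, False, False, True, True]
states = [tracker.record(ok) for ok in probes]
# states == [True, True, True, True, True, False, False, True]
```

The single failed probe early in the sequence does not eject the backend, three failures in a row do, and it takes two consecutive successes before it is trusted again.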

Draining and Graceful Removal

When you deploy or scale down, you do not want to yank a backend mid-request. Connection draining stops sending the node new requests while letting in-flight ones finish, then removes it once they drain or a timeout expires — typically 30 to 300 seconds. Without it, a deploy kills live connections and users see resets and 502s for every request that happened to be in flight on a terminating node.
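The draining sequence can be sketched as a flag plus an in-flight counter. This is an illustrative model rather than any balancer's actual implementation: the class name, the polling loop, and the very short timeouts (chosen so the example runs quickly) are all assumptions.

```python
import threading
import time


class DrainingBackend:
    """Tracks in-flight requests and refuses new ones once draining starts."""

    def __init__(self, name):
        self.name = name
        self.in_flight = 0
        self.draining = False
        self._lock = threading.Lock()

    def accepts_new_requests(self):
        return not self.draining

    def start_request(self):
        with self._lock:
            if self.draining:
                raise RuntimeError(f"{self.name} is draining")
            self.in_flight += 1

    def finish_request(self):
        with self._lock:
            self.in_flight -= 1

    def drain(self, timeout_s, poll_s=0.01):
        """Stop new requests, then wait for in-flight ones or the timeout.

        Returns True if every request finished, False if the timeout expired.
        """
        with self._lock:
            self.draining = True
        deadline = time.monotonic() + timeout_s
        while self.in_flight > 0 and time.monotonic() < deadline:
            time.sleep(poll_s)
        return self.in_flight == 0


# One request in flight that completes shortly after draining begins.
backend = DrainingBackend("node-1")
backend.start_request()
threading.Timer(0.05, backend.finish_request).start()
drained_cleanly = backend.drain(timeout_s=1.0)

# A request that never finishes: the timeout bounds how long we wait.
stuck = DrainingBackend("node-2")
stuck.start_request()
stuck_drained = stuck.drain(timeout_s=0.05)
```

In the first case the node can be removed with nothing dropped; in the second the timeout expires and the remaining request would be cut off, which is why the timeout is usually set near the longest request you expect.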

Draining is the deploy-time twin of health checking. A health check ejects a node that failed unexpectedly; draining removes one you are taking out on purpose, gracefully. The two together mean a rolling deploy across a ten-node pool is invisible to users — each node drains, exits, returns, and rejoins, with traffic shifting smoothly the whole time and not a single dropped request.

Round-Robin vs Least-Connections vs Consistent Hashing

Round-robin deals requests out evenly in rotation, blind to how long each takes. Choose it when requests are cheap and uniform, so an even request count means even load.

Least-connections sends each request to the backend with the fewest open connections, tracking real load. Choose it when request cost varies widely, so a slow backend stops getting piled on.

Consistent hashing maps a key to a backend and keeps it there, remapping only about 1/N of keys when one of N backends is added or removed. Choose it for cache affinity or sticky-backend needs where the same key should keep hitting the same node.

Common Mistakes

Pointing the health check at / instead of a real readiness path. The web server answers 200 while the database behind it is down, so the balancer keeps routing to a backend that errors on every actual request.
Setting eviction thresholds too tight. Ejecting after one failed probe on a 1-second interval flaps a node out and back on every garbage-collection pause, churning the pool and breaking sticky sessions.
Deploying with no connection draining. Terminating a backend mid-request resets every in-flight connection, so a routine rolling deploy serves a burst of 502s to whoever was mid-request.
Using round-robin for wildly uneven request costs. A backend handed a 30-second report keeps getting cheap requests dealt to it on schedule, while least-connections would have steered around it.
Turning on sticky sessions to mask state you could externalize. Affinity defeats even balancing and ties users to a node, so a few heavy sessions overload one backend and its removal logs them all out.

Best Practices

Probe a dedicated /healthz readiness endpoint that checks the backend's real dependencies, so "listening but broken" returns 503 and the balancer ejects it.
Choose least-connections when request cost varies and round-robin only when requests are uniform, matching the algorithm to the workload's variance.
Use consistent hashing for cache-affinity workloads, so adding or removing a node remaps only about 1/N of keys instead of cold-missing the entire cache.
Enable connection draining with a timeout near your longest request (30–300 s) on every pool, so deploys and scale-downs finish in-flight requests instead of resetting them.
Externalize session state into Redis or a database so any backend serves any request, and reserve sticky sessions for connections like WebSockets that genuinely cannot move.

Comparable concepts: Kubernetes readiness probes (the parallel); DNS round-robin (the crude version)

Knowledge Check

Backend request costs range from 5 ms to 30 seconds. Which algorithm keeps load even across the pool?

Least-connections, which steers away from a backend busy with a long request
Round-robin, since it rotates evenly and fairly through every backend in turn
Consistent hashing, which routes each request by the current load on each node
Weighted round-robin, which continuously adapts its weights to live latency

A health check does a plain GET to / and the pool keeps a backend in rotation that errors on every real request. What went wrong?

It returns 200 while dependencies are dead; probe a real readiness path
The check interval is far too slow to notice the backend failure in time
Active checks cannot detect this kind of failure, and only passive checks can
The algorithm is round-robin when it really should be least-connections

During a routine rolling deploy, users get a burst of 502 errors. Which mechanism prevents this?

Connection draining, which lets in-flight requests finish before removal
Sticky sessions, which pin each user to one stable backend for the session
A much tighter health-check interval that ejects the terminating node faster
Consistent hashing, so that requests keep landing on the same backend node
