Topic 48

Service Dependencies and Startup Order

Ordering · Healthcheck

web needs db to be up before it can serve a request, so the instinct is depends_on: [db] and assume the problem is solved. It is not. Plain depends_on waits only for the db container to start — not for Postgres inside it to finish initializing and start accepting connections — so web races ahead, connects to a port that isn't listening yet, and crashes on boot.

This is the single most common Compose footgun, and it trips everyone exactly once. The fix is small once you see it, but the failure mode is confusing because every service reports "running" while the stack is plainly broken.

What depends_on Actually Guarantees

depends_on in its short form orders container start: on up, Compose starts the dependencies first and the dependents after. That is the entire promise. "Started" means the container process is running — the kernel has launched the entrypoint — not that the application inside has finished booting and is ready to serve. The ordering is real and useful; the readiness guarantee people expect is not there.

Started Is Not Ready

postgres:16 spends a few seconds after its container starts initializing the data directory, running startup scripts, and finally binding port 5432. During that window the container is "running" but Postgres is not yet listening. web, having been told only to start after db started, opens its connection in exactly that window, gets connection-refused, and exits. The stack appears broken on first up even though every service is "running" — which is precisely why the cause is hard to spot.
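The gap between "running" and "listening" can be reproduced without Docker. The sketch below is only an illustration: a local socket stands in for Postgres, and the helper name accepts_connections is hypothetical. The socket owns a port from the start, but connections are refused until it calls listen(), which mirrors the window in which web's first connection attempt fails.

```python
import socket


def accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True only if a TCP connection to host:port actually succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and timeouts
        return False


# Stand-in for db: the "process" exists and owns a port,
# but it has not started listening yet (running, not ready).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
port = server.getsockname()[1]

ready_before = accepts_connections("127.0.0.1", port)  # probe during the init window

server.listen()  # the moment the service becomes ready
ready_after = accepts_connections("127.0.0.1", port)

server.close()
print(f"before listen: {ready_before}, after listen: {ready_after}")
```

The first probe corresponds to what web sees when it is gated only on container start; the second corresponds to what a readiness probe waits for.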

Healthchecks Plus condition: service_healthy

The correct fix is to gate on readiness, not on start. Give db a healthcheck that probes whether Postgres actually accepts connections — pg_isready is the standard probe — and change web's dependency to the long form with condition: service_healthy. Now Compose holds web back until db's healthcheck passes, which is the moment Postgres is truly listening. This is the same HEALTHCHECK mechanism the image chapters touched, and Chapter 11 topic 67 covers how the check itself is written and tuned.

Startup gated on readiness with condition: service_healthy

db starts → db healthcheck passes → web starts → proxy starts

compose.yaml — gate web on a real readiness check, not just container start

services:
  web:
    build: ./web
    image: driftwood/web
    depends_on:
      db:
        condition: service_healthy
    networks:
      - driftwood-net

  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U driftwood -d driftwood"]
      interval: 5s
      timeout: 3s
      retries: 5
    volumes:
      - driftwood-db-data:/var/lib/postgresql/data
    networks:
      - driftwood-net

  proxy:
    image: nginx:1.27-alpine
    depends_on:
      web:
        condition: service_started
    ports:
      - "80:80"
      - "443:443"
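    networks:
      - driftwood-net

# Top-level declarations for the network and volume the services reference
networks:
  driftwood-net:

volumes:
  driftwood-db-data: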

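Read the dependency chain off the file: proxy waits for web to start, web waits for db to pass pg_isready. The healthcheck probes every five seconds, and web starts as soon as one probe passes, so Compose holds web only as long as Postgres needs to come up. The wait is bounded: after five consecutive failed probes db is marked unhealthy and the up fails instead of hanging, so a slow first-time initialization may need more retries or a start_period.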
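The healthcheck numbers translate into a time budget for db. The sketch below is a rough model rather than Docker's exact scheduling: it assumes probes run back to back, one interval apart, and that a hung probe costs the full timeout. The function name failure_budget_seconds is hypothetical.

```python
def failure_budget_seconds(
    interval: float,
    timeout: float,
    retries: int,
    start_period: float = 0.0,
) -> tuple[float, float]:
    """Rough bounds on how long a never-ready service runs before it is marked unhealthy."""
    # Every probe fails immediately: only the intervals add up.
    fastest = start_period + retries * interval
    # Every probe hangs until its timeout before the next interval begins.
    slowest = start_period + retries * (interval + timeout)
    return fastest, slowest


# Values taken from the db healthcheck: interval 5s, timeout 3s, retries 5.
low, high = failure_budget_seconds(interval=5, timeout=3, retries=5)
print(f"db has roughly {low:.0f} to {high:.0f} seconds to start accepting connections")
```

If Postgres can take longer than that on a first boot with an empty data directory, raising retries or adding a start_period widens the budget.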

App-Side Retry as the Resilient Path

Health gating fixes cold boot, but it does not cover a dependency that restarts mid-life. If db is restarted for an upgrade while web is running, the healthcheck condition was satisfied long ago and does nothing now — web simply loses its connection. The durable answer is for the application itself to retry the database connection with backoff, both at startup and on any dropped connection. The healthcheck handles the cold-boot half; the retry handles everything after.
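A minimal sketch of retry with exponential backoff follows. It is not Driftwood's actual code: connect_with_backoff and flaky_connect are hypothetical names, and the defaults are arbitrary. It also assumes the failure surfaces as an OSError such as ConnectionRefusedError; a real database driver usually raises its own exception type, which would be passed in through retry_on.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise AssertionError("unreachable")


# Demo: a fake connect that is refused twice, then succeeds.
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError("db is not listening yet")
    return "connection"


waits: list = []
conn = connect_with_backoff(flaky_connect, sleep=waits.append)  # record waits instead of sleeping
print(conn, waits)
```

The same helper can wrap both the initial connection at startup and the reconnect after a dropped connection, which is what lets web survive a db restart.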

Driftwood Wired Correctly

In the finished file db carries a pg_isready-based healthcheck, web depends on db with condition: service_healthy, and proxy depends on web. The chain means a single docker compose up brings the stack up in an order where each tier finds the one below it actually listening — no race, no connection-refused, no manual sleep between commands. Driftwood's own startup code also retries its connection, so a later db restart no longer takes web down with it.
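The start order implied by the chain can be derived mechanically from the depends_on entries. The sketch below uses a topological sort over a hand-written dictionary that mirrors the file. It illustrates the idea only and makes no claim about how Compose implements it internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# service -> {dependency: condition}, mirroring the compose file
depends_on = {
    "proxy": {"web": "service_started"},
    "web": {"db": "service_healthy"},
    "db": {},
}

# TopologicalSorter takes a mapping of node -> predecessors and yields predecessors first.
graph = {service: set(deps) for service, deps in depends_on.items()}
start_order = list(TopologicalSorter(graph).static_order())

for service in start_order:
    gates = [f"{dep} ({cond})" for dep, cond in depends_on[service].items()]
    print(service, "waits for", ", ".join(gates) if gates else "nothing")
```

The ordering alone is what the short form provides; the condition attached to each edge is what decides whether "waits for" means started or ready.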

depends_on short form vs condition: service_healthy

Short form (depends_on: [db]) — orders container startup only and returns the moment the container process is running. It loses the race against a database that needs seconds to accept connections. Use it for pure ordering, when the dependent does not connect to the dependency at boot.

condition: service_healthy form — waits for the dependency's healthcheck to pass before starting the dependent, which is what "wait until the database is actually ready" requires. Use it whenever the dependent opens a connection to the dependency during startup — which is exactly Driftwood's web-to-db case.

Common Mistakes

Writing depends_on: [db] and expecting web to wait for Postgres to accept connections — it waits only for the container to start, and web crashes on connection-refused during Postgres's init window.
Adding condition: service_healthy while giving db no healthcheck, so Compose has no health status to gate on and web is never started; the dependency must define a healthcheck or inherit one from its image's HEALTHCHECK.
Writing a healthcheck that returns healthy too early — checking the process exists rather than that the port answers — so web still races in; the probe must test readiness (pg_isready, a real query), not mere liveness.
Relying solely on startup ordering and shipping no connection retry in web — the first cold boot works, but any later db restart takes web down with it, because nothing reconnects.
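The second mistake above can be caught before running anything. The sketch below is a hypothetical lint, not a Compose feature: it takes an already-parsed services mapping (written by hand here to avoid a YAML dependency) and reports every service_healthy gate whose target defines no healthcheck. It is deliberately conservative, since it cannot see a HEALTHCHECK baked into an image.

```python
def unsatisfiable_health_gates(services: dict) -> list:
    """Return (dependent, dependency) pairs gated on health with no healthcheck to satisfy them."""
    problems = []
    for name, spec in services.items():
        deps = spec.get("depends_on", {})
        if isinstance(deps, list):  # short form: ordering only, nothing to gate
            continue
        for dep, opts in deps.items():
            wants_health = opts.get("condition") == "service_healthy"
            has_check = "healthcheck" in services.get(dep, {})
            if wants_health and not has_check:
                problems.append((name, dep))
    return problems


broken = {
    "web": {"depends_on": {"db": {"condition": "service_healthy"}}},
    "db": {"image": "postgres:16"},  # no healthcheck block
    "proxy": {"depends_on": ["web"]},
}

# Same stack with the readiness probe added to db.
fixed = {
    **broken,
    "db": {
        "image": "postgres:16",
        "healthcheck": {"test": ["CMD-SHELL", "pg_isready -U driftwood -d driftwood"]},
    },
}

problems_broken = unsatisfiable_health_gates(broken)
problems_fixed = unsatisfiable_health_gates(fixed)
print(problems_broken, problems_fixed)
```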

Best Practices

Gate dependents with depends_on plus condition: service_healthy against a real readiness healthcheck (pg_isready for db), not the bare short form, whenever the dependent connects at startup.
Write healthchecks that probe actual readiness — the port accepting a real request — rather than just the presence of a process, so "healthy" means "serves traffic".
Add connection-retry-with-backoff in the application (web) so a mid-life dependency restart is survived, treating health gating as the cold-boot half of the answer.
Order the full chain explicitly — proxy depends on web, web depends on db — so up reconciles the stack in an order where each tier finds the one below it listening.

Comparable tools

docker run: a sequence of commands is ordered only by the human running them in order, plus a sleep
Podman: podman-compose honors the same depends_on and healthcheck keys
Kubernetes: splits this into readiness probes and startup probes / init containers

Knowledge Check

What does plain depends_on: [db] actually wait for before starting web?

Only for the db container process to start — not for Postgres inside it to accept connections
For Postgres inside the container to finish initializing and bind port 5432 so connections succeed
For db's defined healthcheck to start passing, which the short form runs and waits on automatically
For web's own startup and initial data load to fully complete before db is allowed to run

Why does the stack appear broken on first up even though every service reports "running"?

web connects during the window after db's container starts but before Postgres is listening, and crashes
The two services were placed on different project networks and so cannot resolve each other's names
The driftwood-db-data named volume silently failed to mount on boot, leaving Postgres with no writable data directory to initialize
The db image only partially pulled from the registry, so the container runs without Postgres installed

Why must db have a healthcheck for condition: service_healthy to work?

The condition gates on the dependency's reported health, so without a check there is no health status to gate on
Compose synthesizes a default pg_isready probe for Postgres, so the explicit check is only documentation
The healthcheck is what registers db in the project network's embedded DNS, so without it web cannot resolve the hostname at all
Without a healthcheck defined Compose simply refuses to build or pull the db image at all

Why is app-side connection retry still needed once health gating is in place?

Health gating covers cold boot only; a mid-life db restart drops web's connection and only app retry survives it
The healthcheck re-runs and re-gates web automatically on every later db restart, making retry redundant
Retry makes the initial cold boot of the stack measurably faster than waiting for the healthcheck
App retry fully replaces the healthcheck on cold boot as well, so the db service's healthcheck block can simply be deleted
