Healthchecks
A running container is not the same as a working one. driftwood/web can be up, listening on its port, and still returning 500s because its database connection died mid-shift. The process is alive; the service is not. A HEALTHCHECK is a command Docker runs inside the container on an interval to answer the only question that matters — "is this actually serving?" — and turns the answer into a health status that other tools can act on.
On a single host, though, Docker reports the status and does almost nothing about it on its own. It will flip a container to unhealthy and leave it running, broken, until something else intervenes. That gap — between knowing a container is sick and doing something about it — is the whole point of this topic. The healthcheck is an input to a restart policy and to Compose's startup ordering, not a self-healing mechanism by itself.
Defining the Check
You declare a healthcheck in the Dockerfile with HEALTHCHECK, or in Compose under a healthcheck: key. The command runs inside the container, against its own service; for driftwood/web that is usually a request to a route that exercises the real dependency path. Exit code 0 means healthy and 1 means unhealthy; Docker reserves exit code 2, so don't use it. A typical definition curls a /health endpoint on the container's own port:
HEALTHCHECK --interval=15s --timeout=3s --start-period=30s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1
The -f flag makes curl exit non-zero on an HTTP error status, so a 500 from /health registers as a failed probe rather than a success that happened to return error text. The command runs in the container's own namespaces, which is why it targets localhost — it is talking to the same process it is checking.
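For comparison, the same probe can be declared in Compose. The fragment below is a minimal sketch that mirrors the Dockerfile values above. It assumes the driftwood/web image ships curl, and it uses Compose's start_period spelling (underscore) for the start-period option.

```yaml
services:
  web:
    image: driftwood/web
    healthcheck:
      # CMD-SHELL runs the string through the container's shell,
      # which is what makes the "|| exit 1" fallback work.
      test: ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"]
      interval: 15s       # how often the probe runs
      timeout: 3s         # a probe slower than this counts as a failure
      start_period: 30s   # grace window; failures here don't count
      retries: 3          # consecutive failures before "unhealthy"
```

A healthcheck: block in Compose overrides any HEALTHCHECK baked into the image, so a service normally needs only one of the two.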
The Three Statuses
A container with a healthcheck moves through three states. During the start-period grace window it is starting, and failures don't count against it. After that it is healthy or unhealthy depending on the most recent probes. docker ps shows the status in parentheses next to the container — Up 4 minutes (healthy) — so a quick listing tells you which containers are actually serving. For the detail of why a check fails, docker inspect exposes the recent probe output:
docker inspect --format '{{json .State.Health}}' web | jq
That returns the current status, the failing-streak counter, and a rolling log of the last few probe invocations with their exit codes and stdout — enough to see that the check is timing out, or that /health is returning a 503 because the database is down.
What Docker Does — and Doesn't — Do on a Single Host
This is the part that surprises people. By itself, the daemon flips the status to unhealthy and stops there. It does not restart the container, replace it, or stop it on a health failure. The restart policy (Chapter 3, topic 17) reacts only to the process exiting, and an unhealthy container whose process is still running has not exited. Only an orchestrator acts on unhealthy directly. So on one host, a healthcheck that has gone red next to a container that keeps running is the default behavior, not a bug.
The single-host operator has to bridge that gap deliberately: combine the healthcheck with something that actually exits or restarts on failure. The status is a signal; you still have to wire up the response.
Tuning Interval, Timeout, Retries, Start-Period
Four knobs shape the check. interval sets how often it runs; timeout caps how long a single probe may take before it counts as a failure; retries sets how many consecutive failures flip the container to unhealthy; and start-period is a grace window in which failures don't count yet. Mistakes here pull in opposite directions. A start-period that is too short marks a slow-booting driftwood/web unhealthy before it has finished running its database migrations, so the image looks broken when the timing was simply wrong. A timeout longer than the interval lets one hung probe stall the schedule well past the interval, because Docker waits for each probe to finish before it schedules the next. And too long an interval means a service that died seconds ago still shows healthy for minutes. Even with the 15-second interval, 3-second timeout, and 3 retries used earlier, a dead service can keep showing healthy for close to a minute.
Tie to Compose depends_on
Compose's depends_on: condition: service_healthy (Chapter 8, topic 48) holds web from starting until db reports healthy, turning the healthcheck into an ordering gate. Without it, depends_on waits only for the container to start — not to be ready — and web races ahead of a Postgres still opening its socket, failing on first boot with a connection-refused error. The healthcheck on db is what makes "wait until it's actually accepting connections" expressible instead of "wait until the process launched."
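A minimal sketch of that gate follows. It assumes driftwood/db is a Postgres-based image that includes the pg_isready utility and a postgres user. The interval, timeout, and retries values are illustrative, not recommendations.

```yaml
services:
  db:
    image: driftwood/db
    healthcheck:
      # pg_isready exits 0 only once the server accepts connections.
      # Assumption: the image provides pg_isready and a "postgres" user.
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5

  web:
    image: driftwood/web
    depends_on:
      db:
        # Long-form depends_on: wait for db's healthcheck to pass,
        # not just for its container to be started.
        condition: service_healthy
```

With the short form (depends_on: [db]), Compose would start web as soon as the db container was created and running, which is the race described above.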
Tie to Restart Policy
Pairing a healthcheck with an app that exits on persistent, unrecoverable failure lets the restart policy (Chapter 3, topic 17) recycle the container. The common single-host pattern is three parts working together: the healthcheck reports status, the app exits on fatal failure, and restart: unless-stopped brings it back from the exit. Docker will not connect "unhealthy" to "restart" for you — but "process exited" to "restart" is exactly what the restart policy does, so the trick is to make a fatal health condition cause the process to exit.
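On the Compose side, the pattern needs only one extra line. This sketch assumes the driftwood/web image already carries the HEALTHCHECK from its Dockerfile, and that the application itself exits when it hits a fatal condition such as a permanently lost database connection. That exit logic lives in the app and is not shown here.

```yaml
services:
  web:
    # Assumption: the image defines its own HEALTHCHECK, so the
    # status is reported without a healthcheck: block here.
    image: driftwood/web
    # Reacts to the process exiting, never to "unhealthy" itself.
    # The container comes back unless it was stopped manually.
    restart: unless-stopped
```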
Common Pitfalls
- Setting start-period too short for a container that runs database migrations or warms a cache on boot. driftwood/web is flagged unhealthy (and killed, under an orchestrator) before it ever finished starting, which looks like a broken image when the timing was the only problem.
- Assuming Docker restarts an unhealthy container on a single host. It does not; the status changes and nothing acts on it unless an orchestrator or a deliberate restart-on-exit pattern is in place.
- Writing a healthcheck that only confirms the port is open, not that the app works. A process stuck in a deadlock still accepts the TCP connection, so an nc -z style check reports healthy while every real request hangs.
- Giving the check a timeout longer than the interval, so a single hung probe delays every later check, or running an expensive query every few seconds that adds real load to driftwood/db.
- Relying on depends_on without condition: service_healthy to order startup. Compose waits only for db to start, so web connects before Postgres is accepting connections and fails on first boot.
Best Practices
- Probe an endpoint that exercises the real dependency path (a /health route that touches the database), not just the listening port, so healthy means actually serving.
- Set start-period to comfortably cover the container's slowest legitimate boot, including migrations and cache warm, so startup failures aren't counted as health failures.
- Gate dependent services with Compose depends_on: condition: service_healthy so web waits for db to report healthy, not merely to start.
- Pair the healthcheck with a restart policy and an app that exits on fatal, unrecoverable failure, since Docker on one host will not restart an unhealthy container for you.
HEALTHCHECK directive
Healthchecks.io · Blackbox Exporter probe from outside the host rather than inside the container
Knowledge Check
On a single Docker host with no orchestrator, what does the daemon do when a container's healthcheck flips it to unhealthy?
- It records the unhealthy status and otherwise leaves the container running; it does not restart, stop, or replace it
- It immediately restarts the container in place to try to recover it, the way an orchestrator would
- It stops and removes the unhealthy container automatically so that a fresh replacement can be scheduled and started in its place
- It triggers the configured restart policy, which reacts directly to the reported health status
What does the start-period setting control, and what goes wrong if it is too short?
- It is a grace window where failing probes don't count; too short, and a slow-booting container is flagged unhealthy before it finishes starting
- It sets how often the probe runs between successive checks; too short, and the probe runs almost constantly and adds measurable, continuous load on the host
- It caps how long a single probe invocation may take before it is killed; too short, and slow-but-valid probes are counted as failures
- It sets how many consecutive failed probes flip the container to unhealthy; too short, and a single transient blip marks it unhealthy
Why does a healthcheck that only checks whether the port is open give a false sense of health?
- A deadlocked process still accepts the TCP connection, so the check reports healthy while real requests hang
- A port-open check is too expensive to run and adds enough load on each interval to make the container itself appear unhealthy
- Docker rejects port-only checks as invalid and forces the container to stay in the starting state forever
- A port check always runs from outside the container's namespace and therefore cannot reach the app's real endpoint
In Compose, why does depends_on need condition: service_healthy to correctly order web after db?
- Plain depends_on waits only for db to start; service_healthy waits for its healthcheck to pass, so web starts after Postgres accepts connections
- Plain depends_on already waits for db to be fully ready and accepting client connections first, so adding the explicit condition here is purely cosmetic
- It controls shutdown order on teardown, ensuring web always stops cleanly before db goes down
- It makes Compose restart db automatically whenever its healthcheck reports unhealthy at runtime