Chapter 3: Running Containers
Topic 15

Processes, PID 1, and Signals

The single most surprising thing about your first real container is that docker stop seems to hang for ten seconds and then your app dies hard: no clean shutdown, no flushed connections, no closed transactions. The cause is almost always that your app is running as a child of a shell, the shell is PID 1, and the shell does not forward SIGTERM. Inside a container, whatever process starts first is PID 1, the kernel gives PID 1 special signal handling, and getting this wrong means every restart is effectively a kill -9.

This is the chapter's footgun, and it is silent. There is no error, no warning, no failed health check — just an app that takes ten seconds to stop and loses whatever it was holding when it does. The fix is one line in a Dockerfile or one flag on docker run, but only once you understand why PID 1 behaves the way it does.

Your Process Is PID 1

Inside the container, the main process runs as PID 1, the role an init system (systemd, init) holds on a normal host. The kernel treats PID 1 specially: a signal reaches it only if it has installed a handler for that signal, so the usual default action (terminate on SIGTERM) never applies. A SIGTERM sent to a PID 1 that never registered a SIGTERM handler is simply dropped, and the process keeps running as if nothing happened. The exceptions are SIGKILL and SIGSTOP sent from outside the container, which the kernel always delivers. On a normal host you never see this because PID 1 is an init system written precisely to handle signals; in a container, PID 1 is your app, which usually was not.
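Because delivery depends on a handler being installed, the smallest fix inside an app is to register one. The following is a minimal Python sketch, assuming a hand-rolled server loop rather than gunicorn (which installs its own handlers). The names shutdown_requested, install_handlers, and serve_until_stopped are illustrative only and do not come from any framework.

```python
import os
import signal
import threading

# Hypothetical flag that the app's main loop polls; not part of any framework.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the main loop does the actual draining."""
    shutdown_requested.set()


def install_handlers():
    # Registering a handler is what lets SIGTERM reach a process running as PID 1.
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve_until_stopped(poll_seconds=0.05):
    """Stand-in for a server loop: keep working until a stop is requested."""
    iterations = 0
    while not shutdown_requested.is_set():
        iterations += 1                        # pretend to handle one unit of work
        shutdown_requested.wait(poll_seconds)  # sleep, but wake early on shutdown
    return iterations


if __name__ == "__main__":
    install_handlers()
    # Imitate docker stop by sending ourselves SIGTERM after 0.2 seconds.
    threading.Timer(0.2, os.kill, args=(os.getpid(), signal.SIGTERM)).start()
    count = serve_until_stopped()
    print(f"stop requested after {count} loop iterations; exiting cleanly")
```

The handler does nothing but set a flag, so cleanup such as closing connections runs in ordinary code after the loop returns instead of inside the signal handler.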

Shell-Form vs Exec-Form, the Footgun

How you write CMD or ENTRYPOINT decides what becomes PID 1. The exec form, CMD ["gunicorn", "app:app"], runs the binary directly, so gunicorn is PID 1 and receives signals straight from docker stop. The shell form, CMD gunicorn app:app written as a bare string, runs /bin/sh -c "gunicorn app:app" — now sh is PID 1, gunicorn is its child, and sh does not forward SIGTERM to its children. docker stop sends SIGTERM, sh ignores it, the 10-second timer expires, and the daemon SIGKILLs the whole process tree. This one distinction is why the Driftwood web container must use exec form.

Where SIGTERM goes, depending on how you wrote CMD
Exec form — CMD ["gunicorn","app:app"]
gunicorn is PID 1. docker stop → SIGTERM → gunicorn → clean shutdown, connections drained, exit before the timer.
Shell form — CMD gunicorn app:app
/bin/sh is PID 1, gunicorn is its child. SIGTERM hits sh, which drops it; 10s later the daemon SIGKILLs everything.
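The whole distinction comes down to whether the command replaces the launcher process or runs underneath it. This can be observed without Docker. The sketch below is an analogy, not what Docker itself runs: two tiny Python wrappers stand in for the launcher, one calling exec and one spawning a child, and each prints its PID before the command prints its own. All names here (PRINT_PID, EXEC_WRAPPER, CHILD_WRAPPER, pids_seen) are made up for the illustration.

```python
import subprocess
import sys

# The "app": a command that just reports which PID it is running as.
PRINT_PID = "import os; print(os.getpid())"

# Wrapper that prints its PID, then execs the command: same process, new program.
EXEC_WRAPPER = (
    "import os, sys\n"
    "print(os.getpid(), flush=True)\n"
    "os.execv(sys.executable, [sys.executable, '-c', sys.argv[1]])\n"
)

# Wrapper that prints its PID, then runs the command as a child and waits for it.
CHILD_WRAPPER = (
    "import os, subprocess, sys\n"
    "print(os.getpid(), flush=True)\n"
    "subprocess.run([sys.executable, '-c', sys.argv[1]])\n"
)


def pids_seen(wrapper):
    """Run a wrapper around PRINT_PID and return [wrapper_pid, command_pid]."""
    result = subprocess.run(
        [sys.executable, "-c", wrapper, PRINT_PID],
        capture_output=True, text=True, check=True,
    )
    return [int(line) for line in result.stdout.split()]


if __name__ == "__main__":
    print("exec wrapper :", pids_seen(EXEC_WRAPPER))   # two identical PIDs
    print("child wrapper:", pids_seen(CHILD_WRAPPER))  # two different PIDs
```

With exec the command inherits the wrapper's PID, which is how an app ends up as PID 1 and in the direct path of docker stop. With a child process the wrapper keeps that PID and the command sits one level below it.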

Why docker stop Waits 10 Seconds Then SIGKILLs

docker stop sends SIGTERM and starts a 10-second timer, tunable per stop with -t/--timeout (spelled --time in older CLI releases) or per container with --stop-timeout on docker run. If the process is still alive when the timer expires, the daemon sends SIGKILL, which no process, PID 1 or not, can catch, block, or ignore. So the 10-second pause is a grace window your app gets only if PID 1 actually receives and acts on SIGTERM. If PID 1 ignores it, the 10 seconds are pure dead time before a forced kill.

The difference is visible from the outside
# Exec form: gunicorn is PID 1, handles SIGTERM, exits in ~1s
$ time docker stop driftwood-web
driftwood-web
real    0m1.143s

# Shell form: sh is PID 1, drops SIGTERM, daemon waits the full 10s then SIGKILLs
$ time docker stop driftwood-web
driftwood-web
real    0m10.402s          # the tell-tale ~10s hang on every stop
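The same two outcomes can be reproduced on a laptop by imitating the stop sequence against ordinary child processes. This is a rough sketch of the SIGTERM, wait, SIGKILL pattern described above, not Docker's actual implementation. HANDLES_TERM and IGNORES_TERM are made-up stand-ins for a well-behaved and a badly-behaved PID 1, and the grace periods are shortened so the demo finishes quickly.

```python
import signal
import subprocess
import sys
import time

# Stand-in for an app that shuts down when asked.
HANDLES_TERM = (
    "import signal, sys, time\n"
    "signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

# Stand-in for a PID 1 that never acts on SIGTERM.
IGNORES_TERM = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def start_child(script):
    """Start a child and block until it reports that its signal setup is done."""
    proc = subprocess.Popen(
        [sys.executable, "-c", script], stdout=subprocess.PIPE, text=True
    )
    proc.stdout.readline()  # the child prints 'ready' after installing its handler
    return proc


def stop(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then SIGKILL.

    Returns (exit_code, elapsed_seconds).
    """
    start = time.monotonic()
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught, blocked, or ignored
        proc.wait()
    return proc.returncode, time.monotonic() - start


if __name__ == "__main__":
    print("handles SIGTERM:", stop(start_child(HANDLES_TERM), grace_seconds=2.0))
    print("ignores SIGTERM:", stop(start_child(IGNORES_TERM), grace_seconds=0.5))
```

A negative exit code from subprocess means the child was killed by that signal number, so the ignoring child reports -9 after the full grace period while the cooperative one reports 0 almost immediately.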

Zombie Reaping

A real init system reaps dead children — it calls wait() on processes that have exited so the kernel can release their entries from the process table. A typical app running as PID 1 does not, because it was never written to. So if the container spawns children that exit — a process pool that recycles workers, a shell wrapper that forks — their dead entries accumulate as zombies, because no one reaps them. Over a long-lived container this slowly exhausts the process table until new workers cannot spawn.
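Reaping itself is only a loop around wait(). As a hedged illustration of what an init does and a typical app does not, the sketch below forks a few children that exit at once and then collects them with a non-blocking waitpid. The function names spawn_short_lived and reap_all are invented for this example, and it assumes a POSIX system where os.fork is available.

```python
import os
import time


def spawn_short_lived(count):
    """Fork `count` children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit at once, skipping the parent's cleanup code
        pids.append(pid)
    return pids


def reap_all():
    """Collect every child that has already exited, without blocking.

    Returns the list of reaped PIDs.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    children = spawn_short_lived(3)
    time.sleep(0.2)  # until reap_all() runs, the exited children are zombies
    print("spawned:", children)
    print("reaped: ", reap_all())
```

Between the fork and the reap_all() call, each exited child still occupies a process-table entry. A PID 1 that never runs such a loop leaves those entries in place for the life of the container.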

The --init Flag

docker run --init injects a tiny init process — tini — as PID 1, which forwards signals to your app and reaps zombies on its behalf. You get correct SIGTERM delivery and reaping without making the app itself init-aware. The decision is simple: if the app can be PID 1 and handle its own signals, use exec form and let it be PID 1; if it genuinely cannot — it spawns children it does not reap, or it cannot handle signals — add --init (or init: true in Compose) and let tini do the init job.
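To make the init job concrete, here is a toy version of it in Python. This is a teaching sketch only and is not how tini is implemented: run_as_init is a hypothetical name, only SIGTERM and SIGINT are forwarded, and the 128 + signal exit-code convention is an assumption borrowed from common shell practice. It assumes a POSIX system.

```python
import os
import signal
import sys


def run_as_init(argv):
    """Run argv as a child, forward SIGTERM/SIGINT to it, reap, return its exit code."""
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)  # the child becomes the real app
        except OSError:
            os._exit(127)             # command not found or not executable

    def forward(signum, frame):
        # Pass the signal on to the app instead of acting on it ourselves.
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass                      # the app has already exited

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, forward)

    while True:
        try:
            pid, status = os.wait()   # also reaps any other exited children
        except ChildProcessError:
            return 0                  # nothing left to wait for
        if pid == child:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return 128 + os.WTERMSIG(status)  # assumed convention for signal deaths
```

Called as run_as_init(["sleep", "30"]), the function blocks until the child exits. A SIGTERM sent to the wrapper is relayed to sleep, which is not PID 1 and therefore terminates under the default action.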

Exec Form vs Shell Form
  • Exec form: CMD ["gunicorn", "app:app"] (JSON array) runs the binary directly as PID 1, so it receives SIGTERM and shuts down cleanly within the grace window. This is what you want for the main process of any long-lived container.
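  • Shell form: CMD gunicorn app:app (bare string) wraps the command in /bin/sh -c, making sh PID 1; sh swallows SIGTERM, so docker stop hangs the full 10 seconds then SIGKILLs. Use it only when you genuinely need shell features (variable expansion, pipes, globbing), and then end the command or script with exec so the app replaces the shell and receives signals itself. Adding --init helps less here: tini forwards SIGTERM to its direct child, the shell, so the stop is fast but the app behind the shell may still never see the signal.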
Common Mistakes
  • Writing the Driftwood web entrypoint in shell form and wondering why every deploy takes 10 seconds and Postgres connections aren't closed cleanly — sh is PID 1 and never forwards the SIGTERM, so gunicorn is SIGKILLed on every single restart.
  • Wrapping the app in a startup shell script that ends with gunicorn … instead of exec gunicorn … — without exec the script stays PID 1 and the app is a child that never sees signals; the one-word fix is exec.
  • Running a container that forks worker processes with no init and no reaping — exited workers become zombies that pile up until the container's process table is exhausted and new workers can't spawn.
  • Assuming docker stop killed the app cleanly when PID 1 ignored the SIGTERM — the app was actually SIGKILLed after the 10-second timer, so any in-flight request, open transaction, or buffered write was lost with no chance to flush.
  • Lowering --timeout to 1–2 seconds to "speed up" deploys without checking the app's real shutdown time — you turn a clean drain into a forced kill for any request that takes longer than the shortened window.
Best Practices
  • Write ENTRYPOINT and CMD in exec form — the JSON-array syntax — so the application binary is PID 1 and receives SIGTERM directly.
  • End any entrypoint wrapper script with exec "$@" (or exec the-app) so the app replaces the shell as PID 1 instead of running as its child that never sees signals.
  • Add docker run --init (or init: true in Compose) when the process legitimately spawns children or can't handle signals, so a real init reaps zombies and forwards signals for it.
  • Handle SIGTERM in the app to drain in-flight work, and set --stop-timeout/--timeout to slightly more than the app's real worst-case shutdown so docker stop always lets it finish.
Comparable Tools
  • tini: the init Docker bundles for --init
  • dumb-init · s6-overlay: alternative container inits
  • Podman: exposes the same --init and PID-1 semantics (it's kernel behavior, not a Docker invention)
  • Kubernetes: relies on the same exec-form-and-SIGTERM contract for its pod termination grace period (Ch12)

Knowledge Check

Why does a SIGTERM to a PID 1 that installed no handler get dropped?

  • The kernel ignores any signal to PID 1 with no explicit handler — PID 1 is special
  • Docker's daemon strips SIGTERM out of the signal stream and only ever delivers SIGKILL to containers
  • The PID namespace boundary blocks SIGTERM from crossing in from the host into the container at all
  • The 10-second grace timer consumes and absorbs the SIGTERM before the process can ever see it

Which delivers SIGTERM to gunicorn, and why?

  • Exec form (CMD ["gunicorn","app:app"]) — gunicorn is PID 1; shell form makes sh PID 1, which doesn't forward signals
  • Shell form — wrapping the command in /bin/sh -c ensures the shell faithfully relays every signal to its child
  • Both forms deliver SIGTERM to the application process identically — the form you choose only affects how the command's arguments are split and parsed
  • Neither form does — only docker run --init can deliver SIGTERM to the app regardless of the CMD form

What is actually happening during the 10-second hang on docker stop with a shell-form entrypoint?

  • SIGTERM was dropped, the grace timer ran out, and the daemon sent SIGKILL — a hard kill with no clean shutdown
  • The app is using the full 10 seconds to gracefully drain its in-flight requests before exiting cleanly
  • Docker is re-sending SIGTERM once every second until the PID 1 process finally acknowledges and acts on it
  • The container is flushing and committing its writable upper layer back into the underlying image on disk before it is finally allowed to exit

When is --init the right fix rather than just using exec form?

  • When the app spawns children it doesn't reap, or can't handle signals — tini reaps zombies for it
  • Always — every container without exception should run with --init regardless of what the app does
  • When you want the container to start measurably faster by skipping the /bin/sh wrapper process
  • When the app needs to allocate more memory than the container's default cgroup limit allows
