Stats, Events, and Inspection
When driftwood/web is slow or restarting, three built-in tools answer different questions without any external monitoring stack. docker stats shows live resource use, docker events streams what the daemon is doing as it happens, and docker inspect dumps the full runtime state of a container — including why it stopped. Add docker top for the process list inside a container, and you have enough to diagnose an OOM kill or a restart loop on one host before reaching for Prometheus.
The discipline these four enforce is to stop guessing. A container that keeps dying has a cause that is observable: an exit code, a memory column climbing toward a limit, a repeating event cycle. The mistake is to read the application code looking for a bug when the kernel killed the process for exceeding its memory cap. These commands point you at the real cause before you waste an hour on the wrong one.
docker stats
docker events: start, die, oom, and health_status as they happen.
docker inspect: .State.OOMKilled, .State.ExitCode, and the rest of a container's runtime config.
docker stats — Live Resource Use
docker stats streams a live table — per-container CPU percentage, memory usage against its limit, network I/O, and block I/O — updating about once a second. It is the fast way to see driftwood/web pinned at 100% CPU, or sitting at the edge of its --memory limit moments before an OOM kill. Run it on a single container to watch one closely:
docker stats web
The output refreshes in place, so you watch the numbers move rather than reading a snapshot. A container whose usage climbs steadily is telling you something a single sample would not.
Reading Memory Against the Limit
The memory column shows usage and the cgroup limit side by side — 480MiB / 512MiB. A container creeping toward its limit is about to be OOM-killed by the kernel (Chapter 1, topic 03), and stats makes that approach visible before it happens rather than after. If the number is parked just under the cap and the container periodically dies, you have found the cause without reading a line of application code: the memory limit is too low for the workload, or the workload leaks.
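The usage-against-limit reading can also be captured as a one-shot check. The following is a minimal sketch, not a monitoring tool: `mem_headroom` and its 90% default threshold are hypothetical names and values chosen for illustration, while `--no-stream` and the `.MemUsage` and `.MemPerc` placeholders are standard `docker stats` options.

```shell
# Sketch: flag a container whose memory use is close to its cgroup limit.
# mem_headroom and the 90% default threshold are hypothetical, not Docker features.
mem_headroom() {
  container="$1"
  threshold="${2:-90}"
  # --no-stream prints a single sample and exits instead of refreshing in place.
  # The output line looks like: 480MiB / 512MiB 93.75%
  docker stats --no-stream --format '{{.MemUsage}} {{.MemPerc}}' "$container" |
    awk -v t="$threshold" -v c="$container" '{
      pct = $NF
      sub(/%$/, "", pct)                       # strip the trailing percent sign
      state = (pct + 0 >= t + 0) ? "NEAR LIMIT" : "ok"
      printf "%s: %s / %s (%s%%) %s\n", c, $1, $3, pct, state
    }'
}

# Usage (needs a running Docker daemon):
#   mem_headroom web        # default threshold of 90%
#   mem_headroom web 75
```

A single sample says nothing about the trend, so a check like this complements the live view rather than replacing it. For a container started without --memory, the limit column reports the host's total memory, and the percentage is measured against that.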
docker events — The Daemon Event Stream
docker events prints a live feed of daemon-level activity: container start, die (with its exit code), oom, kill, health_status changes, image pulls, volume mounts. Watching it while a container crash-loops shows the exact die→start cycle and the exit code each time. Because it is a live stream, a past event is gone unless you ask for a window:
docker events --since 30m --until 0s \
--filter container=web
--since and --until turn the stream into a query over history, which is how you recover the die that already happened. Without them, events shows only what occurs from the moment you run it forward.
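To make the history query concrete, here is a hedged sketch that tallies the exit codes of recent die events for one container. `die_exit_codes` is a hypothetical helper name, the 30-minute default window is arbitrary, and the sketch assumes that die events carry an `exitCode` attribute reachable through the `--format` template and that `--until 0s` (a relative duration of zero) means "up to now".

```shell
# Sketch: tally exit codes from recent "die" events for one container.
# die_exit_codes is a hypothetical helper name.
die_exit_codes() {
  container="$1"
  window="${2:-30m}"
  # With --until set, the command returns the matching history and exits
  # instead of streaming forever.
  docker events --since "$window" --until 0s \
    --filter "container=$container" --filter event=die \
    --format '{{.Actor.Attributes.exitCode}}' |
    awk '{ n[$1]++ } END { for (c in n) printf "exit %s x%d\n", c, n[c] }' |
    LC_ALL=C sort
}

# Usage (needs a running Docker daemon):
#   die_exit_codes web          # last 30 minutes
#   die_exit_codes web 2h
```

A tally dominated by one exit code, repeated many times, is the signature of a crash loop with a single cause.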
docker inspect — Full Runtime State
docker inspect returns a container's complete config and live state as JSON: .State.Status, .State.ExitCode, .State.OOMKilled, .State.Health, the top-level .RestartCount, plus its mounts, networks, and environment. Reading the whole blob is rarely what you want; --format with a Go template pulls out the one or two fields that settle the question:
docker inspect \
--format '{{.State.OOMKilled}} {{.State.ExitCode}} {{.RestartCount}}' \
web
That one line tells you whether the kernel killed it, what code it exited with, and how many times it has restarted — the difference between "the app threw an exception" and "the kernel reaped it for exceeding its limit."
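The distinction between those two outcomes can be spelled out as a small decision function. This is a sketch under stated assumptions: `explain_exit` is a hypothetical helper, the verdict wording is illustrative, and the restart count is read from the top-level `.RestartCount` field. Exit codes above 128 are interpreted with the usual shell convention of 128 plus the signal number.

```shell
# Sketch: turn the inspect fields into a one-line diagnosis.
# explain_exit is a hypothetical helper; the verdict wording is illustrative.
explain_exit() {
  container="$1"
  docker inspect \
    --format '{{.State.OOMKilled}} {{.State.ExitCode}} {{.RestartCount}}' \
    "$container" | {
    # read splits the single output line into its three fields
    read -r oom code restarts
    if [ "$oom" = "true" ]; then
      echo "$container: OOM-killed (exit $code, $restarts restarts): check the memory limit"
    elif [ "$code" = "0" ]; then
      echo "$container: exited cleanly (exit 0, $restarts restarts)"
    elif [ "$code" -gt 128 ]; then
      # 128 + N conventionally means the process died from signal N
      echo "$container: killed by signal $((code - 128)) (exit $code, $restarts restarts)"
    else
      echo "$container: application error (exit $code, $restarts restarts)"
    fi
  }
}

# Usage (needs a running Docker daemon):
#   explain_exit web
```

The OOM branch is checked first on purpose: an OOM kill also produces a signal-style exit code, so testing the code alone would hide the real cause.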
docker top — Processes Inside a Container
docker top web lists the processes running inside a container by running the host's ps and keeping only the processes that belong to that container. It confirms whether the main process is the expected gunicorn and not a stray shell, and whether a zombie or an unexpected child process has appeared. The PIDs it prints are host PIDs, so the process that is PID 1 inside the container appears under a different number. It is a direct view of the fact that a container is just host processes (Chapter 1, topic 03), and because Docker selects the rows by container, you never mis-attribute a process to the wrong container the way filtering the host's ps by guesswork would.
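As an illustration of the zombie check, the sketch below passes ps options through docker top and filters on the process state column. `find_zombies` is a hypothetical helper, and the column list assumes a Linux host whose ps accepts `-eo pid,ppid,stat,comm`; the PIDs printed are host PIDs.

```shell
# Sketch: list a container's processes and flag zombies.
# find_zombies is a hypothetical helper; the options after the container
# name are handed to the host's ps.
find_zombies() {
  docker top "$1" -eo pid,ppid,stat,comm |
    # Skip the header row; a STAT value starting with Z marks a zombie.
    awk 'NR > 1 && $3 ~ /^Z/ { print "zombie pid " $1 " (parent " $2 "): " $4 }'
}

# Usage (needs a running Docker daemon):
#   find_zombies web
```

No output means no zombies were present at the moment of the call. The parent PID in each line points at the process that failed to reap its child.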
Spotting OOM and Restart Loops
These views converge on one diagnosis. docker ps shows a Restarting status or an uptime that keeps resetting. inspect reports OOMKilled: true, a non-zero ExitCode, and a climbing RestartCount. events shows the repeating oom→die→start cycle with the exit code each time. And stats confirms memory pressure by showing usage pinned at the cgroup limit. Four views, one cause: once you have seen them line up, an OOM kill stops looking like a mysterious crash and becomes a memory-limit configuration you can fix.
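A quick way to find candidates for this diagnosis on one host is to list every container that has restarted at least once, together with its OOM flag and last exit code. The following is a rough sketch: `restart_report` is a hypothetical helper, and it assumes the restart count is exposed as the top-level `.RestartCount` field and that `.Name` carries a leading slash.

```shell
# Sketch: report containers that have restarted at least once.
# restart_report is a hypothetical helper name.
restart_report() {
  ids="$(docker ps -aq)"
  [ -n "$ids" ] || return 0          # nothing to inspect on an empty host
  # $ids is intentionally unquoted so each ID becomes its own argument.
  docker inspect \
    --format '{{.Name}} {{.RestartCount}} {{.State.OOMKilled}} {{.State.ExitCode}}' $ids |
    awk '$2 > 0 { printf "%s restarts=%s oom=%s exit=%s\n", substr($1, 2), $2, $3, $4 }'
}

# Usage (needs a running Docker daemon):
#   restart_report
```

A line showing a high restart count with oom=true is the case described above; the same count with oom=false points back at the application or its exit code.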
- Reading docker stats once and treating the first sample as steady state: the initial CPU sample is often skewed; the stream needs a few seconds to settle, and a single reading misleads.
- Ignoring .State.OOMKilled and chasing an application bug when the kernel killed the container for exceeding its memory limit: the exit looks like a crash, but the cause is the cgroup cap, not the code.
- Forgetting that docker events is a live stream and missing past events: without --since and --until it shows only what happens from now on, so the die that already occurred is gone unless you query a time window.
- Running docker stats across hundreds of containers as a monitoring strategy: it is an interactive diagnostic, not a metrics pipeline, and gives no history, alerting, or aggregation.
- Confusing docker top with filtering the host's ps by guesswork: top maps the container's namespace directly, so it never mis-attributes a process to the wrong container the way a host-level filter can.
- Reach for docker stats first when a container is slow or restarting, to see CPU and memory-against-limit live before instrumenting anything.
- Check docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' on any container that exited unexpectedly, so an OOM kill is not mistaken for an application crash.
- Tail docker events while reproducing a crash loop to capture the exact die exit codes and the oom and health_status transitions in order.
- Treat these as single-host diagnostics and graduate to a metrics stack, such as cAdvisor with Prometheus or Kubernetes-native monitoring (Chapter 12, topic 76), once you need history, alerting, or fleet-wide views.
docker stats for history and alerting
ctop: a top-like TUI over the same per-container data
nerdctl · Podman: mirror stats, events, inspect, and top
Knowledge Check
A container keeps exiting and you suspect the kernel killed it. Which check distinguishes an OOM kill from an application crash?
- Read .State.OOMKilled via docker inspect: true means the kernel killed it for exceeding its memory limit
- Read docker logs: an OOM kill always prints a distinctive Python traceback the application cannot suppress
- Run docker top: an OOM kill always leaves a telltale zombie process still visible in the listing
- Read .RestartCount: any non-zero restart count means the kernel was the one that OOM-killed it
Why does docker events often show nothing relevant when you run it after a container has already crashed?
- It is a live stream that shows only events from the moment you run it; past events need a --since/--until window
- It only reports events for containers that are currently running, so a container that has already crashed is excluded entirely
- The logging driver must be set to json-file before events has any container activity to show
- It shows only broad daemon-wide events and never per-container lifecycle ones like die
What does the memory column in docker stats let you see before an OOM kill happens?
- Usage shown against the container's cgroup limit, so you watch it climb toward the cap that will trigger the kill
- Only the total host RAM in use, which on its own tells you nothing about any individual container's risk of being killed
- A predicted future timestamp for the exact moment the kernel will issue the OOM kill
- The OOMKilled flag itself, which flips to true a few seconds before the process actually dies
Why is docker stats the wrong tool to run across hundreds of containers as a monitoring strategy?
- It is an interactive diagnostic with no history, alerting, or aggregation — fleet monitoring needs a metrics pipeline
- It hard-refuses to run on more than a small fixed number of containers at once, strictly capping how much you can watch live
- It depends on the configured logging driver and silently breaks whenever the logs are shipped to a remote backend
- It restarts each container in turn to sample its resource use, disrupting the running workload every time