Debugging Containers
A container that won't stay up gives you more to work with than it looks. The exit code says how it died, docker logs says what it said on the way out, and docker inspect .State says whether the kernel killed it or it crashed on its own. Three artifacts, three different questions, and together they classify almost any failure before you touch the application code.
For a running container, you exec a shell into it and look around. For a distroless or scratch driftwood/web with no shell to exec (Chapter 2, topic 12), docker debug or an ephemeral debug container attaches the tools the image deliberately left out, so you keep the production image lean and still get a way in when it misbehaves.
Why It Exited — Start With the Exit Code
docker ps -a and docker inspect give the exit code, and the code narrows the cause before you read a single log line. The common ones each tell a story:
docker inspect --format '{{.State.ExitCode}} oom={{.State.OOMKilled}}' web
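0 is a clean exit. A non-zero code below 125 is normally the application's own failure code, while 125, 126, and 127 mean that docker run itself failed, that the command could not be invoked, or that the command was not found. Codes above 128 are 128 plus a signal number: 137 is SIGKILL (9), usually the OOM killer or a docker stop that timed out, so cross-check .State.OOMKilled; 143 is SIGTERM (15), the normal stop signal; 139 is SIGSEGV (11), a segfault. The number alone often tells you whether to read the logs or the memory limit.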
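The mapping from exit code to likely cause can be sketched as a small shell helper. This is an illustration, not part of Docker: the function name classify_exit and its output labels are made up here, and the 125 to 127 meanings come from the docker run reference.

```shell
# classify_exit CODE [OOM_KILLED]
# Hypothetical helper: turns an exit code and the OOMKilled flag into a first guess.
classify_exit() {
  code=$1
  oom=${2:-false}
  # The OOM flag wins: it explains a 137 better than the code alone.
  if [ "$oom" = "true" ]; then
    echo "oom-kill"
    return
  fi
  case "$code" in
    0)   echo "clean-exit" ;;
    125) echo "docker-run-failed" ;;
    126) echo "command-not-invocable" ;;
    127) echo "command-not-found" ;;
    137) echo "sigkill" ;;
    139) echo "segfault" ;;
    143) echo "sigterm" ;;
    *)
      if [ "$code" -gt 128 ]; then
        # 128 + N means the process died from signal N.
        echo "signal-$((code - 128))"
      else
        echo "app-error"
      fi
      ;;
  esac
}

# Usage against a container named web (needs a Docker daemon).
# The substitution is left unquoted on purpose so it splits into two arguments:
#   classify_exit $(docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' web)
```

An oom-kill result points at the memory limit, app-error points at the logs, and sigkill without the OOM flag suggests a stop timeout or a manual kill.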
docker logs and .State
docker logs <container> replays the captured STDOUT and STDERR, driver permitting (topic 66), and that output usually carries the stack trace or the fatal message. docker inspect adds the context the logs lack: under .State, the Status, Error, OOMKilled, StartedAt, and FinishedAt fields, plus the top-level RestartCount. Read together, they distinguish "the app threw an exception" from "the kernel killed it" from "it never started at all", three failures that can look identical from the outside.
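One way to read the two sources side by side is a small wrapper that prints the state fields and then the tail of the log. The function name why_exited and the 20-line tail are arbitrary choices for this sketch; the template fields are the ones docker inspect reports for a container.

```shell
# why_exited CONTAINER
# Hypothetical helper: state summary first, then the last lines of output.
why_exited() {
  c=$1
  # RestartCount sits at the top level of the inspect output, not under .State.
  docker inspect --format \
    'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error="{{.State.Error}}" started={{.State.StartedAt}} finished={{.State.FinishedAt}} restarts={{.RestartCount}}' \
    "$c" || return 1
  echo "--- last 20 log lines ---"
  # docker logs replays both streams; merge them so stderr is not lost in a pipe.
  docker logs --tail 20 "$c" 2>&1
}

# Usage (needs a Docker daemon):
#   why_exited web
```

A non-empty error field with no log output usually means the container never started; log output ending in a stack trace means the application failed on its own.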
Crash Loops
A container with restart: always or unless-stopped that keeps dying shows a climbing RestartCount and a Restarting status. The trouble is that the logs scroll past fast and the container is never sitting still when you look. Because a restart reuses the same container, docker logs still holds the output of earlier runs (until the log driver rotates it away); capture it to a file, and pair it with docker events (topic 68), which records a die event carrying the exit code for every failed run. That way the failure reason survives each restart instead of vanishing.
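Capturing a loop that won't hold still might look like the following sketch. The function name capture_loop, the output file name, and the default 10-minute window are assumptions made for illustration; the docker events filters and the exitCode attribute are standard for container die events.

```shell
# capture_loop CONTAINER [WINDOW]
# Hypothetical helper: snapshots recent die events and log output into one file.
capture_loop() {
  c=$1
  window=${2:-10m}
  out="crashloop-$c.log"
  {
    echo "== die events in the last $window =="
    # --until bounds the query so docker events returns instead of streaming forever.
    docker events --since "$window" --until "$(date +%s)" \
      --filter "container=$c" --filter "event=die" \
      --format '{{.Time}} exit={{index .Actor.Attributes "exitCode"}}'
    echo "== last 100 log lines =="
    docker logs --timestamps --tail 100 "$c" 2>&1
  } > "$out"
  # Print the file name so the caller can open or archive it.
  echo "$out"
}

# Usage (needs a Docker daemon):
#   less "$(capture_loop web 30m)"
```

The same exit code repeating at a steady interval points at a deterministic startup failure rather than a load-dependent one.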
exec Into a Running Container
docker exec -it <container> sh (or bash) opens a shell in a running container to inspect its filesystem, environment, and live processes:
docker exec -it web sh
It does nothing for a container that already exited — there is no process to attach to — and it depends on the image actually shipping a shell, which a slim or distroless image may not. When exec errors with "executable file not found," the image has no sh, and you need the next approach.
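The running-or-not check can be made explicit before attempting exec. A minimal sketch, assuming a hypothetical helper named shell_into; it relies only on the .State.Running field and on docker exec.

```shell
# shell_into CONTAINER
# Hypothetical helper: exec a shell only when the container is actually running.
shell_into() {
  c=$1
  running=$(docker inspect --format '{{.State.Running}}' "$c" 2>/dev/null)
  if [ "$running" != "true" ]; then
    # Nothing to attach to: fall back to the post-mortem tools.
    echo "$c is not running; use docker logs and docker inspect instead" >&2
    return 1
  fi
  # Fails with "executable file not found" when the image ships no sh.
  docker exec -it "$c" sh
}

# Usage (needs a Docker daemon):
#   shell_into web
```

A failure from the final exec line, rather than from the running check, is the signal that the image has no shell.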
Distroless and No Shell — docker debug and Ephemeral Debug Containers
When driftwood/web is distroless or built FROM scratch (Chapter 2, topic 12), there is no sh to exec into — that is the point of those base images. docker debug <container> attaches a temporary toolbox image into the running container's namespaces, giving you a shell and utilities without rebuilding or bloating the production image. The older, runtime-agnostic pattern does the same by joining the target's namespaces directly:
docker run -it --rm \
  --pid=container:web \
  --network=container:web \
  nicolaka/netshoot sh
The debug container brings its own shell and tools but shares the target's process and network view, so you inspect the shell-less container's processes and connections from inside a container that does have a shell.
Failed Builds
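A build that fails leaves every successful layer cached, which is the fastest way back in. docker build shows which instruction failed and its output; BuildKit's --progress=plain un-collapses the log so you see the full output of the failing step rather than the folded summary. From there, running an interactive container from the last good state lets you reproduce the failing RUN step by hand, which is faster and more precise than editing the Dockerfile and rebuilding from scratch each time (ties to Chapters 4 and 5). The legacy builder prints an intermediate image ID after each step that you can run directly; BuildKit does not expose intermediate layers as images, so build up to the last good point instead (for example with --target on a named stage, or with the failing instruction temporarily commented out) and run that image. The cached layers make that build nearly instant.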
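A rough sketch of that workflow follows. The helper names build_verbose and shell_at_stage, the build.log file, and the debug- tag prefix are invented for this example; it also assumes the Dockerfile has a named stage to target and that the stage's image contains sh.

```shell
# build_verbose [docker build args...]
# Hypothetical helper: keeps the full, unfolded build output in build.log.
build_verbose() {
  docker build --progress=plain "$@" > build.log 2>&1
  status=$?
  # Show the end of the log, where the failing step's output lands.
  tail -n 40 build.log
  return "$status"
}

# shell_at_stage STAGE [CONTEXT]
# Hypothetical helper: builds up to a named stage and opens a shell in the result.
shell_at_stage() {
  stage=$1
  context=${2:-.}
  # Cached layers are reused, so this is fast when earlier steps are unchanged.
  docker build --target "$stage" -t "debug-$stage" "$context" || return 1
  docker run -it --rm --entrypoint sh "debug-$stage"
}

# Usage (needs a Docker daemon and a stage declared as "FROM ... AS builder"):
#   build_verbose -t driftwood/web . || shell_at_stage builder
```

Inside that shell, the failing RUN command can be retried by hand until it passes, and only then copied back into the Dockerfile.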
- Trying to docker exec into a container that already exited: exec works only on a running container; a crashed one needs docker logs, inspect, and possibly a re-run with an overridden entrypoint to get a shell before the failing command runs.
- Reading exit code 137 as a generic crash and chasing the app: it is SIGKILL, and .State.OOMKilled: true shows the kernel killed it for exceeding its memory limit (Chapter 1, topic 03), a config problem, not a code bug.
- Restarting a crash-looping container repeatedly and losing the failure output each time: without grabbing the previous run's docker logs or watching docker events, the exit reason scrolls away on every restart.
- Assuming you can exec a shell into a distroless or scratch image to debug it: there is no shell or coreutils inside; docker debug or an ephemeral debug container is the only way in (Chapter 2, topic 12).
- Debugging a failed build by repeatedly editing the Dockerfile and rebuilding from scratch: the cache already holds the last good layer, so running a container from that intermediate and reproducing the failing step by hand is faster and more precise.

- Read the exit code first (.State.ExitCode and .State.OOMKilled) to classify the failure as a clean exit, app error, OOM kill, or signal before diving into the logs.
- Capture a crash-looping container's output with docker logs and watch docker events so the failure reason survives each restart instead of scrolling past.
- Use docker debug, or an ephemeral debug container joining the target's namespaces, for distroless and scratch images, keeping the production image shell-free while still being debuggable (Chapter 2, topic 12).
- Debug failed builds from the last cached intermediate image with --progress=plain rather than rebuilding blind, reproducing the failing RUN step interactively.
- Built-in Docker commands: logs, inspect, and exec
- cdebug attaches a toolbox to a running container
- dive debugs image bloat from an oversized build rather than a runtime crash
Knowledge Check
A container exits with code 137 and .State.OOMKilled is true. What does that tell you?
- The kernel killed it with SIGKILL for exceeding its memory limit — a config problem, not an app bug
- The application exited cleanly on its own and 137 is simply its normal success code
- It received a SIGTERM from docker stop and then proceeded to shut itself down gracefully
- The application segfaulted on a bad pointer dereference, which is what exit code 137 always indicates
Why does docker exec -it web sh fail on a crashed container?
- exec attaches to a running container; an exited one has no live process to attach to
- The container's network namespace was torn down on exit, so exec can no longer reach it
- The container's restart policy must be removed first before exec will agree to connect
- The configured log driver must be json-file for exec to be able to open a shell
driftwood/web is a distroless image with no shell and it is misbehaving while running. How do you get a shell to inspect it?
- Use docker debug or an ephemeral debug container that joins the target's namespaces and brings its own shell
- Run docker exec -it web sh directly, since every distroless image still ships a minimal busybox shell for exactly this
- Rebuild the production image with a shell layer added so you can exec into the running container
- Read docker logs, which conveniently opens an interactive shell into the container's live filesystem
A Dockerfile build fails at a RUN step. What is the fastest way to reproduce and fix it?
- Run an interactive container from the last cached intermediate image and rerun the failing step by hand
- Delete the entire build cache and rebuild the image from scratch after every single edit until it finally passes
- Switch the daemon's log driver over to json-file so that the full build output is captured to disk
- docker exec into the half-built final image and rerun the failing step interactively there