Chapter Eleven

Observability & Operations

Keeping one Docker host alive in production: bounding logs before they fill the disk, healthchecks that report but do not act on their own, the built-in docker stats, docker events, and docker inspect commands that diagnose an OOM kill without a metrics stack, pruning the four things that silently eat disk, configuring the daemon, and debugging a container that won't stay up.

6 topics

Building and running driftwood/web is one job; keeping it running on a host that doesn't fall over is another. This chapter covers the operations side of a single Docker host: the decisions you make once a container is meant to live for weeks instead of minutes. It deals with logs that grow until the disk is full, a healthcheck that reports a dead service but does nothing about it, a host that quietly accumulates dangling images and stopped containers until /var/lib/docker runs out of space, and a container that crash-loops at 3am and leaves you exactly four built-in tools to find out why.

The through-line is that Docker on one host reports more than it acts. It captures your logs but won't rotate them unless you tell it to; it computes a health status but won't restart on it; it tracks disk usage but won't reclaim it on a schedule. Each topic here is a place where the default is "Docker tells you" and the operator's job is to wire up "and then something happens." Where the answer genuinely needs a fleet, as with cluster-wide log collection, probes that act on failure, or history and alerting, the chapter points to Kubernetes (Chapter 12, topic 76) instead of pretending a single host can stretch that far.
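As a small illustration of "tell it to" for logs, here is a minimal Python sketch that builds the host-wide log settings for the json-file driver and estimates the worst-case disk those logs can occupy. The log-driver, log-opts, max-size, and max-file keys are Docker's own daemon.json settings. The function names, the 10m/3 values, and the 40-container count are hypothetical, chosen for illustration rather than as recommendations. The printed JSON would go in /etc/docker/daemon.json on Linux, and it only applies to containers created after the daemon restarts.

```python
import json


def json_file_log_settings(max_size_mb: int, max_file: int) -> dict:
    """Daemon-wide log settings for the json-file driver (hypothetical helper)."""
    if max_size_mb < 1 or max_file < 1:
        raise ValueError("max_size_mb and max_file must be positive")
    return {
        "log-driver": "json-file",
        # daemon.json expects log-opts values as strings, not numbers.
        "log-opts": {"max-size": f"{max_size_mb}m", "max-file": str(max_file)},
    }


def worst_case_log_mb(max_size_mb: int, max_file: int, containers: int) -> int:
    """Approximate upper bound: each container keeps at most max_file files of about max_size."""
    return max_size_mb * max_file * containers


if __name__ == "__main__":
    settings = json_file_log_settings(max_size_mb=10, max_file=3)
    print(json.dumps(settings, indent=2))
    # Assumed: 40 containers on the host, each capped near 3 x 10 MB of logs.
    print(worst_case_log_mb(10, 3, containers=40), "MB")
```

Setting the limits daemon-wide, rather than per container with docker run --log-opt, means a container started without any log options is still bounded.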