Chapter Ten

Security

A default container is not a security boundary — it runs as root, with a broad capability set and a writable filesystem, on a kernel it shares with the host. This chapter starts from the threat model that explains why the kernel is the blast radius, then strips each default away in turn: non-root, dropped capabilities, seccomp and the LSMs, a read-only filesystem, secrets out of the image, and finally a daemon that no longer runs as root. Driftwood's web container is the worked example, hardened layer by layer.

7 topics

Every other chapter in this course made Docker do something. This one makes it do less — deliberately. The reason is the fact you have carried since Chapter 1: a container shares the host's kernel, so a process that breaks out of its namespaces is on the host, not in a sandbox. A container escape is host compromise. The default configuration that ships out of the box is tuned for convenience, and convenience is the opposite of safe.

Defense in depth is the whole strategy. No single control stops a determined attacker, so you stack independent ones — run as a non-root user, drop the capabilities the workload never uses, confine syscalls and file access with seccomp and an LSM, make the root filesystem read-only, keep secrets out of the image, and drop the daemon's own root with rootless mode. Each layer assumes the one before it failed. The thread tying them together is least privilege: the smallest set of permissions, capabilities, syscalls, and writable paths the container needs to do its job, and nothing more.
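As a rough illustration of how the container-level layers stack on one command line, the sketch below prints a set of hardening flags. The function name `hardened_flags`, the UID/GID 10001, and the image name `driftwood-web` are assumptions made for this example, not values taken from the course. Secrets and rootless mode are configured elsewhere (at mount time and at the daemon), so they do not appear here.

```shell
# Sketch: print the container-level hardening flags, one per line,
# so they can be reviewed or spliced into a docker run command.
# UID/GID 10001 is an assumed, arbitrary non-root identity.
hardened_flags() {
  printf '%s\n' \
    '--user=10001:10001' \
    '--cap-drop=ALL' \
    '--security-opt=no-new-privileges' \
    '--read-only' \
    '--tmpfs=/tmp'
}

# Usage (requires Docker; the image name is a placeholder):
#   docker run $(hardened_flags) -p 8000:8000 driftwood-web
```

Each flag maps to one layer: non-root user, dropped capabilities, no privilege escalation, an immutable root filesystem, and a tmpfs for the one path assumed to need writes.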

Topics in This Chapter

The Container Threat Model

The shared kernel is the blast radius, so a container escape is host compromise. Why a default container is not a security boundary, what "escape" means, and why the common ones are self-inflicted — docker.sock, --privileged, writable host mounts.

Threat model, Shared kernel

Running as Non-Root

Containers run as root by default, and unless user namespaces are remapped, container root is the same UID 0 as the host's root. Adding USER app is the single highest-impact hardening you can do. Also covered: the ports-below-1024 snag and why Driftwood listens on 8000 behind a proxy.

Non-root, Least privilege
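One way to make the non-root rule hard to forget is a guard at the top of the entrypoint script that refuses to start as UID 0. This is a minimal sketch; the function name `require_non_root` is hypothetical and not part of Docker or the course material.

```shell
# Return success only when the given UID is not root (0).
require_non_root() {
  if [ "$1" -eq 0 ]; then
    echo "refusing to start as root (UID 0)" >&2
    return 1
  fi
}

# Typical use at the top of an entrypoint script:
#   require_non_root "$(id -u)" || exit 1
```

Passing the UID as an argument keeps the check easy to exercise outside a container; in the entrypoint it would be fed from `id -u`.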

Linux Capabilities

The kernel splits root's powers into ~40 discrete capabilities, and Docker already drops most of them. Drop the rest with --cap-drop=ALL and add back only what the workload needs — and why --privileged is the footgun that grants them all.

Capabilities, Kernel
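The kernel reports a process's effective capabilities as a hexadecimal bitmask on the `CapEff` line of `/proc/self/status`, one bit per capability. The sketch below counts the set bits in such a mask. The helper name `count_caps` is hypothetical, and the sample mask `00000000a80425fb` is the value commonly seen for root in a default Docker container; treat it as an assumption, since the default set can vary between Docker versions.

```shell
# Count how many capability bits are set in a hex mask such as the
# CapEff value from /proc/self/status.
count_caps() {
  mask=$((0x$1))   # interpret the argument as a hexadecimal number
  n=0
  while [ "$mask" -gt 0 ]; do
    n=$((n + (mask & 1)))   # add the lowest bit
    mask=$((mask >> 1))     # shift to the next capability
  done
  echo "$n"
}

# Inside a container, the live mask can be read with:
#   count_caps "$(awk '/^CapEff/ {print $2}' /proc/self/status)"
```

With the sample default mask the count is 14, and after `--cap-drop=ALL` the mask is all zeros, so the count is 0.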

seccomp and AppArmor/SELinux

seccomp limits which syscalls a process can make; the LSMs limit which files it can touch. Docker's default seccomp profile blocks ~44 dangerous syscalls, and an LSM is already confining your containers — until someone goes unconfined to fix a bug.

Read-Only Filesystems and no-new-privileges

--read-only makes the root filesystem immutable so a compromised process can't persist; tmpfs handles the paths the app truly writes. Paired with no-new-privileges, which blocks setuid escalation, the container can't modify itself or escalate.

Immutability, Read-only

Secrets Handling

ENV and ARG values leak through docker inspect, docker history, and child processes. Secrets belong in tmpfs-backed files mounted at runtime — Docker/Compose secrets or an external manager — never baked into an image. Driftwood's DB password becomes a runtime secret.
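On the consuming side, a runtime secret is just a file the process reads at startup. Docker and Compose secrets are mounted under `/run/secrets/<name>`; the sketch below reads one such file. The helper name `read_secret` and the secret name `db_password` are assumptions for illustration.

```shell
# Print the contents of a secret file; fail loudly if it is missing
# or unreadable rather than continuing with an empty password.
read_secret() {
  if [ ! -r "$1" ]; then
    echo "secret file not readable: $1" >&2
    return 1
  fi
  cat "$1"
}

# Usage inside the container (secret name is a placeholder):
#   DB_PASSWORD="$(read_secret /run/secrets/db_password)"
```

Reading the value into a shell variable at startup keeps it out of the image layers and out of the container's configured environment, which is what `docker inspect` and `docker history` expose.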

Rootless Docker and User Namespaces

Every other control hardened the container; this one hardens the daemon. Rootless Docker runs dockerd as an unprivileged user, and user-namespace remapping maps container UID 0 to a high, unprivileged host UID, so even a full escape lands as an unprivileged user, not root.

Rootless, User namespace
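The remapping arithmetic is simple enough to work by hand. A line in `/proc/<pid>/uid_map` has three fields: the first UID inside the namespace, the first UID on the host, and the length of the range. The sketch below applies one such line to a container UID. The helper name `map_uid` is hypothetical, and the range `0 100000 65536` is an assumed example of a typical subordinate-UID allocation, not a value from the course.

```shell
# Translate a UID inside a user namespace to the host UID, given one
# uid_map line of the form "inside outside length".
map_uid() {
  uid=$1
  set -- $2   # split the map line into its three fields
  inside=$1
  outside=$2
  length=$3
  if [ "$uid" -ge "$inside" ] && [ "$uid" -lt $((inside + length)) ]; then
    echo $((outside + uid - inside))
  else
    return 1  # UID falls outside the mapped range
  fi
}

# Example: map_uid 0 "0 100000 65536"
```

Under the assumed range, container UID 0 comes out as host UID 100000, an account with no special privileges on the host.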