Read-Only Filesystems and no-new-privileges
A default container has a writable layer (Chapter 6), so a process that lands inside can drop a binary, rewrite a config, or persist a backdoor that survives a restart. Running with --read-only makes the root filesystem immutable and forces you to declare exactly which directories actually need to be writable, mounting those as tmpfs. Paired with --security-opt no-new-privileges, which blocks setuid escalation, driftwood/web becomes a process that can neither modify itself nor gain privileges it didn't start with.
These two flags close the last common in-container attack paths. The read-only rootfs stops persistence and tampering; no-new-privileges stops a non-root process from climbing back up through a setuid binary. Together with the non-root user, the dropped capabilities, and the seccomp and LSM profiles from the earlier topics, a compromise inside driftwood/web has nowhere durable to write, no way to escalate, and very little it can do.
--read-only · --tmpfs /tmp · no-new-privileges
The Writable Layer Is the Problem
Every container gets a thin read-write layer stacked over the read-only image layers (Chapter 6). That layer is convenient — it is where an app writes temp files and logs — and it is also where an attacker writes. Malware persists by dropping a file there; a tampered binary survives because the write sticks for the life of the container. Immutability removes that surface: if the layer can't be written, none of it can happen.
--read-only Forces Immutability
The --read-only flag mounts the entire root filesystem read-only, so the application code and the OS files inside the image cannot be altered at run time. The container can read its image but not rewrite it. A useful side effect is that the flag surfaces every place the app secretly expected to write — the moment it tries, you get an error, and now you know about a write path you didn't know existed.
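The effect is easy to see with a small write probe. The sketch below is illustrative only: it assumes a stand-in image (alpine:3.20, not driftwood/web) and a hypothetical probe file name, and it skips the container runs when no Docker daemon is reachable.

```shell
#!/bin/sh
# Write probe: try to create a file under the directory given as $1.
# Kept in a variable so the same snippet can run inside any container.
PROBE='dir="${1:-/}"
if touch "$dir/.write-probe" 2>/dev/null; then
    rm -f "$dir/.write-probe"
    echo "$dir: writable"
else
    echo "$dir: not writable"
fi'

# Assumption: alpine:3.20 stands in for the real image; any image with sh works.
IMAGE="${IMAGE:-alpine:3.20}"

if docker info >/dev/null 2>&1; then
    docker run --rm "$IMAGE" sh -c "$PROBE" sh /              # default writable layer
    docker run --rm --read-only "$IMAGE" sh -c "$PROBE" sh /  # immutable rootfs
else
    echo "docker not available; skipping the container runs" >&2
fi
```

The first run should report the root as writable and the second as not writable, which is the same failure an application hits on its first undeclared write.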
tmpfs for Writable Scratch
Apps still need some writable paths — /tmp, a cache directory, a pid file. The answer is not to drop --read-only but to declare those paths explicitly with --tmpfs (or --mount type=tmpfs), which gives each one an in-memory, ephemeral filesystem that vanishes when the container stops. Writes are allowed exactly where you declared them and nowhere durable, so the app works while the rootfs stays immutable.
docker run -d --name web \
  --user app \
  --cap-drop=ALL \
  --read-only \
  --tmpfs /tmp \
  --security-opt no-new-privileges \
  -p 8000:8000 \
  driftwood/web
This is the accumulated hardening so far in one command: the non-root app user from topic 60, the dropped capabilities from topic 61, the read-only rootfs with a tmpfs for /tmp, and no-new-privileges. A process that lands inside has no root, no capabilities, no writable rootfs, and no way to escalate.
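One way to confirm that the flags actually took effect is to read them back with docker inspect. This is a rough sketch, not an official audit tool: the check_line helper and the field labels in FORMAT are made up for illustration, and the string matching assumes the usual Go-template rendering of the inspect fields.

```shell
#!/bin/sh
# NAME defaults to "web", the container name used in the run command above.
NAME="${NAME:-web}"
FORMAT='user={{.Config.User}} readonly={{.HostConfig.ReadonlyRootfs}} secopt={{.HostConfig.SecurityOpt}} capdrop={{.HostConfig.CapDrop}}'

# Hypothetical helper: print one line per hardening control that appears to be missing.
check_line() {
    case "$1" in "user= "*|"user=root "*|"user=0 "*) echo "missing: non-root --user" ;; esac
    case "$1" in *"readonly=true"*) ;; *) echo "missing: --read-only" ;; esac
    case "$1" in *"no-new-privileges"*) ;; *) echo "missing: no-new-privileges" ;; esac
    case "$1" in *"capdrop="*"ALL"*) ;; *) echo "missing: --cap-drop=ALL" ;; esac
}

if docker info >/dev/null 2>&1; then
    if line=$(docker inspect --format "$FORMAT" "$NAME" 2>/dev/null); then
        echo "$line"
        check_line "$line"
    else
        echo "container $NAME not found" >&2
    fi
else
    echo "docker not available; skipping inspect" >&2
fi
```

No "missing:" lines means all four controls are present in the container's configuration.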
no-new-privileges Stops setuid Escalation
--security-opt no-new-privileges sets the kernel's no_new_privs flag on the container's processes, which prevents them from gaining privileges when they execute a setuid or setgid binary. Even if a setuid-root binary exists somewhere in the image (sudo, ping, or one that slipped in with a base image), it cannot be used to escalate. That closes a classic in-container privilege-escalation path: a compromised non-root process finds a setuid-root binary and rides it up to root. With the flag set, the ride goes nowhere.
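Both halves of that story can be checked from a shell inside the container: whether the flag is set, and which setuid or setgid files exist for an attacker to try. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and it relies on the NoNewPrivs field that Linux exposes in /proc/<pid>/status (kernel 4.10 and later).

```shell
#!/bin/sh
# Print the no_new_privs bit for this process tree: 1 = set, 0 = not set.
nnp_status() {
    if [ -r /proc/self/status ]; then
        awk '/^NoNewPrivs:/ { print $2 }' /proc/self/status
    else
        echo unknown   # not Linux, or /proc is not mounted
    fi
}

# List setuid (-4000) and setgid (-2000) files under $1 (default /).
# -xdev keeps the scan on one filesystem; unreadable directories are skipped.
list_setid_files() {
    find "${1:-/}" -xdev -type f \( -perm -4000 -o -perm -2000 \) 2>/dev/null || true
}

echo "NoNewPrivs: $(nnp_status)"
list_setid_files /
```

With no-new-privileges in effect the first line should show 1. Any files the scan lists are candidates to remove from the image, since the flag only neutralizes them in containers that actually set it.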
Driftwood, Hardened
Stack everything this chapter has added and driftwood/web runs --read-only with a tmpfs for /tmp, --security-opt no-new-privileges, as the non-root app user, with --cap-drop=ALL. A process that compromises the app has no root, no capabilities, no writable root filesystem, and no path to escalate. It is not invulnerable — it can still abuse the network or try to read data the app legitimately reads — but the easy wins are gone.
A read-only rootfs is one layer, not the whole answer. It stops persistence and tampering; it does not stop a process from exfiltrating data over a connection the app is allowed to make, and it does not protect a writable volume mounted into the container. Treat it as one more independent control in the defense-in-depth stack, valuable precisely because it does not depend on the others holding.
- Adding --read-only without mounting a tmpfs for the paths the app writes: the container starts, then crashes the first time it writes a temp file or a pid; declare the writable directories explicitly.
- Assuming --read-only protects mounted volumes too: named volumes and bind mounts stay writable unless you mount them :ro themselves; the flag covers the container's own layers, not your data mounts.
- Skipping no-new-privileges and leaving setuid binaries in the image: a compromised non-root process can still escalate through a setuid-root binary that slipped into the base image.
- Treating a read-only rootfs as the whole answer: it stops persistence and tampering, but a process can still exfiltrate data or abuse the network; it is one layer in the stack, not the stack itself.
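The volume pitfall is worth seeing once. The sketch below mounts the same named volume twice under --read-only, first with default options and then with :ro. Everything named here is an assumption for illustration: ro-probe-demo is a throwaway volume, /data is an arbitrary mount point, alpine:3.20 is a stand-in image, and mount_arg is a hypothetical helper.

```shell
#!/bin/sh
VOLUME="${VOLUME:-ro-probe-demo}"   # throwaway volume, removed at the end
IMAGE="${IMAGE:-alpine:3.20}"       # stand-in image
CHECK='if touch /data/.probe 2>/dev/null; then echo "/data: writable"; else echo "/data: not writable"; fi'

# Compose a -v argument; pass "ro" as the third argument for a read-only mount.
mount_arg() {
    if [ -n "${3:-}" ]; then echo "$1:$2:$3"; else echo "$1:$2"; fi
}

if docker info >/dev/null 2>&1; then
    # --read-only alone: the rootfs is locked, the volume is still writable.
    docker run --rm --read-only -v "$(mount_arg "$VOLUME" /data)" "$IMAGE" sh -c "$CHECK"
    # Adding :ro locks the volume as well.
    docker run --rm --read-only -v "$(mount_arg "$VOLUME" /data ro)" "$IMAGE" sh -c "$CHECK"
    docker volume rm "$VOLUME" >/dev/null   # clean up the throwaway volume
else
    echo "docker not available; skipping the container runs" >&2
fi
```

The first run should report /data as writable despite --read-only, and the second as not writable.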
- Run hardened services with --read-only and mount only the specific scratch paths the app needs as tmpfs, so writes are ephemeral and contained.
- Add --security-opt no-new-privileges to every container that doesn't legitimately rely on setuid, closing the setuid escalation path for free.
- Mount data volumes :ro wherever the container only needs to read them, rather than leaving every mount writable by default.
- Use the read-only run as a design test: every path the app tried to write reveals undeclared state that probably belongs in a volume or a tmpfs anyway.
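A gentler way to run that design test is to let the app run for a while without --read-only and then ask Docker what changed in the writable layer. docker diff prints one line per path, prefixed with A (added), C (changed), or D (deleted). The sketch below groups those lines by top-level directory; summarize_diff is a hypothetical helper, and the container name web is taken from the earlier run command.

```shell
#!/bin/sh
NAME="${NAME:-web}"

# Read `docker diff` lines such as "A /tmp/cache.db" on stdin and print
# "<count> <top-level directory>", busiest directory first.
summarize_diff() {
    awk '{ split($2, parts, "/"); count["/" parts[2]]++ }
         END { for (d in count) print count[d], d }' | sort -rn
}

if docker info >/dev/null 2>&1; then
    docker diff "$NAME" | summarize_diff
else
    echo "docker not available; skipping docker diff" >&2
fi
```

Each directory in the output is a candidate for a tmpfs or a volume. Note that docker diff only reports the container's own writable layer, so paths already on a tmpfs or a volume will not appear.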
Kubernetes · readOnlyRootFilesystem: true and allowPrivilegeEscalation: false in a container's securityContext
Podman honors the same --read-only, --tmpfs, and no-new-privileges flags
distroless · minimal base images pair naturally with an ephemeral-root pattern (Chapter 1, topic 12)
Knowledge Check
What problem does --read-only solve that the other controls don't?
- It makes the writable layer immutable, so a compromised process can't drop a binary, rewrite a config, or persist
- It drops all of the container's remaining Linux capabilities at once with a single convenient runtime flag
- It runs the main process as a dedicated non-root user without needing any USER instruction in the Dockerfile
- It encrypts the container's image layers at rest so they can't be read straight off the host's disk by an attacker
Why mount a tmpfs alongside --read-only instead of just dropping the flag?
- Apps still need a few writable paths; a tmpfs gives those an ephemeral in-memory filesystem while the rootfs stays locked
- A tmpfs is exactly where you store the app's durable data so that it reliably survives container restarts and full host reboots
- A tmpfs is required because read-only filesystems otherwise make every disk read noticeably slower under load
- A tmpfs re-enables writing across the entire root filesystem, quietly undoing the read-only flag everywhere
What does --security-opt no-new-privileges block?
- A process gaining privileges through a setuid or setgid binary, closing the non-root escalation path
- The container from opening any outbound network connections to other hosts or services on the network
- The container from acquiring any new Linux capabilities at build time, beyond the curated default set
- All writes to the entire root filesystem, which makes the separate --read-only flag completely redundant
Does --read-only protect a named volume mounted into the container?
- No, the flag covers the container's own layers; volumes and bind mounts stay writable unless mounted :ro
- Yes, it makes every single mounted path read-only too, including all named data volumes and bind mounts alike
- No, because --read-only blocks the runtime from mounting any volume into the container at all
- Yes, but only for in-memory tmpfs mounts, never for persistent named volumes or bind mounts