Running as Non-Root
Containers run as root by default. The process inside is UID 0 unless the image author or a docker run flag says otherwise, and most images quietly leave it there. That root is not a sandboxed pretend-root — it is the host's UID 0, because the user namespace that would remap it is off unless you turn it on (topic 65). If a process escapes, or a writable bind mount crosses the boundary, it acts on the host as root.
Adding a USER app line to the Dockerfile so driftwood/web runs as a non-root UID is the single highest-impact hardening move in this chapter, and it costs almost nothing. One line, a created user, correct file ownership — and a compromise that would have been host root is now an unprivileged process with nothing to escalate to. Everything else in the chapter builds on a container that is already not root.
The Default Is Root
Without a USER instruction, a container's main process runs as UID 0. This is an identity decision the image author makes, not something Docker forces, and most images leave it at root because root sidesteps file-permission friction during the build — every COPY, every RUN, every package install just works without thinking about who owns what. The convenience is real at build time and a liability at run time.
The fix is to make the decision explicitly. The image author creates an unprivileged user and ends the final stage with USER, so the running process is that user rather than root. The build can still do its root-only work earlier in the Dockerfile; only the runtime identity changes.
Container Root Is Host Root
The point that makes this matter: the user namespace is not on by default — that is topic 65 — so UID 0 inside the container is UID 0 on the host. They are the same numeric identity, and the kernel treats them as the same. Inside the container that root is partly defanged by Docker's dropped capabilities (topic 61), but it is still root, and root has paths to the host that an unprivileged user does not.
Combine container root with a writable host bind mount and the container process is editing host files as root — no escape required, just a mount you configured. Combine it with a kernel escape and the process is a root shell on the host. Drop to a non-root user and both of those collapse: the bind mount writes as an unprivileged UID, and the escaped process has no root to wield.
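The bind-mount case above can be observed directly. The following is a minimal sketch, not a definitive test: it assumes a Linux host running a default rootful Docker Engine with no user-namespace remapping, GNU or BusyBox stat, and that the public alpine image can be pulled. The helper name probe_mount_owner and the UID 10001 are made up for illustration.

```shell
# Sketch: touch a file in a bind-mounted host directory from inside a
# container, then print the numeric owner as the host sees it.
probe_mount_owner() {
  uid="$1"                   # UID to run as inside the container, e.g. 0 or 10001
  dir="$(mktemp -d)"         # throwaway host directory
  chmod 0777 "$dir"          # world-writable so a non-root UID can write as well
  docker run --rm --user "$uid" -v "$dir:/data" alpine touch /data/probe
  stat -c '%u' "$dir/probe"  # owner UID on the host side
}

# Usage (each call pulls and runs alpine once):
# probe_mount_owner 0        # container root
# probe_mount_owner 10001    # arbitrary unprivileged UID
```

Under those assumptions the first call would be expected to report a file owned by host UID 0 and the second a file owned by 10001, which is the same numeric identity crossing the mount in both directions. On hosts with SELinux, rootless Docker, or Docker Desktop the result may differ.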
Adding a Non-Root User
In the Dockerfile, create a system user, set ownership on the files the app needs as you copy them, and end the final stage with USER app. The runtime process then drops to that unprivileged UID. For driftwood/web, the final stage runs gunicorn as app, not root, and the code is copied with --chown so app can read what it needs without a later permission failure.
    # create an unprivileged system user
    RUN addgroup --system app \
     && adduser --system --ingroup app --no-create-home app

    WORKDIR /app

    # copy the app in already owned by app, not root
    COPY --chown=app:app . /app

    # listen high; the proxy fronts it
    EXPOSE 8000

    # drop to the non-root user for the runtime
    USER app
    CMD ["gunicorn", "--bind", "0.0.0.0:8000", "driftwood.wsgi"]
The addgroup/adduser lines and the --chown on COPY are the whole cost. The final USER app is what actually changes the runtime identity; everything above it still runs as root during the build, which is what you want — root builds the image, app runs it.
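One way to confirm that the final USER line took effect is to ask the built image which UID it starts as. This is a rough sketch that assumes the image is tagged driftwood/web and contains the id utility; the helper names runtime_uid and assert_non_root are hypothetical.

```shell
# Print the UID an image's default user resolves to at run time.
# --entrypoint swaps the image's startup command for `id`; the trailing -u
# is passed to it as its argument.
runtime_uid() {
  docker run --rm --entrypoint id "$1" -u
}

# Fail loudly if the image would start as root.
assert_non_root() {
  found="$(runtime_uid "$1")" || return 1   # docker itself failed
  if [ "$found" = "0" ]; then
    echo "$1 runs as root" >&2
    return 1
  fi
  echo "$1 runs as UID $found"
}

# Usage:
# assert_non_root driftwood/web
```

A check like this fits naturally in a build pipeline, so that a Dockerfile edit that drops the USER line is caught before the image ships.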
Why Most Official Images Still Default to Root
Many official images, older ones in particular, still run as root, for the same build-time convenience and so they can bind low ports during startup. Do not trust that default. Override it with a USER instruction in your own final stage or, when you cannot rebuild an image, with --user at run time. Treat root-by-default as a bug in the upstream image to patch on your side, not a setting to accept because someone shipped it that way.
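The run-time override might look like the sketch below. The UID:GID pair 10001:10001 is an arbitrary unprivileged identity chosen for illustration, the image name in the usage line is a placeholder, and the helper name run_as_nonroot is hypothetical.

```shell
# Run an upstream image as an unprivileged UID:GID without rebuilding it.
# The numeric pair does not have to exist in the image's /etc/passwd.
run_as_nonroot() {
  image="$1"
  shift
  docker run --rm --user 10001:10001 "$image" "$@"
}

# Usage (placeholder image name):
# run_as_nonroot example/upstream-image id
```

An image built on the assumption that it runs as root may fail under this override, typically on files or directories only root can read or write. That failure is a prompt to fix ownership in a rebuilt image, not a reason to go back to UID 0.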
The Ports-Below-1024 Snag
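A non-root process cannot bind ports under 1024 by default: on Linux that range is privileged, and binding it requires root or the NET_BIND_SERVICE capability. This is the one friction the change introduces, and it is why Driftwood's gunicorn listens on 8000 rather than 80, with the proxy from Chapter 4 fronting it and terminating on the low port itself. The proxy holds the privileged port; the app does not need it. Recent Docker Engine releases may relax the limit inside the container by lowering the net.ipv4.ip_unprivileged_port_start sysctl, so the bind can succeed on some hosts and fail on others; listening high behaves the same everywhere.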
If a non-root process genuinely must bind a low port, the answer is not to run as root for the sake of one port. It is the NET_BIND_SERVICE capability (topic 61), which grants exactly that one power and nothing else. Listen high and let the proxy front it is the simpler pattern, and the one Driftwood uses; the capability is there for the cases where you can't.
- Leaving the container as root because "it's isolated anyway": combined with a writable bind mount or an escape, that root is root on the host, and it is the difference between a contained incident and a full host compromise.
- Adding USER app but leaving files in the image owned by root with no read access for app: the container starts, then fails on a permission error the first time it touches its own code. Set ownership with COPY --chown=app:app.
- Trying to bind port 80 as the non-root app user and getting "permission denied": non-root can't bind below 1024 without NET_BIND_SERVICE. Listen on a high port like 8000 and let the proxy front it.
- Using --user 0 or deleting the USER line to "fix" a permissions problem: that reintroduces the exact risk you just removed. Fix the file ownership instead of handing root back.

- End every Dockerfile with USER app (a non-root UID) so the runtime process is unprivileged, and never rely on the upstream image's default identity.
- Set file ownership in the image with COPY --chown=app:app so the non-root user can read its code and write only where it must, rather than discovering permission errors at startup.
- Listen on a high port as non-root and put the proxy in front, rather than running as root just to bind 80 or 443.
- Override a stubborn upstream image at run time with --user when you can't rebuild it, treating root-by-default as a bug to patch rather than a setting to accept.
Kubernetes: runAsNonRoot / runAsUser in a security context
distroless nonroot image variants ship a non-root user ready to use
Knowledge Check
Why is running as root inside a container dangerous, given the container is "isolated"?
- The user namespace is off by default, so container root is the host's root — a writable mount or escape acts on the host as root
- Container root is a sandboxed, fully namespaced pretend-root that can never touch a file or process outside the container
- Root processes inside a container are scheduled at a higher CPU priority by the kernel and starve every other process on the host
- Running as root bakes extra setuid binaries into the image, making it significantly larger and slower to pull each deploy
What makes USER app the highest-impact hardening control in this chapter?
- One line drops the runtime to an unprivileged UID, so a compromise lands as an unprivileged process instead of host root
- It replaces the shared host kernel with a private per-container kernel, so a kernel exploit can no longer cross to the host
- It automatically drops every Linux capability the container holds, leaving the process with an empty effective capability set
- It cuts container startup time noticeably because non-root processes skip privileged init and initialize faster
Why does Driftwood's gunicorn listen on port 8000 with the proxy in front?
- A non-root process can't bind ports below 1024, so the app listens high and the proxy terminates the low port
- Port 8000 sits above the privileged range and is simply faster for HTTP traffic than port 80 on modern Linux kernels
- Gunicorn cannot speak HTTP to browsers at all and relies on the front proxy to translate its protocol on the way out
- Binding port 80 requires running gunicorn directly as root, which is the recommended production setup
A container with USER app crashes at startup with a permission error reading its own code. What is the fix?
- Set file ownership in the image with COPY --chown=app:app so the non-root user can read its files
- Remove the USER app line so the process runs as root again and the read error simply goes away
- Add --privileged at runtime so the process gains enough power to override the file permissions
- Mount the entire root filesystem read-only so the kernel skips the failing permission checks on the code