Topic 60

Running as Non-Root

Containers run as root by default. The process inside is UID 0 unless the image author or a docker run flag says otherwise, and most images quietly leave it there. That root is not a sandboxed pretend-root — it is the host's UID 0, because the user namespace that would remap it is off unless you turn it on (topic 65). If a process escapes, or a writable bind mount crosses the boundary, it acts on the host as root.

Adding a USER app line to the Dockerfile so driftwood/web runs as a non-root UID is the single highest-impact hardening move in this chapter, and it costs almost nothing. One line, a created user, and correct file ownership turn a compromise that would have been host root into an unprivileged process with no root privileges to use. Everything else in the chapter builds on a container that is already not root.

The Default Is Root

Without a USER instruction, a container's main process runs as UID 0. This is an identity decision the image author makes, not something Docker forces, and most images leave it at root because root sidesteps file-permission friction during the build — every COPY, every RUN, every package install just works without thinking about who owns what. The convenience is real at build time and a liability at run time.

The fix is to make the decision explicitly. The image author creates an unprivileged user and ends the final stage with USER, so the running process is that user rather than root. The build can still do its root-only work earlier in the Dockerfile; only the runtime identity changes.
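
One way to see which decision an image author made is to read the User field of the image config, which holds the value of the last USER instruction and is empty when none was set. The helper below is a minimal sketch, not an official check: the function name runs_as_root is made up for illustration, and it only looks at the configured name or UID, so it would miss a differently named account that maps to UID 0.

```shell
#!/bin/sh
# Classify the User field of an image config.
# An empty value means no USER was set, so the process starts as UID 0.
# runs_as_root is a hypothetical helper name, not a Docker command.
runs_as_root() {
    user=${1%%:*}                # strip an optional ":group" suffix
    case "$user" in
        ''|0|root) return 0 ;;   # root by default or by explicit choice
        *)         return 1 ;;   # some other user name or UID
    esac
}

# Hypothetical usage against a locally available image:
#   configured=$(docker image inspect --format '{{.Config.User}}' driftwood/web)
#   if runs_as_root "$configured"; then echo "driftwood/web runs as root"; fi
```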

Container Root Is Host Root

The point that makes this matter: the user namespace is not on by default — that is topic 65 — so UID 0 inside the container is UID 0 on the host. They are the same numeric identity, and the kernel treats them as the same. Inside the container that root is partly defanged by Docker's dropped capabilities (topic 61), but it is still root, and root has paths to the host that an unprivileged user does not.

Combine container root with a writable host bind mount and the container process is editing host files as root — no escape required, just a mount you configured. Combine it with a kernel escape and the process is a root shell on the host. Drop to a non-root user and both of those collapse: the bind mount writes as an unprivileged UID, and the escaped process has no root to wield.
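
The bind-mount case can be observed directly, because a file created through the mount carries the numeric UID of the process that wrote it. The helper below is a small sketch (owner_of_new_file is a hypothetical name) that creates a probe file in a directory and prints its owner. It assumes a stat that supports -c, as the GNU coreutils and BusyBox versions do.

```shell
#!/bin/sh
# Create a probe file in a directory and print the numeric UID that owns it.
# Run on the host it reports your own UID; the same two steps run inside a
# container against a bind mount show the UID the host sees on that file.
owner_of_new_file() {
    probe="$1/owner-probe.$$"
    : > "$probe" || return 1     # create an empty file as the current user
    stat -c '%u' "$probe"        # numeric owner of the file just created
    rm -f "$probe"
}

# Hypothetical comparison, assuming a Linux host with rootful Docker, no
# user-namespace remapping, and a ./shared directory writable by UID 1000:
#   docker run --rm -v "$PWD/shared:/data" alpine \
#       sh -c 'touch /data/probe && stat -c %u /data/probe'
#   docker run --rm --user 1000:1000 -v "$PWD/shared:/data" alpine \
#       sh -c 'touch /data/probe2 && stat -c %u /data/probe2'
```

Under those assumptions the first command should report UID 0 and the second 1000, which is the difference the host sees on its own files.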

The runtime identity decides the blast radius

Default root (UID 0)

With the user namespace off, container UID 0 is the host's UID 0. An escape, or a writable bind mount, acts on the host as root.

USER app

The runtime drops to an unprivileged UID. A compromise lands as an unprivileged process with nothing to escalate to — a limited blast radius.

Adding a Non-Root User

In the Dockerfile, create a system user, set ownership on the files the app needs as you copy them, and end the final stage with USER app. The runtime process then drops to that unprivileged UID. For driftwood/web, the final stage runs gunicorn as app, not root, and the code is copied with --chown so app can read what it needs without a later permission failure.

Dockerfile — driftwood/web final stage

# create an unprivileged system user (flags shown are those of Debian's addgroup and adduser)
RUN addgroup --system app \
 && adduser --system --ingroup app --no-create-home app

WORKDIR /app
# copy the app in already owned by app, not root
COPY --chown=app:app . /app

# listen high; the proxy fronts it
EXPOSE 8000
# drop to the non-root user for the runtime
USER app
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "driftwood.wsgi"]

The addgroup/adduser lines and the --chown on COPY are the whole cost. The final USER app is what actually changes the runtime identity; everything above it still runs as root during the build, which is what you want — root builds the image, app runs it.
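
Because only the last USER in the final stage decides the runtime identity, a build pipeline can check for it before an image ships. The function below is a rough sketch of such a check, with the name final_stage_user invented here; it resets at each FROM so that a USER in an earlier stage does not count. It is a line-based scan, so it cannot see a USER inherited from the base image, does not expand ARG variables, and could be fooled by a continuation line or heredoc that happens to start with the word USER.

```shell
#!/bin/sh
# Print the user named by the last USER instruction in the final stage of a
# Dockerfile, or print nothing if the final stage never sets one.
# final_stage_user is a hypothetical helper name.
final_stage_user() {
    awk '
        toupper($1) == "FROM" { user = "" }   # new stage: forget earlier USER lines
        toupper($1) == "USER" { user = $2 }   # remember the most recent USER argument
        END { if (user != "") print user }
    ' "$1"
}

# Hypothetical CI gate: fail when the final stage would run as root.
#   case "$(final_stage_user Dockerfile)" in
#       ''|0|0:*|root|root:*) echo "final stage runs as root" >&2; exit 1 ;;
#   esac
```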

Why Most Official Images Still Default to Root

Many official images still run as root by default, for the same build-time convenience and so that they can bind low ports during startup. Do not trust that default. Override it with a USER instruction in your own final stage or, when you cannot rebuild an image, with --user at run time. Treat root-by-default as a bug in the upstream image to patch on your side, not a setting to accept because someone shipped it that way.
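
The run-time override uses the --user flag of docker run, which accepts a name or numeric UID with an optional group. The snippet below is a minimal sketch: the helper name current_user_flag and the image name some/upstream-image are placeholders, and it assumes the image tolerates an arbitrary UID, which entrypoints that expect to start as root may not.

```shell
#!/bin/sh
# Build a value for docker run's --user flag from the calling user's numeric
# UID and GID, so the container process matches an unprivileged host user.
# current_user_flag is a hypothetical helper name.
current_user_flag() {
    printf '%s:%s\n' "$(id -u)" "$(id -g)"
}

# Hypothetical usage with an upstream image you cannot rebuild:
#   docker run --rm --user "$(current_user_flag)" some/upstream-image
# Or pin a fixed unprivileged UID and GID instead:
#   docker run --rm --user 10001:10001 some/upstream-image
```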

The Ports-Below-1024 Snag

A non-root process cannot bind ports under 1024 by default — that range is privileged, and the kernel reserves it for root. This is the one friction the change introduces, and it is why Driftwood's gunicorn listens on 8000 rather than 80, with the proxy from Chapter 4 fronting it and terminating on the low port itself. The proxy holds the privileged port; the app does not need it.

If a non-root process genuinely must bind a low port, the answer is not to run as root for the sake of one port. It is the NET_BIND_SERVICE capability (topic 61), which grants exactly that one power and nothing else. Listening high and letting the proxy front the app is the simpler pattern, and the one Driftwood uses; the capability is there for the cases where you can't.
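
On Linux the lower edge of the unprivileged range is a kernel setting, net.ipv4.ip_unprivileged_port_start, rather than a fixed constant, and a container runtime may set it lower inside the container's network namespace. The sketch below reads it to decide whether a given port would need extra privilege where the script runs. The function name is_privileged_port is an illustration, the fallback of 1024 is the traditional default, and the check ignores any capabilities the process already holds.

```shell
#!/bin/sh
# Return success if binding the given port needs extra privilege.
# Usage: is_privileged_port PORT [THRESHOLD]
# is_privileged_port is a hypothetical helper name.
is_privileged_port() {
    port=$1
    start=${2:-}
    if [ -z "$start" ]; then
        if [ -r /proc/sys/net/ipv4/ip_unprivileged_port_start ]; then
            # first port number an unprivileged process may bind
            start=$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start)
        else
            start=1024           # traditional default when the sysctl is absent
        fi
    fi
    [ "$port" -lt "$start" ]
}

# Hypothetical usage inside the container:
#   is_privileged_port 80 && echo "port 80 needs root or NET_BIND_SERVICE here"
```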

Common Mistakes

Leaving the container as root because "it's isolated anyway" — combined with a writable bind mount or an escape, that root is root on the host, and it is the difference between a contained incident and a full host compromise.
Adding USER app but leaving files in the image owned by root with no read access for app — the container starts, then fails on a permission error the first time it touches its own code; set ownership with COPY --chown=app:app.
Trying to bind port 80 as the non-root app user and getting "permission denied" — non-root can't bind below 1024 without NET_BIND_SERVICE; listen on a high port like 8000 and let the proxy front it.
Using --user 0 or deleting the USER line to "fix" a permissions problem — that reintroduces the exact risk you just removed; fix the file ownership instead of handing root back.
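
The permission failure described in the list above can be found before startup by listing the files the current user cannot read. The function below is a sketch with an invented name, unreadable_files. Run as the app user, for example from a shell inside the container, it prints every regular file under a directory that fails a read test, which points at the ownership to fix. It relies only on POSIX find and test.

```shell
#!/bin/sh
# List regular files under a directory that the current user cannot read.
# unreadable_files is a hypothetical helper name.
unreadable_files() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            [ -r "$f" ] || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Hypothetical usage inside the image, running as the non-root user:
#   unreadable_files /app
```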

Best Practices

End every Dockerfile with USER app (a non-root UID) so the runtime process is unprivileged, and never rely on the upstream image's default identity.
Set file ownership in the image with COPY --chown=app:app so the non-root user can read its code and write only where it must, rather than discovering permission errors at startup.
Listen on a high port as non-root and put the proxy in front, rather than running as root just to bind 80 or 443.
Override a stubborn upstream image at run time with --user when you can't rebuild it, treating root-by-default as a bug to patch rather than a setting to accept.

Comparable tools

Podman, when run rootless, maps container root to your host user, so "container root" is already your unprivileged UID.
Kubernetes expresses the same control as runAsNonRoot / runAsUser in a security context.
distroless nonroot image variants ship a non-root user ready to use.

Knowledge Check

Why is running as root inside a container dangerous, given the container is "isolated"?

The user namespace is off by default, so container root is the host's root — a writable mount or escape acts on the host as root
Container root is a sandboxed, fully namespaced pretend-root that can never touch a file or process outside the container
Root processes inside a container are scheduled at a higher CPU priority by the kernel and starve every other process on the host
Running as root bakes extra setuid binaries into the image, making it significantly larger and slower to pull each deploy

What makes USER app the highest-impact hardening control in this chapter?

One line drops the runtime to an unprivileged UID, so a compromise lands as an unprivileged process instead of host root
It replaces the shared host kernel with a private per-container kernel, so a kernel exploit can no longer cross to the host
It automatically drops every Linux capability the container holds, leaving the process with an empty effective capability set
It cuts container startup time noticeably because non-root processes skip privileged init and initialize faster

Why does Driftwood's gunicorn listen on port 8000 with the proxy in front?

A non-root process can't bind ports below 1024, so the app listens high and the proxy terminates the low port
Port 8000 sits above the privileged range and is simply faster for HTTP traffic than port 80 on modern Linux kernels
Gunicorn cannot speak HTTP to browsers at all and relies on the front proxy to translate its protocol on the way out
Binding port 80 requires running gunicorn directly as root, which is the recommended production setup

A container with USER app crashes at startup with a permission error reading its own code. What is the fix?

Set file ownership in the image with COPY --chown=app:app so the non-root user can read its files
Remove the USER app line so the process runs as root again and the read error simply goes away
Add --privileged at runtime so the process gains enough power to override the file permissions
Mount the entire root filesystem read-only so the kernel skips the failing permission checks on the code
