Topic 22

RUN — Shell Form vs Exec Form


RUN executes a command at build time and commits the result as a layer; it is how packages get installed and files get generated while the image is being built. It comes in two forms: shell form (RUN apt-get update) wraps the command in the default shell, which is /bin/sh -c on Linux, and exec form (RUN ["apt-get", "update"]) runs the executable directly with no shell.

The same fork shows up again in CMD and ENTRYPOINT (topic 24), where it decides whether your process is PID 1 and receives signals. Learning the distinction on the low-stakes RUN — where the cost of the wrong form is usually nothing — makes the high-stakes case obvious, so it is worth getting straight here.

Shell Form

RUN apt-get update && apt-get install -y curl runs through /bin/sh -c, so shell features work: &&, pipes, $VARIABLE expansion, and globbing all behave as they would in a terminal. This is the common form for build commands that chain steps, because chaining is exactly what shell form gives you for free.
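A minimal sketch of those shell features in one Dockerfile. The base image and the TOOL build argument are assumptions chosen for illustration, not part of the original lesson.

```dockerfile
FROM debian:bookworm-slim

# Hypothetical build argument, present only to show $VARIABLE expansion.
ARG TOOL=curl

# && chains the steps; /bin/sh expands "$TOOL" before apt-get sees it.
RUN apt-get update \
 && apt-get install -y --no-install-recommends "$TOOL" \
 && rm -rf /var/lib/apt/lists/*

# A glob and a pipe, both interpreted by the shell rather than by Docker.
RUN ls /etc/*.conf | wc -l
```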

Exec Form

RUN ["executable", "arg1", "arg2"] is a JSON array that runs the binary directly with no shell, so &&, pipes, and variable expansion do not work — the array is passed as literal arguments to the executable. It matters most in CMD/ENTRYPOINT, and for base images that have no shell at all, such as distroless or scratch, where there is no /bin/sh for shell form to invoke.
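The lack of expansion is easiest to see side by side. This sketch assumes a Debian base image purely so that both forms have something to run; the two instructions differ only in form.

```dockerfile
FROM debian:bookworm-slim

# Exec form: no shell is involved, so echo receives the literal string "$HOME".
RUN ["echo", "$HOME"]

# Shell form: /bin/sh expands the variable first, then runs echo.
RUN echo $HOME
```

Building with plain progress output (docker build --progress=plain .) should make the difference visible in the build log.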

Two forms, two execution paths

Shell form

RUN cmd runs via /bin/sh -c, so &&, pipes, and $VAR work. Used in CMD/ENTRYPOINT, the same form makes the shell PID 1, which breaks signal delivery.

Exec form

RUN ["cmd"] runs the binary directly, with no shell in between, so there is no && or expansion. Used in CMD/ENTRYPOINT, the same form makes your process PID 1, so it receives signals.

Each RUN Is a Layer

Every RUN commits a new layer, which is why chaining install-and-clean into one RUN ... && ... keeps the intermediate package cache out of a persisted layer. Split them into separate RUNs and the cache is baked in permanently — deleting it in a later layer does not reclaim the space, because the earlier layer still carries it. This is the layer lesson from Chapter 2 applied to build commands.

One RUN — install, then clean, in the same layer

RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*

The cleanup runs in the same layer that created the cache, so the layer never carries it. Write those three steps as three separate RUNs and the rm -rf only hides the files behind a later layer — the bytes are still in the image, padding every pull. Shell form earns its keep here: the && chain is what lets one layer do install-then-clean at all.
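For contrast, here is a sketch of the split version described above. The base image is an assumption; the point is only where each layer boundary falls.

```dockerfile
FROM debian:bookworm-slim

# Layer 1: writes the package index under /var/lib/apt/lists.
RUN apt-get update

# Layer 2: installs curl; the index written by layer 1 is still stored there.
RUN apt-get install -y --no-install-recommends curl

# Layer 3: marks the files as deleted, but layer 1 keeps its bytes.
RUN rm -rf /var/lib/apt/lists/*
```

Running docker history on the resulting image lists a size per layer, which is one way to compare this build with the single-RUN version.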

Build-Time vs Run-Time

RUN happens during docker build and its effects are frozen into the image. It is not the container's startup command (that is CMD/ENTRYPOINT, covered in topic 24), and confusing the two is a frequent beginner error. RUN pip install gunicorn installs gunicorn into the image at build time; CMD ["gunicorn", ...] is what actually starts gunicorn when a container launches. The first runs once, during the build; the second runs on every container start.
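The split can be sketched in a few lines. The base image tag and the app:app argument are hypothetical placeholders; the original leaves the gunicorn arguments unspecified.

```dockerfile
FROM python:3.12-slim

# Build time: runs once during docker build; the result is frozen into a layer.
RUN pip install --no-cache-dir gunicorn

# Run time: only recorded as metadata now, executed each time a container starts.
# "app:app" is a hypothetical module:callable, used here for illustration only.
CMD ["gunicorn", "app:app"]
```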

Why the Form Choice Echoes Later

The shell-vs-exec distinction is mechanically identical in ENTRYPOINT/CMD, but the stakes are higher: there, shell form inserts an sh -c that becomes PID 1 and does not pass on the SIGTERM that docker stop sends, which is the signal lesson from Chapter 3, topic 15. On RUN the wrong form usually just fails to chain; in ENTRYPOINT it produces a container that ignores docker stop and gets SIGKILLed after the default 10-second timeout. Same fork, very different blast radius.
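As a preview of that higher-stakes case, the fragment below shows both forms on ENTRYPOINT. It is a sketch, not a complete Dockerfile, and gunicorn app:app is a hypothetical command reused only for illustration.

```dockerfile
# Shell form: Docker runs /bin/sh -c "gunicorn app:app", so sh is PID 1
# and gunicorn is its child. Left commented out on purpose.
# ENTRYPOINT gunicorn app:app

# Exec form: gunicorn itself is PID 1 and receives SIGTERM from docker stop.
ENTRYPOINT ["gunicorn", "app:app"]
```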

Common Mistakes

Writing RUN ["sh", "-c", "apt-get update && apt-get install -y curl"] thinking exec form is "better" — you have just reintroduced the shell you were avoiding; if you need &&, shell form is the honest choice.
Expecting &&, pipes, or $VAR expansion to work in exec form: no shell is invoked, so && and everything after it are passed as literal arguments to the first executable, and the command typically fails.
Splitting apt-get update, apt-get install, and the cache cleanup across separate RUN layers — each commits a layer, so the package cache is frozen into the image even after a later layer "removes" it.
Treating RUN as the container's start command — it runs at build time only; the process the container starts is set by CMD/ENTRYPOINT.
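The first two mistakes can be sketched next to the form that avoids them. The base image is an assumption, and the two mistaken lines are commented out so the fragment still builds.

```dockerfile
FROM debian:bookworm-slim

# Mistake: "&&" is not an operator here; it is handed to apt-get as an argument.
# RUN ["apt-get", "update", "&&", "apt-get", "install", "-y", "curl"]

# Mistake: works, but exec form is only re-adding the shell it was meant to avoid.
# RUN ["sh", "-c", "apt-get update && apt-get install -y curl"]

# Shell form states the same chain directly and cleans up in the same layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
```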

Best Practices

Use shell form for multi-step build commands that need &&, pipes, or variable expansion, and accept that it depends on a shell being present in the base.
Chain install, use, and cleanup into a single RUN joined by && so the intermediate files never persist in a committed layer.
Reach for exec form when the base has no shell (distroless, scratch) or when you want the exact arguments passed with no shell parsing in between.
Keep RUN strictly for build-time work and set the container's start command with ENTRYPOINT/CMD, never conflating the two.

Comparable tools

BuildKit: adds RUN --mount for cache and secret mounts, which plain RUN cannot do
Buildah: its run command is the scriptable equivalent outside a Dockerfile
Buildpacks · ko: generate the build steps for you, so there is no hand-written RUN
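For the BuildKit entry, a minimal sketch of a cache mount follows. It assumes a BuildKit-enabled builder, a Python base image, and pip's default cache location for the root user; none of these come from the original text.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim

# The cache mount keeps pip's download cache on the build host between builds;
# the mounted directory is not committed to the image layer.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install gunicorn
```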

Knowledge Check

What does shell form wrap the command in, and what does exec form skip?

Shell form runs through /bin/sh -c; exec form runs the binary directly with no shell
Exec form runs through /bin/sh -c; shell form runs the binary directly with no shell
Shell form spawns a login shell with a profile; exec form spawns a non-login shell
Shell form commits one layer per command; exec form commits one layer per argument

Which form supports &&, pipes, and variable expansion?

Shell form, because it runs through a shell that interprets those features
Exec form, because the JSON array is parsed by the shell before execution
Both forms, since the daemon transparently wraps either one in /bin/sh -c before running it
Neither form, since RUN never invokes a shell at all and execs the binary in both cases

Why does chaining install and cleanup into one RUN matter?

Each RUN is a layer, so cleaning in the same one keeps the cache out of the image; a separate RUN bakes it in permanently
A later RUN that deletes the cache reclaims the space retroactively from the earlier layer that wrote it
Chaining lets the daemon run the install commands in parallel within one layer, so the build finishes faster
A single combined RUN instruction is entirely exempt from the layer build cache, so it can never trigger any downstream rebuild of the instructions that follow it

How does build-time RUN differ from the container's start command?

RUN executes once at build time and freezes into the image; CMD/ENTRYPOINT runs on every container start
RUN executes every time the container starts, while CMD runs only once at build time
They are fully interchangeable; the daemon simply picks whichever one of the two happens to be present in the Dockerfile and ignores the rest
RUN is only used when no CMD is present, acting as a fallback start command at container launch
