Chapter 4: Dockerfiles
Topic 21

Layer Caching and Instruction Order

The build cache is the difference between a 90-second rebuild and a 2-second one, and it turns entirely on the order of your instructions. The daemon caches the result of each instruction and reuses it on the next build until it hits the first instruction whose inputs changed — and from that point down, every layer is rebuilt, cache invalidation cascading to the bottom.

The whole discipline follows from that one rule: order your instructions so the things that change rarely sit above the things that change on every commit, and the expensive steps stay cached. For Driftwood that is the difference between reinstalling every dependency on each one-line code edit and reinstalling none of them. This topic is the spine the rest of the chapter hangs on.

How the Cache Decides

For most instructions the daemon hashes the instruction text. For COPY and ADD it hashes the instruction text plus the contents of the files being copied. An unchanged hash means the cached layer is reused; a changed hash invalidates that layer — and every layer below it. The subtlety that catches people is the file-contents part: editing a file that a COPY brings in changes that instruction's hash even though the instruction text is byte-for-byte identical.
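As a rough illustration of the two kinds of cache key described above, the fragment below annotates what each instruction's key is built from. The file path config/settings.toml and the APP_ENV variable are hypothetical, chosen only for the example; this is a sketch, not Driftwood's actual Dockerfile.

```dockerfile
FROM python:3.12

# Text-only key: editing this value changes the instruction text and misses the cache.
# APP_ENV is a hypothetical variable.
ENV APP_ENV=production

# Text plus file contents: editing config/settings.toml (a hypothetical file)
# misses the cache here even though this line is byte-for-byte unchanged.
COPY config/settings.toml /etc/driftwood/settings.toml

# Text-only key: the command string is compared, not what the command
# would download if it ran again today.
RUN pip install gunicorn
```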

The Cascade

Cache invalidation flows downward only. Change instruction 4 and instructions 4 through the end rebuild, while 1 through 3 stay cached. Nothing above a change is ever affected — which is the entire basis of ordering. Put the volatile instruction near the bottom and a change to it rebuilds little; put it near the top and a change rebuilds almost everything beneath it.

Invalidation cascades down, never up
Above the change
Instructions 1–3 are untouched by a change at 4. Their cached layers are reused — order rarely-changing steps here.
From the change down
Instruction 4 and everything below it rebuild, whether or not they were logically affected. Order volatile steps here.

The COPY . . Before RUN install Trap

Driftwood's naive Dockerfile copies the whole source tree and then installs dependencies. Because COPY . . includes the application code, editing a single line of Python changes the copied files, busts that layer, and forces a full pip install on every rebuild. That is the 90-second case — reinstalling every dependency because one line of unrelated code moved.

The trap — source copied above the install
FROM python:3.12
WORKDIR /app
# Any code edit busts this layer…
COPY . .
# …forcing a full reinstall here
RUN pip install -r requirements.txt
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]

The RUN pip install sits below the COPY . ., so the cascade rule guarantees it rebuilds whenever the copy does — and the copy changes on every code edit. The dependencies have not changed at all, but the cache has no way to know that, because the layer that installs them depends on the layer that copied everything.

The Copy-Deps-First Fix

Copy requirements.txt alone, run pip install, and only then bring in the rest of the source with COPY . . as the last copy step. Now an application code change invalidates only the final copy layer; the cached dependency layer above it is reused untouched, and the rebuild drops to roughly 2 seconds. Dependencies reinstall only when requirements.txt itself changes, which is the behavior you actually wanted.

The fix — manifest copied and installed above the source
FROM python:3.12
WORKDIR /app
# Changes only when deps change
COPY requirements.txt .
# Cached across code edits
RUN pip install -r requirements.txt
# The only layer a code edit busts
COPY . .
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]

One COPY added and one moved, and the rebuild went from 90 seconds to 2. This is the lesson the whole chapter is built around: essentially the same instructions, ordered least-to-most volatile, change the economics of every rebuild for the life of the project. The same pattern applies to package.json before npm install, go.mod before go mod download, and every other manifest-then-code stack.
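For a Node project the same ordering might look like the sketch below. It assumes a package-lock.json is committed alongside package.json and that the entry point is a file named server.js; both are assumptions for illustration, not details from Driftwood.

```dockerfile
FROM node:20
WORKDIR /app

# Manifest and lockfile only: this layer changes when dependencies change.
COPY package.json package-lock.json ./

# Reused from cache across code edits while the two files above are unchanged.
RUN npm ci

# Application source last: the only layer a code edit invalidates.
COPY . .

# server.js is a hypothetical entry point.
CMD ["node", "server.js"]
```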

What Else Busts the Cache

An edited RUN command (a changed apt-get line mid-build, for example), a moving base tag like python:latest that resolves to a new digest, a different ARG value passed with --build-arg, and an ADD of a URL whose remote content has changed all invalidate from that point down. For ARG, the miss lands on the first instruction that uses the variable rather than on the ARG line itself. Understanding the inputs to each instruction's hash (instruction text, copied file contents, build args) is how you predict a cache miss before it costs you, rather than accepting "builds are just slow."
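One way to limit the damage from a build arg that changes on every build is to declare it below the expensive layers. The sketch below assumes a hypothetical GIT_SHA argument and a hypothetical /app/VERSION file; neither comes from the original Dockerfile.

```dockerfile
FROM python:3.12
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

# Hypothetical build arg, declared after the dependency layers on purpose.
ARG GIT_SHA=unknown

# First instruction that uses GIT_SHA: a new --build-arg value
# misses the cache from here down, and nowhere above.
RUN echo "$GIT_SHA" > /app/VERSION

CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]
```

Built with something like docker build --build-arg GIT_SHA=abc123 ., a new value each time would rebuild only the small layer that writes the version file, while the pip install layer stays cached.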

Common Mistakes
  • Placing COPY . . above RUN pip install (or npm install) — every source edit reinstalls all dependencies because the copy layer changed, turning a 2-second rebuild into a 90-second one.
  • Splitting apt-get update and apt-get install into separate RUN layers — a cached update layer serves stale package indexes for weeks, so install pulls outdated or vanished versions.
  • Assuming a code change rebuilds "just that part" when dependencies sit below it — anything beneath the first changed instruction rebuilds regardless of whether it was logically affected.
  • Trusting the cache after editing a file COPY brings in, without realizing the copied contents are part of the cache key — the layer silently rebuilds and you blame the wrong instruction.

Best Practices
  • Order instructions least-to-most volatile: base, system packages, dependency manifests and install, then application source last, so the expensive layers stay cached across code changes.
  • Copy the dependency manifest (requirements.txt, package.json) and install before copying application code, so editing code never reinstalls dependencies.
  • Run the package index update and the install in the same RUN instruction (apt-get update && apt-get install, or the apk equivalent), so a cached layer cannot serve a stale package index.
  • Read a slow rebuild as a cache miss and trace it to the first invalidated instruction, rather than accepting "builds are just slow."
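Taken together, the practices above might produce an ordering like the following sketch. The system package libpq-dev is only a placeholder for whatever the application actually needs.

```dockerfile
FROM python:3.12
WORKDIR /app

# System packages: index update and install share one layer, so the index
# can never be older than the install. libpq-dev is a placeholder package.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# Dependency manifest and install: rebuilt only when requirements.txt changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Application source last: the most volatile layer sits at the bottom.
COPY . .
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]
```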

Comparable tools
  • BuildKit: content-addressed caching, parallel stages, and --mount=type=cache for package caches
  • Kaniko · Buildah: implement the same layer-cache model
  • Buildpacks: cache dependency layers automatically without manual instruction ordering
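As an illustration of the BuildKit cache mount mentioned above, the sketch below keeps pip's download cache across builds. It assumes BuildKit is the active builder and that pip runs as root, so its cache lives at /root/.cache/pip.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12
WORKDIR /app
COPY requirements.txt .

# The cache mount persists between builds and is not part of the image layer.
# Assumes pip runs as root, so its cache directory is /root/.cache/pip.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

COPY . .
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]
```

The layer cache still decides whether the RUN executes at all. The mount only helps when it does, for instance after requirements.txt changes, because previously downloaded packages can be reused instead of fetched again.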

Knowledge Check

How does the daemon decide whether to reuse a cached layer or rebuild it?

  • It hashes the instruction text, plus the copied file contents for COPY/ADD, and reuses the layer if the hash matches
  • It compares the modification timestamp of each layer against the source files and rebuilds anything that looks older
  • It checksums the entire assembled image and rebuilds the whole thing from scratch whenever that one checksum differs
  • It expires every single cached layer after a fixed built-in time-to-live and then rebuilds it from scratch on the very next build that happens to run after the expiry

If instruction 4 in a 7-instruction Dockerfile changes, what rebuilds?

  • Instructions 4 through 7 rebuild from the change down, while 1 through 3 stay cached and untouched
  • All seven instructions rebuild, since a single change invalidates the entire image top to bottom
  • Only instruction 4 rebuilds; the cache surgically swaps in just the one changed layer and reuses the rest
  • Instructions 1 through 4 rebuild, since the change propagates back upward toward the base layer

Why does COPY . . before RUN pip install defeat the cache?

  • Any code edit changes the copy layer, and the install below it rebuilds even though dependencies are unchanged
  • pip install itself always runs measurably slower whenever it executes below a COPY layer than it would running above one
  • The cache ignores COPY layers entirely, so the install below always reruns from scratch regardless of order
  • pip resolves and installs a different set of packages depending on whether the source was copied first

What does the copy-deps-first ordering buy, and how?

  • Copying the manifest and installing above the source keeps the dependency layer cached, cutting a code-edit rebuild from ~90s to ~2s
  • It produces a noticeably smaller final image on disk overall, because the isolated dependency layer ends up compressing far more aggressively whenever the finished image is pushed to a registry
  • It makes the pip install step itself run measurably faster by handing it the manifest before the application code
  • It permanently caches the dependency layer so it never rebuilds again, even when requirements.txt itself changes
