Chapter 5: Building Well
Topic 32

Reproducible Builds

Reproducibility · Pinning

"It built last week" is not the same claim as "it builds the same today." A reproducible build produces a byte-equivalent — or at least dependency-equivalent — image every time, which means pinning everything that can move: the base image by digest, the dependencies by lockfile, and the build platform explicitly.

This is the difference between a digest you can trust in an incident review and a :latest base that silently shifted under you between two builds. When something breaks in production, "what exactly shipped in 1.4.0" has to be answerable to the byte, and that answer only exists if the build was pinned when it ran.

Pin the Base by Digest

FROM python:3.12-slim is a moving tag that can resolve to different bytes next month (Chapter 2 topic 09). FROM python:3.12-slim@sha256:… pins the exact content, so a rebuild uses the same base the original build did, and you inherit base updates deliberately by bumping the digest in a reviewable change rather than absorbing them silently the next time the tag moves.

A digest-pinned base — the tag is documentation, the digest is the contract
FROM python:3.12-slim@sha256:7a1c4d... AS builder

The tag stays in front of the digest as a human-readable label, but the @sha256: is what actually determines the bytes. Two builds days apart against this line pull the identical base; a security update to the base does not reach you until you change the digest on purpose.
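
To make the tag-versus-digest distinction concrete, here is a minimal sketch of a check that lists FROM lines without a digest. The function name unpinned_bases is hypothetical, and the parsing is deliberately simple: it assumes one instruction per line and treats a base supplied through an ARG variable as unpinned.

```python
import re

# An image reference counts as pinned only if it ends in a full sha256 digest.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_bases(dockerfile_text: str) -> list[str]:
    """Return image references from FROM lines that lack a sha256 digest."""
    stages: set[str] = set()
    unpinned: list[str] = []
    for raw in dockerfile_text.splitlines():
        parts = raw.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=... that may precede the image.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "FROM builder" names an earlier stage, and scratch has no bytes to pin.
        is_local = image in stages or image == "scratch"
        if not is_local and not DIGEST_RE.search(image):
            unpinned.append(image)
        # Remember "AS name" so later FROM lines can refer to this stage.
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return unpinned
```

An empty result means every external base in the file is pinned by digest; anything in the list is a tag that could resolve to different bytes on the next build.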

Deterministic Dependency Install

Installing from a lockfile (a requirements.txt with pinned versions and hashes, installed with pip install --require-hashes) fixes the dependency tree. An unpinned pip install flask resolves to whatever is newest at build time, so two builds days apart can ship different code under the same Dockerfile. The lockfile is the dependency half of reproducibility; the digest is the base half, and you need both.
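
As a rough illustration of what "pinned versions and hashes" means for a requirements file, the sketch below flags entries that lack an exact version or a hash. The function name unlocked_requirements is hypothetical, and the parser assumes the compact name==version form with --hash= options on the same or continued lines; it is not a full requirements-file parser.

```python
import re


def unlocked_requirements(requirements_text: str) -> list[str]:
    """Return requirement entries that lack an exact version or a hash."""
    # Join backslash-continued lines so each requirement is one logical line.
    logical_lines = requirements_text.replace("\\\n", " ").splitlines()
    problems: list[str] = []
    for line in logical_lines:
        # Strip comments that start the line or follow whitespace.
        line = re.sub(r"(^|\s)#.*$", "", line).strip()
        # Skip blanks and option lines such as --index-url or -r other.txt.
        if not line or line.startswith("-"):
            continue
        requirement = line.split()[0]
        if "==" not in requirement or "--hash=" not in line:
            problems.append(requirement)
    return problems
```

A file that passes this check is one pip can install in hash-checking mode; an entry such as a bare package name or a >= range shows up in the returned list.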

Sources of Nondeterminism

Timestamps, build-time network resolution, unpinned apt-get install versions, and locale all vary the output. Pinning versions, using cache mounts (topic 30) instead of live re-resolution, and avoiding latest everywhere removes most of the drift. What is left, such as file modification times and file ordering, usually leaves the image dependency-equivalent rather than byte-equivalent, which is enough for an incident review even when the result is not bit-for-bit identical.
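
The timestamp effect is easy to see in isolation. The sketch below packs the same file into an in-memory tar archive twice and hashes the result. It is a toy stand-in for an image layer, not how a real builder packs layers, and layer_digest is a hypothetical helper.

```python
import hashlib
import io
import tarfile


def layer_digest(files: dict[str, bytes], mtime: int) -> str:
    """Pack files into an in-memory tar and return its sha256 hex digest."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Sorted order removes file ordering as a source of drift.
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buf.getvalue()).hexdigest()


files = {"app.py": b"print('driftwood')\n"}

# Same content, different modification times: the archive bytes differ.
print(layer_digest(files, 1_700_000_000) == layer_digest(files, 1_700_086_400))  # False

# Same content, same fixed timestamp: the archive bytes match.
print(layer_digest(files, 0) == layer_digest(files, 0))  # True
```

The file contents are identical in both comparisons; only the recorded modification time changes the digest, which is the gap between dependency-equivalent and byte-equivalent output.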

The pins that make a build repeatable: pin base by digest, lockfile with hashes, name the --platform. Result: identical image every build.

buildx and --platform, Previewed

docker buildx build --platform linux/amd64 makes the target architecture explicit rather than implicitly inheriting the build host's. An arm64 laptop and an amd64 CI runner then produce images for the same declared platform instead of two different ones depending on who ran the build. Full multi-arch manifests are Chapter 9 topic 56; here it is only the reproducibility angle — naming the platform so the output does not silently depend on the machine.
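
One way to keep the platform from depending on whoever runs the build is to assemble the command in a single place with the platform always present. This is a minimal sketch: buildx_command is a hypothetical helper and the tag driftwood:1.4.0 is an illustrative name.

```python
import platform


def buildx_command(tag: str, target_platform: str = "linux/amd64", context: str = ".") -> list[str]:
    """Return the argv for a buildx build that names its target platform."""
    return [
        "docker", "buildx", "build",
        "--platform", target_platform,
        "--tag", tag,
        context,
    ]


# The host architecture varies by machine (for example x86_64 or arm64)...
print("host machine:", platform.machine())

# ...but the declared target in the build command does not.
print(" ".join(buildx_command("driftwood:1.4.0")))
```

The list form can be passed to subprocess.run as-is, so the laptop and the CI runner execute the same command with the same declared target.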

The Driftwood Reproducible Build

Driftwood's Dockerfile pins python:3.12-slim by digest, installs from a hash-pinned requirements.txt, builds with buildx --platform linux/amd64, and records the resulting image digest in the release log. So "what shipped in 1.4.0" is answerable to the exact byte, months after the fact, without trusting that a tag has not moved.

The CI pipeline that enforces all of this on every build — refusing an unpinned base, capturing the output digest automatically — is Chapter 9's subject. Here the point is the build itself: pin the inputs, name the platform, and write down the digest, so the artifact stops being a moving target.

Common Mistakes
  • Building FROM python:3.12-slim without a digest and calling the build reproducible — the tag can move between builds, so the same Dockerfile yields different images on different days (Chapter 2 topic 09).
  • Installing unpinned dependencies (pip install flask, apt-get install postgresql-client) and getting a newer, possibly breaking version on a rebuild months later under the same Dockerfile.
  • Relying on the build host's default architecture and being surprised when an arm64 laptop builds an image that will not run on an amd64 server — name the platform with --platform.
  • Treating a passing build as a stable artifact without recording its resolved digest, so after an incident there is no precise record of what was actually built and shipped.
  • Pinning the base digest but leaving dependencies unpinned, or the reverse: both layers can drift, so reproducibility needs the base and the dependency tree fixed together.

Best Practices
  • Pin base images by digest (@sha256:…) and bump them deliberately, so a rebuild gets identical bytes and base updates are an explicit, reviewable change.
  • Install dependencies from a lockfile with pinned versions and hashes, so the dependency tree is fixed across rebuilds rather than re-resolved to "newest".
  • Build with docker buildx build --platform … naming the target architecture explicitly, so the output does not silently depend on the build host (full multi-arch in Chapter 9 topic 56).
  • Record the resolved image digest in the release record at build time, so "what shipped" is answerable to the exact byte during an incident.

Comparable tools
  • buildx · BuildKit: support digest-pinned bases, reproducible flags, and --platform
  • Kaniko · Podman · Buildah: build the same pinned Dockerfile reproducibly
  • Buildpacks · ko: enforce reproducibility through their own dependency model
  • poetry.lock · package-lock.json: the dependency-pinning half regardless of builder

Knowledge Check

Why is pinning the base by digest required for reproducibility — isn't a version tag enough?

  • A tag like python:3.12-slim can resolve to different bytes later; only the @sha256: digest pins the exact content
  • A version tag is immutable once published, so the registry can never repoint it and it already guarantees the same bytes on every build
  • The digest makes the base image download noticeably faster than the plain tag does
  • The digest automatically pulls every supported CPU architecture of the base

How does a lockfile make dependency installs deterministic?

  • It fixes the dependency tree with pinned versions and hashes, so a rebuild installs identical packages
  • It bundles every dependency's full source code directly into the image layer
  • It pins the base image digest so the dependencies match the OS version
  • It always installs the newest compatible version of each named package, re-resolving the whole tree against the registry on every single rebuild

Which is a real source of build nondeterminism?

  • Unpinned apt-get and pip versions resolved live at build time
  • Naming the target build platform explicitly with the --platform flag
  • Pinning the base image to a specific @sha256: content digest
  • Recording the resolved output image digest in the release log afterward

Why does naming --platform matter for reproducibility?

  • Without it the build inherits the host's architecture, so an arm64 laptop and an amd64 runner produce different images
  • It builds a full multi-architecture manifest covering every platform at once
  • It pins the dependency versions so the package install is fully deterministic
  • It encrypts the output image and binds it to the named platform's CPU, so a host of any other architecture refuses to load or run it
