Reproducible Builds
"It built last week" is not the same claim as "it builds the same today." A reproducible build produces a byte-equivalent — or at least dependency-equivalent — image every time, which means pinning everything that can move: the base image by digest, the dependencies by lockfile, and the build platform explicitly.
This is the difference between a digest you can trust in an incident review and a :latest base that silently shifted under you between two builds. When something breaks in production, "what exactly shipped in 1.4.0" has to be answerable to the byte, and that answer only exists if the build was pinned when it ran.
Pin the Base by Digest
FROM python:3.12-slim is a moving tag that can resolve to different bytes next month (Chapter 2 topic 09). FROM python:3.12-slim@sha256:… pins the exact content, so a rebuild uses the same base the original build did, and you inherit base updates deliberately by bumping the digest in a reviewable change rather than absorbing them silently the next time the tag moves.
FROM python:3.12-slim@sha256:7a1c4d... AS builder
The tag stays in front of the digest as a human-readable label, but the @sha256: is what actually determines the bytes. Two builds days apart against this line pull the identical base; a security update to the base does not reach you until you change the digest on purpose.
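One way to keep an unpinned base from slipping back in is a small check over the Dockerfile text. The following is a minimal sketch, not a full Dockerfile parser: the function name unpinned_bases is hypothetical, and it assumes base images are written literally in FROM lines (an ARG-substituted base such as ${BASE} would simply be reported as unpinned).

```python
import re

# A full digest suffix: "@sha256:" followed by 64 lowercase hex characters.
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_bases(dockerfile_text: str) -> list[str]:
    """Return image references from FROM lines that carry no full digest."""
    stage_names: set[str] = set()  # names declared with "AS <name>"
    unpinned: list[str] = []
    for raw_line in dockerfile_text.splitlines():
        parts = raw_line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64 that may precede the image.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "scratch" and earlier build stages are not registry pulls.
        if image != "scratch" and image not in stage_names:
            if not DIGEST_SUFFIX.search(image):
                unpinned.append(image)
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2])
    return unpinned
```

An empty result means every registry base in the file is pinned to a full digest; anything returned is a reference that can still move between builds.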
Deterministic Dependency Install
Installing from a lockfile (a requirements.txt with pinned versions and hashes, installed with pip install --require-hashes -r requirements.txt so that pip refuses any package missing a hash) fixes the dependency tree. An unpinned pip install flask resolves to whatever is newest at build time, so two builds days apart can ship different code under the same Dockerfile. The lockfile is the dependency half of reproducibility; the digest is the base half, and you need both.
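The same idea can be checked before the build runs. This is a rough sketch under simplifying assumptions: the helper name unlocked_requirements is hypothetical, and it only understands plain name==version entries with --hash= options, not URLs, extras with spaces, or nested -r includes.

```python
def unlocked_requirements(requirements_text: str) -> list[str]:
    """Return requirement entries that lack an exact pin or a hash."""
    # pip lets one requirement span lines via a trailing backslash.
    joined = requirements_text.replace("\\\n", " ")
    problems: list[str] = []
    for line in joined.splitlines():
        line = line.split(" #")[0].strip()  # drop trailing comments
        # Skip blanks, comment lines, and global options like --index-url.
        if not line or line.startswith(("#", "-")):
            continue
        requirement = line.split("--hash=")[0]
        if "==" not in requirement or "--hash=" not in line:
            problems.append(line.split()[0])
    return problems
```

Entries such as requests>=2.0 (a range, not a pin) or click==8.1.7 with no hash would be reported; a fully pinned and hashed file yields an empty list.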
Sources of Nondeterminism
Timestamps, build-time network resolution, unpinned apt-get install versions, and locale can all vary the output. Pinning versions, using cache mounts (topic 30) instead of live re-resolution, and avoiding latest everywhere remove most of the drift. What is left, mainly file modification times and file ordering, usually leaves rebuilds dependency-equivalent rather than byte-equivalent, which is enough for an incident review even when the images are not bit-for-bit identical.
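To see why modification times and ordering alone change the bytes, consider a toy stand-in for an image layer: a tar archive built in memory. This sketch is an illustration only, not how any particular builder packs layers, and archive_digest is a hypothetical name.

```python
import hashlib
import io
import tarfile


def archive_digest(files: dict[str, bytes], mtime: int) -> str:
    """Pack files into an uncompressed in-memory tar; return its sha256 hex digest."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sorted names remove ordering as a source of drift.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime  # the only metadata field varied here
            tar.addfile(info, io.BytesIO(data))
    return hashlib.sha256(buf.getvalue()).hexdigest()
```

With the same contents and the same mtime, the digest is identical regardless of the order the files were supplied in; changing only mtime yields a different digest even though every file is byte-for-byte the same. That is the gap between dependency-equivalent and byte-equivalent.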
buildx and --platform, Previewed
docker buildx build --platform linux/amd64 makes the target architecture explicit rather than implicitly inheriting the build host's. An arm64 laptop and an amd64 CI runner then produce images for the same declared platform instead of two different ones depending on who ran the build. Full multi-arch manifests are Chapter 9 topic 56; here it is only the reproducibility angle — naming the platform so the output does not silently depend on the machine.
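A build wrapper can make the platform a named input rather than an accident of the host. The sketch below is illustrative: host_arch and buildx_command are hypothetical helpers, the alias table covers only the common amd64 and arm64 machine names, and the command is assembled but not executed.

```python
import platform

# Common machine names mapped to the architecture names Docker uses.
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def host_arch(machine: str = "") -> str:
    """Architecture a build would implicitly inherit from this host."""
    name = (machine or platform.machine()).lower()
    return ARCH_ALIASES.get(name, name)


def buildx_command(tag: str, target: str = "linux/amd64", context: str = ".") -> list[str]:
    """Assemble an argv that names the target platform instead of inheriting it."""
    return ["docker", "buildx", "build", "--platform", target, "--tag", tag, context]
```

host_arch shows what an unqualified build would pick up on the current machine, which differs between an arm64 laptop and an amd64 runner; buildx_command produces the same argv on both.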
The Driftwood Reproducible Build
Driftwood's Dockerfile pins python:3.12-slim by digest, installs from a hash-pinned requirements.txt, builds with buildx --platform linux/amd64, and records the resulting image digest in the release log. So "what shipped in 1.4.0" is answerable to the exact byte, months after the fact, without trusting that a tag has not moved.
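What a release log entry might contain can be sketched as follows. The record fields, the function name release_entry, and the registry name used in the usage note are all assumptions for illustration; the source does not specify Driftwood's actual log format.

```python
import json
import re

# A full image digest: "sha256:" followed by 64 lowercase hex characters.
DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def release_entry(version: str, image: str, digest: str, platform: str) -> str:
    """Return one JSON line recording exactly which image a release shipped."""
    if not DIGEST.fullmatch(digest):
        raise ValueError(f"not a full sha256 digest: {digest!r}")
    record = {
        "version": version,
        "image": f"{image}@{digest}",  # pullable by content, not by tag
        "platform": platform,
    }
    return json.dumps(record, sort_keys=True)
```

Calling release_entry("1.4.0", "registry.example.com/driftwood", digest, "linux/amd64") with the digest the build reported gives a line that can be appended to the log. Rejecting truncated digests keeps the record usable months later, since only the full digest identifies the content.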
The CI pipeline that enforces all of this on every build — refusing an unpinned base, capturing the output digest automatically — is Chapter 9's subject. Here the point is the build itself: pin the inputs, name the platform, and write down the digest, so the artifact stops being a moving target.
- Building FROM python:3.12-slim without a digest and calling the build reproducible: the tag can move between builds, so the same Dockerfile yields different images on different days (Chapter 2 topic 09).
- Installing unpinned dependencies (pip install flask, apt-get install postgresql-client) and getting a newer, possibly breaking version on a rebuild months later under the same Dockerfile.
- Relying on the build host's default architecture and being surprised when an arm64 laptop builds an image that will not run on an amd64 server; name the platform with --platform.
- Treating a passing build as a stable artifact without recording its resolved digest, so after an incident there is no precise record of what was actually built and shipped.
- Pinning the base digest but leaving dependencies unpinned, or the reverse: both layers can drift, so reproducibility needs the base and the dependency tree fixed together.
- Pin base images by digest (@sha256:…) and bump them deliberately, so a rebuild gets identical bytes and base updates are an explicit, reviewable change.
- Install dependencies from a lockfile with pinned versions and hashes, so the dependency tree is fixed across rebuilds rather than re-resolved to "newest".
- Build with docker buildx build --platform … naming the target architecture explicitly, so the output does not silently depend on the build host (full multi-arch in Chapter 9 topic 56).
- Record the resolved image digest in the release record at build time, so "what shipped" is answerable to the exact byte during an incident.
Kaniko · Podman · Buildah: build the same pinned Dockerfile reproducibly
Buildpacks · ko: enforce reproducibility through their own dependency model
poetry.lock · package-lock.json: the dependency-pinning half regardless of builder
Knowledge Check
Why is pinning the base by digest required for reproducibility — isn't a version tag enough?
- A tag like python:3.12-slim can resolve to different bytes later; only the @sha256: digest pins the exact content
- A version tag is immutable once published, so the registry can never repoint it and it already guarantees the same bytes on every build
- The digest makes the base image download noticeably faster than the plain tag does
- The digest automatically pulls every supported CPU architecture of the base
How does a lockfile make dependency installs deterministic?
- It fixes the dependency tree with pinned versions and hashes, so a rebuild installs identical packages
- It bundles every dependency's full source code directly into the image layer
- It pins the base image digest so the dependencies match the OS version
- It always installs the newest compatible version of each named package, re-resolving the whole tree against the registry on every single rebuild
Which is a real source of build nondeterminism?
- Unpinned apt-get and pip versions resolved live at build time
- Naming the target build platform explicitly with the --platform flag
- Pinning the base image to a specific @sha256: content digest
- Recording the resolved output image digest in the release log afterward
Why does naming --platform matter for reproducibility?
- Without it the build inherits the host's architecture, so an arm64 laptop and an amd64 runner produce different images
- It builds a full multi-architecture manifest covering every platform at once
- It pins the dependency versions so the package install is fully deterministic
- It encrypts the output image and binds it to the named platform's CPU, so a host of any other architecture refuses to load or run it