Chapter 5: Building Well
Topic 29

.dockerignore and a Small Context


When you run docker build ., that directory becomes the build context: the set of files the builder is allowed to read. A .dockerignore file is the deny list (with optional ! exceptions) that keeps that context small. It stops .git, node_modules, local virtualenvs, and stray secrets from being sent to the builder and, worse, from being swept into the image by a careless COPY . . instruction.

It is the cheapest fix in this chapter and the one most often skipped. One file beside the Dockerfile saves a slow context upload and closes the most common secret-leak path in a single move.

What the Build Context Is

The build context is the directory you point docker build at, and its files are what the builder is allowed to pull in. The legacy builder tarred up the whole directory and uploaded all of it on every build; BuildKit, today's default builder, transfers only the files the build actually requests and reuses unchanged files between builds. The context still defines what is available, though: leave a 400 MB .git history and a local node_modules in scope, and a careless COPY . . requests all of it, so it crosses the wire and lands in the image. The context is what you expose to the build, not the subset the Dockerfile happens to reference.
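One quick way to see what a context exposes is to total it by top-level entry before writing any ignore rules. The sketch below is a minimal illustration that assumes a local checkout and ignores .dockerignore entirely, so it reports the worst case; the function name context_size_by_entry is hypothetical and not part of any Docker tooling.

```python
import os
from collections import defaultdict


def context_size_by_entry(root: str) -> dict[str, int]:
    """Sum file sizes under `root`, grouped by top-level entry.

    This is everything a build could pull in if nothing is ignored.
    Symlinked files are skipped, and os.walk does not descend into
    symlinked directories by default.
    """
    totals: dict[str, int] = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # count real files only
            rel = os.path.relpath(full, root)
            top = rel.split(os.sep)[0]  # ".git", "node_modules", "app", ...
            totals[top] += os.path.getsize(full)
    return dict(totals)
```

Called from the project root and sorted by size, the result typically puts .git, node_modules, or a virtualenv near the top, which makes a reasonable starting list for .dockerignore.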

.dockerignore Syntax and Behavior

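.dockerignore sits at the root of the build context (normally beside the Dockerfile) and uses glob patterns that look like gitignore entries (**/__pycache__, .git, *.env, node_modules) to exclude paths from the context entirely. Unlike gitignore, every pattern is matched relative to the context root: *.pyc excludes only .pyc files at the top level, and **/*.pyc is needed to reach every directory. An excluded file cannot be COPY'd, because the daemon never receives it in the first place. The exclusion happens at the boundary, before the build sees anything, which is what makes it both a speed control and a security control.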
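Root-relative matching is easy to get wrong, so here is a rough model of it. This is a simplified sketch of the rules described above, not Docker's own matcher: the helpers pattern_to_regex and is_excluded are hypothetical, character classes and backslash escapes are not handled, and Docker treats ! exceptions inside excluded directories more carefully than the last-match-wins loop used here.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one .dockerignore-style glob into an anchored regex.

    Approximation: patterns are relative to the context root, '*' stays
    inside one path segment, '?' matches one non-'/' character, and '**'
    matches any number of directories, including zero.
    """
    pattern = pattern.strip().strip("/")
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            i += 2
            if pattern.startswith("/", i):
                i += 1  # treat "**/" like "**"
            # Trailing "**" matches everything; otherwise zero or more dirs.
            out.append(".*" if i >= len(pattern) else "(.*/)?")
            continue
        ch = pattern[i]
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("^" + "".join(out) + "$")


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Decide whether `path` (relative, '/'-separated) leaves the context.

    Patterns apply in order and the last match wins; a leading '!'
    re-includes. A pattern that matches a parent directory also matches
    everything inside it, since excluding a directory excludes its contents.
    """
    parts = path.strip("/").split("/")
    # The path itself plus every ancestor: "a", "a/b", "a/b/c.pyc".
    candidates = ["/".join(parts[:n]) for n in range(1, len(parts) + 1)]
    excluded = False
    for line in patterns:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        negate = line.startswith("!")
        regex = pattern_to_regex(line[1:] if negate else line)
        if any(regex.match(c) for c in candidates):
            excluded = not negate
    return excluded
```

With this model, is_excluded("app/x.pyc", ["*.pyc"]) returns False while is_excluded("app/x.pyc", ["**/*.pyc"]) returns True, which is the practical reason to put **/ in front of any pattern meant to reach subdirectories.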

Driftwood's .dockerignore, at the context root beside the Dockerfile
.git
.venv
**/__pycache__
**/*.pyc
.env
**/*.env
tests/fixtures
node_modules

Each line removes matching paths from the context the daemon receives. The patterns target what is large, machine-specific, or sensitive: version history, the local virtualenv, byte-compiled caches, and any environment file that might hold a credential.

The Secret-Leak Path

Without .dockerignore, a COPY . . happily bakes .env, the full .git history (with any secret ever committed), and any SSH keys or cloud credentials sitting in the directory into a layer. Anyone who pulls the image can then export it with docker save and unpack that layer to read every file. The file is the primary defense against the "I copied my whole repo into the image" leak, the kind that surfaces months later when someone scans a published image and finds an active token in a layer.
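To see the leak from the consumer's side, note that docker save writes an image as a tar archive whose layers are themselves tar archives, so no special tooling is needed to read them. The sketch below assumes such an archive on disk. SUSPICIOUS_NAMES, looks_secret, and find_secret_files are hypothetical names, and the name list is illustrative, not a real secret scanner. Because the archive layout differs between Docker versions, the sketch does not parse the manifest; it simply tries every member as a tar.

```python
import tarfile
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical list of file names worth flagging; extend it for your stack.
SUSPICIOUS_NAMES = (".env", "*.env", "id_rsa", "id_ed25519", "*.pem", "credentials")


def looks_secret(name: str) -> bool:
    """Flag anything under a .git directory or with a suspicious basename."""
    parts = PurePosixPath(name).parts
    if not parts:
        return False
    return ".git" in parts or any(fnmatchcase(parts[-1], p) for p in SUSPICIOUS_NAMES)


def find_secret_files(image_tar: str) -> list[tuple[str, str]]:
    """List (layer, path) pairs for suspicious files in a `docker save` archive."""
    hits = []
    with tarfile.open(image_tar) as outer:
        for member in outer.getmembers():
            if not member.isfile():
                continue
            blob = outer.extractfile(member)
            try:
                # "r:*" accepts plain or compressed layer tarballs.
                layer = tarfile.open(fileobj=blob, mode="r:*")
            except (tarfile.TarError, EOFError):
                continue  # manifest.json, config blobs, other non-layer files
            with layer:
                for entry in layer.getmembers():
                    if entry.isfile() and looks_secret(entry.name):
                        hits.append((member.name, entry.name))
    return hits
```

For example, docker save -o image.tar myimage:latest (the image name is a placeholder) followed by find_secret_files("image.tar") lists every .env file, key file, or .git path baked into any layer.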

What the daemon receives, with and without .dockerignore
Without .dockerignore
The whole directory ships — .git history, node_modules, the local .venv, stray .env files. The context upload is slow, and a careless COPY . . bakes the secrets into a layer.
With .dockerignore
Only the source the build actually copies crosses the socket. The context is lean and fast, cache keys stay stable, and no excluded secret can leak, because excluded files never reach the daemon.

Smaller Context, Faster Builds

A lean context uploads faster and keeps BuildKit's cache valid. BuildKit keys a COPY step on a checksum of the files it copies, so if node_modules is in the context and the Dockerfile runs COPY . ., touching one dependency changes that checksum and invalidates the COPY and every step after it, even though the build never needed those files from the host. BuildKit cannot tell the relevant change from the irrelevant one when both are inside the same COPY. Excluding the noise means the checksum only moves when something the build actually needs has changed.
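A toy model makes the cache argument concrete. The sketch below is not how BuildKit computes cache keys; it only assumes, as described above, that a COPY step's key depends on the paths and bytes it copies. copy_checksum is a hypothetical helper, and its excluded set is a stand-in for .dockerignore that understands top-level names only.

```python
import hashlib
import os


def copy_checksum(root: str, excluded: frozenset[str] = frozenset()) -> str:
    """Toy cache key for a COPY . . step: hash every copied path and its bytes."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune ignored top-level entries so they are never hashed.
            dirnames[:] = sorted(d for d in dirnames if d not in excluded)
            filenames = [f for f in filenames if f not in excluded]
        else:
            dirnames.sort()  # deterministic walk order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                file_hash = hashlib.sha256(fh.read()).hexdigest()
            digest.update(f"{rel}\0{file_hash}\n".encode())
    return digest.hexdigest()
```

In this model, changing a file under node_modules moves the unfiltered checksum but leaves copy_checksum(root, frozenset({"node_modules"})) untouched; only an edit to the application source moves both.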

The Driftwood Context

Driftwood's .dockerignore excludes .git, .venv, __pycache__, *.pyc, .env, and the test fixtures, so the context sent for the multi-stage build is the source tree and requirements.txt only. The developer's local .venv never reaches the daemon, which matters doubly: it would otherwise be copied over the freshly built one and ship a machine-specific dependency tree on top of the wheels the builder stage just compiled.

The result is a context measured in a few megabytes instead of the hundreds the bare repo would send, with no credential file anywhere in it, so the multi-stage build from Topic 28 is fed a clean, small input.

Common Mistakes
  • Running COPY . . with no .dockerignore and shipping .git, .env, and local credential files into the image, where anyone who pulls it can extract the layer and read them.
  • Copying a host node_modules or .venv into the image and then installing on top — you ship a local-machine-specific dependency tree that may not even match the target platform, on top of the size hit.
  • Assuming .gitignore covers the build — it does not; Docker reads .dockerignore specifically, and a path ignored by git is still uploaded and copyable unless .dockerignore excludes it.
  • Leaving a multi-hundred-megabyte .git directory in the context and blaming slow builds on the network, when the context upload itself is the cost.
  • Writing a pattern that is too broad (a bare *) without the matching ! exception lines (for example !src), then being surprised when COPY finds nothing to copy.
Best Practices
  • Ship a .dockerignore with every Dockerfile that excludes .git, virtualenvs, node_modules, caches, and any *.env or credential files, so neither secrets nor bulk reach the daemon.
  • Treat .dockerignore as a security control, not just a speed tweak — it is the line that stops COPY . . from baking the repo's secrets into a layer.
  • Keep the build context to the source the Dockerfile actually copies, so context uploads stay small and BuildKit's content cache stays valid.
  • Mirror, do not reuse, .gitignore — list the build-specific exclusions explicitly in .dockerignore, since Docker never reads the git file.
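Mirroring can be partly mechanized. The sketch below is a hypothetical helper, not a full gitignore parser: it takes the text of both files and suggests .dockerignore lines for .gitignore entries not already present, rewriting unanchored names as **/name because gitignore matches them at any depth while .dockerignore anchors every pattern at the context root. Negations are skipped for a human to translate, and .git itself never appears in a .gitignore, so it still has to be added by hand.

```python
def missing_from_dockerignore(gitignore: str, dockerignore: str) -> list[str]:
    """Suggest .dockerignore lines for .gitignore entries it does not list."""
    existing = {line.strip() for line in dockerignore.splitlines()}
    suggestions: list[str] = []
    for raw in gitignore.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # comments, blanks, and negations are not translated
        body = line.rstrip("/")  # trailing '/' only means "directory" in git
        if body.startswith("/"):
            candidate = body.lstrip("/")  # root-anchored in both formats
        elif "/" in body:
            candidate = body  # an inner '/' anchors the gitignore pattern too
        else:
            candidate = "**/" + body  # git matches bare names at any depth
        if candidate not in existing and candidate not in suggestions:
            suggestions.append(candidate)
    return suggestions
```

Running it in CI against the repository's two files gives a short review list rather than an automatic rewrite, which keeps the build-specific choices in .dockerignore deliberate.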
Comparable tools: Podman, Buildah, Kaniko, and BuildKit all honor .dockerignore (Podman and Buildah also accept .containerignore). Buildpacks and ko sidestep the issue: they introspect the source rather than running COPY . . over an arbitrary context.

Knowledge Check

What is the build context, and how does BuildKit handle it?

  • The directory you point the build at — BuildKit sends the files it needs and reuses unchanged ones, not the whole tree every build
  • Only the specific files named in COPY instructions, streamed lazily as each one runs
  • The set of cached layers the daemon reuses and reassembles from a previous build
  • The base image the daemon downloads from the configured remote registry

Why does .dockerignore, not .gitignore, govern the build?

  • Docker reads .dockerignore specifically; a git-ignored path is still uploaded unless .dockerignore excludes it
  • Docker reads .gitignore first and only falls back to .dockerignore when it is absent
  • The two files must contain identical patterns line for line or the build errors out immediately with a context-mismatch warning
  • Git filters the working tree first before handing the context to Docker

How does a missing .dockerignore plus COPY . . leak secrets?

  • It bakes .env, .git history, and credential files into a layer that anyone pulling the image can extract
  • It prints the secrets to the build log output where the CI system archives them
  • It exposes the daemon socket to every other container running on the host
  • It copies the secrets into the container's runtime environment variables, where they show up in docker inspect and the process environment of every child

How does context size affect build speed and cache validity?

  • A lean context uploads faster and keeps the context hash stable, so BuildKit's cache stays valid
  • A larger context makes the resulting container run more slowly under load
  • A larger context forces Docker to re-download the base image every build
  • A larger context adds proportionally more layers to the final image, one extra layer for every directory the daemon receives in the tarball
