Topic 23

COPY vs ADD

Both COPY and ADD move files from the build context into the image, and for plain file copying they are identical. The difference is that ADD does two extra things: it auto-extracts local tar archives, and it can fetch a remote URL. Both can be footguns, because they hide what the build is actually doing.

The rule is short: use COPY for everything, and reach for ADD only in the one case where it is genuinely better. The reason is auditability: a Dockerfile reviewer should be able to read a line and know exactly what lands in the image, and ADD turns a line that looks like a copy into one that might extract an archive or hit the network.

`COPY` — The Default

COPY src dest copies files and directories from the build context into the image, and nothing more. What you see is what you get, which is exactly why it is the instruction to use for application code, configs, and the requirements.txt in the cache-ordering lesson from topic 21. There is no behavior to remember beyond "it copies."
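A minimal sketch of COPY in the cache-ordering role mentioned above. The base image, the /app working directory, and the pip install step are assumptions chosen for illustration; only requirements.txt comes from the lesson itself.

```dockerfile
# Illustrative only: base image and paths are assumptions, not part of the lesson.
FROM python:3.12-slim
WORKDIR /app

# Copy the dependency manifest on its own first, so the install layer below
# is reused from cache until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the build context. Nothing is extracted or downloaded here.
COPY . .
```

Each COPY line can be read in isolation: the source comes from the build context and arrives in the image unchanged.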

`ADD`'s Tar Auto-Extraction

ADD local.tar.gz /opt silently extracts the archive into the destination instead of copying the file. This is convenient when you intended it and a surprise when you wanted the tarball intact — and it is the one place ADD legitimately beats COPY. If your build genuinely needs a local archive unpacked into the image, ADD does it in one line where COPY plus a RUN tar would take two.
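The one-line versus two-line trade-off might look like the sketch below. The base image and the two destination directories are hypothetical, and both variants appear in one file only so they can be compared side by side; a real Dockerfile would pick one.

```dockerfile
# Illustrative only: base image and destination paths are assumptions.
FROM debian:bookworm-slim

# Variant A: ADD recognises the local archive and unpacks it into the directory.
ADD local.tar.gz /opt/with-add/

# Variant B: COPY keeps the tarball intact, so unpacking needs an explicit RUN.
# The COPY layer still contains the tarball even though the RUN deletes it.
COPY local.tar.gz /tmp/local.tar.gz
RUN mkdir -p /opt/with-copy \
 && tar -xzf /tmp/local.tar.gz -C /opt/with-copy \
 && rm /tmp/local.tar.gz
```

Variant B is more explicit but leaves a copy of the archive in an earlier layer, which is part of why the local-archive case is where ADD earns its place.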

`ADD`'s URL Fetch

ADD https://example.com/file /opt downloads a remote file during the build. That pulls an unverified, unpinned resource into the image with no checksum, no caching guarantee, and a hidden network dependency baked into every build. A changed remote silently changes the image, and a flaky remote silently breaks the build — neither of which is visible from reading the line.

Same syntax, different powers

COPY

A plain copy from the build context into the image, and nothing else — predictable and auditable. The preferred default for every file move.

ADD

Copies too, but also auto-extracts local tar archives and can fetch from a URL — surprising behavior hidden behind a copy-looking line.

Why `COPY` Wins by Default

COPY does one obvious thing, so a reviewer reading the Dockerfile knows exactly what lands in the image. ADD's magic means a line that looks like a copy might extract an archive or hit the network, depending entirely on its argument — which makes the build harder to audit and harder to reproduce. The default is COPY not because ADD is broken, but because predictability is worth more than the occasional saved line.

The Honest URL Pattern

Instead of ADD <url>, fetch the file with an explicit, checksum-verified RUN so the download is visible and pinned — everything ADD's URL form hides becomes auditable in the Dockerfile.

Replace ADD <url> with a checksum-verified fetch

# Download, verify, unpack, and clean up in a single layer.
RUN curl -fsSL https://example.com/tool.tar.gz -o tool.tar.gz \
 && echo "abc123…  tool.tar.gz" | sha256sum -c - \
 && tar -xzf tool.tar.gz -C /opt \
 && rm tool.tar.gz

The download is explicit, the checksum fails the build if the remote changed, and the extraction and cleanup happen in the same layer. Compare that to ADD https://example.com/tool.tar.gz /opt, which fetches the file unverified and leaves no record of what it pulled. And because ADD never extracts an archive fetched from a URL, it drops the tarball at the destination still needing an unpack step. The RUN version is longer precisely because it refuses to hide anything.
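In a complete Dockerfile the same pattern needs curl to be present and a real digest to compare against. The sketch below is one way to arrange that, under stated assumptions: the base image, the build argument names TOOL_URL and TOOL_SHA256, and the /tmp path are all hypothetical, and the digest is deliberately left for the caller to supply.

```dockerfile
# Illustrative only: base image, ARG names, and paths are assumptions.
FROM debian:bookworm-slim

ARG TOOL_URL=https://example.com/tool.tar.gz
# No default on purpose: pass the expected digest at build time, e.g.
#   docker build --build-arg TOOL_SHA256=<expected digest> .
ARG TOOL_SHA256

# Install curl, then fetch, verify, unpack, and clean up in one layer.
# sha256sum -c exits non-zero on a mismatch or a missing digest,
# which stops the build before anything is unpacked.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl \
 && rm -rf /var/lib/apt/lists/* \
 && curl -fsSL "$TOOL_URL" -o /tmp/tool.tar.gz \
 && echo "${TOOL_SHA256}  /tmp/tool.tar.gz" | sha256sum -c - \
 && tar -xzf /tmp/tool.tar.gz -C /opt \
 && rm /tmp/tool.tar.gz
```

Keeping the URL and digest in ARGs puts the pin in one visible place, so a version bump is a two-value change that a reviewer can check.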

COPY vs ADD

COPY — copies files from the build context into the image and does nothing else: predictable, auditable, the right choice for application code, configs, and dependency manifests. This is the default for every file move in the Dockerfile.

ADD — does the same plus auto-extracts local tar archives and fetches remote URLs. Use it only for the local-tar-extraction case where that behavior is exactly the intent. For everything else, COPY; if you need a remote file, fetch it with a checksum-verified RUN, not ADD.

Common Mistakes

Using ADD config.json /app out of habit when COPY is clearer: it works, but a reviewer now has to confirm that the line is not extracting an archive or fetching a URL, which makes the Dockerfile harder to audit.
ADD-ing a remote URL to pull a dependency — the resource is unpinned and unverified, the network dependency is baked into every build, and a changed remote silently changes the image.
Expecting ADD app.tar.gz /opt/app.tar.gz to leave a tarball at the destination — ADD extracts it, so you get the unpacked contents and no archive.
Relying on ADD's URL fetch for caching: the plain form verifies no checksum and has weak cache semantics, so builds are neither reproducible nor reliably cached.
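As a rough sketch, the first and third mistakes above might be rewritten like this. The base image is an assumption, and config.json and app.tar.gz are the placeholder file names from the list, not files from a real project.

```dockerfile
# Illustrative only: base image and file names are assumptions.
FROM debian:bookworm-slim

# A plain file: COPY says exactly what happens, with nothing to second-guess.
COPY config.json /app/config.json

# A tarball that must stay a tarball: COPY leaves it intact at the destination.
COPY app.tar.gz /opt/app.tar.gz

# A tarball that should be unpacked: ADD, with a directory destination
# so the intent to extract is visible in the line itself.
ADD app.tar.gz /opt/app/
```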

Best Practices

Default to COPY for every file-copy in the Dockerfile so each line does one obvious, auditable thing.
Reserve ADD for the single case of extracting a local tar archive into the image, where its behavior is exactly the intent.
Fetch remote files with a RUN curl ... && sha256sum -c so the download is explicit, pinned, and verified rather than hidden inside ADD.
Read any ADD in a code review as a question — is this extracting an archive, or should it be a COPY? — since the instruction's behavior depends on its argument.

Comparable tools

BuildKit: ADD --checksum adds verification to remote fetches, narrowing the gap
Buildah: its copy/add mirror the same split
Buildpacks · ko: move source into the image without either instruction

Knowledge Check

What do COPY and ADD share, and what does only ADD do?

Both copy from the context; ADD also auto-extracts local tarballs and fetches remote URLs
Both fetch remote URLs into the image; only COPY can extract a local tar archive as it lands
Both copy files from the context; only ADD sets correct file ownership on them automatically
COPY moves files into the image; ADD moves files out of the image to the host

Why is ADD's URL fetch a reproducibility and security footgun?

It pulls an unpinned, unverified resource with no checksum, so a changed remote silently changes the image
It is disabled by default and requires a privileged build flag that weakens the daemon's security posture
It downloads over plain HTTP only, so the fetched file is always transmitted unencrypted across the wire
It automatically executes the downloaded file during the build step itself, directly running arbitrary fetched remote code as the root user

When is ADD's tar extraction the right call?

When you genuinely want a local tar archive unpacked into the image in a single instruction
When copying ordinary application source files into the image, since ADD is faster at it than COPY
When fetching a remote tarball over the network, since ADD verifies and extracts it in one step
When you want to keep a tarball intact at the destination without unpacking any of its contents

What replaces ADD <url> when you genuinely need a remote file?

A RUN that fetches with curl and verifies the download with sha256sum -c
A COPY with the URL as its source, since COPY can also reach remote files
An ADD with a --pin flag that locks the URL to a fixed version
An ENV that records the URL so the daemon downloads it at container start
