Topic 53

How Registries Work

A registry is two stores behind one HTTP API: a content-addressed blob store that holds layers and config blobs by their SHA-256 digest, and a manifest store that maps tags and digests to the small JSON manifests listing those blobs. docker push and docker pull are ordinary HTTP conversations against the OCI distribution API — the daemon negotiates which layers the registry already has, transfers only the missing ones, then writes or reads the manifest that ties them together.

That is the whole shape of distribution, and it is worth understanding before you choose a registry or build a pipeline around one. Most of the surprises engineers hit (a push that is far smaller than the image, a 401 that looks like a missing repository, a CI fleet that mysteriously gets rate-limited) fall out directly from how this API stores content and scopes access. This topic builds on the manifests and digests from Chapter 2; it does not re-teach them, it puts them on the wire.

Two Stores, One API

Everything in a registry is named by content. A layer is stored once under the digest of its bytes, a config blob the same way, and a manifest references both by those digests rather than by name. Because the name is the hash, two images that share a base layer store that layer exactly once on the registry, no matter how many manifests reference it. The tag driftwood/web:1.4.0 is just a human-readable pointer in the manifest store; the actual content lives in the blob store, deduplicated by digest.

This is the content-addressing from Chapter 2 topic 08 applied to storage rather than to a single host. A manifest is a list of digests plus a config; the registry holds the blobs those digests name. Pull any one of them and you get back exactly the bytes that produced that hash, or the transfer fails — there is no version of "almost the right layer."
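The two stores can be sketched as a toy in-memory model. This is an illustration only: the class name ToyRegistry, its methods, and the sample byte strings are hypothetical, and a real registry adds repositories, uploads, and access control on top.

```python
import hashlib
import json


def sha256_digest(data: bytes) -> str:
    """Name a blob by the SHA-256 of its bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class ToyRegistry:
    """Hypothetical model: a blob store keyed by digest, plus manifests and tags."""

    def __init__(self):
        self.blobs = {}      # digest -> bytes (layers and config blobs)
        self.manifests = {}  # manifest digest -> manifest JSON bytes
        self.tags = {}       # (repository, tag) -> manifest digest

    def put_blob(self, data: bytes) -> str:
        digest = sha256_digest(data)
        self.blobs[digest] = data  # identical bytes land on the same key
        return digest

    def get_blob(self, digest: str) -> bytes:
        data = self.blobs[digest]
        # Re-hash on read: either the bytes match the name or the read fails.
        if sha256_digest(data) != digest:
            raise ValueError("digest mismatch for " + digest)
        return data

    def put_manifest(self, repo, tag, config_digest, layer_digests):
        body = json.dumps(
            {"config": config_digest, "layers": layer_digests}, sort_keys=True
        ).encode()
        digest = sha256_digest(body)
        self.manifests[digest] = body
        self.tags[(repo, tag)] = digest  # the tag is only a pointer
        return digest


reg = ToyRegistry()
base = reg.put_blob(b"base layer bytes")
web_app = reg.put_blob(b"web app layer")
worker_app = reg.put_blob(b"worker app layer")
cfg = reg.put_blob(b'{"cmd": ["run"]}')

reg.put_manifest("driftwood/web", "1.4.0", cfg, [base, web_app])
reg.put_manifest("driftwood/worker", "2.0.0", cfg, [base, worker_app])
reg.put_blob(b"base layer bytes")  # same bytes again: no new entry

print(len(reg.blobs))  # 4
```

Two images share the base layer and the config, yet the blob store holds four entries, because storing the same bytes twice resolves to the same key.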

The Push Conversation

When you run docker push driftwood/web:1.4.0, the daemon does not blindly upload the image. It asks the registry, per layer, whether it already holds that digest. For every layer the registry has, typically the shared base layers, the daemon uploads nothing: the blob already in the store is reused, or mounted from another repository that holds it. Only the layers the registry lacks are uploaded, and last of all the daemon PUTs the manifest under the 1.4.0 tag, tying the whole set together.

What docker push actually transfers: ask which layers exist → upload only the missing layers → PUT the manifest → done (pull reverses it)

Pushing a tagged image to the private registry

$ docker tag driftwood/web:1.4.0 registry.driftwood.example/driftwood/web:1.4.0
$ docker push registry.driftwood.example/driftwood/web:1.4.0
The push refers to repository [registry.driftwood.example/driftwood/web]
9c1b6dd1f4a2: Layer already exists      # shared base, not re-sent
3f8a0e2c7b91: Layer already exists
b2d4e6f80a13: Pushed                    # the one small app layer that changed
1.4.0: digest: sha256:7e9a…c204 size: 1789

The practical consequence: after a one-line code change, pushing a near-identical image transfers one tiny layer, not the whole thing. If the registry already holds your base layers, an app-layer change is a push of a few megabytes. If you reason about CI time as if every push re-uploads the full image, you can overestimate it by an order of magnitude or more.
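The arithmetic behind that claim can be sketched in a few lines. The function plan_push is a hypothetical stand-in for the per-layer existence check, the layer sizes are made up for illustration, and the short IDs mirror the transcript above.

```python
def plan_push(image_layers, registry_digests):
    """Split an image's layers into those skipped and those uploaded.

    image_layers: list of (layer_id, size_in_bytes), base first.
    registry_digests: set of layer IDs the registry already holds.
    """
    skipped, uploaded = [], []
    for layer_id, size in image_layers:
        # Stand-in for asking the registry whether it holds this layer.
        if layer_id in registry_digests:
            skipped.append((layer_id, size))
        else:
            uploaded.append((layer_id, size))
    return skipped, uploaded


# Hypothetical sizes; only the last layer changed since the previous push.
layers = [
    ("9c1b6dd1f4a2", 78_000_000),
    ("3f8a0e2c7b91", 41_000_000),
    ("b2d4e6f80a13", 2_500_000),
]
already_on_registry = {"9c1b6dd1f4a2", "3f8a0e2c7b91"}

skipped, uploaded = plan_push(layers, already_on_registry)
sent = sum(size for _, size in uploaded)
total = sum(size for _, size in layers)
print(f"uploaded {sent} of {total} bytes")  # uploaded 2500000 of 121500000 bytes
```

With these assumed sizes the push moves about 2% of the image, which is the gap between estimating by image size and estimating by changed layers.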

The Pull Conversation

A pull runs the same negotiation in reverse. The daemon fetches the manifest, compares its layer digests against what the local store already has (the layer cache from Chapter 2 topic 10), downloads only the blobs it is missing, and reassembles the image from them. This is why re-pulling an image that shares a base with something you already have is nearly free — the base layers are already on disk, addressed by the same digests, so the pull is just the manifest and whatever is genuinely new.
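The pull side reduces to a set difference between the manifest's layer digests and the local store. A minimal sketch, assuming placeholder digest strings and hypothetical helper names (plan_pull, pull):

```python
def plan_pull(manifest_layers, local_digests):
    """Return the digests that must be downloaded, in manifest order."""
    return [d for d in manifest_layers if d not in local_digests]


def pull(manifest_layers, local_store):
    """Download what is missing and record it in the local store."""
    missing = plan_pull(manifest_layers, local_store)
    local_store.update(missing)  # stand-in for fetching and unpacking blobs
    return missing


# Placeholder digests, not real hashes.
web_manifest = ["sha256:base-os", "sha256:runtime", "sha256:web-app"]
worker_manifest = ["sha256:base-os", "sha256:runtime", "sha256:worker-app"]

local_store = set()
first = pull(web_manifest, local_store)      # cold host: everything downloads
second = pull(worker_manifest, local_store)  # shared base already on disk
again = pull(web_manifest, local_store)      # re-pull: manifest only

print(len(first), len(second), len(again))  # 3 1 0
```

The second image costs one layer because its base digests already match entries on disk, and a repeat pull of the first image downloads no blobs at all.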

Authentication and the Token Dance

docker login registry.driftwood.example validates your credentials and stores them for the client; on each push or pull they are exchanged for a short-lived bearer token, which the daemon attaches to its requests. The registry authorizes per repository, not per registry: that is exactly how registry.driftwood.example can be readable by every production host while being pushable only by the CI service account. The token a host holds grants pull on driftwood/web and nothing else.

This per-repository scoping is why a failed push so often returns 401 Unauthorized rather than an error that names the actual problem. The registry refused the token for that repository; it is not telling you the repository is missing. Read a 401 on push as an auth problem on that repo, not as proof the name is wrong.
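Per-repository scoping can be sketched as a check of a token's granted scopes against the repository and action of a request. The scope strings below are modeled on the repository:<name>:<actions> form used by Docker's token authentication; treat the exact format as an assumption to verify against your registry, and the function names as hypothetical.

```python
def parse_scope(scope: str):
    """Split 'repository:driftwood/web:pull,push' into its three parts."""
    resource_type, rest = scope.split(":", 1)
    name, actions = rest.rsplit(":", 1)
    return resource_type, name, set(actions.split(","))


def is_allowed(granted_scopes, repository, action):
    """True if any granted scope covers this action on this repository."""
    for scope in granted_scopes:
        resource_type, name, actions = parse_scope(scope)
        if resource_type == "repository" and name == repository and action in actions:
            return True
    return False


def status_for(granted_scopes, repository, action):
    """Toy status mapping that follows the text: refused tokens surface as 401."""
    return 200 if is_allowed(granted_scopes, repository, action) else 401


host_token = ["repository:driftwood/web:pull"]      # production host
ci_token = ["repository:driftwood/web:pull,push"]   # CI service account

print(status_for(host_token, "driftwood/web", "pull"))  # 200
print(status_for(host_token, "driftwood/web", "push"))  # 401
print(status_for(ci_token, "driftwood/web", "push"))    # 200
```

The host's 401 on push says nothing about whether driftwood/web exists; the repository is the same one the CI token pushes to successfully.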

Pull-Through Caches and Mirrors

A registry can be configured as a pull-through cache that fronts an upstream like Docker Hub: the first pull of a public base image fetches it from upstream and keeps a local copy, and every subsequent pull serves from the cache. Nobody re-tags anything — images keep their original names — but the bytes are fetched across the internet once instead of once per runner. For a CI fleet or a cluster behind a single egress IP, this cuts egress cost and sidesteps the upstream rate limits the next topic covers.
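The effect on upstream traffic can be shown with a small cache model. This is a sketch under simplifying assumptions: PullThroughCache and fake_upstream are hypothetical names, the image reference is only an example, and a real cache also has to decide when a cached tag should be re-checked against upstream.

```python
class PullThroughCache:
    """Toy cache: fetch from upstream on first request, serve locally after."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream
        self._cache = {}
        self.upstream_fetches = 0

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = self._fetch_upstream(name)
            self.upstream_fetches += 1
        return self._cache[name]  # same name as upstream, nothing re-tagged


def fake_upstream(name):
    """Stand-in for a fetch across the internet."""
    return ("bytes of " + name).encode()


cache = PullThroughCache(fake_upstream)
for runner in range(20):  # twenty CI runners pull the same base image
    cache.get("library/python:3.12-slim")

print(cache.upstream_fetches)  # 1
```

Twenty pulls reach upstream once, which is why a fleet behind one egress IP stops counting against upstream limits once the cache is warm.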

Common Mistakes

Assuming docker push re-uploads the whole image every time. It only sends blobs the registry is missing, so when the registry already holds the base, an app-layer change is a push of a few megabytes; treating every push as a full transfer leads to wildly overestimated CI times and pointless "optimization" work.
Reading a 401 on push as "repository does not exist" — the registry refused the token, not the name; the repo may be fine and the credentials wrong, so you waste time recreating a repository that was already there.
Mounting the daemon's credential store or a config.json with embedded auth into shared CI without scoping it — that token grants push to every repository it covers, so one leaked CI credential can overwrite production tags across every repo.
Pointing high-volume CI directly at Docker Hub instead of a pull-through cache — every runner re-pulls base layers across the internet and burns through anonymous pull limits, turning the registry into a build bottleneck for reasons unrelated to your code.

Best Practices

Run a pull-through cache mirror in front of Docker Hub for CI and production hosts, so shared base layers are fetched once across the internet and upstream rate limits stop gating builds.
Scope registry credentials per repository — give production hosts read-only pull tokens for registry.driftwood.example/driftwood/web and reserve push tokens for the CI service account, so a leaked host credential cannot overwrite a release.
Reason about transfer cost by layers, not images — order Dockerfile instructions so the volatile app layer is small and the heavy base layers stay cached registry-side, which is what makes every push after the first cheap.
Verify connectivity and auth with a small docker pull against the target registry before debugging a failing push, so you isolate a network or auth problem from an image problem.

Comparable tools

Docker Hub · GHCR · ECR/GCR/ACR · Harbor all speak the same OCI distribution API.
skopeo talks the API directly to copy and inspect images without a daemon.
Podman · nerdctl push and pull against the identical endpoints.

Knowledge Check

Why does a registry store every layer and config blob by its SHA-256 digest?

So identical content is stored once and deduplicated across images, and a pulled blob is verifiably the exact bytes that produced the hash
So the blobs are encrypted at rest on the registry and unreadable to anyone without the digest as the decryption key
So each layer is automatically compressed to the smallest possible size before it is written to the blob store
So human-readable tags can be indexed and looked up far faster than linearly scanning across the entire blob store by name on every single pull request

After a one-line code change, why is the second docker push of an image so small?

The daemon uploads only the layers the registry is missing, so an unchanged base is not re-sent and only the small app layer transfers
The registry computes a byte-level diff inside the one changed layer and stores only the difference against the previous push
The image is compressed far more aggressively on the second push than it was on the first, so the same layers re-upload as a much smaller payload
The daemon skips uploading the manifest entirely because the tag already exists in the manifest store

How does a registry let production hosts pull driftwood/web while only CI can push to it?

Auth is scoped per repository, so hosts hold a read-only pull token and the CI account holds a separate push token
The registry allows pushes only from the CI server's allowlisted IP address and permits pulls from anywhere else
The pushed image is flagged read-only at the repository level so that no production host can modify it after the push
CI logs in with a permanent shared password for the registry that production hosts are deliberately never given

What does running a pull-through cache in front of Docker Hub buy a CI fleet?

Shared base layers are fetched from upstream once and served locally afterward, cutting egress and dodging upstream rate limits
It re-tags every cached image under your own private namespace so the names that runners pull are shorter and consistent
It signs each cached image as it is fetched so that every later pull is automatically verified as trusted by the daemon
It compresses base images more tightly on the way through so that even the very first download from upstream comes across faster than a direct pull would
