Chapter 9: Registries & Distribution
Topic 54

Docker Hub, Private, and Self-Hosted Registries

Every image lives in some registry, and which one you pick sets your rate limits, your access controls, and your blast radius. There are three options: the public default, Docker Hub; a managed cloud registry tied to where you deploy (ECR on AWS, Artifact Registry, the successor to GCR, on Google Cloud, GHCR on GitHub, ACR on Azure); or a registry you run yourself, either Docker's plain registry:2 image or Harbor for a full product.

Three places an image can live
Docker Hub
The public default and home of official base images — fine for pulling, but anonymous pulls are rate-limited per IP.
Cloud registry
ECR · GHCR · Artifact Registry · ACR: co-located with your compute and IAM-integrated, with no rate limit on your own images.
Self-hosted
Run it yourself — plain registry:2 for a minimal store, or Harbor for RBAC, scanning, and signing.

Driftwood publishes production images to a private registry, registry.driftwood.example, and uses the public Docker Hub name driftwood/web only for illustration. The reason for the split is the whole point of this topic: a public registry is where you pull base images from, not necessarily where your own images should live.

Docker Hub and the Rate-Limit Footgun

Docker Hub is the registry an unqualified image name resolves to: pull python:3.12-slim and you are pulling docker.io/library/python:3.12-slim from Hub. It is the home of the official base images and is fine for that. The trap is its pull limit: 100 pulls per 6 hours per IP address for anonymous clients, 200 per 6 hours for an authenticated free account. A CI fleet, or a NAT'd cluster where every node shares one egress IP, exhausts the anonymous budget fast, and builds start failing with 429 Too Many Requests for reasons that have nothing to do with the code.
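One way to see how much of that budget an egress IP has left is to read the rate-limit headers Hub returns. The sketch below assumes curl and jq are installed and uses the ratelimitpreview/test repository that Docker documents for this purpose; the function names are illustrative. A HEAD request is used because, per Docker's documentation, it does not count as a pull.

```shell
# Sketch: inspect Docker Hub's pull budget for the current egress IP.
# Assumes curl and jq are available; function names are illustrative.

# Fetch an anonymous token, then print the response headers of a HEAD
# request against Docker's rate-limit test repository.
hub_ratelimit_headers() {
  token=$(curl -fsS "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
  curl -fsS --head -H "Authorization: Bearer $token" \
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"
}

# Read HTTP headers on stdin and print the remaining pull count,
# i.e. the number before ";w=" in the ratelimit-remaining header.
hub_pulls_remaining() {
  tr -d '\r' |
    awk -F': *' 'tolower($1) == "ratelimit-remaining" { split($2, a, ";"); print a[1] }'
}

# Usage (needs network access):
#   hub_ratelimit_headers | hub_pulls_remaining
```

Run from a CI runner, this reports the budget for whatever IP Hub sees, which behind a NAT is the shared egress address rather than the runner's own.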

Cloud Registries, Co-located With Compute

ECR, Artifact Registry, GHCR, and ACR sit next to the compute you deploy on. Pulls are in-network and fast, authentication rides the cloud's IAM rather than a separately managed set of registry passwords, and there is no anonymous rate limit on your own images. If you deploy on a single cloud, its registry is the path of least resistance for your own images, because you inherit the IAM and networking you already run.
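As an illustration of IAM-backed authentication, here is a sketch for ECR, where the caller's IAM identity is exchanged for a short-lived token that the Docker CLI then uses. It assumes the AWS CLI is installed and already has credentials; the account ID and region in the usage line are placeholders, and the function names are illustrative.

```shell
# Sketch: log the Docker CLI in to ECR with a token derived from IAM.
# Assumes the AWS CLI is installed and has credentials for the account.

# Build the registry hostname ECR uses for an account and region.
ecr_host() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Exchange the caller's IAM identity for a short-lived registry password
# and hand it to docker login on stdin, so it never lands in shell history.
ecr_login() {
  host=$(ecr_host "$1" "$2")
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$host"
}

# Usage (placeholder account ID and region):
#   ecr_login 123456789012 us-east-1
```

The other cloud registries have their own equivalents; the common thread is that access is granted through the cloud's identity system rather than a password file the registry keeps.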

The registry Image

Docker's official registry:2 is a single binary that implements the distribution API, and it gives you a private registry in one container. It is deliberately plain: no UI, no user management beyond a static htpasswd file or an external token service, no scanning, no replication. That makes it a fine minimal cache or air-gapped store, and a poor choice for anything that needs governance.

A minimal private registry in one container
$ docker run -d -p 5000:5000 --name registry \
    -v /srv/registry:/var/lib/registry \
    registry:2
# now reachable as localhost:5000/driftwood/web — no auth, no TLS by default

That command is the whole product. It stores blobs and manifests on the mounted volume and speaks the same API as Hub, but it ships with nothing to protect or organize what you push — which is exactly the gap the next option fills.
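To show what "speaks the same API as Hub" means in practice, here is a small sketch of pushing an image into the registry started above. The tag driftwood/web:1.0 and the function name are illustrative, and the DOCKER variable is a dry-run convenience of this sketch, not a Docker feature.

```shell
# Sketch: retag a local image for a registry host and push it there.
# DOCKER defaults to the docker CLI; set DOCKER=echo to print the
# commands instead of running them.
push_to_registry() {
  registry=$1   # e.g. localhost:5000
  image=$2      # e.g. driftwood/web:1.0
  target="$registry/$image"
  ${DOCKER:-docker} tag "$image" "$target" &&
    ${DOCKER:-docker} push "$target"
}

# Usage, against the container started above:
#   push_to_registry localhost:5000 driftwood/web:1.0
#   curl -s http://localhost:5000/v2/_catalog   # repository names as JSON
```

The registry host is simply part of the image name, which is why moving an image between registries is a retag and a push rather than a format conversion.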

Harbor — A Registry as a Product

Harbor wraps the same distribution API with the things a team actually needs around it: projects, role-based access control, built-in vulnerability scanning, image signing, replication between registries, and a web UI. It is what registry.driftwood.example runs when the goal is governance, not just storage. The cost is that Harbor is a real service — a database, a scanner CVE feed, and a storage backend that all need patching and backups.

When to Run Your Own

Self-hosting earns its operational cost in three cases: when images must stay inside your network, when you want one registry spanning multiple clouds, or when you need pull-through caching and replication you control. Outside those, a managed cloud registry is simply less to operate: you are not on the hook for its uptime, which matters because a registry outage blocks every deploy that pulls from it. Driftwood runs its own because production images must not leave the network, and accepts the operational tax that comes with that.
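Pull-through caching is the easiest of those cases to sketch. The plain registry image can act as a caching mirror of Docker Hub when REGISTRY_PROXY_REMOTEURL is set, and each Docker daemon is pointed at it through the registry-mirrors key in daemon.json. The container name, volume path, and the host mirror.driftwood.example are hypothetical, and the DOCKER variable is only a dry-run convenience of this sketch.

```shell
# Sketch: run registry:2 as a pull-through cache for Docker Hub.
# DOCKER defaults to the docker CLI; set DOCKER=echo for a dry run.
hub_mirror_up() {
  ${DOCKER:-docker} run -d -p 5000:5000 --name hub-mirror \
    -v /srv/hub-mirror:/var/lib/registry \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2
}

# Print a minimal daemon.json that sends Hub pulls through a mirror URL.
# Merging this into an existing daemon.json is left to the reader.
daemon_mirror_json() {
  printf '{\n  "registry-mirrors": ["%s"]\n}\n' "$1"
}

# Usage (hypothetical mirror host, assumed to serve TLS in front of port 5000):
#   hub_mirror_up
#   daemon_mirror_json https://mirror.driftwood.example
```

With a mirror in place, only the cache's own misses reach Docker Hub, so a fleet behind one egress IP spends its pull budget once per image rather than once per node.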

Where to Put Your Images

Docker Hub — the public default and home of official base images. Fine for pulling; watch the anonymous pull limits (100 per 6 hours per IP) for any CI or NAT'd cluster, and authenticate or mirror rather than pulling base images bare.

Cloud registry (ECR · Artifact Registry · GHCR · ACR) — co-located with your compute, IAM-integrated, no rate-limit surprises on your own images. The default for your own images when you deploy on that cloud and have no in-network requirement.

Self-hosted (registry:2 · Harbor) — choose when images must stay in-network, span clouds, or need governance you control. Harbor when you want scanning, RBAC, signing, and replication; plain registry:2 for a minimal cache or air-gapped store.

Common Mistakes
  • Letting a CI fleet pull base images anonymously from Docker Hub through one egress IP — the shared 100-per-6-hours limit drains fast and builds fail intermittently with 429, a failure that looks like flakiness but is the rate limiter; authenticate or mirror instead.
  • Running the plain registry:2 image with its default config and no TLS or auth, then exposing it on the network. Anyone who can reach it can push and pull, and the daemon will only talk to a plaintext registry on a non-localhost address if you mark it insecure, which is itself a hole.
  • Picking a cloud registry in a different region or cloud than where you deploy — every pull then crosses the internet and incurs egress, erasing the co-location win that was the reason to use a cloud registry at all.
  • Treating self-hosted Harbor as fire-and-forget — its database, scanner CVE feed, and storage backend need backups and patching like any service, and a registry outage blocks every deploy that pulls from it.
Best Practices
  • Authenticate Docker Hub pulls, or front them with a pull-through cache, in any CI or multi-host setup so anonymous rate limits never gate a build.
  • Push your own images to a registry co-located with your compute — a cloud registry on your deploy cloud, or registry.driftwood.example inside your network — so pulls are in-network and IAM-scoped.
  • Reach for Harbor when you need RBAC, scanning, signing, and cross-registry replication in one place; reach for plain registry:2 only as a minimal cache or air-gapped store.
  • Serve every private registry over TLS with real auth, and back up its storage and metadata, since a registry outage stops every deploy that depends on it.
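As a sketch of the last practice applied to the plain registry:2 image, the function below starts it with TLS and htpasswd authentication using the registry's documented environment variables. The directory arguments and the file names domain.crt, domain.key, and htpasswd are assumptions of this sketch, and the DOCKER variable is only a dry-run convenience.

```shell
# Sketch: start registry:2 with TLS and basic auth enabled.
# DOCKER defaults to the docker CLI; set DOCKER=echo for a dry run.
#
# The htpasswd file must use bcrypt; one way to create it:
#   docker run --rm --entrypoint htpasswd httpd:2 -Bbn USER PASSWORD > auth/htpasswd
secure_registry_up() {
  auth_dir=$1    # directory containing the htpasswd file
  certs_dir=$2   # directory containing domain.crt and domain.key
  ${DOCKER:-docker} run -d -p 5000:5000 --name registry \
    -v /srv/registry:/var/lib/registry \
    -v "$auth_dir":/auth:ro \
    -v "$certs_dir":/certs:ro \
    -e REGISTRY_AUTH=htpasswd \
    -e REGISTRY_AUTH_HTPASSWD_REALM=Registry \
    -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
    -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
    -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
    registry:2
}

# Usage (hypothetical paths):
#   secure_registry_up /srv/registry-auth /srv/registry-certs
```

This closes the open-push hole but adds nothing else; roles, scanning, and replication still call for Harbor or a managed registry.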
Comparable tools
  • Docker Hub: the public default home of official base images
  • GHCR · ECR · GCR/Artifact Registry · ACR: the cloud-managed options
  • Harbor · registry:2: the self-hosted product and its minimal alternative
  • skopeo: moves images between any of them
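For the skopeo entry, a minimal sketch of copying a base image from Docker Hub into a private registry without a local Docker daemon. The destination path base/python is hypothetical, and the SKOPEO variable is a dry-run convenience of this sketch; pushing to a private registry would normally also need credentials, for example via skopeo login.

```shell
# Sketch: copy an image between registries with skopeo.
# SKOPEO defaults to the skopeo CLI; set SKOPEO=echo for a dry run.
# --all copies every platform in a multi-architecture image.
mirror_image() {
  src=$1   # e.g. docker.io/library/python:3.12-slim
  dst=$2   # e.g. registry.driftwood.example/base/python:3.12-slim
  ${SKOPEO:-skopeo} copy --all "docker://$src" "docker://$dst"
}

# Usage:
#   mirror_image docker.io/library/python:3.12-slim \
#     registry.driftwood.example/base/python:3.12-slim
```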

Knowledge Check

A CI fleet behind one NAT egress IP starts failing builds with 429 Too Many Requests pulling base images. What is happening?

  • The runners share Docker Hub's anonymous pull limit through one egress IP and have exhausted that shared budget
  • The base images were deleted from Docker Hub and the registry is returning a not-found error
  • The local daemon's disk has filled up completely and it is now refusing to cache or write any more of the pulled layers
  • The runners are speaking an outdated distribution protocol the registry no longer accepts

When does a managed cloud registry beat self-hosting your own?

  • When you deploy on a single cloud with no in-network requirement — you get co-location and IAM for free without operating a service
  • When images must never leave your private network under any circumstance, so a cloud-managed registry is the safe default for them
  • When you need one registry that spans AWS, GCP, and Azure at once and a managed cloud registry can serve all three transparently
  • When you need full control over pull-through caching and cross-registry replication, which a managed cloud registry hands you out of the box

What does Harbor add over the plain registry:2 image?

  • Projects, role-based access, vulnerability scanning, signing, replication, and a UI around the same distribution API
  • A faster proprietary push-and-pull protocol that transfers image layers more quickly than the standard distribution API
  • The ability to store images without a registry at all by embedding them directly into each deploy host's local filesystem
  • Zero operational overhead, since Harbor is a fully managed hosted offering that needs no patching or backups from you

Why does co-locating a registry with your compute matter for cost and latency?

  • Pulls stay in-network so they are fast and avoid cross-internet egress charges, which a different-region registry would incur on every pull
  • A co-located registry automatically compresses images smaller than a remote one would, so each pull moves fewer bytes overall
  • A nearby registry is exempt from rate limits entirely because the round-trip latency is too low for the limiter to measure
  • Images pulled from a co-located registry are automatically signed and cryptographically verified as a built-in part of the in-network transfer path