Chapter 2: Images
Topic 12

Base Images


Every image starts FROM a base, and that one line sets your security surface, your image size, and your debugging experience for the life of the image. The spectrum runs from full distro bases like ubuntu and debian, through slimmed variants, to minimal alpine, to distroless images that carry the runtime and your app but no shell or package manager, all the way to scratch, which is literally empty. Smaller bases mean less to attack and less to patch — and also less to debug with.

The trade is real in both directions, which is why "always use the smallest base" is wrong advice. A smaller base shrinks the attack surface and the bytes you ship, and it strips away the shell and tools you reach for when something breaks at 3am. The right choice depends on the app, the team's tolerance for ephemeral-debug-container workflows, and how much the size actually matters.

The base-image spectrum, from easiest to debug to smallest
  • full debian: complete userland, ~1 GB for a language image such as python:3.12. Easiest to debug, biggest size and attack surface.
  • slim: docs and locales stripped, ~120 MB for python:3.12-slim. Keeps glibc and a package manager, which makes it the sane default.
  • distroless: runtime plus your app, no shell or package manager. Small and hardened, harder to debug.
  • scratch: empty, holding only a static binary plus its certs. A few MB, near-zero surface, no shell at all.

Official and Verified Images

Docker Hub's official images (python, postgres, nginx) are curated by Docker together with the upstream projects, scanned regularly, and documented. Preferring them over an arbitrary user's image is the first supply-chain decision you make, and the cheapest. A random someuser/python-fast may be quick to find, but you inherit its unpatched vulnerabilities and unknown provenance the moment you build FROM it.
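
As a small illustration, official images live in the library namespace that Docker Hub reserves for them, which is why they need no user prefix. The commented-out last line reuses the hypothetical someuser/python-fast name from the paragraph above; it is not a real recommendation or a known image.

```dockerfile
# Official image: the short name resolves to docker.io/library/python,
# the namespace Docker Hub reserves for official images.
FROM python:3.12-slim

# Equivalent fully qualified form:
# FROM docker.io/library/python:3.12-slim

# Unofficial image (hypothetical name from the text): a user namespace,
# with whatever patch cadence and provenance that user happens to provide.
# FROM someuser/python-fast:latest
```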

Full vs Slim

A debian or ubuntu base carries a complete userland — every coreutil, every locale, the docs, the package manager — which makes it easy to debug and large to ship. The -slim variants strip the docs, locales, and extras, cutting the base size sharply with little downside for most applications. A python:3.12 base runs close to 1 GB; python:3.12-slim is around 120 MB for the same Python.

For the vast majority of services, -slim is the sane default. You keep glibc and a real package manager — so installing a dependency or dropping into a shell still works — while paying a fraction of the size. It's the choice that asks the fewest questions.
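
A minimal sketch of what a -slim default can look like for a Python service. The package libpq5, the requirements.txt file, and the app.py entry point are assumptions chosen for illustration; substitute whatever your service actually needs.

```dockerfile
FROM python:3.12-slim

# -slim keeps apt, so a system dependency is still one RUN away.
# libpq5 is only an example package; replace it with what your app needs.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libpq5 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy the dependency list first so this layer is cached across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Hypothetical entry point.
CMD ["python", "app.py"]
```

Removing /var/lib/apt/lists in the same RUN keeps the package index out of the layer, so the install does not undo the size the slim base saved.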

Alpine and the musl Caveat

alpine is about 5 MB, built on musl libc and BusyBox instead of glibc and GNU coreutils. The size is genuinely tiny, but musl is not glibc, and that difference is not free. Precompiled binaries built against glibc can fail to start or segfault, and Python's glibc-targeted (manylinux) wheels do not install on musl, so pip needs a musllinux wheel or falls back to compiling from source. DNS resolution and locale handling also differ in subtle ways that surface as intermittent bugs rather than clean errors.

The failure mode that costs the most is the one that looks like a size win. You switch a Python service to alpine, the image drops to a tenth of its size, and three weeks later a compiled wheel that shipped fine on Debian segfaults under musl — and the hours spent diagnosing it dwarf the megabytes saved. Validate your actual dependencies on alpine before adopting it; the size is only a win if the app still runs.
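
One way to run that validation is a throwaway image built in CI, sketched below. It assumes a requirements.txt at the project root and stdlib unittest tests under ./tests; both are placeholders for your own layout and test runner.

```dockerfile
# Throwaway validation image: build it in CI, do not ship it.
FROM python:3.12-alpine

WORKDIR /app
COPY requirements.txt .

# --only-binary=:all: forbids source builds, so this step fails loudly
# if any dependency has no wheel that installs on musl.
RUN pip install --no-cache-dir --only-binary=:all: -r requirements.txt

COPY . .

# Exercise the code under musl; assumes stdlib unittest tests in ./tests.
RUN python -m unittest discover -s tests
```

The flag is deliberately strict: it also rejects pure-Python packages that publish only a source distribution, so a failure here is a prompt to investigate, not proof that alpine is unusable.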

Distroless

Google's distroless images contain the language runtime and your application and nothing else — no shell, no package manager, no coreutils. The attack surface and the size both shrink dramatically: there is no bash for an attacker to land in, and almost nothing to patch. The cost is the flip side of the same fact — there is no bash for you to exec into either, so debugging means ephemeral debug containers or solid logging rather than a shell prompt.
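
A hedged sketch of the usual pattern: build in an image that has tooling, then copy the result into a distroless final stage. It uses a Node.js service purely as an example, and assumes a package-lock.json and a server.js entry point, both hypothetical.

```dockerfile
# Build stage: npm and a shell exist only here.
FROM node:20-bookworm-slim AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .

# Final stage: Node.js runtime plus the app, no shell or package manager.
FROM gcr.io/distroless/nodejs20-debian12:nonroot
WORKDIR /app
COPY --from=build /app /app

# The distroless Node.js image already uses node as its entrypoint,
# so CMD is just the script to run (server.js is hypothetical).
CMD ["server.js"]
```

The distroless project also publishes :debug tags that add a BusyBox shell, which can serve as a temporary fallback while a debugging workflow is still being set up.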

scratch and Static Binaries

A FROM scratch image is empty — no userland at all. A statically linked Go or Rust binary plus its TLS root certificates can be the entire image: a few megabytes, near-zero attack surface, nothing to exploit because there is nothing there but your binary. It only works for binaries that need nothing from the OS, which rules out anything dynamically linked against system libraries.
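
The following sketch shows the static-binary case for a Go program. It assumes a Go module at the project root with its main package there; the /out/app path and the numeric user ID are illustrative choices.

```dockerfile
# Build stage: compile a statically linked binary.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .

# CGO_ENABLED=0 avoids linking against the system C library,
# so the binary needs nothing from the OS at run time.
RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/app .

# Final stage: an empty image holding only the certs and the binary.
FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /out/app /app

# scratch has no /etc/passwd, so the user is given numerically.
USER 65534:65534
ENTRYPOINT ["/app"]
```

The exec-form ENTRYPOINT matters here: the shell form would try to run /bin/sh, which does not exist in a scratch image.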

This is the extreme end of the spectrum, and the direction Driftwood's slim build moves toward in the Dockerfile chapter — not necessarily to scratch, but down the list as the app and the team mature enough to give up the in-image shell.

Base-Image Spectrum

  • full (ubuntu/debian): everything you need to debug, the biggest attack surface and size.
  • slim: the same family with docs and locales stripped; the sane default for most apps, keeping glibc and a package manager.
  • alpine: ~5 MB via musl and BusyBox; tiny, but watch for musl-vs-glibc breakage.
  • distroless: runtime plus your app, no shell or package manager; small and hardened, but harder to debug.
  • scratch: empty, viable only for a static binary and its certs.

Move down the list as the app and team mature and you can give up the in-image shell.

Common Mistakes
  • Defaulting to a full ubuntu base for a simple service — you ship and patch hundreds of megabytes of userland the app never uses, enlarging the attack surface for nothing.
  • Switching to alpine blindly and hitting musl-vs-glibc breakage — segfaulting binaries, failing Python wheels, or DNS and locale quirks that cost far more time than the size they saved.
  • Choosing distroless or scratch without a debugging plan — there's no shell to exec into, so you need ephemeral debug containers or good logging in place before you commit to it.
  • Pulling an unofficial base image to save effort and inheriting its unpatched vulnerabilities and unknown provenance under your own image's name.

Best Practices
  • Start from an official, pinned -slim base as the default, and only move to alpine, distroless, or scratch when size or surface demands it and you've tested the app on the new base.
  • Validate alpine against your actual dependencies — compiled wheels, glibc-only binaries — before adopting it, rather than assuming the size win is free.
  • Pair distroless or scratch with a debugging strategy — ephemeral debug containers and solid logging — since there's no in-image shell to fall back on.
  • Pin the base by digest and rebuild on base updates so you inherit security patches deliberately, on your schedule, rather than by accident.
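
Pinning by digest can be sketched as follows. The digest is left as a placeholder on purpose, since the real value has to come from the registry at the time you pin; the build will not run until it is filled in.

```dockerfile
# Look up the current digest first, for example with:
#   docker buildx imagetools inspect python:3.12-slim
# then paste it in place of the placeholder below.
#
# When a tag and a digest are both given, the digest decides what is pulled;
# the tag stays only as a human-readable hint.
FROM python:3.12-slim@sha256:<digest>
```

Updating the base then becomes an explicit change to this line, which is what lets a rebuild pick up security patches on your schedule.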

Comparable Tools
  • distroless · Chainguard/Wolfi · alpine: the minimal-base ecosystem
  • Buildpacks · ko: produce minimal images without a hand-written base choice
  • Podman · BuildKit: apply the same FROM semantics identically

Knowledge Check

What does moving down the base-image spectrum from full to scratch trade away?

  • Debuggability and in-image tooling, in exchange for smaller size and a smaller attack surface
  • Raw application execution speed at run time, which drops off sharply with each smaller base you choose
  • Kernel compatibility with the host, since the smaller bases ship progressively older kernels
  • Reproducibility across rebuilds, because the more minimal bases cannot be pinned by digest

Why is switching to alpine not a free size win?

  • It uses musl libc instead of glibc, so glibc-built binaries and wheels can break and DNS/locale behavior can differ
  • Its advertised 5 MB size is misleading, because it silently expands to several gigabytes the moment the app is added
  • It has no official maintainer behind it, so its provenance and supply chain are always completely unknown
  • It requires a custom, specially patched kernel that most ordinary container hosts simply don't provide

What does a distroless base remove, and what does that cost you?

  • It removes the shell, package manager, and coreutils — shrinking size and surface, but leaving no shell to exec into for debugging
  • It removes the entire language runtime along with the surrounding userland, so you must reinstall the matching runtime at container start every single time
  • It removes your own application code from the image, which then has to be mounted back in at run time as a volume
  • It strips out the bundled kernel, so the resulting image can only ever run inside a dedicated virtual machine

When is a FROM scratch base actually viable?

  • For a statically linked binary that needs nothing from the OS, bundled with its TLS certs into an otherwise empty image
  • For practically any application, since all the missing system libraries are simply loaded from the host kernel at run time
  • For interpreted languages like Python or Ruby, which conveniently carry their own complete runtime inside the script itself
  • For apps that install all of their own runtime dependencies with apt the very first time they start
