Chapter 12: The Ecosystem
Topic 73

containerd and runc Under the Hood

Runtime · Daemon

Chapter 1 traced the chain docker → dockerd → containerd → runc. This topic removes dockerd from the picture and looks at what is left: containerd, the daemon that manages images and container lifecycle, and runc, the reference OCI runtime that makes the actual namespace and cgroup syscalls.

This is the layer Kubernetes talks to directly through the CRI, and the layer where you swap in a stronger sandbox without touching your images. Knowing what lives below Docker is what lets you debug a Kubernetes node — where there is no dockerd at all — and reason about isolation as a runtime choice rather than an image rebuild.

Figure: Below Docker: containerd, the shim, and runc (containerd → containerd-shim → runc → container process)

containerd — Image and Lifecycle Manager

containerd pulls and stores OCI images, manages the overlay snapshots from Chapter 6, and supervises the container lifecycle: create, start, stop, delete. It is the durable daemon: it survives a dockerd restart, and it is the component a Kubernetes node actually runs as its CRI implementation. When you remove the Docker daemon, containerd is still there doing the real work; dockerd was a high-level API and convenience layer on top of it.

runc — The Syscalls, Then Exit

containerd calls runc once per container. runc reads the runtime-spec bundle from the previous topic, makes the clone and unshare namespace calls, writes the cgroup limits, execs the process — and then exits. It does not stay resident. The container keeps running, parented not by runc but by a lightweight shim that containerd left in place.
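The "set up, then exit" pattern can be seen without containerd at all. The following is a minimal sketch, not real runc: a throwaway sh launcher stands in for runc and a sleep stands in for the container process. It assumes a Linux host with /proc mounted. On a real node the shim adopts the container process because it registers itself as a child subreaper; in this sketch the adopter is whatever the kernel picks, typically PID 1.

```shell
#!/bin/sh
# Stand-in for runc (NOT real runc): start a workload in the background,
# print its PID, and exit immediately.
workload_pid=$(sh -c 'sleep 30 >/dev/null 2>&1 & echo $!')

# The launcher has already exited; check whether the workload survived it.
if kill -0 "$workload_pid" 2>/dev/null; then
    survived=yes
else
    survived=no
fi

# Linux-specific: read the workload's current parent from /proc.
new_parent=$(awk '/^PPid:/ {print $2}' "/proc/$workload_pid/status")
echo "workload $workload_pid survived=$survived, now parented by PID $new_parent"

kill "$workload_pid"   # clean up the stand-in workload
```

The launcher is gone by the time the check runs, yet the workload is still alive under a new parent. That is the same hand-off that lets runc exit while the container keeps running.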

Driving containerd directly with ctr — pull, then run, no dockerd involved
$ ctr image pull registry.driftwood.example/driftwood/web:1.4.0
$ ctr run --rm \
    registry.driftwood.example/driftwood/web:1.4.0 \
    driftwood-web

# ctr is the low-level debug client; the human-facing path is nerdctl:
$ nerdctl run -d -p 8080:8080 \
    --name driftwood-web \
    registry.driftwood.example/driftwood/web:1.4.0
$ nerdctl ps

ctr exposes containerd's raw task API with no namespace-management niceties, which is why the same image takes more ceremony to run than under Docker. nerdctl wraps the same containerd calls in a Docker-compatible CLI, so the muscle memory transfers and the stack feels like Docker without dockerd.

The Shim, and Why It Matters

The containerd-shim holds the container's stdio open and collects its exit status, so no long-lived runc has to stay resident. That indirection is the reason you can restart or upgrade containerd itself without killing the running containers: they are parented by their shims, not by the daemon. On a node packed with dozens of workloads, that property is the difference between a routine containerd upgrade and an outage.
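One way to check the parenting claim is to ask the kernel who a process's parent is. This is a minimal sketch assuming a Linux host with /proc mounted; parent_of is a hypothetical helper written for this example, not a containerd tool, and the demo uses a background sleep rather than a real container.

```shell
#!/bin/sh
# Linux-only sketch: print "<parent PID> <parent command name>" for a PID,
# read from /proc. parent_of is a hypothetical helper, not a containerd tool.
parent_of() {
    ppid=$(awk '/^PPid:/ {print $2}' "/proc/$1/status") || return 1
    printf '%s %s\n' "$ppid" "$(cat "/proc/$ppid/comm")"
}

# Demo: a background sleep is parented by this shell.
sleep 30 &
child=$!
result=$(parent_of "$child")
echo "PID $child is parented by: $result"
kill "$child"   # clean up the demo process
```

On a node, you would pass the container's host PID instead, for example the value printed by nerdctl inspect --format '{{.State.Pid}}' driftwood-web (assuming nerdctl's Docker-compatible inspect output). The parent command name should begin with containerd-shim rather than containerd, which is why restarting the daemon leaves the workload alone.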

ctr and nerdctl — Driving It Directly

ctr is containerd's low-level debug client — sharp edges, no conveniences, for when you are debugging containerd itself. nerdctl is the near-drop-in Docker-CLI-compatible front end: nerdctl run, nerdctl ps, and nerdctl build with BuildKit all work against a containerd stack with no dockerd at all. This is the "Docker experience without Docker" path, and it is what you reach for when the daemon is the thing you are trying to remove.

Alternative Runtimes — Stronger Boundaries

Because runc honors the OCI runtime-spec, it can be swapped for other runtimes that honor the same contract. crun is a faster implementation written in C; gVisor (runsc) is a user-space kernel that intercepts syscalls to build a sandbox; Kata Containers boots a real lightweight VM per container. You change the runtime, not the driftwood/web image, to trade performance for isolation, which is the practical payoff of the runtime-spec being a written standard.

None of these is free. gVisor adds syscall-interception overhead, and Kata pays the cost of a micro-VM boot and its memory footprint. They buy a harder boundary for untrusted or multi-tenant workloads, not a default you reach for on trusted code. The decision is per workload and per security need, and it never requires rebuilding the image — that is the whole point of the contract.
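As a rough illustration of "change the runtime, not the image", the sketch below prints the nerdctl command for three runtimes without executing anything. It relies on nerdctl's --runtime flag. The names io.containerd.runsc.v1 and io.containerd.kata.v2 are the conventional shim names for gVisor and Kata and assume those shims are installed on the host. run_under and the container names are hypothetical, made up for this example.

```shell
#!/bin/sh
IMAGE=registry.driftwood.example/driftwood/web:1.4.0

# Print (dry run) the nerdctl command for one runtime.
# Remove the leading "echo" to actually execute it on a host with nerdctl.
run_under() {
    echo nerdctl run -d --runtime "$1" --name "driftwood-web-$2" "$IMAGE"
}

plan=$(
    run_under crun                     crun
    run_under io.containerd.runsc.v1   gvisor
    run_under io.containerd.kata.v2    kata
)
echo "$plan"
```

Every printed line ends with the same image reference; only the --runtime value differs, so the isolation choice stays a deployment-time decision.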

Common Mistakes
  • Thinking removing Docker removes your ability to run containers — containerd plus nerdctl runs the same OCI images with the same CLI ergonomics; the daemon you removed was a convenience layer, not the engine.
  • Confusing containerd (manages images and lifecycle, long-lived) with runc (sets up one container then exits) — getting which does what wrong makes node-level debugging incoherent.
  • Reaching for gVisor or Kata expecting zero cost — gVisor adds syscall-interception overhead and Kata boots a micro-VM; both buy isolation at a measurable latency and density price, so they are for untrusted or multi-tenant workloads, not the default.
  • Using ctr as a daily driver — it is a raw debug tool with sharp edges and no namespace-management niceties; nerdctl is the human-facing client and ctr is for when you are debugging containerd itself.
Best Practices
  • Recognize containerd as the real runtime on Kubernetes nodes, so you debug a misbehaving node with crictl and containerd tools rather than assuming Docker is present.
  • Swap runc for crun when container start-up and teardown overhead is the bottleneck, and for gVisor or Kata when the workload is untrusted, selecting the runtime per security need, not per habit.
  • Use nerdctl, not raw ctr, when operating a containerd stack without dockerd, since it preserves the Docker-CLI muscle memory and BuildKit builds.
  • Treat the image as runtime-independent — the same driftwood/web runs under runc, crun, gVisor, or Kata — so isolation decisions never require a rebuild.
Comparable Tools
  • containerd · CRI-O: the two main CRI runtimes Kubernetes nodes run
  • runc · crun · gVisor · Kata: the reference OCI runtime and stronger-boundary alternatives
  • nerdctl · ctr · crictl: the Docker-compatible front end and the low-level clients

Knowledge Check

What is the division of labor between containerd and runc?

  • containerd is the long-lived daemon managing images and lifecycle; runc sets up one container with the kernel syscalls, then exits
  • runc is the long-lived daemon managing images and snapshots; containerd is invoked once per container to wire up namespaces, then exits
  • containerd and runc are just two interchangeable names for the same long-running dockerd process that ships inside the Docker package
  • containerd makes the namespace and cgroup syscalls itself at create time while runc pulls, unpacks, and stores the images

Why does the containerd-shim exist?

  • It keeps the container's stdio and exit status so containerd can restart or upgrade without killing running containers
  • It keeps runc resident and parked in memory for the container's whole lifetime so it can supervise the process and reap it
  • It translates each Docker CLI command into the matching kernel namespace, cgroup, and mount syscalls at runtime
  • It sets up the container's network bridge, veth pair, host iptables rules, and DNS resolver config before the process starts

You have removed dockerd from a host but still need to run and build containers with familiar commands. What do you reach for?

  • nerdctl over containerd — a near-drop-in Docker CLI with BuildKit builds and no daemon to remove again
  • ctr as your daily driver for runs and builds, since it is containerd's intended, fully supported human-facing user interface
  • Reinstall dockerd, because containerd on its own cannot pull, unpack, or run OCI images without it
  • Rebuild every image into a proprietary containerd-native on-disk format before it will run

When is it worth swapping runc for gVisor or Kata Containers?

  • For untrusted or multi-tenant workloads where a stronger boundary justifies the syscall or micro-VM overhead
  • For every workload by default, since they are strictly faster, lighter, and denser than runc and add no syscall or boot overhead at all
  • Only after rebuilding the image into the runtime's own proprietary on-disk format and re-pushing it
  • When you need a different registry backend to store, serve, and replicate the images across hosts
