Topic 18

Resource Limits — CPU, Memory, OOM

Cgroups · Limits

By default a container can use all of the host's CPU and memory — there is no limit until you set one, and one runaway container can take down every other container on the machine. Resource limits are cgroup settings exposed on docker run: --memory caps RAM, --cpus caps CPU. The asymmetry that bites people is the consequence of exceeding each one. Going over the memory limit gets a process killed by the kernel's OOM killer; going over the CPU limit just makes it slower.

That asymmetry is the whole topic. Memory is a wall you must not hit; CPU is a speed limit you can safely sit against all day. Treating them the same — sizing memory loosely the way you would CPU — is how a container "keeps dying for no reason" that turns out to be the kernel doing exactly what you told it to.

Memory Is a Hard Cap

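--memory=512m sets a cgroup memory limit. When the container's processes try to exceed it and the kernel cannot reclaim enough memory (page cache, for example), the OOM killer sends SIGKILL to a process inside the container. It picks the one with the highest OOM score, which is usually the largest and often PID 1. If the killed process is the container's main process, the container exits with code 137 (128 + 9, the signal number of SIGKILL). There is no graceful warning and no chance to recover, because SIGKILL cannot be caught; the kernel just kills. That is why memory limits are unforgiving and why you size them with headroom rather than to the exact working set.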
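As a rough sketch of what a flag like --memory=512m turns into, the Python below converts a Docker-style size string into the byte count the kernel enforces. It assumes Docker's binary units (k = 1024, m = 1024², g = 1024³), which is why a 256m limit appears as 256MiB in docker stats. It also assumes the value lands in the cgroup v2 file memory.max. `docker_size_to_bytes` and `would_oom` are hypothetical helpers, not part of any Docker API.

```python
import re

# Assumed binary multipliers: Docker treats "m" as MiB and "g" as GiB.
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def docker_size_to_bytes(size: str) -> int:
    """Convert a size such as '512m' or '1g' into bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def would_oom(unreclaimable_bytes: int, limit: str) -> bool:
    """True if memory the kernel cannot reclaim (heap, stacks) crosses the cap.

    Simplification: page cache is ignored because the kernel reclaims it
    before invoking the OOM killer.
    """
    return unreclaimable_bytes > docker_size_to_bytes(limit)


print(docker_size_to_bytes("512m"))      # 536870912, the value memory.max would hold
print(would_oom(600 * 1024**2, "512m"))  # True: 600 MiB cannot fit under a 512 MiB wall
```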

CPU Is a Soft Throttle

--cpus=1.5 limits the container to 1.5 cores' worth of CPU time through CFS quotas. By default this means 150 ms of CPU per 100 ms scheduling period, shared across all of the container's threads. Once the container has used its quota for the current period, it is throttled (its threads wait for the next period) rather than killed. A CPU-bound container pinned at its limit runs slower but stays alive indefinitely, which is the exact opposite consequence from memory. Chasing a "crash" that is really CPU throttling is a common time sink, because there is no crash to find.
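The quota arithmetic can be sketched in a few lines of Python. The sketch assumes the default CFS period of 100 ms (100000 microseconds) and the cgroup v2 cpu.max format of "<quota> <period>". `cpus_to_cpu_max` and `throttled_wall_time` are hypothetical helpers, and the wall-time model assumes perfectly parallel CPU-bound work with no other contention on the host.

```python
PERIOD_US = 100_000  # assumed CFS period: 100 ms


def cpus_to_cpu_max(cpus: float, period_us: int = PERIOD_US) -> str:
    """Translate a --cpus value into a cgroup v2 cpu.max line."""
    quota_us = round(cpus * period_us)
    return f"{quota_us} {period_us}"


def throttled_wall_time(cpu_seconds: float, threads: int, cpus: float) -> float:
    """Wall-clock seconds for CPU-bound work under a --cpus cap.

    The work never fails; it is spread over more periods. Usable
    parallelism is the smaller of the thread count and the cap.
    """
    return cpu_seconds / min(threads, cpus)


print(cpus_to_cpu_max(1.5))              # 150000 100000: 150 ms of CPU per 100 ms
print(throttled_wall_time(6.0, 4, 4.0))  # 1.5 seconds with four unthrottled cores
print(throttled_wall_time(6.0, 4, 1.5))  # 4.0 seconds under --cpus=1.5: slower, still alive
```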

Reading an OOM Kill

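An OOM-killed container shows OOMKilled: true in docker inspect and exit code 137; the kernel log (dmesg) records which process it shot and why. Check both signals, because 137 on its own only means the process received SIGKILL (docker kill and a docker stop that runs past its grace period produce it too); OOMKilled: true is what ties the death to the memory cap. This is the concrete signature behind "my container keeps dying for no reason" when the real cause is a memory cap too low for the workload. Read those fields before you go looking for a bug; the kernel leaves a clear fingerprint.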

The fingerprint of a memory cap that's too low

$ docker run -d --name driftwood-web --memory=256m driftwood/web
$ docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.CPUPerc}}" driftwood-web
NAME            MEM USAGE / LIMIT     MEM %     CPU %
driftwood-web   251.4MiB / 256MiB     98.20%    12.00%
# right at the wall; moments later the container has exited:
$ docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' driftwood-web
true 137
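To script that check, the Python sketch below decodes the State object that docker inspect prints as JSON. Exit codes above 128 conventionally mean "killed by signal (code minus 128)", so 137 is signal 9, SIGKILL, and OOMKilled is what separates an OOM kill from any other SIGKILL. The helper names are hypothetical, the sample State dict is made up for illustration, and `inspect_state` assumes the docker CLI is installed.

```python
import json
import signal
import subprocess
import sys


def inspect_state(container: str) -> dict:
    """Return the State object from `docker inspect` (requires the docker CLI)."""
    out = subprocess.run(
        ["docker", "inspect", container],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)[0]["State"]


def diagnose(state: dict) -> str:
    """Explain an exited container's State in terms of the memory cap."""
    code = state.get("ExitCode", 0)
    if state.get("OOMKilled"):
        return f"OOM kill (exit {code}): raise --memory or fix the leak"
    if code > 128:
        try:
            sig = signal.Signals(code - 128).name
        except ValueError:
            sig = f"signal {code - 128}"
        return f"killed by {sig} (exit {code}), not by the OOM killer"
    return f"process exited on its own with code {code}"


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Example: python diagnose.py driftwood-web
        print(diagnose(inspect_state(sys.argv[1])))
    else:
        # Illustrative State fragment shaped like the driftwood-web example.
        print(diagnose({"Status": "exited", "OOMKilled": True, "ExitCode": 137}))
```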

Reservations vs Limits

--memory and --cpus are hard limits, ceilings the container cannot cross. --memory-reservation and --cpu-shares are soft hints that only matter under contention: a reservation is a floor the kernel tries to honor when host memory is tight, and shares are a relative weight (1024 by default) when CPU is contested. On a single host with mixed workloads, limits prevent one container from starving the others, while reservations and shares bias the kernel without capping anything. The two serve different jobs and are not interchangeable.
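The difference between a weight and a cap can be modeled in a few lines. This is an idealized, work-conserving model of how --cpu-shares divides contested CPU, not the kernel's actual CFS code. `allocate_cpu` is a hypothetical helper, and it deliberately ignores --cpus quotas, which would clip each result to its cap.

```python
from __future__ import annotations


def allocate_cpu(demand: dict[str, float], shares: dict[str, int],
                 cores: float) -> dict[str, float]:
    """Split `cores` among containers by share weight (idealized model).

    Containers that need less than their weighted slice get exactly what
    they need; the leftover is re-split among the rest, so idle capacity
    is never wasted. Weights only matter when demand exceeds supply.
    """
    alloc = {name: 0.0 for name in demand}
    active = {name for name, need in demand.items() if need > 0}
    remaining = cores
    while active and remaining > 1e-9:
        total = sum(shares[name] for name in active)
        slices = {name: remaining * shares[name] / total for name in active}
        satisfied = {name for name in active if demand[name] <= slices[name]}
        if not satisfied:
            # Everyone still wants more than its slice: split by weight and stop.
            for name in active:
                alloc[name] = slices[name]
            break
        for name in satisfied:
            alloc[name] = demand[name]
            remaining -= demand[name]
        active -= satisfied
    return alloc


shares = {"web": 1024, "batch": 512}
# Both busy on a 3-core host: batch gets its one-third weight.
print(allocate_cpu({"web": 4.0, "batch": 4.0}, shares, 3.0))  # {'web': 2.0, 'batch': 1.0}
# web idle: batch's low weight caps nothing, so it takes every core.
print(allocate_cpu({"web": 0.0, "batch": 4.0}, shares, 3.0))  # {'web': 0.0, 'batch': 3.0}
```

The second call is the --cpu-shares surprise in miniature: with no competitor, a low weight still gets the whole machine.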

Three resource flags and what happens at the boundary

--memory

Hard cap on RAM. Exceed it and the kernel's OOM killer terminates a process — exit 137, OOMKilled: true. A wall you must not hit.

--cpus

Hard cap on CPU time. Exceed it and the container is throttled — scheduled less often, slower, but never killed. A speed limit you can sit against.

--memory-reservation

Soft limit. A floor the kernel tries to honor only under memory contention; it biases, it does not cap.

Runtime Awareness and the OOM Trap

Modern runtimes read the cgroup limit and size their heaps to it: a recent JVM does this by default, and so does a recent Node. An older runtime that reads the host's total RAM instead will happily allocate past the container limit and get OOM-killed for it. If the runtime thinks it has 64 GB because that is what the host has, but the container is capped at 512 MB, the kill follows as soon as the heap grows past the cap. The limit must match what the runtime believes it has, or you have set up a deterministic OOM.
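A runtime, or your own service, avoids that trap by asking the cgroup instead of the host. The Python sketch below assumes the usual in-container mount points, /sys/fs/cgroup/memory.max on cgroup v2 and /sys/fs/cgroup/memory/memory.limit_in_bytes on cgroup v1. It takes the smaller of the cgroup limit and physical RAM. `effective_memory_limit` and its helpers are hypothetical, not standard-library functions.

```python
import os
from pathlib import Path
from typing import Optional

# Assumed standard paths inside a container; other layouts will differ.
CGROUP_V2_MAX = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_limit(text: str) -> Optional[int]:
    """Parse a cgroup limit file; 'max' (cgroup v2) means no limit."""
    value = text.strip()
    return None if value == "max" else int(value)


def host_ram_bytes() -> int:
    """Total physical RAM: what a cgroup-unaware runtime sizes itself to."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def effective_memory_limit() -> int:
    """Memory this process can really use: the cgroup cap if set, else host RAM."""
    host = host_ram_bytes()
    for path in (CGROUP_V2_MAX, CGROUP_V1_LIMIT):
        try:
            limit = parse_limit(path.read_text())
        except (OSError, ValueError):
            continue  # file missing or unreadable: try the next layout
        if limit is None:
            return host
        # cgroup v1 reports a huge sentinel when unlimited; min() absorbs it.
        return min(limit, host)
    return host


print(f"host RAM:        {host_ram_bytes() / 2**20:.0f} MiB")
print(f"effective limit: {effective_memory_limit() / 2**20:.0f} MiB")
```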

Memory Limit vs CPU Limit

Exceeding --memory gets a process OOM-killed by the kernel — exit 137, OOMKilled: true. It's a hard cap with a fatal consequence, so size it to the measured working set plus headroom and treat it as a wall you must not hit.
Exceeding --cpus gets the container throttled — it runs slower but never dies. Treat CPU as a speed limit you can safely sit against, and accept throttling as the (non-fatal) cost of capping a noisy workload.
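Turning "measured working set plus headroom" into a flag value is simple arithmetic. In the sketch below, the 25% default headroom, the rounding up to whole MiB, and the `suggest_memory_flag` name are all assumptions for illustration. The peak itself would come from observation, for instance the MEM USAGE column of docker stats under realistic load.

```python
import math


def suggest_memory_flag(peak_bytes: int, headroom: float = 0.25) -> str:
    """Size --memory as observed peak plus a headroom fraction, rounded up to MiB."""
    mib = math.ceil(peak_bytes * (1 + headroom) / 2**20)
    return f"--memory={mib}m"


# A measured peak of 200 MiB with 25% headroom suggests a 250 MiB cap.
print(suggest_memory_flag(200 * 2**20))  # --memory=250m
```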

Common Mistakes

Running the Driftwood web container with no --memory limit on a shared host — a memory leak in the app grows until it consumes all host RAM and the kernel starts OOM-killing other containers, taking down db and proxy with it.
Setting --memory too tight for the real working set and seeing the container die with 137 repeatedly — it's an OOM kill, not a crash; the fix is a correct limit (or fixing the leak), found by reading OOMKilled and dmesg.
Assuming a CPU limit will kill or crash a busy container — it only throttles, so a CPU-bound container under --cpus is slow, not dead; chasing a "crash" that's actually throttling wastes time.
Running an old JVM or Node that reads host RAM instead of the cgroup limit, sizing its heap to the whole machine, and getting OOM-killed inside a small container; the runtime must be cgroup-aware or told the limit explicitly.
Setting --cpu-shares and expecting a hard cap — shares are only a relative weight under contention; on an idle host a container with low shares still uses all the CPU it wants. --cpus is the hard limit.

Best Practices

Set --memory on every container on a shared host so one workload's leak can't OOM-kill its neighbors, sizing it to the measured working set plus headroom.
Use --cpus to cap CPU for noisy workloads, accepting throttling as the safe consequence, rather than --cpu-shares when you need an actual ceiling.
Read docker inspect for OOMKilled: true and the 137 exit code before treating repeated deaths as an application bug — the kernel, not the app, is doing the killing.
Make the runtime cgroup-aware (a modern JVM or Node, or explicit heap flags) so it sizes itself to the container's memory limit instead of the host's total.
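The arithmetic behind the JVM trap is worth seeing once. HotSpot JVMs default their maximum heap to 25% of the memory they detect, which is tunable with -XX:MaxRAMPercentage on modern releases. The Python model below applies that percentage to two views of the same container. It ignores the JVM's small-memory special cases and non-heap overhead. `default_max_heap` is a hypothetical helper, and the 64 GiB host is illustrative.

```python
MiB = 2**20
GiB = 2**30


def default_max_heap(detected_bytes: int, max_ram_percentage: float = 25.0) -> int:
    """Model of the JVM's default max heap: a percentage of detected memory."""
    return int(detected_bytes * max_ram_percentage / 100)


host_ram = 64 * GiB         # what a cgroup-unaware JVM detects (illustrative)
container_cap = 512 * MiB   # --memory=512m

unaware = default_max_heap(host_ram)
aware = default_max_heap(container_cap)
print(f"cgroup-unaware heap ceiling: {unaware // MiB} MiB (cap is {container_cap // MiB} MiB)")
print(f"cgroup-aware heap ceiling:   {aware // MiB} MiB")
print("unaware JVM can outgrow the cap:", unaware > container_cap)
```

The unaware JVM believes it may grow to 16384 MiB inside a 512 MiB wall. A cgroup-aware one stops at 128 MiB, and raising the percentage (for example to 75) trades headroom for non-heap memory against a larger heap.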

Comparable tools
Linux cgroup v2: the kernel controllers these flags configure directly
Podman: exposes the identical --memory/--cpus flags
systemd slices: MemoryMax and CPUQuota set the same kernel limits for non-container processes
Kubernetes: resource requests/limits correspond to shares/reservations and limits, with the same OOM behavior (Ch12)

Knowledge Check

Why does exceeding --memory kill a process while exceeding --cpus only slows it down?

Memory is a hard cap enforced by the OOM killer; CPU is a quota the scheduler enforces by throttling
The CPU limit is only an advisory hint to the scheduler while the memory limit is the one actually enforced by the kernel
The kernel can reclaim CPU back from a running process at will but cannot reclaim its memory, so it kills the process instead
Exceeding the CPU cap is fatal because the scheduler stops the process outright, while exceeding memory is harmless

A container exits 137 with OOMKilled: true. What is the correct next step?

Treat it as an OOM kill from the memory cap — raise --memory to the real working set or fix the leak
Search the application logs for the unhandled exception in the code that produced and returned exit code 137
Raise the --cpus limit, since an exit of 137 indicates the container was CPU-throttled all the way to death
Add --restart always so the OOM-killed container comes back on its own and the problem resolves itself over time

You set --cpu-shares on a container expecting it to never exceed a fixed amount of CPU. Why doesn't that work?

--cpu-shares is only a relative weight under contention — on an idle host it uses all the CPU it wants
--cpu-shares is silently ignored entirely unless it is paired with a --memory limit, which this container lacked
--cpu-shares only caps and limits the first CPU core, so all the other cores on the host run completely uncapped
--cpu-shares resets its counter every second, so the cap only holds for the first second of each one-second interval

Why does an old JVM get OOM-killed in a container capped at 512 MB even with a conservative heap setting?

It reads the host's total RAM instead of the cgroup limit and sizes its heap to the whole machine, blowing past the cap
Docker under-reports half the limit to the JVM, so the 512 MB cap looks like only 256 MB and the heap quickly overflows
The JVM actively disables the container's cgroup memory limit at startup, removing the cap entirely so it can grow
The JVM's garbage collector runs far too rarely, letting dead garbage objects exceed 512 MB before a collection runs
