Topic 25

Resource Requests and Limits

Resources · Scheduling

The CPU and memory numbers you set on a container are among the most consequential configuration in Kubernetes. A request is what the scheduler reserves; a limit is the ceiling enforced at runtime. Get them wrong and you get bad placement, throttled apps, killed Pods, and surprise bills.

Almost every scheduling and reliability behavior downstream — bin-packing, eviction, autoscaling, quality of service — reads from these two numbers. They are worth understanding precisely.

Requests Drive Scheduling

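A request is the amount of CPU and memory reserved for a container. The scheduler uses requests, and only requests, to decide which node a Pod fits on: it sums the requests of every Pod already on a node and places the new Pod only where that sum plus the Pod's own requests stays within the node's allocatable capacity. Actual usage and limits play no part in this check. A container with neither a request nor a limit is treated as requesting zero, so its Pod can land on any node and is among the first to be evicted under pressure. If only a limit is set, Kubernetes uses the limit as the request.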

Requests and limits on a container

spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests:
          cpu: "250m"        # 0.25 of a core, reserved
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"     # hard ceiling
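
To make the fit check concrete, here is a sketch with hypothetical numbers. The node's allocatable values and the sum of existing requests are assumptions chosen for illustration, and the real scheduler applies other filters as well (taints, affinity, and so on).

```yaml
# Hypothetical node, as it might appear under status.allocatable
allocatable:
  cpu: "4"          # 4000m available to Pods
  memory: "8Gi"     # 8192Mi available to Pods

# Assumed sum of requests for Pods already on the node:
#   cpu: 3800m, memory: 4096Mi
#
# Incoming container from the example above requests cpu 250m, memory 256Mi:
#   cpu:    3800m  + 250m  = 4050m  > 4000m   -> does not fit
#   memory: 4096Mi + 256Mi = 4352Mi < 8192Mi  -> fits
#
# Every resource must fit, so this node is filtered out,
# no matter how little CPU the existing Pods are actually using.
```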

Limits Enforce Differently for CPU and Memory

This is the subtlety that bites hardest: CPU and memory limits behave completely differently. CPU is compressible: a container that tries to use more than its CPU limit is throttled, slowed but not killed. Memory is incompressible: a container that exceeds its memory limit is OOM-killed. So a too-low CPU limit causes mysterious latency; a too-low memory limit causes crashes. They are not symmetric knobs.
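
One way to tell the two failure modes apart: an OOM-kill leaves a record in the Pod's status, while throttling leaves none there and has to be read from CPU metrics. The excerpt below sketches what `kubectl get pod <name> -o yaml` typically shows after a memory limit is hit; the container name and restart count are hypothetical.

```yaml
status:
  containerStatuses:
    - name: app                 # hypothetical container name
      restartCount: 3           # illustrative value
      lastState:
        terminated:
          reason: OOMKilled     # the container was ended by the kernel OOM killer
          exitCode: 137         # 128 + 9 (SIGKILL)
```

CPU throttling has no equivalent status field. Where cAdvisor metrics are collected, a counter such as `container_cpu_cfs_throttled_periods_total` is the usual place to look.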

Quality of Service Classes

Kubernetes derives a Pod's QoS class from its requests and limits, and the class influences eviction order under node pressure. Guaranteed (every container sets CPU and memory requests equal to its limits) is evicted last. Burstable (at least one request or limit is set, but the Pod does not meet the Guaranteed criteria) is in the middle. BestEffort (no requests or limits on any container) is evicted first. You do not set the class directly: you set requests and limits, and the class follows.

QoS class	Condition	Eviction order
Guaranteed	every container: CPU and memory requests == limits	Last
Burstable	at least one request or limit set, not Guaranteed	Middle
BestEffort	no requests or limits on any container	First
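
As an illustration, the three manifests below differ only in their `resources` stanza, and each should land in a different QoS class. The Pod names are hypothetical and the image is reused from the earlier example.

```yaml
# Guaranteed: CPU and memory are both set, and requests equal limits
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed          # hypothetical name
spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
---
# Burstable: requests are set, limits are higher
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable           # hypothetical name
spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
---
# BestEffort: no resources stanza at all
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort          # hypothetical name
spec:
  containers:
    - name: app
      image: my-app:1.0
```

The assigned class is recorded on the Pod and can be read back with `kubectl get pod qos-burstable -o jsonpath='{.status.qosClass}'`.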

The Gap Is Risk

The space between request and limit is overcommit. Set them equal and you get predictable, Guaranteed behavior but waste capacity that is reserved-but-idle. Set limits far above requests and you pack more in, but a burst can push a node into memory pressure and trigger evictions. Right-sizing means measuring real usage and setting requests near the typical need and limits near the safe ceiling. The platform offers LimitRange to apply sane defaults per namespace so no Pod ships with nothing set.
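
A minimal LimitRange sketch, assuming a namespace called `team-a`; the object name and the default values are placeholders, not recommendations. In a LimitRange, `defaultRequest` fills in missing requests and `default` fills in missing limits.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults      # hypothetical name
  namespace: team-a             # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:           # applied when a container sets no request
        cpu: "100m"
        memory: "128Mi"
      default:                  # applied when a container sets no limit
        memory: "256Mi"         # CPU limit deliberately left unset here
```

With this in place, a container created in `team-a` with no `resources` stanza receives these values at admission time, so its Pod becomes Burstable instead of BestEffort.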

Request vs limit

Request: the reserved amount; the only resource figure the scheduler considers for placement. Together with limits, it determines the QoS class.

Limit: the hard ceiling enforced at runtime. CPU is throttled at the limit; a container that exceeds its memory limit is OOM-killed.

Common Mistakes

Setting no requests, so the scheduler places the Pod blindly and the kubelet evicts it first under pressure.
Setting a CPU limit too low and causing latency that looks like an application bug, not throttling.
Setting a memory limit too low and getting OOMKills under load that show up only as unexplained container restarts.
Copy-pasting resource values between unrelated workloads instead of measuring each.
Treating CPU and memory limits as symmetric — one throttles, the other kills.

Best Practices

Set requests on every container based on measured typical usage so scheduling and eviction behave.
Be cautious with CPU limits; many teams set CPU requests but omit CPU limits to avoid needless throttling.
Set memory limits close to the real ceiling and watch for OOMKills as the signal to raise them.
Use Guaranteed QoS (requests == limits) for latency-critical or stateful Pods that should be the last to be evicted.
Apply LimitRange defaults per namespace so nothing runs as BestEffort by accident.

Related

HPA / VPA: consume these numbers to scale (Topics 27-28)
PodDisruptionBudgets & QoS: eviction interacts with QoS (Topic 30)
Cloud instance right-sizing: the node-level version of the same problem

Knowledge Check

What does the scheduler use to decide which node a Pod fits on?

The Pod's resource requests — it sums requests on a node and places where they fit
The Pod's resource limits, which it sums and compares against the node's total capacity
The Pod's actual measured CPU and memory usage right now
The Pod's QoS class alone, ranked against other Pods

How do CPU and memory limits differ when exceeded?

CPU is throttled (compressible); memory triggers an OOM-kill (incompressible)
Both are throttled by the container runtime until usage falls back under the limit
Both kill the container immediately on the first breach
Memory is throttled while CPU triggers an OOM-kill

Which Pods are evicted first under node memory pressure?

BestEffort Pods (no requests or limits)
Guaranteed Pods (requests equal limits)
Whichever Pod was scheduled most recently
Pods with the highest CPU limit
