Resource Requests and Limits
Topic 25

Resources, Scheduling

The CPU and memory numbers you set on a container are among the most consequential settings in Kubernetes. A request is what the scheduler reserves; a limit is the ceiling the runtime enforces. Get them wrong and you get bad placement, throttled apps, killed Pods, and surprise bills.

Almost every scheduling and reliability behavior downstream — bin-packing, eviction, autoscaling, quality of service — reads from these two numbers. They are worth understanding precisely.

Requests Drive Scheduling

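A request is the amount of CPU and memory a container is guaranteed. The scheduler uses requests, and only requests, to decide which node a Pod fits on: it sums the requests of the Pods already on a node and places the new Pod only where its requests still fit within the node's allocatable capacity. Limits and actual usage play no part in placement. A Pod with no requests or limits can be scheduled onto any node and is among the first to be evicted under pressure, because Kubernetes treats it as having reserved nothing. If a container sets a limit but no request, Kubernetes defaults the request to the limit.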

Requests and limits on a container
spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests:
          cpu: "250m"        # 0.25 of a core, reserved
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"     # hard ceiling

Limits Enforce Differently for CPU and Memory

This is the subtlety that bites hardest: CPU and memory limits behave completely differently. CPU is compressible — exceed the CPU limit and the container is throttled, slowed but not killed. Memory is incompressible — exceed the memory limit and the container is OOM-killed outright. So a too-low CPU limit causes mysterious latency; a too-low memory limit causes crashes. They are not symmetric knobs.
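The container status recorded for a Pod is one place where the two failure modes look different. The excerpt below is an illustrative sketch of what kubectl get pod <name> -o yaml might show after a memory-limit kill; the container name and restart count are hypothetical.

```yaml
# Illustrative excerpt of a Pod's status after a memory-limit kill.
# The container name and restart count are hypothetical.
status:
  containerStatuses:
    - name: app
      restartCount: 3
      lastState:
        terminated:
          reason: OOMKilled   # killed for exceeding its memory limit
          exitCode: 137       # 128 + 9 (SIGKILL)
```

CPU throttling leaves no comparable entry in the Pod status, which is part of why it tends to surface as unexplained latency rather than as an obvious failure.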

Quality of Service Classes

Kubernetes derives a Pod's QoS class from its requests and limits, and the class shapes eviction order under node pressure. Guaranteed (every container sets CPU and memory requests equal to its limits) is evicted last. Burstable (at least one request or limit is set, but the Pod does not meet the Guaranteed criteria) is in the middle. BestEffort (no requests or limits on any container) is evicted first. You do not set the class directly; you set requests and limits, and the class follows.

QoS class  | Condition                            | Eviction order
Guaranteed | requests == limits for all resources | Last
Burstable  | requests set, limits higher/absent   | Middle
BestEffort | no requests or limits                | First
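To make the three classes concrete, here is a minimal sketch with one Pod per class. The Pod names, image, and values are hypothetical; only the shape of the resources block determines the class.

```yaml
# Three minimal Pods, one per QoS class. Names, image, and values are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed
spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests: { cpu: "500m", memory: "512Mi" }
        limits:   { cpu: "500m", memory: "512Mi" }   # equal to requests -> Guaranteed
---
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable
spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests: { cpu: "250m", memory: "256Mi" }
        limits:   { cpu: "500m", memory: "512Mi" }   # above requests -> Burstable
---
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort
spec:
  containers:
    - name: app
      image: my-app:1.0   # no resources block -> BestEffort
```

Once a Pod is created, the class Kubernetes assigned is recorded in the Pod's status.qosClass field, so you can confirm it rather than infer it.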

The Gap Is Risk

The space between request and limit is overcommit. Set them equal and you get predictable, Guaranteed behavior but waste capacity that is reserved-but-idle. Set limits far above requests and you pack more in, but a burst can push a node into memory pressure and trigger evictions. Right-sizing means measuring real usage and setting requests near the typical need and limits near the safe ceiling. The platform offers LimitRange to apply sane defaults per namespace so no Pod ships with nothing set.
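A LimitRange of the kind described above might look like the following sketch. The object name, namespace, and every value are hypothetical placeholders, not recommendations; real numbers should come from measured usage.

```yaml
# A namespace-level LimitRange sketch. The name, namespace, and all values
# are hypothetical placeholders; derive real ones from measured usage.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no request
        cpu: "100m"
        memory: "128Mi"
      default:               # applied when a container sets no limit
        cpu: "500m"
        memory: "256Mi"
      max:                   # containers asking for more are rejected at admission
        cpu: "2"
        memory: "1Gi"
```

The defaults are filled in when a Pod is created, so Pods that already exist in the namespace keep whatever they were admitted with.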

Request vs limit

Request: the guaranteed, reserved amount; the only thing the scheduler considers for placement and the basis of QoS.

Limit: the hard ceiling the runtime enforces. A container that exceeds its CPU limit is throttled; one that exceeds its memory limit is OOM-killed.

Common Mistakes
  • Setting no requests, so the scheduler places the Pod blindly and evicts it first under pressure.
  • Setting a CPU limit too low and causing latency that looks like an application bug, not throttling.
  • Setting a memory limit too low and getting OOMKills under load that show up only as unexplained container restarts.
  • Copy-pasting resource values between unrelated workloads instead of measuring each.
  • Treating CPU and memory limits as symmetric — one throttles, the other kills.

Best Practices
  • Set requests on every container based on measured typical usage so scheduling and eviction behave.
  • Be cautious with CPU limits; many teams set CPU requests but omit CPU limits to avoid needless throttling.
  • Set memory limits close to the real ceiling and watch for OOMKills as the signal to raise them.
  • Use Guaranteed QoS (requests == limits) for latency-critical or stateful Pods that should be the last to be evicted.
  • Apply LimitRange defaults per namespace so nothing runs as BestEffort by accident.
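As one possible reading of the CPU and memory practices above, the fragment below sets a CPU request with no CPU limit and pins the memory limit to the memory request. It is a sketch with hypothetical values, not a recommended size for any workload.

```yaml
# Sketch of the "CPU request, no CPU limit" pattern. Values are hypothetical.
spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests:
          cpu: "250m"        # reserved for scheduling; no CPU limit, so no throttling ceiling
          memory: "512Mi"
        limits:
          memory: "512Mi"    # equal to the memory request, so memory is not overcommitted
```

Because the CPU limit is absent, this Pod is classed as Burstable rather than Guaranteed. If the namespace has a LimitRange with a default CPU limit, that default would be applied to the container at creation.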

Related
  • HPA / VPA: consume these numbers to scale (Topics 27-28)
  • PodDisruptionBudgets & QoS: eviction interacts with QoS (Topic 30)
  • Cloud instance right-sizing: the node-level version of the same problem

Knowledge Check

What does the scheduler use to decide which node a Pod fits on?

  • The Pod's resource requests — it sums requests on a node and places where they fit
  • The Pod's resource limits, which it sums and compares against the node's total capacity
  • The Pod's actual measured CPU and memory usage right now
  • The Pod's QoS class alone, ranked against other Pods

How do CPU and memory limits differ when exceeded?

  • CPU is throttled (compressible); memory triggers an OOM-kill (incompressible)
  • Both are throttled by the container runtime until usage falls back under the limit
  • Both kill the container immediately on the first breach
  • Memory is throttled while CPU triggers an OOM-kill

Which Pods are evicted first under node memory pressure?

  • BestEffort Pods (no requests or limits)
  • Guaranteed Pods (requests equal limits)
  • Whichever Pod was scheduled most recently
  • Pods with the highest CPU limit
