Resource Requests and Limits
The CPU and memory numbers you set on a container are among the most consequential settings in Kubernetes. A request is what the scheduler reserves; a limit is the ceiling the runtime enforces. Get them wrong and you get bad placement, throttled apps, killed Pods, and surprise bills.
Almost every scheduling and reliability behavior downstream — bin-packing, eviction, autoscaling, quality of service — reads from these two numbers. They are worth understanding precisely.
Requests Drive Scheduling
A request is the amount of CPU and memory a container is guaranteed. The scheduler uses requests, and only requests, to decide which node a Pod fits on: it sums the requests of everything already on a node and places the Pod only where the total still fits within the node's allocatable capacity. A Pod with no requests or limits can be scheduled anywhere and is the first to be evicted under pressure, because Kubernetes assumes it needs nothing.

    spec:
      containers:
        - name: app
          image: my-app:1.0
          resources:
            requests:
              cpu: "250m"      # 0.25 of a core, reserved
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"  # hard ceiling

Limits Enforce Differently for CPU and Memory
This is the subtlety that bites hardest: CPU and memory limits behave completely differently. CPU is compressible — exceed the CPU limit and the container is throttled, slowed but not killed. Memory is incompressible — exceed the memory limit and the container is OOM-killed outright. So a too-low CPU limit causes mysterious latency; a too-low memory limit causes crashes. They are not symmetric knobs.
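One way to tell the two failure modes apart is the container status. The fragment below is an illustrative sketch of what `kubectl get pod <name> -o yaml` typically reports after a memory-limit kill; the container name and restart count are made-up values. CPU throttling leaves no comparable marker in the Pod status, which is partly why it gets mistaken for an application problem.

```yaml
# Illustrative status fragment (values are hypothetical).
status:
  containerStatuses:
    - name: app
      restartCount: 3
      lastState:
        terminated:
          reason: OOMKilled   # the container exceeded its memory limit
          exitCode: 137       # 128 + 9 (SIGKILL)
```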
Quality of Service Classes
Kubernetes derives a Pod's QoS class from its requests and limits, and uses it to decide eviction order under node pressure. Guaranteed (every container sets CPU and memory requests equal to its limits) is evicted last. Burstable (some requests or limits set, but not enough to qualify as Guaranteed) is in the middle. BestEffort (no requests or limits on any container) is evicted first. You do not set the class directly; you set requests and limits, and the class follows.
| QoS class | Condition | Eviction order |
|---|---|---|
| Guaranteed | requests == limits for all resources | Last |
| Burstable | requests set, limits higher/absent | Middle |
| BestEffort | no requests or limits | First |
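To make the table concrete, here is a minimal sketch of a Pod that would be classified as Guaranteed, assuming a single container. The Pod name is hypothetical and the resource values are illustrative. Kubernetes records the resulting class in the Pod's `status.qosClass` field.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo   # hypothetical name
spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"         # equal to the request
          memory: "512Mi"     # equal to the request
# Raising either limit above its request, or dropping the CPU values
# entirely, would make this Pod Burstable.
# Removing the whole resources block would make it BestEffort.
```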
The Gap Is Risk
The space between request and limit is overcommit. Set them equal and you get predictable, Guaranteed behavior but waste capacity that is reserved-but-idle. Set limits far above requests and you pack more in, but a burst can push a node into memory pressure and trigger evictions. Right-sizing means measuring real usage and setting requests near the typical need and limits near the safe ceiling. The platform offers LimitRange to apply sane defaults per namespace so no Pod ships with nothing set.
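As a sketch of the LimitRange approach, the manifest below fills in requests and limits for any container that omits them. The object name, the namespace, and every number here are assumptions chosen for illustration; real defaults should come from measured usage.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults    # hypothetical name
  namespace: team-a           # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:         # applied when a container sets no request
        cpu: "100m"
        memory: "128Mi"
      default:                # applied when a container sets no limit
        cpu: "500m"
        memory: "256Mi"
```

With defaults like these in place, a Pod submitted without a resources block ends up Burstable rather than BestEffort.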
Key Terms
Request: the guaranteed, reserved amount; the only thing the scheduler considers for placement and the basis of QoS.
Limit: the hard ceiling the runtime enforces; CPU is throttled and memory is OOM-killed when it is exceeded.
Common Mistakes
- Setting no requests, so the scheduler places the Pod blindly and the kubelet evicts it first under pressure.
- Setting a CPU limit too low and causing latency that looks like an application bug, not throttling.
- Setting a memory limit too low and getting silent OOMKills under load.
- Copy-pasting resource values between unrelated workloads instead of measuring each.
- Treating CPU and memory limits as symmetric — one throttles, the other kills.
Best Practices
- Set requests on every container based on measured typical usage so scheduling and eviction behave.
- Be cautious with CPU limits; many teams set CPU requests but omit CPU limits to avoid needless throttling.
- Set memory limits close to the real ceiling and watch for OOMKills as the signal to raise them.
- Use Guaranteed QoS (requests == limits) for latency-critical or stateful Pods that should be the last to be evicted.
- Apply LimitRange defaults per namespace so nothing runs as BestEffort by accident.
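The practice of setting a CPU request while omitting the CPU limit might look like the following sketch. The values reuse the earlier example's numbers purely for illustration and are not recommendations.

```yaml
spec:
  containers:
    - name: app
      image: my-app:1.0
      resources:
        requests:
          cpu: "250m"         # reserved for scheduling; no CPU limit, so no throttling ceiling
          memory: "256Mi"
        limits:
          memory: "512Mi"     # memory is still capped
```

A Pod shaped like this is Burstable, not Guaranteed. If the namespace has a LimitRange that defines a default CPU limit, that default is applied to containers that omit one, so the CPU limit may reappear unless the LimitRange is adjusted.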
Knowledge Check
What does the scheduler use to decide which node a Pod fits on?
- The Pod's resource requests — it sums requests on a node and places where they fit
- The Pod's resource limits, which it sums and compares against the node's total capacity
- The Pod's actual measured CPU and memory usage right now
- The Pod's QoS class alone, ranked against other Pods
How do CPU and memory limits differ when exceeded?
- CPU is throttled (compressible); memory triggers an OOM-kill (incompressible)
- Both are throttled by the container runtime until usage falls back under the limit
- Both kill the container immediately on the first breach
- Memory is throttled while CPU triggers an OOM-kill
Which Pods are evicted first under node memory pressure?
- BestEffort Pods (no requests or limits)
- Guaranteed Pods (requests equal limits)
- Whichever Pod was scheduled most recently
- Pods with the highest CPU limit