Chapter Five
Scheduling and Scaling
How Kubernetes decides where Pods run and how many — requests and limits, the scheduler's filters, the autoscalers, health probes, and the budgets that protect availability.
Two questions decide cluster behavior under load: where each Pod runs, and how many copies of it exist. Get the inputs wrong and you get evictions, throttling, and pages at 3 a.m.
This chapter covers the levers: requests and limits that drive both scheduling and eviction, the scheduler's filter-and-score logic with affinity and taints, the three autoscalers, the probes that gate traffic and restarts, and the budgets and QoS classes that decide who survives pressure.
Topics in This Chapter
Topic 25
Resource Requests and Limits
The most consequential numbers you set. Requests drive scheduling; CPU limits drive throttling, and memory limits drive OOM kills. The gap between a request and its limit is risk.
Topic 26
The Scheduler
Filter then score: how the scheduler picks a node, and how you steer it with node affinity, pod affinity, taints, and tolerations.
Topic 27
Horizontal Pod Autoscaler
Scaling replica count on CPU, memory, or custom metrics. The control loop, the stabilization window, and why utilization targets only work when requests are set.
Topic 28
VPA and Cluster Autoscaler
Right-sizing Pods (VPA) and adding or removing nodes (Cluster Autoscaler). How they interact, and where they conflict with the HPA.
Topic 29
Probes
Liveness, readiness, and startup probes — what each one gates, and how a misconfigured probe turns a healthy app into a restart loop.
Topic 30
Disruption Budgets and QoS
PodDisruptionBudgets that protect availability during drains, and the Guaranteed/Burstable/BestEffort classes that decide eviction order.
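As a preview of how the three QoS classes fall out of requests and limits, here is a minimal sketch of the classification rule. The `Container` type and the `qos_class` function are hypothetical names invented for this illustration, not part of any Kubernetes client library, and the sketch compares quantity strings literally, so it would treat "1" and "1000m" as different even though Kubernetes considers them equal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    # Hypothetical stand-in for a container's resources stanza.
    requests: Dict[str, str] = field(default_factory=dict)
    limits: Dict[str, str] = field(default_factory=dict)


def qos_class(containers: List[Container]) -> str:
    """Classify a Pod from its containers' CPU and memory settings."""
    resources = ("cpu", "memory")

    # BestEffort: no container sets any CPU or memory request or limit.
    if all(
        r not in c.requests and r not in c.limits
        for c in containers
        for r in resources
    ):
        return "BestEffort"

    # Guaranteed: every container has a limit for both CPU and memory,
    # and each request equals its limit. An omitted request defaults
    # to the limit, so limits alone are enough.
    for c in containers:
        for r in resources:
            limit = c.limits.get(r)
            if limit is None or c.requests.get(r, limit) != limit:
                return "Burstable"
    return "Guaranteed"


if __name__ == "__main__":
    pod = [Container(requests={"cpu": "250m"}, limits={"cpu": "500m"})]
    print(qos_class(pod))
```

A Pod that sets a CPU request below its CPU limit and leaves memory unset lands in Burstable, which is where most workloads end up unless requests and limits are deliberately matched.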