Chapter Five

Scheduling and Scaling

How Kubernetes decides where Pods run and how many — requests and limits, the scheduler's filters, the autoscalers, health probes, and the budgets that protect availability.

6 topics

Two questions decide cluster behavior under load: where does each Pod run, and how many copies exist. Get the inputs wrong and you get evictions, throttling, and pages at 3 a.m.

This chapter covers the levers: requests and limits, which drive both scheduling and eviction; the scheduler's filter-and-score logic, steered by affinity and taints; the three autoscalers (HPA, VPA, and Cluster Autoscaler); the probes that gate traffic and restarts; and the disruption budgets and QoS classes that decide who survives drains and node pressure.

Topics in This Chapter

Resource Requests and Limits

The single most consequential numbers you set. Requests drive scheduling; limits drive throttling and OOM-kills. The gap between them is risk.

Resources, Scheduling

Filter then score: how the scheduler picks a node, and how you steer it with node affinity, pod affinity, taints, and tolerations.

Scheduling, Placement
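To make "filter then score" concrete, here is a minimal sketch of the two phases. It is a toy model, not the real scheduler: the `Node` and `Pod` classes, the set-based taint matching, and the "most free CPU" score are all assumptions chosen for illustration. The real scheduler runs many filter and score plugins and matches taints by key, value, and effect.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int    # hypothetical: allocatable CPU minus existing requests, in millicores
    free_mem_mi: int   # hypothetical: allocatable memory minus existing requests, in MiB
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    cpu_request_m: int
    mem_request_mi: int
    tolerations: set = field(default_factory=set)


def feasible(pod, node):
    """Filter phase: the node must fit the Pod's requests and every taint must be tolerated."""
    fits = (node.free_cpu_m >= pod.cpu_request_m
            and node.free_mem_mi >= pod.mem_request_mi)
    tolerated = node.taints <= pod.tolerations
    return fits and tolerated


def score(pod, node):
    """Score phase (toy rule): prefer the node with the most CPU left after placement."""
    return node.free_cpu_m - pod.cpu_request_m


def pick_node(pod, nodes):
    """Return the name of the best feasible node, or None if the Pod would stay Pending."""
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(pod, n)).name


nodes = [
    Node("a", 500, 1024),
    Node("b", 2000, 4096, taints={"gpu"}),
    Node("c", 1000, 2048),
]
print(pick_node(Pod(400, 512), nodes))                       # node "b" is filtered out by its taint
print(pick_node(Pod(400, 512, tolerations={"gpu"}), nodes))  # with a toleration, "b" becomes eligible
```

Note that the filter works on requests, not on actual usage: a node can look full to the scheduler while its CPUs sit idle.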

Horizontal Pod Autoscaler

Scaling replica count on CPU, memory, or custom metrics. The control loop, the stabilization window, and why utilization targets only work when requests are set.

Autoscaling, Metrics
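The core of the HPA control loop is a single ratio, which also shows why requests matter: utilization is measured as a percentage of the Pod's request. The sketch below follows the documented formula, desired = ceil(current × currentMetric / targetMetric), with the default 10% tolerance. The function names are illustrative, and it leaves out the stabilization window, unready Pods, and missing metrics.

```python
import math


def utilization_percent(usage_m, request_m):
    """CPU utilization as the HPA sees it: usage relative to the request, not to the limit."""
    return 100 * usage_m / request_m


def desired_replicas(current_replicas, current_utilization, target_utilization, tolerance=0.1):
    """Replica count suggested by the HPA ratio; no change while the ratio is within tolerance of 1."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 replicas, each using 450m of a 500m request, against a 60% target
current = utilization_percent(450, 500)
print(desired_replicas(4, current, 60))
```

Without a request there is no denominator for `utilization_percent`, so a utilization target cannot be evaluated for that container.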

VPA and Cluster Autoscaler

Right-sizing Pods (VPA) and adding or removing nodes (Cluster Autoscaler). How they interact, and where they conflict with the HPA.

Autoscaling, Nodes

Liveness, readiness, and startup probes — what each one gates, and how a misconfigured probe turns a healthy app into a restart loop.

Health, Lifecycle

Disruption Budgets and QoS

PodDisruptionBudgets that protect availability during voluntary disruptions such as node drains, and the Guaranteed, Burstable, and BestEffort QoS classes that shape eviction order under node pressure.

Availability, Priority
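The QoS class is not something you set directly; it follows from the requests and limits on a Pod's containers. Here is a minimal sketch of that derivation, assuming containers are given as plain dicts (a hypothetical shape, not the Kubernetes API objects) and that quantities are compared as strings, so `"500m"` and `"0.5"` would wrongly count as different.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Derive the QoS class from per-container 'requests' and 'limits' dicts.

    Guaranteed: every container sets CPU and memory limits, with requests equal to them.
    BestEffort: no container sets any CPU or memory request or limit.
    Burstable:  everything else.
    """
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # An omitted request defaults to the limit for that resource.
        requests = {**limits, **c.get("requests", {})}
        if any(r in limits or r in requests for r in RESOURCES):
            any_set = True
        for r in RESOURCES:
            if r not in limits or requests.get(r) != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pinned = {"requests": {"cpu": "500m", "memory": "256Mi"},
          "limits": {"cpu": "500m", "memory": "256Mi"}}
bursty = {"requests": {"cpu": "250m"}, "limits": {"cpu": "500m"}}
print(qos_class([pinned]), qos_class([bursty]), qos_class([{}]))
```

One container with a gap between request and limit is enough to move the whole Pod from Guaranteed to Burstable.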