Topic 28

VPA and Cluster Autoscaler

The HPA adds replicas; two other autoscalers handle the other axes. The Vertical Pod Autoscaler (VPA) right-sizes a Pod's CPU and memory requests. The Cluster Autoscaler adds and removes nodes so Pods always have somewhere to run. Together they decide how much compute the cluster has and how it is divided.

The three autoscalers operate on different things — replica count, per-Pod resources, and node count — and they interact in ways that range from complementary to conflicting. Knowing which does what, and where they collide, is the point of this topic.

The Vertical Pod Autoscaler

The VPA observes a workload's actual CPU and memory use over time and recommends, or applies, better request values. Its main update modes are Off (recommend only), Initial (set requests only at Pod creation), and Auto (also update running Pods). The catch with Auto is that changing a Pod's requests has traditionally required recreating it, so the VPA evicts Pods and their controller recreates them with the new requests, which is disruptive for workloads that don't tolerate restarts.
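To make "observes usage and recommends a request" concrete, here is a minimal sketch that derives a request from usage samples by taking a high percentile and adding a safety margin. This is an illustration under stated assumptions, not the VPA's actual recommender: the function name, the 90th percentile, the 15% margin, and the sample data are all hypothetical.

```python
import math

def recommend_request(samples, percentile=90, margin=0.15):
    """Suggest a resource request from observed usage samples.

    Simplified illustration only: the percentile and margin are assumed
    values, and the real VPA recommender works differently in detail.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest sample with at least
    # `percentile` percent of all samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1] * (1 + margin)

# Hypothetical CPU usage samples in millicores for one container.
cpu_millicores = [120, 135, 110, 400, 150, 140, 130, 125, 145, 138]
print(recommend_request(cpu_millicores))
```

Using a percentile rather than the maximum keeps a single spike (the 400m sample here) from inflating the request, which is the general idea behind right-sizing from observed usage.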

The Cluster Autoscaler

The Cluster Autoscaler watches for Pods stuck Pending because no node has room, and adds nodes (from a configured node group) to fit them. Conversely, when nodes sit underused and their Pods could fit elsewhere, it drains and removes them to save money. It is the bridge between the scheduler's "no node fits" and actual infrastructure — scaling the cluster itself, not the workloads.
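The scale-up decision can be sketched as a packing problem: given the Pods that are Pending, how many new nodes of a given shape would hold them? The sketch below uses simple first-fit packing. The class names, field names, and node shape are assumptions for illustration, and it ignores system overhead, affinity rules, and everything else the real scheduler considers.

```python
from dataclasses import dataclass

@dataclass
class Pod:
    name: str
    cpu_m: int   # CPU request in millicores
    mem_mi: int  # memory request in MiB

@dataclass
class NodeShape:
    cpu_m: int   # CPU available to Pods on one node of this group
    mem_mi: int  # memory available to Pods on one node of this group

def nodes_needed(pending, shape):
    """First-fit packing of Pending Pods onto fresh nodes of one shape."""
    free = []  # remaining [cpu_m, mem_mi] on each new node
    for pod in pending:
        # A Pod is never split across nodes: it must fit on a single one.
        if pod.cpu_m > shape.cpu_m or pod.mem_mi > shape.mem_mi:
            raise ValueError(f"{pod.name} cannot fit on any node of this group")
        for slot in free:
            if slot[0] >= pod.cpu_m and slot[1] >= pod.mem_mi:
                slot[0] -= pod.cpu_m
                slot[1] -= pod.mem_mi
                break
        else:
            # No new node so far has room, so one more node is required.
            free.append([shape.cpu_m - pod.cpu_m, shape.mem_mi - pod.mem_mi])
    return len(free)

pending = [Pod("a", 1500, 2048), Pod("b", 1500, 2048), Pod("c", 500, 1024)]
print(nodes_needed(pending, NodeShape(cpu_m=2000, mem_mi=8192)))
```

The `ValueError` branch reflects a hard limit of node scaling: adding more nodes of the same size never helps a Pod that is larger than one node.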

Scale-down is the tricky direction. The autoscaler will not remove a node if doing so would violate a PodDisruptionBudget, evict Pods that have no controller to recreate them, or evict Pods using node-local storage, so a single unmovable Pod can pin an expensive node indefinitely. Designing workloads to be movable is what lets scale-down actually work.
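The three blockers named above can be expressed as a small check over the Pods on a candidate node. This is a sketch of the reasoning only, not the Cluster Autoscaler's code: the `PodInfo` fields are hypothetical, and `pdb_disruptions_allowed` stands in for however many evictions a covering PDB currently permits.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PodInfo:
    name: str
    has_controller: bool        # owned by a Deployment, StatefulSet, etc.
    uses_local_storage: bool    # data lives on the node itself
    pdb_disruptions_allowed: Optional[int] = None  # None: no PDB covers this Pod

def scale_down_blockers(pods):
    """Return human-readable reasons why a node cannot be drained."""
    reasons = []
    for pod in pods:
        if not pod.has_controller:
            reasons.append(f"{pod.name}: no controller would recreate it")
        if pod.uses_local_storage:
            reasons.append(f"{pod.name}: uses node-local storage")
        if pod.pdb_disruptions_allowed is not None and pod.pdb_disruptions_allowed < 1:
            reasons.append(f"{pod.name}: eviction would violate its PDB")
    return reasons

node_pods = [
    PodInfo("web-1", has_controller=True, uses_local_storage=False, pdb_disruptions_allowed=1),
    PodInfo("debug-shell", has_controller=False, uses_local_storage=False),
]
for reason in scale_down_blockers(node_pods):
    print(reason)
```

An empty result means nothing in this simplified model stops the node from being drained; a single entry is enough to keep it.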

How the Three Interact

The HPA and Cluster Autoscaler compose well: the HPA adds replicas, any replicas that no node can fit go Pending, and the Cluster Autoscaler adds nodes for them, giving horizontal scaling end to end. The HPA and VPA conflict when pointed at the same resource metric: the HPA wants more replicas because CPU is high, the VPA wants bigger Pods for the same reason, and each change shifts the signal the other is reacting to. The supported pattern is HPA on a custom or external metric while VPA manages CPU and memory, or simply not running both on the same signal.
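A short calculation shows why the HPA and VPA interfere on CPU. The HPA's documented rule is desired replicas = ceil(current replicas × current metric / target metric), and CPU utilization is measured relative to the Pod's request. The sketch below applies that rule with made-up numbers; it omits the HPA's tolerance band and stabilization behavior, and the function name is hypothetical.

```python
def hpa_desired_replicas(current_replicas, usage_m, request_m, target_pct):
    """ceil(current_replicas * utilization / target), in integer arithmetic.

    Utilization is usage divided by request, so the VPA changing the
    request changes the HPA's input even when real load is unchanged.
    """
    numerator = current_replicas * usage_m * 100
    denominator = request_m * target_pct
    return -(-numerator // denominator)  # integer ceiling division

# Each Pod uses 900m CPU against a 1000m request; the HPA targets 60%.
before = hpa_desired_replicas(4, usage_m=900, request_m=1000, target_pct=60)

# The VPA sees the same high usage and raises the request to 1500m.
after = hpa_desired_replicas(4, usage_m=900, request_m=1500, target_pct=60)

print(before, after)
```

With identical load, the HPA's answer depends on what the VPA did to the request, which is why the two should not act on the same resource metric.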

Autoscaler	Changes	Trigger
HPA	Replica count	A metric vs target (Topic 27)
VPA	Per-Pod CPU/memory requests	Observed usage over time
Cluster Autoscaler	Node count	Pending Pods / underused nodes

Faster Node Provisioning

The classic Cluster Autoscaler scales pre-defined node groups. Newer provisioners (Karpenter, which originated on AWS, and similar approaches) skip fixed node groups and provision right-sized nodes directly from the cloud to fit Pending Pods, often faster and with better bin-packing. The principle is the same: react to unschedulable Pods by adding capacity. What improves is the granularity and the speed. Managed Kubernetes offerings increasingly provide this style of provisioning as a built-in option.
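The "right-sized node" idea can be sketched as choosing the cheapest instance type that holds the combined requests of the Pending Pods, rather than adding another node of a fixed size. This is not Karpenter's algorithm; the catalog, instance names, and prices below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    cpu_m: int            # millicores available to Pods
    mem_mi: int           # MiB available to Pods
    price_per_hour: float

# Hypothetical catalog; real instance names and prices will differ.
CATALOG = [
    InstanceType("small", 2000, 4096, 0.05),
    InstanceType("medium", 4000, 8192, 0.10),
    InstanceType("large", 8000, 32768, 0.25),
    InstanceType("memory-optimized", 4000, 32768, 0.18),
]

def cheapest_fit(total_cpu_m, total_mem_mi, catalog=CATALOG):
    """Return the cheapest instance type that fits the combined requests,
    or None when no single type is large enough."""
    candidates = [
        t for t in catalog
        if t.cpu_m >= total_cpu_m and t.mem_mi >= total_mem_mi
    ]
    return min(candidates, key=lambda t: t.price_per_hour, default=None)

# Pending Pods that together request 3000m CPU and 12000 MiB of memory.
choice = cheapest_fit(3000, 12000)
print(choice.name if choice else "no single instance type fits")
```

A fixed node group would have to be sized in advance for the worst case; selecting per request lets memory-heavy and CPU-heavy workloads land on different shapes.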

HPA vs VPA vs Cluster Autoscaler

HPA — more or fewer replicas, on a load metric. Workload scaling.

VPA — bigger or smaller Pods, on observed usage. Right-sizing — restarts Pods in Auto mode.

Cluster Autoscaler — more or fewer nodes, on Pending Pods. Infrastructure scaling.

Common Mistakes

Running VPA in Auto mode on workloads that can't tolerate the restarts it uses to resize Pods.
Running HPA and VPA on the same resource metric, so they fight.
Expecting scale-down to work while Pods are pinned by PDBs that allow no disruptions, have no controller, or use node-local storage.
Assuming the Cluster Autoscaler can split one oversized Pod across nodes — it can't; the Pod must fit one node.
Forgetting that node scale-up takes minutes, so bursty load needs headroom, not just autoscaling.

Best Practices

Use VPA in Off (recommend-only) mode to right-size, and apply its recommendations during planned restarts; reserve Auto for restart-tolerant workloads.
Keep HPA and VPA off the same metric — HPA on a custom signal, VPA on CPU/memory.
Make workloads movable (controllers, PDBs that still permit evictions, no node-local storage) so scale-down can reclaim nodes.
Pair the HPA with the Cluster Autoscaler so replica growth turns into node growth automatically.
Keep some node headroom for bursts, since provisioning new nodes is not instant.
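The headroom advice can be turned into rough arithmetic: spare capacity has to absorb whatever replica growth arrives while a new node is still being provisioned. The sketch below assumes a steady growth rate and a known provisioning time, both of which are hypothetical inputs you would measure for your own cluster.

```python
def headroom_replicas(growth_per_minute, provisioning_seconds):
    """Replicas that must fit on existing nodes while a new node boots.

    Assumes load adds `growth_per_minute` replicas at a steady rate and
    that a new node takes `provisioning_seconds` to become schedulable.
    """
    # Integer ceiling of growth_per_minute * provisioning_seconds / 60.
    return -(-growth_per_minute * provisioning_seconds // 60)

# Hypothetical burst: 5 new replicas per minute, nodes ready in 200 seconds.
print(headroom_replicas(5, 200))
```

If the existing nodes cannot hold that many extra replicas, the burst will leave Pods Pending until the new capacity arrives.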

Related

Horizontal Pod Autoscaler: the replica axis (Topic 27)
PodDisruptionBudgets: gate what scale-down may evict (Topic 30)
Karpenter / cloud node autoscaling: faster node provisioning

Knowledge Check

Why is the VPA's Auto mode disruptive?

Changing a Pod's requests requires recreating it, so the VPA evicts and restarts Pods to resize them
It deletes the whole Deployment object and recreates it from scratch every time it adjusts a request
It scales the entire cluster down to zero nodes while resizing
It fully drains every node hosting the affected Pods

What triggers the Cluster Autoscaler to add a node?

Pods stuck Pending because no existing node has room
High CPU utilization on the Pods already running
A new Deployment being created in the namespace
The HPA finally reaching its configured maxReplicas ceiling

Why can a single Pod prevent the Cluster Autoscaler from removing an underused node?

Scale-down won't evict a Pod whose eviction would violate a PDB, that has no controller, or that uses node-local storage
A node can never be removed once it has joined the cluster and scheduled even one Pod onto itself
The autoscaler only ever adds nodes and never removes any of them
The Pod's attached HPA actively blocks the node removal