VPA and Cluster Autoscaler
The Horizontal Pod Autoscaler (HPA) adds or removes replicas; two other autoscalers handle the other axes. The Vertical Pod Autoscaler (VPA) right-sizes a Pod's CPU and memory requests. The Cluster Autoscaler adds and removes nodes so that Pods have somewhere to run. Together they decide how much compute the cluster has and how it is divided.
The three autoscalers operate on different things — replica count, per-Pod resources, and node count — and they interact in ways that range from complementary to conflicting. Knowing which does what, and where they collide, is the point of this topic.
The Vertical Pod Autoscaler
The VPA observes a workload's actual CPU and memory use over time and recommends, or applies, better request values. It runs in one of several update modes, including Off (recommend only), Initial (set requests only at Pod creation), and Auto (also update running Pods). The catch with Auto is that changing a Pod's requests requires recreating it, so the VPA evicts Pods and lets their controller recreate them with the new requests, which is disruptive for workloads that don't tolerate restarts.
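The update mode is a single field on the VPA object. The sketch below builds a minimal VerticalPodAutoscaler manifest as a Python dict, using the `autoscaling.k8s.io/v1` API. The names `web-vpa` and `web` are placeholders, and the helper `vpa_manifest` is hypothetical, written only for this illustration. It accepts just the three modes named above, although the VPA API defines others.

```python
import json

# The three modes discussed in the text; the VPA API defines others as well.
MODES_FROM_TEXT = ("Off", "Initial", "Auto")


def vpa_manifest(name, target_deployment, update_mode="Off"):
    """Build a VerticalPodAutoscaler manifest as a plain dict.

    Hypothetical helper for illustration. It defaults to "Off" so the
    VPA only publishes recommendations and never evicts Pods.
    """
    if update_mode not in MODES_FROM_TEXT:
        raise ValueError(f"unsupported updateMode in this sketch: {update_mode!r}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose Pods the VPA observes.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    # "web-vpa" and "web" are placeholder names.
    print(json.dumps(vpa_manifest("web-vpa", "web"), indent=2))
```

Starting in Off mode and reading the recommendations before enabling Auto keeps the evictions a deliberate choice rather than a surprise.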
The Cluster Autoscaler
The Cluster Autoscaler watches for Pods stuck Pending because no node has room, and adds nodes (from a configured node group) to fit them. Conversely, when nodes sit underused and their Pods could fit elsewhere, it drains and removes them to save money. It is the bridge between the scheduler's "no node fits" and actual infrastructure — scaling the cluster itself, not the workloads.
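The scale-up decision can be pictured with a toy model. The sketch below is not the Cluster Autoscaler's actual algorithm. It is a simplified first-fit simulation, and the names `Resources` and `nodes_to_add`, along with all the numbers, are assumptions made for illustration. It places Pending Pods on existing free capacity first, adds a node from a single node group when nothing fits, and gives up on a Pod that is larger than one whole node.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_m: int   # CPU in millicores
    mem_mi: int  # memory in MiB

    def fits_in(self, other):
        return self.cpu_m <= other.cpu_m and self.mem_mi <= other.mem_mi


def nodes_to_add(pending, free_per_node, node_size, max_new_nodes):
    """Toy first-fit model: return (nodes added, Pods still unschedulable)."""
    free = [Resources(n.cpu_m, n.mem_mi) for n in free_per_node]  # work on copies
    added = 0
    unschedulable = []
    for pod in pending:
        target = next((n for n in free if pod.fits_in(n)), None)
        if target is None:
            # A Pod bigger than a whole node can never be placed:
            # it cannot be split across nodes.
            if not pod.fits_in(node_size) or added >= max_new_nodes:
                unschedulable.append(pod)
                continue
            target = Resources(node_size.cpu_m, node_size.mem_mi)
            free.append(target)
            added += 1
        target.cpu_m -= pod.cpu_m
        target.mem_mi -= pod.mem_mi
    return added, unschedulable


if __name__ == "__main__":
    pending = [Resources(1500, 2048)] * 3 + [Resources(6000, 4096)]
    added, stuck = nodes_to_add(
        pending,
        free_per_node=[Resources(500, 1024)],
        node_size=Resources(4000, 8192),
        max_new_nodes=5,
    )
    print(added, len(stuck))  # 2 nodes added, 1 Pod too large for any node
```

In this example the three smaller Pods force two new nodes, while the 6000m Pod stays Pending no matter how many nodes are added, because it exceeds the node size of the group.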
Scale-down is the tricky direction. The autoscaler will not remove a node if doing so would violate a PodDisruptionBudget (PDB), evict Pods that have no controller to recreate them, or displace Pods that use node-local storage. A single unmovable Pod can therefore pin an expensive node indefinitely. Designing workloads to be movable is what lets scale-down actually work.
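The three blockers can be expressed as a small check over the Pods on a candidate node. This is a simplified sketch, not the autoscaler's real logic, which considers more conditions. `PodInfo` and `scale_down_blockers` are hypothetical names. The field `pdb_disruptions_allowed` is assumed to mirror the `disruptionsAllowed` value from the status of the PDB covering the Pod, or `None` when no PDB applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodInfo:
    name: str
    has_controller: bool          # owned by a Deployment, StatefulSet, etc.
    uses_local_storage: bool      # data that lives only on this node
    pdb_disruptions_allowed: Optional[int]  # None means no PDB covers the Pod


def scale_down_blockers(pods):
    """Return (pod name, reason) pairs for Pods that would pin the node."""
    blockers = []
    for p in pods:
        if not p.has_controller:
            blockers.append((p.name, "no controller would recreate it"))
        elif p.uses_local_storage:
            blockers.append((p.name, "uses node-local storage"))
        elif p.pdb_disruptions_allowed is not None and p.pdb_disruptions_allowed < 1:
            blockers.append((p.name, "eviction would violate its PDB"))
    return blockers


if __name__ == "__main__":
    node_pods = [
        PodInfo("web-7d9f", True, False, 1),       # movable
        PodInfo("debug-shell", False, False, None),  # bare Pod
        PodInfo("cache-0", True, True, None),      # node-local data
    ]
    for name, reason in scale_down_blockers(node_pods):
        print(f"{name}: {reason}")
```

A node is a removal candidate in this model only when the returned list is empty, which is why one bare Pod is enough to keep the node alive.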
How the Three Interact
The HPA and Cluster Autoscaler compose well: the HPA adds replicas, those replicas go Pending, the Cluster Autoscaler adds nodes — horizontal scaling end to end. The HPA and VPA conflict when pointed at the same resource metric: the HPA wants more replicas because CPU is high, the VPA wants bigger Pods for the same reason, and they fight. The supported pattern is HPA on a custom/external metric while VPA manages CPU/memory, or simply not running both on the same signal.
| Autoscaler | Changes | Trigger |
|---|---|---|
| HPA | Replica count | A metric vs target (Topic 27) |
| VPA | Per-Pod CPU/memory requests | Observed usage over time |
| Cluster Autoscaler | Node count | Pending Pods / underused nodes |
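The HPA and VPA conflict described above can be detected from the two manifests. The sketch below is a rough lint, assuming manifests already loaded as dicts in the `autoscaling/v2` (HPA) and `autoscaling.k8s.io/v1` (VPA) shapes. The function names are hypothetical, and the check only covers HPA metrics of type `Resource`. It reports the resources that both autoscalers would act on for the same workload.

```python
def hpa_resource_metrics(hpa):
    """Names of resource metrics (for example "cpu") the HPA scales on."""
    names = set()
    for metric in hpa.get("spec", {}).get("metrics", []):
        if metric.get("type") == "Resource":
            names.add(metric["resource"]["name"])
    return names


def vpa_controlled_resources(vpa):
    """Resources the VPA manages; cpu and memory when nothing is specified."""
    policies = (
        vpa.get("spec", {}).get("resourcePolicy", {}).get("containerPolicies", [])
    )
    controlled = set()
    for policy in policies:
        controlled.update(policy.get("controlledResources", ["cpu", "memory"]))
    return controlled or {"cpu", "memory"}


def conflicting_resources(hpa, vpa):
    """Resources both autoscalers react to on the same target workload."""
    h, v = hpa["spec"]["scaleTargetRef"], vpa["spec"]["targetRef"]
    if (h["kind"], h["name"]) != (v["kind"], v["name"]):
        return set()  # different workloads, nothing to fight over
    mode = vpa["spec"].get("updatePolicy", {}).get("updateMode", "Auto")
    if mode == "Off":
        return set()  # recommend-only VPA never changes requests
    return hpa_resource_metrics(hpa) & vpa_controlled_resources(vpa)


if __name__ == "__main__":
    target = {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"}
    hpa = {"spec": {"scaleTargetRef": target, "metrics": [
        {"type": "Resource", "resource": {
            "name": "cpu",
            "target": {"type": "Utilization", "averageUtilization": 70}}}]}}
    vpa = {"spec": {"targetRef": target, "updatePolicy": {"updateMode": "Auto"}}}
    print(sorted(conflicting_resources(hpa, vpa)))
```

An empty result is what the supported pattern looks like: the HPA scales on a custom or external metric while the VPA owns CPU and memory.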
Faster Node Provisioning
The classic Cluster Autoscaler scales pre-defined node groups. Newer provisioners (Karpenter on AWS, and similar approaches) skip fixed node groups and provision right-sized nodes directly from the cloud to fit Pending Pods, often faster and with better bin-packing. The principle is the same — react to unschedulable Pods by adding capacity — but the granularity and speed improve. On managed clusters this is increasingly the default.
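The difference in granularity can be illustrated with a toy selection step. This is not how Karpenter or any real provisioner is implemented. It is a minimal sketch in which the catalog of node shapes, their prices, and the function `cheapest_fitting_shape` are all made up for the example. Instead of adding one more node of a fixed size, it picks the cheapest shape that holds the whole batch of Pending Pods.

```python
def cheapest_fitting_shape(pending, catalog):
    """Pick the cheapest node shape that fits all pending Pods on one node.

    pending: list of (cpu_millicores, mem_mib) requests
    catalog: list of (name, cpu_millicores, mem_mib, hourly_cost) tuples
    Returns the chosen catalog entry, or None when nothing is big enough.
    """
    need_cpu = sum(cpu for cpu, _ in pending)
    need_mem = sum(mem for _, mem in pending)
    fitting = [s for s in catalog if s[1] >= need_cpu and s[2] >= need_mem]
    return min(fitting, key=lambda s: s[3], default=None)


if __name__ == "__main__":
    # Hypothetical shapes and prices, for illustration only.
    catalog = [
        ("small", 2000, 4096, 0.05),
        ("medium", 4000, 8192, 0.10),
        ("large", 8000, 16384, 0.20),
    ]
    pending = [(500, 1024), (1500, 2048), (1000, 1024)]
    print(cheapest_fitting_shape(pending, catalog))
```

With a fixed node group of `large` nodes the same three Pods would leave most of the node idle, which is the bin-packing gain the paragraph above refers to.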
HPA — more or fewer replicas, on a load metric. Workload scaling.
VPA — bigger or smaller Pods, on observed usage. Right-sizing — restarts Pods in Auto mode.
Cluster Autoscaler — more or fewer nodes, on Pending Pods. Infrastructure scaling.
Common Pitfalls
- Running VPA in Auto mode on workloads that can't tolerate the restarts it uses to resize Pods.
- Running HPA and VPA on the same resource metric, so they fight.
- Expecting scale-down to work while Pods are pinned by PDBs that allow no disruptions, have no controller, or use node-local storage.
- Assuming the Cluster Autoscaler can split one oversized Pod across nodes — it can't; the Pod must fit one node.
- Forgetting that node scale-up takes minutes, so bursty load needs headroom, not just autoscaling.
Best Practices
- Use VPA in Off/recommend mode to right-size, applying changes during planned restarts, on restart-tolerant workloads.
- Keep HPA and VPA off the same metric — HPA on a custom signal, VPA on CPU/memory.
- Make workloads movable (controller-managed Pods, PDBs that leave room for eviction, no node-local storage) so scale-down can reclaim nodes.
- Pair the HPA with the Cluster Autoscaler so replica growth turns into node growth automatically.
- Keep some node headroom for bursts, since provisioning new nodes is not instant.
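How much headroom to keep can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a sizing rule. It assumes CPU is the binding resource, and the function name `headroom_nodes` and every number in the example are illustrative. It answers one question: if the HPA may add a given number of replicas before any new node is ready, how many spare nodes would absorb them?

```python
import math


def headroom_nodes(burst_replicas, pod_cpu_m, node_allocatable_cpu_m):
    """Spare nodes needed to absorb a burst before new nodes are ready.

    burst_replicas: replicas that may be added while nodes are provisioning
    pod_cpu_m: CPU request per replica, in millicores
    node_allocatable_cpu_m: CPU available to Pods on one node, in millicores
    """
    pods_per_node = node_allocatable_cpu_m // pod_cpu_m
    if pods_per_node == 0:
        raise ValueError("a single replica does not fit on one node")
    return math.ceil(burst_replicas / pods_per_node)


if __name__ == "__main__":
    # 10 extra replicas of 500m each on nodes with 3500m allocatable:
    # 7 replicas fit per node, so 2 spare nodes cover the burst.
    print(headroom_nodes(10, 500, 3500))  # prints 2
```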
Knowledge Check
Why is the VPA's Auto mode disruptive?
- Changing a Pod's requests requires recreating it, so the VPA evicts and restarts Pods to resize them
- It deletes the whole Deployment object and recreates it from scratch every time it adjusts a request
- It scales the entire cluster down to zero nodes while resizing
- It fully drains every node hosting the affected Pods
What triggers the Cluster Autoscaler to add a node?
- Pods stuck Pending because no existing node has room
- High CPU utilization on the Pods already running
- A new Deployment being created in the namespace
- The HPA finally reaching its configured maxReplicas ceiling
Why can a single Pod prevent the Cluster Autoscaler from removing an underused node?
- Scale-down won't evict a Pod if doing so would violate a PDB, or if the Pod has no controller or uses node-local storage
- A node can never be removed once it has joined the cluster and scheduled even one Pod onto itself
- The autoscaler only ever adds nodes and never removes any of them
- The Pod's attached HPA actively blocks the node removal