Cost and Efficiency
Kubernetes makes it easy to spend more than you need, and the waste is mostly invisible until the bill arrives. This topic is the set of practices that keep spend proportional to value — the cost lens applied to the levers from across the course.
The recurring theme: idle reserved capacity, over-requested Pods, and unattributable cost are where money leaks. Closing those without harming reliability is the whole game.
Right-Size from Reality
The biggest waste is the gap between what Pods request (and so reserve) and what they actually use. The scheduler places Pods by their requests, not their usage, so over-requested Pods can make a node look full to the scheduler while it sits half idle, and you pay for the whole node (Topic 25). Right-size requests from measured usage, using VPA recommendations to find the numbers (Topic 28). This single practice typically reclaims the most, because over-provisioning "to be safe" is so common.
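The following minimal Python sketch makes "right-size from measured usage" concrete by turning a series of CPU usage samples into a suggested request. It uses a nearest-rank percentile plus a fixed headroom margin. The 90th-percentile and 15% headroom defaults and the function names are assumptions for illustration. This is not VPA's recommender algorithm, which is more involved. In practice you would read the numbers from VPA's recommendations rather than compute them by hand.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (b > 0) without floating-point rounding."""
    return -(-a // b)


def nearest_rank_percentile(samples: list[int], pct: int) -> int:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one usage sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = ceil_div(pct * len(ordered), 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def recommend_request_m(samples_m: list[int], pct: int = 90, headroom_pct: int = 15) -> int:
    """Suggest a CPU request in millicores: a usage percentile plus headroom, rounded up."""
    base = nearest_rank_percentile(samples_m, pct)
    return ceil_div(base * (100 + headroom_pct), 100)


# Hypothetical CPU usage samples (millicores) for a Pod that currently requests 1000m.
usage_m = [120, 135, 150, 140, 160, 180, 170, 155, 145, 200]
current_request_m = 1000

suggested_m = recommend_request_m(usage_m)  # 90th percentile is 180m, plus 15% -> 207m
print(f"request {current_request_m}m -> {suggested_m}m "
      f"(frees {current_request_m - suggested_m}m per replica)")
```

The sample data is hypothetical. The gap between the 1000m request and the roughly 200m suggestion is the reserved-but-idle capacity this section describes, multiplied by every replica.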
Pack Densely and Scale to Demand
Let the scheduler and Cluster Autoscaler consolidate Pods onto fewer, fuller nodes and remove the empty ones. Consolidation only reclaims a node if its Pods can move, so keep workloads evictable: covered by PDBs that still allow some disruption, and free of node-local storage. Scale workloads to demand with the HPA. Where the pattern allows, also scale them to zero. The HPA does not go below one replica by default, so event-driven workloads typically use KEDA for that. Paying for capacity that sits idle overnight or between bursts is avoidable.
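The "scale to demand" half of this lever rests on the HPA's core rule. The Kubernetes documentation gives it as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, and the HPA skips scaling when that ratio is within a tolerance (0.1 by default) of 1.0. The sketch below reproduces that arithmetic in Python. It assumes a single utilization metric and clamps to hypothetical min/max bounds. It leaves out the real controller's stabilization windows, scaling policies, and handling of missing metrics.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Apply the HPA's documented replica formula for a single metric, then clamp."""
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, desired))


# Daytime peak: 4 replicas at 90% CPU against a 60% target -> scale out to 6.
print(hpa_desired_replicas(4, current_metric=90, target_metric=60))
# Overnight lull: 6 replicas at 15% CPU -> scale in to 2.
print(hpa_desired_replicas(6, current_metric=15, target_metric=60))
```

The floor of one replica mirrors the HPA's default minimum. Going all the way to zero between bursts is where something like KEDA comes in.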
Match Capacity to Work
Not all work needs full-price, always-on nodes. Use spot/preemptible capacity for fault-tolerant, restartable workloads, such as batch jobs or stateless services with enough replicas to ride out a reclaimed node. Use reserved/committed capacity for the steady baseline, and on-demand only for what must not be interrupted (Topic 53). Also watch the costs that hide off the compute line. These include storage (provisioned-but-unused volumes) and especially cross-zone and egress traffic, which is easy to ignore until it dominates a bill.
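Pricing the same node-hours two ways shows why matching capacity to interruption tolerance matters. The Python sketch below does that with integer cents to avoid floating-point drift. Every number in it is hypothetical: the 20-cent on-demand rate, the 40% reserved and 70% spot discounts, and the node counts are placeholders, not any provider's pricing. Substitute your own rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityTier:
    name: str
    node_hours: int    # node-hours per month placed on this tier
    discount_pct: int  # percent off the on-demand rate (0 = full price)


def monthly_cost_cents(tiers: list[CapacityTier], on_demand_cents_per_hour: int) -> int:
    """Total monthly compute cost in cents, rounded up to a whole cent."""
    # Work in hundredths of a cent so percentage discounts stay exact integers.
    total = sum(t.node_hours * on_demand_cents_per_hour * (100 - t.discount_pct) for t in tiers)
    return -(-total // 100)


RATE_CENTS = 20        # hypothetical on-demand price per node-hour
HOURS_PER_MONTH = 730  # 24 * 365 / 12

mixed = [
    CapacityTier("reserved baseline", 10 * HOURS_PER_MONTH, discount_pct=40),
    CapacityTier("on-demand critical", 2 * HOURS_PER_MONTH, discount_pct=0),
    CapacityTier("spot batch", 4 * 200, discount_pct=70),  # batch runs ~200 h/month
]
all_on_demand = [CapacityTier("on-demand everything", sum(t.node_hours for t in mixed), 0)]

print(f"mixed:         ${monthly_cost_cents(mixed, RATE_CENTS) / 100:,.2f}")
print(f"all on-demand: ${monthly_cost_cents(all_on_demand, RATE_CENTS) / 100:,.2f}")
```

With these placeholder numbers the mix prints $1,216.00 against $1,912.00 for running everything on demand. The model ignores the work lost when spot nodes are reclaimed and the risk of committing to more reserved capacity than you use. Both shrink the real saving, so treat the result as optimistic.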
| Lever | Effect |
|---|---|
| Right-size requests | Reclaim reserved-but-idle capacity (usually the biggest win) |
| Bin-pack + autoscale | Fewer, fuller nodes; scale to demand and to zero |
| Spot / reserved mix | Cheaper capacity matched to interruption tolerance |
| Watch storage + egress | Catch the costs hiding off the compute line |
Make Waste Visible — Without Cutting Reliability
You cannot manage what you cannot see. Cost attribution — labeling workloads by team/product and using OpenCost/Kubecost to allocate node cost back to namespaces — turns an opaque bill into accountable line items, which is what actually drives teams to clean up (showback/chargeback). The one rule that overrides all the others: never cut reliability to save cost. Dropping replicas, PDBs, or headroom to shave the bill trades a small saving for outage risk, and an outage costs far more than the nodes. Efficiency means removing waste, not removing resilience.
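The attribution step described above can be sketched as a simple showback split. Each namespace is charged for the share of a node's CPU it requests, and whatever nobody requested shows up as idle cost. This is a deliberately simplified Python model with hypothetical names and prices. OpenCost and Kubecost use richer allocation models (for example, pricing CPU and memory separately and accounting for actual usage). Treat the sketch as an illustration of the idea, not a description of how those tools compute bills.

```python
from collections import defaultdict

IDLE = "(idle)"  # hypothetical bucket for capacity that no Pod requested


def showback_cents(node_cost_cents: int, node_cpu_m: int,
                   pod_requests: list[tuple[str, int]]) -> dict[str, int]:
    """Split one node's cost across namespaces in proportion to CPU requests.

    Shares are rounded down to whole cents; the remainder, including
    unrequested capacity, goes to the idle bucket so the total always
    reconciles with the node's cost.
    """
    requested: dict[str, int] = defaultdict(int)
    for namespace, cpu_request_m in pod_requests:
        requested[namespace] += cpu_request_m
    if sum(requested.values()) > node_cpu_m:
        raise ValueError("requests exceed node capacity")
    shares = {ns: node_cost_cents * req // node_cpu_m for ns, req in requested.items()}
    shares[IDLE] = node_cost_cents - sum(shares.values())
    return shares


bill = showback_cents(
    node_cost_cents=4000,  # hypothetical daily cost of one node
    node_cpu_m=4000,       # 4 vCPU allocatable
    pod_requests=[("checkout", 1000), ("checkout", 500), ("search", 1000)],
)
for owner, cents in bill.items():
    print(f"{owner:>10}: ${cents / 100:.2f}")
```

Keying by namespace is one choice. Grouping by a team or product label works the same way. The explicit idle bucket matters because it is the reserved-but-unrequested capacity that bin-packing and autoscaling aim to shrink.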
Removing waste (right-size, bin-pack, spot, scale to demand, attribute): cuts cost without touching resilience.
Cutting reliability (fewer replicas, no PDB, no headroom): a false economy, because the cost of an outage dwarfs the saving.
Common Pitfalls
- Over-requesting "to be safe," reserving far more than is used and running half-empty nodes.
- Running fault-tolerant batch on full-price on-demand instead of spot.
- Ignoring storage and egress costs until they dominate the bill.
- No cost attribution, so waste is invisible and unowned.
- Cutting replicas, PDBs, or headroom to save money and inviting an outage.
Key Takeaways
- Right-size requests from measured usage; it usually reclaims the most.
- Bin-pack and scale to demand (and to zero where possible) with movable workloads.
- Match capacity to interruption tolerance: spot for batch, reserved for baseline, on-demand for critical.
- Attribute cost by team/product so waste is visible and owned.
- Never trade reliability for cost — remove waste, not resilience.
Knowledge Check
What practice usually reclaims the most wasted spend?
- Right-sizing requests from measured usage to close the reserved-but-idle gap
- Buying larger nodes with more CPU and memory allocatable per machine
- Adding more replicas to spread load more evenly across the fleet
- Disabling the HPA so Pod counts stay fixed and predictable at the peak replica number around the clock
Which capacity fits fault-tolerant, restartable batch work?
- Spot/preemptible nodes reclaimed at any time
- On-demand nodes paid at the full hourly rate
- Reserved committed capacity sized for a steady baseline
- Control-plane nodes running the API server and etcd
What is the one rule that overrides other cost levers?
- Never cut reliability (replicas, PDBs, headroom) to save cost — an outage costs more
- Always pick the cheapest available node type regardless of the workload's interruption tolerance or resource profile
- Remove all autoscaling so monthly spend stays flat and predictable
- Run every workload on spot capacity to maximize the discount