Cost and Efficiency
Topic 68

Kubernetes makes it easy to spend more than you need, and the waste is mostly invisible until the bill arrives. This topic is the set of practices that keep spend proportional to value — the cost lens applied to the levers from across the course.

The recurring theme: idle reserved capacity, over-requested Pods, and unattributable cost are where money leaks. Closing those without harming reliability is the whole game.

Right-Size from Reality

The biggest waste is the gap between what Pods request (and so reserve) and what they use. The scheduler places Pods by their requests, not their actual usage, so over-requested Pods leave nodes that look full to the scheduler but sit half-idle, and you pay for the whole node (Topic 25). Right-size requests from measured usage, using VPA recommendations as the starting point for the numbers (Topic 28). This single practice typically reclaims the most, because over-provisioning "to be safe" is so common.
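As a rough illustration of the arithmetic, the sketch below derives a suggested CPU request from measured usage samples and shows how much of the current request sits idle. The function names, the 95th-percentile choice, the 15% headroom, and the sample data are all assumptions made for this example. This is not VPA's actual algorithm, and real VPA recommendations should be preferred where they are available.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_cpu_request(usage_millicores, pct=95, headroom_pct=15):
    """Suggest a request: the chosen usage percentile plus some headroom.

    Both defaults are illustrative assumptions, not Kubernetes defaults.
    """
    base = percentile(usage_millicores, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)


def idle_reserved(request_millicores, usage_millicores):
    """Millicores reserved by the request but unused on average."""
    average = sum(usage_millicores) / len(usage_millicores)
    return request_millicores - average


# Hypothetical usage samples (millicores) for one Pod, and its current request.
usage = [120, 150, 180, 200, 220, 240, 260, 300, 310, 320]
current_request = 1000

suggested = recommend_cpu_request(usage)
print("idle today (avg):", idle_reserved(current_request, usage))
print("suggested request:", suggested)
print("reclaimed per Pod:", current_request - suggested)
```

With these made-up samples, most of the 1000m request is never used, and the suggested value still sits above the highest observed sample, so the saving does not come out of the Pod's safety margin.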

Pack Densely and Scale to Demand

Let the scheduler and Cluster Autoscaler consolidate Pods onto fewer, fuller nodes and remove the empty ones. Consolidation only reclaims nodes if workloads are movable: PDBs that still permit evictions, and no dependence on node-local storage. Scale workloads to demand with the HPA, and to zero where the pattern allows (KEDA for event-driven workloads). Paying for capacity that sits idle overnight or between bursts is avoidable.
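The two ideas in this section can be sketched with small functions. The first packs Pod CPU requests onto nodes with a first-fit-decreasing heuristic, which is only a stand-in to show why consolidation frees nodes; it is not how the Kubernetes scheduler or Cluster Autoscaler actually decide placement. The second applies the replica formula documented for the HPA, ceil(currentReplicas × currentMetric / targetMetric), while ignoring the tolerance band and stabilization behavior of the real controller. All names and numbers are assumptions for illustration.

```python
import math


def first_fit_decreasing(pod_requests, node_capacity):
    """Pack requests (millicores) onto nodes, largest Pod first."""
    nodes = []  # each entry is the list of requests placed on one node
    for request in sorted(pod_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"request {request} exceeds node capacity")
        for node in nodes:
            if sum(node) + request <= node_capacity:
                node.append(request)
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append([request])
    return nodes


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Simplified HPA formula, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical Pod requests and node size, in millicores.
pods = [1500, 500, 2000, 1000, 500, 2500, 1000, 1000]
packed = first_fit_decreasing(pods, node_capacity=4000)
print("nodes needed:", len(packed))

# 4 replicas at 90% average utilization against a 60% target scale up;
# the same 4 replicas at 15% overnight scale down to the minimum.
print("busy:", desired_replicas(4, 90, 60))
print("quiet:", desired_replicas(4, 15, 60))
```

Eight Pods spread one or two per node would hold far more nodes than the packed layout needs, and the overnight case shows where a minimum of one replica becomes the floor unless something like KEDA takes the workload to zero.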

Match Capacity to Work

Not all work needs full-price, always-on nodes. Use spot/preemptible capacity for fault-tolerant, restartable workloads (batch, stateless behind enough replicas); reserved/committed capacity for the steady baseline; and on-demand only for what must not be interrupted (Topic 53). And watch the costs that hide off the compute line — storage (provisioned-but-unused volumes) and especially cross-zone and egress traffic, which is easy to ignore until it dominates a bill.
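To make the capacity mix concrete, here is a small sketch comparing an all-on-demand fleet with a mixed one. The per-node-hour prices, the node counts, and the 730-hour month are invented for this example and do not reflect any provider's real pricing; amounts are kept in integer cents to avoid rounding surprises.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours

# Hypothetical prices in cents per node-hour, not real provider rates.
RATE_CENTS = {"on_demand": 10, "reserved": 6, "spot": 3}


def monthly_cost_cents(nodes_by_tier, rates=RATE_CENTS, hours=HOURS_PER_MONTH):
    """Total monthly cost for a mapping of pricing tier to node count."""
    return sum(count * rates[tier] * hours
               for tier, count in nodes_by_tier.items())


# Same 18 nodes, priced two ways.
all_on_demand = monthly_cost_cents({"on_demand": 18})
mixed = monthly_cost_cents({
    "reserved": 10,   # steady baseline
    "spot": 6,        # fault-tolerant batch
    "on_demand": 2,   # work that must not be interrupted
})

print("all on-demand:", all_on_demand / 100)
print("mixed:", mixed / 100)
print("saving:", (all_on_demand - mixed) / 100)
```

The size of the saving depends entirely on the assumed rates and on how much of the fleet can genuinely tolerate interruption, so the workload classification matters more than the exact prices.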

| Lever                  | Effect                                                       |
|------------------------|--------------------------------------------------------------|
| Right-size requests    | Reclaim reserved-but-idle capacity (usually the biggest win) |
| Bin-pack + autoscale   | Fewer, fuller nodes; scale to demand and to zero             |
| Spot / reserved mix    | Cheaper capacity matched to interruption tolerance           |
| Watch storage + egress | Catch the costs hiding off the compute line                  |

Make Waste Visible — Without Cutting Reliability

You cannot manage what you cannot see. Cost attribution — labeling workloads by team/product and using OpenCost/Kubecost to allocate node cost back to namespaces — turns an opaque bill into accountable line items, which is what actually drives teams to clean up (showback/chargeback). The one rule that overrides all the others: never cut reliability to save cost. Dropping replicas, PDBs, or headroom to shave the bill trades a small saving for outage risk, and an outage costs far more than the nodes. Efficiency means removing waste, not removing resilience.
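The attribution idea can be sketched as a simple proportional split: each team is charged the share of a node's cost that matches its Pods' CPU requests, and whatever nobody requested shows up as an explicit idle line. This is a CPU-only simplification with hypothetical names and numbers. Tools such as OpenCost and Kubecost use richer models (memory, storage, network, and more), and this sketch does not call or reproduce their APIs.

```python
from collections import defaultdict


def allocate_node_cost(node_cost_cents, node_capacity_millicores, pods):
    """Split one node's cost across teams in proportion to CPU requests.

    Pods without a "team" label are grouped under "unlabeled" so that
    unattributed spend stays visible instead of disappearing.
    """
    requested_by_team = defaultdict(int)
    for pod in pods:
        team = pod["labels"].get("team", "unlabeled")
        requested_by_team[team] += pod["cpu_request_millicores"]

    if sum(requested_by_team.values()) > node_capacity_millicores:
        raise ValueError("requests exceed node capacity")

    shares = {
        team: node_cost_cents * millicores // node_capacity_millicores
        for team, millicores in requested_by_team.items()
    }
    # Capacity that no Pod requested is still paid for; report it as idle.
    shares["idle"] = node_cost_cents - sum(shares.values())
    return shares


# Hypothetical Pods on one 4000m node costing 30000 cents per month.
pods = [
    {"labels": {"team": "checkout"}, "cpu_request_millicores": 1000},
    {"labels": {"team": "checkout"}, "cpu_request_millicores": 500},
    {"labels": {"team": "search"}, "cpu_request_millicores": 1000},
    {"labels": {}, "cpu_request_millicores": 500},
]

report = allocate_node_cost(30000, 4000, pods)
for owner, cents in sorted(report.items()):
    print(owner, cents / 100)
```

Keeping "unlabeled" and "idle" as their own rows is a deliberate choice in this sketch: both are exactly the unowned spend that showback is meant to surface.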

Removing waste vs cutting reliability

Removing waste: right-size, bin-pack, spot, scale-to-demand, attribute. Cuts cost without giving up resilience.

Cutting reliability: fewer replicas, no PDB, no headroom. A false economy, because the cost of an outage dwarfs the saving.

Common Mistakes
  • Over-requesting "to be safe," reserving far more than is used and running half-empty nodes.
  • Running fault-tolerant batch on full-price on-demand instead of spot.
  • Ignoring storage and egress costs until they dominate the bill.
  • No cost attribution, so waste is invisible and unowned.
  • Cutting replicas, PDBs, or headroom to save money and inviting an outage.
Best Practices
  • Right-size requests from measured usage; it usually reclaims the most.
  • Bin-pack and scale to demand (and to zero where possible) with movable workloads.
  • Match capacity to interruption tolerance: spot for batch, reserved for baseline, on-demand for critical.
  • Attribute cost by team/product so waste is visible and owned.
  • Never trade reliability for cost — remove waste, not resilience.
Related
  • Requests and limits: over-requesting is the main waste (Topic 25)
  • Autoscalers: bin-packing and scale-to-demand (Topics 27-28)
  • Cluster cost management: the operations-chapter treatment (Topic 53)

Knowledge Check

What practice usually reclaims the most wasted spend?

  • Right-sizing requests from measured usage to close the reserved-but-idle gap
  • Buying larger nodes with more CPU and memory allocatable per machine
  • Adding more replicas to spread load more evenly across the fleet
  • Disabling the HPA so Pod counts stay fixed and predictable at the peak replica number around the clock

Which capacity fits fault-tolerant, restartable batch work?

  • Spot/preemptible nodes that can be reclaimed at any time
  • On-demand nodes paid at the full hourly rate
  • Reserved committed capacity sized for a steady baseline
  • Control-plane nodes running the API server and etcd

What is the one rule that overrides other cost levers?

  • Never cut reliability (replicas, PDBs, headroom) to save cost — an outage costs more
  • Always pick the cheapest available node type regardless of the workload's interruption tolerance or resource profile
  • Remove all autoscaling so monthly spend stays flat and predictable
  • Run every workload on spot capacity to maximize the discount
