Cost Optimization
AWS bills grow in surprising ways — a forgotten NAT Gateway, a dev cluster nobody turns off, a CloudWatch metric published per second per user. Cost optimization is partly architectural choice and partly operational discipline.
The principle behind all of it: pay for what you actually use, and know what you are paying for.
Tag, Right-Size, and Stop
Tagging is the highest-impact habit and the most underinvested. A consistent scheme (Environment, Owner, Application, CostCenter), enforced with Config rules and SCPs and activated as cost allocation tags in the Billing console, makes Cost Explorer genuinely useful. The most common overage is not premium rates but paying any rate at all for unused resources.
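A Config rule for required tags boils down to a simple membership check. The sketch below shows that check, assuming resources are plain dicts with a `Tags` list of `{"Key": ..., "Value": ...}` pairs (the shape many AWS APIs return) and an assumed `ResourceId` field. `REQUIRED_TAGS`, `missing_tags`, and `find_untagged` are hypothetical names, not an AWS API.

```python
# Hypothetical required-tag scheme, mirroring the one suggested above.
REQUIRED_TAGS = ("Environment", "Owner", "Application", "CostCenter")


def missing_tags(resource: dict, required: tuple = REQUIRED_TAGS) -> list:
    """Return the required tag keys that are absent or have an empty value."""
    # Many AWS APIs return tags as a list of {"Key": ..., "Value": ...} dicts.
    present = {tag["Key"]: tag.get("Value", "") for tag in resource.get("Tags", [])}
    return [key for key in required if not present.get(key)]


def find_untagged(resources: list, required: tuple = REQUIRED_TAGS) -> dict:
    """Map each non-compliant resource ID to the tag keys it is missing."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource, required)
        if gaps:
            report[resource["ResourceId"]] = gaps
    return report


if __name__ == "__main__":
    # Illustrative inventory; "ResourceId" is an assumed field name.
    inventory = [
        {"ResourceId": "i-0abc", "Tags": [{"Key": k, "Value": "x"} for k in REQUIRED_TAGS]},
        {"ResourceId": "vol-0def", "Tags": [{"Key": "Owner", "Value": ""}]},
    ]
    for resource_id, gaps in find_untagged(inventory).items():
        print(f"{resource_id} is missing: {', '.join(gaps)}")
```

An empty value counts as missing here, because a blank `Owner` tag attributes spend no better than no tag at all.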
Stop development environments outside business hours (a dev db.r6g.large costs the same per hour as a production one), act on Compute Optimizer right-sizing recommendations, delete idle resources (unattached EBS volumes, old snapshots), and cap retention defaults: CloudWatch Logs keeps log events forever unless you set a retention period on the log group.
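The arithmetic behind stopping dev environments is worth seeing once. This is a minimal sketch of the decision a scheduler (for example, a scheduled Lambda; an assumption, not something this lesson prescribes) might make, assuming a weekday 08:00 to 18:00 local-time window. The window constants and function names are hypothetical.

```python
from datetime import datetime

# Assumed schedule: weekdays 08:00-18:00 local time. Adjust to your team.
START_HOUR, END_HOUR = 8, 18
WORKDAYS = range(0, 5)  # Monday=0 .. Friday=4, as returned by datetime.weekday()


def should_run(now: datetime) -> bool:
    """True if a dev environment should be up at this (local) time."""
    return now.weekday() in WORKDAYS and START_HOUR <= now.hour < END_HOUR


def weekly_hours() -> int:
    """Hours per week the schedule keeps an environment running."""
    return len(WORKDAYS) * (END_HOUR - START_HOUR)


def hours_saved_fraction() -> float:
    """Share of an always-on week (168 hours) the schedule avoids paying for."""
    return 1 - weekly_hours() / (7 * 24)


if __name__ == "__main__":
    print(should_run(datetime(2024, 6, 3, 9, 30)))  # a Monday morning
    print(should_run(datetime(2024, 6, 8, 9, 30)))  # a Saturday morning
    print(f"{weekly_hours()} h/week, {hours_saved_fraction():.0%} fewer hours")
```

Fifty running hours out of 168 cuts about 70% of the instance-hours. Storage such as EBS volumes and RDS allocated storage keeps billing while stopped. A stopped RDS instance also starts again automatically after seven days, so the schedule has to keep re-stopping it.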
Commit, Tier, and Watch the Network
For steady baseline usage, commit to discounts: Savings Plans (the flexible default, up to ~70% off), EC2 Reserved Instances, and RDS Reserved DB Instances. Cover the baseline you are confident in and run the variable remainder On-Demand, or on Spot (up to ~90% off, no commitment) when the work is fault-tolerant, such as batch jobs. Tier storage with S3 Lifecycle policies and pick the right EBS volume type (for example, gp3 rather than gp2).
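Storage tiering is usually declared rather than scripted. Below is a minimal sketch that builds one S3 Lifecycle rule as a dict in the shape boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The rule ID, the `logs/` prefix, the 30/90/365-day schedule, and the `lifecycle_rule` helper are illustrative assumptions. Nothing is sent to AWS.

```python
import json

# Storage class names as used by the S3 API.
STANDARD_IA = "STANDARD_IA"
GLACIER = "GLACIER"  # S3 Glacier Flexible Retrieval


def lifecycle_rule(rule_id: str, prefix: str, ia_days: int, glacier_days: int,
                   expire_days: int) -> dict:
    """Build one S3 Lifecycle rule that tiers objects down, then expires them."""
    # S3 requires objects to be at least 30 days old before moving to STANDARD_IA.
    if ia_days < 30:
        raise ValueError("STANDARD_IA transitions need ia_days >= 30")
    # Keep the tiers in order (this sketch's own sanity check).
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": STANDARD_IA},
            {"Days": glacier_days, "StorageClass": GLACIER},
        ],
        "Expiration": {"Days": expire_days},
    }


if __name__ == "__main__":
    config = {"Rules": [lifecycle_rule("tier-app-logs", "logs/", 30, 90, 365)]}
    # Printed only; pass it as LifecycleConfiguration to apply it to a bucket.
    print(json.dumps(config, indent=2))
```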
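Network charges surprise teams: NAT Gateway per-GB processing for traffic to AWS services (use VPC endpoints; gateway endpoints for S3 and DynamoDB are free), cross-AZ transfer, cross-Region transfer, and internet egress (CloudFront helps, since transfer from AWS origins to CloudFront is free). Watch high-cardinality CloudWatch custom metrics too: each unique combination of dimension values is billed as a separate metric per month.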
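Cardinality is easiest to grasp as a multiplication. This sketch counts the distinct metrics one metric name can fan out into, assuming every combination of dimension values is actually emitted. The dimension sizes and the $0.30 per-metric price are illustrative assumptions, and the flat-rate estimate ignores CloudWatch's volume pricing tiers.

```python
from math import prod


def metric_count(dimension_cardinalities: dict) -> int:
    """Upper bound on distinct metrics for one metric name.

    CloudWatch bills each unique combination of dimension values as its own
    metric, so if every combination is emitted the count is the product of
    the number of distinct values per dimension.
    """
    return prod(dimension_cardinalities.values())


def monthly_cost(metrics: int, price_per_metric: float) -> float:
    """Flat-rate estimate; ignores CloudWatch's volume pricing tiers."""
    return metrics * price_per_metric


if __name__ == "__main__":
    PRICE = 0.30  # assumed $/metric/month for illustration; check current pricing
    per_endpoint = {"Endpoint": 20, "Region": 4}
    per_user = {**per_endpoint, "UserId": 50_000}
    for name, dims in [("per endpoint", per_endpoint), ("per user", per_user)]:
        n = metric_count(dims)
        print(f"{name}: {n:,} metrics, about ${monthly_cost(n, PRICE):,.2f}/month")
```

Adding a single `UserId` dimension turns 80 metrics into 4,000,000. That is why per-user detail usually belongs in logs rather than in metric dimensions.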
Monitor and Architect for Cost
Treat the bill like operational metrics: set a monthly Budget on every account (alerts at 50/80/100%), enable Cost Anomaly Detection, and review Cost Explorer monthly with tag-based attribution. Pay-per-request serverless pricing is cost-effective at low-to-medium or spiky volume; once a workload is steady and high-volume, compare it against provisioned alternatives.
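AWS Budgets evaluates thresholds for you and can alert on both actual and forecasted spend. This sketch only illustrates the logic behind staged 50/80/100% alerts. The straight-line forecast, the sample numbers, and the function names are assumptions; Budgets computes its own forecast.

```python
# Staged alert thresholds from the text, as a percentage of the monthly budget.
THRESHOLDS = (50, 80, 100)


def crossed_thresholds(spend: float, budget: float, thresholds: tuple = THRESHOLDS) -> list:
    """Return every threshold (in percent) that the given spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    percent = 100 * spend / budget
    return [t for t in thresholds if percent >= t]


def straight_line_forecast(spend_to_date: float, day: int, days_in_month: int) -> float:
    """Naive month-end forecast assuming the daily run rate stays constant."""
    return spend_to_date * days_in_month / day


if __name__ == "__main__":
    budget, spend, day = 1000.0, 620.0, 12  # illustrative numbers
    forecast = straight_line_forecast(spend, day, 30)
    print("actual alerts:", crossed_thresholds(spend, budget))
    print(f"forecast ${forecast:,.0f}, forecast alerts:", crossed_thresholds(forecast, budget))
```

Here actual spend has only crossed 50%, but the forecast already exceeds the budget. Forecast-based alerts warn you while there is still time to act.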
Architectural choices compound: CloudFront in front of origins, S3 with Redshift Spectrum for cold history, the right service for the workload shape (Aurora for OLTP, not Redshift), and Amazon Bedrock or pre-built AI APIs over training your own models.
Pricing models at a glance (model: best fit; trade-off):
- On-Demand: the variable part of the workload on top of the committed baseline; no commitment, highest rate.
- Savings Plans / Reserved: the steady baseline; a 1- or 3-year commitment for up to ~70% off.
- Spot: fault-tolerant batch, CI, and training; up to ~90% off, reclaimable on two minutes' notice.
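The commitment trade-off can be made concrete with a simplified cost model. This sketch measures capacity in abstract units at one hypothetical On-Demand rate. Real Savings Plans commit to a dollar amount per hour, and Reserved Instances to specific instance attributes, so treat it as intuition rather than a purchasing tool. The sample usage, rates, and function names are all assumptions.

```python
def blended_cost(hourly_usage: list, committed: int, od_rate: float, discount: float) -> float:
    """Total cost when `committed` capacity units are prepaid at a discount.

    The commitment is billed every hour whether or not it is used; usage
    above it is billed at the On-Demand rate.
    """
    committed_rate = od_rate * (1 - discount)
    total = 0.0
    for used in hourly_usage:
        total += committed * committed_rate          # paid even when idle
        total += max(used - committed, 0) * od_rate  # overflow runs On-Demand
    return total


def best_commitment(hourly_usage: list, od_rate: float, discount: float) -> int:
    """Brute-force the whole-unit commitment with the lowest total cost."""
    candidates = range(0, int(max(hourly_usage)) + 1)
    return min(candidates, key=lambda c: blended_cost(hourly_usage, c, od_rate, discount))


if __name__ == "__main__":
    # Illustrative day: 10 units around the clock, 16 during an 8-hour peak.
    day = [10.0] * 16 + [16.0] * 8
    for discount in (0.4, 0.7):
        best = best_commitment(day, od_rate=1.0, discount=discount)
        saved = 1 - blended_cost(day, best, 1.0, discount) / blended_cost(day, 0, 1.0, discount)
        print(f"discount {discount:.0%}: commit {best} units, {saved:.0%} cheaper than all On-Demand")
```

At a 40% discount the cheapest commitment is exactly the 10-unit baseline. At 70% it rises to 16. A committed unit pays off only when it is in use for more than (1 - discount) of the hours. The peak here covers a third of the day, which beats 30% but not 60%.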
Common Mistakes
- Running with untagged resources, making cost attribution a guessing game.
- Leaving development environments running 24/7 instead of stopping them outside business hours.
- Running steady production workloads entirely On-Demand instead of covering the baseline with Savings Plans or Reserved Instances.
- Routing AWS-service traffic through NAT Gateways instead of free Gateway endpoints (S3, DynamoDB).
- Publishing high-cardinality custom CloudWatch metrics (per user, per request), which cost per metric per month.
- Reviewing cost only when the bill spikes instead of setting Budgets and Cost Anomaly Detection to catch it automatically.
Key Takeaways
- Tag everything (Environment, Owner, Application) and enforce it with Config rules and SCPs.
- Stop unused resources, right-size with Compute Optimizer, and cap retention defaults.
- Commit to Savings Plans or Reserved Instances for steady baselines; use Spot for fault-tolerant work.
- Tier storage with S3 Lifecycle policies and use VPC endpoints and CloudFront to cut network cost.
- Set a monthly Budget and Cost Anomaly Detection on every account; review Cost Explorer monthly by tag.
- Make cost-aware architectural choices (CloudFront, S3 with Redshift Spectrum, the right service per workload).
Knowledge Check
What is the single highest-impact cost-optimization habit?
- Consistent tagging, so Cost Explorer can attribute spend by team, project, and environment
- Subscribing to Shield Advanced across every workload running in the account
- Standardizing on the largest instance types so there is always plenty of headroom to grow into
- Disabling CloudWatch metrics and alarms to trim the monitoring line item
How should you cover a steady, predictable baseline of compute?
- With Savings Plans or Reserved Instances, using On-Demand or Spot for the variable part on top
- Entirely On-Demand at full published rates, for maximum flexibility with no one-year or three-year commitment of any kind
- Entirely Spot capacity, since it consistently offers the cheapest possible per-hour rate
- By over-provisioning fixed capacity well up front so you never have to scale at all
What is the most common AWS cost overage?
- Paying any rate at all for resources nobody is using, like dev environments running 24/7
- Paying premium On-Demand rates across a relatively small fleet of instances
- CloudFront data-transfer charges for outbound traffic to end users
- KMS key storage fees combined with the per-request charges on each cryptographic API call
How should the AWS bill be monitored?
- Like operational metrics — staged Budgets alerts, Cost Anomaly Detection, and monthly tag-based reviews
- Only when the finance team flags an unexpected problem after the invoice arrives
- Once a year at contract renewal, reviewing all account spend, service by service, across the prior twelve months
- Not at all — AWS right-sizes and shuts down idle resources for you automatically