Cloud Cost Optimisation
debt(d5/e5/b5/t5)
Closest to 'specialist tool catches' (d5). The term's detection_hints list aws-cost-explorer, infracost, and kubecost as tools that can identify waste patterns like <20% CPU usage, missing auto-scaling, and oversized instances. These are specialist FinOps/cost tools rather than default linters or compiler checks.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix mentions right-sizing instances, switching to spot instances for batch workers, and enabling auto-scaling. While individual fixes can be small (e.g., changing instance type), comprehensive cost optimisation requires infrastructure config changes across multiple services, lifecycle rules on storage, and auto-scaling policies—a significant refactor of infrastructure-as-code.
Closest to 'persistent productivity tax' (b5). Cloud cost optimisation applies broadly (web, cli contexts per applies_to) and requires ongoing attention. Without active management, waste accumulates continuously (30-40% mentioned in why_it_matters). Every new resource provisioning decision must consider cost, creating a persistent tax on all infrastructure work, though it doesn't quite define the system's architecture.
Closest to 'notable trap' (t5). The misconception explicitly states 'Cloud is always cheaper than on-premise' — this is a documented gotcha that developers eventually learn through painful bills. Many assume cloud auto-optimises or that flexibility equals efficiency, when reality requires active management. It's a notable trap but not catastrophic since the 'wrong way' doesn't break systems, just bleeds money.
Also Known As
TL;DR
Cloud spend does not optimise itself: right-size instances from real utilisation metrics, match pricing models to workload patterns, delete unused resources, and architect for elasticity.
Explanation
Cloud cost optimisation pillars:
- Right-sizing: use utilisation metrics to pick the correct instance size; most instances are 30-50% oversized.
- Pricing models: Reserved Instances save 40-60% vs on-demand; Spot Instances save 70-90% for interruptible workloads.
- Waste elimination: stop idle instances, delete unused snapshots and load balancers, use S3 lifecycle rules.
- Architecture: serverless for spiky workloads, auto-scaling to match demand, a CDN to reduce origin requests.
Tools: AWS Cost Explorer, CloudHealth, Infracost. Tag all resources for cost attribution.
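The right-sizing pillar can be sketched as a simple decision rule. The sub-20% idle threshold mirrors the waste pattern cost tools flag; the function name and the 70% upsize cutoff are illustrative assumptions, not AWS guidance:

```python
# Hypothetical right-sizing heuristic over CPU utilisation samples (percent).
# Thresholds are illustrative; real tools also weigh memory, network, and burst.

def rightsizing_recommendation(cpu_samples: list[float]) -> str:
    """Return 'downsize', 'keep', or 'upsize' from average CPU utilisation."""
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg < 20:   # the <20% waste pattern cost tools flag
        return "downsize"
    if avg > 70:   # sustained high load: consider a larger size
        return "upsize"
    return "keep"

print(rightsizing_recommendation([5, 12, 8, 15]))    # chronically idle instance
print(rightsizing_recommendation([45, 60, 50, 55]))  # healthy utilisation
```

In practice the samples would come from a monitoring source such as CloudWatch, averaged over weeks rather than hours so that a quiet afternoon doesn't trigger a downsize.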
Common Misconception
'Cloud is always cheaper than on-premise.' In practice the cloud bills for everything provisioned, used or not; its flexibility only turns into savings with active management.
Why It Matters
Without active management, 30-40% of cloud spend is typically waste. Every resource-provisioning decision affects the bill, making cost an ongoing concern across all infrastructure work.
Common Mistakes
- Production-sized environments running 24/7 for dev/staging — scale them down or stop outside working hours.
- No auto-scaling — static instances sized for peak traffic pay for peak capacity 24/7.
- On-demand pricing for stable baseline workloads — Reserved Instances save 40-60% for predictable load.
- S3 buckets accumulating old versions and logs forever — apply lifecycle rules.
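The lifecycle-rule fix for the last mistake can be sketched as the payload shape that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, `logs/` prefix, and day counts are illustrative assumptions:

```python
# Minimal S3 lifecycle configuration: archive cold logs, then expire them.
# All names and day counts are placeholders to adapt per bucket.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                # Move objects to Glacier once they go cold
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},  # delete entirely after a year
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }
    ]
}
```

Applied once (e.g. via `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_config)`), the rule runs automatically, so storage stops accumulating forever without any recurring cleanup job.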
Code Examples
// No cost controls — surprise bill:
// 10 r5.2xlarge EC2 instances running 24/7 on-demand
// Dev, staging, and prod all same size
// 50 EBS snapshots never cleaned up
// S3 bucket: 5 years of uncompressed access logs
// Monthly bill: $12,000
// Could be: $3,000 with optimisation
// Cost-optimised architecture:
// Production: 3x r5.large Reserved (1yr) + auto-scaling
// Staging: stopped 18:00-08:00 via Lambda scheduler
// Dev: t3.small on-demand, stop when idle
// Spot instances for batch jobs: 80% cost reduction
// S3 lifecycle: transition to Glacier after 90 days
// Monthly bill: $3,200 (73% reduction)
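A quick sanity check of the before/after figures above, using no assumptions beyond the two stated bills:

```python
# Verify the claimed reduction from the unoptimised to the optimised bill.
before, after = 12_000, 3_200  # monthly bills from the two scenarios above

reduction = (before - after) / before
print(f"{reduction:.0%} reduction")            # rounds to the quoted 73%

annual_savings = (before - after) * 12
print(f"${annual_savings:,} saved per year")
```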
// Infracost in CI:
// infracost diff --path . shows cost impact of each PR
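One way to act on that output in CI is a budget gate over Infracost's JSON report. The `totalMonthlyCost` field name, its string encoding, and the threshold are assumptions to verify against your installed Infracost version:

```python
# Sketch of a CI cost gate over Infracost JSON (infracost breakdown --format json).
# Assumes totalMonthlyCost is a string at the top level of the report.
import json

THRESHOLD = 5_000.0  # fail the build above this monthly estimate (arbitrary limit)

def monthly_cost(report_json: str) -> float:
    report = json.loads(report_json)
    return float(report["totalMonthlyCost"])

# Stand-in for the report file Infracost would write in CI:
sample = '{"totalMonthlyCost": "3200.00"}'
cost = monthly_cost(sample)
assert cost <= THRESHOLD, f"estimated ${cost:,.0f}/mo exceeds budget"
print(f"estimated monthly cost: ${cost:,.0f}")
```

Wiring this into the pipeline makes cost regressions fail review like any other broken check, rather than surfacing weeks later on the bill.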