FinOps & Cloud Cost Engineering Roadmap
Complete FinOps roadmap for DevOps engineers. Learn cloud cost optimization, rightsizing, Spot instances, Kubernetes cost allocation, and building a FinOps culture.
FinOps Foundations & Culture
FinOps is a practice, not just a tool
What to learn
- FinOps framework — inform, optimize, operate lifecycle
- Cloud unit economics — cost per request, per user, per transaction
- Shared responsibility — engineers own costs, finance enables
- Tagging strategy — mandatory tags for allocation
- Showback vs chargeback — cost visibility models
- FinOps Foundation certification — FinOps Certified Practitioner (FOCP) overview
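Unit economics and showback both reduce to simple arithmetic once spend is tagged. The sketch below is illustrative only: the function names, team tags, and dollar figures are hypothetical, and the proportional split of shared cost is one possible allocation rule, not a FinOps Foundation standard.

```python
def cost_per_unit(total_cost: float, units: int) -> float:
    """Cost per request, user, or transaction for one billing period."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


def showback(costs_by_team: dict[str, float], shared_cost: float) -> dict[str, float]:
    """Add a share of untagged/shared cost to each team.

    Assumption: shared cost is split in proportion to each team's
    directly tagged spend. Other rules (even split, fixed ratios)
    are equally valid.
    """
    tagged_total = sum(costs_by_team.values())
    if tagged_total <= 0:
        raise ValueError("no tagged spend to allocate against")
    return {
        team: cost + shared_cost * cost / tagged_total
        for team, cost in costs_by_team.items()
    }


# Hypothetical month: $1,200 of spend serving 4 million requests.
unit_cost = cost_per_unit(1200.0, 4_000_000)

# Hypothetical tags: two teams plus $200 of shared networking cost.
report = showback({"checkout": 600.0, "search": 400.0}, shared_cost=200.0)
```

With showback the resulting report is only published to teams; with chargeback the same numbers would be billed to each team's budget.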
AWS Cost Optimization
Every AWS service has a cheaper path
What to learn
- Savings Plans — Compute SP, EC2 Instance SP, SageMaker SP
- Reserved Instances — 1-year vs 3-year, standard vs convertible
- Spot Instances — interruption handling, Spot Fleet, mixed fleets
- EC2 rightsizing — Compute Optimizer recommendations
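- S3 cost optimization — storage classes, lifecycle policies, S3 Intelligent-Tiering
- NAT Gateway costs — often the biggest surprise item on the bill (hourly charge plus per-GB processing)
- Data transfer costs — same-AZ traffic over private IPs is free; cross-AZ, cross-region, and internet egress are billed per GB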
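When weighing Savings Plans or Reserved Instances against on-demand pricing, a useful first check is the break-even utilization. The following is a simplified sketch: the rate and discount are made-up inputs rather than AWS prices, the function names are hypothetical, and it treats the commitment as a flat monthly cost, ignoring the hour-by-hour way Savings Plans are actually applied.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of committed hours that must be used before the
    commitment becomes cheaper than paying on-demand."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def commitment_savings(
    on_demand_rate: float,
    discount: float,
    utilization: float,
    hours: int = 730,  # approximate hours in a month
) -> float:
    """Savings (positive) or loss (negative) versus on-demand.

    The commitment is paid for every hour; on-demand is paid only
    for the hours actually used.
    """
    on_demand_cost = on_demand_rate * utilization * hours
    committed_cost = on_demand_rate * (1 - discount) * hours
    return on_demand_cost - committed_cost


# Hypothetical instance: $0.10/hour on-demand, 30% commitment discount.
needed = break_even_utilization(0.30)            # about 0.70
always_on = commitment_savings(0.10, 0.30, 1.0)  # commitment wins
half_used = commitment_savings(0.10, 0.30, 0.5)  # commitment loses
```

A workload that runs less than the break-even fraction of the time is usually a better fit for on-demand or Spot than for a commitment.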
Kubernetes Cost Visibility
K8s clusters are black boxes without the right tools
What to learn
- Kubecost — namespace, deployment, and team cost allocation
- OpenCost — CNCF open-source cost monitoring standard
- Request vs limit analysis — find over-provisioned workloads
- Idle resource detection — pods using less than 10% of their requested CPU or memory
- Cost per namespace, per team, per application
- Chargeback with Kubecost — automated team billing
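Idle detection and request analysis can be prototyped before adopting a tool. The sketch below assumes usage data has already been exported (for example from a metrics backend) into plain records. The `PodUsage` type, its field names, and the sample pods are hypothetical, and this is not how Kubecost or OpenCost compute their numbers.

```python
from typing import NamedTuple


class PodUsage(NamedTuple):
    namespace: str
    pod: str
    cpu_request_m: int   # requested CPU in millicores
    cpu_used_m: float    # average CPU actually used, in millicores


def idle_pods(pods: list[PodUsage], threshold: float = 0.10) -> list[PodUsage]:
    """Pods whose average usage is below `threshold` of their request."""
    idle = []
    for p in pods:
        if p.cpu_request_m <= 0:
            continue  # no request set, so no ratio can be computed
        if p.cpu_used_m / p.cpu_request_m < threshold:
            idle.append(p)
    return idle


def unused_request_by_namespace(pods: list[PodUsage]) -> dict[str, float]:
    """Requested-but-unused millicores of idle pods, per namespace."""
    totals: dict[str, float] = {}
    for p in idle_pods(pods):
        unused = p.cpu_request_m - p.cpu_used_m
        totals[p.namespace] = totals.get(p.namespace, 0.0) + unused
    return totals


sample = [
    PodUsage("shop", "api-1", 1000, 50.0),
    PodUsage("shop", "api-2", 500, 300.0),
    PodUsage("batch", "job-1", 2000, 100.0),
]
waste = unused_request_by_namespace(sample)
```

The same ratio applied to memory requests gives the second half of the picture; CPU alone can make a memory-bound pod look idle.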
Resource Rightsizing
Many K8s clusters are 40–60% over-provisioned
What to learn
- VPA recommendations — use for requests baseline, not limits enforcement
- Memory vs CPU rightsizing — different analysis strategies
- Namespace resource quotas — prevent cost surprises
- LimitRange defaults — ensure all pods have resources set
- Node rightsizing — pick the right instance family for the workload type
- Karpenter bin-packing — consolidate underutilized nodes
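A request baseline can be approximated from observed usage as a high percentile plus headroom. This is a deliberately simplified stand-in for what VPA does, not its actual algorithm. The function names, the nearest-rank percentile, and the 15% headroom default are all assumptions made for illustration.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile; `pct` is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def recommend_request(samples: list[float], pct: int = 95, headroom: float = 0.15) -> int:
    """Suggested request: chosen percentile of usage plus headroom,
    rounded up to a whole unit (millicores or MiB)."""
    base = percentile_nearest_rank(samples, pct)
    return math.ceil(base * (1 + headroom))


# Hypothetical CPU usage samples in millicores, with one short spike.
cpu_samples = [100, 120, 130, 150, 400]
cpu_request = recommend_request(cpu_samples, pct=80)  # ignores the spike
```

For memory a higher percentile (or the observed maximum) is usually the safer choice, because exceeding a memory limit gets the container OOM-killed, while exceeding a CPU limit only causes throttling.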
Spot & Preemptible Workloads
Run up to 70% cheaper with the right architecture
What to learn
- Spot instance interruption — 2-minute warning, graceful shutdown
- Karpenter Spot support — automatic fallback to on-demand
- Stateless vs stateful — what can run on Spot safely
- Spot for batch workloads — training jobs, data processing
- Pod disruption budgets for Spot — maintain minimum availability
- Spot diversification — multiple instance types and AZs
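Graceful shutdown is what makes the 2-minute warning usable. On Kubernetes, a pod on a node being drained typically receives SIGTERM, so a worker can stop picking up new items and let the current one finish. The sketch below shows that pattern with hypothetical names (`stop_requested`, `process_queue`). How the interruption notice turns into a node drain depends on the cluster setup and is assumed here.

```python
import signal
import threading

# Set when the process has been asked to shut down.
stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only records the request; the loop reacts to it."""
    stop_requested.set()


def install_handlers():
    """Register the handler for SIGTERM (sent on pod termination)."""
    signal.signal(signal.SIGTERM, request_stop)


def process_queue(items, handle):
    """Process items one at a time, stopping between items once a
    shutdown has been requested. Returns the items completed."""
    done = []
    for item in items:
        if stop_requested.is_set():
            break  # leave remaining items for another worker
        handle(item)
        done.append(item)
    return done
```

A real worker would call `install_handlers()` at startup and return unfinished items to the queue; each item should also finish well inside the pod's termination grace period.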
Observability Cost Control
Monitoring itself can become your biggest bill
What to learn
- Prometheus cardinality — high-cardinality labels kill TSDB performance and cost
- Recording rules — pre-compute expensive queries
- Metrics retention policies — shorter retention for dev, longer for prod
- Log sampling — reduce Loki ingestion for high-volume debug logs
- Datadog / New Relic cost — custom metrics pricing, log retention tiers
- OpenTelemetry sampling — tail-based sampling to reduce trace volume
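The cardinality problem is multiplicative: the worst-case number of time series for a metric is the product of the distinct values of each label. The sketch below makes that concrete. The label names and counts are hypothetical, and the result is an upper bound, since not every combination necessarily occurs in practice.

```python
import math


def max_series(distinct_values_per_label: dict[str, int]) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values of each label."""
    return math.prod(distinct_values_per_label.values())


# Hypothetical HTTP request counter with three bounded labels.
labels = {"method": 5, "status": 6, "pod": 50}
before = max_series(labels)

# Adding an unbounded label such as a user ID multiplies everything.
after = max_series({**labels, "user_id": 10_000})
growth = after // before
```

This is why identifiers such as user IDs, request IDs, or full URLs belong in logs or traces rather than in metric labels.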
FinOps Culture & Automation
Embed cost awareness into every engineering decision
What to learn
- Infracost in CI/CD — show cost diff on every Terraform PR
- Budget alerts — notify teams when spend exceeds threshold
- Anomaly detection — catch cost spikes before month-end surprise
- FinOps dashboards — per-team Grafana cost visibility
- Engineering culture — cost as a non-functional requirement
- Regular cost reviews — weekly spend review cadence
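Budget alerts and anomaly detection can start as a few lines of arithmetic over daily spend. The sketch below is one simple approach, not what any cloud provider's anomaly detection does internally. The function names, the 7-day median baseline, and the 50% threshold are illustrative assumptions.

```python
from statistics import median


def spend_anomalies(daily_spend, window=7, threshold=0.5):
    """Flag days whose spend exceeds the median of the previous
    `window` days by more than `threshold` (0.5 means +50%).

    Returns a list of (day_index, spend, baseline) tuples.
    """
    anomalies = []
    for i in range(window, len(daily_spend)):
        baseline = median(daily_spend[i - window:i])
        if baseline > 0 and daily_spend[i] > baseline * (1 + threshold):
            anomalies.append((i, daily_spend[i], baseline))
    return anomalies


def forecast_month_end(spend_to_date, day_of_month, days_in_month):
    """Naive linear forecast of month-end spend from the run rate so far."""
    if day_of_month <= 0:
        raise ValueError("day_of_month must be positive")
    return spend_to_date / day_of_month * days_in_month


def over_budget(spend_to_date, day_of_month, days_in_month, budget):
    """True when the linear forecast exceeds the monthly budget."""
    return forecast_month_end(spend_to_date, day_of_month, days_in_month) > budget


# Hypothetical daily spend in dollars with one spike on day index 8.
spend = [100, 102, 98, 101, 99, 103, 100, 104, 180, 101]
flagged = spend_anomalies(spend)
```

A median baseline is used so that a single spike does not inflate the baseline for the following days, which a plain average would.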