AI-Driven Capacity Planning for Kubernetes Clusters (2026)
How to use AI and machine learning for Kubernetes capacity planning. Covers predictive autoscaling, cost optimization, tools like StormForge and Kubecost, and building custom ML models for resource forecasting.
Your Kubernetes cluster is either overprovisioned (wasting money) or underprovisioned (risking outages). Traditional capacity planning uses rules of thumb and gut feelings — "add 30% buffer" or "scale when CPU hits 70%." These approaches either waste 40% of your cloud budget or leave you one traffic spike away from downtime.
AI-driven capacity planning replaces guesswork with data. Machine learning models analyze historical patterns, predict future demand, and automatically right-size your infrastructure. Here's how to implement it.
The Problem with Manual Capacity Planning
Traditional capacity planning fails because:
- Traffic patterns are complex — workloads have hourly, daily, weekly, and seasonal patterns. Humans can't model multi-dimensional trends.
- Resource requests are wrong — developers set CPU/memory requests once during deployment and never update them. Studies show 65% of Kubernetes resource requests are overprovisioned by 2x or more.
- Reactive scaling is too late — Horizontal Pod Autoscaler (HPA) reacts to current metrics. By the time CPU hits 80%, latency has already spiked. You need to scale before the load arrives.
- Cost optimization conflicts with reliability — without predictive models, teams err on the side of overprovisioning to avoid risk.
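For contrast, this is the reactive baseline those points describe: a standard HPA that only acts after average utilization has already crossed its target. A minimal sketch (resource names are illustrative):

```yaml
# Reactive baseline: scale only once average CPU crosses 70%
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

By design, this controller cannot act until the metric has already moved, which is exactly the lag the approaches below try to eliminate.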
How AI Capacity Planning Works
```
Historical Data → Feature Engineering → ML Model → Predictions → Actions
                                                                    ↓
                                                            ┌──────────────┐
                                                            │ Scale pods   │
                                                            │ Resize nodes │
                                                            │ Right-size   │
                                                            │   requests   │
                                                            └──────────────┘
```
Data sources:
- Prometheus metrics (CPU, memory, network, custom metrics)
- Request latency and error rates
- Historical scaling events
- External signals (calendar events, marketing campaigns, deployments)
Predictions:
- "Tomorrow at 9 AM, the API service will need 12 pods instead of 6"
- "This deployment's actual memory usage is 256 MB, but requests are set to 1 GB"
- "Black Friday will require 3x current node capacity"
Tool 1: StormForge — ML-Powered Resource Optimization
StormForge uses machine learning to continuously right-size pod resource requests and limits:
```shell
# Install StormForge agent
helm repo add stormforge https://registry.stormforge.io/chartrepo/library
helm install stormforge-agent stormforge/stormforge-agent \
  --namespace stormforge-system \
  --create-namespace \
  --set clusterName=production
```

StormForge observes actual resource usage over time and generates optimized recommendations:
```yaml
# Example StormForge recommendation
apiVersion: optimize.stormforge.io/v1beta2
kind: Recommendation
metadata:
  name: api-server
spec:
  target:
    kind: Deployment
    name: api-server
  recommendations:
    - containerName: api
      resources:
        requests:
          cpu: "250m"      # Was: 1000m (75% reduction)
          memory: "384Mi"  # Was: 1Gi (62% reduction)
        limits:
          cpu: "500m"
          memory: "512Mi"
```

In auto-pilot mode, StormForge applies these changes automatically:
```yaml
apiVersion: optimize.stormforge.io/v1beta2
kind: Application
metadata:
  name: api-server
spec:
  resources:
    - kubernetes:
        selector:
          matchLabels:
            app: api-server
  optimization:
    mode: auto            # Automatically apply recommendations
    settings:
      reliability: high   # Conservative right-sizing
```

Tool 2: Kubecost — Cost Visibility + Recommendations
Kubecost provides cost allocation and right-sizing recommendations:
```shell
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="YOUR_TOKEN"
```

Query the Kubecost API for savings:
```shell
# Get right-sizing recommendations
curl -s http://kubecost:9090/model/savings/requestSizing | jq '.[] | {
  namespace: .namespace,
  controller: .controllerName,
  currentCPU: .currentRequest.cpu,
  recommendedCPU: .recommendedRequest.cpu,
  currentMemory: .currentRequest.memory,
  recommendedMemory: .recommendedRequest.memory,
  monthlySavings: .monthlySavings
}'
```

Example output:
```json
{
  "namespace": "production",
  "controllerName": "order-service",
  "currentCPU": "2000m",
  "recommendedCPU": "450m",
  "currentMemory": "2Gi",
  "recommendedMemory": "768Mi",
  "monthlySavings": 87.50
}
```

Tool 3: KEDA — Event-Driven Predictive Scaling
KEDA (Kubernetes Event-Driven Autoscaling) can scale based on external metrics, including ML model predictions:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-server-scaler
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 3
  maxReplicaCount: 50
  triggers:
    # Scale based on current Prometheus metrics
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_requests_per_second
        threshold: "100"
        query: sum(rate(http_requests_total{app="api-server"}[2m]))
    # Scale based on predicted traffic one hour ahead. Note: forecast the
    # request *rate*, not the raw counter -- predict_linear expects a
    # gauge-like series, so take rate() first via a subquery.
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: predicted_traffic
        threshold: "100"
        query: sum(predict_linear(rate(http_requests_total{app="api-server"}[5m])[1h:5m], 3600))
```

The `predict_linear` function in Prometheus uses linear regression to forecast metrics. For more sophisticated predictions, push ML model outputs as custom metrics.
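For intuition, `predict_linear` fits a least-squares line to the samples in the range and extrapolates it forward. A minimal pure-Python sketch of the same idea (not Prometheus's actual implementation):

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares fit over (timestamp, value) samples, extrapolated
    seconds_ahead past the last sample -- the same idea as PromQL's
    predict_linear()."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / sum(
        (t - mean_t) ** 2 for t, _ in samples
    )
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return slope * (last_t + seconds_ahead) + intercept

# Request rate starting at 100 req/s and growing 1 req/s per minute,
# sampled every 5 minutes for an hour; forecast one hour past the last sample.
samples = [(t, 100 + t / 60) for t in range(0, 3600, 300)]
print(round(predict_linear(samples, 3600), 6))  # → 215.0
```

With perfectly linear growth the extrapolation is exact; real traffic rarely is, which is why the Prophet-based pipeline later in this article models daily and weekly seasonality explicitly.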
Building a Custom Prediction Pipeline
For teams that want full control, here's how to build a custom capacity predictor:
Step 1: Collect Historical Data
```python
# fetch_metrics.py
from datetime import datetime, timedelta

from prometheus_api_client import PrometheusConnect

prom = PrometheusConnect(url="http://prometheus:9090")

# Fetch 30 days of CPU usage
cpu_data = prom.custom_query_range(
    query='sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (deployment)',
    start_time=datetime.now() - timedelta(days=30),
    end_time=datetime.now(),
    step="5m",
)

# Fetch 30 days of memory usage
memory_data = prom.custom_query_range(
    query='sum(container_memory_working_set_bytes{namespace="production"}) by (deployment)',
    start_time=datetime.now() - timedelta(days=30),
    end_time=datetime.now(),
    step="5m",
)
```

Step 2: Train a Prediction Model
```python
# train_model.py
import pandas as pd
from prophet import Prophet

def train_cpu_predictor(deployment_name: str, cpu_data: list) -> Prophet:
    """Train a Prophet model for CPU prediction."""
    df = pd.DataFrame(cpu_data)
    df.columns = ["ds", "y"]  # Prophet requires these column names
    df["ds"] = pd.to_datetime(df["ds"], unit="s")
    model = Prophet(
        changepoint_prior_scale=0.05,
        seasonality_mode="multiplicative",
        daily_seasonality=True,
        weekly_seasonality=True,
    )
    model.fit(df)
    return model

def predict_next_24h(model: Prophet) -> pd.DataFrame:
    """Predict CPU usage for the next 24 hours."""
    future = model.make_future_dataframe(periods=288, freq="5min")  # 24h at 5m intervals
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(288)
```

Step 3: Convert Predictions to Scaling Decisions
```python
# scaler.py
from kubernetes import client, config

config.load_incluster_config()
apps_v1 = client.AppsV1Api()

def calculate_replicas(predicted_cpu: float, cpu_per_pod: float = 0.5, target_utilization: float = 0.7) -> int:
    """Calculate required replicas from predicted CPU."""
    required_capacity = predicted_cpu / target_utilization
    replicas = max(2, int(required_capacity / cpu_per_pod) + 1)  # Min 2 for HA
    return replicas

def scale_deployment(namespace: str, deployment: str, replicas: int):
    """Pre-scale a deployment based on prediction."""
    current = apps_v1.read_namespaced_deployment(deployment, namespace)
    current_replicas = current.spec.replicas
    if replicas != current_replicas:
        apps_v1.patch_namespaced_deployment_scale(
            deployment, namespace,
            {"spec": {"replicas": replicas}}
        )
        print(f"Scaled {deployment}: {current_replicas} → {replicas}")
```

Step 4: Run as a CronJob
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: capacity-predictor
  namespace: platform
spec:
  schedule: "*/15 * * * *"  # Every 15 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: capacity-predictor
          containers:
            - name: predictor
              image: ghcr.io/my-org/capacity-predictor:latest
              env:
                - name: PROMETHEUS_URL
                  value: "http://prometheus.monitoring:9090"
                - name: TARGET_NAMESPACE
                  value: "production"
          restartPolicy: OnFailure
```

Real-World Impact: Before and After
A mid-size SaaS company (500 pods, $45K/month AWS bill) implemented AI capacity planning:
| Metric | Before | After | Change |
|---|---|---|---|
| Monthly cloud cost | $45,000 | $28,000 | -38% |
| Average CPU utilization | 22% | 58% | +164% |
| P99 latency during traffic spikes | 1200ms | 350ms | -71% |
| OOMKilled events per month | 12 | 1 | -92% |
| Manual scaling interventions | 8/month | 0 | -100% |
The savings came from:
- Right-sizing overprovisioned pods (biggest impact)
- Predictive scaling before traffic spikes (eliminated latency spikes)
- Automated node pool optimization (fewer idle nodes)
Getting Started: A 4-Week Plan
Week 1: Visibility
- Install Kubecost or OpenCost
- Export Prometheus metrics for 2+ weeks
- Identify your top 10 most overprovisioned deployments
Week 2: Quick wins
- Apply Kubecost right-sizing recommendations for non-critical workloads
- Set up VPA (Vertical Pod Autoscaler) in recommend-only mode
- Track cost reduction
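The VPA item above, in recommend-only mode, looks like this (target names are illustrative). With `updateMode: "Off"`, VPA publishes recommendations in its status without evicting or resizing any pods:

```yaml
# Recommend-only VPA: surfaces right-sizing suggestions, applies nothing
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # recommendations only
```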
Week 3: Predictive scaling
- Deploy KEDA with Prometheus triggers
- Use `predict_linear` for simple forecasting
- Set up pre-scaling for known traffic patterns (business hours, batch jobs)
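For the business-hours case, KEDA ships a cron scaler that pre-scales on a fixed schedule with no ML involved. A sketch with illustrative times and names:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-prescaler
spec:
  scaleTargetRef:
    name: api-server
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * 1-5    # scale up at 08:00 on weekdays
        end: 0 19 * * 1-5     # scale back down at 19:00
        desiredReplicas: "10"
```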
Week 4: ML models
- Train Prophet models on historical data
- Deploy the prediction CronJob
- Set up monitoring for prediction accuracy
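For the accuracy monitoring in that last step, one simple yardstick is mean absolute percentage error between each forecast and the value later observed (the logging and joining of forecast/actual pairs is assumed to exist):

```python
def mape(actual: list, predicted: list) -> float:
    """Mean absolute percentage error; skips zero actuals to avoid
    division by zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

# Three forecasts vs. what actually happened:
print(round(mape([100, 200, 400], [110, 180, 400]), 2))  # → 6.67
```

If MAPE drifts upward over weeks, retrain the model on fresher data or widen the prediction intervals used for scaling headroom.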
For learning the Kubernetes fundamentals that make capacity planning effective, KodeKloud offers excellent hands-on labs covering resource management, autoscaling, and cluster operations.
The best capacity plan isn't the one with the most buffer — it's the one that knows what's coming.