
AI-Driven Capacity Planning for Kubernetes Clusters (2026)

How to use AI and machine learning for Kubernetes capacity planning. Covers predictive autoscaling, cost optimization, tools like StormForge and Kubecost, and building custom ML models for resource forecasting.

DevOpsBoys · Mar 28, 2026 · 5 min read

Your Kubernetes cluster is either overprovisioned (wasting money) or underprovisioned (risking outages). Traditional capacity planning uses rules of thumb and gut feelings — "add 30% buffer" or "scale when CPU hits 70%." These approaches either waste 40% of your cloud budget or leave you one traffic spike away from downtime.

AI-driven capacity planning replaces guesswork with data. Machine learning models analyze historical patterns, predict future demand, and automatically right-size your infrastructure. Here's how to implement it.

The Problem with Manual Capacity Planning

Traditional capacity planning fails because:

  1. Traffic patterns are complex — workloads have hourly, daily, weekly, and seasonal patterns. Humans can't model multi-dimensional trends.

  2. Resource requests are wrong — developers set CPU/memory requests once during deployment and never update them. Studies show 65% of Kubernetes resource requests are overprovisioned by 2x or more.

  3. Reactive scaling is too late — Horizontal Pod Autoscaler (HPA) reacts to current metrics. By the time CPU hits 80%, latency has already spiked. You need to scale before the load arrives.

  4. Cost optimization conflicts with reliability — without predictive models, teams err on the side of overprovisioning to avoid risk.

How AI Capacity Planning Works

Historical Data → Feature Engineering → ML Model → Predictions → Actions
                                                          ↓
                                                   ┌──────────────┐
                                                   │ Scale pods   │
                                                   │ Resize nodes │
                                                   │ Right-size   │
                                                   │ requests     │
                                                   └──────────────┘

Data sources:

  • Prometheus metrics (CPU, memory, network, custom metrics)
  • Request latency and error rates
  • Historical scaling events
  • External signals (calendar events, marketing campaigns, deployments)

Predictions:

  • "Tomorrow at 9 AM, the API service will need 12 pods instead of 6"
  • "This deployment's actual memory usage is 256 MB, but requests are set to 1 GB"
  • "Black Friday will require 3x current node capacity"

Tool 1: StormForge — ML-Powered Resource Optimization

StormForge uses machine learning to continuously right-size pod resource requests and limits:

bash
# Install StormForge agent
helm repo add stormforge https://registry.stormforge.io/chartrepo/library
helm install stormforge-agent stormforge/stormforge-agent \
  --namespace stormforge-system \
  --create-namespace \
  --set clusterName=production

StormForge observes actual resource usage over time and generates optimized recommendations:

yaml
# Example StormForge recommendation
apiVersion: optimize.stormforge.io/v1beta2
kind: Recommendation
metadata:
  name: api-server
spec:
  target:
    kind: Deployment
    name: api-server
  recommendations:
    - containerName: api
      resources:
        requests:
          cpu: "250m"      # Was: 1000m (75% reduction)
          memory: "384Mi"   # Was: 1Gi (62% reduction)
        limits:
          cpu: "500m"
          memory: "512Mi"

In auto-pilot mode, StormForge applies these changes automatically:

yaml
apiVersion: optimize.stormforge.io/v1beta2
kind: Application
metadata:
  name: api-server
spec:
  resources:
    - kubernetes:
        selector:
          matchLabels:
            app: api-server
  optimization:
    mode: auto  # Automatically apply recommendations
    settings:
      reliability: high  # Conservative right-sizing

Tool 2: Kubecost — Cost Visibility + Recommendations

Kubecost provides cost allocation and right-sizing recommendations:

bash
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer/ \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="YOUR_TOKEN"

Query the Kubecost API for savings:

bash
# Get right-sizing recommendations
curl -s http://kubecost:9090/model/savings/requestSizing | jq '.[] | {
  namespace: .namespace,
  controller: .controllerName,
  currentCPU: .currentRequest.cpu,
  recommendedCPU: .recommendedRequest.cpu,
  currentMemory: .currentRequest.memory,
  recommendedMemory: .recommendedRequest.memory,
  monthlySavings: .monthlySavings
}'

Example output:

json
{
  "namespace": "production",
  "controllerName": "order-service",
  "currentCPU": "2000m",
  "recommendedCPU": "450m",
  "currentMemory": "2Gi",
  "recommendedMemory": "768Mi",
  "monthlySavings": 87.50
}
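
The per-workload numbers add up quickly. A small helper can total them across the API response — a sketch, assuming the `monthlySavings` field shape shown in the example output above (verify the exact schema against your Kubecost version):

```python
# Sum Kubecost's per-workload savings estimates into one dollar figure.
# The "monthlySavings" field matches the example response above; treat the
# endpoint and exact schema as an assumption for your Kubecost version.
def total_monthly_savings(recommendations: list) -> float:
    return round(sum(r.get("monthlySavings", 0.0) for r in recommendations), 2)
```

Feed it the parsed JSON array from the `requestSizing` endpoint to get a single figure for prioritizing which namespaces to right-size first.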

Tool 3: KEDA — Event-Driven Predictive Scaling

KEDA (Kubernetes Event-Driven Autoscaling) can scale based on external metrics, including ML model predictions:

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-server-scaler
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 3
  maxReplicaCount: 50
  triggers:
    # Scale based on Prometheus metrics
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_requests_per_second
        threshold: "100"
        query: sum(rate(http_requests_total{app="api-server"}[2m]))
 
    # Scale based on predicted traffic from ML model
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: predicted_traffic
        threshold: "100"
        query: predict_linear(http_requests_total{app="api-server"}[1h], 3600)

The predict_linear function in Prometheus uses linear regression to forecast metrics. For more sophisticated predictions, push ML model outputs as custom metrics.
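
One minimal way to get those custom predictions in front of Prometheus is to render them in the text exposition format and serve them from an endpoint Prometheus scrapes (or push via the Pushgateway). A sketch — the metric name echoes the KEDA trigger above, but the label set is a hypothetical choice:

```python
# Render an ML prediction in Prometheus text exposition format. Serve this
# from a scraped HTTP endpoint or push it via the Pushgateway; the metric
# and label names here are illustrative, not prescribed.
def render_prediction(app: str, predicted_rps: float) -> str:
    return (
        "# HELP predicted_traffic Forecasted requests per second\n"
        "# TYPE predicted_traffic gauge\n"
        f'predicted_traffic{{app="{app}"}} {predicted_rps}\n'
    )
```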

Building a Custom Prediction Pipeline

For teams that want full control, here's how to build a custom capacity predictor:

Step 1: Collect Historical Data

python
# fetch_metrics.py
from datetime import datetime, timedelta

from prometheus_api_client import PrometheusConnect

prom = PrometheusConnect(url="http://prometheus:9090")
 
# Fetch 30 days of CPU usage
cpu_data = prom.custom_query_range(
    query='sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (deployment)',
    start_time=datetime.now() - timedelta(days=30),
    end_time=datetime.now(),
    step="5m"
)
 
# Fetch 30 days of memory usage
memory_data = prom.custom_query_range(
    query='sum(container_memory_working_set_bytes{namespace="production"}) by (deployment)',
    start_time=datetime.now() - timedelta(days=30),
    end_time=datetime.now(),
    step="5m"
)
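
The raw range-query result needs one shaping step before training. A sketch, assuming the standard Prometheus range-query structure (`[{"metric": {...}, "values": [[ts, "val"], ...]}]`) and the `deployment` label from the queries above:

```python
# Shape a raw Prometheus range-query result into (timestamp, value) pairs.
# Prometheus returns each sample as [unix_ts, "stringified_float"], grouped
# into one series per label set; we pick out a single deployment's series.
def to_training_rows(result: list, deployment: str) -> list:
    for series in result:
        if series.get("metric", {}).get("deployment") == deployment:
            return [(int(ts), float(v)) for ts, v in series["values"]]
    return []
```

The resulting `(ds, y)` pairs slot directly into the Prophet training step in Step 2.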

Step 2: Train a Prediction Model

python
# train_model.py
import pandas as pd
from prophet import Prophet
 
def train_cpu_predictor(deployment_name: str, cpu_data: list) -> Prophet:
    """Train a Prophet model for CPU prediction."""
    df = pd.DataFrame(cpu_data)
    df.columns = ["ds", "y"]  # Prophet requires these column names
    df["ds"] = pd.to_datetime(df["ds"], unit="s")
 
    model = Prophet(
        changepoint_prior_scale=0.05,
        seasonality_mode="multiplicative",
        daily_seasonality=True,
        weekly_seasonality=True,
    )
    model.fit(df)
    return model
 
def predict_next_24h(model: Prophet) -> pd.DataFrame:
    """Predict CPU usage for the next 24 hours."""
    future = model.make_future_dataframe(periods=288, freq="5min")  # 24h at 5m intervals
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(288)

Step 3: Convert Predictions to Scaling Decisions

python
# scaler.py
from kubernetes import client, config
 
config.load_incluster_config()
apps_v1 = client.AppsV1Api()
 
def calculate_replicas(predicted_cpu: float, cpu_per_pod: float = 0.5, target_utilization: float = 0.7) -> int:
    """Calculate required replicas from predicted CPU."""
    required_capacity = predicted_cpu / target_utilization
    replicas = max(2, int(required_capacity / cpu_per_pod) + 1)  # Min 2 for HA
    return replicas
 
def scale_deployment(namespace: str, deployment: str, replicas: int):
    """Pre-scale a deployment based on prediction."""
    current = apps_v1.read_namespaced_deployment(deployment, namespace)
    current_replicas = current.spec.replicas
 
    if replicas != current_replicas:
        apps_v1.patch_namespaced_deployment_scale(
            deployment, namespace,
            {"spec": {"replicas": replicas}}
        )
        print(f"Scaled {deployment}: {current_replicas} → {replicas}")
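
Steps 2 and 3 still need glue: pick a conservative peak from the forecast window and turn it into a replica count. A sketch — the row shape mirrors Prophet's `yhat` columns, and the helper names are mine, not from a library:

```python
# Glue between prediction and scaling: take the upper confidence bound over
# the forecast window (erring toward reliability) and convert it to replicas
# using the same math as calculate_replicas above.
def peak_predicted_cpu(forecast_rows: list) -> float:
    """forecast_rows: [(ds, yhat, yhat_lower, yhat_upper), ...]"""
    return max(row[3] for row in forecast_rows)

def plan_replicas(forecast_rows: list, cpu_per_pod: float = 0.5,
                  target_utilization: float = 0.7) -> int:
    peak = peak_predicted_cpu(forecast_rows)
    required_capacity = peak / target_utilization
    return max(2, int(required_capacity / cpu_per_pod) + 1)  # min 2 for HA
```

For example, a forecast peaking at 2.8 cores needs 2.8 / 0.7 = 4 cores of capacity with the defaults, i.e. 9 pods at 0.5 cores each (8 plus 1 of headroom) — pass that to `scale_deployment`.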

Step 4: Run as a CronJob

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: capacity-predictor
  namespace: platform
spec:
  schedule: "*/15 * * * *"  # Every 15 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: capacity-predictor
          containers:
            - name: predictor
              image: ghcr.io/my-org/capacity-predictor:latest
              env:
                - name: PROMETHEUS_URL
                  value: "http://prometheus.monitoring:9090"
                - name: TARGET_NAMESPACE
                  value: "production"
          restartPolicy: OnFailure

Real-World Impact: Before and After

A mid-size SaaS company (500 pods, $45K/month AWS bill) implemented AI capacity planning:

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Monthly cloud cost | $45,000 | $28,000 | -38% |
| Average CPU utilization | 22% | 58% | +164% |
| P99 latency during traffic spikes | 1200ms | 350ms | -71% |
| OOMKilled events per month | 12 | 1 | -92% |
| Manual scaling interventions | 8/month | 0 | -100% |

The savings came from:

  • Right-sizing overprovisioned pods (biggest impact)
  • Predictive scaling before traffic spikes (eliminated latency spikes)
  • Automated node pool optimization (fewer idle nodes)

Getting Started: A 4-Week Plan

Week 1: Visibility

  • Install Kubecost or OpenCost
  • Export Prometheus metrics for 2+ weeks
  • Identify your top 10 most overprovisioned deployments

Week 2: Quick wins

  • Apply Kubecost right-sizing recommendations for non-critical workloads
  • Set up VPA (Vertical Pod Autoscaler) in recommend-only mode
  • Track cost reduction

Week 3: Predictive scaling

  • Deploy KEDA with Prometheus triggers
  • Use predict_linear for simple forecasting
  • Set up pre-scaling for known traffic patterns (business hours, batch jobs)

Week 4: ML models

  • Train Prophet models on historical data
  • Deploy the prediction CronJob
  • Set up monitoring for prediction accuracy
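
For the Week 4 accuracy monitoring, a simple starting metric (my suggestion, not a prescribed one) is the mean absolute percentage error between predicted and observed usage:

```python
# Mean absolute percentage error between what the model predicted and what
# actually happened. Alert when this drifts above a threshold (say 20%) so
# a stale model can't silently drive scaling decisions; zero-valued actuals
# are skipped to avoid division by zero.
def mape(actual: list, predicted: list) -> float:
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)
```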

For learning the Kubernetes fundamentals that make capacity planning effective, KodeKloud offers excellent hands-on labs covering resource management, autoscaling, and cluster operations.


The best capacity plan isn't the one with the most buffer — it's the one that knows what's coming.
