Build an AI Kubernetes Resource Rightsizer with Claude API
Build a Python script that reads kubectl top output and current resource requests/limits, sends it to Claude API (claude-haiku-4-5), and gets back specific CPU/memory rightsizing recommendations to cut cloud costs by 30-40%.
Kubernetes resource requests and limits are almost always wrong. Teams set them once at deployment time and forget them. Over-provisioned pods waste money. Under-provisioned pods cause OOMKills and throttling. Manual rightsizing takes hours per cluster.
This guide builds a Python script that reads actual pod usage from kubectl top, compares it against current requests/limits, and uses Claude API to generate specific rightsizing recommendations — the kind your SRE team would take hours to produce manually.
What We're Building
A script that:
- Pulls pod CPU/memory usage via
kubectl top pods --all-namespaces - Reads current requests/limits via
kubectl get pods - Sends both to Claude claude-haiku-4-5-20251001 (cheap + fast for structured analysis)
- Outputs a rightsizing report with specific new values per pod
Cost to run: roughly $0.002 per cluster scan with Haiku.
Prerequisites
pip install anthropic kubernetesYou need kubectl configured with access to your cluster and an Anthropic API key.
export ANTHROPIC_API_KEY="sk-ant-..."Full Python Script
import subprocess
import json
import os
from anthropic import Anthropic
client = Anthropic()
def run_kubectl(args: list[str]) -> str:
result = subprocess.run(
["kubectl"] + args,
capture_output=True,
text=True,
timeout=30
)
if result.returncode != 0:
raise RuntimeError(f"kubectl error: {result.stderr}")
return result.stdout
def get_pod_usage() -> list[dict]:
"""Get actual CPU/memory usage from kubectl top."""
output = run_kubectl(["top", "pods", "--all-namespaces", "--no-headers"])
pods = []
for line in output.strip().split("\n"):
if not line:
continue
parts = line.split()
if len(parts) >= 4:
pods.append({
"namespace": parts[0],
"name": parts[1],
"cpu_usage": parts[2], # e.g. "45m"
"memory_usage": parts[3] # e.g. "128Mi"
})
return pods
def get_pod_resources() -> list[dict]:
"""Get current requests/limits for all pods."""
output = run_kubectl([
"get", "pods", "--all-namespaces",
"-o", "json"
])
data = json.loads(output)
resources = []
for item in data.get("items", []):
namespace = item["metadata"]["namespace"]
name = item["metadata"]["name"]
for container in item["spec"].get("containers", []):
res = container.get("resources", {})
requests = res.get("requests", {})
limits = res.get("limits", {})
resources.append({
"namespace": namespace,
"pod": name,
"container": container["name"],
"cpu_request": requests.get("cpu", "not set"),
"memory_request": requests.get("memory", "not set"),
"cpu_limit": limits.get("cpu", "not set"),
"memory_limit": limits.get("memory", "not set"),
})
return resources
def merge_usage_and_resources(
usage: list[dict],
resources: list[dict]
) -> list[dict]:
"""Match usage data with resource config."""
usage_map = {
f"{p['namespace']}/{p['name']}": p for p in usage
}
merged = []
for r in resources:
key = f"{r['namespace']}/{r['pod']}"
u = usage_map.get(key, {})
merged.append({
**r,
"cpu_usage": u.get("cpu_usage", "unknown"),
"memory_usage": u.get("memory_usage", "unknown"),
})
return merged
def get_rightsizing_recommendations(pod_data: list[dict]) -> str:
"""Send pod data to Claude and get rightsizing recommendations."""
# Limit to first 30 pods to keep prompt size manageable
sample = pod_data[:30]
pod_summary = "\n".join([
f"- {p['namespace']}/{p['pod']} ({p['container']}): "
f"usage={p['cpu_usage']} CPU / {p['memory_usage']} RAM | "
f"request={p['cpu_request']} CPU / {p['memory_request']} RAM | "
f"limit={p['cpu_limit']} CPU / {p['memory_limit']} RAM"
for p in sample
])
prompt = f"""You are a Kubernetes cost optimization expert. Analyze the pod resource usage vs requests/limits below and provide rightsizing recommendations.
Pod data (format: namespace/pod (container): usage | request | limit):
{pod_summary}
For each pod that is significantly over-provisioned or under-provisioned, recommend new values.
Rules:
- CPU request should be ~1.5x the average usage
- Memory request should be ~1.3x the average usage
- CPU limit should be 2-3x the request (allow for burst)
- Memory limit should be 1.5x the request (OOMKill prevention)
- Skip pods where "not set" or "unknown" - they need manual review
- Flag any pod using >80% of its limit as at-risk
Respond in this format:
## Rightsizing Recommendations
### Over-provisioned (safe to reduce):
[list pods with current vs recommended values]
### Under-provisioned (at risk, increase immediately):
[list pods with current vs recommended values]
### At-risk (using >80% of limit):
[list pods]
### Estimated cost savings:
[rough estimate based on reduction in CPU/memory requests]"""
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=2000,
messages=[
{"role": "user", "content": prompt}
]
)
return response.content[0].text
def main():
print("Fetching pod usage from kubectl top...")
usage = get_pod_usage()
print(f"Found {len(usage)} pods with usage data")
print("Fetching current resource requests/limits...")
resources = get_pod_resources()
print(f"Found {len(resources)} container resource configs")
print("Merging data...")
merged = merge_usage_and_resources(usage, resources)
print("Sending to Claude claude-haiku-4-5 for analysis...")
recommendations = get_rightsizing_recommendations(merged)
print("\n" + "="*60)
print(recommendations)
print("="*60)
# Save to file
with open("rightsizing-report.md", "w") as f:
f.write(recommendations)
print("\nReport saved to rightsizing-report.md")
if __name__ == "__main__":
main()Sample Output
==============================
## Rightsizing Recommendations
### Over-provisioned (safe to reduce):
- **production/payment-api (app)**
Current: request=500m CPU / 1Gi RAM, limit=1000m CPU / 2Gi RAM
Usage: 45m CPU / 128Mi RAM
Recommended: request=70m CPU / 170Mi RAM, limit=150m CPU / 260Mi RAM
Savings: ~430m CPU, ~860Mi RAM per replica (3 replicas = significant)
- **staging/frontend-v2 (nginx)**
Current: request=200m CPU / 512Mi RAM, limit=500m CPU / 1Gi RAM
Usage: 8m CPU / 32Mi RAM
Recommended: request=15m CPU / 45Mi RAM, limit=30m CPU / 70Mi RAM
### Under-provisioned (at risk, increase immediately):
- **production/ml-inference (model-server)**
Current: request=500m CPU / 2Gi RAM, limit=500m CPU / 2Gi RAM
Usage: 498m CPU / 1.9Gi RAM
Recommended: request=750m CPU / 2.5Gi RAM, limit=1500m CPU / 3.8Gi RAM
Risk: CPU throttling and near OOMKill threshold
### At-risk (using >80% of limit):
- production/ml-inference — 99% CPU, 95% memory (CRITICAL)
- production/redis-cache — 87% memory
### Estimated cost savings:
Reducing over-provisioned pods could save approximately 2.1 vCPU and 6Gi RAM
across 8 pods. On EKS with m5.xlarge nodes (~$0.192/hr), this represents
roughly $35-50/month savings or an opportunity to remove 1-2 nodes.
==============================
Apply the Changes
Once you have the recommendations, update your Helm values or Kubernetes manifests:
# values.yaml
resources:
requests:
cpu: "70m"
memory: "170Mi"
limits:
cpu: "150m"
memory: "260Mi"Then roll out with zero downtime:
helm upgrade my-app ./chart -f values.yamlRun as a Weekly CronJob
Add to your CI/CD or run as a Kubernetes CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
name: rightsizer
spec:
schedule: "0 9 * * 1" # Every Monday at 9am
jobTemplate:
spec:
template:
spec:
serviceAccountName: rightsizer-sa
containers:
- name: rightsizer
image: python:3.12-slim
command: ["python", "/scripts/rightsizer.py"]
env:
- name: ANTHROPIC_API_KEY
valueFrom:
secretKeyRef:
name: anthropic-secret
key: api-key
restartPolicy: OnFailureThe RBAC for the service account needs get/list on pods:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: rightsizer-role
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list"]
- apiGroups: ["metrics.k8s.io"]
resources: ["pods", "nodes"]
verbs: ["get", "list"]Real-World Results
Teams running this weekly on EKS clusters with 50-200 pods typically find:
- 20-40% of pods are over-provisioned by 3x or more
- 5-10% are under-provisioned and actively throttled
- Rightsizing saves $200-800/month on mid-size clusters
The key insight Claude provides isn't just the math — it's the prioritization: which pods are at-risk right now vs which are safe to reduce gradually. That prioritization saves SRE time and prevents incidents from aggressive rightsizing.
Today I Fixed
Short real fixes from production — posted daily
Stay ahead of the curve
Get the latest DevOps, Kubernetes, AWS, and AI/ML guides delivered straight to your inbox. No spam — just practical engineering content.
Related Articles
Argo Workflows vs Prefect vs Airflow — Best for ML Pipelines 2026
Choosing a workflow orchestrator for your ML pipelines? Argo Workflows, Prefect, and Apache Airflow each have distinct strengths. Here's which to pick for your use case.
Best DevOps Tools Every Engineer Should Know in 2026
A comprehensive guide to the essential DevOps tools for containers, CI/CD, infrastructure, monitoring, and security — curated for practicing engineers.
Build an AI AWS Security Auditor with Claude API and Boto3
Use Python, boto3, and the Claude API to automatically audit your AWS environment for security misconfigurations and get AI-powered remediation recommendations.