
Build an AI-Powered DevOps Chatbot with Streamlit on Kubernetes

Build a DevOps assistant chatbot that answers infrastructure questions, generates kubectl commands, and explains errors — deployed as a Streamlit app on Kubernetes.

DevOpsBoys · Jun 3, 2026 · 5 min read

A chatbot that knows your infrastructure, answers DevOps questions in plain English, and runs inside your cluster, where it can read live cluster state directly from the Kubernetes API.

Here's how to build and deploy it.


What We're Building

Engineer types: "Why is my deployment not rolling out?"

Chatbot responds:
"Your deployment my-app in namespace production has 0/3 pods ready.
Looking at the events, the issue is an ImagePullBackOff on the new
tag v2.1.0 — the image doesn't exist in your registry.

Run this to confirm:
kubectl get pods -n production

Fix: push the correct image tag or revert:
kubectl rollout undo deployment/my-app -n production"

Architecture

Streamlit UI (browser)
    → Streamlit app (in-cluster: app.py + chatbot.py)
        → Claude API (for reasoning)
        → Kubernetes API via the Python client (for live cluster data)

The key: the chatbot runs INSIDE the cluster so it can query real cluster state and include it in the AI context.


Setup

bash
pip install streamlit anthropic kubernetes

Step 1: Kubernetes Context Tool

The chatbot calls these functions to get live cluster data:

python
# k8s_tools.py
from datetime import datetime, timezone

from kubernetes import client, config

def setup() -> None:
    """Use the pod's service account in-cluster, ~/.kube/config otherwise."""
    try:
        config.load_incluster_config()   # Running inside cluster
    except config.ConfigException:
        config.load_kube_config()        # Local dev

setup()

def get_pods(namespace: str = "default") -> str:
    """One line per pod: phase and total container restarts."""
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace)
    result = []
    for pod in pods.items:
        status = pod.status.phase
        restarts = sum(c.restart_count for c in (pod.status.container_statuses or []))
        result.append(f"{pod.metadata.name}: {status}, restarts={restarts}")
    return "\n".join(result) or "No pods found"

def get_deployments(namespace: str = "default") -> str:
    """One line per deployment: ready vs. desired replicas."""
    apps = client.AppsV1Api()
    deps = apps.list_namespaced_deployment(namespace)
    result = []
    for d in deps.items:
        desired = d.spec.replicas or 0
        ready = d.status.ready_replicas or 0
        result.append(f"{d.metadata.name}: {ready}/{desired} ready")
    return "\n".join(result) or "No deployments found"

# Sort-key fallback for events with no timestamp at all (client datetimes are tz-aware)
_OLDEST = datetime.min.replace(tzinfo=timezone.utc)

def _event_time(e) -> datetime:
    # last_timestamp can be None on newer events, so fall back to other datetimes
    return e.last_timestamp or e.event_time or e.metadata.creation_timestamp or _OLDEST

def get_recent_events(namespace: str = "default") -> str:
    v1 = client.CoreV1Api()
    events = v1.list_namespaced_event(namespace)
    # Only warning events, most recent 10. The sort key always returns a datetime;
    # mixing datetimes with a "" fallback would raise TypeError.
    warnings = [
        f"[{e.reason}] {e.involved_object.name}: {e.message}"
        for e in sorted(events.items, key=_event_time, reverse=True)
        if e.type == "Warning"
    ]
    return "\n".join(warnings[:10]) or "No warning events"

def get_node_status() -> str:
    """One line per node with its Ready condition."""
    v1 = client.CoreV1Api()
    nodes = v1.list_node()
    result = []
    for node in nodes.items:
        conditions = {c.type: c.status for c in (node.status.conditions or [])}
        ready = conditions.get("Ready", "Unknown")
        result.append(f"{node.metadata.name}: Ready={ready}")
    return "\n".join(result) or "No nodes found"
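
Pod phase alone can hide the actual failure: a pod stuck in ImagePullBackOff reports phase Pending, and a crash-looping pod often reports Running. Below is a sketch of a hypothetical describe_pod helper that also collects the waiting reason from each container's state (state.waiting.reason on the client's container status objects). It only reads attributes, so the example feeds it a stand-in built with SimpleNamespace rather than a live cluster.

```python
from types import SimpleNamespace


def describe_pod(pod) -> str:
    """Summarize a V1Pod-like object, including any container waiting reasons.

    Hypothetical helper: expects pod.metadata.name, pod.status.phase and
    pod.status.container_statuses shaped like kubernetes.client.V1Pod.
    """
    statuses = pod.status.container_statuses or []
    restarts = sum(c.restart_count for c in statuses)
    # A waiting container carries the real problem (ImagePullBackOff,
    # CrashLoopBackOff, ...), which the pod phase does not show.
    reasons = sorted({
        c.state.waiting.reason
        for c in statuses
        if c.state and c.state.waiting and c.state.waiting.reason
    })
    line = f"{pod.metadata.name}: {pod.status.phase}, restarts={restarts}"
    if reasons:
        line += f", waiting={','.join(reasons)}"
    return line


# Stand-in for a pod whose image tag does not exist in the registry
stuck_pod = SimpleNamespace(
    metadata=SimpleNamespace(name="my-app-7d9f"),
    status=SimpleNamespace(
        phase="Pending",
        container_statuses=[
            SimpleNamespace(
                restart_count=0,
                state=SimpleNamespace(waiting=SimpleNamespace(reason="ImagePullBackOff")),
            )
        ],
    ),
)
print(describe_pod(stuck_pod))  # my-app-7d9f: Pending, restarts=0, waiting=ImagePullBackOff
```

Inside get_pods, the loop body would then become result.append(describe_pod(pod)), which gives Claude the same signal an engineer would look for in kubectl get pods.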

Step 2: Claude-Powered Chat Backend

python
# chatbot.py
import anthropic
import os
from k8s_tools import get_pods, get_deployments, get_recent_events, get_node_status
 
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
 
SYSTEM_PROMPT = """You are a senior DevOps engineer assistant with direct access to a Kubernetes cluster.
 
You help engineers:
- Debug pod failures and deployment issues
- Generate kubectl commands for specific tasks
- Explain Kubernetes concepts and errors
- Suggest fixes for infrastructure problems
 
When answering, always:
1. Be specific and include actual commands
2. Explain WHY not just WHAT
3. Mention risks for destructive operations
4. Format kubectl commands in code blocks
 
You have access to real-time cluster data that will be provided with each question."""
 
def get_cluster_context(namespace: str = "default") -> str:
    """Build cluster context to include with every question."""
    return f"""Current cluster state (namespace: {namespace}):
 
PODS:
{get_pods(namespace)}
 
DEPLOYMENTS:
{get_deployments(namespace)}
 
RECENT WARNINGS:
{get_recent_events(namespace)}
 
NODES:
{get_node_status()}"""
 
def chat(messages: list, namespace: str = "default") -> str:
    """Send messages to Claude with cluster context."""
    
    cluster_context = get_cluster_context(namespace)
    
    # Add context to the latest user message
    enhanced_messages = messages[:-1] + [{
        "role": "user",
        "content": f"{messages[-1]['content']}\n\n---\n{cluster_context}"
    }]
    
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1500,
        system=SYSTEM_PROMPT,
        messages=enhanced_messages
    )
    
    return response.content[0].text
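
get_cluster_context makes four API calls per question, and any one of them can fail: listing nodes, for example, returns 403 Forbidden if the chatbot is later restricted to a namespaced Role. Here is a sketch of a hypothetical safe_section helper that turns a failed fetch into a placeholder line and caps very long sections so input tokens stay predictable. The 50-line cap is an arbitrary example value.

```python
from typing import Callable


def safe_section(title: str, fetch: Callable[[], str], max_lines: int = 50) -> str:
    """Render one context section without letting a failed API call abort the chat.

    Hypothetical helper: fetch is any zero-argument callable, e.g.
    lambda: get_pods("production") from k8s_tools.
    """
    try:
        body = fetch()
    except Exception as exc:  # e.g. a kubernetes ApiException when RBAC denies the call
        body = f"(unavailable: {type(exc).__name__})"
    lines = body.splitlines()
    if len(lines) > max_lines:
        # Keep the first max_lines lines and note how many were dropped
        omitted = len(lines) - max_lines
        lines = lines[:max_lines] + [f"... ({omitted} more omitted)"]
    return f"{title}:\n" + "\n".join(lines)


def forbidden_nodes() -> str:
    # Stand-in for get_node_status() failing under a namespaced Role
    raise PermissionError("nodes is forbidden")


print(safe_section("PODS", lambda: "web-1: Running, restarts=0"))
print(safe_section("NODES", forbidden_nodes))
```

get_cluster_context could then join sections such as safe_section("PODS", lambda: get_pods(namespace)) and safe_section("NODES", get_node_status) with blank lines. Catching Exception broadly is a deliberate choice here: partial cluster context is usually more useful to the model than an error page in the UI.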

Step 3: Streamlit UI

python
# app.py
import streamlit as st
from chatbot import chat
 
st.set_page_config(
    page_title="DevOps Assistant",
    page_icon="⚙️",
    layout="wide"
)
 
st.title("⚙️ DevOps Assistant")
st.caption("Ask me anything about your Kubernetes cluster")
 
# Namespace selector in sidebar
with st.sidebar:
    st.header("Settings")
    namespace = st.selectbox(
        "Namespace",
        ["default", "production", "staging", "kube-system"],
        index=0
    )
    
    if st.button("Clear Chat"):
        st.session_state.messages = []
        st.rerun()
    
    st.divider()
    st.markdown("**Example questions:**")
    examples = [
        "Why aren't my pods starting?",
        "How do I rollback my deployment?",
        "Which pods are restarting the most?",
        "Show me all failing pods",
        "How do I scale my app to 5 replicas?",
    ]
    for ex in examples:
        if st.button(ex, use_container_width=True):
            st.session_state.pending_question = ex
 
# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []
 
# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
 
# Show the chat input on every run; an example-button click takes priority
prompt = st.chat_input("Ask about your cluster...")
if "pending_question" in st.session_state:
    prompt = st.session_state.pending_question
    del st.session_state.pending_question
 
if prompt:
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    
    # Get AI response
    with st.chat_message("assistant"):
        with st.spinner("Checking cluster..."):
            response = chat(st.session_state.messages, namespace)
        st.markdown(response)
    
    st.session_state.messages.append({"role": "assistant", "content": response})
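
The namespace list in the sidebar is hardcoded, so it can drift from what actually exists in the cluster. A sketch of a hypothetical namespace_options helper that asks the API instead and falls back to the hardcoded list on any error. It assumes you also add "namespaces" to the resources in the ClusterRole from Step 4, since listing namespaces needs its own permission.

```python
from types import SimpleNamespace
from typing import Callable

# The list the sidebar uses today, kept as a fallback
DEFAULT_NAMESPACES = ["default", "production", "staging", "kube-system"]


def namespace_options(list_namespaces: Callable) -> list[str]:
    """Return sorted namespace names for the sidebar selectbox.

    list_namespaces is a zero-argument callable returning a V1NamespaceList-like
    object, e.g. client.CoreV1Api().list_namespace once credentials are loaded.
    """
    try:
        names = sorted(ns.metadata.name for ns in list_namespaces().items)
    except Exception:  # no permission, API unreachable, etc.
        return DEFAULT_NAMESPACES
    return names or DEFAULT_NAMESPACES


def fake_list_namespace():
    # Stand-in for CoreV1Api().list_namespace() in this example
    items = [SimpleNamespace(metadata=SimpleNamespace(name=n)) for n in ("tools", "default")]
    return SimpleNamespace(items=items)


print(namespace_options(fake_list_namespace))  # ['default', 'tools']
```

In app.py the selectbox options would then come from namespace_options(client.CoreV1Api().list_namespace), with client imported from kubernetes after k8s_tools has loaded the config. Falling back instead of raising keeps the sidebar usable even if the extra RBAC rule has not been applied yet.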

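Step 4: Deploy to Kubernetes

The Dockerfile expects a requirements.txt next to the .py files that lists the packages from the Setup step, one per line.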

dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# A wildcard source needs a destination directory ending in "/"
COPY *.py ./
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0", "--server.port=8501"]

yaml
# deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: devops-chatbot
  namespace: tools
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devops-chatbot
  namespace: tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: devops-chatbot
  template:
    metadata:
      labels:
        app: devops-chatbot  # must match spec.selector.matchLabels
    spec:
      serviceAccountName: devops-chatbot  # needs K8s API access
      containers:
        - name: chatbot
          image: your-registry/devops-chatbot:latest
          ports:
            - containerPort: 8501
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: chatbot-secrets
                  key: anthropic-key
---
# Service that the Ingress routes to
apiVersion: v1
kind: Service
metadata:
  name: devops-chatbot
  namespace: tools
spec:
  selector:
    app: devops-chatbot
  ports:
    - port: 8501
      targetPort: 8501
---
# RBAC: read-only cluster access
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: devops-chatbot
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "events", "nodes", "services", "jobs"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: devops-chatbot
subjects:
  - kind: ServiceAccount
    name: devops-chatbot
    namespace: tools
roleRef:
  kind: ClusterRole
  name: devops-chatbot
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: devops-chatbot
  namespace: tools
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth  # protect it!
spec:
  rules:
    - host: chatbot.internal.yourcompany.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: devops-chatbot
                port:
                  number: 8501

Before running kubectl apply -f deployment.yaml, create the tools namespace and the two Secrets these manifests reference: chatbot-secrets (with the key anthropic-key) and basic-auth (an htpasswd file stored under the key auth, the format ingress-nginx expects for basic auth).

Add Conversation Memory

For multi-turn conversations where context matters:

python
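from chatbot import chat

# Keep at most the last 10 earlier messages (5 user/assistant turns) to stay within token limits
MAX_HISTORY = 10  # keep this even so trimmed history starts on a user turn

def chat_with_memory(new_message: str, history: list, namespace: str) -> tuple[str, list]:
    # Trim BEFORE adding the new question. history alternates user/assistant and
    # ends on an assistant reply, so an even-sized tail still opens with "user".
    # (Trimming after the append, as 11 -> 10, would start on an assistant message.)
    # Slicing also builds a new list, so the caller's history is not mutated.
    history = history[-MAX_HISTORY:] + [{"role": "user", "content": new_message}]

    response = chat(history, namespace)
    history.append({"role": "assistant", "content": response})

    return response, history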
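
Counting messages only loosely bounds token usage, because a single assistant reply can run to max_tokens=1500 on its own. A sketch of an alternative that trims by size instead, using character count as a cheap stand-in for tokens (the two track each other only approximately). trim_to_budget and its 8,000-character default are hypothetical choices, not part of the original code.

```python
def trim_to_budget(history: list[dict], max_chars: int = 8000) -> list[dict]:
    """Drop the oldest user/assistant pairs until the history fits a size budget.

    Expects an alternating history that starts and ends with a "user" message
    (earlier turns plus the new question). Dropping two messages at a time keeps
    it starting on a user turn; the newest question is always kept.
    """
    trimmed = list(history)
    while len(trimmed) > 1 and sum(len(m["content"]) for m in trimmed) > max_chars:
        trimmed = trimmed[2:]  # remove the oldest question and its answer
    return trimmed


history = [
    {"role": "user", "content": "Why is my-app crashing?"},
    {"role": "assistant", "content": "x" * 9000},  # a very long earlier answer
    {"role": "user", "content": "How do I roll it back?"},
]
print([m["role"] for m in trim_to_budget(history)])  # ['user']
```

This could run on the history right before it is passed to chat(), either instead of the message-count slice or after it.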

Cost

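Each question uses roughly 1,000 to 2,000 input tokens (cluster context + history) plus about 500 output tokens. Assuming Claude Sonnet list pricing of $3 per million input tokens and $15 per million output tokens, that is about $0.01 to $0.014 per question.

For a team of 10 engineers each asking 20 questions a day (200 questions), that comes to roughly $2 to $3 per day. Cheaper than one hour of debugging time.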
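
The estimate is simple arithmetic, sketched below under assumed list prices of $3 per million input tokens and $15 per million output tokens; substitute the current prices for whichever model you call. The function names are illustrative only.

```python
# Assumed prices in USD per million tokens; check current pricing for your model
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def cost_per_question(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the assumed per-token prices."""
    return (input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def daily_team_cost(engineers: int, questions_each: int, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost per day if every engineer asks questions_each questions."""
    return engineers * questions_each * cost_per_question(input_tokens, output_tokens)


for tokens_in in (1000, 2000):
    per_q = cost_per_question(tokens_in, 500)
    per_day = daily_team_cost(10, 20, tokens_in, 500)
    print(f"{tokens_in} input tokens: ${per_q:.4f}/question, ${per_day:.2f}/day")
# 1000 input tokens: $0.0105/question, $2.10/day
# 2000 input tokens: $0.0135/question, $2.70/day
```

Output tokens dominate at these prices, so capping max_tokens moves the bill more than trimming the cluster context does.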

Get your Anthropic API key to start building.
