Build LLM-Powered Runbook Automation with Haystack and Kubernetes
Turn your static runbooks into an AI system that answers 'what do I do when X happens' with step-by-step instructions retrieved from your actual documentation.
Your runbooks are in Confluence. When an alert fires at 2 AM, your on-call engineer has to find the right page, read through it, and figure out what to do. This system does that for them: it retrieves the relevant runbook section and generates step-by-step instructions.
Architecture: RAG for Runbooks
Alert fires (PagerDuty/Slack)
→ Extract: what service, what error
→ Haystack: find relevant runbook sections (vector search)
→ Claude: generate step-by-step fix using retrieved context
→ Slack: "Here's what to do:"
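RAG (Retrieval-Augmented Generation) means the system first retrieves the most relevant chunks of YOUR runbooks and puts them into the prompt, so Claude answers from your documentation rather than from generic knowledge.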
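To make the retrieve-then-generate idea concrete, here is a dependency-free toy version of the three stages the real pipeline performs: chunk the runbooks, retrieve the chunks most relevant to the incident, and build an augmented prompt. It is an illustration only: word overlap stands in for embedding similarity, the sentence splitter is naive, and `chunk_sentences`, `retrieve` and `build_prompt` are hypothetical names, not Haystack APIs.

```python
import re


def chunk_sentences(text: str, length: int = 5, overlap: int = 2) -> list[str]:
    """Split text into windows of `length` sentences sharing `overlap` sentences."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    # Naive sentence boundary: ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = length - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + length]))
        if start + length >= len(sentences):
            break  # this window already includes the last sentence
    return chunks


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by how many query words they share (stand-in for vector search)."""
    query_words = _words(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & _words(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, excerpts: list[str]) -> str:
    """Augment the question with the retrieved runbook text."""
    context = "\n---\n".join(excerpts)
    return f"Answer using only these runbook excerpts:\n{context}\n\nQuestion: {question}"


runbook = (
    "Payment service pods may crash loop after a bad deploy. "
    "Run kubectl describe pod to check the last state. "
    "If the reason is OOMKilled, raise the memory limit. "
    "If logs show database timeouts, verify DB_HOST. "
    "Escalate when the database itself is unreachable."
)
chunks = chunk_sentences(runbook, length=2, overlap=1)
prompt = build_prompt("payment pods OOMKilled", retrieve("OOMKilled memory limit", chunks, top_k=1))
```

In the real system the string produced at the last step is what Claude receives; Haystack's `PromptBuilder` plays that role in Step 2, and a vector search over Qdrant replaces the word-overlap ranking.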
Setup
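pip install haystack-ai anthropic-haystack qdrant-haystack \
sentence-transformers qdrant-client anthropic \
markdown-it-py mdit_plain nltk slack-sdk
qdrant-haystack provides the Qdrant document store and retriever under haystack_integrations; MarkdownToDocument needs markdown-it-py and mdit_plain; recent Haystack releases use nltk when DocumentSplitter splits by sentence; slack-sdk is used by the Slack handler in Step 3.
Step 1: Index Your Runbooks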
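`index_runbooks` below expects Markdown files, while `index_document` takes plain text. If your runbooks live in Confluence, the page body you fetch is usually in Confluence's storage format, an XHTML-based markup in which code macros keep their contents inside CDATA sections. A minimal stdlib sketch for turning that into indexable text follows; `storage_to_text` is a hypothetical helper, fetching the pages is left out, the set of tags that start a new line is an assumption, and rich macros (tables, panels, Jira links) are flattened rather than preserved.

```python
import html
import re
from html.parser import HTMLParser

# Code macros store their body as <![CDATA[...]]>
_CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)
# Tags around which the text should break onto a new line (assumed set)
_BLOCK_TAGS = {
    "p", "div", "br", "li", "tr", "pre", "h1", "h2", "h3", "h4", "h5", "h6",
    "ac:plain-text-body",
}


class _TextExtractor(HTMLParser):
    """Collects text content, inserting line breaks around block-level tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so entities arrive decoded
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def storage_to_text(storage_xhtml: str) -> str:
    """Convert a Confluence storage-format page body into non-empty plain-text lines."""
    # Inline CDATA as escaped text so HTMLParser reports it as ordinary data
    source = _CDATA.sub(lambda m: html.escape(m.group(1)), storage_xhtml)
    parser = _TextExtractor()
    parser.feed(source)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

The result can be passed straight to `index_document(title, storage_to_text(body), service)`. Keeping commands from code macros on their own lines matters here, because the RAG prompt asks Claude to quote the actual commands from the runbooks.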
# indexer.py
import glob
from pathlib import Path

from haystack import Document, Pipeline
from haystack.components.converters import MarkdownToDocument
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

# Vector store (Qdrant running in Kubernetes)
document_store = QdrantDocumentStore(
    url="http://qdrant.vector-db.svc.cluster.local:6333",
    index="runbooks",
    embedding_dim=384,  # must match the embedding model's output size
    recreate_index=False,
)


def build_indexing_pipeline(with_converter: bool) -> Pipeline:
    """splitter -> embedder -> writer, optionally preceded by a Markdown converter.

    Running the components inside a Pipeline lets Haystack call warm_up()
    (e.g. loading the embedding model) before the first run.
    """
    pipeline = Pipeline()
    if with_converter:
        pipeline.add_component("converter", MarkdownToDocument())
    pipeline.add_component("splitter", DocumentSplitter(
        split_by="sentence",
        split_length=5,   # 5 sentences per chunk
        split_overlap=2,  # 2-sentence overlap keeps context across chunk boundaries
    ))
    pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(
        model="all-MiniLM-L6-v2"  # fast, good quality, 384 dims
    ))
    # OVERWRITE lets you re-run indexing without duplicate-ID errors
    pipeline.add_component("writer", DocumentWriter(
        document_store=document_store, policy=DuplicatePolicy.OVERWRITE
    ))
    if with_converter:
        pipeline.connect("converter", "splitter")
    pipeline.connect("splitter", "embedder")
    pipeline.connect("embedder", "writer")
    return pipeline


def index_runbooks(runbook_dir: str):
    """Index all Markdown runbook files under runbook_dir (recursively)."""
    runbook_files = glob.glob(f"{runbook_dir}/**/*.md", recursive=True)
    print(f"Indexing {len(runbook_files)} runbooks...")
    build_indexing_pipeline(with_converter=True).run({
        "converter": {
            "sources": runbook_files,
            # "title" is what the RAG prompt shows as each excerpt's source
            "meta": [{"source": f, "type": "runbook", "title": Path(f).stem} for f in runbook_files],
        }
    })
    print(f"Indexed {document_store.count_documents()} chunks")


# Or index text directly (e.g., pages pulled from the Confluence API)
def index_document(title: str, content: str, service: str):
    """Index a single runbook document."""
    doc = Document(content=content, meta={"title": title, "service": service})
    build_indexing_pipeline(with_converter=False).run({"splitter": {"documents": [doc]}})
Step 2: RAG Query Pipeline
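The query pipeline in this step embeds `f"{service} {alert} {error_message}"` as the search query. Alertmanager alert names are usually CamelCase (`KubernetesPodCrashLooping`) and service names are often hyphenated, while runbooks tend to say "crash looping" or "payment service". One possible tweak, an untested assumption rather than a measured improvement, is to normalise those parts before embedding. `split_camel_case` and `build_search_query` are hypothetical helpers.

```python
import re

# Boundary between a lowercase letter/digit and an uppercase letter ("PodCrash"),
# or inside an acronym right before a capitalised word ("APIDown")
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_camel_case(name: str) -> str:
    """Insert spaces at CamelCase word boundaries."""
    return _CAMEL_BOUNDARY.sub(" ", name)


def build_search_query(service: str, alert: str, error_message: str) -> str:
    """Hypothetical drop-in for the f-string query in get_runbook_guidance."""
    parts = [
        service.replace("-", " ").replace("_", " "),  # payment-service -> payment service
        split_camel_case(alert),
        error_message,
    ]
    # Skip empty parts so the query has no stray whitespace
    return " ".join(p.strip() for p in parts if p and p.strip())
```

Inside `get_runbook_guidance`, `query = build_search_query(service, alert, error_message)` would replace the f-string while the prompt still receives the raw values. Subword tokenizers already split some CamelCase names, so compare retrieval results on a few real alerts before adopting it.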
# rag_pipeline.py
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.generators.anthropic import AnthropicGenerator
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever

from indexer import document_store  # the same Qdrant store the indexer writes to

RUNBOOK_PROMPT = """You are a DevOps on-call assistant. Use the provided runbook excerpts to give step-by-step instructions for the current incident.
Current Incident:
Service: {{ service }}
Alert: {{ alert }}
Error: {{ error_message }}
Relevant Runbook Sections:
{% for doc in documents %}
--- From: {{ doc.meta.get('title', 'Runbook') }} ---
{{ doc.content }}
{% endfor %}
Provide:
1. Immediate action (do this first)
2. Diagnosis steps (what to check)
3. Fix steps (in order)
4. Escalation criteria (when to wake someone else up)
Be specific with commands. Include the actual kubectl/aws/etc. commands from the runbooks.
If the excerpts do not cover this incident, say so instead of inventing commands."""


def create_rag_pipeline() -> Pipeline:
    pipeline = Pipeline()
    # Must be the same embedding model that was used at indexing time
    pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(
        model="all-MiniLM-L6-v2"
    ))
    pipeline.add_component("retriever", QdrantEmbeddingRetriever(
        document_store=document_store,
        top_k=5,  # retrieve the 5 most relevant chunks
    ))
    pipeline.add_component("prompt_builder", PromptBuilder(template=RUNBOOK_PROMPT))
    # Reads the API key from the ANTHROPIC_API_KEY environment variable by default
    pipeline.add_component("llm", AnthropicGenerator(
        model="claude-sonnet-4-6",
        generation_kwargs={"max_tokens": 1500},
    ))
    pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
    pipeline.connect("retriever.documents", "prompt_builder.documents")
    pipeline.connect("prompt_builder.prompt", "llm.prompt")
    return pipeline


rag_pipeline = create_rag_pipeline()


def get_runbook_guidance(service: str, alert: str, error_message: str) -> str:
    """Retrieve relevant runbook chunks and ask Claude for step-by-step guidance."""
    # Build the search query from the incident details
    query = f"{service} {alert} {error_message}"
    result = rag_pipeline.run({
        "text_embedder": {"text": query},
        "prompt_builder": {
            "service": service,
            "alert": alert,
            "error_message": error_message,
        },
    })
    return result["llm"]["replies"][0]
Step 3: Slack Integration
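Alertmanager does not call `handle_alert` directly: its webhook receiver POSTs a JSON body whose `alerts` list holds one object per alert, each with its own `status`, `labels` and `annotations`, which is the per-alert shape `handle_alert` reads. A minimal stdlib receiver is sketched below, assuming port 8080, a thread per request, and that only firing (not resolved) alerts should trigger guidance. `firing_alerts`, `make_handler` and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def firing_alerts(payload) -> list[dict]:
    """Pick the alerts that are currently firing out of an Alertmanager webhook body."""
    if not isinstance(payload, dict):
        return []
    return [
        a for a in payload.get("alerts", [])
        if isinstance(a, dict) and a.get("status") == "firing"
    ]


def make_handler(on_alert: Callable[[dict], None]) -> type:
    """Build a request handler class that calls on_alert once per firing alert."""

    class AlertmanagerWebhook(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length") or 0)
            try:
                payload = json.loads(self.rfile.read(length) or b"{}")
            except ValueError:  # invalid JSON or invalid UTF-8
                self.send_response(400)
                self.send_header("Content-Length", "0")
                self.end_headers()
                return
            # Acknowledge first: the RAG call can take several seconds
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()
            for alert in firing_alerts(payload):
                on_alert(alert)

    return AlertmanagerWebhook


def serve(on_alert: Callable[[dict], None], port: int = 8080) -> None:
    """Listen forever, handling each webhook request on its own thread."""
    ThreadingHTTPServer(("0.0.0.0", port), make_handler(on_alert)).serve_forever()
```

With the handler from `slack_handler.py`, `serve(handle_alert)` would accept deliveries from an Alertmanager receiver whose `webhook_configs` URL points at this port. Alertmanager re-sends still-firing alerts on its repeat interval, so expect repeat Slack posts unless you deduplicate (for example on the alert's `fingerprint`).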
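# slack_handler.py
import os

from slack_sdk import WebClient

from rag_pipeline import get_runbook_guidance

slack = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))

# Slack rejects section blocks whose text exceeds 3,000 characters,
# and a 1,500-token answer can be longer than that
MAX_SECTION_CHARS = 3000


def handle_alert(payload: dict):
    """Post runbook guidance for one Alertmanager alert.

    `payload` is a single alert (one entry of the webhook's "alerts" list),
    which carries its own "labels" and "annotations".
    """
    labels = payload.get("labels", {})
    service = labels.get("service", "unknown")
    alertname = labels.get("alertname", "")
    description = payload.get("annotations", {}).get("description", "")

    guidance = get_runbook_guidance(service, alertname, description)
    guidance_text = f"*Runbook Guidance:*\n{guidance}"
    if len(guidance_text) > MAX_SECTION_CHARS:
        guidance_text = guidance_text[: MAX_SECTION_CHARS - 3] + "..."

    slack.chat_postMessage(
        channel="#on-call",
        text=f"{alertname} on {service}",  # plain-text fallback for notifications
        blocks=[
            {
                "type": "header",
                "text": {"type": "plain_text", "text": f"🚨 {alertname} — {service}"},
            },
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": f"*Alert:* {description}"},
            },
            {"type": "divider"},
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": guidance_text},
            },
        ],
    )
Deploy Qdrant on Kubernetes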
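Both Python modules reach Qdrant at `http://qdrant.vector-db.svc.cluster.local:6333`, which the StatefulSet and Service below provide. If indexing runs as a Kubernetes Job right after deployment, it can start before Qdrant accepts connections. A small wait helper could guard it; `wait_for_qdrant` is hypothetical, stdlib-only, and simply polls Qdrant's `GET /collections` REST endpoint until it answers with HTTP 200.

```python
import http.client
import time
import urllib.request


def wait_for_qdrant(base_url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll Qdrant's REST API until it returns HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # GET /collections lists existing collections; a 200 means the API is serving
            with urllib.request.urlopen(f"{base_url}/collections", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # DNS failure, refused connection, timeout or HTTP error: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, an indexing Job could run `if wait_for_qdrant("http://qdrant.vector-db.svc.cluster.local:6333"): index_runbooks("./runbooks")`, where `./runbooks` is only a placeholder path.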
# Create the namespace first: kubectl create namespace vector-db
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: vector-db
spec:
  serviceName: qdrant
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant  # must match spec.selector.matchLabels or the StatefulSet is rejected
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.8.0
          ports:
            - containerPort: 6333  # REST API used by the indexer and retriever
          volumeMounts:
            - name: storage
              mountPath: /qdrant/storage
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: vector-db
spec:
  selector:
    app: qdrant
  ports:
    - port: 6333
      targetPort: 6333
Example: What the System Returns
Alert: KubernetesPodCrashLooping on payment-service in production
System response in Slack:
🚨 KubernetesPodCrashLooping — payment-service
Runbook Guidance:
1. IMMEDIATE ACTION
kubectl describe pod -l app=payment-service -n production
Look for: OOMKilled, exit code 137
2. DIAGNOSIS
kubectl logs -l app=payment-service --previous --tail=100 -n production
Check for: DB connection errors, null pointer exceptions, config missing
3. FIX STEPS
a) If OOMKilled: kubectl set resources deployment/payment-service \
-c payment-service --limits=memory=512Mi -n production
b) If DB connection: kubectl get secret db-credentials -n production \
-o yaml | grep host
Verify DB_HOST env var matches actual RDS endpoint
c) If config missing: Check ConfigMap is mounted correctly
4. ESCALATE IF:
- More than 3 pods crashing simultaneously
- DB itself is unreachable (run: nc -zv <db-host> 5432)
- Error involves payment processing (PCI scope — escalate to security team)
This replaces 20 minutes of searching Confluence with a 10-second automated response.
You need an Anthropic API key to build this; AnthropicGenerator reads it from the ANTHROPIC_API_KEY environment variable by default. The system works best with well-structured runbooks: the better your docs, the better the AI guidance.