Build LLM-Powered Runbook Automation with Haystack and Kubernetes

Turn your static runbooks into an AI system that answers 'what do I do when X happens' with step-by-step instructions retrieved from your actual documentation.

DevOpsBoys · Jun 3, 2026 · 4 min read

Your runbooks live in Confluence. When an alert fires at 2 AM, the on-call engineer has to find the right page, read through it, and work out what to do. The system described here does that lookup for them: it retrieves the relevant runbook section and generates step-by-step instructions from it.


Architecture: RAG for Runbooks

Alert fires (PagerDuty/Slack)
    → Extract: what service, what error
    → Haystack: find relevant runbook sections (vector search)
    → Claude: generate step-by-step fix using retrieved context
    → Slack: "Here's what to do:"

RAG (Retrieval-Augmented Generation) means the model answers from your own runbooks, which are retrieved at query time and passed in as context, rather than from generic knowledge alone.
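
To make the retrieve-then-generate flow concrete, here is a minimal, dependency-free sketch. It is not the Haystack pipeline built later: the embed function is a toy bag-of-words stand-in for a real embedding model, and the runbook chunks are hypothetical examples.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system uses a trained model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Put the retrieved excerpts in front of the question for the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these runbook excerpts:\n{joined}\n\nIncident: {query}"


# Hypothetical runbook chunks, for illustration only
CHUNKS = [
    "payment-service crashloop: check pod events with kubectl describe pod",
    "redis latency: check memory usage and eviction policy",
    "certificate expiry: renew with cert-manager and restart ingress",
]

top = retrieve("payment-service pod crashloop", CHUNKS, top_k=1)
prompt = build_prompt("payment-service pod crashloop", top)
```

The prompt that reaches the model contains only the payment-service excerpt, which is the property the rest of this article relies on.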


Setup

bash
pip install haystack-ai anthropic-haystack qdrant-haystack \
  sentence-transformers qdrant-client anthropic \
  markdown-it-py mdit_plain slack_sdk
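
The code in this article reads two secrets from the environment: SLACK_BOT_TOKEN (used in the Slack handler) and ANTHROPIC_API_KEY (the default variable the Anthropic generator looks for). A small startup check like the sketch below can surface a missing variable before the first alert arrives. The helper name missing_env is hypothetical.

```python
import os

# Environment variables the examples in this article depend on
REQUIRED_ENV = ("ANTHROPIC_API_KEY", "SLACK_BOT_TOKEN")


def missing_env(required=REQUIRED_ENV, env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```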

Step 1: Index Your Runbooks

python
# indexer.py
from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack.components.converters import MarkdownToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
 
# Setup vector store (Qdrant on Kubernetes)
document_store = QdrantDocumentStore(
    url="http://qdrant.vector-db.svc.cluster.local:6333",
    index="runbooks",
    embedding_dim=384,
    recreate_index=False,
)
 
def index_runbooks(runbook_dir: str):
    """Index all markdown runbook files."""
    
    indexing_pipeline = Pipeline()
    indexing_pipeline.add_component("converter", MarkdownToDocument())
    indexing_pipeline.add_component("splitter", DocumentSplitter(
        split_by="sentence",
        split_length=5,      # 5 sentences per chunk
        split_overlap=2,     # 2 sentence overlap for context
    ))
    indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(
        model="all-MiniLM-L6-v2"  # Fast, good quality, 384 dims
    ))
    indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
    
    indexing_pipeline.connect("converter", "splitter")
    indexing_pipeline.connect("splitter", "embedder")
    indexing_pipeline.connect("embedder", "writer")
    
    # Find all runbook files
    import glob
    runbook_files = glob.glob(f"{runbook_dir}/**/*.md", recursive=True)
    
    print(f"Indexing {len(runbook_files)} runbooks...")
    
    indexing_pipeline.run({
        "converter": {
            "sources": runbook_files,
            "meta": [{"source": f, "type": "runbook"} for f in runbook_files]
        }
    })
    
    print(f"Indexed {document_store.count_documents()} chunks")
 
# Or index from text directly (e.g., from Confluence API)
def index_document(title: str, content: str, service: str):
    """Index a single runbook document."""
    splitter = DocumentSplitter(split_by="sentence", split_length=5, split_overlap=2)
    embedder = SentenceTransformersDocumentEmbedder(model="all-MiniLM-L6-v2")
    # Outside a Pipeline, components are not warmed up automatically
    if hasattr(splitter, "warm_up"):  # present in newer Haystack releases
        splitter.warm_up()
    embedder.warm_up()  # loads the embedding model; run() fails without it
    
    doc = Document(content=content, meta={"title": title, "service": service})
    chunks = splitter.run(documents=[doc])["documents"]
    embedded = embedder.run(documents=chunks)["documents"]
    document_store.write_documents(embedded)
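
The split_length=5 and split_overlap=2 settings mean each chunk holds up to five sentences and shares two with its neighbour. The sketch below shows that windowing idea in plain Python. It is an illustration of the concept, not Haystack's implementation, and the function name window_chunks is hypothetical.

```python
def window_chunks(units: list[str], length: int = 5, overlap: int = 2) -> list[list[str]]:
    """Slide a window of `length` units forward by `length - overlap` each step."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    step = length - overlap
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + length])
        if start + length >= len(units):
            break  # this window already reached the end of the document
    return chunks


sentences = [f"S{i}" for i in range(1, 10)]  # nine placeholder sentences
chunks = window_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

The overlap matters for runbooks because a command and the sentence explaining when to run it often sit next to each other. Without overlap they can land in different chunks and be retrieved separately.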

Step 2: RAG Query Pipeline

python
# rag_pipeline.py
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.generators.anthropic import AnthropicGenerator
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
 
from indexer import document_store  # reuse the Qdrant store defined in indexer.py
 
RUNBOOK_PROMPT = """You are a DevOps on-call assistant. Use the provided runbook excerpts to give step-by-step instructions for the current incident.
 
Current Incident:
Service: {{ service }}
Alert: {{ alert }}
Error: {{ error_message }}
 
Relevant Runbook Sections:
{% for doc in documents %}
--- From: {{ doc.meta.get('title', 'Runbook') }} ---
{{ doc.content }}
{% endfor %}
 
Provide:
1. Immediate action (do this first)
2. Diagnosis steps (what to check)
3. Fix steps (in order)
4. Escalation criteria (when to wake someone else up)
 
Be specific with commands. Include actual kubectl/aws/etc commands from the runbooks."""
 
def create_rag_pipeline():
    pipeline = Pipeline()
    
    pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(
        model="all-MiniLM-L6-v2"
    ))
    pipeline.add_component("retriever", QdrantEmbeddingRetriever(
        document_store=document_store,
        top_k=5,  # Retrieve 5 most relevant chunks
    ))
    pipeline.add_component("prompt_builder", PromptBuilder(template=RUNBOOK_PROMPT))
    pipeline.add_component("llm", AnthropicGenerator(
        model="claude-sonnet-4-6",
        generation_kwargs={"max_tokens": 1500}
    ))
    
    pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
    pipeline.connect("retriever.documents", "prompt_builder.documents")
    pipeline.connect("prompt_builder.prompt", "llm.prompt")
    
    return pipeline
 
rag_pipeline = create_rag_pipeline()
 
def get_runbook_guidance(service: str, alert: str, error_message: str) -> str:
    """Get runbook guidance for an incident."""
    
    # Construct search query from incident details
    query = f"{service} {alert} {error_message}"
    
    result = rag_pipeline.run({
        "text_embedder": {"text": query},
        "prompt_builder": {
            "service": service,
            "alert": alert,
            "error_message": error_message
        }
    })
    
    return result["llm"]["replies"][0]
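
Vector search always returns its top_k nearest chunks, even when none of them is actually about the incident. One possible safeguard is to drop low-scoring hits and skip the LLM call when nothing relevant remains. The sketch below uses a plain Hit dataclass as a stand-in for retrieved documents (Haystack documents carry a score field after retrieval). The 0.35 threshold and all names here are hypothetical and would need tuning against your own runbooks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    """Stand-in for a retrieved document: text plus similarity score."""
    content: str
    score: float


NO_MATCH_MESSAGE = "No matching runbook section found. Follow the general incident process."


def select_context(hits: list[Hit], min_score: float = 0.35, max_hits: int = 5) -> list[Hit]:
    """Keep hits at or above min_score, best first, capped at max_hits."""
    relevant = [h for h in hits if h.score >= min_score]
    return sorted(relevant, key=lambda h: h.score, reverse=True)[:max_hits]


def guidance_or_fallback(hits: list[Hit], generate: Callable[[list[Hit]], str]) -> str:
    """Call the generator only when at least one hit is relevant enough."""
    context = select_context(hits)
    if not context:
        return NO_MATCH_MESSAGE
    return generate(context)


# Usage with a fake generator in place of the LLM call
hits = [Hit("restart the deployment", 0.62), Hit("unrelated note", 0.10)]
answer = guidance_or_fallback(hits, lambda ctx: f"Based on {len(ctx)} excerpt(s)")
```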

Step 3: Slack Integration

python
# slack_handler.py
import os
 
from slack_sdk import WebClient
 
from rag_pipeline import get_runbook_guidance
 
slack = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))
 
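def handle_alert(payload: dict):
    """Called for each firing alert.

    Expects a single alert object with `labels` and `annotations`,
    as found in the `alerts` list of an Alertmanager webhook.
    """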
    
    service = payload.get("labels", {}).get("service", "unknown")
    alertname = payload.get("labels", {}).get("alertname", "")
    description = payload.get("annotations", {}).get("description", "")
    
    guidance = get_runbook_guidance(service, alertname, description)
    
    slack.chat_postMessage(
        channel="#on-call",
        text=f"{alertname} on {service}",  # plain-text fallback for notifications
        blocks=[
            {
                "type": "header",
                "text": {"type": "plain_text", "text": f"🚨 {alertname} — {service}"}
            },
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": f"*Alert:* {description}"}
            },
            {
                "type": "divider"
            },
            {
                "type": "section",
                # Slack caps section text at 3000 characters, so truncate long guidance
                "text": {"type": "mrkdwn", "text": f"*Runbook Guidance:*\n{guidance}"[:3000]}
            }
        ]
    )
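
The handler above reads labels and annotations from a single alert, whereas an Alertmanager webhook delivers a batch of alerts under an alerts key, each with its own status. A minimal sketch of unpacking that batch before calling the handler is shown below. The function names and the sample payload are hypothetical, and PagerDuty webhooks use a different shape that would need its own mapping.

```python
def firing_alerts(webhook_body: dict) -> list[dict]:
    """Return only the alerts that are currently firing (skip resolved ones)."""
    return [a for a in webhook_body.get("alerts", []) if a.get("status") == "firing"]


def incident_fields(alert: dict) -> tuple[str, str, str]:
    """Extract (service, alertname, description) from one alert object."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return (
        labels.get("service", "unknown"),
        labels.get("alertname", ""),
        annotations.get("description", ""),
    )


# Hypothetical webhook body, trimmed to the fields used here
sample = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "KubernetesPodCrashLooping", "service": "payment-service"},
            "annotations": {"description": "Pod is restarting repeatedly"},
        },
        {
            "status": "resolved",
            "labels": {"alertname": "HighLatency", "service": "checkout"},
            "annotations": {},
        },
    ],
}

for alert in firing_alerts(sample):
    print(incident_fields(alert))  # in the real handler: handle_alert(alert)
```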

Deploy Qdrant on Kubernetes

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: vector-db
spec:
  serviceName: qdrant
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant   # must match spec.selector.matchLabels
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.8.0
          ports:
            - containerPort: 6333
          volumeMounts:
            - name: storage
              mountPath: /qdrant/storage
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: vector-db
spec:
  selector:
    app: qdrant
  ports:
    - port: 6333
      targetPort: 6333
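
The Service name and namespace in this manifest are what produce the URL used by the indexer: inside the cluster, a Service is reachable at name.namespace.svc.cluster.local. The sketch below derives that URL and adds an optional readiness probe. It assumes the default cluster domain (cluster.local) and that the Qdrant version in use serves a /readyz endpoint, so verify both against your own cluster. The helper names are hypothetical.

```python
import urllib.error
import urllib.request


def service_url(name: str, namespace: str, port: int, scheme: str = "http") -> str:
    """Build the in-cluster DNS URL for a Kubernetes Service (default cluster domain)."""
    return f"{scheme}://{name}.{namespace}.svc.cluster.local:{port}"


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200.

    Assumption: the server exposes /readyz. Any network error counts as not ready.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/readyz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


QDRANT_URL = service_url("qdrant", "vector-db", 6333)
print(QDRANT_URL)
```

Calling is_ready(QDRANT_URL) from a pod in the cluster before indexing gives a clearer failure than a connection error halfway through a pipeline run.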

Example: What the System Returns

Alert: KubernetesPodCrashLooping on payment-service in production

System response in Slack:

🚨 KubernetesPodCrashLooping — payment-service

Runbook Guidance:

1. IMMEDIATE ACTION
   kubectl describe pod -l app=payment-service -n production
   Look for: OOMKilled, exit code 137

2. DIAGNOSIS
   kubectl logs -l app=payment-service --previous -n production | tail -100
   Check for: DB connection errors, null pointer exceptions, config missing

3. FIX STEPS
   a) If OOMKilled: kubectl set resources deployment/payment-service \
      -c payment-service --limits=memory=512Mi -n production
   b) If DB connection: kubectl get secret db-credentials -n production \
      -o yaml | grep host
      Verify DB_HOST env var matches actual RDS endpoint
   c) If config missing: Check ConfigMap is mounted correctly

4. ESCALATE IF:
   - More than 3 pods crashing simultaneously
   - DB itself is unreachable (run: nc -zv <db-host> 5432)
   - Error involves payment processing (PCI scope — escalate to security team)

For the on-call engineer, this can replace something like 20 minutes of searching Confluence with an automated response that arrives within seconds.

You need an Anthropic API key to build this; the generator reads it from the ANTHROPIC_API_KEY environment variable. The system works best with well-structured runbooks: the better your docs, the better the AI guidance.
