Build an AI DevOps Onboarding Assistant with Claude API

Build a RAG-based chatbot with Claude API that answers new engineer questions from your runbooks and docs. Full Python FastAPI code, cosine similarity retrieval, and Slack bot deployment.

Every DevOps team has the same problem: a new engineer joins, asks "how do I deploy to staging?", and spends two hours reading outdated Confluence pages before someone on Slack finally answers them. Multiply this across every new hire and it's a serious drag.

In this post, we'll build an AI onboarding assistant with the Claude API that reads your actual runbooks and answers questions about your infrastructure, with every answer grounded in your team's own docs.

What We're Building

A FastAPI backend that accepts questions
Simple in-memory RAG (Retrieval-Augmented Generation) using TF-IDF vectors and cosine similarity
Claude Haiku 4.5 (claude-haiku-4-5-20251001) to generate answers (cheap + fast)
Runbooks stored as plain text files in a directory
Optional: Slack bot wrapper so engineers can ask in #onboarding

Prerequisites

bash

pip install anthropic fastapi uvicorn numpy scikit-learn python-dotenv

Set your API key:

bash

export ANTHROPIC_API_KEY=sk-ant-...

Project Structure

onboarding-bot/
  docs/
    deploy-staging.txt
    aws-access.txt
    on-call-rotation.txt
    kubernetes-access.txt
  main.py
  retriever.py
  .env

Step 1: Load and Embed Documents

We'll use scikit-learn's TfidfVectorizer for simple in-memory retrieval. No external vector database needed.

python

# retriever.py
import os
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
 
class DocumentRetriever:
    def __init__(self, docs_dir: str):
        self.docs = []
        self.doc_names = []
        self._load_docs(docs_dir)
        self._build_index()
 
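    def _load_docs(self, docs_dir: str):
        # Sort so document order is stable across runs and platforms
        for filename in sorted(os.listdir(docs_dir)):
            if filename.endswith(".txt"):
                path = os.path.join(docs_dir, filename)
                with open(path, "r", encoding="utf-8") as f:
                    content = f.read()
                self.docs.append(content)
                self.doc_names.append(filename)
        if not self.docs:
            # TfidfVectorizer cannot be fitted on an empty corpus
            raise ValueError(f"No .txt documents found in {docs_dir}")
        print(f"Loaded {len(self.docs)} documents")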
 
    def _build_index(self):
        self.vectorizer = TfidfVectorizer(stop_words="english")
        self.tfidf_matrix = self.vectorizer.fit_transform(self.docs)
 
    def retrieve(self, query: str, top_k: int = 3) -> list[dict]:
        query_vec = self.vectorizer.transform([query])
        scores = cosine_similarity(query_vec, self.tfidf_matrix)[0]
        top_indices = np.argsort(scores)[::-1][:top_k]
 
        results = []
        for idx in top_indices:
            if scores[idx] > 0.05:  # filter out irrelevant docs
                results.append({
                    "source": self.doc_names[idx],
                    "content": self.docs[idx],
                    "score": float(scores[idx])
                })
        return results
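
To make the scoring less of a black box, here is a minimal pure-Python sketch of the same idea: weight terms by TF-IDF, then rank documents by cosine similarity to the query. It uses the smoothed IDF formula that scikit-learn documents as its default, but the whitespace tokenizer, the lack of stop-word removal, and the function names (`fit_idf`, `vectorize`, `rank`) are simplifications assumed for illustration, so the scores will not match `TfidfVectorizer` exactly.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase and split on whitespace
    return text.lower().split()


def fit_idf(docs: list[str]) -> dict[str, float]:
    n_docs = len(docs)
    # Document frequency: in how many docs does each term appear?
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def vectorize(text: str, idf: dict[str, float]) -> dict[str, float]:
    # Terms outside the fitted vocabulary are ignored
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    idf = fit_idf(docs)
    query_vec = vectorize(query, idf)
    scores = [cosine(query_vec, vectorize(doc, idf)) for doc in docs]
    # Highest score first; each pair is (document index, score)
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


docs = [
    "deploy the app to staging with helm",
    "request aws access from the platform team",
    "on-call rotation schedule and escalation",
]
ranking = rank("how do I deploy to staging", docs)
print(ranking)
```

Only the first document shares any terms with the query, so it ranks first and the other two score zero. This is also why the retriever applies a score threshold: a query with no overlapping vocabulary matches nothing.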

Step 2: FastAPI Backend with Claude

python

# main.py
import os
import anthropic
from fastapi import FastAPI
from pydantic import BaseModel
from retriever import DocumentRetriever
from dotenv import load_dotenv
 
load_dotenv()
 
app = FastAPI(title="DevOps Onboarding Assistant")
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
retriever = DocumentRetriever(docs_dir="docs/")
 
class QuestionRequest(BaseModel):
    question: str
 
class AnswerResponse(BaseModel):
    answer: str
    sources: list[str]
 
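@app.post("/ask", response_model=AnswerResponse)
def ask_question(request: QuestionRequest):
    # Plain `def` on purpose: the Anthropic client call below is blocking, so
    # FastAPI runs this handler in its threadpool instead of stalling the event loop
    # Retrieve relevant docs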
    relevant_docs = retriever.retrieve(request.question, top_k=3)
 
    if not relevant_docs:
        return AnswerResponse(
            answer="I couldn't find relevant documentation for your question. Try asking in #platform-team.",
            sources=[]
        )
 
    # Build context from retrieved docs
    context_parts = []
    sources = []
    for doc in relevant_docs:
        context_parts.append(f"[{doc['source']}]\n{doc['content']}")
        sources.append(doc['source'])
 
    context = "\n\n---\n\n".join(context_parts)
 
    # Call Claude
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system="""You are a helpful DevOps onboarding assistant for our engineering team.
Answer questions based only on the provided documentation.
Be concise and practical. If the docs don't cover something, say so and suggest asking in Slack.
Format commands with backticks. Use numbered steps for procedures.""",
        messages=[
            {
                "role": "user",
                "content": f"Documentation:\n\n{context}\n\nQuestion: {request.question}"
            }
        ]
    )
 
    return AnswerResponse(
        answer=message.content[0].text,
        sources=sources
    )
 
@app.get("/health")
async def health():
    return {"status": "ok", "docs_loaded": len(retriever.docs)}
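
The context-building step inside `/ask` is easy to pull out and test without calling the API. The sketch below does that with a hypothetical helper named `build_user_prompt` (not part of the code above); it assumes the same dictionary shape that `retrieve()` returns and produces the same prompt string.

```python
def build_user_prompt(relevant_docs: list[dict], question: str) -> tuple[str, list[str]]:
    """Assemble the user message and source list from retrieved docs."""
    # Label each document with its filename so the model can tell them apart
    context_parts = [f"[{doc['source']}]\n{doc['content']}" for doc in relevant_docs]
    sources = [doc["source"] for doc in relevant_docs]
    context = "\n\n---\n\n".join(context_parts)
    prompt = f"Documentation:\n\n{context}\n\nQuestion: {question}"
    return prompt, sources


# Example input shaped like the output of DocumentRetriever.retrieve()
sample_docs = [
    {"source": "deploy-staging.txt", "content": "Run helm upgrade.", "score": 0.42},
    {"source": "aws-access.txt", "content": "Use the devops profile.", "score": 0.11},
]
prompt, sources = build_user_prompt(sample_docs, "How do I deploy to staging?")
print(prompt)
```

Printing the prompt for a few real questions is a quick way to check that the right runbooks are reaching the model before you spend tokens on it.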

Step 3: Sample Runbook Files

# docs/deploy-staging.txt

## Deploying to Staging

Staging environment: https://staging.internal.yourcompany.com

Prerequisites:
- AWS CLI configured with the 'devops' profile
- kubectl configured for the staging cluster

Steps:
1. Build and push the Docker image:
   docker build -t your-app:${VERSION} .
   docker tag your-app:${VERSION} 123456789.dkr.ecr.ap-south-1.amazonaws.com/your-app:${VERSION}
   aws ecr get-login-password --region ap-south-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.ap-south-1.amazonaws.com
   docker push 123456789.dkr.ecr.ap-south-1.amazonaws.com/your-app:${VERSION}

2. Update the Helm chart:
   helm upgrade your-app charts/your-app \
     --namespace staging \
     --set image.tag=${VERSION} \
     --values values/staging.yaml

3. Verify the rollout:
   kubectl rollout status deployment/your-app -n staging

On-call contact for staging issues: #platform-team on Slack

Step 4: Run It

bash

uvicorn main:app --reload --port 8000

Test it:

bash

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I deploy to staging?"}'

Response:

json

{
  "answer": "To deploy to staging:\n\n1. Build and push your Docker image to ECR using the `devops` AWS profile...",
  "sources": ["deploy-staging.txt"]
}
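
If you'd rather call the endpoint from Python than curl, a minimal client using only the standard library might look like the sketch below. The function names `build_ask_request` and `ask`, the default base URL, and the 30-second timeout are assumptions for illustration.

```python
import json
import urllib.request


def build_ask_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST request for /ask without sending it."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000", timeout: float = 30.0) -> dict:
    """Send the question and return the parsed JSON body (answer + sources)."""
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `ask("How do I deploy to staging?")["sources"]` returns the list of runbook filenames used for the answer.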

Step 5: Slack Bot Wrapper

Install the Slack SDK:

bash

pip install slack-bolt requests

python

# slack_bot.py
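import os
import re
import requests
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])
API_URL = "http://localhost:8000/ask"

@app.event("app_mention")
def handle_mention(event, say):
    # The mention in the text is the bot's own ID, while event["user"] is the
    # person asking, so strip any <@USERID> token instead of event["user"]
    question = re.sub(r"<@[^>]+>", "", event["text"]).strip()

    try:
        response = requests.post(API_URL, json={"question": question}, timeout=30)
        response.raise_for_status()
    except requests.RequestException:
        say("Sorry, I couldn't reach the onboarding API. Try asking in #platform-team.")
        return
    data = response.json()

    sources_text = ", ".join(data["sources"]) if data["sources"] else "no docs matched"
    say(f"{data['answer']}\n\n_Sources: {sources_text}_")
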
 
if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()

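Invite the bot to #onboarding, and engineers can @-mention it with questions. Socket Mode needs both tokens set before you start it: the bot token in SLACK_BOT_TOKEN and the app-level token in SLACK_APP_TOKEN. Answers typically come back within a few seconds, with source references.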
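
Slack encodes mentions in message text as `<@USERID>`, so the question has to be separated from the mention before it is sent to the API. A small helper for this can be unit tested without a Slack workspace; the name `extract_question` and the sample user ID below are hypothetical.

```python
import re

# Matches Slack's encoded user mentions, e.g. <@U012ABCDEF>
MENTION_PATTERN = re.compile(r"<@[^>]+>")


def extract_question(text: str) -> str:
    """Remove encoded mentions and collapse the leftover whitespace."""
    without_mentions = MENTION_PATTERN.sub(" ", text)
    return " ".join(without_mentions.split())


print(extract_question("<@U012ABCDEF> how do I deploy to staging?"))
```

An empty result means the user mentioned the bot without asking anything, which is a good moment to reply with a short usage hint rather than calling the API.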

What to Add Next

Swap TF-IDF for sentence-transformers embeddings for better semantic search
Store embeddings in Chroma or Qdrant for larger doc sets (100+ files)
Add a /refresh endpoint to reload docs without restarting
Log unanswered questions to a Google Sheet so you know what docs to write
Deploy as a Docker container on ECS Fargate or as a K8s deployment
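
As a starting point for the unanswered-question log, the sketch below appends to a local CSV file. This is a stand-in chosen for simplicity, not the Google Sheet integration itself, and the function name `log_unanswered` and the default filename are assumptions.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def log_unanswered(question: str, log_path: str = "unanswered.csv") -> None:
    """Append a question that matched no docs, with a UTC timestamp."""
    path = Path(log_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            # Write the header once, when the file is first created
            writer.writerow(["timestamp_utc", "question"])
        writer.writerow([datetime.now(timezone.utc).isoformat(), question])
```

One place to call it would be the `if not relevant_docs:` branch of `/ask`, just before returning the fallback answer. Reviewing the file weekly shows which runbooks are missing.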

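The TF-IDF approach here works well for small doc sets, up to roughly 50 documents. At that scale it's fast, runs entirely in memory, and needs no external services beyond the Claude API. Keep in mind that TF-IDF matches on shared keywords, so a question phrased very differently from the runbook may miss; that is the gap the embedding upgrade above closes.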
