Build an AI DevOps Onboarding Assistant with Claude API
Build a RAG-based chatbot with Claude API that answers new engineer questions from your runbooks and docs. Full Python FastAPI code, cosine similarity retrieval, and Slack bot deployment.
Every DevOps team has the same problem: a new engineer joins, asks "how do I deploy to staging?", and spends two hours reading outdated Confluence pages before someone on Slack finally answers them. Multiply this across every new hire and it's a serious drag.
In this post, we'll build an AI onboarding assistant using the Claude API that reads your actual runbooks and answers questions about your infrastructure — in your team's voice, based on your actual docs.
What We're Building
- A FastAPI backend that accepts questions
- Simple in-memory RAG (Retrieval-Augmented Generation) using cosine similarity
- Claude claude-haiku-4-5-20251001 to generate answers (cheap + fast)
- Runbooks stored as plain text files in a directory
- Optional: Slack bot wrapper so engineers can ask in #onboarding
Prerequisites
pip install anthropic fastapi uvicorn numpy scikit-learn python-dotenv
Set your API key:
export ANTHROPIC_API_KEY=sk-ant-...
Project Structure
onboarding-bot/
    docs/
        deploy-staging.txt
        aws-access.txt
        on-call-rotation.txt
        kubernetes-access.txt
    main.py
    retriever.py
    .env
Step 1: Load and Embed Documents
We'll use scikit-learn's TfidfVectorizer for simple in-memory retrieval. No external vector database needed.
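To make the scoring concrete before the full class, here is a pure-Python sketch of the idea behind TF-IDF plus cosine similarity. It is a simplified illustration, not scikit-learn's exact implementation: the tokenizer is a bare regex, no stop words are removed, and the three one-line documents are hypothetical stand-ins for runbook files. The helper names (tokenize, tfidf_vectors, weigh, cosine) are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep tokens of two or more word characters.
    return re.findall(r"\w\w+", text.lower())


def weigh(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Term count times IDF, ignoring terms the index has never seen,
    # then scaled to unit length (L2 normalisation).
    vec = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return one normalised TF-IDF vector per document, plus the IDF table."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF: terms that appear in fewer documents get a larger weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [weigh(tokens, idf) for tokens in token_lists], idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Hypothetical one-line stand-ins for runbook files.
docs = [
    "deploy the app to staging with helm upgrade and kubectl rollout",
    "request aws access through the devops profile",
    "on-call rotation schedule and escalation contacts",
]
doc_vecs, idf = tfidf_vectors(docs)
query_vec = weigh(tokenize("how do I deploy to staging"), idf)
scores = [cosine(query_vec, v) for v in doc_vecs]
best = max(range(len(docs)), key=scores.__getitem__)
print("best match:", docs[best])
```

Only the first document shares any terms with the query, so it is the only one with a non-zero score. The retriever below applies the same principle through scikit-learn, which also handles stop words and sparse matrices.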
# retriever.py
import os
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class DocumentRetriever:
    def __init__(self, docs_dir: str):
        self.docs = []
        self.doc_names = []
        self._load_docs(docs_dir)
        self._build_index()

    def _load_docs(self, docs_dir: str):
        # Read every .txt file in the directory into memory
        for filename in os.listdir(docs_dir):
            if filename.endswith(".txt"):
                path = os.path.join(docs_dir, filename)
                with open(path, "r", encoding="utf-8") as f:
                    content = f.read()
                self.docs.append(content)
                self.doc_names.append(filename)
        print(f"Loaded {len(self.docs)} documents")

    def _build_index(self):
        # One TF-IDF row per document
        self.vectorizer = TfidfVectorizer(stop_words="english")
        self.tfidf_matrix = self.vectorizer.fit_transform(self.docs)

    def retrieve(self, query: str, top_k: int = 3) -> list[dict]:
        query_vec = self.vectorizer.transform([query])
        scores = cosine_similarity(query_vec, self.tfidf_matrix)[0]
        # Indices of the highest-scoring documents, best first
        top_indices = np.argsort(scores)[::-1][:top_k]
        results = []
        for idx in top_indices:
            if scores[idx] > 0.05:  # filter out irrelevant docs
                results.append({
                    "source": self.doc_names[idx],
                    "content": self.docs[idx],
                    "score": float(scores[idx])
                })
        return results
Step 2: FastAPI Backend with Claude
# main.py
import os
import anthropic
from fastapi import FastAPI
from pydantic import BaseModel
from retriever import DocumentRetriever
from dotenv import load_dotenv

load_dotenv()

app = FastAPI(title="DevOps Onboarding Assistant")
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
retriever = DocumentRetriever(docs_dir="docs/")

SYSTEM_PROMPT = """You are a helpful DevOps onboarding assistant for our engineering team.
Answer questions based only on the provided documentation.
Be concise and practical. If the docs don't cover something, say so and suggest asking in Slack.
Format commands with backticks. Use numbered steps for procedures."""


class QuestionRequest(BaseModel):
    question: str


class AnswerResponse(BaseModel):
    answer: str
    sources: list[str]


# Plain def rather than async def: the Anthropic client call below is blocking,
# and FastAPI runs plain def handlers in a threadpool instead of on the event loop
@app.post("/ask", response_model=AnswerResponse)
def ask_question(request: QuestionRequest):
    # Retrieve relevant docs
    relevant_docs = retriever.retrieve(request.question, top_k=3)
    if not relevant_docs:
        return AnswerResponse(
            answer="I couldn't find relevant documentation for your question. Try asking in #platform-team.",
            sources=[]
        )

    # Build context from retrieved docs
    context_parts = []
    sources = []
    for doc in relevant_docs:
        context_parts.append(f"[{doc['source']}]\n{doc['content']}")
        sources.append(doc['source'])
    context = "\n\n---\n\n".join(context_parts)

    # Call Claude
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=[
            {
                "role": "user",
                "content": f"Documentation:\n\n{context}\n\nQuestion: {request.question}"
            }
        ]
    )
    return AnswerResponse(
        answer=message.content[0].text,
        sources=sources
    )


@app.get("/health")
async def health():
    return {"status": "ok", "docs_loaded": len(retriever.docs)}
Step 3: Sample Runbook Files
# docs/deploy-staging.txt
## Deploying to Staging
Staging environment: https://staging.internal.yourcompany.com
Prerequisites:
- AWS CLI configured with the 'devops' profile
- kubectl configured for the staging cluster
Steps:
1. Build and push the Docker image:
   docker build -t your-app:${VERSION} .
   docker tag your-app:${VERSION} 123456789.dkr.ecr.ap-south-1.amazonaws.com/your-app:${VERSION}
   aws ecr get-login-password --region ap-south-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.ap-south-1.amazonaws.com
   docker push 123456789.dkr.ecr.ap-south-1.amazonaws.com/your-app:${VERSION}
2. Update the Helm chart:
   helm upgrade your-app charts/your-app \
     --namespace staging \
     --set image.tag=${VERSION} \
     --values values/staging.yaml
3. Verify the rollout:
   kubectl rollout status deployment/your-app -n staging
On-call contact for staging issues: #platform-team on Slack
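To see what Claude receives when a runbook like this is retrieved, the context assembly from main.py can be run on its own. This is a minimal sketch: the function names build_context and build_user_message and the sample retrieval results (including their scores) are hypothetical, while the string formats are copied from the /ask handler.

```python
def build_context(relevant_docs: list[dict]) -> tuple[str, list[str]]:
    """Join retrieved docs into one context string, each labelled by source file."""
    parts = [f"[{doc['source']}]\n{doc['content']}" for doc in relevant_docs]
    sources = [doc["source"] for doc in relevant_docs]
    return "\n\n---\n\n".join(parts), sources


def build_user_message(context: str, question: str) -> str:
    # Same layout as the user message sent by the /ask handler.
    return f"Documentation:\n\n{context}\n\nQuestion: {question}"


# Hypothetical retrieval results, shaped like DocumentRetriever.retrieve() output.
retrieved = [
    {
        "source": "deploy-staging.txt",
        "content": "Steps:\n1. Build and push the Docker image",
        "score": 0.41,
    },
    {
        "source": "kubernetes-access.txt",
        "content": "kubectl configured for the staging cluster",
        "score": 0.12,
    },
]

context, sources = build_context(retrieved)
prompt = build_user_message(context, "How do I deploy to staging?")
print(prompt)
```

Because each document is passed whole, long runbooks make long prompts. Keeping runbooks short and focused on one task keeps both retrieval and cost predictable.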
Step 4: Run It
uvicorn main:app --reload --port 8000
Test it:
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I deploy to staging?"}'
Response:
{
  "answer": "To deploy to staging:\n\n1. Build and push your Docker image to ECR using the `devops` AWS profile...",
  "sources": ["deploy-staging.txt"]
}
Step 5: Slack Bot Wrapper
Install Slack's Bolt framework, plus requests for calling the API:
pip install slack-bolt requests
# slack_bot.py
import os
import re
import requests
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])
API_URL = "http://localhost:8000/ask"


@app.event("app_mention")
def handle_mention(event, say):
    # event["user"] is the person asking; the <@...> token in the text carries
    # the bot's ID, so strip any mention token rather than the sender's ID
    question = re.sub(r"<@[^>]+>", "", event["text"]).strip()
    response = requests.post(API_URL, json={"question": question}, timeout=30)
    response.raise_for_status()
    data = response.json()
    sources_text = ", ".join(data["sources"]) if data["sources"] else "no docs matched"
    say(f"{data['answer']}\n\n_Sources: {sources_text}_")


if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
Invite the bot to #onboarding, and engineers can @-mention it with questions. Answers come back in seconds with source references.
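The mention handling is easy to check without a Slack workspace. In an app_mention event, the text contains the mention as a <@USERID> token, and the ID in that token belongs to the bot, not to the person who sent the message. The sketch below strips any such token from the text; extract_question is a hypothetical helper name and the user IDs are made up.

```python
import re

# Matches Slack mention tokens such as <@U0BOTID123>.
MENTION_TOKEN = re.compile(r"<@[^>]+>")


def extract_question(text: str) -> str:
    """Remove mention tokens and collapse leftover whitespace."""
    without_mentions = MENTION_TOKEN.sub(" ", text)
    return " ".join(without_mentions.split())


# Hypothetical event payload, reduced to the two fields used here.
event = {"user": "U0SENDER456", "text": "<@U0BOTID123> how do I deploy to staging?"}
question = extract_question(event["text"])
print(question)
```

Keeping this logic in a pure function means it can be unit tested separately from the Slack handler.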
What to Add Next
- Swap TF-IDF for sentence-transformers embeddings for better semantic search
- Store embeddings in Chroma or Qdrant for larger doc sets (100+ files)
- Add a /refresh endpoint to reload docs without restarting
- Log unanswered questions to a Google Sheet so you know what docs to write
- Deploy as a Docker container on ECS Fargate or as a K8s deployment
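As a starting point for the unanswered-questions log, here is a minimal sketch that appends to a local CSV file instead of a Google Sheet; swapping in a Sheets client later would only change the body of the function. The function name log_unanswered, the column names, and the sample questions are all hypothetical. The natural place to call it is the branch in /ask where retrieval returns no documents.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_unanswered(question: str, log_path: Path) -> None:
    """Append one unanswered question to a CSV file, writing a header on first use."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp_utc", "question"])
        writer.writerow([datetime.now(timezone.utc).isoformat(), question])


# Demo against a temporary file; a real deployment would use a persistent path.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "unanswered.csv"
    log_unanswered("How do I rotate the staging database password?", log_path)
    log_unanswered("Who approves production access?", log_path)
    with log_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

for timestamp, question in rows[1:]:
    print(timestamp, question)
```

Reviewing this file periodically shows which runbooks are missing or hard to find by keyword.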
The TF-IDF approach here works well for up to ~50 documents. At that scale, it's fast, free, and needs no external services beyond the Claude API.