Build a Semantic Search for Your DevOps Docs Using a Vector Database (2026)
Tired of grepping through runbooks? Build a semantic search that finds relevant docs by meaning, not keywords — using embeddings, pgvector, and the Claude API.
Your team's runbooks, postmortems, and architecture docs are valuable — but nobody reads them because grep-based search is terrible. Here's how to build semantic search that actually works.
What We're Building
A search tool that:
- Indexes your runbooks, postmortems, and docs into a vector database
- Accepts natural language queries: "how do we handle cert-manager failures?"
- Returns relevant documents ranked by semantic similarity (not keyword match)
- Uses Claude to summarize the top results into a direct answer
- Deployable as a Slack bot or web tool
Why Vector Search Beats Grep
Grep-based search: "cert-manager" → finds docs with that exact word
Vector search: "TLS certificate renewal failing" → finds docs about cert-manager, certificate rotation, ACME challenges — even if they don't use those exact words
Vector search understands meaning, not just keywords. This matters because runbooks often use different terminology than the incident in front of you.
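Concretely, both documents and queries become high-dimensional vectors, and "relevance" is just the cosine similarity between them. A toy sketch with made-up 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up vectors: pretend an embedding model produced these.
cert_failure = [0.9, 0.1, 0.2]   # query: "TLS certificate renewal failing"
cert_manager = [0.8, 0.2, 0.3]   # doc about cert-manager
dns_runbook  = [0.1, 0.9, 0.1]   # doc about DNS outages

print(cosine_similarity(cert_failure, cert_manager))  # high: related topics
print(cosine_similarity(cert_failure, dns_runbook))   # low: unrelated topic
```

Even though "TLS certificate renewal failing" shares no words with the cert-manager doc, their vectors point in nearly the same direction, which is exactly what the database query later in this article exploits.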
Stack
- pgvector — PostgreSQL extension for storing and searching embeddings
- Voyage AI embeddings (or OpenAI, or local models) — convert text to vectors
- Claude API — synthesize search results into answers
- FastAPI — REST API
- Python — scripts to index documents
Setup
pip install anthropic voyageai psycopg2-binary pgvector fastapi uvicorn \
    python-dotenv markdown beautifulsoup4

# .env
ANTHROPIC_API_KEY=sk-ant-...
VOYAGE_API_KEY=pa-...
DATABASE_URL=postgresql://user:password@localhost:5432/devops_search

Step 1: Set Up pgvector
-- Run in PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    source TEXT,               -- file path or URL
    doc_type TEXT,             -- runbook, postmortem, architecture, cheatsheet
    embedding vector(1536),    -- voyage-large-2 produces 1536-dim vectors
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Index for fast approximate similarity search
-- (build it after bulk-loading data so IVFFlat picks representative cluster centroids)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Step 2: Indexer — Convert Docs to Vectors
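The indexer below embeds overlapping word-window chunks rather than whole files, so a match buried in the middle of a long runbook still surfaces. The sliding window in isolation (same chunk_size and overlap defaults as the indexer, with illustrative data):

```python
def chunk_words(words: list[str], chunk_size: int = 800, overlap: int = 100) -> list[list[str]]:
    """Slide a chunk_size window over the words, stepping chunk_size - overlap each time."""
    step = chunk_size - overlap
    return [words[i:i + chunk_size]
            for i in range(0, len(words), step)
            if words[i:i + chunk_size]]

words = [f"w{i}" for i in range(2000)]   # a 2,000-word document
chunks = chunk_words(words)

print(len(chunks))             # 3 chunks
print([c[0] for c in chunks])  # windows start at w0, w700, w1400
```

Each window shares its last 100 words with the next window's first 100, so a sentence that straddles a chunk boundary still appears whole in at least one chunk.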
# indexer.py
import os
import glob
import voyageai
import psycopg2
from psycopg2.extras import execute_values
from pathlib import Path

voyage = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
conn = psycopg2.connect(os.environ["DATABASE_URL"])

def chunk_document(content: str, chunk_size: int = 800) -> list[str]:
    """Split long documents into overlapping chunks"""
    words = content.split()
    chunks = []
    overlap = 100
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

def index_markdown_files(docs_dir: str):
    """Index all .md and .mdx files in a directory"""
    files = glob.glob(f"{docs_dir}/**/*.md", recursive=True)
    files += glob.glob(f"{docs_dir}/**/*.mdx", recursive=True)
    print(f"Found {len(files)} files to index")

    for filepath in files:
        with open(filepath, 'r', encoding='utf-8') as f:
            content = f.read()

        # Remove MDX frontmatter
        if content.startswith('---'):
            end = content.find('---', 3)
            frontmatter = content[3:end]
            content = content[end+3:].strip()
            title = next((line.split('title:')[1].strip().strip('"')
                          for line in frontmatter.split('\n')
                          if line.startswith('title:')), Path(filepath).stem)
        else:
            title = Path(filepath).stem

        # Split into chunks
        chunks = chunk_document(content)
        if not chunks:
            continue

        # Get embeddings in batches (Voyage caps one request at 128 inputs)
        print(f"Indexing: {title} ({len(chunks)} chunks)")
        embeddings = []
        for i in range(0, len(chunks), 128):
            embeddings += voyage.embed(
                chunks[i:i + 128],
                model="voyage-large-2",
                input_type="document"
            ).embeddings

        # Store in database
        with conn.cursor() as cur:
            # Delete existing entries for this file
            cur.execute("DELETE FROM documents WHERE source = %s", (filepath,))
            # Insert new chunks, casting each embedding list to the vector type
            data = [
                (f"{title} (part {i+1})", chunk, filepath, "runbook", embedding)
                for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
            ]
            execute_values(cur, """
                INSERT INTO documents (title, content, source, doc_type, embedding)
                VALUES %s
            """, data, template="(%s, %s, %s, %s, %s::vector)")
        conn.commit()

    print("Indexing complete!")

# Run indexing
index_markdown_files("./runbooks")
index_markdown_files("./postmortems")
index_markdown_files("./architecture-docs")

Step 3: Search + Answer Function
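A note on getting embeddings into Postgres: the code here passes plain Python lists with a %s::vector cast, relying on psycopg2 adapting lists to arrays and pgvector casting arrays to vectors. If your driver setup balks at that, a portable fallback is pgvector's text literal format (the pgvector Python package also ships a register_vector adapter). The literal is simple to build; a sketch with an illustrative helper:

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.25,-1,3]'.

    Passed as a plain string parameter, Postgres parses it directly
    into a vector column, with no array adaptation involved.
    """
    return "[" + ",".join(f"{x:g}" for x in embedding) + "]"

print(to_vector_literal([0.25, -1.0, 3.0]))  # [0.25,-1,3]
```

With this, an insert becomes `cur.execute("INSERT ... VALUES (%s)", (to_vector_literal(emb),))` and works on any pgvector version.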
# search.py
import os
import voyageai
import psycopg2
import anthropic
from dataclasses import dataclass

voyage = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
conn = psycopg2.connect(os.environ["DATABASE_URL"])

@dataclass
class SearchResult:
    title: str
    content: str
    source: str
    similarity: float

def semantic_search(query: str, top_k: int = 5) -> list[SearchResult]:
    """Find most relevant documents for a query"""
    # Convert query to vector
    query_embedding = voyage.embed(
        [query],
        model="voyage-large-2",
        input_type="query"
    ).embeddings[0]

    # Search using cosine similarity
    with conn.cursor() as cur:
        cur.execute("""
            SELECT title, content, source,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (query_embedding, query_embedding, top_k))
        results = cur.fetchall()

    return [
        SearchResult(title=r[0], content=r[1], source=r[2], similarity=r[3])
        for r in results
    ]

def answer_from_docs(query: str) -> dict:
    """Search docs and generate an answer using Claude"""
    # Find relevant docs
    results = semantic_search(query, top_k=4)
    if not results or results[0].similarity < 0.5:
        return {
            "answer": "No relevant documentation found for this query.",
            "sources": []
        }

    # Build context from top results
    context = "\n\n---\n\n".join([
        f"**{r.title}** (relevance: {r.similarity:.0%})\n{r.content[:1000]}"
        for r in results
    ])

    # Generate answer with Claude
    response = claude.messages.create(
        model="claude-opus-4-7",
        max_tokens=800,
        system="""You are a senior DevOps engineer answering questions using internal documentation.

Rules:
- Answer directly based on the provided documentation
- If the docs don't fully answer the question, say so
- Include specific commands or steps when available
- Be concise — engineers need fast answers during incidents""",
        messages=[{
            "role": "user",
            "content": f"""Question: {query}

Relevant documentation:
{context}

Answer the question based on the documentation above."""
        }]
    )

    return {
        "answer": response.content[0].text,
        "sources": [{"title": r.title, "source": r.source, "relevance": f"{r.similarity:.0%}"}
                    for r in results[:3]]
    }

Step 4: FastAPI
# main.py
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from pydantic import BaseModel
from search import answer_from_docs, semantic_search

app = FastAPI(title="DevOps Docs Search")

class SearchQuery(BaseModel):
    query: str
    mode: str = "answer"  # "answer" or "search"

@app.post("/search")
async def search(request: SearchQuery):
    if request.mode == "answer":
        return answer_from_docs(request.query)
    else:
        results = semantic_search(request.query)
        return {"results": [
            {"title": r.title, "content": r.content[:300] + "...",
             "source": r.source, "relevance": f"{r.similarity:.0%}"}
            for r in results
        ]}

@app.get("/", response_class=HTMLResponse)
async def ui():
    return """
<!DOCTYPE html>
<html>
<head>
  <title>DevOps Docs Search</title>
  <style>
    body { font-family: monospace; max-width: 800px; margin: 40px auto; background: #0f0f0f; color: #e0e0e0; padding: 20px; }
    input { width: 80%; padding: 10px; background: #1a1a1a; color: #e0e0e0; border: 1px solid #333; border-radius: 4px; }
    button { padding: 10px 20px; background: #7c3aed; color: white; border: none; border-radius: 4px; cursor: pointer; margin-left: 8px; }
    #answer { margin-top: 20px; padding: 16px; background: #1a1a1a; border-left: 3px solid #7c3aed; white-space: pre-wrap; display: none; }
    #sources { margin-top: 12px; font-size: 12px; color: #666; }
    h1 { color: #7c3aed; }
  </style>
</head>
<body>
  <h1>🔍 DevOps Docs Search</h1>
  <p>Ask anything about your runbooks, postmortems, and architecture docs.</p>
  <input id="query" placeholder="e.g. How do we handle cert-manager failures?"
         onkeypress="if(event.key==='Enter') search()" />
  <button onclick="search()">Search</button>
  <div id="answer"></div>
  <div id="sources"></div>
  <script>
    async function search() {
      const query = document.getElementById('query').value;
      document.getElementById('answer').textContent = 'Searching...';
      document.getElementById('answer').style.display = 'block';
      const res = await fetch('/search', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({query, mode: 'answer'})
      });
      const data = await res.json();
      document.getElementById('answer').textContent = data.answer;
      document.getElementById('sources').innerHTML =
        'Sources: ' + data.sources?.map(s => s.title + ' (' + s.relevance + ')').join(', ');
    }
  </script>
</body>
</html>
"""

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8090)

Step 5: Slack Bot Integration
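The bot's only parsing job is stripping the trigger word from the message before searching. A sketch of that step in isolation (extract_query is an illustrative helper; it limits replace to the first occurrence so a query that happens to contain the trigger text isn't mangled):

```python
def extract_query(text: str, trigger: str = "?ask") -> str:
    """Drop the leading trigger word from a Slack message, keeping the rest intact."""
    return text.replace(trigger, "", 1).strip()

print(extract_query("?ask how do we rotate database credentials in vault?"))
# how do we rotate database credentials in vault?
```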
# slack_search.py
import os
import re

from slack_bolt import App
from search import answer_from_docs

slack = App(token=os.environ["SLACK_BOT_TOKEN"])

@slack.message(re.compile(r"\?ask"))  # escape '?' since bolt treats string keywords as regex
def handle_ask(message, say):
    # Strip only the first occurrence of the trigger word
    query = message["text"].replace("?ask", "", 1).strip()
    result = answer_from_docs(query)
    say({
        "text": result["answer"],
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": result["answer"]}},
            {"type": "context", "elements": [
                {"type": "mrkdwn",
                 "text": "Sources: " + " | ".join([f"`{s['title']}`" for s in result["sources"]])}
            ]}
        ]
    })

if __name__ == "__main__":
    # Socket Mode avoids exposing a public HTTP endpoint (requires SLACK_APP_TOKEN)
    from slack_bolt.adapter.socket_mode import SocketModeHandler
    SocketModeHandler(slack, os.environ["SLACK_APP_TOKEN"]).start()

Usage in Slack: ?ask how do we rotate database credentials in vault?
Keeping Docs Fresh
Set up a cron job to re-index when docs change:
# Kubernetes CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: docs-indexer
spec:
  schedule: "0 2 * * *"  # nightly at 2am
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: indexer
              image: your-registry/docs-indexer:latest
              command: ["python", "indexer.py"]
              env:
                - name: VOYAGE_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: ai-secrets
                      key: voyage-key
                - name: DATABASE_URL  # the indexer also needs the DB connection string
                  valueFrom:
                    secretKeyRef:
                      name: ai-secrets
                      key: database-url
          restartPolicy: OnFailure  # required for Job pod templates

Cost
| Component | Cost |
|---|---|
| Voyage AI embeddings (1000 docs) | ~$0.10 one-time |
| pgvector (RDS or self-hosted) | $20–50/month |
| Claude API (100 searches/day) | ~$3–5/month |
| Total | ~$25–55/month |
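The embedding line item is easy to sanity-check. A back-of-envelope calculation (the token count and per-token price are assumptions; check Voyage's current pricing page before relying on them):

```python
# Rough embedding cost for an initial index of 1,000 docs.
docs = 1_000
tokens_per_doc = 800            # assumed average for a runbook-sized page
usd_per_million_tokens = 0.12   # assumed price; verify against current pricing

cost = docs * tokens_per_doc / 1_000_000 * usd_per_million_tokens
print(f"${cost:.2f} one-time")  # $0.10 one-time
```

Re-indexing nightly only re-embeds what changed if you dedupe by file, so the recurring embedding cost stays near zero.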
Cheap for the value — your team stops spending 20 minutes searching for how to do something they've done before.
The best part: this same system can index your Confluence, GitHub wikis, or any Markdown source. Build it once, and suddenly your organization's knowledge is actually findable.