
Build a Semantic Search for Your DevOps Docs Using Vector Database (2026)

Tired of grepping through runbooks? Build a semantic search that finds relevant docs by meaning, not keywords — using embeddings, pgvector, and the Claude API.

DevOpsBoys · May 9, 2026 · 6 min read

Your team's runbooks, postmortems, and architecture docs are valuable — but nobody reads them because grep-based search is terrible. Here's how to build semantic search that actually works.


What We're Building

A search tool that:

  • Indexes your runbooks, postmortems, and docs into a vector database
  • Accepts natural language queries: "how do we handle cert-manager failures?"
  • Returns relevant documents ranked by semantic similarity (not keyword match)
  • Uses Claude to summarize the top results into a direct answer
  • Deploys as a Slack bot or a web tool

Why Vector Search Beats Grep

Grep-based search: "cert-manager" → finds docs with that exact word

Vector search: "TLS certificate renewal failing" → finds docs about cert-manager, certificate rotation, ACME challenges — even if they don't use those exact words

Vector search understands meaning, not just keywords. That matters when your runbooks describe a problem in different terms than the incident unfolding in front of you.
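
Under the hood, both documents and queries become vectors, and "relevance" is just cosine similarity between them. A toy sketch of the idea (the three-dimensional vectors here are made up for illustration; real embeddings have 1000+ dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings: "cert-manager failure" and "TLS renewal broken" point in
# nearly the same direction; "postgres backup" does not.
query = [0.9, 0.1, 0.2]
cert_doc = [0.8, 0.2, 0.3]
backup_doc = [0.1, 0.9, 0.1]

print(cosine_similarity(query, cert_doc) > cosine_similarity(query, backup_doc))  # True
```

pgvector's `<=>` operator computes cosine *distance*, which is why the queries later subtract it from 1 to get a similarity score.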


Stack

  • pgvector — PostgreSQL extension for storing and searching embeddings
  • Voyage AI embeddings (or OpenAI, or local models) — convert text to vectors
  • Claude API — synthesize search results into answers
  • FastAPI — REST API
  • Python — scripts to index documents

Setup

bash
pip install anthropic voyageai psycopg2-binary pgvector fastapi uvicorn \
  python-dotenv markdown beautifulsoup4
bash
# .env
ANTHROPIC_API_KEY=sk-ant-...
VOYAGE_API_KEY=pa-...
DATABASE_URL=postgresql://user:password@localhost:5432/devops_search

Step 1: Set Up pgvector

sql
-- Run in PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;
 
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    source TEXT,           -- file path or URL
    doc_type TEXT,         -- runbook, postmortem, architecture, cheatsheet
    embedding vector(1536), -- voyage-large-2 produces 1536-dim vectors
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
 
-- Index for fast similarity search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

Step 2: Indexer — Convert Docs to Vectors

python
# indexer.py
import os
import glob
from pathlib import Path

import voyageai
import psycopg2
from psycopg2.extras import execute_values
from dotenv import load_dotenv

load_dotenv()  # pull API keys and DATABASE_URL from .env

voyage = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
conn = psycopg2.connect(os.environ["DATABASE_URL"])
 
def chunk_document(content: str, chunk_size: int = 800) -> list[str]:
    """Split long documents into overlapping chunks"""
    words = content.split()
    chunks = []
    overlap = 100
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
 
def index_markdown_files(docs_dir: str, doc_type: str = "runbook"):
    """Index all .md and .mdx files in a directory"""
    files = glob.glob(f"{docs_dir}/**/*.md", recursive=True)
    files += glob.glob(f"{docs_dir}/**/*.mdx", recursive=True)
    
    print(f"Found {len(files)} files to index")
    
    for filepath in files:
        with open(filepath, 'r', encoding='utf-8') as f:
            content = f.read()
        
        # Strip YAML frontmatter and pull the title from it, if present
        if content.startswith('---'):
            end = content.find('---', 3)
            frontmatter = content[3:end]
            content = content[end+3:].strip()
            title = next((line.split('title:')[1].strip().strip('"') 
                         for line in frontmatter.split('\n') 
                         if line.startswith('title:')), Path(filepath).stem)
        else:
            title = Path(filepath).stem
        
        # Split into chunks
        chunks = chunk_document(content)
        if not chunks:
            continue
        
        # Get embeddings in batches (Voyage accepts at most 128 texts per request)
        print(f"Indexing: {title} ({len(chunks)} chunks)")
        embeddings = []
        for start in range(0, len(chunks), 128):
            embeddings += voyage.embed(
                chunks[start:start + 128],
                model="voyage-large-2",
                input_type="document"
            ).embeddings
        
        # Store in database
        with conn.cursor() as cur:
            # Delete existing entries for this file
            cur.execute("DELETE FROM documents WHERE source = %s", (filepath,))
            
            # Insert new chunks; str() renders each embedding as a
            # '[...]' literal that pgvector parses into a vector
            data = [
                (f"{title} (part {i+1})", chunk, filepath, doc_type, str(embedding))
                for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
            ]
            execute_values(cur, """
                INSERT INTO documents (title, content, source, doc_type, embedding)
                VALUES %s
            """, data)
        
        conn.commit()
    
    print("Indexing complete!")
 
# Run indexing
index_markdown_files("./runbooks", doc_type="runbook")
index_markdown_files("./postmortems", doc_type="postmortem")
index_markdown_files("./architecture-docs", doc_type="architecture")
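
The overlap exists so a sentence split across a chunk boundary still appears whole in at least one chunk. A quick sanity check of the chunking logic above (reproduced here so it runs standalone):

```python
def chunk_document(content: str, chunk_size: int = 800) -> list[str]:
    """Split long documents into overlapping chunks (same as indexer.py)."""
    words = content.split()
    chunks = []
    overlap = 100
    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

doc = " ".join(f"word{i}" for i in range(2000))
chunks = chunk_document(doc)

print(len(chunks))  # 3 chunks: words 0-799, 700-1499, 1400-1999
print(chunks[0].split()[-100:] == chunks[1].split()[:100])  # True: overlap preserved
```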

Step 3: Search + Answer Function

python
# search.py
import os
from dataclasses import dataclass

import anthropic
import psycopg2
import voyageai
from dotenv import load_dotenv

load_dotenv()  # pull API keys and DATABASE_URL from .env

voyage = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
conn = psycopg2.connect(os.environ["DATABASE_URL"])
 
@dataclass
class SearchResult:
    title: str
    content: str
    source: str
    similarity: float
 
def semantic_search(query: str, top_k: int = 5) -> list[SearchResult]:
    """Find most relevant documents for a query"""
    # Convert query to vector
    query_embedding = voyage.embed(
        [query],
        model="voyage-large-2",
        input_type="query"
    ).embeddings[0]
    
    # Search using cosine distance (the <=> operator); the embedding is
    # passed as a '[...]' string literal that pgvector casts to a vector
    with conn.cursor() as cur:
        cur.execute("""
            SELECT title, content, source,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (str(query_embedding), str(query_embedding), top_k))
        
        results = cur.fetchall()
    
    return [
        SearchResult(title=r[0], content=r[1], source=r[2], similarity=r[3])
        for r in results
    ]
 
def answer_from_docs(query: str) -> dict:
    """Search docs and generate an answer using Claude"""
    # Find relevant docs
    results = semantic_search(query, top_k=4)
    
    if not results or results[0].similarity < 0.5:
        return {
            "answer": "No relevant documentation found for this query.",
            "sources": []
        }
    
    # Build context from top results
    context = "\n\n---\n\n".join([
        f"**{r.title}** (relevance: {r.similarity:.0%})\n{r.content[:1000]}"
        for r in results
    ])
    
    # Generate answer with Claude
    response = claude.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=800,
        system="""You are a senior DevOps engineer answering questions using internal documentation.
        
Rules:
- Answer directly based on the provided documentation
- If the docs don't fully answer the question, say so
- Include specific commands or steps when available
- Be concise — engineers need fast answers during incidents""",
        messages=[{
            "role": "user",
            "content": f"""Question: {query}
 
Relevant documentation:
{context}
 
Answer the question based on the documentation above."""
        }]
    )
    
    return {
        "answer": response.content[0].text,
        "sources": [{"title": r.title, "source": r.source, "relevance": f"{r.similarity:.0%}"} 
                   for r in results[:3]]
    }
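
Since documents are stored as chunks, semantic_search can return several chunks of the same file. An optional dedupe pass (not part of the code above) keeps only the best-scoring chunk per source before building the context:

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    title: str
    content: str
    source: str
    similarity: float

def dedupe_by_source(results: list[SearchResult]) -> list[SearchResult]:
    """Keep the highest-similarity chunk per source file, ranked by score."""
    best: dict[str, SearchResult] = {}
    for r in results:
        if r.source not in best or r.similarity > best[r.source].similarity:
            best[r.source] = r
    return sorted(best.values(), key=lambda r: r.similarity, reverse=True)

results = [
    SearchResult("Certs (part 1)", "...", "runbooks/certs.md", 0.91),
    SearchResult("Certs (part 3)", "...", "runbooks/certs.md", 0.88),
    SearchResult("DNS (part 1)", "...", "runbooks/dns.md", 0.74),
]
print([r.title for r in dedupe_by_source(results)])  # ['Certs (part 1)', 'DNS (part 1)']
```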

Step 4: FastAPI

python
# main.py
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from pydantic import BaseModel
from search import answer_from_docs, semantic_search
 
app = FastAPI(title="DevOps Docs Search")
 
class SearchQuery(BaseModel):
    query: str
    mode: str = "answer"  # "answer" or "search"
 
@app.post("/search")
def search(request: SearchQuery):
    # Plain def (not async): FastAPI runs it in a threadpool,
    # so the blocking embedding/DB/Claude calls don't stall the event loop
    if request.mode == "answer":
        return answer_from_docs(request.query)
    else:
        results = semantic_search(request.query)
        return {"results": [
            {"title": r.title, "content": r.content[:300] + "...", 
             "source": r.source, "relevance": f"{r.similarity:.0%}"}
            for r in results
        ]}
 
@app.get("/", response_class=HTMLResponse)
async def ui():
    return """
<!DOCTYPE html>
<html>
<head>
    <title>DevOps Docs Search</title>
    <style>
        body { font-family: monospace; max-width: 800px; margin: 40px auto; background: #0f0f0f; color: #e0e0e0; padding: 20px; }
        input { width: 80%; padding: 10px; background: #1a1a1a; color: #e0e0e0; border: 1px solid #333; border-radius: 4px; }
        button { padding: 10px 20px; background: #7c3aed; color: white; border: none; border-radius: 4px; cursor: pointer; margin-left: 8px; }
        #answer { margin-top: 20px; padding: 16px; background: #1a1a1a; border-left: 3px solid #7c3aed; white-space: pre-wrap; display: none; }
        #sources { margin-top: 12px; font-size: 12px; color: #666; }
        h1 { color: #7c3aed; }
    </style>
</head>
<body>
    <h1>🔍 DevOps Docs Search</h1>
    <p>Ask anything about your runbooks, postmortems, and architecture docs.</p>
    <input id="query" placeholder="e.g. How do we handle cert-manager failures?" 
           onkeypress="if(event.key==='Enter') search()" />
    <button onclick="search()">Search</button>
    <div id="answer"></div>
    <div id="sources"></div>
    <script>
    async function search() {
        const query = document.getElementById('query').value;
        document.getElementById('answer').textContent = 'Searching...';
        document.getElementById('answer').style.display = 'block';
        const res = await fetch('/search', {
            method: 'POST',
            headers: {'Content-Type': 'application/json'},
            body: JSON.stringify({query, mode: 'answer'})
        });
        const data = await res.json();
        document.getElementById('answer').textContent = data.answer;
        document.getElementById('sources').innerHTML = 
            'Sources: ' + data.sources?.map(s => s.title + ' (' + s.relevance + ')').join(', ');
    }
    </script>
</body>
</html>
"""
 
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8090)

Step 5: Slack Bot Integration

python
# slack_search.py
import os
import re

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler
from search import answer_from_docs
 
slack = App(token=os.environ["SLACK_BOT_TOKEN"])
 
# The matcher is treated as a regex, so "?" must be escaped
@slack.message(re.escape("?ask"))
def handle_ask(message, say):
    # removeprefix only strips a leading "?ask", unlike replace()
    query = message["text"].removeprefix("?ask").strip()
    result = answer_from_docs(query)
    
    say({
        "text": result["answer"],
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": result["answer"]}},
            {"type": "context", "elements": [
                {"type": "mrkdwn", 
                 "text": "Sources: " + " | ".join([f"`{s['title']}`" for s in result["sources"]])}
            ]}
        ]
    })

if __name__ == "__main__":
    # Socket Mode avoids exposing a public endpoint; requires an app-level token
    SocketModeHandler(slack, os.environ["SLACK_APP_TOKEN"]).start()

Usage in Slack: ?ask how do we rotate database credentials in vault?


Keeping Docs Fresh

Set up a cron job to re-index when docs change:

yaml
# Kubernetes CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: docs-indexer
spec:
  schedule: "0 2 * * *"   # nightly at 2am
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: indexer
            image: your-registry/docs-indexer:latest
            command: ["python", "indexer.py"]
            env:
            - name: VOYAGE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: voyage-key
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: database-url
          restartPolicy: OnFailure

Cost

Component                        | Cost
Voyage AI embeddings (1000 docs) | ~$0.10 one-time
pgvector (RDS or self-hosted)    | $20–50/month
Claude API (100 searches/day)    | ~$3–5/month
Total                            | ~$25–55/month

Cheap for the value — your team stops spending 20 minutes searching for how to do something they've done before.


The best part: this same system can index your Confluence, GitHub wikis, or any Markdown source. Build it once, and suddenly your organization's knowledge is actually findable.
