Build a Semantic Search for Your DevOps Docs Using a Vector Database (2026)

Tired of grepping through runbooks? Build a semantic search that finds relevant docs by meaning, not keywords — using embeddings, pgvector, and the Claude API.

DevOpsBoys · 6 min read

Your team's runbooks, postmortems, and architecture docs are valuable — but nobody reads them because grep-based search is terrible. Here's how to build semantic search that actually works.


What We're Building

A search tool that:

  • Indexes your runbooks, postmortems, and docs into a vector database
  • Accepts natural language queries: "how do we handle cert-manager failures?"
  • Returns relevant documents ranked by semantic similarity (not keyword match)
  • Uses Claude to summarize the top results into a direct answer
  • Deployable as a Slack bot or web tool

Why Vector Search Beats Grep

Grep-based search: "cert-manager" → finds docs with that exact word

Vector search: "TLS certificate renewal failing" → finds docs about cert-manager, certificate rotation, ACME challenges — even if they don't use those exact words

Vector search matches on meaning, not just keywords. This matters during an incident, when the person searching describes the symptom in different words than the runbook's author used.
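To make "meaning, not keywords" concrete: an embedding model maps each text to a vector, and search ranks documents by how closely their vectors point in the same direction as the query's. Here is a minimal sketch with hand-made 3-dimensional vectors. The numbers are invented for illustration and are not model output; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for illustration (axes loosely: TLS, storage, networking)
query = [0.9, 0.1, 0.2]  # stands in for "TLS certificate renewal failing"
toy_docs = {
    "disk-full postmortem": [0.1, 0.9, 0.1],
    "cert-manager runbook": [0.8, 0.0, 0.3],
}

# Rank document names by similarity to the query, best match first
ranked = sorted(
    toy_docs,
    key=lambda name: cosine_similarity(query, toy_docs[name]),
    reverse=True,
)
print(ranked)
```

The cert-manager runbook ranks first even though the query never mentions "cert-manager", because its vector points in nearly the same direction as the query's.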


Stack

  • pgvector — PostgreSQL extension for storing and searching embeddings
  • Voyage AI embeddings (or OpenAI, or local models) — convert text to vectors
  • Claude API — synthesize search results into answers
  • FastAPI — REST API
  • Python — scripts to index documents

Setup

bash
pip install anthropic voyageai psycopg2-binary pgvector fastapi uvicorn \
  python-dotenv markdown beautifulsoup4 slack-bolt
bash
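# .env (export these in your shell, or call python-dotenv's load_dotenv() before the scripts read os.environ)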
ANTHROPIC_API_KEY=sk-ant-...
VOYAGE_API_KEY=pa-...
DATABASE_URL=postgresql://user:password@localhost:5432/devops_search

Step 1: Set Up pgvector

sql
-- Run in PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;
 
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    source TEXT,           -- file path or URL
    doc_type TEXT,         -- runbook, postmortem, architecture, cheatsheet
    embedding vector(1536), -- voyage-large-2 produces 1536-dim vectors; match this to your model
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
 
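-- Index for fast approximate similarity search.
-- ivfflat clusters the rows that exist at build time, so for good recall
-- create (or REINDEX) it after the first indexing run has loaded data.
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);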
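The `lists = 100` value above is a tuning knob, not a constant. The pgvector README suggests starting with rows / 1000 lists for tables up to 1M rows and sqrt(rows) above that, and querying with roughly sqrt(lists) probes. A small sketch of that rule of thumb follows; `suggest_ivfflat_params` is a hypothetical helper name, and the results are starting points to check against your own recall tests.

```python
import math

def suggest_ivfflat_params(row_count: int) -> dict[str, int]:
    """Starting values for ivfflat, following the pgvector README guidance."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = int(math.sqrt(row_count))
    # Probes = how many lists are scanned per query (more = better recall, slower)
    probes = max(1, int(math.sqrt(lists)))
    return {"lists": lists, "probes": probes}

print(suggest_ivfflat_params(100_000))
```

Under this guidance `lists = 100` fits a table of about 100,000 chunks. The probe count is applied per session with `SET ivfflat.probes = 10;` before running the search query.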

Step 2: Indexer — Convert Docs to Vectors

python
# indexer.py
import os
import glob
import voyageai
import psycopg2
from psycopg2.extras import execute_values
from pathlib import Path
 
voyage = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
conn = psycopg2.connect(os.environ["DATABASE_URL"])
 
def chunk_document(content: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into chunks of chunk_size words; consecutive chunks share `overlap` words"""
    words = content.split()
    chunks = []
    step = chunk_size - overlap
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # this chunk already reaches the end; skip the redundant tail fragment
    return chunks
 
def index_markdown_files(docs_dir: str):
    """Index all .md and .mdx files in a directory"""
    files = glob.glob(f"{docs_dir}/**/*.md", recursive=True)
    files += glob.glob(f"{docs_dir}/**/*.mdx", recursive=True)
    
    print(f"Found {len(files)} files to index")
    
    for filepath in files:
        with open(filepath, 'r', encoding='utf-8') as f:
            content = f.read()
        
        # Strip YAML frontmatter (if present) and take the title from it;
        # fall back to the filename when there is no title or no closing '---'
        title = Path(filepath).stem
        if content.startswith('---'):
            end = content.find('---', 3)
            if end != -1:
                frontmatter = content[3:end]
                content = content[end+3:].strip()
                title = next((line.split('title:', 1)[1].strip().strip('"\'')
                              for line in frontmatter.split('\n')
                              if line.startswith('title:')), title)
        
        # Split into chunks
        chunks = chunk_document(content)
        if not chunks:
            continue
        
        # Get embeddings in batches (max 128 texts per request for Voyage)
        print(f"Indexing: {title} ({len(chunks)} chunks)")
        embeddings = []
        for start in range(0, len(chunks), 128):
            embeddings += voyage.embed(
                chunks[start:start + 128],
                model="voyage-large-2",
                input_type="document"
            ).embeddings
        
        # Store in database
        with conn.cursor() as cur:
            # Delete existing entries for this file
            cur.execute("DELETE FROM documents WHERE source = %s", (filepath,))
            
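            # Insert new chunks; doc_type is derived from the directory name
            # ("./runbooks" -> "runbook", "./architecture-docs" -> "architecture")
            doc_type = Path(docs_dir).name.removesuffix("-docs").rstrip("s")
            data = [
                (f"{title} (part {i+1})", chunk, filepath, doc_type, embedding)
                for i, (chunk, embedding) in enumerate(zip(chunks, embeddings))
            ]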
            execute_values(cur, """
                INSERT INTO documents (title, content, source, doc_type, embedding)
                VALUES %s
            """, data)
        
        conn.commit()
    
    print("Indexing complete!")
 
# Run indexing only when executed as a script, not when imported
if __name__ == "__main__":
    index_markdown_files("./runbooks")
    index_markdown_files("./postmortems")
    index_markdown_files("./architecture-docs")
    conn.close()
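Chunk overlap is easier to see at small scale. The sketch below applies the sliding-window idea to a 10-word document with a window of 4 words and an overlap of 1; `sliding_chunks` is a hypothetical stand-alone helper written for this illustration, not part of the indexer.

```python
def sliding_chunks(words: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Sliding window over words; each window starts (chunk_size - overlap) after the last."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the document
    return chunks

words = [f"w{n}" for n in range(10)]
chunks = sliding_chunks(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last word of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk. The same reasoning applies at 800 words with a 100-word overlap.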

Step 3: Search + Answer Function

python
# search.py
import os
import voyageai
import psycopg2
import anthropic
from dataclasses import dataclass
 
voyage = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])
claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
conn = psycopg2.connect(os.environ["DATABASE_URL"])
 
@dataclass
class SearchResult:
    title: str
    content: str
    source: str
    similarity: float
 
def semantic_search(query: str, top_k: int = 5) -> list[SearchResult]:
    """Find most relevant documents for a query"""
    # Convert query to vector
    query_embedding = voyage.embed(
        [query],
        model="voyage-large-2",
        input_type="query"
    ).embeddings[0]
    
    # Search using cosine similarity
    with conn.cursor() as cur:
        cur.execute("""
            SELECT title, content, source,
                   1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
        """, (query_embedding, query_embedding, top_k))
        
        results = cur.fetchall()
    
    return [
        SearchResult(title=r[0], content=r[1], source=r[2], similarity=r[3])
        for r in results
    ]
 
def answer_from_docs(query: str) -> dict:
    """Search docs and generate an answer using Claude"""
    # Find relevant docs
    results = semantic_search(query, top_k=4)
    
    if not results or results[0].similarity < 0.5:
        return {
            "answer": "No relevant documentation found for this query.",
            "sources": []
        }
    
    # Build context from top results
    context = "\n\n---\n\n".join([
        f"**{r.title}** (relevance: {r.similarity:.0%})\n{r.content[:1000]}"
        for r in results
    ])
    
    # Generate answer with Claude
    response = claude.messages.create(
        model="claude-opus-4-7",
        max_tokens=800,
        system="""You are a senior DevOps engineer answering questions using internal documentation.
        
Rules:
- Answer directly based on the provided documentation
- If the docs don't fully answer the question, say so
- Include specific commands or steps when available
- Be concise — engineers need fast answers during incidents""",
        messages=[{
            "role": "user",
            "content": f"""Question: {query}
 
Relevant documentation:
{context}
 
Answer the question based on the documentation above."""
        }]
    )
    
    return {
        "answer": response.content[0].text,
        "sources": [{"title": r.title, "source": r.source, "relevance": f"{r.similarity:.0%}"} 
                   for r in results[:3]]
    }
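Note that `answer_from_docs` checks the 0.5 threshold only against the best hit; weaker hits further down the list still go into the context. If that adds noise, one option is to filter every hit before building the prompt. Here is a sketch under that assumption; `Hit` and `keep_relevant` are hypothetical names, and the scores are made up.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    title: str
    similarity: float  # 1 - cosine distance, as computed in the SQL query

def keep_relevant(hits: list[Hit], min_similarity: float = 0.5) -> list[Hit]:
    """Drop hits below the threshold, preserving rank order."""
    return [h for h in hits if h.similarity >= min_similarity]

# Made-up scores for illustration
hits = [
    Hit("cert-manager runbook (part 1)", 0.82),
    Hit("ingress postmortem (part 2)", 0.61),
    Hit("disk-full postmortem (part 1)", 0.31),
]
context_hits = keep_relevant(hits)
print([h.title for h in context_hits])
```

An empty list from the filter maps naturally onto the existing "No relevant documentation found" branch. The right threshold depends on the embedding model, so treat 0.5 as a value to calibrate against real queries.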

Step 4: FastAPI

python
# main.py
from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from pydantic import BaseModel
from search import answer_from_docs, semantic_search
 
app = FastAPI(title="DevOps Docs Search")
 
class SearchQuery(BaseModel):
    query: str
    mode: str = "answer"  # "answer" or "search"
 
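@app.post("/search")
def search(request: SearchQuery):
    # Plain def (not async): FastAPI runs it in a threadpool, so the blocking
    # database and API calls inside don't stall the event loop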
    if request.mode == "answer":
        return answer_from_docs(request.query)
    else:
        results = semantic_search(request.query)
        return {"results": [
            {"title": r.title, "content": r.content[:300] + "...", 
             "source": r.source, "relevance": f"{r.similarity:.0%}"}
            for r in results
        ]}
 
@app.get("/", response_class=HTMLResponse)
async def ui():
    return """
<!DOCTYPE html>
<html>
<head>
    <title>DevOps Docs Search</title>
    <style>
        body { font-family: monospace; max-width: 800px; margin: 40px auto; background: #0f0f0f; color: #e0e0e0; padding: 20px; }
        input { width: 80%; padding: 10px; background: #1a1a1a; color: #e0e0e0; border: 1px solid #333; border-radius: 4px; }
        button { padding: 10px 20px; background: #7c3aed; color: white; border: none; border-radius: 4px; cursor: pointer; margin-left: 8px; }
        #answer { margin-top: 20px; padding: 16px; background: #1a1a1a; border-left: 3px solid #7c3aed; white-space: pre-wrap; display: none; }
        #sources { margin-top: 12px; font-size: 12px; color: #666; }
        h1 { color: #7c3aed; }
    </style>
</head>
<body>
    <h1>🔍 DevOps Docs Search</h1>
    <p>Ask anything about your runbooks, postmortems, and architecture docs.</p>
    <input id="query" placeholder="e.g. How do we handle cert-manager failures?" 
           onkeypress="if(event.key==='Enter') search()" />
    <button onclick="search()">Search</button>
    <div id="answer"></div>
    <div id="sources"></div>
    <script>
    async function search() {
        const query = document.getElementById('query').value;
        document.getElementById('answer').textContent = 'Searching...';
        document.getElementById('answer').style.display = 'block';
        const res = await fetch('/search', {
            method: 'POST',
            headers: {'Content-Type': 'application/json'},
            body: JSON.stringify({query, mode: 'answer'})
        });
        const data = await res.json();
        document.getElementById('answer').textContent = data.answer;
        // textContent (not innerHTML) so document titles are never interpreted as HTML
        document.getElementById('sources').textContent =
            'Sources: ' + (data.sources || []).map(s => s.title + ' (' + s.relevance + ')').join(', ');
    }
    </script>
</body>
</html>
"""
 
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8090)
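To call the endpoint from a script rather than the browser UI, the request body only needs the two fields of `SearchQuery`. A minimal sketch using the standard library follows; `build_search_request` is a hypothetical helper, and the base URL assumes the server is running locally on port 8090 as configured above.

```python
import json
import urllib.request

def build_search_request(query: str, mode: str = "answer",
                         base_url: str = "http://localhost:8090") -> urllib.request.Request:
    """Build (but do not send) a POST /search request matching the SearchQuery model."""
    body = json.dumps({"query": query, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("how do we handle cert-manager failures?", mode="search")
print(req.get_method(), req.full_url)

# With the server running, send it and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Passing `mode="search"` returns the ranked chunks without calling Claude, which is handy for checking retrieval quality on its own.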

Step 5: Slack Bot Integration

python
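# slack_search.py
import os
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler
from search import answer_from_docs
 
slack = App(token=os.environ["SLACK_BOT_TOKEN"])
 
@slack.message("?ask")
def handle_ask(message, say):
    # Strip the trigger keyword and treat the rest of the message as the query
    query = message["text"].replace("?ask", "", 1).strip()
    result = answer_from_docs(query)
    
    say({
        "text": result["answer"],
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": result["answer"]}},
            {"type": "context", "elements": [
                {"type": "mrkdwn", 
                 "text": "Sources: " + " | ".join([f"`{s['title']}`" for s in result["sources"]])}
            ]}
        ]
    })
 
if __name__ == "__main__":
    # One way to run the bot: Socket Mode (assumes an app-level token in SLACK_APP_TOKEN)
    SocketModeHandler(slack, os.environ["SLACK_APP_TOKEN"]).start()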

Usage in Slack: ?ask how do we rotate database credentials in vault?


Keeping Docs Fresh

Set up a cron job to re-index on a schedule so that doc changes are picked up within a day:

yaml
# Kubernetes CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: docs-indexer
spec:
  schedule: "0 2 * * *"   # nightly at 2am
  jobTemplate:
    spec:
      template:
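        spec:
          restartPolicy: OnFailure   # required for Jobs: must be OnFailure or Never
          containers:
          - name: indexer
            image: your-registry/docs-indexer:latest
            command: ["python", "indexer.py"]
            env:
            - name: VOYAGE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: voyage-key
            # indexer.py also reads DATABASE_URL; the secret key name below is an assumption
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: database-url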
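A nightly run that re-embeds every file pays for embeddings that have not changed. One possible refinement is to store a content hash per source file and skip files whose hash matches. The sketch below shows only the comparison step. `content_hash` and `files_to_reindex` are hypothetical helpers, and persisting the hashes (for example in an extra column on `documents`) is an assumption that the schema above does not include.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def files_to_reindex(current: dict[str, str], stored_hashes: dict[str, str]) -> list[str]:
    """Return paths whose content is new or differs from the stored hash."""
    return [
        path for path, text in current.items()
        if stored_hashes.get(path) != content_hash(text)
    ]

# Hashes saved by a previous run (hypothetical data)
stored = {"runbooks/vault.md": content_hash("rotate creds v1")}

# What is on disk now: path -> file content
current = {
    "runbooks/vault.md": "rotate creds v1",     # unchanged, will be skipped
    "runbooks/cert-manager.md": "renew certs",  # new file, needs indexing
}
print(files_to_reindex(current, stored))
```

Only the paths returned here would need to go through chunking and embedding; deleted files still have to be removed from the table separately.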

Cost

| Component | Cost |
|---|---|
| Voyage AI embeddings (1000 docs) | ~$0.10 one-time |
| pgvector (RDS or self-hosted) | $20–50/month |
| Claude API (100 searches/day) | ~$3–5/month |
| Total | ~$25–55/month |

Cheap for the value — your team stops spending 20 minutes searching for how to do something they've done before.


The best part: this same system can index your Confluence, GitHub wikis, or any Markdown source. Build it once, and suddenly your organization's knowledge is actually findable.
