Set Up Qdrant Vector Database on Kubernetes for RAG Applications

Qdrant is a fast open-source vector database for RAG pipelines. Here's how to deploy it on Kubernetes with persistent storage, verify the deployment, and connect it to LangChain.

DevOpsBoys · May 17, 2026 · 4 min read

If you're building RAG (Retrieval-Augmented Generation) applications with LLMs, you need a vector database. Qdrant is fast, open source, and runs well on Kubernetes. Here's the full setup.


What is Qdrant?

Qdrant stores vectors (floating-point arrays that represent text/image embeddings) and lets you search for semantically similar items.

In a RAG pipeline:

Your docs → Embedding model → Vectors → Qdrant
User query → Embedding model → Query vector → Qdrant similarity search → Relevant docs → LLM
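
The "similarity search" step in the flow above can be sketched in a few lines of plain Python. This is a brute-force illustration only: the three-dimensional vectors and document names are hypothetical, and Qdrant itself uses an approximate index (HNSW) rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real models emit hundreds of dimensions
stored_vectors = {
    "k8s-runbook": [0.9, 0.1, 0.0],
    "terraform-guide": [0.1, 0.9, 0.0],
    "billing-faq": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.0]

results = top_k(query_vector, stored_vectors, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The documents returned by this step are what gets passed to the LLM as context.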

Qdrant vs alternatives:

  • Qdrant: Rust-based, fast queries, strong self-hosted experience
  • Pinecone: Fully managed, no self-hosting option
  • Weaviate: Feature-rich but heavier
  • Chroma: Simple, Python-native, great for dev but not production-grade

Deploy on Kubernetes

Option 1: Helm Chart

bash
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm repo update
 
helm install qdrant qdrant/qdrant \
  --namespace qdrant \
  --create-namespace \
  --set replicaCount=1 \
  --set persistence.size=10Gi \
  --set persistence.storageClassName=gp3

Option 2: Custom Manifests (More Control)

yaml
# qdrant-deployment.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: qdrant
spec:
  serviceName: qdrant
  replicas: 1
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:v1.9.0
          ports:
            - containerPort: 6333   # HTTP REST API
            - containerPort: 6334   # gRPC
          env:
            - name: QDRANT__SERVICE__API_KEY
              valueFrom:
                secretKeyRef:
                  name: qdrant-secret
                  key: api-key
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "2"
          volumeMounts:
            - name: qdrant-storage
              mountPath: /qdrant/storage
          readinessProbe:
            httpGet:
              path: /healthz
              port: 6333
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: 6333
            initialDelaySeconds: 30
            periodSeconds: 30
  volumeClaimTemplates:
    - metadata:
        name: qdrant-storage
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: gp3
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: qdrant
  namespace: qdrant
spec:
  selector:
    app: qdrant
  ports:
    - name: http
      port: 6333
      targetPort: 6333
    - name: grpc
      port: 6334
      targetPort: 6334
  type: ClusterIP

bash
# Create API key secret
kubectl create secret generic qdrant-secret \
  --from-literal=api-key=your-strong-api-key-here \
  -n qdrant
 
kubectl apply -f qdrant-deployment.yaml

Verify Qdrant is Running

bash
# Port-forward to test locally (this command blocks, so run it in a separate terminal)
kubectl port-forward svc/qdrant 6333:6333 -n qdrant
 
# Check health
curl http://localhost:6333/healthz
 
# Check collections (should be empty initially)
curl http://localhost:6333/collections \
  -H "api-key: your-strong-api-key-here"
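
With the port-forward still running, a collection can be created through the same REST API. The sketch below only builds the request, using the standard library; the collection name `devops-docs` and the 384-dimension size are assumptions chosen to match the embedding model used later in this article, and the send step is left commented out because it needs a live Qdrant instance.

```python
import json
import urllib.request


def collection_body(vector_size: int, distance: str = "Cosine") -> dict:
    """JSON body for PUT /collections/{name}: vector size and distance metric."""
    return {"vectors": {"size": vector_size, "distance": distance}}


def build_create_request(base_url: str, name: str, api_key: str, vector_size: int):
    """Prepare (but do not send) the HTTP request that creates a collection."""
    body = json.dumps(collection_body(vector_size)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/collections/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


# Assumed values: local port-forward, placeholder key, 384-dim vectors
request = build_create_request(
    "http://localhost:6333", "devops-docs", "your-strong-api-key-here", 384
)
print(request.get_method(), request.full_url)

# To actually send it (requires the port-forward to be running):
# with urllib.request.urlopen(request, timeout=10) as resp:
#     print(json.load(resp))
```

Creating the collection by hand is optional if you ingest through LangChain's `from_texts`, which is generally expected to create a missing collection for you.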

Connect Your RAG Application

With LangChain

python
# rag_pipeline.py
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA
from qdrant_client import QdrantClient
import os
 
QDRANT_URL = "http://qdrant.qdrant.svc.cluster.local:6333"
QDRANT_API_KEY = os.environ["QDRANT_API_KEY"]
COLLECTION_NAME = "devops-docs"
 
# Initialize embedding model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
 
# Connect to Qdrant
qdrant_client = QdrantClient(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY
)
 
def ingest_documents(documents: list[str], source_names: list[str]):
    """Split and embed documents, store in Qdrant"""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=50
    )
    
    texts = []
    metadatas = []
    
    for doc, source in zip(documents, source_names):
        chunks = splitter.split_text(doc)
        texts.extend(chunks)
        metadatas.extend([{"source": source, "chunk": i} 
                          for i in range(len(chunks))])
    
    # Store in Qdrant
    vectorstore = Qdrant.from_texts(
        texts=texts,
        embedding=embeddings,
        metadatas=metadatas,
        url=QDRANT_URL,
        api_key=QDRANT_API_KEY,
        collection_name=COLLECTION_NAME
    )
    
    print(f"Ingested {len(texts)} chunks into Qdrant")
    return vectorstore
 
 
def create_qa_chain():
    """Create a RAG QA chain using Qdrant + Claude"""
    vectorstore = Qdrant(
        client=qdrant_client,
        collection_name=COLLECTION_NAME,
        embeddings=embeddings
    )
    
    retriever = vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 4}
    )
    
    llm = ChatAnthropic(
        model="claude-sonnet-4-6",
        api_key=os.environ["ANTHROPIC_API_KEY"]
    )
    
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True
    )
    
    return qa_chain
 
 
# Usage
if __name__ == "__main__":
    # Ingest your DevOps runbooks/docs
    docs = [
        "To debug a Kubernetes pod: kubectl describe pod <name>...",
        "Terraform plan shows unexpected destroy when...",
    ]
    sources = ["k8s-runbook.md", "terraform-guide.md"]
    
    ingest_documents(docs, sources)
    
    qa = create_qa_chain()
    result = qa.invoke({"query": "How do I debug a crashing Kubernetes pod?"})
    
    print("Answer:", result["result"])
    print("\nSources:")
    for doc in result["source_documents"]:
        print(f"  - {doc.metadata['source']}")
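
To make the `chunk_size=500` and `chunk_overlap=50` settings concrete, here is a deliberately simplified fixed-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraphs and sentences); the function name `split_with_overlap` is hypothetical and only illustrates how consecutive chunks share their boundary text.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size windows where each window repeats the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "x" * 1200
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])  # [500, 500, 300]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing slightly more vectors.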

Production Configuration

Enable Qdrant Cluster Mode (Multiple Replicas)

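bash
# For production: 3-node Qdrant cluster
# --reuse-values keeps the persistence settings from the original install
helm upgrade qdrant qdrant/qdrant \
  --namespace qdrant \
  --reuse-values \
  --set replicaCount=3 \
  --set config.cluster.enabled=true \
  --set config.cluster.p2p.port=6335 \
  --set service.type=ClusterIP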
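
Running three nodes does not by itself replicate your data: a collection created with the default replication factor of 1 keeps a single copy of each shard. The sketch below builds a collection-creation body that asks for replication. The `shard_number` and `replication_factor` fields are part of Qdrant's create-collection API; the specific values, the helper name, and the node-count sanity check are assumptions for illustration.

```python
import json


def replicated_collection_body(
    vector_size: int, node_count: int, shard_number: int, replication_factor: int
) -> dict:
    """Body for PUT /collections/{name} with sharding and replication set."""
    # Sanity check of our own: extra replicas have no distinct node to live on
    if replication_factor > node_count:
        raise ValueError("replication_factor should not exceed the number of nodes")
    return {
        "vectors": {"size": vector_size, "distance": "Cosine"},
        "shard_number": shard_number,
        "replication_factor": replication_factor,
    }


# Assumed layout: 3 nodes, 6 shards, every shard stored on 2 nodes
body = replicated_collection_body(
    vector_size=384, node_count=3, shard_number=6, replication_factor=2
)
print(json.dumps(body, indent=2))
```

With a replication factor of 2, the collection should stay readable if one of the three pods is rescheduled.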

Add Ingress for External Access

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: qdrant-ingress
  namespace: qdrant
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx   # assumes the NGINX Ingress Controller, as the annotations above imply
  rules:
    - host: qdrant.internal.mycompany.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: qdrant
                port:
                  number: 6333

Backup Collections to S3

python
import os

import boto3
import requests
from qdrant_client import QdrantClient

QDRANT_URL = "http://qdrant.qdrant.svc.cluster.local:6333"
QDRANT_API_KEY = os.environ["QDRANT_API_KEY"]


def backup_qdrant_to_s3(collection_name: str, s3_bucket: str):
    client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

    # Create snapshot (written to the Qdrant node's own storage)
    snapshot_info = client.create_snapshot(collection_name=collection_name)
    snapshot_name = snapshot_info.name

    # Download snapshot through the REST snapshot endpoint
    response = requests.get(
        f"{QDRANT_URL}/collections/{collection_name}/snapshots/{snapshot_name}",
        headers={"api-key": QDRANT_API_KEY},
        stream=True,
        timeout=300,
    )
    response.raise_for_status()

    # Stream to S3 so the whole snapshot is never held in memory
    key = f"qdrant-backups/{collection_name}/{snapshot_name}"
    s3 = boto3.client("s3")
    s3.upload_fileobj(response.raw, s3_bucket, key)

    print(f"Backup complete: s3://{s3_bucket}/{key}")

Sizing Guide

| Use Case | Vectors | RAM Needed | Storage |
|---|---|---|---|
| Dev/testing | <100K | 512Mi | 2Gi |
| Small app | 100K–1M | 2Gi | 10Gi |
| Production | 1M–10M | 4–8Gi | 50Gi |
| Large scale | 10M+ | 16Gi+ | 200Gi+ |

Vector size matters: all-MiniLM-L6-v2 (384 dims) uses ~1.5KB per vector. OpenAI text-embedding-3-small (1536 dims) uses ~6KB per vector.
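
The per-vector figures follow directly from float32 storage (4 bytes per dimension). The sketch below reproduces them and adds a rough RAM estimate; the 1.5× overhead factor for the index and metadata is a rule-of-thumb assumption, not a measured value, so treat the totals as a starting point for load testing.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(dims: int) -> int:
    """Bytes needed to store one float32 vector with `dims` dimensions."""
    return dims * BYTES_PER_FLOAT32


def estimate_ram_bytes(num_vectors: int, dims: int, overhead_factor: float = 1.5) -> int:
    """Rough RAM estimate: raw vector data times an assumed overhead factor."""
    return int(num_vectors * raw_vector_bytes(dims) * overhead_factor)


for name, dims in [("all-MiniLM-L6-v2", 384), ("text-embedding-3-small", 1536)]:
    per_vector_kb = raw_vector_bytes(dims) / 1024
    ram_gib = estimate_ram_bytes(1_000_000, dims) / 1024**3
    print(f"{name}: {per_vector_kb:.1f} KB/vector, ~{ram_gib:.2f} GiB RAM per 1M vectors")
```

Under these assumptions, one million 384-dimension vectors land near the 2Gi row in the table, while the same count at 1536 dimensions needs roughly four times as much.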


Qdrant on Kubernetes gives you a production-grade vector database that you fully control — no per-query pricing, no vendor lock-in.
