Deploy Dify AI Platform on Kubernetes (2026)
Run Dify — the open-source LLM application platform — on your own Kubernetes cluster. Complete guide with Helm, persistent storage, Ingress, and connecting local models via Ollama.
Dify is an open-source platform for building and running LLM applications — workflows, RAG pipelines, AI agents, and chatbots. Instead of everyone on your team hacking together their own LLM scripts, Dify gives you a shared platform to build, deploy, and monitor AI applications.
This guide deploys Dify on Kubernetes with Helm, connects it to Ollama (local models), and exposes it via Ingress.
What Dify Does
Dify provides:
- Visual workflow builder — drag-and-drop LLM pipelines
- RAG pipeline — document ingestion + vector search + LLM answer generation
- API gateway — publish AI apps as REST APIs
- Monitoring — token usage, latency, error rates per application
- Multi-model — connect OpenAI, Anthropic, local Ollama models, any OpenAI-compatible API
Use cases for DevOps teams:
- Internal Slack bot that answers infra questions from runbooks
- Log analysis tool that explains errors in plain English
- Incident post-mortem assistant
- Terraform review bot
Prerequisites
- Kubernetes cluster (1.24+)
- Helm 3+
- At least 4 vCPU, 8GB RAM for the Dify stack
- PV provisioner (EBS, local-path, etc.)
- Optional: GPU node + Ollama deployment for local models
Step 1: Add the Helm Repository
helm repo add dify https://langgenius.github.io/dify-helm
helm repo update
Step 2: Configure Values
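The values file below expects a 32-byte hex secret. If you would rather not shell out to openssl, this minimal Python sketch should produce an equivalent value. The function name `generate_secret_key` is illustrative and not part of Dify or the chart.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string, comparable to `openssl rand -hex 32`."""
    # token_hex draws from the OS CSPRNG and emits two hex characters per byte.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(generate_secret_key())
```

Paste the printed value into `secretKey`, or better, keep it out of version control and supply it at install time.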
# dify-values.yaml
global:
  host: "dify.internal.mycompany.com"
  enableTLS: false  # Set true if using cert-manager
  # Secret key: generate with `openssl rand -hex 32`
  secretKey: "your-32-byte-hex-secret-here"
# Ingress
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
  hosts:
    - host: dify.internal.mycompany.com
      paths:
        - path: /
          pathType: Prefix
# Database (built-in PostgreSQL)
postgresql:
  enabled: true
  auth:
    postgresPassword: "change-me-strong-password"
    database: dify
# Redis (built-in)
redis:
  enabled: true
# Vector store (Weaviate or pgvector)
vectordb:
  type: weaviate
weaviate:
  enabled: true
# Storage (for uploaded files)
storage:
  type: local  # or s3
# API service
api:
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
# Worker (async tasks)
worker:
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "2Gi"
# Web frontend
web:
  replicas: 1
# Persistence
persistence:
  enabled: true
  storageClass: "gp3"
  size: 20Gi
Step 3: Deploy Dify
kubectl create namespace dify
helm install dify dify/dify \
-n dify \
-f dify-values.yaml
# Watch deployment
kubectl get pods -n dify -w
Expected pods after deployment:
NAME READY STATUS
dify-api-xxx 1/1 Running
dify-worker-xxx 1/1 Running
dify-web-xxx 1/1 Running
dify-postgresql-xxx 1/1 Running
dify-redis-xxx 1/1 Running
dify-weaviate-xxx 1/1 Running
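To confirm the rollout from a script rather than by eye, one option is to parse the JSON form of the same listing. The sketch below assumes output from `kubectl get pods -n dify -o json`. The helper names `unready_pods` and `fetch_pods` are hypothetical.

```python
import json
import subprocess


def unready_pods(pod_list: dict) -> list[str]:
    """Return names of pods that are not Running with every container ready."""
    problems = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        # Note: pods from completed Jobs (phase Succeeded) are also reported here.
        if status.get("phase") != "Running" or not all_ready:
            problems.append(pod["metadata"]["name"])
    return problems


def fetch_pods(namespace: str = "dify") -> dict:
    """Call kubectl and return the parsed pod list (needs cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    bad = unready_pods(fetch_pods())
    print("All pods ready" if not bad else f"Not ready: {', '.join(bad)}")
```

An empty result means every pod in the namespace is Running with all containers ready.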
Step 4: Initial Setup
# Get the Ingress URL
kubectl get ingress -n dify
# Or port-forward for local access
kubectl port-forward svc/dify-web 3000:80 -n dify
Open http://localhost:3000 (or your Ingress URL).
- Complete the admin setup wizard
- Create admin account
- Navigate to Settings → Model Providers
Step 5: Connect Models
Option A: OpenAI API
Settings → Model Providers → OpenAI → Add API Key
Option B: Anthropic Claude
Settings → Model Providers → Anthropic → Add API Key
Option C: Local Models via Ollama
To serve local models, run Ollama in the same cluster as Dify:
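# Deploy Ollama first
kubectl apply -f - <<EOF
# PVC backing the model cache. The Deployment below mounts it, so it must exist.
# The 50Gi size is an assumption; size it for the models you plan to pull.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models
  namespace: dify
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: dify
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
          resources:
            limits:
              memory: "16Gi"
              cpu: "8"
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: dify
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
      targetPort: 11434
EOF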
# Pull a model
kubectl exec -it deployment/ollama -n dify -- ollama pull llama3.2:latest
kubectl exec -it deployment/ollama -n dify -- ollama pull nomic-embed-text
In Dify Settings → Model Providers → Add Model:
- Provider: Ollama
- Base URL: http://ollama.dify.svc.cluster.local:11434
- Model: llama3.2
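Before saving the provider, it can help to confirm that the models were actually pulled. This sketch queries Ollama's `/api/tags` endpoint, which lists locally available models, at the Base URL above. It assumes the script runs somewhere that can resolve the in-cluster Service name (for example, inside a pod). The helper names `missing_models` and `fetch_tags` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama.dify.svc.cluster.local:11434"


def missing_models(tags_response: dict, wanted: list[str]) -> list[str]:
    """Return the wanted models that Ollama does not report as installed."""
    installed = set()
    for model in tags_response.get("models", []):
        name = model.get("name", "")
        installed.add(name)
        # Ollama reports "llama3.2:latest"; also accept the bare "llama3.2".
        if name.endswith(":latest"):
            installed.add(name[: -len(":latest")])
    return [m for m in wanted if m not in installed]


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/tags from Ollama (requires network access to the Service)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    missing = missing_models(fetch_tags(), ["llama3.2", "nomic-embed-text"])
    print("All models present" if not missing else f"Missing: {missing}")
```

If anything is reported missing, rerun the matching `ollama pull` command before adding the model in Dify.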
Step 6: Create Your First RAG Application
Example: DevOps Runbook Assistant
- In Dify, go to Knowledge → Create Dataset
- Upload your runbooks (PDF, Markdown, or paste text)
- Dify will chunk and embed them into Weaviate
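To make the chunking step concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not Dify's actual implementation, and the `chunk_size` and `overlap` defaults are illustrative values chosen for the example.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Each chunk is then embedded and stored in the vector database.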
Create an application:
- Studio → Create Application → Chatbot
- Select your LLM model
- Add system prompt:
You are a DevOps assistant for our engineering team.
Answer questions based only on the provided runbooks and documentation.
If you don't know something from the documentation, say so clearly.
Format commands in code blocks.
- Add the runbook dataset as context
- Publish as API
Your DevOps bot API endpoint:
curl -X POST https://dify.internal.mycompany.com/v1/chat-messages \
  -H "Authorization: Bearer app-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "How do I restart a pod without downtime?",
    "response_mode": "blocking",
    "conversation_id": "",
    "user": "engineer-123"
  }'
Step 7: Slack Integration
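One thing worth sketching before the bot code: the request in Step 6 sends an empty `conversation_id`, so each call stands alone. The sketch below keeps one conversation per Slack thread. It assumes the blocking response carries a `conversation_id` field that can be sent back on the next request. The class and method names are hypothetical.

```python
class ConversationStore:
    """Remember which Dify conversation belongs to which Slack thread."""

    def __init__(self) -> None:
        self._by_thread: dict[str, str] = {}

    def build_payload(self, thread_id: str, user: str, query: str) -> dict:
        """Build a chat-messages body, reusing the thread's conversation if known."""
        return {
            "inputs": {},
            "query": query,
            "response_mode": "blocking",
            # Empty string when no conversation is stored for this thread yet.
            "conversation_id": self._by_thread.get(thread_id, ""),
            "user": user,
        }

    def record_response(self, thread_id: str, response_body: dict) -> None:
        """Store the conversation_id from a response (assumed field name)."""
        conversation_id = response_body.get("conversation_id")
        if conversation_id:
            self._by_thread[thread_id] = conversation_id
```

In a Slack handler, the thread key could be `message.get("thread_ts") or message["ts"]`. The mapping lives in memory only, so it is lost on restart. A shared store such as Redis would be the natural next step for multiple replicas.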
# slack_bot.py: connect Slack to Dify
import os

import requests
from slack_bolt import App

# Bolt also reads SLACK_SIGNING_SECRET from the environment to verify requests.
app = App(token=os.environ["SLACK_BOT_TOKEN"])
DIFY_API_KEY = os.environ["DIFY_API_KEY"]
DIFY_URL = "https://dify.internal.mycompany.com/v1/chat-messages"
FALLBACK = "Sorry, I couldn't get an answer."


@app.message()
def handle_message(message, say):
    user_question = message.get("text", "")
    try:
        response = requests.post(
            DIFY_URL,
            headers={"Authorization": f"Bearer {DIFY_API_KEY}"},
            json={
                "inputs": {},
                "query": user_question,
                "response_mode": "blocking",
                "user": message.get("user", "slack-user"),
            },
            timeout=120,  # blocking mode waits for the full LLM answer
        )
        response.raise_for_status()
        answer = response.json().get("answer", FALLBACK)
    except (requests.RequestException, ValueError):
        # Network failure, non-2xx status, or a body that is not valid JSON
        answer = FALLBACK
    say(answer)


if __name__ == "__main__":
    app.start(port=int(os.environ.get("PORT", 3000)))

Monitoring Dify
Dify exposes application-level metrics:
- Token usage per app
- Average response latency
- Request volume
- Error rates
Access via: Settings → Monitoring in the Dify UI.
For infrastructure metrics, add Prometheus scraping for PostgreSQL, Redis, and Weaviate.
Upgrade
helm repo update
helm upgrade dify dify/dify \
-n dify \
-f dify-values.yaml