Deploy Dify AI Platform on Kubernetes (2026)

Run Dify — the open-source LLM application platform — on your own Kubernetes cluster. Complete guide with Helm, persistent storage, Ingress, and connecting local models via Ollama.

Dify is an open-source platform for building and running LLM applications — workflows, RAG pipelines, AI agents, and chatbots. Instead of everyone on your team hacking together their own LLM scripts, Dify gives you a shared platform to build, deploy, and monitor AI applications.

This guide deploys Dify on Kubernetes with Helm, connects it to Ollama (local models), and exposes it via Ingress.

What Dify Does

Dify provides:

Visual workflow builder — drag-and-drop LLM pipelines
RAG pipeline — document ingestion + vector search + LLM answer generation
API gateway — publish AI apps as REST APIs
Monitoring — token usage, latency, error rates per application
Multi-model — connect OpenAI, Anthropic, local Ollama models, any OpenAI-compatible API

Use cases for DevOps teams:

Internal Slack bot that answers infra questions from runbooks
Log analysis tool that explains errors in plain English
Incident post-mortem assistant
Terraform review bot

Prerequisites

Kubernetes cluster (1.24+)
Helm 3+
At least 4 vCPU and 8 GB RAM available for the Dify stack
PV provisioner with a usable StorageClass (EBS CSI, local-path, etc.)
Optional: GPU node + Ollama deployment for local models

Step 1: Add the Helm Repository

bash

helm repo add dify https://langgenius.github.io/dify-helm
helm repo update

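Step 2: Configure Values

The keys below must match the chart version you install. Compare them with the output of helm show values dify/dify before deploying, and adjust any names that differ.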

yaml

# dify-values.yaml
 
global:
  host: "dify.internal.mycompany.com"
  enableTLS: false    # Set true if using cert-manager
 
# Secret key — generate with: openssl rand -hex 32
secretKey: "your-32-byte-hex-secret-here"
 
# Ingress
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
  hosts:
  - host: dify.internal.mycompany.com
    paths:
    - path: /
      pathType: Prefix
 
# Database (built-in PostgreSQL)
postgresql:
  enabled: true
  auth:
    postgresPassword: "change-me-strong-password"
    database: dify
 
# Redis (built-in)
redis:
  enabled: true
 
# Vector store (Weaviate or pgvector)
vectordb:
  type: weaviate
  weaviate:
    enabled: true
 
# Storage (for uploaded files)
storage:
  type: local    # or s3
 
# API service
api:
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
 
# Worker (async tasks)
worker:
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "2Gi"
 
# Web frontend
web:
  replicas: 1
 
# Persistence
persistence:
  enabled: true
  storageClass: "gp3"    # must match a StorageClass in your cluster (kubectl get storageclass)
  size: 20Gi
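
The values file above needs a random secretKey and a strong PostgreSQL password. As a minimal sketch, the Python standard library can generate both; the first function is equivalent in spirit to openssl rand -hex 32. The function names are illustrative and not part of Dify or the chart.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return num_bytes of randomness as hex (64 characters for 32 bytes)."""
    return secrets.token_hex(num_bytes)


def generate_password(num_bytes: int = 24) -> str:
    """Return a URL-safe random string suitable for postgresPassword."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste these into dify-values.yaml, or better, into a Kubernetes Secret
    print("secretKey:", generate_secret_key())
    print("postgresPassword:", generate_password())
```

Keeping generated credentials out of version control (for example by passing them with --set at install time) is usually safer than committing them in the values file.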

Step 3: Deploy Dify

bash

kubectl create namespace dify
 
helm install dify dify/dify \
  -n dify \
  -f dify-values.yaml
 
# Watch deployment
kubectl get pods -n dify -w

Expected pods after deployment:

NAME                        READY   STATUS    
dify-api-xxx                1/1     Running
dify-worker-xxx             1/1     Running
dify-web-xxx                1/1     Running
dify-postgresql-xxx         1/1     Running
dify-redis-xxx              1/1     Running
dify-weaviate-xxx           1/1     Running
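
For a scripted check instead of watching the pod list, the sketch below parses kubectl get pods -o json and reports any pod that is not Running with all containers ready. It assumes kubectl is on the PATH and configured for the cluster; the function names are illustrative.

```python
import json
import subprocess


def not_ready_pods(pod_list: dict) -> list[str]:
    """Return names of pods that are not Running with every container ready."""
    failing = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        # Note: completed Job pods (phase Succeeded) are also reported here
        if status.get("phase") != "Running" or not all_ready:
            failing.append(pod["metadata"]["name"])
    return failing


def fetch_pods(namespace: str = "dify") -> dict:
    """Load the pod list for a namespace through kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    failing = not_ready_pods(fetch_pods())
    print("All pods ready" if not failing else f"Not ready: {', '.join(failing)}")
```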

Step 4: Initial Setup

bash

# Get the Ingress URL
kubectl get ingress -n dify
 
# Or port-forward for local access
kubectl port-forward svc/dify-web 3000:80 -n dify

Open http://localhost:3000 (or your Ingress URL).

Complete the admin setup wizard
Create admin account
Navigate to Settings → Model Providers

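Step 5: Connect Models

Hosted providers such as OpenAI and Anthropic are added from Settings → Model Providers with an API key. To serve models locally, deploy Ollama in the cluster and point Dify at it.

Option C: Local Models via Ollama

bash

# Deploy Ollama first (PVC for model files, Deployment, Service)
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models
  namespace: dify
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi    # assumption: size this for the models you plan to pull
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: dify
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: models
          mountPath: /root/.ollama
        resources:
          limits:
            memory: "16Gi"
            cpu: "8"
      volumes:
      - name: models
        persistentVolumeClaim:
          claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: dify
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
EOF

# Wait until the Ollama pod is ready
kubectl rollout status deployment/ollama -n dify

# Pull a chat model and an embedding model
kubectl exec -it deployment/ollama -n dify -- ollama pull llama3.2:latest
kubectl exec -it deployment/ollama -n dify -- ollama pull nomic-embed-text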

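In Dify Settings → Model Providers → Add Model:

Provider: Ollama
Base URL: http://ollama.dify.svc.cluster.local:11434
Model: llama3.2 (model type: LLM)

Add nomic-embed-text the same way with the model type set to Text Embedding, so the knowledge base in Step 6 can embed documents with a local model.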
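
If Dify cannot see the models, it can help to confirm what Ollama is actually serving. The sketch below queries Ollama's /api/tags endpoint and lists any expected model that is missing. It assumes it runs somewhere that can resolve the in-cluster service name (a pod in the cluster, or localhost through a port-forward with the URL adjusted); the function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama.dify.svc.cluster.local:11434"


def missing_models(tags: dict, wanted: list[str]) -> list[str]:
    """Return the wanted models that are absent from an /api/tags response."""
    available = {model["name"] for model in tags.get("models", [])}
    missing = []
    for name in wanted:
        # Ollama lists names with a tag; treat a bare name as ":latest"
        full_name = name if ":" in name else f"{name}:latest"
        if full_name not in available:
            missing.append(name)
    return missing


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the list of locally available models from Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    absent = missing_models(fetch_tags(), ["llama3.2", "nomic-embed-text"])
    print(f"Missing: {absent}" if absent else "All expected models are present")
```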

Step 6: Create Your First RAG Application

Example: DevOps Runbook Assistant

In Dify, go to Knowledge → Create Dataset
Upload your runbooks (PDF, Markdown, or paste text)
Dify will chunk and embed them into Weaviate
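
To make the chunking step concrete, here is a simplified illustration of fixed-size chunking with overlap. This is not Dify's actual implementation, and the size and overlap values are arbitrary assumptions; it only shows why neighbouring chunks share some text, so that a sentence cut at a boundary still appears whole in one of them.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a chunk has reached the end of the text
        if start + size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    runbook = "Drain the node. Cordon it first. Then restart the kubelet. " * 20
    for i, chunk in enumerate(chunk_text(runbook, size=200, overlap=40)):
        print(i, len(chunk))
```

Each chunk is then passed to the embedding model and stored in the vector database, which is what makes retrieval by similarity possible.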

Create an application:

Studio → Create Application → Chatbot
Select your LLM model
Add system prompt:

You are a DevOps assistant for our engineering team. 
Answer questions based only on the provided runbooks and documentation.
If you don't know something from the documentation, say so clearly.
Format commands in code blocks.

Add the runbook dataset as context
Publish as API

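Call the published app through its API endpoint, using the app's API key (use http:// instead if TLS is disabled, as in the values above):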

bash

curl -X POST https://dify.internal.mycompany.com/v1/chat-messages \
  -H "Authorization: Bearer app-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "How do I restart a pod without downtime?",
    "response_mode": "blocking",
    "conversation_id": "",
    "user": "engineer-123"
  }'
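
The same request can be made from Python. The sketch below mirrors the curl call and, as an assumption about the blocking response shape, reads back a conversation_id so that follow-up questions continue the same conversation. The function names are illustrative, and the URL is the example host used in this guide.

```python
import json
import os
import urllib.request

DIFY_URL = "https://dify.internal.mycompany.com/v1/chat-messages"


def build_chat_payload(query: str, user: str, conversation_id: str = "") -> dict:
    """Build the same JSON body as the curl example."""
    return {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "conversation_id": conversation_id,
        "user": user,
    }


def ask(query: str, user: str, api_key: str, conversation_id: str = "") -> tuple[str, str]:
    """Send one question and return (answer, conversation_id)."""
    request = urllib.request.Request(
        DIFY_URL,
        data=json.dumps(build_chat_payload(query, user, conversation_id)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.load(resp)
    # Assumption: the response carries "answer" and "conversation_id" fields
    return body.get("answer", ""), body.get("conversation_id", conversation_id)


if __name__ == "__main__":
    key = os.environ["DIFY_API_KEY"]
    answer, conv = ask("How do I restart a pod without downtime?", "engineer-123", key)
    print(answer)
    follow_up, _ = ask("And how do I roll that back?", "engineer-123", key, conv)
    print(follow_up)
```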

Step 7: Slack Integration

python

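# slack_bot.py: connect Slack to Dify
import os

import requests
from slack_bolt import App

# HTTP mode needs both the bot token and the signing secret
app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)
DIFY_API_KEY = os.environ["DIFY_API_KEY"]
DIFY_URL = "https://dify.internal.mycompany.com/v1/chat-messages"
FALLBACK = "Sorry, I couldn't get an answer."


@app.message()
def handle_message(message, say):
    user_question = message.get("text", "")
    # Skip empty messages and events with a subtype (edits, bot posts)
    if not user_question or message.get("subtype"):
        return

    try:
        response = requests.post(
            DIFY_URL,
            headers={"Authorization": f"Bearer {DIFY_API_KEY}"},
            json={
                "inputs": {},
                "query": user_question,
                "response_mode": "blocking",
                "user": message.get("user", "slack-user"),
            },
            timeout=120,  # local models can be slow to answer
        )
        response.raise_for_status()
        answer = response.json().get("answer", FALLBACK)
    except (requests.RequestException, ValueError):
        answer = FALLBACK
    say(answer)


if __name__ == "__main__":
    app.start(port=int(os.environ.get("PORT", 3000)))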
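
The bot above treats every Slack message as a new conversation. One possible extension, sketched here under the assumption that Dify returns a conversation_id in each response, is to remember that ID per Slack thread and send it back on follow-up messages. ConversationStore is a hypothetical in-memory helper; a real deployment would likely need shared storage such as Redis.

```python
class ConversationStore:
    """Map Slack threads to Dify conversation IDs (in memory only)."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}

    @staticmethod
    def thread_key(message: dict) -> str:
        # Replies carry thread_ts; a top-level message starts a thread at its own ts
        thread = message.get("thread_ts") or message.get("ts", "")
        return f"{message.get('channel', '')}:{thread}"

    def get(self, message: dict) -> str:
        """Return the stored conversation ID, or "" to start a new conversation."""
        return self._ids.get(self.thread_key(message), "")

    def remember(self, message: dict, conversation_id: str) -> None:
        """Store the conversation ID returned by Dify for this thread."""
        if conversation_id:
            self._ids[self.thread_key(message)] = conversation_id


if __name__ == "__main__":
    store = ConversationStore()
    first = {"channel": "C123", "ts": "1700000000.000100"}
    store.remember(first, "example-conversation-id")
    reply = {"channel": "C123", "ts": "1700000050.000200", "thread_ts": first["ts"]}
    print(store.get(reply))
```

In the handler, store.get(message) would be passed as conversation_id in the request body, and store.remember(message, ...) called with the value from the response.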

Monitoring Dify

Dify exposes application-level metrics:

Token usage per app
Average response latency
Request volume
Error rates

Access via: open the application in the Dify UI and select Monitoring (the exact menu location can differ between Dify versions).

For infrastructure metrics, add Prometheus scraping for PostgreSQL, Redis, and Weaviate.

Upgrade

bash

helm repo update
helm upgrade dify dify/dify \
  -n dify \
  -f dify-values.yaml


Affiliate note: For production Dify deployments, Weaviate Cloud handles the vector database so you don't manage it yourself (free sandbox tier). Anthropic Claude API is the recommended model for RAG pipelines due to its large context window and instruction-following ability.
