Deploy Dify AI Platform on Kubernetes (2026)

Run Dify — the open-source LLM application platform — on your own Kubernetes cluster. Complete guide with Helm, persistent storage, Ingress, and connecting local models via Ollama.

DevOpsBoys · May 26, 2026 · 5 min read

Dify is an open-source platform for building and running LLM applications — workflows, RAG pipelines, AI agents, and chatbots. Instead of everyone on your team hacking together their own LLM scripts, Dify gives you a shared platform to build, deploy, and monitor AI applications.

This guide deploys Dify on Kubernetes with Helm, connects it to Ollama (local models), and exposes it via Ingress.


What Dify Does

Dify provides:

  • Visual workflow builder — drag-and-drop LLM pipelines
  • RAG pipeline — document ingestion + vector search + LLM answer generation
  • API gateway — publish AI apps as REST APIs
  • Monitoring — token usage, latency, error rates per application
  • Multi-model — connect OpenAI, Anthropic, local Ollama models, any OpenAI-compatible API

Use cases for DevOps teams:

  • Internal Slack bot that answers infra questions from runbooks
  • Log analysis tool that explains errors in plain English
  • Incident post-mortem assistant
  • Terraform review bot

Prerequisites

  • Kubernetes cluster (1.24+)
  • Helm 3+
  • At least 4 vCPU, 8GB RAM for the Dify stack
  • PV provisioner (EBS, local-path, etc.)
  • Optional: GPU node + Ollama deployment for local models

Step 1: Add the Helm Repository

bash
helm repo add dify https://langgenius.github.io/dify-helm
helm repo update

Step 2: Configure Values

yaml
# dify-values.yaml
 
global:
  host: "dify.internal.mycompany.com"
  enableTLS: false    # Set true if using cert-manager
 
# Secret key — generate with: openssl rand -hex 32
secretKey: "your-32-byte-hex-secret-here"
 
# Ingress
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
  hosts:
  - host: dify.internal.mycompany.com
    paths:
    - path: /
      pathType: Prefix
 
# Database (built-in PostgreSQL)
postgresql:
  enabled: true
  auth:
    postgresPassword: "change-me-strong-password"
    database: dify
 
# Redis (built-in)
redis:
  enabled: true
 
# Vector store (Weaviate or pgvector)
vectordb:
  type: weaviate
  weaviate:
    enabled: true
 
# Storage (for uploaded files)
storage:
  type: local    # or s3
 
# API service
api:
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
 
# Worker (async tasks)
worker:
  replicas: 1
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "2Gi"
 
# Web frontend
web:
  replicas: 1
 
# Persistence
persistence:
  enabled: true
  storageClass: "gp3"
  size: 20Gi

Step 3: Deploy Dify

bash
kubectl create namespace dify
 
helm install dify dify/dify \
  -n dify \
  -f dify-values.yaml
 
# Watch deployment
kubectl get pods -n dify -w

Expected pods after deployment:

NAME                        READY   STATUS    
dify-api-xxx                1/1     Running
dify-worker-xxx             1/1     Running
dify-web-xxx                1/1     Running
dify-postgresql-xxx         1/1     Running
dify-redis-xxx              1/1     Running
dify-weaviate-xxx           1/1     Running
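For a scripted check instead of watching the pod list by hand, the readiness test can be sketched as follows. It assumes kubectl is on the PATH and pointed at the right cluster; the function names unready_pods and fetch_pods are hypothetical.

```python
import json
import subprocess


def unready_pods(pod_list: dict) -> list[str]:
    """Return names of pods that are not Running with every container ready.

    Note: pods from finished Jobs (phase Succeeded) are also reported here.
    """
    unready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            unready.append(pod["metadata"]["name"])
    return unready


def fetch_pods(namespace: str = "dify") -> dict:
    """Run `kubectl get pods -o json` and return the parsed pod list."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling unready_pods(fetch_pods()) returns an empty list once every pod in the namespace is up.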

Step 4: Initial Setup

bash
# Get the Ingress URL
kubectl get ingress -n dify
 
# Or port-forward for local access
kubectl port-forward svc/dify-web 3000:80 -n dify

Open http://localhost:3000 (or your Ingress URL).

  1. Complete the admin setup wizard
  2. Create admin account
  3. Navigate to Settings → Model Providers

Step 5: Connect Models

Option A: OpenAI API

Settings → Model Providers → OpenAI → Add API Key

Option B: Anthropic Claude

Settings → Model Providers → Anthropic → Add API Key

Option C: Local Models via Ollama

If you have Ollama running in the same cluster:

bash
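# Deploy Ollama first. The PVC must exist alongside the Deployment,
# otherwise the pod stays Pending on the missing "ollama-models" claim.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models
  namespace: dify
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi   # assumed size; adjust to the models you plan to pull
---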
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: dify
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        volumeMounts:
        - name: models
          mountPath: /root/.ollama
        resources:
          limits:
            memory: "16Gi"
            cpu: "8"
      volumes:
      - name: models
        persistentVolumeClaim:
          claimName: ollama-models
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: dify
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
EOF
 
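# Wait for Ollama to be ready, then pull a chat model and an embedding model
kubectl rollout status deployment/ollama -n dify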
kubectl exec -it deployment/ollama -n dify -- ollama pull llama3.2:latest
kubectl exec -it deployment/ollama -n dify -- ollama pull nomic-embed-text

In Dify Settings → Model Providers → Add Model:

  • Provider: Ollama
  • Base URL: http://ollama.dify.svc.cluster.local:11434
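  • Model: llama3.2

Add nomic-embed-text the same way, selecting the text embedding model type, so the knowledge base in Step 6 can embed documents locally.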
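Before registering the provider, it can help to confirm that the model is actually installed. The sketch below queries Ollama's /api/tags endpoint, which lists local models. The function names has_model and list_models are hypothetical, and the cluster DNS name only resolves from inside the cluster, so run it from a pod or point base_url at a port-forward.

```python
import json
import urllib.request

# Service DNS name from the manifest above; reachable only in-cluster.
OLLAMA_URL = "http://ollama.dify.svc.cluster.local:11434"


def has_model(tags_response: dict, wanted: str) -> bool:
    """True if `wanted` matches an installed model, with or without its tag."""
    for model in tags_response.get("models", []):
        name = model.get("name", "")
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def list_models(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the list of locally installed models from Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)
```

For example, has_model(list_models(), "llama3.2") should be true after the pull commands above have finished.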

Step 6: Create Your First RAG Application

Example: DevOps Runbook Assistant

  1. In Dify, go to Knowledge → Create Dataset
  2. Upload your runbooks (PDF, Markdown, or paste text)
  3. Dify will chunk and embed them into Weaviate
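To make the chunking step concrete, here is an illustrative fixed-size splitter with overlap. It is not Dify's actual implementation, which has its own configurable segmentation rules; the function name chunk_text and its default sizes are assumptions chosen for the example.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears intact in one of the two neighbouring chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

Each chunk is then embedded separately, which is why very large chunks dilute retrieval and very small ones lose context.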

Create an application:

  1. Studio → Create Application → Chatbot
  2. Select your LLM model
  3. Add a system prompt:

     You are a DevOps assistant for our engineering team.
     Answer questions based only on the provided runbooks and documentation.
     If you don't know something from the documentation, say so clearly.
     Format commands in code blocks.

  4. Add the runbook dataset as context
  5. Publish as API

Call the published app through its API endpoint:

bash
curl -X POST https://dify.internal.mycompany.com/v1/chat-messages \
  -H "Authorization: Bearer app-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "How do I restart a pod without downtime?",
    "response_mode": "blocking",
    "conversation_id": "",
    "user": "engineer-123"
  }'
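The same request can be made from Python. The sketch below uses only the standard library and returns the conversation_id from the response along with the answer, so a follow-up question can continue the same conversation. The function names build_payload and ask are hypothetical; the URL is the example host used in this guide.

```python
import json
import urllib.request

DIFY_URL = "https://dify.internal.mycompany.com/v1/chat-messages"


def build_payload(query: str, user: str, conversation_id: str = "") -> dict:
    """Build the JSON body for a blocking chat-messages request.

    An empty conversation_id starts a new conversation.
    """
    return {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",
        "conversation_id": conversation_id,
        "user": user,
    }


def ask(query: str, user: str, api_key: str, conversation_id: str = "") -> tuple[str, str]:
    """Send one question and return (answer, conversation_id)."""
    request = urllib.request.Request(
        DIFY_URL,
        data=json.dumps(build_payload(query, user, conversation_id)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body.get("answer", ""), body.get("conversation_id", "")
```

Pass the returned conversation_id into the next ask() call, with the same user value, to keep context between questions.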

Step 7: Slack Integration

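python
# slack_bot.py: forward Slack messages to Dify and post the answer back
import os

import requests
from slack_bolt import App

# Bolt's built-in HTTP server verifies requests with the signing secret
app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],
)
DIFY_API_KEY = os.environ["DIFY_API_KEY"]
DIFY_URL = "https://dify.internal.mycompany.com/v1/chat-messages"

@app.message()
def handle_message(message, say):
    user_question = message.get("text", "")
    if not user_question:
        return  # nothing to ask (e.g. a file upload without text)

    try:
        response = requests.post(
            DIFY_URL,
            headers={"Authorization": f"Bearer {DIFY_API_KEY}"},
            json={
                "inputs": {},
                "query": user_question,
                "response_mode": "blocking",
                "user": message.get("user", "slack-user"),
            },
            timeout=120,  # LLM calls can be slow, but never wait forever
        )
        response.raise_for_status()
        answer = response.json().get("answer") or "Sorry, I couldn't get an answer."
    except (requests.RequestException, ValueError):
        # Network error, non-2xx status, or a body that is not valid JSON
        answer = "Sorry, I couldn't reach the assistant."

    say(answer)

if __name__ == "__main__":
    app.start(port=int(os.environ.get("PORT", 3000)))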
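The bot above treats every Slack message as a new conversation. One way to keep context is to remember the conversation_id that Dify returns and send it again for later messages in the same Slack thread. The sketch below is an in-memory version; the class name ThreadConversations is hypothetical. The key includes the Slack user because Dify conversations appear to be tied to the user value that started them, which is an assumption worth verifying against your Dify version.

```python
class ThreadConversations:
    """Map (channel, thread, user) to a Dify conversation_id, in memory."""

    def __init__(self) -> None:
        self._ids: dict[str, str] = {}

    @staticmethod
    def key(message: dict) -> str:
        # Replies carry thread_ts; a top-level message starts a thread at ts.
        thread = message.get("thread_ts") or message.get("ts", "")
        return f'{message.get("channel", "")}:{thread}:{message.get("user", "")}'

    def get(self, message: dict) -> str:
        """Return the stored conversation_id, or "" to start a new one."""
        return self._ids.get(self.key(message), "")

    def remember(self, message: dict, conversation_id: str) -> None:
        """Store the conversation_id returned by Dify for this thread."""
        if conversation_id:
            self._ids[self.key(message)] = conversation_id
```

In the handler, send conversation_id=store.get(message) with the request and call store.remember(message, response.json().get("conversation_id", "")) afterwards. State is lost on restart, so a production bot would likely back this with Redis or a database.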

Monitoring Dify

Dify exposes application-level metrics:

  • Token usage per app
  • Average response latency
  • Request volume
  • Error rates

Access via: Settings → Monitoring in the Dify UI.

For infrastructure metrics, add Prometheus scraping for PostgreSQL, Redis, and Weaviate.


Upgrade

bash
helm repo update
helm upgrade dify dify/dify \
  -n dify \
  -f dify-values.yaml

Related: Deploy Qwen2.5-Coder on Kubernetes | AI-Powered Incident Response with LLM Runbooks | Build AI SLO Budget Tracker

Affiliate note: For production Dify deployments, Weaviate Cloud handles the vector database so you don't manage it yourself (free sandbox tier). Anthropic Claude API is the recommended model for RAG pipelines due to its large context window and instruction-following ability.
