
Build a Slack ChatOps Bot for Kubernetes Alerts Using Claude API (2026)

Build a Slack bot that receives Kubernetes Alertmanager webhooks, calls the Claude API to explain the alert and suggest fixes, then posts actionable runbook steps in Slack.

DevOpsBoys · Apr 12, 2026 · 5 min read

On-call gets paged at 2am: "KubePodCrashLooping — my-app — production". Without context, you scramble. With a ChatOps bot + LLM, Slack immediately shows: what it means, likely causes, and fix commands.

Here's how to build it.

Architecture

Alertmanager → webhook → Node.js bot → Claude API → Slack
                              ↓
                       Alert context
                       K8s cluster info
                       Runbook lookup

The bot:

  1. Receives webhook from Alertmanager when an alert fires
  2. Enriches it with pod/node details from Kubernetes API
  3. Asks Claude to explain the alert and generate fix steps
  4. Posts a formatted message to a Slack channel
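For reference, the webhook body Alertmanager POSTs in step 1 looks roughly like this (abbreviated to the fields the bot uses; values are illustrative):

```javascript
// Abbreviated example of the JSON body Alertmanager POSTs to the webhook.
// Field names follow Alertmanager's webhook format; values are illustrative.
const examplePayload = {
  version: '4',
  status: 'firing',
  receiver: 'slack-bot',
  alerts: [
    {
      status: 'firing',
      labels: {
        alertname: 'KubePodCrashLooping',
        namespace: 'production',
        pod: 'my-app-7d9f8c-xkp2q',
        severity: 'critical'
      },
      annotations: {
        summary: 'Pod production/my-app-7d9f8c-xkp2q is crash looping'
      },
      startsAt: '2026-04-12T02:13:00Z',
      generatorURL: 'http://prometheus.example/graph?g0.expr=...'
    }
  ]
}

// The bot only acts on firing alerts and reads these labels:
const firing = examplePayload.alerts.filter(a => a.status === 'firing')
console.log(firing[0].labels.alertname)  // → KubePodCrashLooping
```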

Prerequisites

  • Kubernetes cluster with Prometheus + Alertmanager
  • Slack workspace (free tier works)
  • Claude API key (get at console.anthropic.com)
  • Node.js 20+

Step 1: Create Slack App

  1. Go to api.slack.com/apps → Create New App
  2. From scratch → name: k8s-alertbot → your workspace
  3. OAuth & Permissions → Bot Token Scopes → add: chat:write, channels:read
  4. Install to Workspace → copy Bot User OAuth Token (xoxb-...)
  5. Invite the bot to your alerts channel: /invite @k8s-alertbot
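To confirm the token works before writing the bot, you can hit Slack's `auth.test` method directly (a minimal sketch using Node 18+'s built-in fetch; `check-token.js` is just a suggested filename):

```javascript
// Quick sanity check for the bot token via Slack's auth.test method.
// Run: SLACK_BOT_TOKEN=xoxb-... node check-token.js
function authTestRequest(token) {
  return {
    url: 'https://slack.com/api/auth.test',
    options: {
      method: 'POST',
      headers: { Authorization: `Bearer ${token}` }
    }
  }
}

const token = process.env.SLACK_BOT_TOKEN
if (!token) {
  console.log('SLACK_BOT_TOKEN not set; skipping live check')
} else {
  const { url, options } = authTestRequest(token)
  fetch(url, options)
    .then(r => r.json())
    .then(d => console.log(d.ok ? `Token OK for ${d.team}` : `Token invalid: ${d.error}`))
    .catch(e => console.log(`Request failed: ${e.message}`))
}
```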

Step 2: Build the Bot

bash
mkdir k8s-alertbot && cd k8s-alertbot
npm init -y
npm install @slack/web-api @anthropic-ai/sdk express @kubernetes/client-node

bot.js

javascript
const express = require('express')
const { WebClient } = require('@slack/web-api')
const Anthropic = require('@anthropic-ai/sdk')
const k8s = require('@kubernetes/client-node')
 
const app = express()
app.use(express.json())
 
const slack = new WebClient(process.env.SLACK_BOT_TOKEN)
const claude = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })
 
// Kubernetes client (in-cluster config when running in K8s)
const kc = new k8s.KubeConfig()
kc.loadFromDefault()   // uses in-cluster ServiceAccount if in K8s, else ~/.kube/config
const coreApi = kc.makeApiClient(k8s.CoreV1Api)
 
// Fetch pod details for context
async function getPodContext(namespace, podName) {
  try {
    // positional args + `.body` match @kubernetes/client-node 0.x;
    // v1.x takes an object ({ name, namespace }) and returns the pod directly
    const pod = await coreApi.readNamespacedPod(podName, namespace)
    const p = pod.body
    return {
      name: p.metadata.name,
      namespace: p.metadata.namespace,
      phase: p.status.phase,
      containers: p.status.containerStatuses?.map(c => ({
        name: c.name,
        ready: c.ready,
        restartCount: c.restartCount,
        state: JSON.stringify(c.state)
      })) || [],
      conditions: p.status.conditions?.map(c => `${c.type}=${c.status}`).join(', ') || 'none',
      nodeName: p.spec.nodeName
    }
  } catch (e) {
    return { error: `Could not fetch pod: ${e.message}` }
  }
}
 
// Ask Claude to explain alert and suggest fixes
async function explainAlert(alertName, labels, annotations, podContext) {
  const prompt = `
You are a Kubernetes SRE assistant. An alert just fired.
 
Alert: ${alertName}
Labels: ${JSON.stringify(labels, null, 2)}
Annotations: ${JSON.stringify(annotations, null, 2)}
 
${podContext.error ? `Pod fetch failed: ${podContext.error}` : `
Pod context:
- Phase: ${podContext.phase}
- Node: ${podContext.nodeName}
- Containers: ${JSON.stringify(podContext.containers, null, 2)}
- Conditions: ${podContext.conditions}
`}
 
Provide:
1. **What happened** — 1-2 sentence plain English explanation
2. **Most likely causes** — top 3 causes with probability
3. **Immediate fix commands** — exact kubectl commands to diagnose and fix
4. **Escalate if** — when to wake up a senior engineer
 
Keep it concise. Use code blocks for commands.
`
 
  const response = await claude.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 800,
    messages: [{ role: 'user', content: prompt }]
  })
 
  return response.content[0].text
}
 
// Alertmanager webhook handler
app.post('/webhook', async (req, res) => {
  res.sendStatus(200)  // respond immediately to Alertmanager
 
  const { alerts } = req.body
  
  for (const alert of alerts) {
    if (alert.status !== 'firing') continue
 
    const { alertname, namespace, pod } = alert.labels
    const channel = process.env.SLACK_CHANNEL || '#alerts'
 
    try {
      // Get pod context if available
      let podContext = { error: 'alert has no namespace/pod labels' }
      if (namespace && pod) {
        podContext = await getPodContext(namespace, pod)
      }
 
      // Get Claude's analysis
      const analysis = await explainAlert(
        alertname,
        alert.labels,
        alert.annotations,
        podContext
      )
 
      // Post to Slack with rich formatting
      await slack.chat.postMessage({
        channel,
        text: `🚨 Alert: ${alertname}`,
        blocks: [
          {
            type: 'header',
            text: { type: 'plain_text', text: `🚨 ${alertname}` }
          },
          {
            type: 'section',
            fields: [
              { type: 'mrkdwn', text: `*Namespace:*\n${namespace || 'N/A'}` },
              { type: 'mrkdwn', text: `*Pod:*\n${pod || 'N/A'}` },
              { type: 'mrkdwn', text: `*Severity:*\n${alert.labels.severity || 'unknown'}` },
              { type: 'mrkdwn', text: `*Started:*\n${new Date(alert.startsAt).toLocaleString()}` }
            ]
          },
          {
            type: 'divider'
          },
          {
            type: 'section',
            text: { type: 'mrkdwn', text: `*🤖 AI Analysis:*\n${analysis}` }
          },
          {
            type: 'context',
            elements: [
              { type: 'mrkdwn', text: `📊 <${alert.generatorURL}|View in Prometheus>` }
            ]
          }
        ]
      })
 
      console.log(`Alert ${alertname} sent to ${channel}`)
    } catch (err) {
      console.error('Error processing alert:', err)
    }
  }
})
 
app.get('/health', (req, res) => res.json({ status: 'ok' }))
 
app.listen(3000, () => console.log('Alert bot listening on :3000'))
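One refinement worth considering: the Slack blocks are built inline in the handler, which makes them hard to unit-test. A pure helper like the hypothetical `buildAlertBlocks` below (not part of bot.js above) mirrors that formatting and can be tested without Slack:

```javascript
// Pure helper mirroring the Slack blocks built inline in the webhook
// handler; factoring it out makes the formatting unit-testable.
// (Sketch: buildAlertBlocks is a hypothetical name, not part of bot.js.)
function buildAlertBlocks(alert, analysis) {
  const { alertname, namespace, pod, severity } = alert.labels
  return [
    { type: 'header', text: { type: 'plain_text', text: `🚨 ${alertname}` } },
    {
      type: 'section',
      fields: [
        { type: 'mrkdwn', text: `*Namespace:*\n${namespace || 'N/A'}` },
        { type: 'mrkdwn', text: `*Pod:*\n${pod || 'N/A'}` },
        { type: 'mrkdwn', text: `*Severity:*\n${severity || 'unknown'}` },
        { type: 'mrkdwn', text: `*Started:*\n${new Date(alert.startsAt).toLocaleString()}` }
      ]
    },
    { type: 'divider' },
    { type: 'section', text: { type: 'mrkdwn', text: `*🤖 AI Analysis:*\n${analysis}` } }
  ]
}

// Example:
const blocks = buildAlertBlocks(
  { labels: { alertname: 'KubePodCrashLooping', namespace: 'prod' }, startsAt: '2026-04-12T02:13:00Z' },
  'test analysis'
)
console.log(blocks[0].text.text)  // → 🚨 KubePodCrashLooping
```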

Step 3: Deploy to Kubernetes

yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-alertbot
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-alertbot
  template:
    metadata:
      labels:
        app: k8s-alertbot
    spec:
      serviceAccountName: alertbot-sa
      containers:
      - name: alertbot
        image: your-registry/k8s-alertbot:latest
        ports:
        - containerPort: 3000
        env:
        - name: SLACK_BOT_TOKEN
          valueFrom:
            secretKeyRef:
              name: alertbot-secrets
              key: slack-token
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: alertbot-secrets
              key: anthropic-key
        - name: SLACK_CHANNEL
          value: "#ops-alerts"
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
 
---
apiVersion: v1
kind: Service
metadata:
  name: k8s-alertbot
  namespace: monitoring
spec:
  selector:
    app: k8s-alertbot
  ports:
  - port: 3000
    targetPort: 3000
 
---
# RBAC: bot needs to read pods
apiVersion: v1
kind: ServiceAccount
metadata:
  name: alertbot-sa
  namespace: monitoring
 
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alertbot-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "nodes", "events"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list"]
 
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alertbot-reader
subjects:
- kind: ServiceAccount
  name: alertbot-sa
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: alertbot-reader
  apiGroup: rbac.authorization.k8s.io
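The Deployment above declares no health probes. Since bot.js exposes GET /health, liveness and readiness probes can be added under the container spec (a sketch; tune the delays and periods for your cluster):

```yaml
# Optional: probes against the bot's /health endpoint
# (add under the container entry in deployment.yaml above).
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 15
        readinessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 3
          periodSeconds: 10
```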
bash
# Create secrets
kubectl create secret generic alertbot-secrets \
  --from-literal=slack-token=xoxb-your-token \
  --from-literal=anthropic-key=sk-ant-your-key \
  -n monitoring
 
kubectl apply -f deployment.yaml

Step 4: Configure Alertmanager

yaml
# alertmanager.yaml
route:
  receiver: 'slack-bot'
  group_by: ['alertname', 'namespace']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
 
receivers:
- name: 'slack-bot'
  webhook_configs:
  - url: 'http://k8s-alertbot.monitoring:3000/webhook'
    send_resolved: false
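Before pointing Alertmanager at the bot, you can smoke-test the webhook by posting a synthetic payload (a sketch; assumes bot.js from Step 2 is running on localhost:3000 and Node 18+ for built-in fetch):

```javascript
// Post a synthetic Alertmanager-style payload to the local bot.
// Assumes bot.js is running on localhost:3000; Node 18+ for fetch.
const payload = {
  alerts: [{
    status: 'firing',
    labels: { alertname: 'KubePodCrashLooping', namespace: 'production', pod: 'my-app-test', severity: 'critical' },
    annotations: { summary: 'synthetic test alert' },
    startsAt: new Date().toISOString(),
    generatorURL: 'http://prometheus.example/graph'
  }]
}

fetch('http://localhost:3000/webhook', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload)
})
  .then(r => console.log(`Bot responded: ${r.status}`))
  .catch(e => console.log(`Bot not reachable: ${e.message}`))
```

If everything is wired up, the analysis for the synthetic alert should appear in your Slack channel within a few seconds.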

Step 5: Build & Push Docker Image

dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY bot.js .
EXPOSE 3000
CMD ["node", "bot.js"]
bash
docker build -t your-registry/k8s-alertbot:latest .
docker push your-registry/k8s-alertbot:latest

Example Slack Output

When KubePodCrashLooping fires:

🚨 KubePodCrashLooping
Namespace: production    Pod: my-app-7d9f8c-xkp2q
Severity: critical       Started: Apr 12, 2026, 2:13 AM

🤖 AI Analysis:
**What happened**
The pod my-app-7d9f8c-xkp2q has restarted 8 times in the last 10 minutes, 
indicating the container is crashing immediately after starting.

**Most likely causes**
1. (70%) Application startup error — OOMKilled or config missing
2. (20%) Liveness probe too aggressive, killing healthy pods
3. (10%) Broken init container blocking main container

**Immediate fix commands**
kubectl logs my-app-7d9f8c-xkp2q -n production --previous
kubectl describe pod my-app-7d9f8c-xkp2q -n production
kubectl get events -n production --sort-by='.lastTimestamp'

**Escalate if**
Restart count > 15 or logs show panic/OOMKilled. Wake the on-call lead.

📊 View in Prometheus

Cost Estimate

  • Claude API: roughly $0.002 per alert (~800 output tokens; varies by model and prompt size)
  • 100 alerts/day ≈ $0.20/day ≈ $6/month
  • New Anthropic accounts typically include a small amount of free credit, enough for initial testing


This bot pays for itself the first time it prevents a 2am escalation by giving on-call engineers instant context.
