Build a DevOps Automation Bot with LLM Function Calling (2026)
Use Claude or GPT-4o function calling to build a DevOps bot that can check pod status, scale deployments, query logs, and trigger pipelines — all from plain English commands in Slack or terminal.
Instead of teaching your team 50 kubectl commands, what if they could just ask: "Scale the payments service to 5 replicas" or "Why is the checkout pod crashing?" and an AI does it?
This guide builds a DevOps automation bot using LLM function calling — where the model decides which real infrastructure action to take based on your natural language request.
How Function Calling Works
Function calling (also called "tool use") lets you give an LLM a set of functions it can invoke. The model reads your message, decides which function to call, and returns structured arguments — you execute the function and optionally feed the result back.
```
User: "How many replicas does the payments deployment have?"
        │
        ▼
LLM decides: call get_deployment_info(namespace="default", name="payments")
        │
        ▼
Your code runs: kubectl get deployment payments -o json
        │
        ▼
Result fed back to LLM: {"replicas": 3, "available": 3, "image": "payments:v2.1"}
        │
        ▼
LLM responds: "The payments deployment has 3 replicas, all available. Running image payments:v2.1."
```
The LLM never directly touches your infrastructure — it tells your code what to run, and your code runs it safely.
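In code, that handoff is just a dictionary lookup: the model returns a tool name plus JSON arguments, and your code maps the name to a real function. A minimal sketch (the registry and the stubbed `get_deployment_info` here are illustrative, not part of any SDK):

```python
# Minimal tool-dispatch sketch: the model asks for a tool by name,
# your code looks it up in a registry and executes it.
import json

def get_deployment_info(namespace: str, name: str) -> dict:
    # Stand-in for a real Kubernetes API call.
    return {"name": name, "namespace": namespace, "replicas": 3}

TOOL_REGISTRY = {"get_deployment_info": get_deployment_info}

def dispatch(tool_call: dict) -> str:
    """Run the function the model asked for and serialize the result."""
    fn = TOOL_REGISTRY.get(tool_call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool {tool_call['name']}"})
    return json.dumps(fn(**tool_call["input"]))

# What the model might return for the question above:
call = {"name": "get_deployment_info",
        "input": {"namespace": "default", "name": "payments"}}
print(dispatch(call))
# → {"name": "payments", "namespace": "default", "replicas": 3}
```

The registry pattern is exactly what the `TOOL_FUNCTIONS` dict below does for the full bot.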
Project Setup
```shell
mkdir devops-bot && cd devops-bot
pip install anthropic kubernetes python-dotenv rich
```

```
devops-bot/
├── bot.py            # Main bot loop
├── tools.py          # DevOps tool implementations
├── k8s_client.py     # Kubernetes client wrapper
└── .env              # ANTHROPIC_API_KEY
```
Step 1: Define the DevOps Tools
```python
# tools.py
from kubernetes import client, config

# Load kubeconfig — in-cluster config when running inside k8s,
# otherwise the local ~/.kube/config.
try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

apps_v1 = client.AppsV1Api()
core_v1 = client.CoreV1Api()


def get_deployment_info(namespace: str, name: str) -> dict:
    """Get details about a Kubernetes deployment."""
    try:
        dep = apps_v1.read_namespaced_deployment(name=name, namespace=namespace)
        return {
            "name": dep.metadata.name,
            "namespace": dep.metadata.namespace,
            "replicas": dep.spec.replicas,
            "available_replicas": dep.status.available_replicas or 0,
            "ready_replicas": dep.status.ready_replicas or 0,
            "image": dep.spec.template.spec.containers[0].image,
            "labels": dep.metadata.labels,
        }
    except client.exceptions.ApiException as e:
        return {"error": f"Deployment not found: {e.reason}"}


def list_pods(namespace: str, label_selector: str = "") -> dict:
    """List pods in a namespace, optionally filtered by labels."""
    try:
        pods = core_v1.list_namespaced_pod(
            namespace=namespace,
            label_selector=label_selector
        )
        pod_list = []
        for pod in pods.items:
            pod_list.append({
                "name": pod.metadata.name,
                "status": pod.status.phase,
                "ready": all(
                    c.ready for c in (pod.status.container_statuses or [])
                ),
                "restarts": sum(
                    c.restart_count for c in (pod.status.container_statuses or [])
                ),
                "node": pod.spec.node_name,
            })
        return {"pods": pod_list, "count": len(pod_list)}
    except client.exceptions.ApiException as e:
        return {"error": str(e)}


def scale_deployment(namespace: str, name: str, replicas: int) -> dict:
    """Scale a deployment to the specified number of replicas."""
    if replicas < 0 or replicas > 50:
        return {"error": f"Replica count {replicas} is out of safe range (0-50)"}
    try:
        apps_v1.patch_namespaced_deployment_scale(
            name=name,
            namespace=namespace,
            body={"spec": {"replicas": replicas}}
        )
        return {
            "success": True,
            "message": f"Scaled {namespace}/{name} to {replicas} replicas"
        }
    except client.exceptions.ApiException as e:
        return {"error": str(e)}


def get_pod_logs(namespace: str, pod_name: str, tail_lines: int = 50) -> dict:
    """Get recent logs from a pod."""
    try:
        logs = core_v1.read_namespaced_pod_log(
            name=pod_name,
            namespace=namespace,
            tail_lines=tail_lines,
            timestamps=True
        )
        return {"logs": logs, "pod": pod_name}
    except client.exceptions.ApiException as e:
        return {"error": str(e)}


def get_pod_events(namespace: str, pod_name: str) -> dict:
    """Get Kubernetes events for a specific pod — useful for debugging crashes."""
    try:
        events = core_v1.list_namespaced_event(
            namespace=namespace,
            field_selector=f"involvedObject.name={pod_name}"
        )
        event_list = [
            {
                "type": e.type,
                "reason": e.reason,
                "message": e.message,
                "count": e.count,
                "last_time": str(e.last_timestamp),
            }
            for e in events.items
        ]
        return {"events": event_list}
    except client.exceptions.ApiException as e:
        return {"error": str(e)}


def list_deployments(namespace: str) -> dict:
    """List all deployments in a namespace."""
    try:
        deps = apps_v1.list_namespaced_deployment(namespace=namespace)
        return {
            "deployments": [
                {
                    "name": d.metadata.name,
                    "replicas": d.spec.replicas,
                    "available": d.status.available_replicas or 0,
                    "image": d.spec.template.spec.containers[0].image,
                }
                for d in deps.items
            ]
        }
    except client.exceptions.ApiException as e:
        return {"error": str(e)}


# Map function names to actual functions
TOOL_FUNCTIONS = {
    "get_deployment_info": get_deployment_info,
    "list_pods": list_pods,
    "scale_deployment": scale_deployment,
    "get_pod_logs": get_pod_logs,
    "get_pod_events": get_pod_events,
    "list_deployments": list_deployments,
}
```

Step 2: Define Tools for the LLM
```python
# tool_definitions.py — what we tell Claude about our tools
TOOLS = [
    {
        "name": "get_deployment_info",
        "description": "Get details about a specific Kubernetes deployment including replica count, available replicas, and current image.",
        "input_schema": {
            "type": "object",
            "properties": {
                "namespace": {"type": "string", "description": "Kubernetes namespace"},
                "name": {"type": "string", "description": "Deployment name"}
            },
            "required": ["namespace", "name"]
        }
    },
    {
        "name": "list_pods",
        "description": "List pods in a Kubernetes namespace. Use label_selector to filter (e.g. 'app=payments').",
        "input_schema": {
            "type": "object",
            "properties": {
                "namespace": {"type": "string"},
                "label_selector": {"type": "string", "description": "Optional label selector like 'app=myapp'"}
            },
            "required": ["namespace"]
        }
    },
    {
        "name": "scale_deployment",
        "description": "Scale a Kubernetes deployment to a specified number of replicas.",
        "input_schema": {
            "type": "object",
            "properties": {
                "namespace": {"type": "string"},
                "name": {"type": "string"},
                "replicas": {"type": "integer", "description": "Target replica count (0-50)"}
            },
            "required": ["namespace", "name", "replicas"]
        }
    },
    {
        "name": "get_pod_logs",
        "description": "Get recent logs from a specific pod. Use when debugging errors or crashes.",
        "input_schema": {
            "type": "object",
            "properties": {
                "namespace": {"type": "string"},
                "pod_name": {"type": "string"},
                "tail_lines": {"type": "integer", "default": 50, "description": "Number of log lines to return"}
            },
            "required": ["namespace", "pod_name"]
        }
    },
    {
        "name": "get_pod_events",
        "description": "Get Kubernetes events for a pod. Essential for diagnosing CrashLoopBackOff, OOMKilled, and scheduling failures.",
        "input_schema": {
            "type": "object",
            "properties": {
                "namespace": {"type": "string"},
                "pod_name": {"type": "string"}
            },
            "required": ["namespace", "pod_name"]
        }
    },
    {
        "name": "list_deployments",
        "description": "List all deployments in a Kubernetes namespace with their current status.",
        "input_schema": {
            "type": "object",
            "properties": {
                "namespace": {"type": "string"}
            },
            "required": ["namespace"]
        }
    }
]
```

Step 3: The Bot Main Loop
```python
# bot.py
import json

import anthropic
from rich.console import Console
from rich.markdown import Markdown

from tools import TOOL_FUNCTIONS
from tool_definitions import TOOLS

console = Console()
claude = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a DevOps assistant with access to Kubernetes cluster tools.
When users ask about deployments, pods, or logs — use your tools to get real data,
then explain it clearly. When scaling or modifying resources, confirm what you're
about to do before executing. Default namespace is 'default' unless specified."""


def run_tool(tool_name: str, tool_input: dict) -> str:
    """Execute a tool and return the result as a string."""
    if tool_name not in TOOL_FUNCTIONS:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    console.print(f"[dim]→ Running: {tool_name}({tool_input})[/dim]")
    result = TOOL_FUNCTIONS[tool_name](**tool_input)
    return json.dumps(result, indent=2, default=str)


def chat(messages: list) -> str:
    """Send messages to Claude and handle tool use in a loop."""
    while True:
        response = claude.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages,
        )

        if response.stop_reason == "tool_use":
            # Record the assistant's tool-use turn in the history
            messages.append({"role": "assistant", "content": response.content})

            # Execute each requested tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = run_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })

            # Feed the results back and loop — Claude responds next turn
            messages.append({"role": "user", "content": tool_results})
            continue

        # No (more) tool calls — return the text response.
        # This also covers stop reasons like max_tokens, so the loop
        # can't spin forever.
        return "".join(
            block.text for block in response.content if hasattr(block, "text")
        )


def main():
    console.print("[bold cyan]DevOps Bot[/bold cyan] — Ask me about your Kubernetes cluster")
    console.print("Type 'exit' to quit\n")
    conversation = []
    while True:
        user_input = console.input("[bold green]You:[/bold green] ").strip()
        if user_input.lower() in ("exit", "quit"):
            break
        if not user_input:
            continue
        conversation.append({"role": "user", "content": user_input})
        response = chat(conversation)
        console.print("\n[bold blue]Bot:[/bold blue]")
        console.print(Markdown(response))
        console.print()
        conversation.append({"role": "assistant", "content": response})


if __name__ == "__main__":
    main()
```

Step 4: Run It
```shell
export ANTHROPIC_API_KEY=sk-ant-your-key
python bot.py
```

DevOps Bot — Ask me about your Kubernetes cluster
You: What's running in the production namespace?
→ Running: list_deployments({"namespace": "production"})
Bot:
Here's what's running in the **production** namespace:
| Deployment | Replicas | Available | Image |
|-----------|----------|-----------|-------|
| api-server | 3 | 3 | api:v2.4.1 |
| payments | 2 | 2 | payments:v1.9.0 |
| frontend | 4 | 3 | frontend:v3.1.2 |
⚠️ **frontend** has only 3/4 replicas available — one pod may be unhealthy.
You: Why is the frontend pod unhealthy?
→ Running: list_pods({"namespace": "production", "label_selector": "app=frontend"})
→ Running: get_pod_events({"namespace": "production", "pod_name": "frontend-7d9f8b-xk2p"})
Bot:
The pod **frontend-7d9f8b-xk2p** is in **CrashLoopBackOff** with 8 restarts.
Events show:
- `OOMKilled` — the container was killed because it exceeded its memory limit (256Mi)
**Recommendation:** The frontend container needs more memory. You can scale vertically
by increasing the memory limit in the deployment spec, or check for a memory leak in
the recent frontend:v3.1.2 release.
You: Scale payments to 4 replicas
→ Running: scale_deployment({"namespace": "production", "name": "payments", "replicas": 4})
Bot:
Done. The **payments** deployment has been scaled from 2 → **4 replicas** in the
production namespace. New pods will be scheduled momentarily.
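Acting on a recommendation like the one above is itself just another tool. A sketch of a hypothetical `set_memory_limit` tool you could add to `tools.py` — the container name and `512Mi` value are examples, and the function assumes the `apps_v1` client defined there:

```python
# Hypothetical extra tool: raise a container's memory limit with a
# strategic merge patch on the deployment. Assumes apps_v1 from tools.py.
def build_memory_patch(container: str, limit: str) -> dict:
    """Build the patch body; kept separate so it's easy to test."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": container,
                         "resources": {"limits": {"memory": limit}}}
                    ]
                }
            }
        }
    }

def set_memory_limit(namespace: str, name: str,
                     container: str, limit: str) -> dict:
    try:
        apps_v1.patch_namespaced_deployment(
            name=name, namespace=namespace,
            body=build_memory_patch(container, limit),
        )
        return {"success": True,
                "message": f"Set {namespace}/{name}:{container} memory limit to {limit}"}
    except Exception as e:  # client.exceptions.ApiException in practice
        return {"error": str(e)}
```

Register it in `TOOL_FUNCTIONS` and describe it in `TOOLS`, and the bot can fix the OOMKill it just diagnosed — gated behind the confirmation guardrail below, since it mutates the cluster.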
Add Safety Guardrails
For production use, add confirmation for destructive actions:
```python
DESTRUCTIVE_TOOLS = {"scale_deployment", "delete_pod", "restart_deployment"}

def run_tool_with_confirmation(tool_name: str, tool_input: dict) -> str:
    if tool_name in DESTRUCTIVE_TOOLS:
        console.print(f"\n[yellow]⚠ About to run: {tool_name}({tool_input})[/yellow]")
        confirm = console.input("Confirm? (yes/no): ").strip().lower()
        if confirm != "yes":
            return json.dumps({"cancelled": "User cancelled the action"})
    return run_tool(tool_name, tool_input)
```

Deploy as a Slack Bot
To expose this as a Slack bot, wrap it in a FastAPI endpoint:
```python
import os

from fastapi import FastAPI, Request
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.fastapi.async_handler import AsyncSlackRequestHandler

# Bolt also reads SLACK_SIGNING_SECRET from the environment
slack_app = AsyncApp(token=os.environ["SLACK_BOT_TOKEN"])

@slack_app.message("")
async def handle_message(message, say):
    user_text = message.get("text", "")
    conversation = [{"role": "user", "content": user_text}]
    response = chat(conversation)
    await say(response)

# Expose the Slack events endpoint through FastAPI
api = FastAPI()
handler = AsyncSlackRequestHandler(slack_app)

@api.post("/slack/events")
async def slack_events(request: Request):
    return await handler.handle(request)
```

Now your team can ask Kubernetes questions directly in Slack's #devops channel.
Extend with More Tools
```python
# Easy to add more tools:
def trigger_github_actions_workflow(repo: str, workflow: str, ref: str = "main") -> dict:
    """Trigger a GitHub Actions workflow via API."""
    ...

def get_cloudwatch_metrics(service: str, metric: str, period_minutes: int = 30) -> dict:
    """Fetch AWS CloudWatch metrics for a service."""
    ...

def get_recent_deployments(namespace: str, count: int = 5) -> dict:
    """Get the last N deployments across all services."""
    ...
```

Each new function you add expands what your bot can do without changing any of the LLM logic.
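As one example, the GitHub Actions stub could be filled in with nothing but the standard library, using GitHub's workflow-dispatch endpoint. This sketch assumes a `GITHUB_TOKEN` environment variable with workflow scope; `repo` is `"owner/name"` and `workflow` is the workflow filename (e.g. `deploy.yml`):

```python
# Sketch of trigger_github_actions_workflow against GitHub's
# workflow_dispatch API, stdlib only. GITHUB_TOKEN is assumed to be set.
import json
import os
import urllib.error
import urllib.request

def build_dispatch_request(repo: str, workflow: str,
                           ref: str = "main") -> urllib.request.Request:
    """Build the POST request; split out so it can be tested offline."""
    url = (f"https://api.github.com/repos/{repo}"
           f"/actions/workflows/{workflow}/dispatches")
    return urllib.request.Request(
        url,
        data=json.dumps({"ref": ref}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('GITHUB_TOKEN', '')}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )

def trigger_github_actions_workflow(repo: str, workflow: str,
                                    ref: str = "main") -> dict:
    try:
        with urllib.request.urlopen(build_dispatch_request(repo, workflow, ref)) as resp:
            # GitHub returns 204 No Content on a successful dispatch
            return {"success": resp.status == 204, "repo": repo, "ref": ref}
    except urllib.error.HTTPError as e:
        return {"error": f"GitHub API returned {e.code}"}
```

Because it mutates CI state, a tool like this belongs in `DESTRUCTIVE_TOOLS` alongside `scale_deployment`.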
For deeper learning on building AI agents with tool use, the Anthropic documentation on tool use is excellent. For production-grade agent frameworks, LangChain and LlamaIndex build on the same patterns with more batteries included.
Function calling turns an LLM from a text generator into an actual operator that can read and modify your infrastructure — while keeping you in control of what actions are actually executed.