
Build an AI DevOps Daily Digest with Claude API

Build a Python script that collects pending PRs, firing Prometheus alerts, Kubernetes warnings, and failed CI jobs, then uses the Claude API to generate a prioritized morning briefing and post it to Slack.

DevOpsBoys · 6 min read

Every morning at standup someone asks "anything on fire?" and three people speak at once. This script fixes that: it collects the important signals from your stack, sends them to the Claude API, and posts a prioritized morning briefing to Slack before you pour your first coffee.

What the Digest Collects

  • Pending pull requests older than 24 hours (needs review)
  • Currently firing Prometheus alerts
  • Kubernetes Warning events from the last 8 hours
  • GitHub Actions jobs that failed overnight

Claude turns all of that into a prioritized 5-point briefing in plain English.

Prerequisites

bash
pip install anthropic requests slack-sdk

Environment variables needed:

GITHUB_TOKEN=ghp_...
GITHUB_REPO=org/repo-name
PROMETHEUS_URL=http://prometheus:9090
KUBECONFIG=/path/to/kubeconfig
SLACK_BOT_TOKEN=xoxb-...
SLACK_CHANNEL=#devops-morning
ANTHROPIC_API_KEY=sk-ant-...
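
The script reads several of these with os.environ[...], so a missing variable surfaces as a bare KeyError at import time. A minimal sketch of a friendlier pre-flight check follows; missing_env and the REQUIRED list are hypothetical names, not part of the script, and the list assumes the four variables the script treats as mandatory.

```python
import os

# Assumption: these are the variables the digest script reads without a default.
REQUIRED = ["GITHUB_TOKEN", "GITHUB_REPO", "SLACK_BOT_TOKEN", "ANTHROPIC_API_KEY"]


def missing_env(required=REQUIRED, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this before the digest (or calling missing_env() at the top of main) gives one readable message listing everything that still needs to be set.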

The Script

python
#!/usr/bin/env python3
"""
DevOps Daily Digest — collects signals from your stack and posts
a Claude-generated morning briefing to Slack.
"""
 
import os
import json
import subprocess
from datetime import datetime, timedelta, timezone
 
import anthropic
import requests
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError
 
 
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
GITHUB_REPO = os.environ["GITHUB_REPO"]
PROMETHEUS_URL = os.environ.get("PROMETHEUS_URL", "http://prometheus:9090")
SLACK_BOT_TOKEN = os.environ["SLACK_BOT_TOKEN"]
SLACK_CHANNEL = os.environ.get("SLACK_CHANNEL", "#devops-morning")
ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]
 
 
def get_stale_prs():
    """Open PRs created more than 24 hours ago (first 50 only; review state is not checked)."""
    headers = {
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    }
    url = f"https://api.github.com/repos/{GITHUB_REPO}/pulls?state=open&per_page=50"
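    try:
        resp = requests.get(url, headers=headers, timeout=10)
        resp.raise_for_status()
    except requests.RequestException as e:
        # Match the other collectors: report the failure instead of crashing the digest.
        return [{"error": f"Could not fetch PRs from GitHub: {e}"}]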
 
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
    stale = []
    for pr in resp.json():
        created = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
        if created < cutoff:
            stale.append({
                "title": pr["title"],
                "author": pr["user"]["login"],
                "url": pr["html_url"],
                "age_hours": int((datetime.now(timezone.utc) - created).total_seconds() / 3600),
                "labels": [l["name"] for l in pr.get("labels", [])],
            })
    return stale
 
 
def get_firing_alerts():
    """Prometheus alerts that are currently firing."""
    try:
        url = f"{PROMETHEUS_URL}/api/v1/alerts"
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        alerts = resp.json().get("data", {}).get("alerts", [])
        firing = [
            {
                "name": a["labels"].get("alertname", "unknown"),
                "severity": a["labels"].get("severity", "unknown"),
                "namespace": a["labels"].get("namespace", ""),
                "summary": a.get("annotations", {}).get("summary", ""),
            }
            for a in alerts
            if a.get("state") == "firing"
        ]
        return firing
    except Exception as e:
        return [{"error": f"Could not reach Prometheus: {e}"}]
 
 
def get_k8s_warnings():
    """Recent Kubernetes Warning events."""
    try:
        result = subprocess.run(
            [
                "kubectl", "get", "events",
                "--all-namespaces",
                "--field-selector=type=Warning",
                "--sort-by=.lastTimestamp",
                "-o", "json",
            ],
            capture_output=True, text=True, timeout=15
        )
        if result.returncode != 0:
            return [{"error": result.stderr.strip()}]
 
        events = json.loads(result.stdout).get("items", [])
        cutoff = datetime.now(timezone.utc) - timedelta(hours=8)
        recent = []
        for ev in events:
            ts_raw = ev.get("lastTimestamp") or ev.get("eventTime", "")
            if not ts_raw:
                continue
            ts = datetime.fromisoformat(ts_raw.replace("Z", "+00:00"))
            if ts > cutoff:
                recent.append({
                    "namespace": ev["metadata"]["namespace"],
                    "reason": ev.get("reason", ""),
                    "message": ev.get("message", "")[:200],
                    "object": ev.get("involvedObject", {}).get("name", ""),
                    "count": ev.get("count", 1),
                })
        return recent[-20:]  # cap at 20 events
    except Exception as e:
        return [{"error": str(e)}]
 
 
def get_failed_ci_jobs():
    """GitHub Actions workflow runs that failed in the last 12 hours."""
    headers = {
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    }
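    # Use a "Z"-suffixed timestamp and let requests URL-encode the query.
    # isoformat() would emit "+00:00", and a raw "+" in a URL decodes as a space.
    since = (datetime.now(timezone.utc) - timedelta(hours=12)).strftime("%Y-%m-%dT%H:%M:%SZ")
    url = f"https://api.github.com/repos/{GITHUB_REPO}/actions/runs"
    params = {"status": "failure", "created": f">{since}", "per_page": 20}
    try:
        resp = requests.get(url, headers=headers, params=params, timeout=10)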
        resp.raise_for_status()
        runs = resp.json().get("workflow_runs", [])
        return [
            {
                "workflow": r["name"],
                "branch": r["head_branch"],
                "actor": r["triggering_actor"]["login"],
                "url": r["html_url"],
            }
            for r in runs
        ]
    except Exception as e:
        return [{"error": str(e)}]
 
 
def build_prompt(prs, alerts, k8s_events, failed_jobs):
    today = datetime.now().strftime("%A, %B %d %Y")
    return f"""You are a senior SRE writing a morning briefing for a DevOps team. Today is {today}.
 
Summarize the following data into a prioritized briefing. 
Use this format:
1. Start with a one-line status: CRITICAL / NEEDS ATTENTION / ALL CLEAR
2. List up to 5 bullet points, most urgent first
3. Each bullet: one sentence, actionable, include names/links where useful
4. End with one line: "Suggested first action: ..."
 
Keep it under 200 words. No headers, no markdown tables. Plain bullets only.
 
=== FIRING ALERTS ({len(alerts)}) ===
{json.dumps(alerts, indent=2)}
 
=== STALE PULL REQUESTS ({len(prs)}) ===
{json.dumps(prs, indent=2)}
 
=== KUBERNETES WARNINGS ({len(k8s_events)}) ===
{json.dumps(k8s_events, indent=2)}
 
=== FAILED CI JOBS ({len(failed_jobs)}) ===
{json.dumps(failed_jobs, indent=2)}
"""
 
 
def call_claude(prompt):
    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text
 
 
def post_to_slack(text):
    client = WebClient(token=SLACK_BOT_TOKEN)
    try:
        client.chat_postMessage(
            channel=SLACK_CHANNEL,
            # :sunrise: is a standard Slack emoji; :morning: only renders if your workspace defines it.
            text=f":sunrise: *DevOps Morning Briefing — {datetime.now().strftime('%b %d')}*\n\n{text}",
            unfurl_links=False,
        )
        print("Posted to Slack.")
    except SlackApiError as e:
        print(f"Slack error: {e.response['error']}")
 
 
def main():
    print("Collecting data...")
    prs = get_stale_prs()
    alerts = get_firing_alerts()
    k8s_events = get_k8s_warnings()
    failed_jobs = get_failed_ci_jobs()
 
    print(f"  PRs: {len(prs)}, Alerts: {len(alerts)}, K8s events: {len(k8s_events)}, Failed jobs: {len(failed_jobs)}")
 
    prompt = build_prompt(prs, alerts, k8s_events, failed_jobs)
 
    print("Calling Claude API...")
    briefing = call_claude(prompt)
    print("\n--- BRIEFING ---")
    print(briefing)
    print("----------------\n")
 
    post_to_slack(briefing)
 
 
if __name__ == "__main__":
    main()
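
The collectors report failures as a one-item list such as [{"error": "..."}], and main() then counts that item as if it were a real PR, alert, or event. One possible refinement, sketched below, wraps each collector and separates real records from error placeholders so the counts stay honest. safe_collect and split_errors are hypothetical helpers, not part of the script above.

```python
def safe_collect(name, collector):
    """Run one collector; convert any exception into the script's error-dict shape."""
    try:
        return collector()
    except Exception as exc:  # broad on purpose: one bad source should not sink the digest
        return [{"error": f"{name} failed: {exc}"}]


def split_errors(items):
    """Split a collector result into (real records, error messages)."""
    data = [item for item in items if "error" not in item]
    errors = [item["error"] for item in items if "error" in item]
    return data, errors


if __name__ == "__main__":
    def broken_collector():
        # Stand-in for a collector whose backend is unreachable.
        raise TimeoutError("connection timed out")

    data, errors = split_errors(safe_collect("Prometheus", broken_collector))
    print(f"records: {len(data)}, errors: {errors}")
```

In main(), the error messages could be passed to the prompt under their own section heading so the briefing can say "Prometheus was unreachable" rather than silently reporting one alert.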

Sample Output

When posted to Slack, the briefing looks like:

NEEDS ATTENTION

- auth-service pod is CrashLoopBackOff in production (OOMKilled x3 in last 8h)
  — check memory limits in values-prod.yaml
- 2 Prometheus alerts firing: HighMemoryUsage (namespace: payments) and
  PodRestartRateHigh (namespace: auth)
- PR "feat: add retry logic to payment processor" open 31h, no reviewer assigned
  — @team-lead please assign
- deploy-to-prod workflow failed on branch main at 03:14 IST — build step
  timeout, likely flaky test in integration suite
- 1 PR from @new-engineer open 26h with "needs-review" label

Suggested first action: Investigate auth-service OOMKilled — check kubectl top pod -n auth
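
Because the prompt asks for a one-line status first (CRITICAL / NEEDS ATTENTION / ALL CLEAR), that line can drive small decisions such as which emoji to lead the Slack message with. The sketch below assumes the model follows the requested format, which is likely but not guaranteed, so it falls back to None. briefing_status and STATUS_EMOJI are hypothetical names, not part of the script.

```python
# Assumption: these three labels match the ones requested in build_prompt.
STATUS_EMOJI = {
    "CRITICAL": ":rotating_light:",
    "NEEDS ATTENTION": ":warning:",
    "ALL CLEAR": ":white_check_mark:",
}


def briefing_status(briefing):
    """Return the status label found on the first non-empty line, or None."""
    stripped = briefing.strip()
    if not stripped:
        return None
    first_line = stripped.splitlines()[0].upper()
    for status in STATUS_EMOJI:
        if status in first_line:
            return status
    return None


if __name__ == "__main__":
    sample = "NEEDS ATTENTION\n\n- auth-service pod is CrashLoopBackOff in production"
    status = briefing_status(sample)
    print(status, STATUS_EMOJI.get(status, ":sunrise:"))
```

The same return value could also decide whether to skip posting entirely on ALL CLEAR days, if the team prefers a quiet channel.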

GitHub Actions Cron Schedule

yaml
# .github/workflows/morning-digest.yml
name: DevOps Morning Digest
 
on:
  schedule:
    - cron: "30 2 * * 1-5"   # 8:00 AM IST (UTC+5:30), Mon-Fri
  workflow_dispatch:          # allow manual trigger
 
jobs:
  digest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
 
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
 
      - name: Install dependencies
        run: pip install anthropic requests slack-sdk
 
      - name: Set up kubectl
        uses: azure/setup-kubectl@v4
 
      - name: Configure kubeconfig
        run: |
          echo "${{ secrets.KUBECONFIG_B64 }}" | base64 -d > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV
 
      - name: Run digest
        env:
          GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}
          GITHUB_REPO: ${{ github.repository }}
          PROMETHEUS_URL: ${{ secrets.PROMETHEUS_URL }}
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python digest.py

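Store the output of base64 -w0 ~/.kube/config as a GitHub Actions secret named KUBECONFIG_B64 (-w0 is the GNU coreutils flag that disables line wrapping, so the secret is a single line).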
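
GitHub Actions evaluates schedule cron expressions in UTC, which is why 8:00 AM IST appears as "30 2" in the workflow above. For other time zones, a small helper can do the conversion. This is a sketch under two assumptions: the offset is fixed (no daylight saving time), and the converted time stays on the same calendar day, since a day shift would also require changing the day-of-week field. local_to_utc_cron is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_cron(hour, minute, utc_offset, days="1-5"):
    """Convert a local wall-clock time at a fixed UTC offset to a UTC cron line."""
    # The date itself is arbitrary; it is only used to detect a day rollover.
    local = datetime(2024, 1, 3, hour, minute, tzinfo=timezone(utc_offset))
    utc = local.astimezone(timezone.utc)
    if utc.date() != local.date():
        raise ValueError("conversion crosses midnight; adjust the day-of-week field by hand")
    return f"{utc.minute} {utc.hour} * * {days}"


if __name__ == "__main__":
    ist = timedelta(hours=5, minutes=30)
    print(local_to_utc_cron(8, 0, ist))
```

For zones that observe daylight saving time, the UTC schedule drifts by an hour twice a year, so either accept the drift or update the expression seasonally.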

Cost

Using claude-haiku-4-5-20251001 (a fast, low-cost Claude model), each briefing costs roughly $0.001–0.003 depending on data size. At about 22 weekday runs per month, that works out to roughly $0.02–0.07/month. Negligible.
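
To sanity-check the estimate for your own data volume, the arithmetic is simple enough to script. In the sketch below, the token counts and per-million-token prices are placeholder assumptions; check the current Anthropic pricing page and the usage field on the API response for real numbers. estimate_cost is a hypothetical helper.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Dollar cost of one API call, given prices in dollars per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed figures for illustration only: 1,500 prompt tokens, 300 reply tokens,
    # $1.00 per million input tokens and $5.00 per million output tokens.
    per_run = estimate_cost(1500, 300, 1.00, 5.00)
    weekday_runs_per_month = 22
    print(f"per run: ${per_run:.4f}, per month: ${per_run * weekday_runs_per_month:.3f}")
```

The actual token counts for each call are available on the response object as message.usage.input_tokens and message.usage.output_tokens, which makes it easy to log real costs over time.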

Extending It

Add more data sources by writing a function that returns a list of dicts and dropping it into build_prompt. Ideas:

  • PagerDuty on-call schedule — who is the primary responder today?
  • Jira sprint board — which tickets are past due?
  • Cost anomalies — AWS Cost Explorer daily delta
  • Dependency CVEs — Snyk or Dependabot summary

The Claude prompt handles any JSON you throw at it — just label the section clearly.
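
One way to make new sources cheap to add is to build the labeled sections from a dict instead of hand-writing each one in the f-string. The sketch below reproduces the "=== TITLE (count) ===" layout used in build_prompt; format_section and format_sections are hypothetical helpers, and the on-call and CVE data are made up for illustration.

```python
import json


def format_section(title, items):
    """Render one labeled prompt section in the same layout build_prompt uses."""
    return f"=== {title} ({len(items)}) ===\n{json.dumps(items, indent=2)}"


def format_sections(sections):
    """Render a dict of {title: list of dicts} as blank-line-separated sections."""
    return "\n\n".join(format_section(title, items) for title, items in sections.items())


if __name__ == "__main__":
    # Illustrative data only; a real collector would return these lists.
    extra = {
        "ON-CALL": [{"primary": "alice", "secondary": "bob"}],
        "DEPENDENCY CVES": [],
    }
    print(format_sections(extra))
```

The resulting string can be appended to the prompt after the existing sections. Keep an eye on total size: each new source adds input tokens, so capping list lengths (as the Kubernetes collector does with 20 events) is worth repeating for new sources.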

Affiliate Tools

Sign up for the Anthropic API to get your API key. For Slack integration, the Slack API bot token setup takes about 5 minutes. For hosted Prometheus, Grafana Cloud has a generous free tier.
