Build an AI-Powered Incident Report Generator with Claude API (2026)
Writing postmortems takes 2-3 hours. Here's how to build an AI tool that generates a structured incident report from Slack logs, metrics summaries, and alert data in minutes.
Postmortems are valuable, but writing them is painful. This tool takes raw incident data (Slack thread, alert timeline, metrics) and generates a structured postmortem draft in under 2 minutes.
What We're Building
An incident report generator that:
- Accepts: incident description, timeline of events, systems affected, impact
- Generates: structured postmortem with root cause analysis, timeline, action items
- Outputs: Markdown document ready for your wiki (Confluence, Notion, GitHub)
- API: FastAPI endpoint + simple web UI
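To make the inputs concrete, here is a minimal sketch of a request body for the `/generate` endpoint defined later in this article. The field names mirror the `IncidentRequest` model shown further down; the values are borrowed from the web UI's placeholder text and are purely illustrative, and `build_payload` is a hypothetical helper, not part of the tool itself.

```python
import json

# Fields the endpoint requires. `severity` is left out of this list because
# the IncidentRequest model later in the article defaults it to "P2".
REQUIRED_FIELDS = (
    "title", "start_time", "end_time",
    "affected_services", "impact_description", "timeline_notes",
)

def build_payload(**fields: str) -> str:
    """Return a JSON request body, failing early if a required field is missing or blank."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return json.dumps(fields, indent=2)

payload = build_payload(
    title="API Latency Spike - Payment Service Degradation",
    severity="P2",
    start_time="2026-05-07 14:32 IST",
    end_time="2026-05-07 15:48 IST",
    affected_services="payment-api, checkout-service, order-service",
    impact_description="23% of payment attempts failed. ~4,200 users affected.",
    timeline_notes="14:32 - First alert fired\n14:35 - Engineer paged",
)
print(payload)
```

Saved to a file, this body can be posted to the running service with any HTTP client; the optional fields (`slack_thread`, `alerts_fired`, `metrics_summary`, `fix_applied`) can simply be omitted.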
Setup
mkdir incident-reporter && cd incident-reporter
pip install anthropic fastapi uvicorn python-multipart jinja2
export ANTHROPIC_API_KEY=sk-ant-your-key-here
Core Generator
# generator.py
import anthropic
from dataclasses import dataclass
from typing import Optional
client = anthropic.Anthropic()
POSTMORTEM_SYSTEM_PROMPT = """You are a senior SRE writing a blameless postmortem report.
Your reports follow this structure:
1. **Incident Summary**: 2-3 sentence overview
2. **Impact**: who was affected, for how long, severity
3. **Timeline**: chronological events with timestamps
4. **Root Cause**: technical root cause, not blame
5. **Contributing Factors**: what made this worse or harder to detect
6. **Resolution**: what fixed it
7. **Action Items**: specific, assigned, time-bound improvements
Rules:
- Blameless: focus on systems and processes, not individuals
- Specific: include exact error messages, metrics where provided
- Actionable: every problem identified must have a concrete action item
- Honest: if we don't know the root cause, say so clearly
Format as clean Markdown."""
@dataclass
class IncidentInput:
    title: str
    severity: str  # P0/P1/P2/P3
    start_time: str
    end_time: str
    affected_services: str
    impact_description: str
    timeline_notes: str
    slack_thread: Optional[str] = None
    alerts_fired: Optional[str] = None
    metrics_summary: Optional[str] = None
    fix_applied: Optional[str] = None
def generate_incident_report(incident: IncidentInput) -> str:
    # Optional fields are included only when they are non-empty.
    user_content = f"""Generate a postmortem for this incident:
**Title:** {incident.title}
**Severity:** {incident.severity}
**Duration:** {incident.start_time} to {incident.end_time}
**Affected Services:** {incident.affected_services}
**Impact:**
{incident.impact_description}
**Timeline Notes:**
{incident.timeline_notes}
{"**Slack Thread:**" + chr(10) + incident.slack_thread if incident.slack_thread else ""}
{"**Alerts That Fired:**" + chr(10) + incident.alerts_fired if incident.alerts_fired else ""}
{"**Metrics Summary:**" + chr(10) + incident.metrics_summary if incident.metrics_summary else ""}
{"**Fix Applied:**" + chr(10) + incident.fix_applied if incident.fix_applied else ""}
Generate the complete postmortem following the required structure."""
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=2000,
        system=POSTMORTEM_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": user_content}],
    )
    return response.content[0].text
FastAPI Backend
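One detail worth checking before exposing the generator over HTTP: the web UI further down sends empty strings, not null, for optional fields left blank. The inline conditionals in `generate_incident_report` already skip those, though they leave blank lines in the prompt and let whitespace-only values through. A minimal sketch of a stricter alternative follows; `optional_section` and `join_sections` are hypothetical helper names, not part of the original code.

```python
from typing import Optional

def optional_section(label: str, value: Optional[str]) -> str:
    """Return '**Label:**' plus the text, or '' when the value is None or blank."""
    if value is None or not value.strip():
        return ""
    return f"**{label}:**\n{value.strip()}"

def join_sections(sections: dict[str, Optional[str]]) -> str:
    """Join only the non-empty sections, separated by one blank line."""
    blocks = (optional_section(label, value) for label, value in sections.items())
    return "\n\n".join(block for block in blocks if block)

extras = join_sections({
    "Slack Thread": "",                        # untouched form field
    "Alerts That Fired": "PaymentAPILatencyHigh",
    "Metrics Summary": None,                   # field not sent at all
    "Fix Applied": "   ",                      # whitespace only
})
print(extras)
```

Here only the alerts section survives, so the prompt carries no empty headings. Whether that is worth the extra helper is a judgment call; the original inline version works as well.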
# main.py
from fastapi import FastAPI, HTTPException
from fastapi.responses import HTMLResponse
from pydantic import BaseModel
from typing import Optional
from generator import generate_incident_report, IncidentInput

app = FastAPI(title="Incident Report Generator")

class IncidentRequest(BaseModel):
    title: str
    severity: str = "P2"
    start_time: str
    end_time: str
    affected_services: str
    impact_description: str
    timeline_notes: str
    slack_thread: Optional[str] = None
    alerts_fired: Optional[str] = None
    metrics_summary: Optional[str] = None
    fix_applied: Optional[str] = None

@app.post("/generate")
def generate_report(request: IncidentRequest):
    # Plain `def` on purpose: the Anthropic call is blocking, so FastAPI runs
    # this handler in its threadpool instead of stalling the event loop.
    try:
        incident = IncidentInput(**request.model_dump())
        report = generate_incident_report(incident)
        return {"report": report, "word_count": len(report.split())}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/", response_class=HTMLResponse)
async def web_ui():
    return """
<!DOCTYPE html>
<html>
<head>
<title>Incident Report Generator</title>
<style>
body { font-family: monospace; max-width: 900px; margin: 40px auto; padding: 20px; background: #0f0f0f; color: #e0e0e0; }
input, textarea, select { width: 100%; padding: 8px; margin: 4px 0 12px; background: #1a1a1a; color: #e0e0e0; border: 1px solid #333; border-radius: 4px; box-sizing: border-box; }
button { background: #7c3aed; color: white; padding: 12px 24px; border: none; border-radius: 4px; cursor: pointer; font-size: 16px; }
button:hover { background: #6d28d9; }
#result { margin-top: 20px; padding: 20px; background: #1a1a1a; border: 1px solid #333; border-radius: 4px; white-space: pre-wrap; display: none; }
label { color: #888; font-size: 12px; }
h1 { color: #7c3aed; }
</style>
</head>
<body>
<h1>⚡ Incident Report Generator</h1>
<label>Incident Title</label>
<input id="title" placeholder="e.g. API Latency Spike - Payment Service Degradation" />
<label>Severity</label>
<select id="severity">
<option value="P0">P0 - Complete outage</option>
<option value="P1">P1 - Major degradation</option>
<option value="P2" selected>P2 - Partial impact</option>
<option value="P3">P3 - Minor impact</option>
</select>
<label>Start Time</label>
<input id="start_time" placeholder="e.g. 2026-05-07 14:32 IST" />
<label>End Time</label>
<input id="end_time" placeholder="e.g. 2026-05-07 15:48 IST" />
<label>Affected Services</label>
<input id="affected_services" placeholder="e.g. payment-api, checkout-service, order-service" />
<label>Impact Description</label>
<textarea id="impact_description" rows="3" placeholder="e.g. 23% of payment attempts failed. ~4,200 users affected. Estimated ₹8.5L in blocked transactions."></textarea>
<label>Timeline Notes (paste your notes)</label>
<textarea id="timeline_notes" rows="5" placeholder="14:32 - First alert fired 14:35 - Engineer paged 14:45 - Root cause identified as DB connection pool exhaustion 15:30 - Fix deployed 15:48 - Metrics normalized"></textarea>
<label>Slack Thread (optional: paste key messages)</label>
<textarea id="slack_thread" rows="4" placeholder="Paste relevant Slack messages from the incident channel"></textarea>
<label>Alerts That Fired (optional)</label>
<textarea id="alerts_fired" rows="3" placeholder="e.g. PaymentAPILatencyHigh, DBConnectionPoolNearLimit, ErrorRateSpikePayment"></textarea>
<label>Fix Applied</label>
<textarea id="fix_applied" rows="3" placeholder="e.g. Increased DB connection pool from 50 to 200, added connection timeout of 5s, deployed at 15:30"></textarea>
<br/>
<button onclick="generate()">Generate Postmortem</button>
<div id="loading" style="display:none; margin-top:20px; color:#7c3aed;">Generating... (~30 seconds)</div>
<div id="result"></div>
<script>
async function generate() {
const data = {
title: document.getElementById('title').value,
severity: document.getElementById('severity').value,
start_time: document.getElementById('start_time').value,
end_time: document.getElementById('end_time').value,
affected_services: document.getElementById('affected_services').value,
impact_description: document.getElementById('impact_description').value,
timeline_notes: document.getElementById('timeline_notes').value,
slack_thread: document.getElementById('slack_thread').value,
alerts_fired: document.getElementById('alerts_fired').value,
fix_applied: document.getElementById('fix_applied').value,
};
document.getElementById('loading').style.display = 'block';
document.getElementById('result').style.display = 'none';
try {
const response = await fetch('/generate', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify(data)
});
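const result = await response.json();
if (!response.ok) {
// FastAPI error responses carry the message in `detail`
throw new Error(typeof result.detail === 'string' ? result.detail : JSON.stringify(result.detail));
}
document.getElementById('result').style.display = 'block';
document.getElementById('result').textContent = result.report;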
} catch (e) {
document.getElementById('result').textContent = 'Error: ' + e.message;
document.getElementById('result').style.display = 'block';
} finally {
document.getElementById('loading').style.display = 'none';
}
}
</script>
</body>
</html>
"""
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8080)
Run It
python main.py
# Open http://localhost:8080
Slack Bot Integration
To trigger it directly from Slack, register a /postmortem slash command using Slack's Bolt for Python (pip install slack-bolt):
# slack_bot.py
import os

from slack_bolt import App
from generator import generate_incident_report, IncidentInput

slack_app = App(token=os.environ["SLACK_BOT_TOKEN"])

@slack_app.command("/postmortem")
def handle_postmortem(ack, body, client):
    # Acknowledge the slash command right away, as Slack requires
    ack()
    # Open a modal to collect incident details
    client.views_open(
        trigger_id=body["trigger_id"],
        view={
            "type": "modal",
            "callback_id": "postmortem_submit",
            "title": {"type": "plain_text", "text": "Generate Postmortem"},
            "submit": {"type": "plain_text", "text": "Generate"},
            "blocks": [
                {
                    "type": "input",
                    "block_id": "title",
                    "element": {"type": "plain_text_input", "action_id": "value"},
                    "label": {"type": "plain_text", "text": "Incident Title"}
                },
                # ... more fields
            ]
        }
    )
Deploy to Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: incident-reporter
  namespace: internal-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: incident-reporter
  template:
    metadata:
      labels:
        app: incident-reporter
    spec:
      containers:
        - name: app
          image: your-registry/incident-reporter:latest
          ports:
            - containerPort: 8080
          env:
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: anthropic-secret
                  key: api-key
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
Sample Output
Given minimal input, the generator produces:
# Postmortem: API Latency Spike - Payment Service Degradation
**Severity:** P2 | **Duration:** 76 minutes | **Date:** 2026-05-07
## Incident Summary
The payment service experienced elevated latency and a 23% error rate between
14:32 and 15:48 IST on May 7, 2026, due to database connection pool exhaustion.
Approximately 4,200 users were affected, with an estimated ₹8.5L in blocked
payment transactions.
## Impact
- **Users affected:** ~4,200
- **Error rate:** 23% of payment attempts failed
- **Duration:** 76 minutes
- **Business impact:** ~₹8.5L in blocked transactions
## Timeline
| Time (IST) | Event |
|---|---|
| 14:32 | PaymentAPILatencyHigh alert fired |
| 14:35 | On-call engineer paged |
...
## Root Cause
Database connection pool exhausted at 50 connections under increased load...
## Action Items
| Action | Owner | Due |
|---|---|---|
| Increase DB connection pool limit to 200 | Platform team | 2026-05-14 |
| Add connection pool monitoring alert | Observability team | 2026-05-14 |
...
Writing postmortems that are actually useful is hard. AI handles the structure and boilerplate so you can focus on the insights and action items. Full source code is available on GitHub.
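Since the generated document is a draft for human review, one optional safeguard is to confirm that all seven sections requested in the system prompt actually appear before the draft goes to a wiki. The sketch below assumes the model emits each section as a Markdown heading, which the prompt asks for but does not guarantee; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, not part of the original code.

```python
import re

# Section names taken from the system prompt's required structure.
REQUIRED_SECTIONS = [
    "Incident Summary", "Impact", "Timeline", "Root Cause",
    "Contributing Factors", "Resolution", "Action Items",
]

def missing_sections(report: str) -> list[str]:
    """Return the required section names that have no matching Markdown heading."""
    headings = []
    for line in report.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line.strip())
        if match:
            # Drop list numbering and bold markers so "## 1. **Impact**" still matches.
            text = re.sub(r"^\d+\.\s*", "", match.group(1)).replace("*", "")
            headings.append(text.strip().lower())
    return [
        name for name in REQUIRED_SECTIONS
        if not any(heading.startswith(name.lower()) for heading in headings)
    ]

# A deliberately truncated draft used only to exercise the check.
draft = """# Postmortem: API Latency Spike
## Incident Summary
## Impact
## Timeline
## Root Cause
## Action Items
"""
print(missing_sections(draft))  # ['Contributing Factors', 'Resolution']
```

A non-empty result is a signal to regenerate or to fill the gap by hand, not proof that the draft is wrong: the model may have had too little input to write that section.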