Build an AI-Powered Terraform Drift Detection System

Terraform drift happens silently. Here's how to build an automated drift detector using Terraform plan + Claude API that alerts your team and explains exactly what changed.

Infrastructure drift is when your actual cloud resources no longer match your Terraform configuration and state. Someone manually changed a security group rule. An auto-scaling policy got updated through the console. A team member ran terraform apply from a local checkout without committing the code.

The problem is silent until it breaks something. Here's how to build a system that detects drift automatically, analyzes it with AI, and alerts your team with a clear explanation.

What You'll Build

Scheduled Job (every 6h)
        ↓
terraform plan -detailed-exitcode
        ↓ (exit code 2 = drift detected)
Parse plan output (JSON)
        ↓
Claude API analyzes drift
        ↓
Slack alert with:
  - Resources drifted
  - What changed
  - Risk assessment
  - Recommended action

Step 1 — Set Up Terraform Plan to Detect Drift

bash

# Run plan and capture exit code
terraform plan -detailed-exitcode -out=plan.tfplan
EXIT_CODE=$?
 
# Exit codes:
# 0 = no changes (no drift)
# 1 = error
# 2 = changes detected (drift, or committed code that was never applied)

Output the plan as JSON for programmatic parsing:

bash

terraform show -json plan.tfplan > plan.json
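
The resulting plan.json can be inspected before any AI is involved. The sketch below assumes the documented top-level keys resource_changes (what Terraform proposes to do) and resource_drift (changes Terraform detected outside its own runs, which older Terraform versions may omit). The function name summarize_plan and the sample document are illustrative only, not part of the original script.

```python
import json


def summarize_plan(plan: dict) -> dict:
    """Split a parsed `terraform show -json` plan into planned actions and detected drift."""
    planned = [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        # Skip resources Terraform would leave alone or only read
        if rc.get("change", {}).get("actions") not in (["no-op"], ["read"])
    ]
    # resource_drift may be missing entirely; treat that as "nothing detected"
    drifted = [rd["address"] for rd in plan.get("resource_drift", [])]
    return {"planned": planned, "drifted": drifted}


# Hypothetical, heavily trimmed plan document for illustration
sample = json.loads("""
{
  "resource_changes": [
    {"address": "aws_security_group.api_sg", "change": {"actions": ["update"]}},
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}}
  ],
  "resource_drift": [
    {"address": "aws_security_group.api_sg", "change": {"actions": ["update"]}}
  ]
}
""")

print(summarize_plan(sample))
```

A resource that appears under both keys was changed outside Terraform and would be reverted by the plan. One that appears only under planned more likely reflects code that was merged but never applied.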

Step 2 — The Drift Detector Script

python

# drift_detector.py
import subprocess
import json
import os
import anthropic
import requests
from datetime import datetime
 
def run_terraform_plan(working_dir: str) -> tuple[int, str]:
    """Run terraform plan and return exit code + JSON output"""
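    # Init first; report a failed init as a plan error (exit code 1)
    init = subprocess.run(["terraform", "init", "-input=false"],
                          cwd=working_dir, capture_output=True, text=True)
    if init.returncode != 0:
        return 1, init.stderr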
    
    # Run plan
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-out=plan.tfplan", "-input=false"],
        cwd=working_dir,
        capture_output=True,
        text=True
    )
    
    exit_code = result.returncode
    
    if exit_code == 1:
        return 1, result.stderr
    
    if exit_code == 2:
        # Get JSON output
        json_result = subprocess.run(
            ["terraform", "show", "-json", "plan.tfplan"],
            cwd=working_dir,
            capture_output=True,
            text=True
        )
        return 2, json_result.stdout
    
    return 0, ""
 
 
def extract_changes(plan_json: str) -> dict:
    """Extract meaningful changes from terraform plan JSON"""
    plan = json.loads(plan_json)
    
    changes = {
        "resource_changes": [],
        "summary": {
            "create": 0,
            "update": 0,
            "delete": 0,
            "replace": 0
        }
    }
    
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        
        if "no-op" in actions:
            continue
            
        resource_info = {
            "address": change.get("address"),
            "type": change.get("type"),
            "name": change.get("name"),
            "actions": actions,
            "before": change.get("change", {}).get("before"),
            "after": change.get("change", {}).get("after")
        }
        
        changes["resource_changes"].append(resource_info)
        
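        # Terraform reports a replacement as ["delete", "create"] or
        # ["create", "delete"]; there is no literal "replace" action
        if "create" in actions and "delete" in actions:
            changes["summary"]["replace"] += 1
        else:
            for action in actions:
                if action in changes["summary"]:
                    changes["summary"][action] += 1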
    
    return changes
 
 
def analyze_drift_with_ai(changes: dict, environment: str) -> str:
    """Use Claude to analyze drift and generate a human-readable report"""
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    
    changes_text = json.dumps(changes, indent=2)
    
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""You are a DevOps engineer reviewing infrastructure drift in a {environment} environment.
 
Analyze these Terraform plan changes and provide:
1. A 2-sentence summary of what drifted
2. Risk level: LOW / MEDIUM / HIGH (with reason)
3. Most likely cause of the drift
4. Recommended immediate action
 
Keep the response concise and actionable. Format for Slack (use *bold* and bullet points).
 
Changes detected:
{changes_text}"""
            }
        ]
    )
    
    return message.content[0].text
 
 
def send_slack_alert(changes: dict, ai_analysis: str, environment: str, repo: str):
    """Send formatted Slack alert"""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    
    summary = changes["summary"]
    resource_list = "\n".join([
        f"• `{r['address']}` — {', '.join(r['actions'])}"
        for r in changes["resource_changes"][:10]  # Show first 10
    ])
    
    if len(changes["resource_changes"]) > 10:
        resource_list += f"\n_...and {len(changes['resource_changes']) - 10} more_"
    
    blocks = [
        {
            "type": "header",
            "text": {"type": "plain_text", "text": f"⚠️ Terraform Drift Detected — {environment.upper()}"}
        },
        {
            "type": "section",
            "fields": [
                {"type": "mrkdwn", "text": f"*Environment:*\n{environment}"},
                {"type": "mrkdwn", "text": f"*Repository:*\n{repo}"},
                {"type": "mrkdwn", "text": f"*Detected at:*\n{datetime.now().astimezone().strftime('%Y-%m-%d %H:%M %Z')}"},
                {"type": "mrkdwn", "text": f"*Changes:*\n+{summary['create']} create  ~{summary['update']} update  -{summary['delete']} delete  ±{summary['replace']} replace"},
            ]
        },
        {
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*Drifted Resources:*\n{resource_list}"}
        },
        {
            "type": "divider"
        },
        {
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*AI Analysis:*\n{ai_analysis}"}
        },
        {
            "type": "actions",
            "elements": [
                {
                    "type": "button",
                    "text": {"type": "plain_text", "text": "View Plan Details"},
                    "url": f"https://github.com/{repo}/actions"
                }
            ]
        }
    ]
    
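    # Fail loudly if Slack rejects the payload or the request hangs
    response = requests.post(webhook_url, json={"blocks": blocks}, timeout=10)
    response.raise_for_status()
    print("Slack alert sent")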
 
 
def main():
    environment = os.environ.get("ENVIRONMENT", "production")
    terraform_dir = os.environ.get("TERRAFORM_DIR", ".")
    repo = os.environ.get("GITHUB_REPOSITORY", "myorg/infra")
    
    print(f"Checking drift in {environment}...")
    
    exit_code, output = run_terraform_plan(terraform_dir)
    
    if exit_code == 0:
        print("No drift detected. Infrastructure matches Terraform state.")
        return
    
    if exit_code == 1:
        # Exit non-zero so a failed plan fails the CI job instead of passing silently
        raise SystemExit(f"Error running terraform plan: {output}")
    
    # Drift detected!
    print("Drift detected! Analyzing...")
    changes = extract_changes(output)
    print(f"Found {len(changes['resource_changes'])} changed resources")
    
    ai_analysis = analyze_drift_with_ai(changes, environment)
    send_slack_alert(changes, ai_analysis, environment, repo)
    
    # Exit with code 1 to fail the CI job if desired
    # exit(1)
 
 
if __name__ == "__main__":
    main()
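
One caution before sending plan data to an external API: terraform show -json prints sensitive values in plain text, and extract_changes forwards before and after unchanged. The plan JSON also carries before_sensitive and after_sensitive masks that mirror the value's structure and mark sensitive leaves with true. The helper below is a minimal sketch of applying such a mask. The name redact and the placeholder string are assumptions, not part of the original script.

```python
def redact(value, mask, placeholder="(sensitive)"):
    """Replace every leaf that the sensitivity mask marks as true."""
    if mask is True:
        # The whole value (scalar, list or object) is sensitive
        return placeholder
    if isinstance(value, dict) and isinstance(mask, dict):
        # Keys missing from the mask are treated as non-sensitive
        return {k: redact(v, mask.get(k, False), placeholder) for k, v in value.items()}
    if isinstance(value, list) and isinstance(mask, list):
        return [
            redact(v, mask[i] if i < len(mask) else False, placeholder)
            for i, v in enumerate(value)
        ]
    return value


# Hypothetical resource attributes and mask for illustration
after = {"username": "app", "password": "hunter2", "tags": [{"secret": "x"}, {"env": "prod"}]}
after_sensitive = {"password": True, "tags": [{"secret": True}, {}]}

print(redact(after, after_sensitive))
```

Inside extract_changes this could be applied as redact(change["change"].get("after"), change["change"].get("after_sensitive", False)), and likewise for before, so that only masked data reaches the prompt and the Slack message.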

Step 3 — GitHub Actions Scheduled Job

yaml

# .github/workflows/drift-detection.yml
name: Terraform Drift Detection
 
on:
  schedule:
    - cron: '0 */6 * * *'   # Every 6 hours
  workflow_dispatch:          # Manual trigger
 
jobs:
  detect-drift:
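    runs-on: ubuntu-latest
    permissions:
      id-token: write   # needed to assume the AWS role via OIDC
      contents: read
    strategy:
      fail-fast: false  # a failure in one environment should not cancel the other
      matrix:
        environment: [staging, production]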
    
    steps:
      - uses: actions/checkout@v4
 
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/terraform-readonly
          aws-region: ap-south-1
 
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.8.0
          terraform_wrapper: false   # call the binary directly so the script sees raw output and exit codes
 
      - name: Run Drift Detection
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          ENVIRONMENT: ${{ matrix.environment }}
          TERRAFORM_DIR: environments/${{ matrix.environment }}
          GITHUB_REPOSITORY: ${{ github.repository }}
          TF_BACKEND_CONFIG: ${{ secrets[format('{0}_TF_BACKEND', matrix.environment)] }}
        run: |
          pip install anthropic requests
          python scripts/drift_detector.py
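
The Slack button in the script links to the repository's Actions page, so it helps if the run itself shows what drifted. GitHub Actions exposes a GITHUB_STEP_SUMMARY file path to which a step can append Markdown. The function below is a sketch of doing that from the detector; write_job_summary is a hypothetical helper, and it expects the dictionary produced by extract_changes.

```python
import os


def write_job_summary(changes: dict, path=None) -> str:
    """Append a Markdown table of drifted resources to the job summary file."""
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    lines = ["## Terraform drift", "", "| Resource | Actions |", "| --- | --- |"]
    for r in changes["resource_changes"]:
        lines.append(f"| `{r['address']}` | {', '.join(r['actions'])} |")
    text = "\n".join(lines) + "\n"
    if path:
        # Outside GitHub Actions the variable is unset, so nothing is written
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(text)
    return text


# Hypothetical input for illustration
example = {"resource_changes": [{"address": "aws_security_group.api_sg", "actions": ["update"]}]}
print(write_job_summary(example))
```

Calling it from main() just before send_slack_alert would put the table on the run page that the button opens.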

Example Slack Alert

⚠️ Terraform Drift Detected — PRODUCTION

Environment: production          Repository: myorg/infra
Detected at: 2026-05-12 06:00 UTC    Changes: ~3 update

Drifted Resources:
• aws_security_group.api_sg — update
• aws_autoscaling_group.api_asg — update  
• aws_iam_role_policy.lambda_policy — update

AI Analysis:
*Security group and autoscaling configuration changed outside of Terraform.*
Risk level: *HIGH* — security group changes in production may open unintended access.

Likely cause: Manual changes made via AWS Console during an incident response.
Recommended action: Run `terraform plan` locally to review exact changes, then either
`terraform apply` to revert or update the Terraform code to match the intended state.

[View Plan Details]
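
In an alert like this, the AI Analysis part is a Slack section block, and Slack limits a section's text field to 3,000 characters. A long model response would make the webhook call fail. The sketch below trims the text to fit; fit_slack_section and the truncation marker are illustrative names, not part of the original script.

```python
SLACK_SECTION_LIMIT = 3000  # maximum characters in a section block's text field


def fit_slack_section(text: str, limit: int = SLACK_SECTION_LIMIT) -> str:
    """Return text unchanged if it fits, otherwise cut it and mark the cut."""
    if len(text) <= limit:
        return text
    suffix = "\n_(truncated)_"
    return text[: limit - len(suffix)] + suffix


print(len(fit_slack_section("x" * 5000)))
```

In send_slack_alert, the value passed to the section would then be fit_slack_section(f"*AI Analysis:*\n{ai_analysis}").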

Making It Smarter

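Add auto-remediation for low-risk drift. Applying the saved plan reverts the live resources to what the code declares, so the job needs a role with write access rather than the terraform-readonly role used in the workflow: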

python

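def auto_remediate(changes: dict, risk_level: str, terraform_dir: str):
    """Apply the saved plan only for small, update-only, low-risk drift"""
    summary = changes["summary"]
    update_only = not (summary["create"] or summary["delete"] or summary["replace"])
    if risk_level == "LOW" and update_only and summary["update"] <= 2:
        print("Low-risk drift. Auto-applying...")
        # check=True raises CalledProcessError if the apply fails
        subprocess.run(
            ["terraform", "apply", "-auto-approve", "plan.tfplan"],
            cwd=terraform_dir,
            check=True
        )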
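
auto_remediate takes a risk_level argument, but analyze_drift_with_ai returns free text. A minimal sketch of bridging the two follows, assuming the model keeps to the "Risk level: LOW / MEDIUM / HIGH" wording requested in the prompt. extract_risk_level is a hypothetical helper, and it deliberately falls back to HIGH so that an unparseable answer never triggers an automatic apply.

```python
import re


def extract_risk_level(ai_analysis: str) -> str:
    """Return LOW, MEDIUM or HIGH from the analysis text, defaulting to HIGH."""
    match = re.search(r"risk level\W*(LOW|MEDIUM|HIGH)\b", ai_analysis, re.IGNORECASE)
    return match.group(1).upper() if match else "HIGH"


print(extract_risk_level("Risk level: *LOW* (tag-only change)"))
```

Because model output is not guaranteed to follow the format, treat the parsed level as a hint and keep the structural checks in auto_remediate as the real gate.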

Track drift history in S3:

python

def save_drift_report(changes: dict, environment: str):
    import boto3
    s3 = boto3.client('s3')
    key = f"drift-reports/{environment}/{datetime.now().isoformat()}.json"
    s3.put_object(
        Bucket="my-drift-reports",
        Key=key,
        Body=json.dumps(changes)
    )
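
A job that runs every 6 hours will repeat the same alert until someone resolves the drift. One possible refinement, not part of the original system, is to store a fingerprint of each report and alert only when it changes. The sketch below hashes the drifted addresses and their actions; drift_fingerprint is a hypothetical helper, and where the last fingerprint is kept (for example next to the S3 reports) is left open.

```python
import hashlib
import json


def drift_fingerprint(changes: dict) -> str:
    """Stable hash of which resources drifted and how, independent of ordering."""
    pairs = sorted(
        (r["address"], tuple(r["actions"])) for r in changes["resource_changes"]
    )
    payload = json.dumps(pairs)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical reports for illustration: same drift, different order
a = {"resource_changes": [
    {"address": "aws_security_group.api_sg", "actions": ["update"]},
    {"address": "aws_iam_role_policy.lambda_policy", "actions": ["update"]},
]}
b = {"resource_changes": list(reversed(a["resource_changes"]))}

print(drift_fingerprint(a) == drift_fingerprint(b))
```

The fingerprint ignores attribute values, so a second manual edit to an already drifted resource would not raise a new alert. Including the redacted before and after values in the hash would tighten that at the cost of more frequent alerts.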

Terraform drift is inevitable — people make emergency changes, auto-scaling updates launch configs, cloud providers modify defaults. The goal isn't zero drift; it's fast detection and clear resolution paths.

This system gives you both at low cost (a scheduled CI job plus a few API calls per detected drift), with AI-generated explanations that even non-Terraform engineers can understand.

For Terraform labs and certification prep, KodeKloud has hands-on Terraform courses from beginner to associate-level certification.
