
How to Use AI Agents to Automate Terraform Infrastructure Changes in 2026

AI agents can now plan, review, and apply Terraform changes from natural language. Here's how agentic AI is transforming infrastructure-as-code workflows.

DevOpsBoys · Mar 22, 2026 · 7 min read

Imagine typing "add a Redis cache to the staging environment with 2GB memory and private subnet access" and having an AI agent write the Terraform code, run the plan, get approval, and apply it — all while following your organization's security policies and naming conventions.

This isn't a demo. It's happening in production at organizations using agentic AI for infrastructure management. And it's changing how DevOps teams think about Terraform.

What Agentic Terraform Looks Like

Traditional Terraform workflow:

Engineer writes HCL → terraform plan → review → terraform apply → verify

Agentic Terraform workflow:

Engineer describes intent → AI agent writes HCL → agent runs plan →
agent checks policies → human approves → agent applies → agent verifies

The key difference: the engineer describes what they want, not how to build it. The AI agent handles the translation from intent to infrastructure code.

The Tools Making This Possible

1. Claude/GPT with Tool Use + Terraform CLI

The simplest approach: give an LLM access to the Terraform CLI and your codebase.

python
import anthropic
 
client = anthropic.Anthropic()
 
tools = [
    {
        "name": "read_terraform_file",
        "description": "Read a Terraform file from the codebase",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string", "description": "Path to .tf file"}
            },
            "required": ["file_path"]
        }
    },
    {
        "name": "write_terraform_file",
        "description": "Write or update a Terraform file",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["file_path", "content"]
        }
    },
    {
        "name": "terraform_plan",
        "description": "Run terraform plan and return the output",
        "input_schema": {
            "type": "object",
            "properties": {
                "working_dir": {"type": "string"}
            },
            "required": ["working_dir"]
        }
    },
    {
        "name": "terraform_apply",
        "description": "Run terraform apply (requires human approval)",
        "input_schema": {
            "type": "object",
            "properties": {
                "working_dir": {"type": "string"},
                "plan_file": {"type": "string"}
            },
            "required": ["working_dir", "plan_file"]
        }
    }
]
 
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=tools,
    messages=[{
        "role": "user",
        "content": "Add a Redis ElastiCache cluster to staging. 2GB, cache.r7g.large, private subnet, encryption at rest enabled."
    }],
    system="You are a Terraform infrastructure agent. You have access to read and write .tf files and run terraform commands. Follow the existing code style and naming conventions in the codebase."
)
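
When the model responds, its tool calls arrive as `tool_use` blocks that your code must execute and feed back. A minimal local dispatcher for the tools defined above might look like this; the handler behavior (return strings, plan flags) is a sketch, not a fixed API, and `terraform_apply` is deliberately left out so applies always pass through the approval flow:

```python
import subprocess
from pathlib import Path

# Local implementations backing the tool definitions above.
# Return formats are illustrative choices, not a fixed API.
def read_terraform_file(file_path):
    return Path(file_path).read_text()

def write_terraform_file(file_path, content):
    Path(file_path).write_text(content)
    return f"wrote {file_path}"

def terraform_plan(working_dir):
    result = subprocess.run(
        ["terraform", "plan", "-no-color"],
        cwd=working_dir, capture_output=True, text=True,
    )
    return result.stdout

TOOL_HANDLERS = {
    "read_terraform_file": lambda args: read_terraform_file(**args),
    "write_terraform_file": lambda args: write_terraform_file(**args),
    "terraform_plan": lambda args: terraform_plan(**args),
}

def dispatch_tool_call(name, args):
    """Route a tool_use block from the model to its local handler."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"unknown tool: {name}"
    return handler(args)
```

Each handler's return value goes back to the model as a `tool_result` message, and the loop repeats until the model stops requesting tools.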

2. Atlantis + AI Review

Atlantis already automates terraform plan on pull requests. Adding an AI review layer:

yaml
# atlantis.yaml with AI review
version: 3
projects:
  - name: staging
    dir: environments/staging
    workflow: ai-reviewed
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]
 
workflows:
  ai-reviewed:
    plan:
      steps:
        - init
        - plan
        - run: |
            # Send plan output to AI for review
            terraform show -json $PLANFILE | \
            curl -X POST https://your-api.com/review-plan \
              -H "Content-Type: application/json" \
              -d @- | \
            tee plan-review.md
        - run: |
            # Post AI review as PR comment
            gh pr comment $PULL_NUM --body-file plan-review.md

The AI reviews the plan for:

  • Security issues (open security groups, unencrypted resources)
  • Cost implications (expensive instance types, over-provisioned resources)
  • Naming convention violations
  • Missing tags
  • Blast radius concerns (too many resources changing at once)
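
A simple way to implement that review endpoint is to condense the JSON plan into a prompt before sending it to the model. A sketch, where the field names (`resource_changes`, `change.actions`, `address`) follow Terraform's JSON plan format but the prompt wording is just one option:

```python
import json

def build_review_prompt(plan_json: dict) -> str:
    """Condense a Terraform JSON plan into a review prompt covering the
    checklist above: security, cost, naming, tags, and blast radius."""
    changes = plan_json.get("resource_changes", [])
    lines = [
        "Review this Terraform plan for: security issues, cost implications,",
        "naming convention violations, missing tags, and blast radius.",
        "",
    ]
    for rc in changes:
        actions = "/".join(rc.get("change", {}).get("actions", []))
        lines.append(f"- {actions}: {rc.get('address', '?')}")
    lines.append("")
    lines.append(f"Total resources changing: {len(changes)}")
    return "\n".join(lines)
```

Keeping the prompt to one line per resource also keeps large plans within the model's context window.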

3. Spacelift AI Assist

Spacelift's built-in AI features can:

  • Generate Terraform from natural language descriptions
  • Review plans and flag risks
  • Suggest optimizations
  • Auto-remediate drift

4. env0 AI Terraform Generator

env0 offers AI-powered Terraform generation integrated into their IaC management platform, with policy enforcement and cost estimation built in.

Building Your Own Terraform AI Agent

Here's a practical architecture for a Terraform AI agent:

Architecture

┌─────────────────────────────────────────────┐
│                 Slack / Chat                  │
│     "Add Redis to staging, 2GB, encrypted"   │
└──────────────────┬──────────────────────────┘
                   │
┌──────────────────▼──────────────────────────┐
│              Agent Orchestrator              │
│  1. Parse intent                            │
│  2. Read existing Terraform                 │
│  3. Generate new HCL                        │
│  4. Run terraform plan                      │
│  5. Check OPA policies                      │
│  6. Request human approval                  │
│  7. Apply if approved                       │
│  8. Verify deployment                       │
└──────────────────┬──────────────────────────┘
                   │
┌─────────┬────────┼────────┬─────────────────┐
│ Codebase│ TF CLI │ OPA    │ Cloud APIs      │
│ (Git)   │        │Policies│ (AWS/GCP/Azure) │
└─────────┴────────┴────────┴─────────────────┘

The Agent Loop

python
import subprocess
import json
 
class TerraformAgent:
    def __init__(self, workspace_dir, llm_client):
        self.workspace_dir = workspace_dir
        self.llm = llm_client
 
    def handle_request(self, user_request):
        # Step 1: Understand existing infrastructure
        existing_tf = self.read_existing_terraform()
 
        # Step 2: Generate Terraform code
        new_code = self.generate_terraform(user_request, existing_tf)
 
        # Step 3: Write to file
        self.write_terraform(new_code)
 
        # Step 4: Run terraform plan
        plan_output = self.terraform_plan()
 
        # Step 5: AI reviews the plan
        review = self.review_plan(plan_output, user_request)
 
        # Step 6: Check policies
        policy_result = self.check_opa_policies(plan_output)
 
        # Step 7: Present to human for approval
        approval = self.request_approval(
            plan=plan_output,
            review=review,
            policy=policy_result
        )
 
        if approval:
            # Step 8: Apply
            result = self.terraform_apply()
            # Step 9: Verify
            self.verify_deployment()
            return f"Infrastructure updated: {result}"
        else:
            self.rollback_code_changes()
            return "Changes cancelled by user"
 
    def terraform_plan(self):
        result = subprocess.run(
            ["terraform", "plan", "-out=tfplan", "-no-color"],
            cwd=self.workspace_dir,
            capture_output=True, text=True
        )
        return result.stdout
 
    def check_opa_policies(self, plan_output):
        # Convert the saved plan to JSON and write it where OPA can read it
        show = subprocess.run(
            ["terraform", "show", "-json", "tfplan"],
            cwd=self.workspace_dir,
            capture_output=True, text=True
        )
        with open(f"{self.workspace_dir}/tfplan.json", "w") as f:
            f.write(show.stdout)
        # Evaluate the deny rules against the plan JSON
        result = subprocess.run(
            ["opa", "eval", "-d", "policies/", "-i",
             f"{self.workspace_dir}/tfplan.json", "data.terraform.deny"],
            capture_output=True, text=True
        )
        return json.loads(result.stdout)

OPA Guardrail Policies

The agent needs guardrails. Use OPA policies to prevent dangerous changes:

rego
# policies/terraform.rego
package terraform
 
import rego.v1
 
# Deny public S3 buckets
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    resource.change.after.acl == "public-read"
    msg := sprintf("Public S3 bucket not allowed: %s", [resource.address])
}
 
# Deny overly permissive security groups
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_security_group_rule"
    resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
    resource.change.after.type == "ingress"
    msg := sprintf("Open ingress rule not allowed: %s", [resource.address])
}
 
# Deny expensive instance types without approval tag
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_instance"
    expensive := {"x1", "p4", "p5", "dl1", "trn1"}
    instance_family := split(resource.change.after.instance_type, ".")[0]
    instance_family in expensive
    not resource.change.after.tags.cost_approved
    msg := sprintf("Expensive instance %s requires cost_approved tag: %s",
                   [resource.change.after.instance_type, resource.address])
}
 
# Limit blast radius: max 10 resources per apply
deny contains msg if {
    changes := [r | r := input.resource_changes[_]; r.change.actions != ["no-op"]]
    count(changes) > 10
    msg := sprintf("Too many changes (%d). Split into smaller applies.", [count(changes)])
}
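
Back in Python, the agent has to turn `opa eval` output into a pass/fail decision. `opa eval` emits JSON by default, wrapping results in a `result` → `expressions` → `value` structure; a small parser might be:

```python
import json

def parse_opa_denials(opa_output: str) -> list:
    """Extract deny messages from `opa eval` JSON output.

    The default output wraps values as
    {"result": [{"expressions": [{"value": [...]}]}]};
    an empty list means the plan passed every policy.
    """
    doc = json.loads(opa_output)
    denials = []
    for result in doc.get("result", []):
        for expr in result.get("expressions", []):
            denials.extend(expr.get("value") or [])
    return denials
```

The returned list can be posted verbatim to the approval request so the human sees exactly which policies fired.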

Safety Guardrails for AI Terraform

This is infrastructure. Mistakes can take down production. Essential guardrails:

1. Never Auto-Apply to Production

python
def request_approval(self, plan, review, policy):
    if self.environment == "production":
        # Always require human approval for production
        return self.wait_for_human_approval(plan, review, policy)
    elif self.environment == "staging":
        # Auto-apply if policies pass and no destructive changes
        if not policy["violations"] and not self.has_destructive_changes(plan):
            return True
        return self.wait_for_human_approval(plan, review, policy)
    else:  # dev
        # Auto-apply if policies pass
        return not policy["violations"]

2. Destructive Change Detection

python
def has_destructive_changes(self, plan_json):
    """Flag any destroy or replace actions.

    In Terraform's JSON plan, a replacement appears as the action pair
    ["delete", "create"] (or ["create", "delete"]), so checking for
    "delete" covers both destroys and replacements.
    """
    for resource in plan_json.get("resource_changes", []):
        actions = resource.get("change", {}).get("actions", [])
        if "delete" in actions:
            return True
    return False

3. Cost Estimation Before Apply

bash
# Use Infracost to estimate cost impact
infracost diff --path . --format json | jq '.totalMonthlyCost'
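
That number is only useful if something acts on it. A small gate that blocks applies above a monthly-cost threshold could look like this; it reads the `diffTotalMonthlyCost` field from Infracost's JSON output, and the $500 limit is an arbitrary example:

```python
import json

def cost_gate(infracost_json: str, max_monthly_delta: float = 500.0) -> bool:
    """Return True when the estimated monthly cost increase is acceptable.

    diffTotalMonthlyCost is a string dollar amount in Infracost's JSON
    output, or null when nothing changed. The $500 threshold is an
    arbitrary example value.
    """
    report = json.loads(infracost_json)
    delta = float(report.get("diffTotalMonthlyCost") or 0)
    return delta <= max_monthly_delta
```

Wire the gate into the approval step so a failed check downgrades an auto-apply into a human-approval request rather than silently proceeding.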

4. Automatic Rollback

python
def apply_with_rollback(self):
    # Snapshot current state so it can be inspected or restored later
    state = subprocess.run(
        ["terraform", "state", "pull"],
        cwd=self.workspace_dir,
        capture_output=True, text=True
    )
    with open(f"{self.workspace_dir}/state-backup.json", "w") as f:
        f.write(state.stdout)
 
    result = subprocess.run(
        ["terraform", "apply", "tfplan"],
        cwd=self.workspace_dir,
        capture_output=True, text=True
    )
 
    if result.returncode != 0:
        # Apply failed — state is unchanged, report error
        return {"success": False, "error": result.stderr}
 
    # Verify deployment
    if not self.verify_deployment():
        # Deployment verification failed — destroy new resources
        self.terraform_destroy_new_resources()
        return {"success": False, "error": "Deployment verification failed"}
 
    return {"success": True}
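
`verify_deployment` is referenced above but never defined. One tool-agnostic implementation (an assumption, not the only option) is a refresh-only plan: `terraform plan -refresh-only -detailed-exitcode` exits 0 when real infrastructure matches state, 2 when drift is detected, and 1 on error:

```python
import subprocess

def verify_deployment(workspace_dir: str) -> bool:
    """Return True if real infrastructure matches Terraform state.

    Uses -detailed-exitcode: 0 = in sync, 2 = drift detected, 1 = error.
    """
    result = subprocess.run(
        ["terraform", "plan", "-refresh-only", "-detailed-exitcode", "-no-color"],
        cwd=workspace_dir,
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

For stronger guarantees, layer service-level checks on top, such as connecting to the new Redis endpoint before declaring success.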

What This Means for DevOps Engineers

AI agents writing Terraform doesn't eliminate the DevOps engineer. It changes the job:

| Before | After |
| --- | --- |
| Write HCL for every change | Define policies and guardrails |
| Review every PR manually | Review AI-generated plans for edge cases |
| Debug syntax errors | Design infrastructure patterns the AI follows |
| Copy-paste modules | Build reusable modules the AI composes |
| Respond to infra tickets | Set up self-service with AI agent |

The engineer becomes the architect and guardrail designer, not the code writer.

Getting Started Today

  1. Start with plan review — use AI to review terraform plan output in PRs. No risk, immediate value.

  2. Add policy checks — define OPA policies for your security and cost requirements.

  3. Enable generation for dev — let the AI generate Terraform for development environments. Low risk, fast iteration.

  4. Expand to staging — add human approval gates and blast radius limits.

  5. Production last — only after months of proven reliability in lower environments.

Wrapping Up

AI agents writing Terraform isn't science fiction — it's happening now. The combination of LLMs with tool use, policy engines like OPA, and mature Terraform tooling makes it practical and safe.

The key is guardrails. Never give an AI agent unrestricted access to production infrastructure. Always have policy checks, human approval gates, and automatic rollback.

Start with AI-powered plan review in your PRs. That alone will catch security issues and cost surprises that humans miss.

Want to master Terraform, IaC best practices, and infrastructure automation? The KodeKloud Terraform course covers everything from basics to advanced patterns with hands-on labs. For cloud infrastructure to practice Terraform, DigitalOcean has a great Terraform provider and predictable pricing.
