How to Use AI Agents to Automate Terraform Infrastructure Changes in 2026
AI agents can now plan, review, and apply Terraform changes from natural language. Here's how agentic AI is transforming infrastructure-as-code workflows.
Imagine typing "add a Redis cache to the staging environment with 2GB memory and private subnet access" and having an AI agent write the Terraform code, run the plan, get approval, and apply it — all while following your organization's security policies and naming conventions.
This isn't a demo. It's happening in production at organizations using agentic AI for infrastructure management. And it's changing how DevOps teams think about Terraform.
What Agentic Terraform Looks Like
Traditional Terraform workflow:
Engineer writes HCL → terraform plan → review → terraform apply → verify
Agentic Terraform workflow:
Engineer describes intent → AI agent writes HCL → agent runs plan →
agent checks policies → human approves → agent applies → agent verifies
The key difference: the engineer describes what they want, not how to build it. The AI agent handles the translation from intent to infrastructure code.
The Tools Making This Possible
1. Claude/GPT with Tool Use + Terraform CLI
The simplest approach: give an LLM access to the Terraform CLI and your codebase.
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "read_terraform_file",
        "description": "Read a Terraform file from the codebase",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string", "description": "Path to .tf file"}
            },
            "required": ["file_path"],
        },
    },
    {
        "name": "write_terraform_file",
        "description": "Write or update a Terraform file",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["file_path", "content"],
        },
    },
    {
        "name": "terraform_plan",
        "description": "Run terraform plan and return the output",
        "input_schema": {
            "type": "object",
            "properties": {
                "working_dir": {"type": "string"}
            },
            "required": ["working_dir"],
        },
    },
    {
        # The description only informs the model; your code must enforce the approval
        "name": "terraform_apply",
        "description": "Run terraform apply (requires human approval)",
        "input_schema": {
            "type": "object",
            "properties": {
                "working_dir": {"type": "string"},
                "plan_file": {"type": "string"},
            },
            "required": ["working_dir", "plan_file"],
        },
    },
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=tools,
    messages=[{
        "role": "user",
        "content": "Add a Redis ElastiCache cluster to staging. 2GB, cache.r7g.large, private subnet, encryption at rest enabled.",
    }],
    system="You are a Terraform infrastructure agent. You have access to read and write .tf files and run terraform commands. Follow the existing code style and naming conventions in the codebase.",
)
# response.content may contain tool_use blocks: your code runs each requested
# tool and sends the results back in a follow-up message
```

2. Atlantis + AI Review
Atlantis already automates terraform plan on pull requests. Adding an AI review layer:
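```yaml
# atlantis.yaml with AI review
version: 3
projects:
  - name: staging
    dir: environments/staging
    workflow: ai-reviewed
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]
workflows:
  ai-reviewed:
    plan:
      steps:
        - init
        - plan
        - run: |
            # Send plan output to AI for review
            terraform show -json "$PLANFILE" | \
              curl -X POST https://your-api.com/review-plan \
                -H "Content-Type: application/json" \
                -d @- | \
              tee plan-review.md
        - run: |
            # Post AI review as PR comment
            gh pr comment "$PULL_NUM" --body-file plan-review.md
```

Defining a workflow in the repo-level atlantis.yaml only works if the Atlantis server's repo config allows it (allow_custom_workflows: true, with workflow listed in allowed_overrides). The gh CLI also needs a token (for example GH_TOKEN) in the Atlantis environment.

The AI reviews the plan for: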
- Security issues (open security groups, unencrypted resources)
- Cost implications (expensive instance types, over-provisioned resources)
- Naming convention violations
- Missing tags
- Blast radius concerns (too many resources changing at once)
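Before the plan reaches a model, a deterministic pass can shrink it and catch the mechanical issues, such as missing tags, without spending tokens. Below is a minimal sketch that condenses terraform show -json output. The function name and the required tag names are placeholders, and it checks tags only, so switch to tags_all if you rely on the AWS provider's default_tags:

```python
def summarize_plan(plan: dict, required_tags=("owner", "environment")) -> dict:
    """Condense `terraform show -json` output into changes plus rule-based findings."""
    changes, findings = [], []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Skip data source reads and resources that are not changing
        if rc.get("mode") == "data" or actions == ["no-op"]:
            continue
        changes.append(f"{'/'.join(actions)} {rc['address']}")

        # "after" is null for deletes, so fall back to an empty dict
        after = rc["change"].get("after") or {}
        # Only check resources that expose a tags attribute at all
        if "tags" in after:
            present = after.get("tags") or {}
            missing = [t for t in required_tags if t not in present]
            if missing:
                findings.append(f"{rc['address']}: missing tags {missing}")
    return {"changes": changes, "findings": findings}
```

The review endpoint can then send changes and findings to the LLM instead of the full plan JSON, which keeps the prompt small and leaves the model to judge the things simple rules cannot.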
3. Spacelift AI Assist
Spacelift's built-in AI features can:
- Generate Terraform from natural language descriptions
- Review plans and flag risks
- Suggest optimizations
- Auto-remediate drift
4. env0 AI Terraform Generator
env0 offers AI-powered Terraform generation integrated into their IaC management platform, with policy enforcement and cost estimation built in.
Building Your Own Terraform AI Agent
Here's a practical architecture for a Terraform AI agent:
Architecture
```
┌─────────────────────────────────────────────┐
│ Slack / Chat                                │
│ "Add Redis to staging, 2GB, encrypted"      │
└──────────────────┬──────────────────────────┘
                   │
┌──────────────────▼──────────────────────────┐
│ Agent Orchestrator                          │
│ 1. Parse intent                             │
│ 2. Read existing Terraform                  │
│ 3. Generate new HCL                         │
│ 4. Run terraform plan                       │
│ 5. Check OPA policies                       │
│ 6. Request human approval                   │
│ 7. Apply if approved                        │
│ 8. Verify deployment                        │
└──────────────────┬──────────────────────────┘
                   │
┌─────────┬────────┼────────┬─────────────────┐
│ Codebase│ TF CLI │  OPA   │ Cloud APIs      │
│ (Git)   │        │Policies│ (AWS/GCP/Azure) │
└─────────┴────────┴────────┴─────────────────┘
```
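The orchestrator reaches the Codebase and TF CLI boxes through tool calls: the model asks for read_terraform_file or terraform_plan, and your code runs it and returns a tool_result block. Here is a minimal sketch of that host side for the four tools defined in the Claude example above, assuming a single workspace directory. The helper names and the ".tf files only, stay inside the workspace" rules are illustrative choices, not part of any SDK:

```python
import subprocess
from pathlib import Path


def inside_workspace(workspace: Path, relative: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside the workspace."""
    root = workspace.resolve()
    target = (root / relative).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes workspace: {relative}")
    return target


def run_tool(name: str, tool_input: dict, workspace: Path) -> str:
    """Execute one tool call requested by the model and return its text output."""
    if name in ("read_terraform_file", "write_terraform_file"):
        target = inside_workspace(workspace, tool_input["file_path"])
        if target.suffix != ".tf":
            raise ValueError(f"only .tf files are allowed: {tool_input['file_path']}")
        if name == "read_terraform_file":
            return target.read_text()
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(tool_input["content"])
        return f"wrote {tool_input['file_path']}"
    if name == "terraform_plan":
        cwd = inside_workspace(workspace, tool_input["working_dir"])
        result = subprocess.run(
            ["terraform", "plan", "-out=tfplan", "-no-color", "-input=false"],
            cwd=cwd, capture_output=True, text=True,
        )
        return result.stdout + result.stderr
    if name == "terraform_apply":
        # Deliberately not executed here: applies go through the human approval gate
        return "Apply requested. Waiting for human approval of the saved plan."
    raise ValueError(f"unknown tool: {name}")


def tool_results(content_blocks, workspace: Path) -> list:
    """Turn the model's tool_use blocks into tool_result blocks for the next user turn."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # skip text blocks
        try:
            output, is_error = run_tool(block.name, block.input, workspace), False
        except (ValueError, OSError, KeyError) as exc:
            # Report the problem to the model instead of crashing the loop
            output, is_error = f"error: {exc}", True
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": output, "is_error": is_error})
    return results


# The surrounding loop (needs an API key, so shown as comments):
# messages = [{"role": "user", "content": request}]
# while True:
#     response = client.messages.create(model=..., max_tokens=4096, tools=tools,
#                                       system=..., messages=messages)
#     messages.append({"role": "assistant", "content": response.content})
#     if response.stop_reason != "tool_use":
#         break
#     messages.append({"role": "user", "content": tool_results(response.content, workspace)})
```

Keeping terraform_apply out of the dispatcher means the model can ask for an apply but never trigger one; the approval gate stays in code you control.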
The Agent Loop
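```python
import json
import os
import subprocess


class TerraformAgent:
    # Helpers not shown here (read_existing_terraform, generate_terraform,
    # write_terraform, review_plan, terraform_apply, verify_deployment,
    # rollback_code_changes) are left to your implementation.

    def __init__(self, workspace_dir, llm_client, environment="dev"):
        self.workspace_dir = workspace_dir
        self.llm = llm_client
        # "dev", "staging", or "production"; drives the approval policy
        self.environment = environment

    def handle_request(self, user_request):
        # Step 1: Understand existing infrastructure
        existing_tf = self.read_existing_terraform()

        # Step 2: Generate Terraform code
        new_code = self.generate_terraform(user_request, existing_tf)

        # Step 3: Write to file
        self.write_terraform(new_code)

        # Step 4: Run terraform plan (saves the plan to ./tfplan)
        plan_output = self.terraform_plan()

        # Step 5: AI reviews the plan
        review = self.review_plan(plan_output, user_request)

        # Step 6: Check policies against the saved plan
        policy_result = self.check_opa_policies()

        # Step 7: Present to human for approval
        approval = self.request_approval(
            plan=plan_output,
            review=review,
            policy=policy_result,
        )

        if approval:
            # Step 8: Apply the saved plan
            result = self.terraform_apply()
            # Step 9: Verify
            self.verify_deployment()
            return f"Infrastructure updated: {result}"
        else:
            self.rollback_code_changes()
            return "Changes cancelled by user"

    def terraform_plan(self):
        result = subprocess.run(
            ["terraform", "plan", "-out=tfplan", "-no-color"],
            cwd=self.workspace_dir,
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            raise RuntimeError(f"terraform plan failed:\n{result.stderr}")
        return result.stdout

    def check_opa_policies(self):
        # Convert the saved binary plan to JSON and write it where OPA can read it
        show = subprocess.run(
            ["terraform", "show", "-json", "tfplan"],
            cwd=self.workspace_dir,
            capture_output=True, text=True, check=True,
        )
        plan_path = os.path.join(self.workspace_dir, "tfplan.json")
        with open(plan_path, "w") as f:
            f.write(show.stdout)

        # Evaluate every deny rule against the plan
        result = subprocess.run(
            ["opa", "eval", "--format", "json",
             "-d", "policies/", "-i", plan_path, "data.terraform.deny"],
            capture_output=True, text=True, check=True,
        )
        output = json.loads(result.stdout)
        # opa eval wraps values: {"result": [{"expressions": [{"value": [...]}]}]}
        results = output.get("result", [])
        violations = results[0]["expressions"][0]["value"] if results else []
        return {"violations": violations}
```

OPA Guardrail Policies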
The agent needs guardrails. Use OPA policies to prevent dangerous changes:
```rego
# policies/terraform.rego
package terraform

# Rego v1 syntax (the default in OPA 1.x); the import keeps it working on late 0.x releases
import rego.v1

# Deny public S3 buckets (acl on aws_s3_bucket is deprecated in AWS provider v4+,
# so also check the separate aws_s3_bucket_acl resource)
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type in {"aws_s3_bucket", "aws_s3_bucket_acl"}
    resource.change.after.acl in {"public-read", "public-read-write"}
    msg := sprintf("Public S3 bucket not allowed: %s", [resource.address])
}

# Deny overly permissive security groups
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_security_group_rule"
    resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
    resource.change.after.type == "ingress"
    msg := sprintf("Open ingress rule not allowed: %s", [resource.address])
}

# Deny expensive instance types without approval tag
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_instance"
    # The family is the part before the dot, e.g. "p4d" in "p4d.24xlarge"
    expensive := {"x1", "x1e", "p4d", "p4de", "p5", "dl1", "trn1"}
    instance_family := split(resource.change.after.instance_type, ".")[0]
    expensive[instance_family]
    not resource.change.after.tags.cost_approved
    msg := sprintf("Expensive instance %s requires cost_approved tag: %s",
        [resource.change.after.instance_type, resource.address])
}

# Limit blast radius: max 10 resource changes per apply
deny contains msg if {
    changes := [r | r := input.resource_changes[_]; r.change.actions != ["no-op"]]
    count(changes) > 10
    msg := sprintf("Too many changes (%d). Split into smaller applies.", [count(changes)])
}
```

Safety Guardrails for AI Terraform
This is infrastructure. Mistakes can take down production. Essential guardrails:
1. Never Auto-Apply to Production
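A human gate only protects you if what gets applied is exactly what the reviewer saw. Terraform already refuses to apply a saved plan once the state has changed underneath it, but it cannot tell whether tfplan itself was regenerated after someone approved it. One way to close that gap, sketched here with hypothetical helper names, is to fingerprint the plan file when approval is requested and check it again right before terraform apply tfplan:

```python
import hashlib
import hmac
from pathlib import Path


def plan_fingerprint(plan_path: Path) -> str:
    """SHA-256 of the saved plan file that the reviewer is shown."""
    return hashlib.sha256(plan_path.read_bytes()).hexdigest()


def approval_matches(plan_path: Path, approved_fingerprint: str) -> bool:
    """True only if the plan on disk is byte-for-byte the plan that was approved."""
    return hmac.compare_digest(plan_fingerprint(plan_path), approved_fingerprint)
```

Store the fingerprint alongside the approval record (the Slack message ID, for example) and refuse to apply when approval_matches returns False.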
```python
def request_approval(self, plan, review, policy):
    if self.environment == "production":
        # Always require human approval for production
        return self.wait_for_human_approval(plan, review, policy)
    elif self.environment == "staging":
        # has_destructive_changes() needs the JSON plan, not the text output
        plan_json = json.loads(subprocess.run(
            ["terraform", "show", "-json", "tfplan"],
            cwd=self.workspace_dir, capture_output=True, text=True, check=True,
        ).stdout)
        # Auto-apply only if policies pass and nothing is destroyed or replaced
        if not policy["violations"] and not self.has_destructive_changes(plan_json):
            return True
        return self.wait_for_human_approval(plan, review, policy)
    else:  # dev
        # Auto-apply if policies pass
        return not policy["violations"]
```

2. Destructive Change Detection
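Terraform's JSON plan (terraform show -json) has no separate "replace" action: a replacement appears as a ["delete", "create"] pair, or ["create", "delete"] when create_before_destroy is set. For an approval message it helps to list the two cases separately. A small sketch (the function name is hypothetical):

```python
def destructive_summary(plan_json: dict) -> dict:
    """Group destructive changes by kind so the approval message can list them."""
    summary = {"destroy": [], "replace": []}
    for rc in plan_json.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions == ["delete"]:
            summary["destroy"].append(rc["address"])
        elif "delete" in actions and "create" in actions:
            # ["delete", "create"], or ["create", "delete"] with create_before_destroy
            summary["replace"].append(rc["address"])
    return summary
```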
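```python
def has_destructive_changes(self, plan_json):
    """Flag any destroy or replace actions in `terraform show -json` output."""
    for resource in plan_json.get("resource_changes", []):
        actions = resource.get("change", {}).get("actions", [])
        # Replacements appear as ["delete", "create"] or ["create", "delete"],
        # so checking for "delete" catches both destroys and replaces
        if "delete" in actions:
            return True
    return False
```

3. Cost Estimation Before Apply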
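A dollar figure is easier to enforce than an LLM's opinion about cost. The sketch below turns an Infracost JSON report into a pass/fail decision. It assumes the report has a top-level diffTotalMonthlyCost field holding a decimal string (check the schema your Infracost version emits), and the $500 limit is a hypothetical placeholder:

```python
from decimal import Decimal

# Hypothetical budget: the largest monthly cost increase one change may add without sign-off
MAX_MONTHLY_INCREASE = Decimal("500")


def cost_gate(report: dict, limit: Decimal = MAX_MONTHLY_INCREASE):
    """Return (allowed, delta) for a parsed Infracost JSON report."""
    # Assumes "diffTotalMonthlyCost" is a decimal string; a missing/null value counts as 0
    delta = Decimal(report.get("diffTotalMonthlyCost") or "0")
    return delta <= limit, delta
```

Feed it the parsed JSON from an infracost run such as the one below, and route failures to the human approval path rather than rejecting the change outright.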
```shell
# Use Infracost to estimate cost impact
infracost diff --path . --format json | jq '.totalMonthlyCost'
```

4. Automatic Rollback
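Terraform has no built-in rollback, so "rollback" in practice means undoing what an apply created. The apply_with_rollback method in this section calls a terraform_destroy_new_resources helper; one hypothetical way to build it, assuming you kept the terraform show -json tfplan output from before the apply, is to collect the addresses the plan created and target only those:

```python
def created_addresses(plan_json: dict) -> list:
    """Addresses of managed resources the plan creates outright (not replacements)."""
    return [
        rc["address"]
        for rc in plan_json.get("resource_changes", [])
        if rc.get("mode") == "managed" and rc.get("change", {}).get("actions") == ["create"]
    ]


def destroy_command(addresses: list) -> list:
    """Build a terraform command that destroys only the given addresses."""
    if not addresses:
        raise ValueError("no created resources to destroy")
    cmd = ["terraform", "apply", "-destroy", "-auto-approve", "-input=false"]
    return cmd + [f"-target={address}" for address in addresses]
```

Pass the resulting list to subprocess.run with cwd set to the workspace. Terraform's documentation treats -target as an escape hatch for exceptional situations, which a failed verification is. Updated or replaced resources are deliberately left alone, because destroying them would not restore their previous state.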
```python
def apply_with_rollback(self):
    # Back up the current state before applying
    state = subprocess.run(
        ["terraform", "state", "pull"],
        cwd=self.workspace_dir, capture_output=True, text=True, check=True,
    )
    with open(f"{self.workspace_dir}/pre-apply.tfstate", "w") as f:
        f.write(state.stdout)

    result = subprocess.run(
        ["terraform", "apply", "tfplan"],
        cwd=self.workspace_dir,
        capture_output=True, text=True,
    )

    if result.returncode != 0:
        # Apply failed. Some resources may already have been changed before
        # the error, so report it for a human to inspect
        return {"success": False, "error": result.stderr}

    # Verify deployment
    if not self.verify_deployment():
        # Verification failed: destroy the resources this apply created
        self.terraform_destroy_new_resources()
        return {"success": False, "error": "Deployment verification failed"}

    return {"success": True}
```

What This Means for DevOps Engineers
AI agents writing Terraform doesn't eliminate the DevOps engineer. It changes the job:
| Before | After |
|---|---|
| Write HCL for every change | Define policies and guardrails |
| Review every PR manually | Review AI-generated plans for edge cases |
| Debug syntax errors | Design infrastructure patterns the AI follows |
| Copy-paste modules | Build reusable modules the AI composes |
| Respond to infra tickets | Set up self-service with an AI agent |
The engineer becomes the architect and guardrail designer, not the code writer.
Getting Started Today
- Start with plan review: use AI to review terraform plan output in PRs. It is read-only, so there is no risk to your infrastructure, and the value is immediate.
- Add policy checks: define OPA policies for your security and cost requirements.
- Enable generation for dev: let the AI generate Terraform for development environments. Low risk, fast iteration.
- Expand to staging: add human approval gates and blast radius limits.
- Production last: only after months of proven reliability in lower environments.
Wrapping Up
AI agents writing Terraform isn't science fiction; it's happening now. The combination of LLMs with tool use, policy engines like OPA, and mature Terraform tooling makes it practical, and with the right guardrails, safe enough to adopt step by step.
The key is guardrails. Never give an AI agent unrestricted access to production infrastructure. Always have policy checks, human approval gates, and automatic rollback.
Start with AI-powered plan review in your PRs. That alone can catch security issues and cost surprises that slip past human reviewers.
Want to master Terraform, IaC best practices, and infrastructure automation? The KodeKloud Terraform course covers everything from basics to advanced patterns with hands-on labs. For cloud infrastructure to practice Terraform, DigitalOcean has a great Terraform provider and predictable pricing.