
Terraform Plan Succeeds But Apply Fails: How to Fix State Drift and Provider Errors

Your terraform plan looks clean but apply blows up? Here's how to fix provider conflicts, state drift, and dependency errors step by step.

DevOpsBoys · Mar 19, 2026 · 5 min read

You run terraform plan. It shows exactly what you expect. Green lights everywhere. You run terraform apply and boom — errors everywhere. Resources fail to create, dependencies break, and the state file is now inconsistent.

This is one of the most frustrating Terraform experiences because it feels like a lie. The plan said it would work. It didn't.

Let me walk you through why this happens and how to fix every common cause.

Why Plan Succeeds But Apply Fails

terraform plan is a prediction, not a guarantee. It compares your code against the state file and makes assumptions:

  • Resources it plans to create don't already exist
  • APIs will accept the configuration
  • Provider versions haven't changed behavior
  • No one else modified infrastructure between plan and apply

When any of these assumptions break, apply fails even though plan succeeded.

Cause 1 — State Drift (Most Common)

Someone changed infrastructure outside Terraform — through the console, CLI, or another tool. The state file doesn't know about these changes.

Detect State Drift

bash
terraform plan -refresh-only

This shows what changed in the real world vs what Terraform thinks exists:

Note: Objects have changed outside of Terraform

  # aws_security_group.main has been changed
  ~ resource "aws_security_group" "main" {
      ~ ingress {
          - cidr_blocks = ["10.0.0.0/16"]
          + cidr_blocks = ["0.0.0.0/0"]
        }
    }

Fix State Drift

Option 1: Accept the real-world state

bash
terraform apply -refresh-only

This updates the state file to match reality without making infrastructure changes.

Option 2: Override with your code

Run a normal terraform apply to force the infrastructure back to match your code.

Option 3: Import the resource

If a resource was created outside Terraform and now conflicts:

bash
terraform import aws_security_group.main sg-0123456789abcdef
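To catch drift before it bites, you can run the drift check on a schedule in CI. `terraform plan` supports `-detailed-exitcode`, which returns 2 when changes (including drift) are pending. Here's a minimal sketch of a drift gate — the `classify_plan_exit` helper and the CI wiring are illustrative, not part of Terraform itself:

```bash
#!/usr/bin/env sh
# Map `terraform plan -detailed-exitcode` results to CI outcomes:
#   0 = no changes, 1 = plan error, 2 = pending changes (drift).
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# In a scheduled CI job (assumes terraform is on PATH and initialized):
# terraform plan -refresh-only -detailed-exitcode -no-color > drift.log
# [ "$(classify_plan_exit $?)" = "clean" ] || { cat drift.log; exit 1; }
```

Run this nightly and you'll learn about console changes hours after they happen instead of mid-deploy.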

Cause 2 — Provider Version Conflicts

You upgraded a provider and the new version handles resources differently.

Symptoms

Error: Provider produced inconsistent result after apply

When applying changes to aws_s3_bucket.main, provider "aws" produced
an unexpected new value: Root object was present, but now absent.

Fix

Pin your provider versions in versions.tf:

hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40.0"  # Pin to patch releases
    }
  }
  required_version = ">= 1.7.0"
}

If you've already upgraded and things broke, check the provider changelog for breaking changes. Common offenders:

  • AWS provider v4 → v5: S3 bucket arguments moved to separate resources
  • Google provider v5 → v6: Several resource schemas changed
  • AzureRM v3 → v4: Massive refactoring of resource types

Roll Back a Provider

bash
# Check current version
terraform providers
 
# Edit version constraint, then:
terraform init -upgrade

Cause 3 — API Rate Limits and Timeouts

Cloud APIs have rate limits. When Terraform creates many resources simultaneously, it can hit throttling.

Symptoms

Error: creating EC2 Instance: RequestLimitExceeded: Request limit exceeded.
  status code: 503

Error: timeout while waiting for resource to be created (5m0s)

Fix — Add Parallelism Limits

bash
terraform apply -parallelism=5  # Default is 10
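Transient throttling can also be handled by retrying the whole apply: Terraform skips resources already recorded in state, so a retry only picks up what failed. A rough sketch — `retry_with_backoff` is a hypothetical wrapper script, not a Terraform feature:

```bash
#!/usr/bin/env sh
# Retry a command with exponential backoff (2s, 4s, 8s, ...).
retry_with_backoff() {
  max=$1; shift          # $1 = max attempts, rest = the command to run
  attempt=1
  until "$@"; do
    [ "$attempt" -ge "$max" ] && return 1
    sleep $((2 ** attempt))
    attempt=$((attempt + 1))
  done
}

# Example (in CI, with reduced parallelism to stay under rate limits):
# retry_with_backoff 3 terraform apply -parallelism=5 -auto-approve
```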

Or raise the retry budget in your provider config — the AWS provider retries throttled API calls automatically, and `max_retries` controls how many attempts it makes:

hcl
provider "aws" {
  region      = "us-east-1"
  max_retries = 15  # Retry throttled API calls more times before failing

  default_tags {
    tags = {
      ManagedBy = "terraform"
    }
  }
}

For specific resources that need more time:

hcl
resource "aws_rds_cluster" "main" {
  # ... config ...
 
  timeouts {
    create = "30m"
    update = "30m"
    delete = "30m"
  }
}

Cause 4 — Resource Dependencies Not Declared

Terraform might try to create resources in the wrong order if implicit dependencies aren't detected.

Symptoms

Error: creating EKS Node Group: InvalidParameterException:
  Subnet subnet-xxx does not exist

The subnet is being created in the same apply, but the node group tries to use it before it's ready.

Fix — Explicit depends_on

hcl
resource "aws_eks_node_group" "workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "workers"
  subnet_ids      = aws_subnet.private[*].id
 
  depends_on = [
    aws_subnet.private,
    aws_iam_role_policy_attachment.node_policy,
  ]
}

Cause 5 — Resource Already Exists

You're trying to create a resource that already exists (created manually or by another Terraform workspace).

Symptoms

Error: creating S3 Bucket (my-app-bucket): BucketAlreadyOwnedByYou:
  Your previous request to create the named bucket succeeded.

Fix — Import It

bash
# Check what's in the state
terraform state list | grep bucket
 
# Import the existing resource
terraform import aws_s3_bucket.main my-app-bucket
 
# Verify
terraform plan

For Terraform 1.5+, use import blocks instead:

hcl
import {
  to = aws_s3_bucket.main
  id = "my-app-bucket"
}

Then run terraform plan to see what changes would be needed to match your config.
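The import-block workflow pairs well with config generation: since Terraform 1.5, `terraform plan -generate-config-out` can draft the matching HCL for an imported resource so you don't hand-write it. A sketch — `make_import_block` is just a hypothetical helper that emits the block shown above:

```bash
#!/usr/bin/env sh
# Emit a Terraform 1.5+ import block for an existing resource.
make_import_block() {
  # $1 = resource address, $2 = real-world ID
  printf 'import {\n  to = %s\n  id = "%s"\n}\n' "$1" "$2"
}

make_import_block aws_s3_bucket.main my-app-bucket > imports.tf

# Let Terraform draft the matching resource config (review it before use):
# terraform plan -generate-config-out=generated.tf
```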

Cause 6 — Partial Apply Failure

Apply succeeded for some resources but failed for others, leaving the state partially updated.

Fix — Don't Panic

Terraform state is updated resource-by-resource. If apply fails midway:

  1. Check what actually got created:

bash
terraform state list

  2. Fix the error that caused the failure (usually one of the causes above).

  3. Run apply again:

bash
terraform apply

Terraform will skip resources that already exist in the state and only create/modify what's still needed.

Never manually delete the state file after a partial failure. That creates orphaned resources.
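Before retrying, it's cheap insurance to snapshot the state so a bad retry can be undone with `terraform state push`. A sketch — `backup_name` is a hypothetical naming helper; `state pull` and `state push` are real subcommands:

```bash
#!/usr/bin/env sh
# Build a timestamped backup filename for a state snapshot.
backup_name() {
  # $1 = workspace name, $2 = epoch seconds
  printf '%s-%s.tfstate.backup' "$1" "$2"
}

# With a configured remote backend:
# terraform state pull > "$(backup_name "$(terraform workspace show)" "$(date +%s)")"
# terraform apply   # retry; resources already in state are skipped
# If the retry makes things worse: terraform state push <backup-file>
```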

Cause 7 — Conditional Logic Timing

Using count or for_each with values that aren't known until apply time.

Symptoms

Error: Invalid count argument

  The "count" value depends on resource attributes that cannot be
  determined until apply.

Fix

Move the dynamic value to a variable or data source that's known at plan time:

hcl
# Bad — count depends on a resource output
resource "aws_instance" "worker" {
  count = aws_autoscaling_group.main.desired_capacity  # Unknown at plan
  # ...
}
 
# Good — use a variable
variable "worker_count" {
  default = 3
}
 
resource "aws_instance" "worker" {
  count = var.worker_count  # Known at plan
  # ...
}

Prevention Checklist

| Practice | Why |
| --- | --- |
| Pin provider versions | Prevent surprise behavior changes |
| Use -refresh-only before apply | Detect drift early |
| Run plan with -out=planfile, then apply planfile | Guarantee plan matches apply |
| Enable state locking | Prevent concurrent modifications |
| Use CI/CD for all applies | Eliminate manual console changes |
| Limit parallelism for large stacks | Avoid API throttling |

The Golden Rule: Plan Files

Always use plan files in CI/CD:

bash
# Generate plan
terraform plan -out=tfplan
 
# Review (or automated approval)
terraform show tfplan
 
# Apply the exact plan
terraform apply tfplan

This guarantees the apply executes exactly what the plan showed — no re-evaluation, no drift between plan and apply.
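Plan files also unlock automated policy checks: `terraform show -json tfplan` renders the plan as JSON, where each resource change carries an `actions` list. A rough sketch of a destroy gate — `plan_has_destroys` and its grep heuristic are illustrative; a production check would parse the JSON properly:

```bash
#!/usr/bin/env sh
# Fail if the rendered plan JSON contains any "delete" action.
plan_has_destroys() {
  grep -q '"delete"' "$1"   # $1 = path to `terraform show -json` output
}

# In CI:
# terraform show -json tfplan > tfplan.json
# if plan_has_destroys tfplan.json; then
#   echo "refusing: plan destroys resources" >&2; exit 1
# fi
```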

Wrapping Up

"Plan succeeds but apply fails" almost always comes down to one of these seven causes. Start with state drift (the most common), then check provider versions, then look at API limits and dependencies.

The best prevention is to use plan files, pin provider versions, and never make manual changes to Terraform-managed infrastructure.

Want to master Terraform and avoid these production pitfalls? KodeKloud's Terraform course has hands-on labs covering state management, provider configuration, and real-world debugging scenarios. For practicing on real cloud infrastructure, DigitalOcean's Terraform provider is well-maintained and affordable for learning.
