Topic 17

State Locking

When two people — or two CI jobs — run apply against the same state at once, they can corrupt it. Each refreshes, plans, and writes back a version that does not include the other's changes; the second write overwrites the first, and state ends up reflecting neither run. State locking prevents this by taking an exclusive lock for the duration of any state-mutating operation.

With the S3 backend, modern Terraform can lock using S3 itself through a lockfile, introduced in 1.10. Older setups bolted on a DynamoDB table to hold the lock. Both work; the lockfile is one fewer resource to provision and is the current recommendation.

Two ways to lock S3 state

S3 native lockfile

use_lockfile = true (Terraform 1.10+). A conditional put on a lock object in the bucket you already have — nothing extra to provision.

DynamoDB lock table

Legacy: a separate table keyed on LockID holds the lock. Still common in existing configs, but one more resource to manage.

The Race Condition

Picture two engineers applying within the same minute. Both read serial 17. Both compute a plan against it. The first writes serial 18; the second, still holding its view of 17, writes its own serial 18 a moment later and silently erases the first's changes. No error fires — the state file is valid JSON — but it now describes a reality that never existed, and the next plan will try to "fix" the divergence by destroying or recreating real resources. Locking exists to make the second run wait instead of clobbering.

How Locking Works

Before any operation that writes state (apply, destroy, state mv, import), Terraform acquires a lock, performs the work, then releases it. A second run that finds the lock held fails fast by default, reporting the lock's ID and who holds it; with the -lock-timeout option it instead keeps retrying for the given duration before giving up. The lock is advisory between Terraform runs, not a filesystem lock, which is why every shared backend needs an explicit mechanism rather than assuming the operating system handles it.

S3 Native Locking

Setting use_lockfile = true on the S3 backend (Terraform 1.10+) makes S3 itself the lock. Terraform writes a small lock object alongside the state using a conditional put — the write only succeeds if the lock object does not already exist — so two simultaneous applies cannot both acquire it. There is no separate table to create, key, or pay for; the bucket you already have does the job.

backend.tf — native S3 lockfile (1.10+)

terraform {
  backend "s3" {
    bucket       = "acme-tfstate-prod"
    key          = "network/terraform.tfstate"
    region       = "us-east-1"
    use_lockfile = true
  }
}
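
Because the lock is just another object in the bucket, the identity running Terraform needs permission to create and delete it. The following is a minimal sketch of such a policy, not a definitive one: the data source name tfstate_lockfile is hypothetical, the bucket and key reuse the example above, and the lock object name (the state key with a .tflock suffix) is an assumption to verify against the S3 backend documentation for your Terraform version.

```hcl
# Hypothetical IAM policy sketch for the example backend above.
data "aws_iam_policy_document" "tfstate_lockfile" {
  # List the bucket that holds the state.
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::acme-tfstate-prod"]
  }

  # Read and write the state object itself.
  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::acme-tfstate-prod/network/terraform.tfstate"]
  }

  # Create, read, and delete the lock object.
  # Assumption: the lock object is named "<key>.tflock".
  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::acme-tfstate-prod/network/terraform.tfstate.tflock"]
  }
}
```

If the delete permission is missing, a run would likely be able to take the lock but not release it, which is one way to end up with a stuck lock.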

DynamoDB Locking (Legacy)

Before 1.10, the established approach was a DynamoDB table with a partition key named LockID. The backend's dynamodb_table argument points at it, and Terraform writes a conditional item to claim the lock. It still works and is everywhere in existing configs, so you will read it for years — but a wrongly keyed table (anything other than a string key called LockID) silently fails to lock, giving you the false comfort of a lock mechanism that never engages.

backend.tf — legacy DynamoDB lock table

terraform {
  backend "s3" {
    bucket         = "acme-tfstate-prod"
    key            = "network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "acme-tflock"
  }
}
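
The table that dynamodb_table points at has to exist before the backend can use it, and its key schema is the part that matters. A minimal sketch of such a table, assuming the AWS provider's aws_dynamodb_table resource; the resource label tflock and the on-demand billing mode are illustrative choices, not something the backend requires.

```hcl
# Sketch of a legacy lock table. Typically managed in a separate
# bootstrap configuration, since the backend needs it to exist first.
resource "aws_dynamodb_table" "tflock" {
  name         = "acme-tflock"     # must match dynamodb_table in the backend
  billing_mode = "PAY_PER_REQUEST" # illustrative; lock traffic is very light
  hash_key     = "LockID"          # partition key name, spelled exactly

  # The partition key must be declared as a string ("S") attribute.
  attribute {
    name = "LockID"
    type = "S"
  }
}
```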

Stuck Locks

When a process dies mid-apply — a killed CI job, a dropped laptop connection — the lock can linger because nothing released it. terraform force-unlock LOCK_ID clears it, using the ID Terraform printed when it refused to run. This is the nuclear option, not a routine fix: force-unlocking a lock that a still-running apply actually holds lets a second apply corrupt state exactly the way locking was meant to prevent. Confirm no apply is running before you reach for it.

S3 Native Lockfile vs DynamoDB Lock Table

S3 native lockfile — use_lockfile = true (Terraform 1.10+) locks through a conditional-write object in the same bucket, with no extra resource to manage. Choose it for any new S3 backend.

DynamoDB lock table — a separately provisioned table with a LockID key, the established approach in older configs. The dynamodb_table argument is now deprecated and slated for removal in a future Terraform minor version; it still works for now, so existing setups keep running, but new backends should use use_lockfile and existing DynamoDB ones should plan to migrate.
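
For an existing setup, one possible migration path is to configure both mechanisms for a transition period, which the S3 backend permits. The sketch below reuses the example names from this topic; the comments describe an assumed rollout order rather than a prescribed one.

```hcl
terraform {
  backend "s3" {
    bucket         = "acme-tfstate-prod"
    key            = "network/terraform.tfstate"
    region         = "us-east-1"
    use_lockfile   = true          # new: S3 native lockfile (1.10+)
    dynamodb_table = "acme-tflock" # legacy: drop once every runner is on 1.10+
  }
}
```

Changing backend settings requires re-running terraform init. Once every workstation and pipeline has picked up the new configuration, the dynamodb_table line can be removed and the table retired.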

Common Mistakes

Running an S3 backend with no locking at all, so two simultaneous applies silently corrupt state.
Calling force-unlock on a lock a still-running apply holds, letting a second apply overwrite the first's changes.
Assuming locking is automatic on every backend; some backends do not support it at all, and others, such as S3, lock only when explicitly configured.
Provisioning a DynamoDB lock table with the wrong key schema, so locking silently never engages and you think you are protected.
Treating a transient "state is locked" message as an error to force past, instead of waiting for the run that holds it.

Best Practices

Enable locking on every shared backend — use_lockfile = true for new S3 backends; keep a legacy DynamoDB table only for an existing setup you have not migrated yet, since dynamodb_table is deprecated.
Treat force-unlock as an incident tool: confirm no apply is running, then use the exact lock ID Terraform reported.
Run applies through a serialized pipeline — one apply per state at a time — so locks are rarely contended in the first place.
When migrating off DynamoDB to the native lockfile, verify locking still engages before trusting it in production.
Give any DynamoDB lock table a string partition key named exactly LockID, since any other schema fails silently.

Comparable Tools

CloudFormation serializes stack operations server-side, with no user-managed locking.
Pulumi locks through its backend.
HCP Terraform queues runs, so locking is implicit.

Knowledge Check

What corruption does state locking prevent?

Two concurrent applies writing back versions that overwrite each other, leaving state matching neither run
A single apply being interrupted halfway through and leaving behind a partial, half-written state file on disk
Secrets being written into the state file in readable plaintext
A provider upgrade changing the resource schema partway through an apply

How does native S3 locking differ from the DynamoDB approach?

It locks via a conditional-write object in the same bucket, with no separate table to provision (1.10+)
It acquires the lock noticeably faster because DynamoDB has much higher write latency than the S3 bucket does
It encrypts the lock object, while the DynamoDB table stores the lock in plaintext
It allows several simultaneous writers at once, which the DynamoDB table forbids

When is terraform force-unlock safe to run?

Only once you have confirmed no apply is actually running and the lock is a stale leftover
Any time a plan feels slow, simply to speed the run up
Routinely before every single apply, to clear out any prior locks that an earlier run may have left behind
Whenever a teammate's apply is taking longer than you expected it to

Why does serializing CI applies reduce lock contention?

Only one apply per state runs at a time, so a second rarely arrives while the lock is held
Serialized runs skip acquiring the lock entirely, since there is no concurrency left to guard against
It disables locking on the backend, which is the real thing that causes contention
It merges all the concurrent plans into a single one before applying them
