State Surgery
Sometimes you must operate on state directly: remove a resource Terraform should forget but not destroy (state rm), rename or move a resource's address (state mv), or force a resource to be recreated on the next apply (-replace). These are the sharp tools in the kit. A wrong state rm orphans a live resource that keeps billing; a wrong state mv corrupts the mapping between your code and reality. Using them deliberately — and knowing when a declarative alternative is better — is a defining production skill.
The governing rule for all of them: look before you cut. Every surgery command takes a resource address, and the difference between fixing a problem and creating one is usually whether you confirmed the exact address first.
Removing from State
terraform state rm ADDRESS tells Terraform to forget a resource without touching the real object. The resource stays alive in AWS; Terraform simply stops managing it. This is the right tool when you are handing a resource off to another state or another team, or cleaning up after a bad import that pulled in something you did not mean to manage.
```shell
# Terraform stops tracking the bucket; the bucket still exists in AWS
terraform state rm aws_s3_bucket.legacy_logs
```
The orphan risk is the whole danger. After state rm, nothing in Terraform knows the resource exists, so nothing will ever clean it up — it keeps running and keeps billing, invisible to every future plan. state rm is for handing off management on purpose, never for "getting rid of" a resource; for that you want destroy.
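The address check and a preview can be folded into one guard before anything is forgotten. This is a minimal sketch that assumes bash, a terraform binary on PATH, and an initialized working directory; the function name guarded_state_rm is hypothetical. It relies on two real behaviors: terraform state list prints one address per line, and state rm -dry-run reports what would be forgotten without changing state.

```shell
#!/usr/bin/env bash
# guarded_state_rm is a hypothetical helper, not part of Terraform.
# It forgets ADDRESS from state only if that exact address is present and you confirm.
guarded_state_rm() {
  local addr="$1" answer
  if [ -z "$addr" ]; then
    echo "usage: guarded_state_rm ADDRESS" >&2
    return 2
  fi
  # -F: literal match, so [0] and ["api"] are not read as regex syntax
  # -x: whole-line match, so aws_instance.web does not match aws_instance.web[0]
  if ! terraform state list | grep -Fxq -- "$addr"; then
    echo "refusing: '$addr' is not an address in this state" >&2
    return 1
  fi
  # Report what would be forgotten without changing state, then ask before doing it
  terraform state rm -dry-run "$addr" || return 1
  read -r -p "Forget $addr? The real object keeps running in AWS. [yes/no] " answer
  if [ "$answer" != "yes" ]; then
    echo "aborted; state unchanged" >&2
    return 1
  fi
  terraform state rm "$addr"
}

# Usage:
#   guarded_state_rm aws_s3_bucket.legacy_logs
```

The exact-match check is deliberately strict. A partial or module-level address is refused rather than guessed at, which is the failure mode that orphans the wrong resource.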
Moving an Address
terraform state mv OLD NEW changes a resource's address in state without destroying and recreating the object — the imperative way to record a rename or a move into a module. It works, but it is a local one-off command: every teammate and every pipeline that shares the state must run the identical command separately, and none of it shows up in a plan or a review.
```shell
# rename the address; the underlying instance is untouched
terraform state mv aws_instance.web aws_instance.api

# move a resource into a module
terraform state mv aws_instance.api module.compute.aws_instance.this
```
For anything in shared state, a moved block in your configuration is the better tool — it is in version control, shows up in the plan, and applies once for everyone including CI. Reserve state mv for genuine one-off surgery where no declarative path fits.
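The declarative version of the module move above fits in a few lines of configuration. The sketch wraps it in a small shell function only so it stays in the same language as the other examples. The names record_move and moved.tf are hypothetical choices; any .tf file in the root module would do. The moved block itself, with its from and to arguments, is Terraform's own syntax.

```shell
#!/usr/bin/env bash
# record_move is a hypothetical helper: it appends a moved block to a .tf file
# so the move is committed, shown in the plan, and applied once for everyone.
record_move() {
  local from="$1" to="$2" file="${3:-moved.tf}"
  if [ -z "$from" ] || [ -z "$to" ]; then
    echo "usage: record_move FROM TO [FILE]" >&2
    return 2
  fi
  cat >>"$file" <<EOF
moved {
  from = ${from}
  to   = ${to}
}
EOF
}

# Usage: the module extraction from the state mv example, done declaratively
#   record_move aws_instance.api module.compute.aws_instance.this
#   terraform plan   # should report the move rather than a destroy and create
```

Commit the file together with the code change. Every teammate and every CI run then picks up the same move on their next plan, with no separate command to remember.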
Forcing Recreation with -replace
When a resource is in a bad runtime state that its config does not capture — a corrupted instance, a stuck cache node — you force Terraform to destroy and recreate it on the next apply with -replace=ADDRESS. This is the successor to the old taint workflow: where terraform taint mutated state out of band so the next plan recreated the resource, -replace is a plan-time flag that shows the replacement in the plan you review before it happens.
```shell
# show the destroy/recreate in the plan, then apply it
terraform plan -replace=aws_instance.api
terraform apply -replace=aws_instance.api

# the old way (deprecated): mutated state with no plan preview
# terraform taint aws_instance.api
```
-replace is strictly better because it is reviewable. The destroy-then-create appears as -/+ in the plan output, so you see exactly what is about to happen instead of discovering it after a separate taint command has already changed state. Prefer it over taint on Terraform v0.15.2 and later, the release that added -replace and deprecated taint.
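To guarantee that the replacement you apply is exactly the one you reviewed, you can save the plan and apply the file. This is a minimal sketch that assumes bash and an initialized working directory; replace_reviewed and the plan file name are hypothetical. Two real behaviors matter here. First, the -replace choice is recorded in the saved plan, so apply does not need the flag again. Second, terraform apply does not prompt for approval when given a saved plan, which is why the sketch asks explicitly.

```shell
#!/usr/bin/env bash
# replace_reviewed is a hypothetical helper, not part of Terraform.
replace_reviewed() {
  local addr="$1" planfile="${2:-replace.tfplan}" answer
  # Plan the forced replacement and save it; the address should appear as -/+ in the output
  terraform plan -replace="$addr" -out="$planfile" || return 1
  # A saved plan is applied without Terraform's own yes/no prompt, so confirm here
  read -r -p "Apply the replacement of $addr? [yes/no] " answer
  if [ "$answer" != "yes" ]; then
    echo "aborted; nothing was applied" >&2
    return 1
  fi
  # The -replace choice is already recorded in the plan file
  terraform apply "$planfile"
}

# Usage:
#   replace_reviewed aws_instance.api
```

If the state changes between the plan and the apply, Terraform rejects the saved plan as stale. Re-plan rather than trying to force it through.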
Inspecting Before Surgery
No surgery command should run before state list and state show. state list prints every address in the state so you can confirm the exact target — including the indexed forms (aws_instance.web[0], aws_instance.web["api"]) that are easy to get wrong from memory. state show ADDRESS dumps a resource's recorded attributes so you can verify you are operating on the object you think you are.
```shell
# list every address in state, filter to what you care about
terraform state list | grep instance

# confirm the exact resource and its attributes
terraform state show aws_instance.api
```
A targeting mistake on a surgery command is a self-inflicted incident: state rm on the wrong address orphans the wrong resource, state mv to the wrong target corrupts the mapping. The thirty seconds spent confirming the address is the cheapest insurance in the chapter.
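When you only remember part of an address, a small guard can turn state list into one exact target and refuse anything ambiguous. This sketch assumes bash and terraform on PATH; resolve_address is a hypothetical name. It matches the fragment as a literal substring and succeeds only when exactly one address contains it. Otherwise it prints the candidates so you can choose the indexed form deliberately.

```shell
#!/usr/bin/env bash
# resolve_address is a hypothetical helper, not part of Terraform.
# Prints the single state address containing FRAGMENT, or fails and lists the candidates.
resolve_address() {
  local fragment="$1" matches count
  # -F: treat [0] and ["api"] literally; "|| true" keeps "no match" from aborting under set -e
  matches=$(terraform state list | grep -F -- "$fragment" || true)
  count=$(printf '%s\n' "$matches" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one address containing '$fragment', found $count" >&2
    if [ -n "$matches" ]; then printf '%s\n' "$matches" >&2; fi
    return 1
  fi
  printf '%s\n' "$matches"
}

# Usage: resolve first, then inspect exactly that object
#   addr=$(resolve_address 'aws_instance.api') && terraform state show "$addr"
```

Quote indexed addresses on the command line, as in 'aws_instance.web["api"]'. Unquoted, the shell strips the double quotes and may treat the brackets as a glob pattern.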
Backups and Recovery
Every state subcommand that modifies state, such as rm and mv, writes a local backup file before it changes anything. The file is named with a timestamp so you can restore the prior state if a command went wrong. Read-only subcommands such as list and show write no backup. That local backup is a convenience, not the real safety net: it lives on the operator's machine and is easy to lose.
The dependable recovery mechanism is versioning on the remote state bucket. With S3 bucket versioning enabled, every state write — including the ones your surgery commands make — is a recoverable object version, so a botched state rm is a restore of the previous version rather than a reconstruction project. Rely on bucket versioning as the backstop and treat the local backup as the quick first option.
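The backstop only works if versioning is actually on, so it is worth checking before risky surgery. This is a minimal, read-only sketch that assumes the AWS CLI v2 with read access to the bucket. The name check_state_versions is hypothetical, and the bucket and key arguments stand in for the values in your backend configuration. get-bucket-versioning and list-object-versions are real s3api subcommands. The function only reports; it never restores anything.

```shell
#!/usr/bin/env bash
# check_state_versions is a hypothetical, read-only helper: it changes nothing.
# BUCKET and KEY are the bucket and key from your S3 backend configuration.
check_state_versions() {
  local bucket="$1" key="$2" status
  # Status is Enabled, Suspended, or absent if versioning was never turned on
  status=$(aws s3api get-bucket-versioning --bucket "$bucket" \
    --query Status --output text) || return 1
  if [ "$status" != "Enabled" ]; then
    echo "versioning is '$status' on $bucket; state writes are not kept as versions" >&2
    return 1
  fi
  # One line per stored version of the state object: VersionId, IsLatest, LastModified
  aws s3api list-object-versions --bucket "$bucket" --prefix "$key" \
    --query "Versions[?Key=='${key}'].[VersionId,IsLatest,LastModified]" \
    --output text
}

# Usage (placeholder bucket and key):
#   check_state_versions my-state-bucket prod/terraform.tfstate
```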
Choosing Between state mv and moved Blocks
- state mv: an imperative, local, one-off command that each operator and pipeline must run separately, bypassing review entirely. Reserve it for genuine one-off surgery where no declarative path fits, such as fixing one operator's diverged state or an awkward move no moved block expresses cleanly.
- moved block: a declarative statement in your configuration that is in version control, visible in the plan, reviewed like any change, and applied exactly once for everyone including CI. Use it for any shared refactor, such as a rename or a module extraction that every state needs to follow.
Common Mistakes
- Running state rm and forgetting the resource still exists in AWS, orphaning it so it keeps running and billing with nothing tracking it.
- Using state mv for a refactor everyone needs instead of a moved block, leaving every teammate's and CI's state diverged until each runs the command by hand.
- Operating on state without running state list or state show first and targeting the wrong address, orphaning or corrupting a resource you never meant to touch.
- Still reaching for terraform taint to force recreation instead of -replace, giving up the plan preview that shows the destroy/recreate before it happens.
- Trusting the local timestamped backup alone when surgery goes wrong, instead of S3 bucket versioning that survives losing the operator's machine.
Best Practices
- Inspect with state list and state show before any surgery to confirm the exact target address, including indexed forms.
- Use moved blocks for shared refactors; reserve state mv and state rm for true one-off surgery.
- Prefer -replace over the deprecated taint workflow so the destroy/recreate shows up in a plan you review first.
- Remember that state rm forgets without destroying; use destroy when you actually want the resource gone.
- Rely on S3 bucket versioning as the recovery mechanism and verify a known-good version exists before risky surgery.
Knowledge Check
What does terraform state rm do to the real resource?
- Nothing — the resource keeps running in AWS; Terraform just stops tracking it, risking an orphan
- Destroys it immediately in AWS, exactly the same as running terraform destroy against that resource
- Schedules the underlying resource for deletion on the very next apply
- Moves it into a separate quarantine state file until you choose to restore it
Why is a moved block preferred over state mv for a shared refactor?
- It is in version control, shows up in the plan, and applies once for everyone including CI
- It is faster because it edits the state directly without making any AWS API call
- It can move resources between two completely separate state files in one step, which state mv cannot
- It encrypts the moved resource's attributes inside the state file
What did -replace supersede, and why is it better?
- It superseded taint; the destroy/recreate shows up in the plan you review before it happens
- It superseded state mv; it moves the resource to a new address instead of recreating it from scratch
- It superseded destroy; it removes the resource without a confirmation prompt
- It superseded import; it brings an existing resource under management faster
A state rm went wrong and you need to recover. What is the dependable mechanism?
- Restore the prior object version from the versioned S3 state bucket
- Re-run the same command, which automatically reverses itself
- Run terraform refresh, which rebuilds the removed entry from AWS
- Delete the state file so Terraform regenerates it from the configuration