Chapter 12: Production Operations
Topic 70

Upgrading Providers and Versions

Tooling · Operations

Terraform core and the AWS provider both ship releases constantly, and the cost of ignoring them compounds. Falling years behind turns what should be a routine bump into a multi-version migration where renamed arguments, removed resources, and changed defaults all land at once. The AWS provider alone ships a new major version every year or two, so a config pinned three majors back has three sets of breaking changes waiting.

Upgrading deliberately and incrementally — read the guide, bump one major at a time, verify with a clean plan — is how you stay current without the breakage. The discipline is unglamorous and it is the entire difference between an estate that upgrades in an afternoon and one that nobody dares touch.

Why Stay Current

Three forces push you to keep up. Security fixes land in current releases and are not back-ported to whatever version you froze on. New AWS resources and arguments only appear in newer provider versions, so a config three majors behind cannot manage services AWS has shipped since. And the migration debt grows the longer you wait: every deferred upgrade adds its breaking changes to one giant, risky jump instead of a series of small reviewable ones.

Staying current is cheaper than catching up, and the gap only ever widens. A config that takes each upgrade within a few months of its release never accumulates a migration; one that skips upgrades for two years eventually faces a project to do them all at once under pressure.

Upgrading the AWS Provider

A provider upgrade is four steps: read the upgrade guide for the major you are moving to, raise the version constraint in required_providers, run terraform init -upgrade to pull the new version and rewrite the lock file, then run terraform plan and read every change carefully. The upgrade guide is the part people skip and the part that matters: it lists the renamed and removed arguments that turn into errors if you do not adjust your config first.

One major version at a time: read upgrade guide → bump constraint → init -upgrade → plan & read
versions.tf — bump one major at a time
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"  # was ~> 4.0 — one major step
    }
  }
}
pull the new version and rewrite the lock file
# download the new provider, update .terraform.lock.hcl
terraform init -upgrade

# read every change — renamed args show up here, not as a surprise at apply
terraform plan

The constraint above moves the AWS provider from the 4.x line to 5.x in a single major step, which is the point — one major at a time. After init -upgrade rewrites the lock file, the plan reveals any config that the new version reads differently, so you fix it before apply rather than discovering it mid-change.
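
The same step can be wrapped so it stops at the first failure. The following is a minimal sketch, not a prescribed workflow: the function name and the upgrade.tfplan file name are hypothetical, and it assumes the version constraint has already been raised. It adds terraform validate, which reports unsupported (removed or renamed) arguments without needing a full plan, and saves the plan to a file so the plan that was reviewed is the one that can later be applied.

```shell
# Sketch: one provider upgrade step, stopping at the first failure.
# upgrade.tfplan is a hypothetical file name chosen for this example.
upgrade_provider_step() {
  # Pull the newest provider the constraint allows; rewrites .terraform.lock.hcl.
  terraform init -upgrade || return 1
  # Removed or renamed arguments fail here as "unsupported argument" errors.
  terraform validate || return 1
  # Save the plan so the reviewed plan is the one that gets applied later.
  terraform plan -out=upgrade.tfplan || return 1
  # Print the saved plan in human-readable form for review.
  terraform show upgrade.tfplan
}
```

Calling upgrade_provider_step from the root module directory runs the four commands in order; a non-zero return means the config needs adjusting before the plan is worth reading.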

Upgrading Terraform Core

Core upgrades carry a hazard that provider upgrades do not: state format. A newer Terraform may write the state file in a format an older Terraform cannot read, and that upgrade is one-way. Pin the version with required_version so the whole team and CI run the same core, and treat a core bump as a coordinated change, not something one engineer does locally.

pin core so the team runs one version
terraform {
  required_version = "~> 1.9.0"  # 1.9.x only; "~> 1.9" would also allow 1.10 and later 1.x
}

The failure here is asymmetric and quiet. If one engineer upgrades core locally and applies, the newer Terraform upgrades the state format, and a teammate still on the older version can no longer read it: a self-inflicted lockout from a change nobody coordinated. required_version is the guardrail: any Terraform binary outside the constraint refuses to run, so an uncoordinated version stops with an error instead of silently diverging.
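
required_version does the enforcing inside Terraform itself. As a complement, a CI job or wrapper script can check the binary before any command touches state. The sketch below is one possible preflight, assuming the first line of terraform version has the usual "Terraform v1.9.5" shape; AGREED_MINOR and both function names are hypothetical.

```shell
# Sketch: compare the local core's major.minor with the version the team agreed on.
# AGREED_MINOR is a hypothetical value for this example.
AGREED_MINOR="1.9"

# Extract "1.9" from a line such as "Terraform v1.9.5".
core_minor() {
  printf '%s\n' "$1" | sed -n 's/^Terraform v\([0-9][0-9]*\.[0-9][0-9]*\)\..*/\1/p'
}

# Fail when the installed core is not on the agreed minor line.
check_core_version() {
  local_minor=$(core_minor "$(terraform version | head -n 1)")
  if [ "$local_minor" != "$AGREED_MINOR" ]; then
    echo "core is $local_minor, team runs $AGREED_MINOR" >&2
    return 1
  fi
}
```

Running check_core_version at the start of a pipeline fails fast with a readable message, before init or plan ever runs.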

Incremental over Big-Bang

Step through major versions one at a time, with a clean plan as the checkpoint after each. Moving 4.x to 5.x to 6.x is three small upgrades, each with its own upgrade guide and its own verifying plan; jumping 4.x straight to 6.x stacks every breaking change from two majors into one impossible-to-review diff. The clean plan between steps is what proves the config still matches reality before you take the next step.

The verifying plan is non-negotiable. After each bump, a plan that shows no unexpected changes is the signal it is safe to proceed; a plan full of surprise replacements is the signal to stop and read the guide again before the next major.
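
The checkpoint can be made mechanical with the -detailed-exitcode flag of terraform plan, which exits 0 when there are no changes, 1 on error, and 2 when changes are present. The sketch below uses a hypothetical function name and takes the strictest reading, treating any change as a reason to stop; if the upgrade guide told you to expect specific changes, review them by hand instead.

```shell
# Sketch: treat "no changes" as the only result that lets the upgrade continue.
# terraform plan -detailed-exitcode exits 0 (no changes), 1 (error), 2 (changes present).
plan_checkpoint() {
  terraform plan -detailed-exitcode -input=false
  case $? in
    0) return 0 ;;  # clean plan: safe to take the next major
    2) echo "plan shows changes: review them before the next bump" >&2; return 2 ;;
    *) echo "plan failed: fix the config first" >&2; return 1 ;;
  esac
}
```

A typical use is plan_checkpoint && echo "safe to proceed" between two major bumps, so the next step never starts from a dirty plan.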

Coordinating a Team Upgrade

An upgrade is a shared change, not a personal one. Agree on the target version, update the dependency lock file (.terraform.lock.hcl) and commit it so everyone resolves the identical provider, and roll the change out through environments in order — a lower environment first, production last, after the upgrade has proven itself. The committed lock file is what stops "works on my machine" version drift between the team and CI.
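
A sketch of the lock file half of that coordination follows. It assumes terraform init -upgrade has already selected the new provider version, that the team works on Linux and Apple Silicon machines (the platform list is an assumption), and that the function names are hypothetical. terraform providers lock records checksums for each listed platform, and -lockfile=readonly makes init fail rather than rewrite the lock file.

```shell
# Sketch: record checksums for every platform the team and CI use, then stage the file.
# The platform list is an assumption; adjust it to your own machines.
update_lock_file() {
  terraform providers lock \
    -platform=linux_amd64 \
    -platform=darwin_arm64 || return 1
  git add .terraform.lock.hcl
}

# In CI: install exactly what the committed lock file records, and fail
# instead of rewriting it if the configuration and the lock file disagree.
ci_init() {
  terraform init -input=false -lockfile=readonly
}
```

With this split, only the engineer doing the upgrade changes the lock file, and CI can only ever confirm it.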

Test in a non-production environment before production sees the new version. The point of staging is exactly this: a provider upgrade that rewrites a plan should surface its surprises where an outage does not matter, not in the production apply.
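
The ordering can be enforced rather than remembered. The sketch below assumes a hypothetical layout with one root module per environment under envs/<name>; the function name is also hypothetical. It plans each environment in the order given and stops at the first one whose plan is not clean, so production is never reached while a lower environment still shows surprises.

```shell
# Sketch: plan each environment in order, lowest first, and stop at the
# first one whose plan is not clean. The envs/<name> layout is hypothetical.
rollout_check() {
  for env in "$@"; do
    terraform -chdir="envs/$env" init -input=false || return 1
    terraform -chdir="envs/$env" plan -detailed-exitcode -input=false
    rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "stopping at $env (plan exit code $rc)" >&2
      return "$rc"
    fi
    echo "$env: clean"
  done
}
# Usage: rollout_check dev staging prod
```

The argument order is the rollout order, so listing production last is what keeps it last.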

Common Mistakes
  • Letting Terraform and provider versions fall years behind, turning a needed upgrade into a risky multi-version migration with every breaking change stacked at once.
  • Bumping a major provider version without reading the upgrade guide, then hitting renamed or removed arguments as errors at apply time.
  • Letting a newer Terraform write and upgrade the state format locally, locking out a teammate still on the older version who can no longer read it.
  • Jumping several major versions at once instead of stepping through them, producing a diff too large to review and too risky to apply.
  • Upgrading straight in production without testing in a lower environment first, so a plan-rewriting surprise lands where an outage matters.
Best Practices
  • Upgrade regularly and incrementally, one major version at a time, with a clean verifying plan as the checkpoint between steps.
  • Read the upgrade guide before every major bump and treat the resulting config changes as a reviewed pull request.
  • Pin required_version so the whole team and CI run the same core, and coordinate any bump rather than upgrading locally.
  • Commit the updated .terraform.lock.hcl so everyone resolves the identical provider versions with no drift.
  • Test the upgrade in a lower environment and roll out to production last, after it has proven itself.
Comparable tools
  • Pulumi: upgrades the CLI and provider packages on a similar cadence
  • CloudFormation: no direct equivalent; AWS manages the engine, so there is nothing to upgrade
  • Ansible: versions its core and collections, with the same incremental discipline

Knowledge Check

Why upgrade Terraform and the AWS provider regularly instead of freezing on one version?

  • Deferred upgrades stack every breaking change into one risky multi-version migration, and you miss security fixes and new resources
  • Old versions lose the ability to read the S3 backend once they are more than a year out of date
  • HashiCorp routinely deletes old provider versions from the public Registry, so a pinned init eventually breaks once the version is gone
  • The state file expires after a fixed period and must be regenerated on each new release

What is the safe process for a major AWS provider upgrade?

  • Read the upgrade guide, bump the constraint, run init -upgrade, then plan and adjust for renamed or removed arguments
  • Delete the dependency lock file entirely and let the next apply resolve and install the newest version available automatically
  • Run terraform destroy and recreate every resource fresh under the new provider
  • Edit the state file by hand to match the new provider's expected schema

What is the specific risk of one engineer upgrading Terraform core locally and applying?

  • The newer core upgrades the state format one-way, and teammates on the older version can no longer read it
  • The provider lock file is silently deleted by the upgrade, forcing the whole team to re-init their working directories from scratch
  • Every managed resource is tainted and recreated on the next apply that runs
  • The backend silently switches from the S3 remote to a local file

Why step through one major version at a time rather than jumping several at once?

  • Each step has its own upgrade guide and a clean verifying plan, instead of stacking every breaking change into one unreviewable diff
  • Terraform hard-refuses to skip more than one major version in a single init or upgrade
  • Each intermediate version is strictly required to migrate the state format forward one step at a time, in an exact unbroken sequence
  • Jumping several versions at once costs noticeably more in provider download bandwidth
