Chapter 10: Collaboration and Automation
Topic 58

Terraform in CI/CD

CI/CD Pipeline

Running Terraform from CI instead of laptops is what makes infrastructure changes reviewable, auditable, and consistent. The canonical pipeline runs fmt -check, validate, and plan on every pull request and posts the plan for review, then runs apply on merge — with credentials assumed through OIDC, never stored keys. Every job talks to the same remote S3 backend, so a plan in CI sees exactly the state a teammate's laptop would.

The whole point is that the diff a reviewer approves is the diff that gets applied. That requires two disciplines the pipeline enforces: persist the plan as an artifact and apply that exact artifact on merge, and make CI the only thing that writes to shared environments. Break either and the pipeline becomes theater.

Figure: The pull-request pipeline. PR: fmt/validate + plan → review/approve → merge → apply (saved plan)

The Pipeline Shape

On a pull request the pipeline runs terraform fmt -check to fail on unformatted code, terraform validate to catch syntax and reference errors, and terraform plan to compute the diff. The plan is posted back to the PR as a comment so reviewers read it alongside the code. The apply stage is gated: it runs only on merge to the main branch, after the required review has approved both the code and the plan it produced.
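One way to post the plan back to the PR is sketched below as two GitHub Actions steps. It assumes the plan was already saved to a file named tfplan after terraform init, that the job has pull-requests: write permission, and that the GitHub CLI (gh) is available on the runner. The file name plan.txt is illustrative.

```yaml
# Sketch: steps appended to a PR plan job (assumes a saved plan file named tfplan).
steps:
  # Render the saved plan as plain text; -no-color strips terminal escape codes.
  - run: terraform show -no-color tfplan > plan.txt
  # Post the rendered plan as a PR comment with the GitHub CLI.
  - run: gh pr comment "$PR_NUMBER" --body-file plan.txt
    env:
      GH_TOKEN: ${{ github.token }}
      GH_REPO: ${{ github.repository }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
```

GitHub limits comment length, so a very large plan may need to be truncated or attached as an artifact instead.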

Plan Artifacts

The reliable way to guarantee you apply what was reviewed is to save the plan on the PR with plan -out=tfplan, persist that file as a build artifact, and run apply tfplan on merge against the saved file. Applying a saved plan executes the frozen diff rather than recomputing a fresh one, so the changes a reviewer approved are the changes applied. If the state has changed since the plan was saved, Terraform rejects the plan as stale instead of applying it, and the change has to be re-planned and re-reviewed. Recompute the plan at apply time and you might apply something nobody saw.

.github/workflows/terraform.yml — plan on PR, apply on merge
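jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    permissions: { id-token: write, contents: read, pull-requests: write }
    steps:
      # Check out the repository so Terraform can see the configuration.
      - uses: actions/checkout@v4
      # Install the Terraform CLI on the runner.
      - uses: hashicorp/setup-terraform@v3
      # Exchange the job's OIDC token for short-lived credentials on the plan role.
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/terraform-plan
          aws-region: us-east-1
      - run: terraform init
      - run: terraform fmt -check
      - run: terraform validate
      # Save the plan so the exact reviewed diff can be applied later.
      - run: terraform plan -out=tfplan
      - uses: actions/upload-artifact@v4
        with: { name: tfplan, path: tfplan }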

The plan job runs only on pull requests, assumes a plan-only IAM role through OIDC, and uploads the saved plan as an artifact. A separate apply job — gated on merge and assuming a role with write permissions — downloads that artifact and runs terraform apply tfplan, applying exactly the reviewed diff.
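The apply job described above might look roughly like the following sketch. The role name terraform-apply and the PLAN_RUN_ID variable are assumptions for illustration, not part of the original workflow. Artifacts belong to the workflow run that uploaded them, so a job triggered by the merge has to name the PR run it downloads from, and how that run ID is resolved is left out here.

```yaml
# Sketch of a gated apply job; names marked "assumed" are illustrative.
jobs:
  apply:
    # Run only when a commit lands on main, that is, after the PR is merged.
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    # actions: read lets the job download an artifact from another workflow run.
    permissions: { id-token: write, contents: read, actions: read }
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/terraform-apply  # assumed role name
          aws-region: us-east-1
      # PLAN_RUN_ID is hypothetical: it is assumed to be exported by an earlier
      # step (not shown) that looked up the PR run which uploaded the plan.
      - uses: actions/download-artifact@v4
        with:
          name: tfplan
          run-id: ${{ env.PLAN_RUN_ID }}
          github-token: ${{ github.token }}
      - run: terraform init
      # Applying a saved plan file does not prompt for confirmation.
      - run: terraform apply tfplan
```

Keeping the write-capable role in a separate job means the PR job's role never needs more than read access to the infrastructure.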

OIDC Authentication

Static AWS access keys stored as CI secrets are long-lived credentials sitting in a place many people can reach; one leak and an attacker has standing access. OIDC removes them: the CI provider issues a short-lived signed token, AWS trusts that token's issuer through an IAM OIDC provider, and the job assumes an IAM role for temporary credentials scoped to that workflow. There is no key to rotate, leak, or forget — the credentials live for minutes and exist only inside the job.

trust policy — let a specific GitHub repo assume the role via OIDC
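resource "aws_iam_role" "terraform_plan" {
  name = "terraform-plan"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      # The GitHub OIDC provider resource is assumed to be defined elsewhere.
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        # Accept only tokens minted for AWS STS, the audience the
        # configure-aws-credentials action requests by default.
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        # Accept only tokens issued to workflows in the acme/infra repository.
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:acme/infra:*"
        }
      }
    }]
  })
}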

The trust policy's sub condition limits who can assume the role to one repository's tokens: only workflows in acme/infra qualify. You can tighten it further to a specific branch or environment, for example repo:acme/infra:ref:refs/heads/main or repo:acme/infra:environment:production (the environment name here is illustrative). No access key appears anywhere in the pipeline.

Remote State in CI

Every job — plan and apply — runs terraform init against the same S3 backend with the same locking. That is what keeps CI and laptops from diverging: there is one state, and whoever holds the lock is the only writer at that moment. If CI used its own copy of state, a plan would be computed against a different reality than the one a developer sees, and the approved diff would be meaningless.
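Because every job contends for the same lock, a CI run can start while another run or a teammate still holds it. A minimal sketch of steps that wait for the lock rather than failing immediately is shown below. The five-minute timeout is an arbitrary illustrative value, and the backend itself is assumed to be configured in the Terraform code.

```yaml
# Sketch: plan steps that wait for the shared state lock.
steps:
  # -input=false stops Terraform from waiting on interactive prompts in CI.
  - run: terraform init -input=false
  # -lock-timeout makes Terraform keep retrying the state lock for up to 5 minutes.
  - run: terraform plan -input=false -lock-timeout=5m -out=tfplan
```

The same -lock-timeout flag is accepted by terraform apply, so the apply job can wait in the same way.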

Common Platforms

GitHub Actions and GitLab CI are the usual hosts for this pipeline: you write the workflow yourself, which gives full control. Atlantis is a self-hosted service that automates the plan-on-PR and apply-on-comment flow specifically for Terraform, so you do not hand-write the orchestration. HCP Terraform offers managed remote runs with the same shape built in. All of these options implement the same pattern: plan on the PR, then a gated apply on merge or approval.
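For comparison with a hand-written workflow, a repository-level Atlantis configuration can be as short as the sketch below. The project name and file patterns are illustrative, and it assumes the Atlantis server's own configuration permits repositories to override apply_requirements.

```yaml
# atlantis.yaml at the repository root (sketch; project name and patterns are illustrative)
version: 3
projects:
  - name: infra
    dir: .
    # Plan automatically when Terraform files in this directory change.
    autoplan:
      enabled: true
      when_modified: ["*.tf", "*.tfvars"]
    # Require PR approval before an apply comment is accepted. This setting is
    # honored only if the server-side repo config allows overriding it.
    apply_requirements: [approved]
```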

Common Mistakes
  • Letting engineers apply from their laptops alongside CI, so changes bypass review and you can no longer say who wrote the current state.
  • Generating a plan on the PR and a fresh, different plan on apply, applying a diff nobody reviewed because reality or config shifted in between.
  • Storing static AWS access keys as CI secrets instead of assuming a role through OIDC, leaving long-lived credentials to leak.
  • Running apply automatically on every PR push instead of gating it on merge and required review, so unreviewed changes reach the environment.
  • Skipping fmt -check and validate on the PR, letting formatting churn and reference errors land and only surface at apply time.

Best Practices
  • Run fmt -check, validate, and plan on every pull request and gate apply on merge with a required review.
  • Persist the plan with plan -out as an artifact and run apply on that exact artifact at merge, so you apply precisely what was reviewed.
  • Authenticate CI through OIDC-assumed roles scoped to the repository, never stored static keys.
  • Make CI the only thing that applies to shared environments, removing laptop write access to their state.
  • Give the plan job a read-only IAM role and the apply job a separate write role, so a PR run can never mutate infrastructure.

Comparable tools
  • Atlantis: automates plan-on-PR and apply-on-comment
  • HCP Terraform: managed remote runs with the same shape
  • Pulumi: the same pipeline pattern with its own CLI

Knowledge Check

Why save the plan with plan -out=tfplan on the PR and run apply tfplan on merge?

  • Apply executes the frozen, reviewed diff rather than recomputing a fresh one that nobody saw
  • It is the only way to run a Terraform apply without any network access to the remote backend
  • It encrypts the saved plan so that reviewers are unable to read any sensitive values in it
  • It skips the state refresh phase entirely so that the apply step runs noticeably faster

What does OIDC authentication replace in a Terraform CI pipeline?

  • Long-lived static AWS access keys stored as CI secrets, with short-lived role-assumed credentials
  • The remote S3 backend, by storing the state file inside the CI provider's own managed storage instead
  • The plan step, by cryptographically proving the diff without ever running Terraform itself
  • State locking, by serializing every concurrent run through the external identity provider

Why do all CI jobs run init against the same S3 backend?

  • So CI and laptops never diverge — there is one state and one lock, and the approved plan reflects real state
  • Because the S3 backend is the only backend type that supports OIDC role-assumed credentials at all, full stop
  • To let each individual job keep its own private isolated copy of the state file for safety
  • Because fmt -check simply cannot run at all without a remote backend already configured for it

What goes wrong if engineers keep applying from their laptops alongside the CI pipeline?

  • Changes bypass review and the state writers become unpredictable, so nobody can say who produced the current state
  • The S3 backend permanently stops accepting any further writes coming in from the CI pipeline
  • The pipeline's OIDC tokens are immediately invalidated whenever a laptop runs an apply
  • Nothing at all — laptop and CI applies are completely equivalent as long as both of them use the same remote state
