CI/CD for a Web Application

Case Study

A small team ships a web app (a Node frontend and a Go service living in one monorepo) to a handful of paying customers. Every change goes through a PR; merging to main should put it in front of internal testers within minutes, and a tagged release should promote it to production once a human approves. The team has exactly one platform engineer, so the pipeline has to run itself: no manual deploy steps, no long-lived cloud keys sitting in a secrets store waiting to leak.

The design choices here are about trust and identity as much as automation. What does each stage gate? Which credential can reach production, and for how long? The decisions that matter — build once and promote the same artifact, scope production secrets to an approved environment, and authenticate to the cloud with short-lived OIDC tokens — are the ones that keep a fast pipeline from becoming a fast way to ship the wrong bits or leak a key.

Build once, promote the same digest
  • PR gate: checks must pass
  • Build once: image @ digest
  • Deploy staging: on merge
  • Promote to prod: same digest · gated · OIDC

The PR Gate

Every pull_request runs the same checks: lint, type-check, unit tests, and a build. The point of the gate is that nothing reaches main without passing it, enforced by a required status check in branch protection. The one piece teams forget is concurrency — without it, a developer who force-pushes three times queues three full builds, and the runners spend their day on stale commits.

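on: pull_request
# Cancel superseded runs for the same PR so runners skip stale commits.
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run lint && npm run typecheck && npm test
      # The gate also includes a build; assumes package.json defines a "build" script.
      - run: npm run build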

Build Once, Promote the Same Artifact

The cardinal rule of a deploy pipeline is that the artifact you tested is the artifact you ship. Build the container image one time, push it to a registry, and refer to it everywhere by its immutable digest — not a mutable tag like latest that can point at different bytes tomorrow. Staging deploys that digest; production deploys the same digest. The moment you rebuild per environment, "it passed in staging" stops meaning anything, because production is running a different build.

Promotion is therefore a deploy of a known digest, never a fresh build:

      - run: |
          IMAGE=registry.example.com/app@${{ needs.build.outputs.digest }}
          deploy --image "$IMAGE" --env production
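
The promotion step above reads needs.build.outputs.digest, which presumes a build job that publishes the digest as a job output. A minimal sketch of such a job follows, assuming the image is built with docker/build-push-action (which exposes a digest output). The registry host is the one used above; the secret names REGISTRY_USER and REGISTRY_TOKEN are hypothetical placeholders, not names from the original pipeline.

```yaml
  build:
    runs-on: ubuntu-latest
    outputs:
      # Expose the pushed image digest (sha256:...) to downstream jobs.
      digest: ${{ steps.push.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: registry.example.com
          username: ${{ secrets.REGISTRY_USER }}    # hypothetical secret name
          password: ${{ secrets.REGISTRY_TOKEN }}   # hypothetical secret name
      - id: push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # The tag is only a human-readable label; deploys reference the digest.
          tags: registry.example.com/app:${{ github.sha }}
```

Every later job that lists build under needs can then refer to the image as registry.example.com/app@ followed by that digest, so no stage has to rebuild it.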

Environments and Protection Rules

Staging deploys automatically on merge to main. Production is a GitHub Environment with required reviewers and a wait timer, and — this is the part that does the security work — the production deploy credentials are scoped to that environment. A workflow job that does not target the production environment cannot read its secrets at all, so a careless or malicious job elsewhere in the repo has no path to the production credential.

  deploy-prod:
    needs: build
    environment: production   # gates the job and its secrets behind required reviewers
    runs-on: ubuntu-latest
    steps:
      - run: deploy --image "registry.example.com/app@${{ needs.build.outputs.digest }}" --env production
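
For contrast, the automatic staging deploy might look like the sketch below. It assumes the workflow is triggered by pushes to main, that an environment named staging exists with no required reviewers, and that the same placeholder deploy command accepts --env staging. None of these details are spelled out in the case study.

```yaml
on:
  push:
    branches: [main]

jobs:
  # build: builds and pushes the image once and exposes outputs.digest (not shown)
  deploy-staging:
    needs: build
    environment: staging      # assumed environment name; no approval configured
    runs-on: ubuntu-latest
    steps:
      # Same digest that production will later receive; nothing is rebuilt here.
      - run: deploy --image "registry.example.com/app@${{ needs.build.outputs.digest }}" --env staging
```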

OIDC Instead of Stored Keys

The deploy job needs to authenticate to the cloud, and the lazy way is a long-lived access key stored as a repo secret. That key is a standing liability: it works from anywhere, never expires on its own, and one leak compromises the whole account. OIDC federation replaces it. The job requests id-token: write, GitHub's OIDC provider issues it a short-lived signed token, and a cloud-side trust policy accepts that token only when its claims identify this repo and this environment. There is no static credential to leak: the token expires within minutes, and the cloud credentials it is exchanged for are temporary too.

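    permissions:
      id-token: write   # lets the job request a GitHub OIDC token
      contents: read    # still needed for checkout once permissions are set explicitly
    steps:
      # Exchanges the OIDC token for temporary AWS credentials; no stored key.
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/deploy   # placeholder account ID
          aws-region: us-east-1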
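
The cloud-side half of that trust could be sketched as follows. This assumes AWS, that the role is defined in a CloudFormation template, and that GitHub's OIDC provider (token.actions.githubusercontent.com) is already registered in the account. The repository name example-org/example-app is a hypothetical placeholder.

```yaml
Resources:
  DeployRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: deploy
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: sts:AssumeRoleWithWebIdentity
            Principal:
              Federated: !Sub "arn:aws:iam::${AWS::AccountId}:oidc-provider/token.actions.githubusercontent.com"
            Condition:
              StringEquals:
                # Audience set by the AWS credentials action.
                "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
                # Only jobs in this repo that target the production environment.
                "token.actions.githubusercontent.com:sub": "repo:example-org/example-app:environment:production"
```

Because the subject claim includes the environment name, a job that does not declare environment: production presents a different subject and the role assumption should be refused.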

Rollback and the Health Gate

Because every deploy is a digest, rollback is trivial: redeploy the previous known-good digest. There is no reverting commits and rebuilding under pressure — the old artifact still exists in the registry, and pointing production back at it is one command. The other half is not declaring victory when the pipeline goes green. A passing pipeline means the rollout command succeeded, not that the app is healthy. A post-deploy smoke test against the live environment is what turns "deployed" into "working," and its failure is what triggers the rollback.
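
One way the health gate and rollback might be wired into the deploy job is sketched below. The health URL, the /healthz path, and the LAST_GOOD_DIGEST configuration variable are all assumptions made for illustration; how the previous known-good digest gets recorded is left open.

```yaml
    steps:
      - name: Deploy the tested digest
        run: deploy --image "registry.example.com/app@${{ needs.build.outputs.digest }}" --env production
      - name: Smoke test the live environment
        # --fail turns HTTP error statuses into a non-zero exit code.
        run: curl --fail --silent --show-error --retry 5 --retry-delay 5 https://app.example.com/healthz
      - name: Roll back to the previous known-good digest
        # Runs only when an earlier step (the deploy or the smoke test) failed.
        if: failure()
        run: deploy --image "registry.example.com/app@${{ vars.LAST_GOOD_DIGEST }}" --env production
```

The job still ends red after a rollback, which is the desired signal: production is healthy again, but the release did not go out.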

Speed Budget

A pipeline slower than about five minutes gets routed around — developers merge without waiting, or push "fix CI" commits blindly. Dependency caching and build-layer caching keep PR feedback fast enough that the gate stays a gate rather than an obstacle people learn to ignore.
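
A sketch of both kinds of caching in a PR job, assuming npm for the frontend and a Docker build driven by Buildx with the GitHub Actions cache backend. The Node version is an assumption, and the steps are illustrative rather than the team's actual workflow.

```yaml
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20   # assumed version
          cache: npm         # restores the npm download cache keyed on the lockfile
      - run: npm ci
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false        # PR runs only verify that the image builds
          cache-from: type=gha
          cache-to: type=gha,mode=max   # also cache intermediate layers
```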

Stored cloud keys vs OIDC federation

Stored keys — a long-lived access key sits in repo secrets, usable by any workflow that can read them, from anywhere, until someone rotates it. One leak compromises the whole cloud account, and the blast radius is every environment the key can reach.

OIDC federation — the deploy job mints a short-lived token per run via id-token: write, accepted by a cloud trust policy scoped to one repo and one environment. Nothing static exists to leak, the token expires in minutes, and the trust is bounded to exactly the job that needs it.

Common Mistakes
  • Rebuilding the artifact separately for staging and production, so the thing you tested is not the thing you shipped and "passed in staging" stops meaning anything.
  • Putting production deploy credentials in plain repo secrets every workflow can read, so any job in the repo, including one triggered through pull_request_target by a fork's pull request, has a path to production.
  • Omitting a concurrency group on deploys, so two merges race and the older deploy overwrites the newer one in production.
  • Writing tests that hit live shared infrastructure, making PR runs flaky and order-dependent and training the team to re-run until green.
  • Treating a green pipeline as "deployed" with no smoke check, so a successful rollout of a broken build goes unnoticed.

Best Practices
  • Build the deployable artifact once and promote it across environments by immutable digest.
  • Configure production as a protected Environment with required reviewers and scope its secrets to that environment.
  • Authenticate to the cloud with OIDC, granting the deploy job id-token: write, and delete static cloud keys from secrets.
  • Set a concurrency group per environment so only one deploy runs at a time.
  • Add a post-deploy smoke test and make rollback a one-command redeploy of the prior digest.
  • Cache dependencies and build layers to keep PR feedback under about five minutes.
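
The per-environment concurrency practice above can be sketched on the production job like this. The group name deploy-production is an arbitrary choice, and the deploy command is the same placeholder used earlier.

```yaml
  deploy-prod:
    needs: build
    environment: production
    concurrency:
      group: deploy-production     # one group per environment
      cancel-in-progress: false    # let a running deploy finish; the next one waits
    runs-on: ubuntu-latest
    steps:
      - run: deploy --image "registry.example.com/app@${{ needs.build.outputs.digest }}" --env production
```

With cancel-in-progress set to false, a second deploy queued in the same group stays pending until the running one completes, so deploys land in order instead of racing.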

Comparable tools
  • GitLab CI/CD: environments and protected variables
  • CircleCI: contexts and OIDC for cloud auth
  • Argo CD / Flux: GitOps-style digest promotion

Knowledge Check

Why promote the same artifact by digest instead of rebuilding per environment?

  • A rebuild can produce different bytes, so the build tested in staging is not the one shipped to production
  • Rebuilding per environment is slower but produces a byte-for-byte identical image, so it is equally safe
  • GitHub Environments reject any deploy that references an image by a mutable tag instead of a digest
  • A registry enforces a one-build-per-commit limit, so digest promotion is the only image it will keep

What does a GitHub Environment's required reviewers actually gate?

  • A job targeting that environment, including access to the secrets scoped to it, until a reviewer approves
  • Whether a contributor is allowed to open a pull request against the repo at all, holding the PR in a pending state until one of the listed reviewers signs off
  • Whether the upstream build job is permitted to run before the deploy stage begins compiling the artifact
  • The merge of the pull request into the protected main branch once required status checks pass

What failure mode does OIDC federation eliminate?

  • A long-lived stored cloud key that works from anywhere and compromises the account if leaked
  • Flaky tests that hit shared infrastructure and pass or fail by ordering, so a clean run depends on which job happened to touch the fixture first
  • Two merges racing their deploys into the same production environment, where the slower run lands last and overwrites the newer release
  • A pipeline whose PR feedback runs longer than the five-minute budget

Why does a deploy job need a concurrency group?

  • Without it two merges race and the older deploy can finish last and overwrite the newer one
  • It is the permission that lets the job mint its OIDC token to exchange for short-lived cloud credentials at deploy time
  • It caches the build layers reused between successive runs so the next deploy skips recompiling unchanged images
  • It blocks pull requests from forks from reaching the deploy and exposing the environment secrets
