Topic 20

Rolling Updates

Concept

Maya has a new version of Pageturn ready, and right now five copies of the old version are serving readers. Somehow those five copies all need to become the new version. The naive way is to stop all five, then start five new ones — but for the seconds or minutes in between, there are zero copies running, and every reader sees an error page. Taking the whole site down to update it is exactly what teams want to avoid.

The orchestrator offers a better way: swap the copies a few at a time, so there are always healthy copies serving readers while the change happens. That gradual swap is called a rolling update, and it comes with a safety net — if the new version misbehaves, you can roll back to the previous one.

Figure: Replace the copies a few at a time, so the site never drops to zero.
5 old (all serving) → Swap a few (4 old · 1 new) → Keep going (2 old · 3 new) → 5 new (no downtime)

The problem: updating without going offline

Updating software sounds like it should be simple — out with the old, in with the new. The trouble is the gap in the middle. If you remove the old version before the new one is up and serving, there's a window where nothing answers, and a reader who opens Pageturn in that window hits a dead site.

For a site that real people use at all hours, that window is unacceptable. The goal is to change the running version while keeping the site continuously available, so that no reader ever notices an update happened at all.

The rolling update: a few at a time

A rolling update solves this by never touching all the copies at once. In the simplest case, the orchestrator retires one old copy and starts one new copy in its place, waits until that new copy is healthy and serving, then moves to the next; it can also work in small batches. Either way, it rolls through the copies gradually until every one is the new version.
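To make the one-at-a-time swap concrete, here is a minimal Python sketch. It is a toy model, not how any real orchestrator is implemented: the function name `rolling_update` and the version labels `v1` and `v2` are made up for illustration, and the "wait until healthy" step appears only as a comment.

```python
def rolling_update(copies, new_version):
    """Yield the full set of copies after each single-copy swap (toy model)."""
    fleet = list(copies)
    for i, version in enumerate(fleet):
        if version == new_version:
            continue  # this copy is already up to date
        # Retire one old copy and start one new copy in its place.
        fleet[i] = new_version
        # A real orchestrator would wait here until the new copy is healthy.
        yield list(fleet)


# Hypothetical example: five copies of Pageturn moving from "v1" to "v2".
steps = list(rolling_update(["v1"] * 5, "v2"))
for step in steps:
    print(step.count("v1"), "old /", step.count("v2"), "new")
# 4 old / 1 new
# 3 old / 2 new
# 2 old / 3 new
# 1 old / 4 new
# 0 old / 5 new
```

Each printed line is one step of the roll: the mix shifts from old to new, but the list always holds five copies.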

Think of replacing the wheels on a moving train one at a time — the train keeps rolling because the other wheels are always on the track. The update rolls through the copies the same way: because only a few are ever being swapped at once, the rest keep the site moving. (Like any analogy, this one has limits — software copies are swapped whole, not bolt by bolt — but the "never all at once" idea is the part that holds.)

No downtime: always something serving

The payoff is in the count. At the start there are five healthy copies; at the end there are five healthy copies; and at every step in between there are still about five, because only the copy or two being swapped is ever out of service, and the rest are a shifting mix of old and new. The number of copies serving readers never drops to zero, so the site stays up the whole time.
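The count argument can be checked with a little arithmetic. The sketch below compares how many copies are serving over time under the naive approach and under a rolling update. The function names and the `batch` parameter are assumptions for illustration, and the model assumes each old copy is retired before its replacement is up. Real orchestrators can often start the new copy first, which keeps the count even closer to five.

```python
def naive_timeline(n):
    """Serving count when all n copies are stopped, then n new ones started."""
    return [n, 0, n]


def rolling_timeline(n, batch=1):
    """Serving count when copies are swapped `batch` at a time (toy model)."""
    timeline = [n]
    remaining_old = n
    while remaining_old > 0:
        swap = min(batch, remaining_old)
        timeline.append(n - swap)  # `swap` copies are mid-replacement
        remaining_old -= swap
        timeline.append(n)         # the new copies are healthy again
    return timeline


print(min(naive_timeline(5)))             # 0: the site goes dark
print(min(rolling_timeline(5)))           # 4: always something serving
print(min(rolling_timeline(5, batch=2)))  # 3: faster, with a bit less headroom
```

The lowest point in each timeline is what readers feel: zero means an error page, anything above zero means the site stayed up.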

This is what people mean by a zero-downtime deploy: shipping a new version with no visible interruption. It's become a basic expectation for serious services, and the rolling update is the most common way it's done.

Rollback: the undo button

Sometimes the new version turns out to be bad — it's slow, or it breaks a feature that the tests didn't catch. Because the previous version is known to have worked, the orchestrator can reverse course and bring it back, swapping the new copies out for the old ones a few at a time, the same gradual way it rolled them in. That reversal is called a rollback: a fast, deliberate undo to the last good version.
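One way to picture rollback is as a rolling update pointed at the previous version. The sketch below is a simplified illustration under that assumption: the `Deployment` class, its `history` list, and the version labels are hypothetical, not any orchestrator's real API.

```python
class Deployment:
    """Toy model: a set of copies plus a record of versions that were rolled out."""

    def __init__(self, version, replicas):
        self.copies = [version] * replicas
        self.history = [version]  # the last entry is the current version

    def _roll_to(self, version):
        # Swap one copy at a time, as in a rolling update.
        for i in range(len(self.copies)):
            self.copies[i] = version

    def update(self, version):
        self._roll_to(version)
        self.history.append(version)

    def rollback(self):
        if len(self.history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self.history.pop()               # forget the bad version
        self._roll_to(self.history[-1])  # roll back to the last good one


site = Deployment("v1", replicas=5)
site.update("v2")    # the new version turns out to be slow
site.rollback()      # undo: every copy returns to "v1"
print(site.copies)   # ['v1', 'v1', 'v1', 'v1', 'v1']
```

The key design point is the remembered history: rollback is only possible because the previous, known-good version was kept on record.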

Rollback is what makes frequent updates feel safe rather than terrifying — a bad deploy is an inconvenience you can undo in minutes, not a disaster. That idea of shipping carefully and always keeping a way back is the heart of the next strand of this course, deployment, in Chapter 8. And the full mechanics of rolling updates and rollbacks — how to tune them, watch them, and control them — are where the Kubernetes Deep Dive picks up, with you at the controls instead of reading along.

Common Confusions
  • "Every update means the site has to go down." Not with a rolling update. Swapping copies a few at a time keeps healthy copies serving throughout, so readers see no interruption.
  • "A rolling update means the app is updating constantly." No — "rolling" describes how one update moves through the copies gradually. It's a single update that rolls across them, not a never-ending stream of changes.
  • "Once you deploy a new version, you're stuck with it." You're not. Rollback swaps the copies back to the previous, known-good version — the deploy is reversible.
  • "A rollback erases readers' data." Rollback reverts the running app version, not the data. (Data needs its own careful handling, which the deep courses cover; it is flagged here so you don't assume it's automatic.)

Why It Matters
  • Zero-downtime deploys are a baseline expectation for real services — this is how a site updates several times a day without anyone noticing.
  • Rollback is the safety net that lets teams ship often without fear; it turns a bad release from a crisis into a quick undo.
  • Rolling updates and rollbacks set up Chapter 8's deeper look at deployment strategies — this is the first taste of shipping safely.
  • These are the exact mechanics you'll get hands-on with in the Kubernetes Deep Dive, where you control the rollout yourself.

Knowledge Check

What does a rolling update achieve when shipping a new version?

  • It stops all copies, then starts the new version on all of them
  • It swaps the copies a few at a time so the site stays up throughout
  • It makes each copy run faster than the old version did
  • It keeps both the old and the new versions running side by side forever afterward

Why does a rolling update avoid taking the site offline?

  • Because the brand-new copies happen to start up far faster than the old ones ever did before
  • Because readers are briefly paused until the update finishes
  • Because updates only ever happen late at night when no one visits
  • Because there are always healthy copies serving while a few are swapped

A new Pageturn version turns out to be slow and buggy after it's deployed. What is rollback for?

  • Returning to the previous version that is known to work
  • Automatically repairing the bug in the new version
  • Adding more copies of the slow version to speed it up
  • Erasing every bit of the reader data that was created since the new version was deployed
