Topic 38

Backups, Migration, and Persistence Patterns


A named volume keeps data alive across container churn, but it does nothing to protect against a deleted volume, a corrupt database, or a dead host — that's what backups are for, and Docker has no built-in backup command. The standard pattern is a throwaway helper container that mounts the volume and tars its contents out to the host. The same trick moves a volume between hosts.

Underneath all of it sits the rule this chapter has been building toward: the image carries the application, the volume carries the data, and the two are versioned and protected by completely different mechanisms. Conflate them and you get bloated images that still don't protect the data. Keep them separate and each gets the treatment it actually needs.

There Is No docker volume backup

Docker will not snapshot a volume for you. There is no docker volume backup command, and looking for one is the first wrong turn. You back a volume up by mounting it into a short-lived container alongside a host directory and copying the data out yourself — which is why the helper-container pattern is the canonical answer rather than a single flag. It feels like a workaround the first time; it is simply how volume backup works.

The tar Backup Pattern, in Words

One disposable container produces one portable archive. It mounts the volume being backed up at /data, mounts the current host directory at /backup, tars the contents of /data into a file on the host, and --rm discards the helper the instant it exits. The image is busybox because it's tiny and ships tar — you need nothing heavier.

The tar backup pattern: --rm busybox → mount volume + host dir → tar the data → archive on the host

Back up a named volume to a host tarball with a throwaway busybox container

docker run --rm \
  -v driftwood-db-data:/data \
  -v "$(pwd)":/backup \
  busybox \
  tar czf /backup/driftwood-db.tar.gz -C /data .

Read it mount by mount: -v driftwood-db-data:/data attaches the volume you're backing up, -v $(pwd):/backup bind-mounts the current host directory so the archive lands on the host, and tar czf /backup/driftwood-db.tar.gz -C /data . changes into /data and tars its contents — the -C matters, because it stores relative paths that restore cleanly into any target volume. The --rm means no stopped helper container is left behind.
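A quick way to see what -C buys you, without Docker: the sketch below runs the host's own tar, which takes the same czf/tzf flags as busybox tar, against a scratch directory standing in for the volume. The directory and file names are hypothetical.

```shell
#!/bin/sh
set -eu

# Scratch area: $work/data plays the role of the mounted volume
work=$(mktemp -d)
mkdir -p "$work/data/base"
printf 'row\n' > "$work/data/base/table1"

# Like the helper command: change into the volume root and archive "."
tar czf "$work/with-C.tar.gz" -C "$work/data" .

# The mistake: archive the directory itself, so every name starts with data/
(cd "$work" && tar czf without-C.tar.gz data)

echo "with -C:"
tar tzf "$work/with-C.tar.gz"
echo "without -C:"
tar tzf "$work/without-C.tar.gz"
```

The first listing contains names like ./base/table1; the second contains data/base/table1, because the directory name itself went into the archive. In the container, tar czf /backup/driftwood-db.tar.gz /data ends up the same way once tar strips the leading /, so restoring that archive with -C /data would put the files under /data/data instead of at the volume root.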

Restore and Migration

Restoring reverses the same shape: a helper container mounts the target volume and the archive, then untars into the volume. Migrating a volume to another host is just back up, copy the tarball across, restore on the far side — the volume becomes a portable artifact independent of any one machine.

Restore the archive into a fresh volume — the same pattern, reversed

docker volume create driftwood-db-data

docker run --rm \
  -v driftwood-db-data:/data \
  -v "$(pwd)":/backup \
  busybox \
  tar xzf /backup/driftwood-db.tar.gz -C /data

Create the target volume, mount it at /data and the directory holding the archive at /backup, and untar into /data. Because the archive stored relative paths, the contents land at the volume root regardless of which host or which volume name you're restoring into — which is exactly what makes the same two commands a migration tool, not just a backup tool.
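The claim that the archive restores cleanly under any volume name can be checked the same way with host tar. This sketch assumes two scratch directories as stand-ins: src for driftwood-db-data on the old host, and restored for a freshly created, differently named volume on the new one. The file names inside are placeholders, and diff -r confirms the two trees match.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
# Hypothetical stand-ins: "src" plays driftwood-db-data on the old host,
# "restored" plays an empty, differently named volume on the new host
mkdir -p "$work/src/pg_wal" "$work/restored"
printf 'config\n' > "$work/src/postgresql.conf"
printf 'wal segment\n' > "$work/src/pg_wal/000000010000000000000001"

# Back up: -C makes every member name relative to the volume root
tar czf "$work/driftwood-db.tar.gz" -C "$work/src" .

# Restore: untar into the root of the empty target
tar xzf "$work/driftwood-db.tar.gz" -C "$work/restored"

# diff -r prints nothing and exits 0 when both trees match; set -e aborts otherwise
diff -r "$work/src" "$work/restored"
echo "round trip OK"
```

Nothing about the name "restored" appears in the archive, which is why the same backup and restore commands double as a migration path between hosts.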

Quiesce the Database First

Tarring a live Postgres data directory can capture a torn, mid-write state — pages half-flushed, the write-ahead log out of sync with the heap — and you only discover it's unrestorable when the restore won't start. For a database, take a logical dump with pg_dump, or stop or quiesce the container before archiving the volume, so the backup is consistent rather than merely present. A backup that restores into a broken database is not a backup; it's a false sense of security with a timestamp on it.
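One way to take the stop-then-archive route, sketched as a shell function under a few assumptions: the database runs in a container with the hypothetical name driftwood-db, the 60-second stop timeout is an arbitrary choice that gives Postgres room for a clean shutdown before Docker falls back to SIGKILL, and the data mount is read-only because the helper only needs to read.

```shell
#!/bin/sh
set -eu

# Cold backup sketch: stop the database container so Postgres shuts down
# cleanly, archive the now-idle volume, then start the container again.
backup_quiesced() {
  container=$1   # hypothetical, e.g. driftwood-db
  volume=$2      # e.g. driftwood-db-data
  archive=$3     # e.g. driftwood-db.tar.gz, written to the current directory

  # Allow up to 60 seconds for a clean shutdown before Docker kills the process
  docker stop -t 60 "$container"
  # If the archive step fails and set -e exits the shell, still restart the database
  trap 'docker start "$container"' EXIT
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$(pwd)":/backup \
    busybox \
    tar czf "/backup/$archive" -C /data .
  # Success path: clear the safety trap, then restart normally
  trap - EXIT
  docker start "$container"
}

# Usage (hypothetical names):
# backup_quiesced driftwood-db driftwood-db-data driftwood-db.tar.gz
```

The cost is downtime for as long as the tar runs. The logical-dump route avoids that: something like docker exec driftwood-db pg_dump -U postgres -Fc driftwood > driftwood.dump, where the user and database names are again placeholders, streams a consistent dump from the running server.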

Data Belongs in a Volume, Not the Image

Never docker commit a container's data into an image, and never COPY a database into a Dockerfile. The image is the application artifact; the volume is the data, backed up out-of-band on its own schedule. Baking data into the image balloons it, ships the whole dataset on every push, and still leaves the data unprotected against the failures backups exist for. Keep the boundary clean — application in the image, data in the volume — and each is versioned and recovered the right way. That boundary is the through-line of this whole chapter, and it carries straight into the next one, where the Driftwood stack starts talking over a network.

Common Mistakes

Trusting that a named volume is the backup — it survives container removal but not docker volume rm, a disk failure, or docker compose down -v; without an out-of-band copy, one wrong command ends the data.
Tarring a running Postgres volume and discovering on restore that the snapshot is a torn, mid-transaction state — quiesce or pg_dump first so the archive is actually consistent.
Baking the database into the image via docker commit or COPY data/ /var/lib/postgresql/data — the image balloons, every push ships the whole dataset, and the data still isn't backed up anywhere durable.
Forgetting --rm on the helper container and accumulating stopped backup containers on every run, or omitting -C /data so every member name carries a data/ prefix and the restore lands in a nested /data/data directory instead of at the volume root.
Storing backups on the same host — or the same disk — as the volume, so a host failure takes both; the archive has to live somewhere else to count as a backup.

Best Practices

Back up named volumes with a --rm helper container that tars the data to the host, then move the archive off the host, so a lost machine doesn't take the only copy.
Take a consistent database backup — pg_dump or a quiesced volume — rather than a tar of a live data directory, so the restore actually works.
Keep the boundary clean: application in the image, data in the volume, each versioned and protected on its own track, never committed or COPYed together.
Test a restore on a fresh volume on a clean host periodically, because an untested backup is an assumption, not a recovery plan.
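A restore drill along the lines of the last practice might look like this: untar the archive into a scratch volume, list what came back, then remove the volume. The scratch-volume name driftwood-restore-test is hypothetical, and a non-empty listing is only the minimum bar; the fuller test is starting the database on the restored volume.

```shell
#!/bin/sh
set -eu

# Restore drill: untar an archive into a throwaway volume and inspect it.
# Assumes no volume named driftwood-restore-test exists yet; docker volume create
# reuses an existing one, and old files would mix with the restored ones.
restore_drill() {
  archive=$1                    # e.g. driftwood-db.tar.gz in the current directory
  scratch=driftwood-restore-test

  docker volume create "$scratch"
  docker run --rm \
    -v "$scratch":/data \
    -v "$(pwd)":/backup:ro \
    busybox \
    tar xzf "/backup/$archive" -C /data
  # Show what landed at the volume root
  docker run --rm -v "$scratch":/data:ro busybox ls -la /data
  docker volume rm "$scratch"
}

# Usage: restore_drill driftwood-db.tar.gz
```

Because of set -e, a failed untar stops the function before docker volume rm, leaving the scratch volume in place for inspection; remove it by hand once you've looked.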

Comparable tools
Podman: runs the identical --rm tar helper pattern over the same volume layout
Velero · CSI snapshots: orchestration-driven volume snapshots instead of hand-run tar (Ch12 topic 76)
pg_dump · pg_basebackup · cloud disk snapshots: the consistency-aware alternatives to a raw filesystem tar

Knowledge Check

Why isn't a named volume on its own a backup?

It survives container removal but not docker volume rm, a disk failure, or compose down -v — there's no second copy
Docker silently snapshots every named volume to a hidden directory on a fixed nightly schedule and rotates the copies, so the live volume is itself a complete restorable backup with no extra step
It is strictly read-only, so it can't hold a current copy of the data to restore from later
It is wiped every time the container restarts, so nothing ever accumulates to restore from

In the --rm tar helper command, what does each of the two volume mounts do?

One attaches the volume being backed up at /data; the other bind-mounts the host directory at /backup where the archive lands
Both are tmpfs mounts, so the entire backup is built in RAM first before being written out
Both mount the very same driftwood-db-data volume twice, at /data and /backup, so that tar can read the source and write the finished archive back into the same volume in a single pass
One mounts the volume and the other one pulls the busybox image down from a registry

Why does a database need a quiesced or logical backup rather than a tar of its live data directory?

A live tar can capture a torn, mid-write state that won't restore; pg_dump or quiescing first makes it consistent
A plain tar simply can't handle a Postgres data directory larger than a few gigabytes in one pass, so it silently truncates the archive partway through and the backup ends up incomplete
tar cannot compress dense database files, so the resulting backup would be far too large to move
The volume simply can't be mounted by a second helper container while the database is running

Why should application data never be baked into the image with docker commit or COPY?

The image is the application; data belongs in a volume — baking it in bloats the image, ships the dataset on every push, and still doesn't protect it
Committing the data into the image with docker commit is a perfectly good backup on its own that needs no out-of-band copy, since pushing the image to a registry safely stores a durable second copy of the dataset for free
Images encrypt all of their contents by default, so the baked-in data would be unreadable when restored
Docker outright blocks any Dockerfile that tries to copy data into a database directory
