Backups, Migration, and Persistence Patterns
A named volume keeps data alive across container churn, but it does nothing to protect against a deleted volume, a corrupt database, or a dead host — that's what backups are for, and Docker has no built-in backup command. The standard pattern is a throwaway helper container that mounts the volume and tars its contents out to the host. The same trick moves a volume between hosts.
Underneath all of it sits the rule this chapter has been building toward: the image carries the application, the volume carries the data, and the two are versioned and protected by completely different mechanisms. Conflate them and you get bloated images that still don't protect the data. Keep them separate and each gets the treatment it actually needs.
There Is No docker volume backup
Docker will not snapshot a volume for you. There is no docker volume backup command, and looking for one is the first wrong turn. You back a volume up by mounting it into a short-lived container alongside a host directory and copying the data out yourself — which is why the helper-container pattern is the canonical answer rather than a single flag. It feels like a workaround the first time; it is simply how volume backup works.
The tar Backup Pattern, in Words
One disposable container produces one portable archive. It mounts the volume at /data as the source, mounts the current host directory at /backup, tars the contents of /data into a file on the host, and --rm discards the helper the instant it exits. The image is busybox because it is tiny and ships tar; you need nothing heavier.
Diagram: --rm busybox → mount volume + host dir → tar the data → archive on the host

    docker run --rm \
      -v driftwood-db-data:/data \
      -v $(pwd):/backup \
      busybox \
      tar czf /backup/driftwood-db.tar.gz -C /data .
Read it mount by mount: -v driftwood-db-data:/data attaches the volume you're backing up, -v $(pwd):/backup bind-mounts the current host directory so the archive lands on the host, and tar czf /backup/driftwood-db.tar.gz -C /data . changes into /data and tars its contents — the -C matters, because it stores relative paths that restore cleanly into any target volume. The --rm means no stopped helper container is left behind.
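To see what the -C changes, the sketch below archives a scratch directory both ways and lists what tar stored. It runs plain tar on the host with no Docker involved; the scratch directory and file names are invented for illustration and stand in for the volume's /data.

```shell
#!/bin/sh
# Illustration only: a scratch directory stands in for the volume's /data.
work=$(mktemp -d)
mkdir -p "$work/data/base"
echo "rows" > "$work/data/base/table.dat"

# With -C: tar changes into the directory first, so entries are stored
# relative to it (./base/table.dat).
tar czf "$work/relative.tar.gz" -C "$work/data" .

# Without -C: the directory's own path becomes part of every entry name.
# tar may warn on stderr about stripping the leading slash.
tar czf "$work/nested.tar.gz" "$work/data" 2>/dev/null

relative_entries=$(tar tzf "$work/relative.tar.gz")
nested_entries=$(tar tzf "$work/nested.tar.gz")

echo "with -C:";    echo "$relative_entries"
echo "without -C:"; echo "$nested_entries"

rm -rf "$work"
```

The first listing holds paths that start at the archive root, which is what lets the restore drop them straight into any target directory. The second carries the source directory's path inside every entry name, so extracting it recreates that nesting under the target.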
Restore and Migration
Restoring reverses the same shape: a helper container mounts the target volume and the archive, then untars into the volume. Migrating a volume to another host is just back up, copy the tarball across, restore on the far side — the volume becomes a portable artifact independent of any one machine.
    docker volume create driftwood-db-data

    docker run --rm \
      -v driftwood-db-data:/data \
      -v $(pwd):/backup \
      busybox \
      tar xzf /backup/driftwood-db.tar.gz -C /data
Create the target volume, mount it at /data and the directory holding the archive at /backup, and untar into /data. Because the archive stored relative paths, the contents land at the volume root regardless of which host or which volume name you're restoring into — which is exactly what makes the same two commands a migration tool, not just a backup tool.
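One way to make the backup and restore pair reusable for a migration is to wrap the two commands as shell functions. This is a minimal sketch, not the chapter's own tooling: the function names, the argument order, and the host name in the usage comment are all hypothetical.

```shell
#!/bin/sh
# Sketch: the two helper-container commands wrapped as functions.
# Function names and argument order are illustrative assumptions.

# backup_volume VOLUME DIR ARCHIVE
# Tars the contents of VOLUME into DIR/ARCHIVE on the host.
# DIR must be an absolute path so Docker treats it as a bind mount.
backup_volume() {
    docker run --rm \
        -v "$1":/data \
        -v "$2":/backup \
        busybox \
        tar czf "/backup/$3" -C /data .
}

# restore_volume VOLUME DIR ARCHIVE
# Creates VOLUME (a no-op if it already exists) and untars DIR/ARCHIVE into it.
restore_volume() {
    docker volume create "$1" >/dev/null &&
    docker run --rm \
        -v "$1":/data \
        -v "$2":/backup \
        busybox \
        tar xzf "/backup/$3" -C /data
}

# Usage (new-host.example is a placeholder):
#   backup_volume driftwood-db-data "$(pwd)" driftwood-db.tar.gz
#   scp driftwood-db.tar.gz user@new-host.example:
#   # then, on the new host, from the directory holding the archive:
#   restore_volume driftwood-db-data "$(pwd)" driftwood-db.tar.gz
```

Taking the volume name as an argument keeps the same functions usable when the target volume is named differently from the source.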
Quiesce the Database First
Tarring a live Postgres data directory can capture a torn, mid-write state (pages half-flushed, the write-ahead log out of sync with the heap), and you only discover it's unrestorable when the restored database won't start. For a database, take a logical dump with pg_dump, or stop or quiesce the container before archiving the volume, so the backup is consistent rather than merely present. A backup that restores into a broken database is not a backup; it's a false sense of security with a timestamp on it.
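Both options can be sketched as shell functions. The container name, database user, and database name are passed in as arguments because the chapter does not fix them; the function names are hypothetical, and password handling is left out on the assumption that local connections inside the container are allowed.

```shell
#!/bin/sh
# Sketch of two consistent-backup options. Function names and the argument
# values callers pass in are assumptions, not values from the chapter.

# Option 1: logical dump.
# dump_database CONTAINER DB_USER DB_NAME OUTPUT_FILE
# Runs pg_dump inside the running database container and writes a
# custom-format (-Fc) dump to OUTPUT_FILE on the host.
dump_database() {
    docker exec "$1" pg_dump -U "$2" -Fc "$3" > "$4"
}

# Option 2: cold archive.
# cold_backup CONTAINER VOLUME DIR ARCHIVE
# Stops the container so the database can shut down before the copy,
# tars the volume with the helper container, then starts the container again.
# DIR must be an absolute host path.
cold_backup() {
    docker stop "$1" >/dev/null || return 1
    docker run --rm \
        -v "$2":/data \
        -v "$3":/backup \
        busybox \
        tar czf "/backup/$4" -C /data .
    status=$?
    docker start "$1" >/dev/null   # restart even if the archive step failed
    return "$status"
}
```

The logical dump keeps the database online but restores through Postgres tooling rather than by untarring; the cold archive restores with the same tar helper but costs downtime for the length of the copy.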
Data Belongs in a Volume, Not the Image
Never docker commit a container's data into an image, and never COPY a database into a Dockerfile. The image is the application artifact; the volume is the data, backed up out-of-band on its own schedule. Baking data into the image balloons it, ships the whole dataset on every push, and still leaves the data unprotected against the failures backups exist for. Keep the boundary clean — application in the image, data in the volume — and each is versioned and recovered the right way. That boundary is the through-line of this whole chapter, and it carries straight into the next one, where the Driftwood stack starts talking over a network.
- Trusting that a named volume is the backup: it survives container removal but not docker volume rm, a disk failure, or compose down -v; without an out-of-band copy, one wrong command ends the data.
- Tarring a running Postgres volume and discovering on restore that the snapshot is a torn, mid-transaction state. Quiesce or pg_dump first so the archive is actually consistent.
- Baking the database into the image via docker commit or COPY data/ /var/lib/postgresql/data: the image balloons, every push ships the whole dataset, and the data still isn't backed up anywhere durable.
- Forgetting --rm on the helper container and accumulating stopped backup containers on every run, or omitting -C /data and tarring absolute paths that won't restore cleanly to a different volume.
- Storing backups on the same host, or the same disk, as the volume, so a host failure takes both; the archive has to live somewhere else to count as a backup.
- Back up named volumes with a --rm helper container that tars the data to the host, then move the archive off the host, so a lost machine doesn't take the only copy.
- Take a consistent database backup, pg_dump or a quiesced volume, rather than a tar of a live data directory, so the restore actually works.
- Keep the boundary clean: application in the image, data in the volume, each versioned and protected on its own track, never committed or COPYed together.
- Test a restore on a fresh volume on a clean host periodically, because an untested backup is an assumption, not a recovery plan.
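The off-host copy and the periodic restore test can both be backed by cheap integrity checks. The sketch below is one possible set of helpers, assuming sha256sum is available on both machines; the function names are hypothetical. A passing check means the archive arrived intact and is readable, not that the database inside it will start, so it complements a real restore test rather than replacing it.

```shell
#!/bin/sh
# Sketch: integrity checks around the off-host copy.
# Assumes tar and sha256sum are installed; function names are illustrative.

# archive_is_readable FILE
# Succeeds only if tar can decompress FILE and walk every entry.
archive_is_readable() {
    tar tzf "$1" >/dev/null 2>&1
}

# write_checksum FILE
# Records a SHA-256 in FILE.sha256 next to the archive, before copying.
write_checksum() {
    ( cd "$(dirname "$1")" &&
      sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# check_checksum FILE
# After the copy, confirms FILE still matches FILE.sha256.
check_checksum() {
    ( cd "$(dirname "$1")" &&
      sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}

# Usage:
#   write_checksum "$(pwd)/driftwood-db.tar.gz"
#   # copy both the archive and the .sha256 file off the host, then there:
#   check_checksum "$(pwd)/driftwood-db.tar.gz" &&
#       archive_is_readable "$(pwd)/driftwood-db.tar.gz"
```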
--rm tar helper pattern over the same volume layout
Velero · CSI snapshots: orchestration-driven volume snapshots instead of hand-run tar (Ch12 topic 76)
pg_dump · pg_basebackup · cloud disk snapshots: the consistency-aware alternatives to a raw filesystem tar
Knowledge Check
Why isn't a named volume on its own a backup?
- It survives container removal but not docker volume rm, a disk failure, or compose down -v; there's no second copy
- Docker silently snapshots every named volume to a hidden directory on a fixed nightly schedule and rotates the copies, so the live volume is itself a complete restorable backup with no extra step
- It is strictly read-only, so it can't hold a current copy of the data to restore from later
- It is wiped every time the container restarts, so nothing ever accumulates to restore from
In the --rm tar helper command, what does each of the two volume mounts do?
- One attaches the volume being backed up at /data; the other bind-mounts the host directory at /backup where the archive lands
- Both are tmpfs mounts, so the entire backup is built in RAM first before being written out
- Both mount the very same driftwood-db-data volume twice, at /data and /backup, so that tar can read the source and write the finished archive back into the same volume in a single pass
- One mounts the volume and the other one pulls the busybox image down from a registry
Why does a database need a quiesced or logical backup rather than a tar of its live data directory?
- A live tar can capture a torn, mid-write state that won't restore; pg_dump or quiescing first makes it consistent
- A plain tar simply can't handle a Postgres data directory larger than a few gigabytes in one pass, so it silently truncates the archive partway through and the backup ends up incomplete
- tar cannot compress dense database files, so the resulting backup would be far too large to move
- The volume simply can't be mounted by a second helper container while the database is running
Why should application data never be baked into the image with docker commit or COPY?
- The image is the application and data belongs in a volume; baking it in bloats the image, ships the dataset on every push, and still doesn't protect it
- Committing the data into the image with docker commit is a perfectly good backup on its own that needs no out-of-band copy, since pushing the image to a registry safely stores a durable second copy of the dataset for free
- Images encrypt all of their contents by default, so the baked-in data would be unreadable when restored
- Docker outright blocks any Dockerfile that tries to copy data into a database directory