Topic 54

LVM

Volumes · Snapshots

LVM, the Logical Volume Manager, inserts a layer of indirection between physical disks and the filesystems that live on them. Instead of formatting a partition directly, you mark disks or partitions as physical volumes (PVs), pool one or more of them into a volume group (VG), and then carve that pool into logical volumes (LVs) that the filesystem sits on. The VG is a bucket of fixed-size chunks called extents (4 MiB by default), and an LV is just a named set of extents — which is why an LV can grow, shrink, move to a different disk, or span several disks while the filesystem above it barely notices.
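
To make the extent arithmetic concrete, here is a minimal sketch that converts an LV size into an extent count. The helper name extents_for_gib is made up for illustration; the 4 MiB default comes from the text above, and the actual extent size of a VG is reported by vgdisplay in the PE Size field.

```shell
# Hypothetical helper: how many extents does an LV of a given size need?
# $1 = LV size in GiB, $2 = extent size in MiB (defaults to LVM's 4 MiB)
extents_for_gib() {
    size_gib=$1
    extent_mib=${2:-4}
    # 1 GiB = 1024 MiB; round up so a partial extent still counts as one
    echo $(( (size_gib * 1024 + extent_mib - 1) / extent_mib ))
}

extents_for_gib 100      # 100 GiB at 4 MiB per extent -> 25600
extents_for_gib 100 32   # same LV in a VG with 32 MiB extents -> 3200
```

Growing an LV by 50 GiB in a default VG therefore means handing it 12800 more extents, wherever in the VG they happen to be free.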

That indirection is the whole point. A raw partition's size is fixed in the partition table the moment you create it; growing it means juggling adjacent free space that usually isn't there. An LV grows by handing it more extents from the VG, online, with the filesystem still mounted. The cost is one more layer to understand and back up — and a class of failure modes (full snapshots, overcommitted thin pools) that do not exist on plain partitions. On Debian and Ubuntu the tooling is the lvm2 package and the pv*/vg*/lv* command families; the same commands ship identically on RHEL-family systems.

The PV, VG, LV Stack

The three layers stack in a fixed order. You initialize a block device as a PV with pvcreate — this writes LVM metadata to the front of the device and nothing else. One or more PVs become a VG with vgcreate; the VG is the allocation pool, measured in physical extents. From the VG you allocate LVs with lvcreate, and only then do you put a filesystem on the resulting device, which appears as /dev/<vg>/<lv> (and as /dev/mapper/<vg>-<lv>).

# disk → PV → VG → LV → filesystem
sudo pvcreate /dev/sdb /dev/sdc
sudo vgcreate vg_data /dev/sdb /dev/sdc
sudo lvcreate -L 100G -n lv_app vg_data
sudo mkfs.ext4 /dev/vg_data/lv_app
sudo mount /dev/vg_data/lv_app /srv/app

Inspect each layer with pvs, vgs, and lvs for a one-line summary, or the -display variants (pvdisplay, vgdisplay, lvdisplay) for full detail. Mount LVs by their stable device path or by the filesystem UUID — never by a kernel name like /dev/dm-3, which is assigned in activation order and is not stable across reboots. The LVM device path (/dev/vg_data/lv_app) is stable because it is built from the VG and LV names, so it is safe to put in /etc/fstab; the underlying filesystem UUID is equally safe.
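
The two stable names are related by a simple rule: device-mapper joins the VG and LV names with a single hyphen and doubles any hyphen that is part of either name. The sketch below shows that mapping; the function name mapper_path is hypothetical, and the second call uses made-up names purely to show the escaping.

```shell
# Hypothetical helper: derive the /dev/mapper name for a VG/LV pair.
# device-mapper separates VG and LV with one hyphen, so hyphens that are
# part of a name are escaped by doubling them.
mapper_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

mapper_path vg_data lv_app    # /dev/mapper/vg_data-lv_app
mapper_path vg-data lv-app    # /dev/mapper/vg--data-lv--app
```

Names without hyphens, such as vg_data and lv_app, avoid the doubled-hyphen form entirely, which is one reason underscores are a convenient naming habit.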

Resizing Volumes

Resizing is the feature people install LVM for, and it is two operations, not one: resize the LV, then resize the filesystem on it. Forgetting the second step is the single most common LVM mistake — the LV gets bigger and the filesystem still reports the old capacity. lvextend with -r (--resizefs) runs the filesystem resize for you and is the safe default for growing.

# grow the LV by 50G and the filesystem in one step
sudo lvextend -L +50G -r /dev/vg_data/lv_app

# or do it explicitly, per filesystem type
sudo lvextend -L +50G /dev/vg_data/lv_app
sudo resize2fs /dev/vg_data/lv_app   # ext4
sudo xfs_growfs /srv/app            # XFS: takes the mountpoint, grow-only

Growing is online and low-risk for ext4 and XFS. Shrinking is a different animal: XFS cannot shrink at all, and ext4 must be unmounted, checked, and shrunk with resize2fs before you shrink the LV with lvreduce — reverse that order and you truncate live data. Because the order matters and there is no undo, treat any shrink as a maintenance-window operation with a backup taken first, and prefer growing a fresh volume to shrinking a populated one.
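
Because the order is what matters, it can help to write the shrink sequence down before touching anything. The following is a dry-run sketch only: the function name ext4_shrink_plan and the 80G target are illustrative, and the function prints commands rather than running them.

```shell
# Dry-run sketch: print the ext4 shrink sequence in the only safe order.
# Nothing is executed; review the output before running it by hand.
# Arguments (all illustrative): LV device, mountpoint, new size (e.g. 80G)
ext4_shrink_plan() {
    dev=$1; mnt=$2; size=$3
    echo "umount $mnt"
    echo "e2fsck -f $dev"            # resize2fs requires a forced check first
    echo "resize2fs $dev $size"      # 1. shrink the filesystem
    echo "lvreduce -L $size $dev"    # 2. only then shrink the LV
    echo "mount $dev $mnt"
}

ext4_shrink_plan /dev/vg_data/lv_app /srv/app 80G
```

lvreduce also accepts -r (--resizefs), which performs the filesystem shrink before reducing the LV; the explicit sequence is shown here so the ordering is visible.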

Snapshots

An LVM snapshot is a copy-on-write point-in-time image of an LV. Creating it is instant. The space you give it with -L is reserved from the VG immediately, but it starts out almost empty: it fills only as the original volume is written to, because the pre-change blocks are copied aside into the snapshot's reserved area before they are overwritten. The classic use is a consistent backup source: snapshot the volume, back up the frozen snapshot while the live volume keeps serving writes, then drop the snapshot.

# 10G of CoW space for the snapshot
sudo lvcreate -s -L 10G -n lv_app_snap /dev/vg_data/lv_app
sudo mount -o ro /dev/vg_data/lv_app_snap /mnt/snap
# ... back up /mnt/snap ...
sudo umount /mnt/snap
sudo lvremove vg_data/lv_app_snap

The trap is sizing. If the original volume receives more changed data than the snapshot's reserved space can hold, the snapshot fills and is dropped: it becomes invalid, and any backup reading from it fails. Size the snapshot for the write churn expected during the backup window, monitor it with lvs (the Data% column), and keep snapshots short-lived. A long-lived snapshot on a write-heavy volume also slows the original, because the first write to each block that has not yet been copied triggers a copy-on-write into the snapshot.
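
Monitoring the fill level can be scripted. The sketch below assumes lvs is asked for just the lv_name and data_percent fields; the helper name lvs_over_limit, the 80 percent limit, and the sample figures are invented for the demonstration.

```shell
# Hypothetical helper: read "name data_percent" pairs on stdin and print
# the LVs whose usage is at or above the limit given as $1 (in percent).
# Lines with no percentage (ordinary LVs) have one field and are skipped.
lvs_over_limit() {
    awk -v limit="$1" 'NF >= 2 && $2 + 0 >= limit + 0 { print $1 }'
}

# Real use (needs root and an active VG):
#   sudo lvs --noheadings -o lv_name,data_percent vg_data | lvs_over_limit 80
# Demonstration with made-up sample output:
lvs_over_limit 80 <<'EOF'
  lv_app
  lv_app_snap  83.40
EOF
```

A snapshot that shows up here can be given more room with lvextend, for example lvextend -L +5G vg_data/lv_app_snap, as long as the VG still has free extents.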

Thin Provisioning

Thin volumes invert the allocation model. Instead of reserving extents up front, you create a thin pool and then create thin LVs inside it whose advertised size can exceed the pool's real capacity — overcommitment. Blocks are allocated from the pool only as data is actually written, which makes thin volumes space-efficient for many sparsely-filled volumes and gives much cheaper, faster snapshots than the old CoW kind.

# a 200G pool, then a 500G thin volume backed by it (overcommit)
sudo lvcreate -L 200G -T vg_data/pool0
sudo lvcreate -V 500G -T vg_data/pool0 -n lv_thin

Overcommitment is also the risk. If the thin pool's real space runs out while volumes still think they have room, writes fail and filesystems on those volumes can be corrupted. Unlike a regular full disk, the filesystem had no warning because it believed the space was there. Set thin_pool_autoextend_threshold in /etc/lvm/lvm.conf to a value below 100 (100 disables autoextension) so the pool grows automatically before it fills, with thin_pool_autoextend_percent controlling how much it grows each time. Keep real free extents in the VG for it to grow into, and watch the pool's Data% and metadata usage in lvs. Thin provisioning trades a guaranteed-space model for a monitored one; if you cannot monitor it, do not overcommit.
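
How far a pool is overcommitted is simple arithmetic: the sum of the advertised thin volume sizes divided by the pool's real size. A minimal sketch, with a hypothetical helper name and sizes given in whole GiB:

```shell
# Hypothetical helper: overcommit ratio of a thin pool, in percent.
# $1 = pool size in GiB, remaining args = virtual sizes of its thin LVs in GiB
overcommit_percent() {
    pool=$1; shift
    total=0
    for v in "$@"; do total=$(( total + v )); done
    echo $(( total * 100 / pool ))
}

overcommit_percent 200 500        # the 500G volume on the 200G pool -> 250
overcommit_percent 200 50 50 50   # 150G advertised on 200G -> 75, fully backed
```

Anything above 100 means the pool cannot honor every volume filling up at once, which is exactly the situation that needs autoextend and monitoring.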

Spanning and Striping

Because a VG can hold several PVs, an LV can span more than one disk. The default linear layout fills one PV before moving to the next, so a single LV can be larger than any one disk — useful for capacity, but it gives no performance benefit and widens the failure surface: lose any one PV in a linear span and the whole LV is gone. LVM is not redundancy; that job belongs to RAID underneath it.

# stripe across 2 PVs in 64KiB chunks for parallel throughput
sudo lvcreate -i 2 -I 64 -L 200G -n lv_fast vg_data

Striping (-i for the number of stripes, -I for the stripe size) spreads consecutive stripe-sized chunks (64 KiB in the example) across multiple PVs so reads and writes hit several disks in parallel, raising throughput the way RAID 0 does and inheriting the same risk that a single disk failure destroys the LV. For redundancy plus the flexibility of LVM, the standard server layout is RAID (hardware or mdadm) for the disk-failure protection, with LVM on top of the array for resizing and snapshots. LVM can also build its own RAID LVs with --type raid1, but on Linux servers a dedicated mdadm array under a plain LVM stack is the more common and better-understood arrangement.
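
A simplified model of how striping places data: the offset into the LV is divided into stripe-sized chunks, and chunks rotate round-robin across the stripes. The helper below is illustrative only and ignores how LVM lays the stripes out in extents on each PV.

```shell
# Simplified model of striping: which stripe (PV) serves a given offset?
# $1 = offset into the LV in KiB, $2 = stripe count (-i), $3 = stripe size in KiB (-I)
stripe_for_offset() {
    echo $(( ($1 / $3) % $2 ))
}

stripe_for_offset 0   2 64   # 0: first 64 KiB chunk on the first PV
stripe_for_offset 64  2 64   # 1: next chunk on the second PV
stripe_for_offset 128 2 64   # 0: back to the first PV
```

A large sequential read therefore alternates between both disks, which is where the throughput gain comes from, and also why neither disk holds a usable copy of the data on its own.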

Raw Partitions vs LVM

Raw partitions — the filesystem sits directly on a partition whose size is fixed in the partition table. Simplest possible stack, one less layer to back up or debug, and nothing to overcommit. Choose it for a single-purpose disk you will never resize, or a small boot/EFI partition where flexibility buys nothing.

LVM — a resizable, snapshottable pool between disks and filesystems. Grow volumes online, snapshot for consistent backups, span and stripe across disks. The price is a layer of metadata to back up and failure modes (full snapshots, overcommitted thin pools) that raw partitions never have. Choose it for anything you expect to resize, snapshot, or rearrange — which on a server is almost everything except /boot and the EFI partition.

Common Mistakes

Running lvextend and stopping there: the LV grows but the filesystem still reports the old size. Use lvextend -r, or follow with resize2fs or xfs_growfs; otherwise the new space stays invisible.
Shrinking in the wrong order: reducing the LV with lvreduce before shrinking the ext4 filesystem with resize2fs truncates live data. Shrink the filesystem first, and never try to shrink XFS at all — it cannot be shrunk.
Undersizing an LVM snapshot so the original's write churn overflows the reserved space: the snapshot is dropped, becomes invalid, and the backup reading from it fails mid-run.
Overcommitting a thin pool with no autoextend and no monitoring — the pool fills, writes fail, and filesystems on the thin volumes corrupt because they were never told the space was gone.
Treating a linear or striped span across multiple PVs as if it were resilient — losing one PV destroys the entire LV. LVM spanning is capacity and throughput, not redundancy.
Doing risky operations (pvmove, lvreduce, vgreduce) without backing up LVM metadata first — a botched layout change with no vgcfgbackup on hand can leave the VG unrecoverable.

Best Practices

Grow with lvextend -r so the LV and the filesystem resize in one command, and you can never forget the second half.
Allocate logical volumes smaller than the VG and leave free extents in the pool — grow on demand later instead of claiming everything up front, which is what makes resizing useful.
Size snapshots for the write churn of the backup window, keep them short-lived, and monitor the Data% column in lvs so you catch one before it fills and drops.
If you overcommit a thin pool, set thin_pool_autoextend_threshold in /etc/lvm/lvm.conf and keep real free space in the VG for it to grow into — never overcommit without monitoring.
Put redundancy below LVM: build a RAID array (hardware or mdadm) for disk-failure protection and run LVM on top for resizing and snapshots. Do not rely on a linear span for resilience.
Back up VG metadata before risky changes with vgcfgbackup, and mount volumes by their stable /dev/vg/lv path or filesystem UUID, never by /dev/dm-N.
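
The metadata backup can be wrapped in a small helper so it is never skipped. This is a sketch under stated assumptions: the function name backup_vg_metadata, the DRY_RUN switch, and the /root/lvm-backups directory are all illustrative choices, and by default the function only prints the command.

```shell
# Sketch: back up VG metadata to a dated file before a risky change.
# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0 to run it.
# The /root/lvm-backups directory is an arbitrary example location.
backup_vg_metadata() {
    vg=$1
    stamp=${2:-$(date +%Y%m%d)}
    cmd="vgcfgbackup -f /root/lvm-backups/$vg-$stamp.vg $vg"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$cmd"
    else
        sudo $cmd
    fi
}

backup_vg_metadata vg_data 20240101
```

A file written this way can be fed back with vgcfgrestore -f <file> vg_data if a layout change goes wrong. It restores LVM metadata only, not the data inside the volumes.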

Comparable tools

Windows (Storage Spaces and the older Logical Disk Manager, LDM): pooled disks, resizable volumes, and thin provisioning at the OS layer
ZFS / Btrfs: integrated volume management and filesystem in one, with built-in snapshots and checksums instead of a separate LVM layer
mdadm + partitions: the non-LVM route, with RAID for redundancy and plain partitions, trading resizing and snapshots for a simpler stack

Knowledge Check

You run lvextend -L +50G /dev/vg_data/lv_app and df still shows the old size. Why?

Extending the LV does not touch the filesystem on it — you still need resize2fs/xfs_growfs (or you should have passed -r)
The newly added extents are not usable until the whole volume group is deactivated and then reactivated with vgchange -an followed by vgchange -ay
df caches the filesystem capacity in memory and only refreshes it after a remount or a reboot
An LV cannot be extended while it is mounted, so lvextend silently refused the operation

What is the failure mode of an LVM snapshot whose reserved space is too small for the write churn on the original volume?

Once its copy-on-write reserve fills, the snapshot is dropped and marked invalid, so any backup still reading from it fails
Writes to the original volume block and stall until you manually grow the snapshot's reserved space
The snapshot silently starts overwriting the original volume's newest blocks once its reserve is full, so the origin slowly loses recent data while the snapshot keeps serving
LVM automatically extends the snapshot from free volume-group space with no intervention needed

What is the central risk of thin provisioning that a normally-allocated LV does not have?

Volumes advertise more space than the pool actually has, so the pool can run out while filesystems still think they have room, causing failed writes and corruption
Thin volumes cannot be snapshotted at all, which removes LVM's main point-in-time backup workflow
Thin volumes stay read-only until the pool is manually pre-allocated to its full advertised size
Each write to a thin volume is synchronously mirrored to every other volume sharing the same pool, so write throughput is roughly halved as the number of thin volumes in the pool grows

You build a linear LV spanning two PVs across two physical disks. One disk fails. What happens?

The entire LV is lost — a linear span has no redundancy, so any one failed PV takes down the whole volume
Only the data on the failed PV is lost, while extents on the surviving disk stay fully readable
LVM transparently rebuilds the missing extents from parity blocks it keeps on the surviving PV by default
The LV switches to a degraded read-only mode until the failed disk is replaced, with no data lost
