Software RAID
Software RAID combines two or more block devices into a single logical array that survives disk failure, increases throughput, or both, using the kernel's md (multiple device) driver instead of a dedicated hardware controller. On Debian and Ubuntu you create and manage arrays with the mdadm tool; the array appears as /dev/md0 and you partition, format, or hand it to LVM exactly like a physical disk.
Because the parity and mirroring work happens in the kernel on your CPU, software RAID has no proprietary on-disk format and no controller to fail or to source a matching replacement for. The operational consequence is concrete: a failed disk in a RAID 1 or RAID 5 array keeps serving reads and writes in degraded mode, but you remain responsible for noticing the failure, swapping the disk, and adding the replacement so the rebuild can start. Unless you configure monitoring and a hot spare, none of that happens on its own, and a second failure during a rebuild can lose the whole array.
RAID Levels and Their Trade-offs
The md driver implements the standard levels plus a Linux-specific RAID 10. RAID 0 stripes data across all members for capacity and speed with zero redundancy — one disk lost means the array is gone. RAID 1 mirrors every block to all members, so an N-disk mirror tolerates N−1 failures but gives you only one disk of usable capacity. RAID 5 stripes data with one parity block per stripe and tolerates a single failure; RAID 6 carries two parity blocks and tolerates two.
| Level | Min disks | Usable capacity | Failures tolerated | Typical use |
|---|---|---|---|---|
| RAID 0 | 2 | 100% | 0 | Scratch, caches, rebuildable data |
| RAID 1 | 2 | 1 disk | N−1 | Boot and root volumes |
| RAID 5 | 3 | N−1 disks | 1 | Capacity with some redundancy |
| RAID 6 | 4 | N−2 disks | 2 | Large arrays, long rebuilds |
| RAID 10 | 4 | 50% | 1 per mirror set | Databases, mixed read/write |
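The capacity column of the table can be turned into a small calculation. The following is a minimal sketch, not part of mdadm: `usable_disks` is a hypothetical helper name, and it assumes equal-size members and, for RAID 10, an even member count with two copies of each block.

```shell
# usable_disks LEVEL N  (hypothetical helper, not an mdadm command)
# Prints how many disks' worth of capacity an N-disk array of LEVEL offers,
# following the table above. Assumes all members are the same size.
usable_disks() (
    level=$1
    n=$2
    case $level in
        0)  min=2; usable=$n ;;
        1)  min=2; usable=1 ;;
        5)  min=3; usable=$((n - 1)) ;;
        6)  min=4; usable=$((n - 2)) ;;
        10) min=4; usable=$((n / 2)) ;;   # assumes even N, two copies
        *)  echo "unsupported level: $level" >&2; exit 2 ;;
    esac
    if [ "$n" -lt "$min" ]; then
        echo "RAID $level needs at least $min disks" >&2
        exit 2
    fi
    echo "$usable"
)

# Example: eight 4 TB disks in RAID 6 leave six disks of capacity.
echo "$(( $(usable_disks 6 8) * 4 )) TB usable"   # prints "24 TB usable"
```

Multiplying the result by the size of the smallest member gives the usable space, since the array sizes itself to the smallest device.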
RAID 5 carries a hidden cost called the write penalty: every small write must read the old data and old parity, recompute, and write both back, turning one logical write into four I/O operations. On large arrays of multi-terabyte disks, RAID 6 is preferred over RAID 5 because rebuild times now run for many hours, and the probability of a second disk or an unrecoverable read error appearing during that window is no longer negligible.
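The write penalty can be put into rough numbers. This sketch uses a common rule of thumb (total raw IOPS divided by the physical I/Os each logical write costs); `write_iops` is a hypothetical helper, the per-disk figure of 150 IOPS is an illustrative assumption, and real arrays will deviate because of caching and full-stripe writes.

```shell
# write_iops LEVEL N PER_DISK_IOPS  (hypothetical helper)
# Rule-of-thumb ceiling for small random writes: raw IOPS of all members
# divided by the number of physical I/Os one logical write turns into.
write_iops() (
    level=$1
    n=$2
    per_disk=$3
    case $level in
        0)  penalty=1 ;;
        1)  penalty=$n ;;   # every member receives every write
        10) penalty=2 ;;    # two copies of each block assumed
        5)  penalty=4 ;;    # read data + parity, write data + parity
        6)  penalty=6 ;;    # as RAID 5, with a second parity block
        *)  echo "unsupported level: $level" >&2; exit 2 ;;
    esac
    echo $(( n * per_disk / penalty ))
)

# Four disks at an assumed 150 IOPS each:
write_iops 5 4 150    # prints 150
write_iops 6 4 150    # prints 100
write_iops 10 4 150   # prints 300
```

On the same four disks, the estimate for RAID 10 is twice that of RAID 5, which is the gap the write penalty describes.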
Creating and Inspecting an Array
You assemble an array from whole disks or partitions with mdadm --create. Using GPT partitions of type Linux RAID rather than raw disks makes the members self-describing and avoids confusion when a disk is moved between machines. After creation the kernel begins an initial sync, and the array is usable — though slower — while that sync runs.

    # create a 3-disk RAID 5 array from partitions
    mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1

    # watch the initial sync and overall state
    cat /proc/mdstat
    mdadm --detail /dev/md0

The file /proc/mdstat is the fastest health check: it lists each array, its level, its members, and a status line such as [UU_] where each U is an up member and an underscore is a missing one. mdadm --detail adds the array UUID, the sync percentage, and the per-device roles you need before replacing hardware.
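The status-line check described above is easy to script. The sketch below is one possible approach, with `degraded_arrays` as a hypothetical function name; it reads mdstat-formatted text on standard input and assumes arrays are named `mdN`.

```shell
# degraded_arrays  (hypothetical helper)
# Reads /proc/mdstat-style text on stdin and prints the name of every
# array whose status brackets, such as [UU_], contain an underscore.
# Usage on a live system: degraded_arrays < /proc/mdstat
degraded_arrays() {
    awk '
        # An array header looks like "md0 : active raid5 sdb1[0] ..."
        /^md[0-9]+ :/ { name = $1; next }
        # The first following line with a [UU_]-style field holds the state
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
            name = ""
        }
    '
}
```

Run from cron or a monitoring agent, non-empty output means at least one array is missing a member. This complements `mdadm --monitor` and is not a substitute for it.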
Persistent Assembly and the Boot Path
An array that works after --create is not guaranteed to reassemble under the same name, or early enough to hold the root filesystem, unless its definition is recorded. Write the array's UUID into /etc/mdadm/mdadm.conf (the path is /etc/mdadm.conf on Red Hat and Fedora), then refresh the initramfs so the array can be assembled early enough to mount the root filesystem.

    # append the running array's definition to the config
    mdadm --detail --scan | tee -a /etc/mdadm/mdadm.conf

    # Debian/Ubuntu: rebuild the initramfs so md0 assembles at boot
    update-initramfs -u
    # Red Hat/Fedora equivalent: dracut -f

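Because `tee -a` appends unconditionally, running the scan twice leaves duplicate ARRAY lines in the config. A possible guard is sketched below; `append_missing_arrays` is a hypothetical helper that works on a saved copy of the scan output, and it assumes each ARRAY line carries a `UUID=` field.

```shell
# append_missing_arrays SCAN_FILE CONF_FILE  (hypothetical helper)
# Appends each ARRAY line from SCAN_FILE to CONF_FILE unless a line with
# the same UUID is already present, and prints the lines it added.
append_missing_arrays() (
    scan=$1
    conf=$2
    grep '^ARRAY ' "$scan" | while IFS= read -r line; do
        # Pull the colon-separated hex UUID out of the ARRAY line
        uuid=$(printf '%s\n' "$line" |
            sed -n 's/.*UUID=\([0-9a-fA-F:]*\).*/\1/p')
        [ -n "$uuid" ] || continue
        if ! grep -qF "UUID=$uuid" "$conf" 2>/dev/null; then
            printf '%s\n' "$line" | tee -a "$conf"
        fi
    done
)
```

On a real system you would first save the scan, for example with `mdadm --detail --scan > /tmp/md.scan`, then call `append_missing_arrays /tmp/md.scan /etc/mdadm/mdadm.conf` as root and refresh the initramfs afterwards. The temporary path is only an example.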
Reference the array in /etc/fstab by its filesystem UUID from blkid, never as /dev/md0, because the kernel may enumerate an unconfigured array as /dev/md127 after a reboot. If you forget to update the initramfs on a root-on-RAID system, the machine drops to an initramfs shell because the root device cannot be assembled.
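A quick lint can catch fstab entries that still use the kernel name. This is a small sketch with a hypothetical function name, `md_names_in_fstab`; it only flags raw `/dev/mdN` device fields and ignores comments and UUID-based entries.

```shell
# md_names_in_fstab  (hypothetical helper)
# Reads fstab text on stdin and prints, with its line number, every entry
# whose device field is a raw /dev/mdN kernel name.
# Usage on a live system: md_names_in_fstab < /etc/fstab
md_names_in_fstab() {
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print NR ": " $0 }'
}
```

Each line it reports is a candidate for replacement with the `UUID=` value that `blkid` shows for the filesystem on the array.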
Failure Handling and Rebuilds
When a disk fails, the array continues in degraded mode and the md driver marks the member faulty. You replace it by failing and removing the bad device, partitioning the new disk identically, and adding it back; the rebuild starts automatically. Configure mdadm --monitor with a MAILADDR line in the config so a failure pages you instead of waiting to be discovered.

    # fail, remove, then add the replacement disk
    mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
    mdadm /dev/md0 --add /dev/sde1

    # cap rebuild speed so it does not starve production I/O
    echo 50000 > /proc/sys/dev/raid/speed_limit_max

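Capping the rebuild speed protects production I/O but lengthens the window in which the array is degraded. The arithmetic can be sketched as follows; `rebuild_hours` is a hypothetical helper, it treats the limit as KiB per second sustained across the whole member, and it needs a shell with 64-bit arithmetic.

```shell
# rebuild_hours DISK_BYTES SPEED_KIB_PER_S  (hypothetical helper)
# Best-case rebuild time in whole hours, rounded up, assuming the resync
# runs at the given rate for the entire member with no other load.
rebuild_hours() (
    bytes=$1
    speed=$2
    kib=$(( bytes / 1024 ))
    secs=$(( kib / speed ))
    echo $(( (secs + 3599) / 3600 ))
)

# A 4 TB member (4 * 10^12 bytes) at two different caps:
rebuild_hours 4000000000000 50000    # prints 22
rebuild_hours 4000000000000 200000   # prints 6
```

Under these assumptions a 50000 KiB/s cap stretches a 4 TB rebuild to roughly a day, so it is worth raising the cap again outside peak hours.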
Add a hot spare with --add-spare so a rebuild begins the instant a disk is marked faulty, before anyone reads the alert. Schedule a periodic check scrub — Debian and Ubuntu ship a monthly cron job for this — to read every block and detect silent unrecoverable read errors while the array is still redundant, rather than discovering them mid-rebuild when there is no longer any parity to reconstruct from.
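A scrub can also be started by hand through the md sysfs files. The sketch below wraps the kernel's `sync_action` and `mismatch_cnt` files in two hypothetical helpers; the optional second argument exists only so the functions can be pointed at a different directory tree for testing, and writing to the real files requires root.

```shell
# start_scrub ARRAY [SYSFS_ROOT]  (hypothetical helper)
# Requests a read-only consistency check by writing "check" to the array's
# sync_action file. SYSFS_ROOT defaults to /sys/block.
start_scrub() (
    array=$1
    root=${2:-/sys/block}
    action="$root/$array/md/sync_action"
    [ -w "$action" ] || { echo "cannot write $action" >&2; exit 1; }
    echo check > "$action"
)

# scrub_mismatches ARRAY [SYSFS_ROOT]  (hypothetical helper)
# Prints the mismatch count recorded by the most recent check.
scrub_mismatches() (
    array=$1
    root=${2:-/sys/block}
    cat "$root/$array/md/mismatch_cnt"
)
```

For example, `start_scrub md0` begins the check and its progress appears in /proc/mdstat; once it finishes, `scrub_mismatches md0` shows whether any inconsistencies were found.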
RAID Is Availability, Not Backup
RAID protects against disk hardware failure, and nothing else. It happily mirrors or stripes an rm -rf, a corrupting application bug, a ransomware encryption pass, or a bad write to all members at once. A controller-free mirror also does nothing for fire, theft, or a filesystem that corrupts its own metadata.
Keep independent, versioned, off-host backups regardless of RAID level, and test restores. Treat RAID as an availability mechanism that buys time to replace a disk without downtime, not as a recovery mechanism for data you deleted or corrupted. Parity levels also carry a write-hole risk: if power is lost mid-stripe, data and parity can disagree, which is why a battery-backed cache or a journaled array matters for RAID 5 and 6.
RAID 5 — single parity, survives one failure, gives N−1 disks of capacity. The cheapest redundant level, but it carries a four-I/O write penalty and a rebuild that re-reads every surviving disk. Reasonable only for small arrays of modest disks.
RAID 6 — double parity, survives two failures, gives N−2 disks of capacity. Choose it for any large array of multi-terabyte disks, where the multi-hour rebuild window makes a second failure or an unrecoverable read error likely.
RAID 10 — mirrored stripes, 50% usable capacity, no parity math. Choose it for databases and write-heavy workloads where low latency and a fast rebuild (copying one mirror member, not recomputing parity across the whole array) matter more than capacity efficiency.
- Creating the array but never saving the output of `mdadm --detail --scan` to `/etc/mdadm/mdadm.conf`, so after a reboot the array reappears as `/dev/md127` and any `/dev/md0` entry in fstab fails to mount.
- Forgetting `update-initramfs -u` on a root-on-RAID system, which leaves the updated config out of the initramfs and drops the machine to an emergency shell on the next boot.
- Never configuring `mdadm --monitor` or a `MAILADDR`, so a disk fails silently and the array runs degraded for weeks until a second failure destroys it.
- Building large RAID 5 arrays from multi-terabyte disks, where a multi-hour rebuild plus a single unrecoverable read error on a surviving disk takes the whole array down.
- Skipping the periodic `check` scrub, so latent bad sectors stay hidden until they are needed during a rebuild and the reconstruction fails.
- Reusing disks that still carry an old `md` superblock without running `mdadm --zero-superblock`, causing the kernel to auto-assemble a stale array over your new one.
- Building an array from mismatched disk sizes, which wastes the excess on every larger member because the array sizes itself to the smallest device.
- Record every array with `mdadm --detail --scan | tee -a /etc/mdadm/mdadm.conf` and then run `update-initramfs -u` immediately after creating it.
- Mount arrays by filesystem UUID from `blkid` in `/etc/fstab`, never by the `/dev/md0` kernel name, which is not stable across reboots.
- Enable email alerts by setting `MAILADDR` in `mdadm.conf` and running `mdadm --monitor` so a faulty disk reaches you the same day.
- Choose RAID 6 or RAID 10 over RAID 5 for any array of large disks so it survives a failure during the long rebuild window.
- Assign at least one hot spare with `mdadm --add-spare` so rebuilds begin automatically the moment a member is marked faulty.
- Leave the monthly `check` scrub cron job enabled and review its results, so latent bad sectors surface while the array is still redundant.
- Keep independent off-host backups regardless of RAID level, and test restores: RAID survives a dead disk, not a bad delete.
Knowledge Check
You need a root volume that keeps booting after one disk dies, with no parity write penalty and the simplest possible recovery. Which level fits best?
- RAID 1, which mirrors every block so a survivor can serve the system unchanged
- RAID 0, which stripes blocks across both disks for speed but loses everything the moment one disk fails
- RAID 5, which spreads parity for capacity but adds a write penalty and a slow rebuild
- RAID 6, which needs at least four disks and is aimed at large capacity arrays
After creating /dev/md0 with mdadm --create, the array vanishes and comes back as /dev/md127 after a reboot. What was skipped?
- Writing the array definition into `/etc/mdadm/mdadm.conf` and rebuilding the initramfs
- Running the initial resync to completion, which the kernel kicks off automatically on create anyway
- Formatting the array with a filesystem and writing its label before the first reboot
- Adding a dedicated hot spare to the array with `mdadm --add-spare`
Why is RAID 6 preferred over RAID 5 for large arrays of multi-terabyte disks?
- Its second parity block lets the array survive an unrecoverable read error during the long single-disk rebuild
- It carries no write penalty at all, unlike RAID 5 which has to read, recompute, and rewrite parity on every single stripe update
- It rebuilds a failed disk faster than RAID 5 because it has to read fewer surviving members
- It needs only two disks for double parity, making it cheaper to deploy than RAID 5
A teammate argues the RAID 1 mirror makes nightly backups unnecessary. What is the flaw?
- RAID replicates deletions, corruption, and ransomware to every member; it only protects against disk hardware failure
- RAID 1 cannot actually survive a single disk failure, so off-host backups are still required
- Mirrors silently drop a fraction of writes, so the two copies drift apart over time without warning
- The mirror turns read-only the moment it is degraded, so it refuses to store any new data until the failed member has been replaced and resynced