Topic 04

The Boot Process

Foundations

Booting is the sequence that carries a machine from power-on to a login prompt, through a fixed chain of handoffs: firmware to bootloader, bootloader to kernel, kernel to initramfs, initramfs to the real root, init to the login service. Each stage initializes what it owns and then hands control to the next, and each depends on the one before it having succeeded. On Debian and Ubuntu that chain is UEFI firmware, GRUB 2, the kernel and its initramfs, then systemd as PID 1.

When a server will not boot, the only useful first question is which handoff broke. A blank screen before any menu points at firmware. A bare grub> prompt instead of a menu points at the bootloader. A kernel panic about an unfound root device points at the initramfs or the root= parameter. A system that reaches a shell but never the login prompt points at systemd. Knowing the chain turns a dead machine into a tractable diagnosis instead of a reinstall.

Firmware: BIOS and UEFI

When power is applied, the CPU starts executing firmware stored on the motherboard. The firmware runs a power-on self-test, initializes RAM, the CPU, and basic devices, then looks for something to boot. Legacy BIOS reads the first 512 bytes of a disk — the Master Boot Record — and executes the code it finds there. UEFI, which has replaced BIOS on essentially all server hardware shipped since around 2012, instead reads a FAT32 partition called the EFI System Partition, mounted at /boot/efi on Debian and Ubuntu, and runs a .efi executable named in one of its stored boot entries.

You manage those UEFI boot entries from Linux with efibootmgr. Secure Boot, a UEFI feature, refuses to run a bootloader or kernel that is not signed by a trusted key — Ubuntu and Debian ship a Microsoft-signed shim so a stock install works, but a self-built kernel or an unsigned out-of-tree module such as the proprietary NVIDIA driver is blocked until you enroll your own key with mokutil or disable Secure Boot in firmware. Confirm which mode the machine actually booted in before you assume anything: if /sys/firmware/efi exists, it booted UEFI; if it does not, it booted legacy BIOS.

# List UEFI boot entries and the current boot order
efibootmgr -v
# Present only on a UEFI boot — absent means legacy BIOS
ls /sys/firmware/efi
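
The directory check above can be wrapped in a small helper so scripts can branch on the result. This is a minimal sketch: the function name firmware_mode and its optional path argument are illustrative assumptions, not part of any existing tool.

```shell
# Report the firmware mode the running system booted in.
# firmware_mode is a hypothetical helper name; the optional argument
# overrides the sysfs path and exists only to make the function testable.
firmware_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "uefi"
    else
        echo "bios"
    fi
}

firmware_mode
```

On a machine that reports uefi, mokutil --sb-state then shows whether Secure Boot is currently enabled.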

The Bootloader: GRUB 2

The firmware hands control to a bootloader whose job is to locate a kernel, load it into memory along with its initramfs, and jump to it. On Debian and Ubuntu that bootloader is GRUB 2. Its runtime config lives in /boot/grub/grub.cfg, but that file is generated and you never edit it by hand. The inputs are /etc/default/grub for global settings and the scripts in /etc/grub.d/, and you regenerate the config by running update-grub, a wrapper around grub-mkconfig -o /boot/grub/grub.cfg.

The GRUB menu lists installed kernels plus a recovery entry. Pressing e on a menu entry opens an editor where you can append kernel parameters for a single boot: systemd.unit=rescue.target for single-user repair, or nomodeset when a graphics driver hangs the console. On the Red Hat family the inputs are the same but the regenerate command is grub2-mkconfig -o /boot/grub2/grub.cfg, and persistent kernel arguments are usually managed through grubby rather than by hand-editing /etc/default/grub.

# /etc/default/grub — edit these, then apply
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

# Regenerate /boot/grub/grub.cfg from the inputs above
sudo update-grub
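
One way to catch a forgotten update-grub is to compare modification times of the input and the generated file. The sketch below assumes the Debian and Ubuntu paths named above; grub_cfg_is_stale is a hypothetical helper name, and it checks only /etc/default/grub, not the scripts in /etc/grub.d/.

```shell
# Succeed (exit 0) when the GRUB defaults file is newer than the generated
# config, which suggests update-grub has not been run since the last edit.
# grub_cfg_is_stale is a hypothetical helper; the arguments default to the
# Debian/Ubuntu paths and can be overridden for testing.
grub_cfg_is_stale() {
    defaults="${1:-/etc/default/grub}"
    generated="${2:-/boot/grub/grub.cfg}"
    [ -e "$defaults" ] && [ -e "$generated" ] && [ "$defaults" -nt "$generated" ]
}

if grub_cfg_is_stale; then
    echo "grub.cfg is older than /etc/default/grub: run sudo update-grub"
else
    echo "no stale grub.cfg detected"
fi
```

A timestamp comparison is only a heuristic: it flags a pending edit, but it cannot tell whether the edit actually changed anything GRUB uses.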

Kernel and initramfs

GRUB loads two files: the compressed kernel image /boot/vmlinuz-$(uname -r) and a matching /boot/initrd.img-$(uname -r). The kernel decompresses itself, sets up memory management and the scheduler, then unpacks the initramfs into RAM and uses it as a temporary root filesystem. The initramfs exists to solve a bootstrapping problem: the kernel needs drivers and helpers to reach the real root filesystem (a SATA or NVMe driver, an LVM or RAID assembler, a LUKS unlock for an encrypted disk), but those live on a disk it cannot yet read. The initramfs carries just enough to mount the real root; its init script then switches the root over to it, and the memory the initramfs occupied is freed.

Because the initramfs is a baked snapshot of modules and config, it must be rebuilt whenever those change. After editing /etc/crypttab, adding a storage driver, or changing the module set, run update-initramfs -u on Debian and Ubuntu — the Red Hat family uses dracut -f. Skipping that step is the classic way to produce a system that worked before the reboot and panics with "Cannot open root device" after it. The root= parameter, visible in /proc/cmdline, tells the kernel which device to pivot to, and is best given as a stable root=UUID= rather than a /dev/sda2 name that can shift between boots.
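
The root= advice can be checked against the running system. The sketch below is illustrative only: root_param and root_is_stable are hypothetical helper names, and the list of forms treated as stable (UUID=, PARTUUID=, LABEL=, device-mapper names, and /dev/disk/by-* paths) is an assumption about what is reasonable, not an exhaustive rule.

```shell
# Print the value of the root= parameter from a kernel command line.
# root_param is a hypothetical helper; with no argument it reads the
# running kernel's command line from /proc/cmdline. If root= appears
# more than once, the last occurrence is printed.
root_param() {
    cmdline="${1:-$(cat /proc/cmdline 2>/dev/null)}"
    printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^root=//p' | tail -n 1
}

# Succeed when the value uses an identifier that does not depend on the
# order in which disks are detected (an assumed, non-exhaustive list).
root_is_stable() {
    case "$1" in
        UUID=*|PARTUUID=*|LABEL=*|/dev/mapper/*|/dev/disk/by-*) return 0 ;;
        *) return 1 ;;
    esac
}

current_root="$(root_param)"
if [ -z "$current_root" ]; then
    echo "no root= parameter found"
elif root_is_stable "$current_root"; then
    echo "root=$current_root does not depend on disk order"
else
    echo "root=$current_root may change if disks are reordered"
fi
```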

init: systemd Takes Over

Once the real root is mounted, control passes to /sbin/init, which on Debian, Ubuntu, the Red Hat family, and most other mainstream distributions is systemd. It runs as PID 1, the ancestor of every other process, and brings the system up by reading units (files describing services, mount points, sockets, and timers) and resolving their dependencies in parallel rather than running numbered scripts in sequence. That parallelism is the main reason systemd displaced SysV init, where /etc/init.d scripts ran one after another in a fixed order and one slow script stalled the whole boot.

systemd organizes the boot into targets, named groups of units roughly analogous to the old SysV runlevels. It pulls in basic.target, mounts the filesystems listed in /etc/fstab, starts the services wanted by the default target, and treats the boot as complete once that target is reached. You inspect what happened with systemctl and journalctl -b: the boot log is structured and queryable, not a flat text file you grep by hand.
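
A short inspection routine might look like the following. This is a sketch under stated assumptions: booted_with_systemd and boot_report are hypothetical helper names, and the choice of which commands to run and how many lines to show is arbitrary.

```shell
# Check whether this system was booted with systemd as PID 1.
# systemd creates /run/systemd/system early in boot, which is the same
# condition the sd_booted() library call tests. The path argument exists
# only so the function can be tested against a temporary directory.
booted_with_systemd() {
    [ -d "${1:-/run/systemd/system}" ]
}

# Print a short summary of the current boot. Defined but not called here,
# because it only makes sense on a machine that is running systemd.
boot_report() {
    if ! booted_with_systemd; then
        echo "PID 1 is not systemd; nothing to report"
        return 1
    fi
    systemctl get-default                           # target this boot aimed for
    systemctl --failed --no-pager                   # units that failed to start
    systemd-analyze blame --no-pager | head -n 10   # slowest units to start
    journalctl -b -p err --no-pager | tail -n 20    # error-level messages, this boot
}
```

Calling boot_report on a healthy server gives a baseline to compare against when a later boot misbehaves.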

Targets and Recovery

The default target decides where the boot lands. multi-user.target brings up networking and all background services on a text console and is the correct default for a server; graphical.target pulls in multi-user.target and adds a display manager. Check the setting with systemctl get-default and change it with systemctl set-default multi-user.target, which just repoints the default.target symlink. At the end of multi-user.target, systemd starts getty@tty1.service, which runs agetty to print the local login prompt — the console you fall back to when the network is down and SSH is unreachable.
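
Because the default is just a symlink, it can be read without asking systemd at all. The sketch below assumes the override lives at /etc/systemd/system/default.target, which is where systemctl set-default writes it; default_target_of is a hypothetical helper name.

```shell
# Resolve which target the default.target symlink points at.
# default_target_of is a hypothetical helper; the argument defaults to the
# location systemctl set-default writes to and can be overridden for testing.
default_target_of() {
    link="${1:-/etc/systemd/system/default.target}"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "no override at $link (the distribution default applies)"
    fi
}

default_target_of
```

This is handy from a rescue environment, where the installed system's disk is mounted somewhere else and systemctl would report on the rescue system instead.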

Recovery means booting to a smaller target on purpose. Appending systemd.unit=rescue.target at the GRUB menu gives a single-user root shell with the root filesystem mounted but no network. emergency.target goes further, dropping to a bare shell with almost nothing mounted and the root filesystem typically still read-only, for when even rescue will not come up; remount it writable with mount -o remount,rw / before you try to fix anything. The blunt last resort is init=/bin/bash, which skips systemd entirely and hands you a raw shell on the root filesystem, which is usually read-only here as well and needs the same remount. It is useful for resetting a forgotten root password and useless for anything that needs services running.

Target	State it reaches	Use it for
rescue.target	Root shell, root FS mounted, no network	Single-user repair
emergency.target	Bare shell, almost nothing mounted, root usually read-only	Last-resort recovery
multi-user.target	Network and all services, text console	Servers
graphical.target	multi-user plus a display manager	Desktops

UEFI vs legacy BIOS

UEFI — reads a GPT-partitioned disk, runs a signed .efi bootloader from the FAT32 EFI System Partition, supports Secure Boot, and boots disks larger than 2 TB. This is what essentially all server hardware shipped since around 2012 uses, and what you should assume on any modern install.

Legacy BIOS — reads the 512-byte Master Boot Record at the start of an MBR-partitioned disk and runs the code there, with no signing and a 2 TB disk ceiling from the 32-bit MBR sector count. Choose it only for genuinely old hardware or a deliberate compatibility requirement; new deployments default to UEFI.

Why it matters operationally — the firmware mode dictates the partition table and the bootloader install, so a disk imaged for BIOS will not boot a UEFI machine and the reverse fails too. Confirm the mode with ls /sys/firmware/efi before cloning a disk or reinstalling GRUB, or you produce an unbootable system.

Common Mistakes

Hand-editing /boot/grub/grub.cfg. The file is regenerated on the next kernel update or update-grub run, silently discarding the change — put it in /etc/default/grub instead.
Editing /etc/default/grub but forgetting to run update-grub, so grub.cfg never picks up the change and the next boot behaves exactly as before, leaving you debugging a setting that was never applied.
Changing LUKS, LVM, RAID, or the storage module set without running update-initramfs -u, so the stale initramfs cannot find the root device and the kernel panics with "Cannot open root device" on the next boot.
Hardcoding root=/dev/sda2 when device names can reorder between boots; a wrong root= drops you to an initramfs shell, which a stable root=UUID= avoids.
Running a remote or virtualized server with no serial console and GRUB_TIMEOUT=0, so a failed kernel upgrade leaves no way to pick an older kernel and no console to watch the panic — the box is simply unreachable.
Purging old kernels down to a single entry, so one bad kernel upgrade leaves nothing bootable and forces a rescue boot from external media.

Best Practices

Edit /etc/default/grub and run sudo update-grub to apply changes; never touch the generated grub.cfg directly.
Keep at least two known-good kernels installed so a bad upgrade always leaves a fallback entry in the GRUB menu.
Reference the root device as root=UUID= in the kernel command line, not as /dev/sdX, so reboots survive disk reordering.
Run update-initramfs -u after any change to LUKS, LVM, RAID, or the loaded module set, and watch it build without errors before you reboot.
Set GRUB_TIMEOUT to a few seconds and configure a serial console (console=ttyS0) on remote and virtualized servers, so you retain a window and a view to pick an older kernel after a bad update.
Practice GRUB recovery before you need it: press e, append systemd.unit=rescue.target, and learn the path to a single-user shell while the machine is healthy.
Inspect the current boot with journalctl -b and a prior one with journalctl -b -1 to find which unit failed or where startup stalled.
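
The fallback-kernel advice can be checked mechanically before a reboot. The sketch below assumes the Debian and Ubuntu naming used earlier (vmlinuz-VERSION paired with initrd.img-VERSION); count_kernels and kernels_missing_initrd are hypothetical helper names.

```shell
# Count kernel images in a boot directory.
# count_kernels is a hypothetical helper; the argument defaults to /boot.
count_kernels() {
    boot_dir="${1:-/boot}"
    count=0
    for image in "$boot_dir"/vmlinuz-*; do
        if [ -e "$image" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Print the version of every kernel image that has no matching initramfs.
kernels_missing_initrd() {
    boot_dir="${1:-/boot}"
    for image in "$boot_dir"/vmlinuz-*; do
        [ -e "$image" ] || continue
        version="${image##*/vmlinuz-}"
        [ -e "$boot_dir/initrd.img-$version" ] || echo "$version"
    done
}

installed="$(count_kernels)"
if [ "$installed" -lt 2 ]; then
    echo "only $installed kernel image(s) in /boot: no fallback entry"
fi
kernels_missing_initrd
```

A kernel listed by the second function would appear in the GRUB menu but could not reach its root filesystem, so it does not count as a usable fallback.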

Comparable tools

systemd-boot: a minimal UEFI-only boot manager, an alternative to GRUB 2 on UEFI systems where you do not need legacy BIOS or advanced scripting.
U-Boot: the de facto bootloader for embedded and ARM boards, where firmware and bootloader responsibilities blur together.
Windows Boot Manager: the single-vendor equivalent, with bootmgr and BCD entries on the same EFI System Partition, where GRUB chainloads into it for dual-boot.

Knowledge Check

You added a LUKS-encrypted disk and edited /etc/crypttab, but the next reboot panics with "Cannot open root device." What was almost certainly skipped?

Running update-initramfs -u, so the initramfs lacks the crypt modules and tools needed to reach the root device before the real root mounts
Running update-grub to regenerate the boot menu so the newly added encrypted disk gets its own entry in the list alongside the existing root partition
Adding a new UEFI boot entry with efibootmgr so the firmware points at the encrypted disk and unlocks it before handing off to GRUB
Setting a longer GRUB_TIMEOUT in /etc/default/grub so the passphrase unlock prompt has enough time to appear at boot

A teammate hand-edited /boot/grub/grub.cfg to add a kernel parameter. After the next kernel update the parameter is gone. Why?

grub.cfg is generated from /etc/default/grub and /etc/grub.d/, and the kernel update ran update-grub, which overwrote the manual edit
Kernel updates reset every file under /boot back to packaged defaults, wiping the hand-added parameter along with all other local edits
Secure Boot inspected the command line, found the added parameter unsigned, and stripped it out during signature verification
systemd reverted the edit when it reached multi-user.target and reconciled boot configuration against its units

Why does a server need an initramfs at all, rather than the kernel mounting the root filesystem directly?

The kernel may need drivers or helpers — an NVMe driver, an LVM assembler, a LUKS unlock — that live on the very disk it cannot read yet, so the initramfs carries them in RAM
The initramfs holds the GRUB boot menu and the kernel command-line parameters, which the kernel reads back out of it at startup before it has mounted any real disk filesystem
The kernel itself cannot run as PID 1, so the initramfs supplies the very first userspace process until systemd is later started
The initramfs is where systemd's boot targets and unit files are permanently stored before the real root filesystem appears

A remote VM with a single installed kernel becomes unbootable after an upgrade, and it has GRUB_TIMEOUT=0 and no serial console. What is the design lesson?

Keep a fallback kernel, a non-zero GRUB timeout, and a serial console, so a bad upgrade leaves both an older entry to pick and a console to see the menu
Set the default systemd target to emergency.target so that every boot deterministically drops the VM straight into a single-user recovery shell instead of starting normally
Disable Secure Boot in firmware, since signature checks are what block an upgraded kernel from booting on a remote VM
Switch the VM from UEFI to legacy BIOS firmware, since BIOS always exposes a recoverable boot menu after a failed upgrade
