init and systemd
init is the first user-space process the kernel starts after it mounts the root filesystem, and it is always PID 1. The kernel hands it control and walks away; from that moment everything else on the machine — your shell, sshd, the database, every cron job — descends from PID 1. On every current Debian, Ubuntu, RHEL, Fedora, and SUSE release, the program that fills that slot is systemd, which is far more than a process launcher: it is a service manager that tracks dependencies, supervises long-running daemons, captures their logs, and confines them with cgroups and kernel namespaces.
That makes PID 1 the single most consequential process on the box. It reaps orphaned children so they do not pile up as zombies, it decides what starts in what order at boot, and it restarts a daemon that crashes if you told it to. When you write a unit file, set a restart policy, or run systemctl, you are configuring this one process. A misconfiguration here does not just break one service: it can leave the machine unable to finish booting, and on a remote server that means no way back in short of console access.
PID 1 and the init Contract
PID 1 has two non-negotiable duties that the kernel itself depends on. First, it must reap dead children: when any process exits, its parent is supposed to call wait() to collect the exit status, and if the parent has already died, the orphaned process is re-parented to PID 1, which must reap it or the system slowly fills with zombie entries that consume PIDs. Second, PID 1 cannot die — if it exits, the kernel panics with Attempted to kill init and the machine halts. Everything systemd does on top is optional; these two properties are the contract.
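The reaping half of the contract comes down to one loop. The sketch below is plain Python, not systemd's code. It forks a few children that exit immediately (each one is a zombie until collected) and then reaps them with waitpid(-1, WNOHANG), the call an init process repeats on every SIGCHLD. Run as an ordinary process, it can only reap its own children; PID 1 applies the same loop to every orphan the kernel re-parents to it. The names reap_zombies and demo are illustrative.

```python
import os
import time

def reap_zombies():
    """Reap every child that has already exited, without blocking.

    An init process runs this loop on each SIGCHLD: keep calling
    waitpid(-1, WNOHANG) until no exited child is left.
    Returns (pid, exit_code) pairs; exit_code is None if a signal killed the child.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        reaped.append((pid, code))
    return reaped

def demo(exit_codes, timeout=5.0):
    """Fork children that exit at once, then reap them the way PID 1 must."""
    for code in exit_codes:
        if os.fork() == 0:
            os._exit(code)  # child: exit immediately; it stays a zombie until reaped
    reaped = []
    deadline = time.monotonic() + timeout
    while len(reaped) < len(exit_codes) and time.monotonic() < deadline:
        reaped += reap_zombies()
        time.sleep(0.01)
    return sorted(code for _, code in reaped)

if __name__ == "__main__":
    print(demo([0, 1, 3]))  # [0, 1, 3]
```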
This contract is exactly why a bare process makes a terrible PID 1, and why containers are a recurring trap. A shell script or an application started as PID 1 in a minimal container does not reap re-parented children, and it often never sees SIGTERM at all. The kernel drops signals sent to a PID namespace's init process unless that process has installed a handler for them. So an application that never registers a SIGTERM handler silently discards the signal from docker stop, while zombies accumulate. The fix is a tiny init shim, tini (shipped as Docker's --init flag) or dumb-init, that does the reaping and signal-forwarding the contract demands without dragging full systemd into the image.
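To make the shim's two duties concrete, here is a toy version in Python. It is a sketch, not tini or dumb-init. It forks and execs the real command, relays SIGTERM, SIGINT, and SIGHUP to it (an assumed subset), reaps every child that exits, and returns the command's exit status. The forwarded signals are blocked across the fork so that none can arrive before the relay handler exists. run_as_init and FORWARDED are hypothetical names.

```python
import os
import signal
import sys

# Signals relayed to the workload (an assumed subset for this sketch).
FORWARDED = {signal.SIGTERM, signal.SIGINT, signal.SIGHUP}

def run_as_init(argv):
    """Run argv as the main child, relay signals to it, reap every exit.

    Returns the child's status as os.waitstatus_to_exitcode reports it:
    a negative value means the child was killed by that signal.
    """
    # Block the relayed signals so none slips in before the handler exists.
    signal.pthread_sigmask(signal.SIG_BLOCK, FORWARDED)
    child = os.fork()
    if child == 0:
        # Child: restore a normal signal mask, then become the real program.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, FORWARDED)
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    def forward(signum, _frame):
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # child already gone; its exit is collected below

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    signal.pthread_sigmask(signal.SIG_UNBLOCK, FORWARDED)
    try:
        while True:
            # Blocks until any child exits; as PID 1 this also collects orphans.
            pid, status = os.wait()
            if pid == child:
                return os.waitstatus_to_exitcode(status)
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)

if __name__ == "__main__" and len(sys.argv) > 1:
    code = run_as_init(sys.argv[1:])
    # Shell convention: death by signal N is reported as 128 + N.
    sys.exit(128 - code if code < 0 else code)
```

A real image should still use tini or dumb-init, which cover edge cases this toy ignores.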
SysV init and Runlevels
Before systemd, the dominant scheme was SysV init. PID 1 read /etc/inittab, picked a runlevel (a single digit, 0 through 6), and invoked the rc script, which ran the scripts linked into the matching /etc/rcN.d/ directory in lexical order. Runlevel 0 was halt, 6 was reboot, 1 was single-user, and 2 through 5 were multi-user variants. The meaning of 2–5 differed between Debian and Red Hat, which was a perennial source of confusion. Each service shipped a script in /etc/init.d/ accepting start, stop, restart, and status. The entries in /etc/rcN.d/ were symlinks to those scripts, named K<NN><name> (stop when entering the runlevel) or S<NN><name> (start), so the two-digit number fixed the order.
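The ordering rule is simple enough to model in a few lines. This sketch assumes the K/S symlink convention just described. On entering a runlevel, the K links run with stop, then the S links run with start, each group in plain lexical order. Real rc implementations differ in details, such as skipping services that are already running, so treat rc_plan as a hypothetical illustration.

```python
def rc_plan(entries):
    """Return (script, action) pairs in the order a SysV rc would run them.

    entries: names found in /etc/rcN.d/, e.g. "S20ssh" or "K01apache2".
    Lexical sorting is why the numbers are zero-padded: "S9x" would sort after "S10y".
    """
    kills = sorted(e for e in entries if e.startswith("K"))
    starts = sorted(e for e in entries if e.startswith("S"))
    plan = [(name[3:], "stop") for name in kills]    # strip "K" + two digits
    plan += [(name[3:], "start") for name in starts]  # strip "S" + two digits
    return plan

if __name__ == "__main__":
    rc2 = ["S20ssh", "S01rsyslog", "K01apache2", "S99rc.local", "README"]
    for service, action in rc_plan(rc2):
        print(f"/etc/init.d/{service} {action}")
```

Every step waits for the previous one to return, which is the sequential bottleneck discussed next.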
The model had two structural weaknesses that systemd was built to fix. Startup was strictly sequential — S01 finished before S02 began, so a slow mount stalled everything behind it — and the scripts had no real supervision: once a script forked a daemon and returned, init lost track of it, so a crashed service stayed dead and status often just checked a stale PID file. systemd keeps a thin compatibility layer that generates units from leftover /etc/init.d/ scripts, but new services should never be written this way.
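The weakness of PID-file status checks is easy to see in code. The sketch below mimics what many init.d status actions did: read a PID from a file and probe it with kill(pid, 0). That probe can only answer "some process with this PID exists". A crashed daemon whose PID has since been recycled by an unrelated process therefore still reports as running. pidfile_status is a hypothetical helper, not part of any real init script library.

```python
import os

def pidfile_status(path):
    """Mimic a classic init.d `status` action: trust the PID file, probe with signal 0."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return "stopped"  # no PID file, or nothing usable in it
    if pid <= 0:
        return "stopped"  # kill() with 0 or a negative PID would target process groups
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks that the PID exists
    except ProcessLookupError:
        return "stale PID file"  # the daemon died without cleaning up
    except PermissionError:
        pass  # the PID exists but belongs to another user
    # Blind spot: any process holding this PID passes, even if it is not our daemon.
    return "running"
```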
The systemd Model: Units, Targets, and the Journal
systemd organizes the system into units, each a small declarative file describing one manageable object. The common types are .service (a daemon or one-shot command), .socket (a listening socket that starts its service on first connection), .timer (a cron replacement), .mount and .automount (filesystems), and .target (a named group used as a synchronization point). Targets replace runlevels: multi-user.target is the headless server state, graphical.target adds a display, and the symlink default.target selects which one boot aims for.
```shell
# Where unit files live, highest precedence first:
#   /etc/systemd/system/       your overrides + custom units (wins)
#   /run/systemd/system/       volatile, runtime-generated units
#   /usr/lib/systemd/system/   distro-shipped units (do not edit)

# inspect and control
systemctl status nginx.service
systemctl list-units --type=service --state=running
systemctl get-default        # prints the target that default.target points to
systemctl cat nginx.service  # print the unit file plus any drop-ins
```
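The precedence rule (the first directory that has the file wins) and the fact that default.target is just a symlink can be sketched as follows. This is a simplified model, not how systemd is implemented. The real search path has more directories, such as generator output and /usr/local/lib/systemd/system, plus drop-in merging. find_unit and get_default are hypothetical names. The root parameter lets the sketch run against a scratch directory instead of the live system.

```python
from pathlib import Path

# Simplified unit search path, highest precedence first (the real list is longer).
SEARCH_PATH = ("etc/systemd/system", "run/systemd/system", "usr/lib/systemd/system")

def find_unit(name, root="/"):
    """Return the file that wins for `name`: the first directory that has it."""
    for directory in SEARCH_PATH:
        candidate = Path(root, directory, name)
        # is_symlink() keeps links whose target lies outside a scratch root;
        # a masked unit (systemctl mask) is a link to /dev/null here and also wins.
        if candidate.is_symlink() or candidate.exists():
            return candidate
    return None

def get_default(root="/"):
    """Roughly what `systemctl get-default` prints: the target default.target links to."""
    link = find_unit("default.target", root)
    if link is None or not link.is_symlink():
        return None
    return link.readlink().name
```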
Logging is unified through the journal. Every line a service writes to stdout and stderr is captured by journald, tagged with the unit name, PID, timestamp, and boot ID, and stored as a structured binary log. You read it with journalctl -u nginx, scope it to the current boot with journalctl -b, or follow it live with journalctl -f. The default Storage=auto keeps the journal persistent in /var/log/journal when that directory exists and volatile in /run/log/journal otherwise. Whether the directory exists out of the box varies by distribution and release; where it is missing, the journal is lost on every reboot. Setting Storage=persistent in /etc/systemd/journald.conf and restarting systemd-journald makes journald create /var/log/journal if needed and keep the log across reboots. You want that on any server you expect to debug after a crash.
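Because every entry is structured, you can also consume the journal programmatically. journalctl -o json prints one JSON object per line, with fields such as MESSAGE, _SYSTEMD_UNIT, _PID, _BOOT_ID, and __REALTIME_TIMESTAMP (microseconds since the epoch, as a string). The sketch below filters such lines by unit and, optionally, by boot. The sample records in the test are made up. The sketch skips entries whose MESSAGE is not a plain string, for example when binary data is exported as an array of numbers. entries_for is a hypothetical name.

```python
import json
from datetime import datetime, timezone

def entries_for(lines, unit, boot_id=None):
    """Yield (timestamp, pid, message) for one unit from `journalctl -o json` output."""
    for line in lines:
        entry = json.loads(line)
        if entry.get("_SYSTEMD_UNIT") != unit:
            continue
        if boot_id is not None and entry.get("_BOOT_ID") != boot_id:
            continue
        message = entry.get("MESSAGE")
        if not isinstance(message, str):
            continue  # binary or missing message; not handled in this sketch
        usec = int(entry["__REALTIME_TIMESTAMP"])
        when = datetime.fromtimestamp(usec / 1_000_000, tz=timezone.utc)
        yield when, int(entry["_PID"]), message
```

On a live system, the lines could come from subprocess.run(["journalctl", "-u", "nginx.service", "-b", "-o", "json"], capture_output=True, text=True, check=True).stdout.splitlines().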
cgroups and Service Supervision
systemd's real advantage over SysV is that it knows exactly which processes belong to a service, because it places each service in its own control group. Every process a daemon forks (workers, helper scripts, runaway children) stays inside that cgroup, even after double-forking away from its parent. Under the default KillMode=control-group, systemctl stop therefore sends SIGTERM to every process in the cgroup. It then sends SIGKILL to anything still alive after TimeoutStopSec=, so nothing the service spawned survives the stop. systemctl kill can signal the same whole tree on demand. This is also where resource limits live: MemoryMax=512M, CPUQuota=50%, and TasksMax= are enforced by the kernel on the cgroup as a whole, not by fragile per-process ulimit settings.
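Both halves of that claim are visible in the cgroup v2 filesystem. On a unified-hierarchy system, /proc/<pid>/cgroup holds a line such as 0::/system.slice/nginx.service, which names the unit a process belongs to. A system service's limits land in files such as memory.max, cpu.max, and pids.max (for TasksMax=) under /sys/fs/cgroup/system.slice/<unit>/. The sketch below parses the first and computes what memory.max and cpu.max would contain. It assumes cgroup v2, base-1024 size suffixes, and the default 100 ms CPU quota period. The helper names are hypothetical, and systemd's own parser accepts more syntax, such as percentages for MemoryMax=.

```python
# Base-1024 suffixes as systemd interprets them for sizes (subset).
SIZE_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
CPU_PERIOD_USEC = 100_000  # default CPUQuotaPeriodSec= of 100 ms

def unit_of(proc_cgroup_text):
    """Return the last path component of the cgroup v2 entry in /proc/<pid>/cgroup."""
    for line in proc_cgroup_text.splitlines():
        hierarchy, _controllers, path = line.split(":", 2)
        if hierarchy == "0":  # "0::<path>" is the unified (v2) hierarchy
            return path.rstrip("/").rsplit("/", 1)[-1] or None
    return None

def memory_max_file(value):
    """MemoryMax= value -> contents of memory.max ("max" means unlimited)."""
    if value == "infinity":
        return "max"
    factor = SIZE_SUFFIXES.get(value[-1])
    return str(int(value[:-1]) * factor) if factor else str(int(value))

def cpu_max_file(quota):
    """CPUQuota=N% -> contents of cpu.max: "<quota_usec> <period_usec>"."""
    percent = int(quota.rstrip("%"))
    return f"{percent * CPU_PERIOD_USEC // 100} {CPU_PERIOD_USEC}"
```

On a live system, unit_of(open("/proc/self/cgroup").read()) names the unit or scope the current process runs in.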
Supervision is declarative. Restart=on-failure brings a crashed daemon back; Restart=always brings it back even after a clean exit. The rate limiter, StartLimitIntervalSec= and StartLimitBurst= in the [Unit] section, stops a crash loop from hammering the machine. Exceed the burst within the interval and systemd marks the unit failed and stops trying. The Type= directive tells systemd how to know the service is actually up. simple treats the service as started the moment systemd forks the main process, while notify waits for the daemon to call sd_notify(0, "READY=1"). That explicit message is the most reliable readiness signal, and it is what dependent units ordered after the service actually wait for.
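The start-limit arithmetic is worth seeing explicitly. The sketch models a unit's start counter as a fixed window that opens at the first start attempt and admits at most StartLimitBurst= starts within StartLimitIntervalSec=. The fixed window is an assumption of this model, so check systemd.unit(5) for the exact boundary rules. The defaults used here, 10 seconds and 5 starts, match systemd's stock DefaultStartLimitIntervalSec= and DefaultStartLimitBurst=. The model shows why a daemon that dies instantly with a short restart delay trips the limit almost at once, while a longer delay can keep it cycling forever. StartLimit and attempts_before_failure are hypothetical names.

```python
class StartLimit:
    """Toy model of a unit's start rate limit (fixed window, times in seconds)."""

    def __init__(self, interval_sec=10.0, burst=5):
        self.interval = interval_sec
        self.burst = burst
        self.window_start = None
        self.count = 0

    def allow(self, now):
        """Record a start attempt at `now`; False means the unit goes to failed."""
        if self.window_start is None or now - self.window_start > self.interval:
            self.window_start = now  # open a fresh window at this attempt
            self.count = 1
            return True
        self.count += 1
        return self.count <= self.burst

def attempts_before_failure(restart_sec, horizon_sec=60.0):
    """Restart a daemon that crashes instantly every restart_sec; count starts until refused."""
    limit = StartLimit()
    t, starts = 0.0, 0
    while t <= horizon_sec:
        if not limit.allow(t):
            return starts  # this attempt was refused: the unit is now failed
        starts += 1
        t += restart_sec
    return None  # never tripped within the horizon: an endless crash loop
```

With a 0.1 s delay the sixth attempt is refused; with a 3 s delay at most four attempts ever share a window, so the limit never trips.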
```ini
# a minimal, well-behaved service unit
[Unit]
Description=Widget API
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/widget --port 8080
Restart=on-failure
RestartSec=2
MemoryMax=512M

[Install]
WantedBy=multi-user.target
```
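Type=notify only works if the daemon actually reports readiness. libsystemd's sd_notify() is the usual way, but the underlying protocol is small enough to sketch directly. systemd passes a Unix datagram socket address in the NOTIFY_SOCKET environment variable, where a leading @ denotes the Linux abstract namespace. The daemon sends the text READY=1 to it once it can serve. notify is a hypothetical helper, only the Unix-socket address form is handled, and the widget server itself is not shown.

```python
import os
import socket

def notify(state):
    """Send a state string such as "READY=1" to the service manager.

    Returns False when NOTIFY_SOCKET is unset (e.g. started by hand, not by systemd).
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        address = "\0" + address[1:]  # '@' marks a Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True
```

In a daemon like the widget above, notify("READY=1") would be called only after the listening socket is bound and configuration is loaded. From that moment on, units ordered After= it are allowed to start.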
Boot Sequence and Dependency Ordering
At boot, systemd pulls in default.target and walks its dependency graph, starting everything it can in parallel. Two families of directives govern this, and conflating them is the classic systemd mistake. Wants= and Requires= are requirement dependencies: they pull the listed units into the same transaction. Wants= does so softly, meaning a failing dependency does not affect this unit. Requires= does so hard: a failing dependency keeps this unit from starting, but only when the unit is also ordered After= it; without that ordering, both start in parallel. Before= and After= express ordering only. They never cause a unit to be started; they just fix the sequence if both happen to be started in the same transaction.
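The split between pulling units in and ordering them can be modeled as two separate passes over a graph. In this sketch, the first pass follows only Wants=/Requires= edges from the boot target to decide which units get a start job. The second pass topologically sorts that set using After= edges; express any Before= as the mirror-image After= on the other unit before calling it. This is a deliberately tiny model of systemd's transaction logic, not its implementation, and plan_boot and the unit names are illustrative.

```python
from graphlib import TopologicalSorter

def plan_boot(target, requires, after):
    """Return the units that would start, in one valid order.

    requires: unit -> set of units it Wants=/Requires= (pulls in)
    after:    unit -> set of units it is ordered After= (ordering only)
    """
    # Pass 1: closure over requirement edges only; After= never adds a job.
    jobs, stack = set(), [target]
    while stack:
        unit = stack.pop()
        if unit not in jobs:
            jobs.add(unit)
            stack.extend(requires.get(unit, ()))
    # Pass 2: order the selected units; After= edges to units without a job are dropped.
    graph = {u: {d for d in after.get(u, ()) if d in jobs} for u in jobs}
    # Units with no ordering between them could start in parallel; this yields one sequence.
    return list(TopologicalSorter(graph).static_order())
```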
The practical consequence: a unit with After=postgresql.service but no Wants= or Requires= will start after Postgres if Postgres is being started anyway, and will start immediately otherwise. Network ordering is the most common trap of all. After=network.target only means the network management stack has been started, not that any interface has an address. A service that binds to a specific IP must therefore use After=network-online.target together with Wants=network-online.target. Which service actually waits for the network depends on what manages it. On systemd-networkd systems, including Ubuntu Server, where Netplan renders to networkd by default, it is systemd-networkd-wait-online.service. On NetworkManager systems such as RHEL and Fedora, it is NetworkManager-wait-online.service. Inspect the whole chain with systemd-analyze critical-chain and find what is slowing boot with systemd-analyze blame.
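The failure behind the network-online rule is easy to reproduce. Binding a socket to an address that no local interface holds yet fails with EADDRNOTAVAIL, which is what a daemon started too early at boot runs into. The sketch uses 192.0.2.1 from the TEST-NET-1 documentation range (RFC 5737). It assumes that address is not configured on the machine running it and that net.ipv4.ip_nonlocal_bind is left at its default of 0. try_bind is a hypothetical helper.

```python
import errno
import socket

def try_bind(address, port=0):
    """Bind a TCP socket to address:port; return None on success or the errno on failure.

    Port 0 lets the kernel pick any free port, so only the address is being tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            return exc.errno
    return None

if __name__ == "__main__":
    result = try_bind("192.0.2.1")
    # On a host without that address this reports EADDRNOTAVAIL.
    print("bound" if result is None else errno.errorcode[result])
```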
SysV init — sequential shell scripts in /etc/init.d/ driven by runlevels, with no real supervision once a daemon forks. You will only meet it on legacy systems and inside the systemd compatibility layer; do not write new services against it.
systemd — declarative units, parallel boot, cgroup-based supervision, integrated logging and timers. The default on every mainstream distribution and the assumption behind the rest of this chapter; choose it unless an external constraint forbids it.
OpenRC / runit — lightweight alternatives kept by Alpine, Gentoo, Void, and Devuan. OpenRC adds dependency-based ordering on top of init scripts; runit is a minimal supervision suite with near-instant restarts. Reach for these on tiny container images or systems where a full systemd footprint is unwanted, accepting that you lose journald, timers, and unit semantics.
- Confusing Wants=/Requires= with After=/Before=: adding only After=db.service and expecting the database to be pulled in. Ordering directives never start anything; the service races ahead and fails to connect because nothing requested the dependency.
- Using After=network.target for a daemon that binds to a specific address. That target only means the network management stack has started, not that an interface has an IP, so the bind fails at boot; you need network-online.target with a matching Wants=.
- Editing a distro-shipped unit in /usr/lib/systemd/system/ directly. The next package update silently overwrites it; use systemctl edit to create a drop-in under /etc/systemd/system/ that survives upgrades.
- Running a raw application or shell script as PID 1 in a container. It does not reap re-parented children and usually ignores SIGTERM, so zombies accumulate and docker stop hangs for the full 10-second grace period before a SIGKILL.
- Relying on the default start limit with Restart=always on a daemon that fails fast. The defaults (5 starts within 10 seconds) only trip when restarts come that quickly. With a RestartSec= long enough to stay under that rate, or with StartLimitIntervalSec=0, the service loops forever, spinning CPU and flooding the journal instead of failing cleanly and alerting you.
- Assuming systemctl enable also starts the service now. It only creates the boot-time symlink in the WantedBy= target's .wants/ directory; the daemon stays down until the next boot unless you also run systemctl start or use enable --now.
- Forgetting systemctl daemon-reload after changing a unit file. systemd keeps acting on its cached copy of the unit, so your edit appears to have no effect and you chase a phantom bug.
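The enable-versus-start pitfall makes more sense once you see how little enable does. For a unit with WantedBy=multi-user.target, enable creates a symlink in /etc/systemd/system/multi-user.target.wants/ that points at the unit file, so the next boot's transaction pulls the unit in. Nothing is started now. The sketch below reproduces only that symlink step under a scratch root. It is a hypothetical helper that ignores Alias=, RequiredBy=, and template units, and it skips the configuration reload that systemctl enable performs afterwards.

```python
from pathlib import Path

def enable(unit_file, wanted_by, root="/"):
    """Create the <target>.wants/ symlink that `systemctl enable` would; start nothing."""
    unit_file = Path(unit_file)
    wants_dir = Path(root, "etc/systemd/system", f"{wanted_by}.wants")
    wants_dir.mkdir(parents=True, exist_ok=True)
    link = wants_dir / unit_file.name
    if not link.is_symlink():  # enabling twice is a no-op
        link.symlink_to(unit_file)
    return link
```

systemctl enable --now widget.service creates the link and starts the unit in one step.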
- Customize units with systemctl edit unit, which writes a drop-in to /etc/systemd/system/unit.d/override.conf instead of touching the packaged file; your change then survives every package upgrade.
- Pair After=network-online.target with Wants=network-online.target for any service that binds to a fixed IP or dials out at startup; never rely on network.target alone.
- Set Type=notify and call sd_notify(0, "READY=1") from daemons you control, so dependent units start only when the service is genuinely ready rather than the instant it forks.
- Bound every restarting service with explicit StartLimitIntervalSec= and StartLimitBurst= values in its [Unit] section, sized to its RestartSec=, so a crash loop trips into failed and surfaces instead of hammering the box forever.
- Cap resources in the unit with MemoryMax=, CPUQuota=, and TasksMax=; the kernel cgroup enforces them across the whole process tree, which per-process ulimit cannot.
- Enable persistent logging by setting Storage=persistent in journald.conf, which creates /var/log/journal if it is missing. Post-crash investigation then has the lines that explain the crash, instead of a volatile journal that vanishes on reboot.
- Run systemctl daemon-reload after every unit edit you make by hand (systemctl edit reloads for you), then run systemd-analyze verify on the unit file to catch directive typos before the service refuses to start.
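One drop-in detail trips up many people who follow the systemctl edit advice. Settings that may appear multiple times, such as ExecStart=, accumulate across the unit file and its drop-ins, and an empty assignment resets the list. To replace the command of a normal service, the override must therefore contain an empty ExecStart= line followed by the new ExecStart= line. Otherwise systemd sees two commands, which only Type=oneshot services accept. The sketch models just that merge rule for one directive. merge_list_setting is a hypothetical helper, not a systemd parser.

```python
def merge_list_setting(key, fragments):
    """Merge one list-valued directive across a unit file and its drop-ins, in load order.

    Each non-empty `key=value` line appends; an empty `key=` clears what came before.
    """
    values = []
    prefix = key + "="
    for text in fragments:
        for raw in text.splitlines():
            line = raw.strip()
            if not line.startswith(prefix):
                continue  # other directives, including look-alikes such as ExecStartPre=
            value = line[len(prefix):].strip()
            if value:
                values.append(value)
            else:
                values.clear()  # "ExecStart=" with nothing after it resets the list
    return values
```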
rc.d and OpenBSD's rc, ordered shell scripts much like SysV init, with no cgroup-style supervision.

Knowledge Check
A unit declares only After=postgresql.service and nothing else about Postgres. What actually happens at boot?
- If Postgres is being started anyway it goes first; if nothing else pulls Postgres in, the unit starts immediately without it, because After= orders but never requests a dependency
- systemd blocks and refuses to start the unit until postgresql.service is fully running first, treating the bare After= line as a hard requirement that must be satisfied before launch
- Postgres is automatically pulled into the boot transaction and started, since After= implies the dependency must exist
- The unit fails immediately with a missing-dependency error at boot if postgresql.service is not already active
Why is running a plain application as PID 1 in a container a problem?
- It does not reap re-parented orphans and often ignores SIGTERM, so zombies pile up and docker stop hangs until the grace period forces a SIGKILL
- The kernel forbids any process other than systemd from ever holding PID 1 in a namespace, so the container refuses to start and exits with an error at once
- A process running as PID 1 cannot open network sockets, so the application is left unreachable from outside the container
- Logs written by PID 1 are never captured anywhere, because journald requires a separate init process to exist first
You change ExecStart= in a service unit and the new command does not take effect. What is the most likely cause?
- You did not run systemctl daemon-reload, so systemd is still acting on the cached copy of the unit
- A unit file becomes read-only after the first boot and must be deleted and then recreated from scratch to change it
- Changes to ExecStart= require a full reboot, because the command is compiled into the initramfs boot image
- The change is silently ignored unless you also bump a version number directive in the [Unit] section
A daemon binds to a fixed IP and fails at boot but works when started by hand a minute later. What is the correct fix?
- Order it after network-online.target and add Wants=network-online.target, because network.target only means the network management stack has started, not that an interface has an address
- Add only After=network.target to the unit, which by itself already guarantees that every interface has been fully brought up and has its routable static IP address assigned
- Set Restart=always with a short delay so the unit keeps retrying until the network happens to be up
- Move the unit file into /usr/lib/systemd/system/ so that it is read earlier in the boot sequence
What does placing each service in its own cgroup give systemd that SysV init lacked?
- Exact accounting of every process the service forked, so stop and kill act on the whole tree and resource limits like MemoryMax= are enforced by the kernel
- Measurably faster process creation, because membership in a cgroup lets a service bypass the normal fork() path
- Automatic encryption of each service's resident memory pages by the kernel, so that one process can never read another service's in-memory data
- The ability to run services entirely without a PID of their own, removing any need to reap their exited child processes