Topic 30

The Process Model

A process is a running instance of a program: its own virtual address space, its own table of open file descriptors, a numeric process ID (PID), and the credentials and scheduling state the kernel tracks for it. Two copies of the same binary are two processes with two PIDs and two address spaces — they share the program's read-only code pages in RAM, but nothing else. The kernel keeps one of these structures per task and that structure, not the file on disk, is what ps and top report on.

Linux creates new processes with a deliberately odd two-step: fork() clones the current process, then exec() replaces the clone's program image with a new one. Every user-space process except PID 1 has a parent, so user space forms a single tree rooted at init (kernel threads hang off a separate root, kthreadd, PID 2). That tree is not trivia: it decides who is responsible for collecting a child's exit status, what happens to a process whose parent dies, and why a misbehaving daemon leaves a trail of zombies instead of disappearing cleanly.

fork, exec, and wait

There is no single "run this program" call on Linux. To start ls, your shell calls fork(), which returns twice — once in the parent with the child's PID, once in the child with 0. The child, now a near-identical copy, calls exec() to overlay itself with the ls binary; the parent's code keeps running. The copy is cheap because the kernel uses copy-on-write: parent and child share the same physical pages until one of them writes, at which point only the touched page is duplicated.
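A subshell is a quick way to watch the "near-identical copy" behaviour from the command line. In bash and dash, the parenthesized commands run in a forked child, so an assignment there changes only the child's copy of the variable. A minimal sketch; the variable name is illustrative.

```shell
# The parent sets a variable, then forks a subshell with ( ... ).
value="parent"

(
  # The child starts with the parent's value...
  echo "child starts with: $value"
  # ...and this write lands only in the child's own copy.
  value="child"
  echo "child now has:     $value"
)

# Back in the parent, the original value is untouched.
echo "parent still has:  $value"
```

The last line prints `parent`: the two processes stopped sharing state at the moment of the fork.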

After the child exits, it does not vanish. The kernel keeps a small stub (exit code, timing, resource usage) until the parent calls wait() (or waitpid()) to read it. That read is called reaping. A parent that forks children and never waits leaves those stubs piling up; a parent that waits promptly keeps the table clean. This is why a shell knows the exit status the instant a foreground command finishes, and why $? has a value to give you.

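# trace the fork/exec/wait a shell does to run one command
# (the trailing "; true" stops sh from exec'ing ls directly without forking;
#  the output below is illustrative, PIDs and arguments will differ)
strace -f -e trace=clone,execve,wait4 -- sh -c 'ls /tmp; true' 2>&1 | grep -E 'clone|execve|wait4'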
# clone(...)  = 4123   ← fork: child PID returned to the parent
# execve("/usr/bin/ls", ...) = 0   ← child replaces its image with ls
# wait4(-1, ...) = 4123  ← parent reaps the child's exit status
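The same reap can be done by hand. The shell's wait builtin is its front end to waitpid(): it blocks until the named child exits and hands back the exit code the kernel was holding. A minimal sketch; the exit code 7 is arbitrary.

```shell
# Start a child in the background; $! is the PID that fork returned to the shell.
sh -c 'exit 7' &
child_pid=$!

# wait blocks until that child exits, reaps it, and returns its exit code.
# The "|| status=$?" form keeps the script alive even under "set -e".
status=0
wait "$child_pid" || status=$?

echo "child $child_pid exited with status $status"
```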

The Process Tree and PID 1

PID 1 is the first user-space process the kernel starts, and on Debian, Ubuntu, and the RHEL family that is systemd. It is special in two ways. First, the kernel delivers to PID 1 only the signals it has installed a handler for, so even SIGKILL sent from inside its own PID namespace does nothing. Second, it is the universal foster parent: when any process exits while it still has living children, those children are re-parented, adopted by PID 1 (more precisely, by the nearest "subreaper", which on a default system is PID 1). That adoption is what makes the tree a single connected structure rather than a forest of orphans.

pstree -p renders the whole tree with PIDs and makes the parent/child structure obvious: systemd at the root, a branch for sshd down to your login shell, another for every service unit. Reading it top-down answers questions a flat ps list cannot — which process spawned this runaway worker, and whether a daemon double-forked away from its launcher.

# show the tree with PIDs; everything descends from systemd (PID 1)
pstree -p 1 | head -n 5
# systemd(1)─┬─sshd(812)───sshd(1450)───bash(1452)───pstree(1499)
#            ├─nginx(901)───nginx(902)
#            └─systemd-journ(440)
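The same parent links can be followed by hand. The sketch below walks upward from the current shell by reading the PPid field in /proc/<pid>/status, which is Linux-specific; parent_of is a hypothetical helper name, not a standard command.

```shell
# Read a process's parent PID from /proc/<pid>/status (Linux-specific).
parent_of() {
  awk '/^PPid:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

pid=$$
chain="$pid"

# Climb until PID 1, or until 0, which is what a process reports when its
# parent lives outside its PID namespace (common inside containers).
while [ "$pid" -gt 1 ]; do
  pid=$(parent_of "$pid") || break
  [ -n "$pid" ] || break
  chain="$chain -> $pid"
done

echo "$chain"
```

On an ordinary login session the chain should end at 1; inside a container it may end at 0 instead.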

Process States

Every process is in exactly one state, shown in the STAT column of ps and the S column of top. Running or runnable is R: either on a CPU or waiting only for one. Sleeping is S — interruptible sleep, the normal idle state of a daemon waiting for work, which a signal can wake. Stopped is T: suspended by SIGSTOP or SIGTSTP (what Ctrl-Z does), frozen until SIGCONT.

Two states deserve special caution. Uninterruptible sleep, D, means the process is blocked inside a kernel call that cannot be interrupted — almost always disk or NFS I/O. A process in D ignores every signal, including SIGKILL, because the kernel will not abandon it mid-operation; it leaves D only when the I/O completes or fails. Zombie, Z, is the opposite: the process is already dead, holding no memory or CPU, existing only as an unreaped exit-status stub.

Code	State	What it means operationally
`R`	Running / runnable	On a CPU or queued for one; consuming or about to consume CPU.
`S`	Interruptible sleep	Idle, waiting for an event; a signal wakes it. The normal state of most daemons.
`D`	Uninterruptible sleep	Blocked in kernel I/O; ignores all signals, even `SIGKILL`, until the I/O returns.
`T`	Stopped	Suspended by a signal (`SIGTSTP` from Ctrl-Z, or `SIGSTOP`); resumes only on `SIGCONT`.
`Z`	Zombie	Already exited; only the exit-status stub remains until the parent reaps it.
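The S and T rows are easy to reproduce on a scratch process. This sketch starts a sleep, stops it, resumes it, and records the state letter at each step from /proc/<pid>/status (Linux-specific). The helper names state_of and wait_for_state are hypothetical, and the fractional sleep assumes GNU coreutils or a similar implementation.

```shell
# Single-letter state of a PID, read from /proc/<pid>/status.
state_of() {
  awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# Poll (for up to about 5 seconds) until the PID reaches the wanted state.
wait_for_state() {
  tries=0
  while [ "$(state_of "$1")" != "$2" ] && [ "$tries" -lt 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
  done
}

sleep 30 &
pid=$!

wait_for_state "$pid" S; state_idle=$(state_of "$pid")     # idle in interruptible sleep
kill -STOP "$pid"
wait_for_state "$pid" T; state_stopped=$(state_of "$pid")  # frozen by SIGSTOP
kill -CONT "$pid"
wait_for_state "$pid" S; state_resumed=$(state_of "$pid")  # back to sleeping

# Clean up the scratch process and reap it.
kill "$pid"
wait "$pid" 2>/dev/null || true

echo "idle=$state_idle stopped=$state_stopped resumed=$state_resumed"
```

On a typical Linux host this should report S, then T, then S again.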

Zombies and Orphans

A zombie is a finished process whose parent has not yet called wait(). It costs no CPU and essentially no memory: only one slot in the kernel's process table and its PID. A handful of transient zombies is normal. The problem is a buggy parent that forks continuously and never reaps, because the table is finite and a flood of zombies eventually exhausts the available PIDs and blocks new process creation across the whole host. You cannot kill a zombie, because it is already dead. You fix the parent (so it reaps) or kill the parent, at which point PID 1 adopts the zombies and reaps them immediately.
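A zombie can be produced on purpose to see what one looks like. In this sketch a shell forks a short-lived child and then exec's into sleep, a program that never calls wait(), so the dead child has no one to reap it. The /proc scan is Linux-specific and the variable names are illustrative.

```shell
# Parent: fork a short-lived child, then exec into `sleep`, which never waits.
sh -c 'sleep 1 & exec sleep 5' &
parent=$!

# Give the child time to exit; it is now dead but unreaped.
sleep 2

# Scan /proc for processes whose parent is $parent and record their state.
zombie_pid=""
zombie_state=""
for status_file in /proc/[0-9]*/status; do
  ppid=$(awk '/^PPid:/ {print $2}' "$status_file" 2>/dev/null) || continue
  if [ "$ppid" = "$parent" ]; then
    zombie_pid=${status_file#/proc/}
    zombie_pid=${zombie_pid%/status}
    zombie_state=$(awk '/^State:/ {print $2}' "$status_file" 2>/dev/null) || true
  fi
done
echo "child $zombie_pid of parent $parent is in state $zombie_state"

# Killing the parent hands the zombie to PID 1 (or a subreaper) for reaping.
kill "$parent"
wait "$parent" 2>/dev/null || true
```

The child should show up in state Z until the parent is killed. Sending signals to the child's PID in the meantime would change nothing.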

An orphan is the live counterpart: a still-running process whose parent exited first. Orphans are harmless — they keep running, and PID 1 (or the nearest subreaper) becomes their new parent, guaranteeing someone will reap them when they finally exit. Daemons exploit this on purpose: the classic double-fork detaches a service from the shell that launched it, letting the intermediate parent exit so the daemon is re-parented to init and survives the terminal closing. The two failure modes are mirror images — a zombie is dead-but-unreaped, an orphan is alive-but-reparented.
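Re-parenting can be observed directly. In the sketch below a launcher shell backgrounds a sleep, prints its own PID and the child's, and exits; the child is then checked to confirm it is still alive and has a different parent. The /proc read is Linux-specific and the variable names are illustrative.

```shell
# The launcher backgrounds `sleep`, reports "<its own PID> <child PID>", then exits.
# Redirecting the child's output lets the command substitution return at once.
pids=$(sh -c 'sleep 5 >/dev/null 2>&1 & echo "$$ $!"')
launcher=${pids% *}
orphan=${pids#* }

# The launcher is gone; ask the kernel who the orphan's parent is now.
new_parent=$(awk '/^PPid:/ {print $2}' "/proc/$orphan/status")

# kill -0 sends no signal; it only tests that the process still exists.
if kill -0 "$orphan" 2>/dev/null; then alive=yes; else alive=no; fi

echo "launcher=$launcher orphan=$orphan alive=$alive new_parent=$new_parent"

# Clean up the orphan.
kill "$orphan"
```

The new parent is usually 1, but it can be another PID when a subreaper (a user-level systemd instance or a container init, for example) sits between the process and PID 1.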

Zombie vs Orphan

Zombie — a process that has already exited but whose parent has not reaped it. State Z, no CPU or memory, just a PID and an exit-status stub. Harmless in small numbers; a leak only if a parent forks endlessly without calling wait(), which can exhaust the PID space.

Orphan — a process still running after its parent exited. It is immediately re-parented to PID 1 (or the nearest subreaper), so it keeps working and is guaranteed to be reaped on exit. Not a bug — it is the mechanism daemons use to detach from their launching shell.

Common Mistakes

Writing a long-running parent that forks workers but never calls wait()/waitpid() — the dead children accumulate as Z entries until the process table fills and the host can no longer fork anything.
Trying to kill -9 a zombie to "clean it up" — it is already dead and ignores signals; the only fix is making the parent reap, or killing the parent so PID 1 adopts and reaps it.
Hammering a process stuck in D (uninterruptible sleep) with SIGKILL and expecting it to die — it cannot receive signals until its kernel I/O completes; the real problem is the storage or NFS mount it is blocked on.
Running a service as a bare ./app & from a shell instead of under a supervisor, so when the shell exits the process is orphaned to PID 1 with no one to restart it or read its exit code.
Assuming a PID uniquely identifies one process forever — PIDs are recycled after a process exits, so a stale PID saved earlier can later point at an unrelated process you then signal by accident.
Reading a high process or thread count as inherently unhealthy and killing things — a busy server legitimately runs thousands of tasks; R-state count and load average tell you about pressure, the raw total does not.
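Several of these mistakes come from acting before looking at state. A rough triage sketch, assuming a Linux /proc filesystem: it counts processes per state letter, then lists anything in D or Z together with its parent PID, since the parent (for Z) or the blocked I/O (for D) is where the fix lives.

```shell
# Count processes by state letter.
state_summary=$(
  for status_file in /proc/[0-9]*/status; do
    awk '/^State:/ {print $2}' "$status_file" 2>/dev/null || true
  done | sort | uniq -c
)
echo "$state_summary"

# List D and Z processes as: PID, state, parent PID, command name.
# Processes that exit mid-scan are skipped silently.
for status_file in /proc/[0-9]*/status; do
  awk '
    /^Name:/  { name = $2 }
    /^State:/ { state = $2 }
    /^Pid:/   { pid = $2 }
    /^PPid:/  { ppid = $2 }
    END { if (state == "D" || state == "Z") print pid, state, "ppid=" ppid, name }
  ' "$status_file" 2>/dev/null || true
done
```

A healthy host usually shows mostly S (and I for idle kernel threads), a few R, and no persistent D or Z entries.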

Best Practices

Run daemons under a real supervisor — a systemd service unit as PID 1's child — so reaping, restart-on-crash, and log capture are handled instead of relying on hand-rolled double-forks.
When you write code that forks, call waitpid() for every child (or install a SIGCHLD handler that reaps), so children never linger as zombies.
Read the STAT column before acting: a D means investigate I/O, not send a bigger signal; a Z means look at the parent, not the zombie.
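Use pstree -p to find the true parent of a runaway worker before killing it: if the parent is a supervisor that keeps respawning workers, chasing each child PID only treats the symptom. Killing the parent does not by itself kill its children (they are re-parented), so stop the service or signal the whole process group when the children need to go with it.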
Reference processes by a fresh lookup (pgrep by name plus a sanity check) rather than a long-lived saved PID, to avoid signaling a recycled PID.
For thin containers that run an app as PID 1, give them a minimal init (--init / tini) so orphaned grandchildren inside the container still get reaped.
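The sanity check behind the recycled-PID advice can be sketched as a small guard. signal_if_named is a hypothetical helper, not a standard command: it refuses to signal unless /proc/<pid>/comm (Linux-specific, truncated to 15 characters) still matches the expected command name. It narrows the race window but does not close it, since the PID could still be recycled between the check and the kill.

```shell
# Hypothetical helper: signal a PID only if it still runs the expected command.
signal_if_named() {
  target_pid=$1
  expected_name=$2
  signal=${3:-TERM}
  # A missing /proc entry means the process is already gone.
  actual_name=$(cat "/proc/$target_pid/comm" 2>/dev/null) || return 1
  [ "$actual_name" = "$expected_name" ] || return 1
  kill -s "$signal" "$target_pid"
}

# Demo: start a scratch worker and give it a moment to exec.
sleep 30 &
worker=$!
sleep 1

# A stale assumption about what the PID is gets refused...
if signal_if_named "$worker" nginx; then wrong_name=sent; else wrong_name=refused; fi
# ...while the matching name goes through.
if signal_if_named "$worker" sleep; then right_name=sent; else right_name=refused; fi
wait "$worker" 2>/dev/null || true

echo "wrong name: $wrong_name, right name: $right_name"
```

In practice the PID would come from a fresh pgrep lookup rather than a saved variable; the guard is the second half of that advice.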

Comparable tools

Windows: CreateProcess builds a new process from a binary in one call; there is no fork, so the copy-then-replace model and zombie/orphan semantics do not apply.
macOS: the same POSIX fork/exec/wait model and process tree, with launchd as PID 1 instead of systemd.
systemd: the PID 1 implementation on most Linux servers; the subreaper that adopts orphans and the supervisor that reaps service children.

Knowledge Check

A monitoring daemon shows hundreds of processes in state Z that keep growing. What is the correct fix?

Fix or restart the parent process so it reaps its children; killing the parent makes PID 1 adopt and immediately reap the zombies
Send SIGKILL to each zombie PID in turn so the kernel is forced to tear down the leftover process entries and reclaim their table slots
Raise the system-wide PID limit with kernel.pid_max so the zombies have room to keep accumulating without exhausting the table
Renice the zombies to the lowest priority of 19 so the scheduler stops handing them CPU time and lets them drain

A backup process is stuck in state D and ignores kill -9. Why?

It is in uninterruptible sleep inside a kernel I/O call and cannot receive any signal, including SIGKILL, until that I/O completes or fails
It is a zombie that has already exited and become an unkillable exit stub, so no further signal can act on it
Only its direct parent is permitted to signal a process sitting in D-state, and here that parent has already exited and left the child unreachable
It has registered a custom SIGKILL handler that intercepts and swallows the signal before the kernel can act on it

What is the practical difference between an orphan and a zombie?

An orphan is still running and gets re-parented to PID 1; a zombie has already exited and is waiting to be reaped
An orphan has already exited and lingers as a stub; a zombie is still running normally but has lost its parent to an early exit
Both have already died, but an orphan still holds its full memory image while a zombie has released everything
An orphan consumes CPU while waiting for adoption; a zombie consumes a CPU core until reaped

Why does Linux create a new process with fork() followed by exec() rather than a single call?

fork() cheaply clones the process via copy-on-write, giving the child a window to adjust file descriptors and environment before exec() overlays the new image
fork() loads the binary into memory while exec() assigns the new PID; both calls are required in sequence before the kernel will register the process in its table
It duplicates the program twice so that if exec() fails the original copy keeps running the new binary
fork() first runs the program in kernel space and then exec() moves its execution out into user space once it is loaded
