Topic 31

Signals

A signal is an asynchronous notification the kernel delivers to a process to tell it something happened or to ask it to change state. Each signal has a number and a default disposition — terminate, terminate with a core dump, stop, continue, or ignore — and a process can override most of those by installing a handler. When you press Ctrl-C, when systemd stops a unit, when a child exits, or when a process segfaults, the mechanism underneath is the same: a small integer raised against a target PID.

Signals are the primary lever for process lifecycle control on Linux, and the operational cost of misusing them is real. Send the wrong one and a database flushes nothing to disk before dying; send the right one and the process refuses to budge because it is wedged in uninterruptible sleep. Knowing which signal a process can catch, which it cannot, and what the default does is the difference between a clean shutdown and a corrupted data file.

Signal Types and Numbers

Each signal has a symbolic name (SIGTERM) and a number (15), but the numbers below 32 are not fully portable across architectures — SIGUSR1 is 10 on x86-64 and 30 on some others. Use the names. The handful you will actually reach for are termination and job-control signals plus the two reserved for application use.

Signal	No. (x86-64)	Default	What it is for
`SIGTERM`	15	Terminate	Polite "please exit" — catchable, the default of `kill`
`SIGKILL`	9	Terminate	Forced kill by the kernel — cannot be caught or ignored
`SIGINT`	2	Terminate	Interrupt from the keyboard — Ctrl-C
`SIGHUP`	1	Terminate	Controlling terminal closed; many daemons repurpose it as "reload config"
`SIGQUIT`	3	Core dump	Quit from the keyboard — Ctrl-\, leaves a core file
`SIGSTOP`	19	Stop	Suspend the process — cannot be caught or ignored
`SIGCONT`	18	Continue	Resume a stopped process
`SIGUSR1` / `SIGUSR2`	10 / 12	Terminate	No kernel meaning; whatever the application defines
`SIGCHLD`	17	Ignore	A child stopped or exited — the parent reaps it here
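
Because the numbers vary by architecture, a script can ask the shell for the mapping at run time instead of hard-coding it. The following is a minimal sketch that assumes bash, whose kill builtin translates in both directions with -l; other shells may only support the number-to-name form. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Resolve signal names and numbers on the machine the script runs on.
term_num=$(kill -l TERM)     # name -> number
usr1_num=$(kill -l USR1)     # 10 on x86-64, different on some other architectures
name_of_9=$(kill -l 9)       # number -> name, printed without the SIG prefix

printf 'SIGTERM=%s SIGUSR1=%s signal 9=%s\n' "$term_num" "$usr1_num" "$name_of_9"
```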

Above the standard set sit the real-time signals, SIGRTMIN through SIGRTMAX (34 to 64 on Linux). Unlike standard signals, they are queued rather than coalesced — if you send the same standard signal three times before the process handles it, it may see only one; real-time signals deliver all three, in order, and can carry a small integer payload. Most administration work never touches them, but they are why a process can distinguish "ten of these events happened" from "at least one happened".

Sending Signals

Your terminal sends signals constantly without naming them: Ctrl-C raises SIGINT against the foreground process group, Ctrl-Z raises SIGTSTP to suspend it, and closing the terminal raises SIGHUP. From the command line, kill is the explicit tool, and despite the name it sends whatever signal you ask for: kill with no flag sends SIGTERM, not SIGKILL.

There are three ways to address a target: kill by PID, pkill and killall by name, and kill %n by shell job number. The name-based tools are the dangerous ones — pkill -f nginx matches against the full command line and will happily signal every process whose arguments mention nginx, including your editor with nginx.conf open. Always rehearse the match with pgrep first.

kill 4123              # SIGTERM to PID 4123 (the default)
kill -TERM 4123        # the same, named explicitly
kill -9 4123           # SIGKILL — last resort
kill -HUP 4123         # ask a daemon to reload
kill -0 4123           # send nothing; just test that the PID exists and you may signal it
kill -l                # list every signal name and number

pgrep -f -u www-data 'php-fpm'   # preview the matches BEFORE acting
pkill -HUP -x nginx              # exact-name match only, no substring surprises

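A signal to a negative PID targets a whole process group, which is how the shell stops an entire pipeline at once; kill -TERM -- -4123, for example, signals every process in group 4123. The kill -0 form sends no signal at all. It only checks whether the PID exists and whether you have permission to signal it, which makes it a more reliable way to ask "is this process still alive?" from a script than parsing ps output. It cannot tell you whether the PID has been recycled by an unrelated process since you recorded it.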
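
As a rough illustration of both ideas, the sketch below starts a two-process pipeline as a single job, signals its whole process group through a negative PID, and then uses kill -0 to confirm the group leader is gone. It assumes bash with job control switched on via set -m, so that the background job gets its own process group; the sleep commands are stand-ins for real work.

```shell
#!/usr/bin/env bash
set -m                        # job control on: each background job gets its own process group

sleep 300 | sleep 300 &       # a two-process pipeline, started as one job
pgid=$(jobs -p %%)            # PID of the job's group leader, which is also its PGID

kill -TERM -- "-$pgid"        # negative PID: signal every process in the group
wait                          # reap both members so no zombies linger

if kill -0 "$pgid" 2>/dev/null; then
    echo "group leader $pgid is still alive"
else
    echo "group leader $pgid is gone"
fi
```

The -- before the negative number keeps kill from parsing it as an option.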

Dispositions and Handlers

For every signal a process has a disposition: run the default action, ignore the signal, or run a handler the program installed. A well-written daemon catches SIGTERM to flush buffers and close sockets, catches SIGHUP to re-read its config without restarting, and ignores SIGPIPE so a closed client connection does not kill it. The default disposition is what you get when no handler is installed — which for most signals is "terminate".
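
The three dispositions can be sketched in a shell script with trap. This is only a skeleton under stated assumptions: the function and variable names (on_term, on_hup, shutting_down, reloads) are hypothetical, and the script signals itself to stand in for an operator running kill from another terminal.

```shell
#!/usr/bin/env bash
# Hypothetical daemon skeleton; all names here are illustrative.
shutting_down=0
reloads=0

on_term() { shutting_down=1; }             # SIGTERM: ask the main loop to finish up
on_hup()  { reloads=$((reloads + 1)); }    # SIGHUP: stand-in for re-reading a config file

trap on_term TERM                          # install a handler
trap on_hup  HUP
trap ''      PIPE                          # an empty action means "ignore this signal"

kill -HUP  "$$"                            # simulate a reload request
kill -TERM "$$"                            # simulate a stop request

echo "reloads=$reloads shutting_down=$shutting_down"

trap - TERM HUP PIPE                       # "-" restores the default dispositions
```

Keeping the handlers down to setting a flag, and doing the real cleanup in the main loop, is a common design choice because a handler can interrupt the program at almost any point.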

SIGKILL (9) and SIGSTOP (19) are the two exceptions the kernel enforces: they cannot be caught, blocked, or ignored. That is by design — it guarantees an administrator always has a way to stop and to kill a misbehaving process regardless of what its code wants. The cost is that SIGKILL gives the process zero chance to clean up: no buffer flush, no temp-file removal, no lock release.

"Cannot be caught" is not the same as "instant". SIGKILL only takes effect when the process is scheduled to run, and a process blocked in an uninterruptible sleep (state D, typically waiting on disk or NFS I/O) will not die until that I/O completes or fails. You can kill -9 it all day and ps will keep showing it — the signal is pending, not ignored, and there is nothing userspace can do but wait or address the stuck I/O.

Graceful Shutdown

The correct way to stop a process is SIGTERM first, then SIGKILL only if it refuses. SIGTERM lets the process run its shutdown handler — commit the transaction, finish the in-flight request, release the lock — and exit on its own terms. Jumping straight to kill -9 skips all of that, which is how you get half-written files, orphaned lock files, and a database that has to run crash recovery on the next start.
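
The TERM-then-KILL sequence can be sketched as a small shell function. This is an illustration rather than a hardened tool: stop_gracefully and its return codes are hypothetical, the timeout is counted in whole seconds, and the loop relies on bash reaping its own background children so that kill -0 stops succeeding once the target has exited.

```shell
#!/usr/bin/env bash
# stop_gracefully PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL was needed,
# 2 if the PID could not be signalled at all.
stop_gracefully() {
    local pid=$1 timeout=${2:-10} waited=0

    kill -TERM "$pid" 2>/dev/null || return 2

    # Poll with kill -0. An unreaped zombie still counts as existing.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null   # the polite request was ignored
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

sleep 300 &                                  # stand-in for a real service
stop_gracefully "$!" 5
echo "stop_gracefully returned $?"
```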

On Debian and Ubuntu, systemd already does this dance for you and does it better than a hand-rolled script, because it signals the unit's cgroup rather than a single PID — every process the service forked gets the signal, with no PID-tracking gaps. KillSignal sets what is sent first (SIGTERM by default), TimeoutStopSec sets how long to wait, and when that timer expires systemd escalates to SIGKILL automatically.

# /etc/systemd/system/myapp.service.d/override.conf
# Red Hat / dnf systems use the identical systemd directives.
[Service]
KillSignal=SIGTERM
# Give the app 45s to drain before SIGKILL. systemd does not accept trailing
# comments on a setting line, so the note sits on its own line.
TimeoutStopSec=45s

If your service legitimately needs more than the default 90 seconds to shut down (a queue worker draining long jobs, a database flushing a large cache), raise TimeoutStopSec rather than disabling it. Setting it to infinity or 0 disables the timeout entirely, which means a wedged process blocks the stop forever and your reboots hang.

Signals and the Process Lifecycle

When a child process exits, it does not vanish — it becomes a zombie, a near-empty entry in the process table holding only its exit status, and the kernel raises SIGCHLD to the parent. The parent is supposed to call wait() to read that status and let the entry be freed; until it does, the zombie lingers. A handful of zombies is harmless, but a parent that never reaps will leak PIDs until the table fills and no new process can start.

If a parent dies before its children, the children are re-parented to PID 1 (systemd on a modern Debian or Ubuntu host), which reaps them correctly. This is exactly why containers need a real init as PID 1: a bare application run as PID 1 that does not reap will accumulate zombies forever, because there is no systemd above it to inherit the orphans. Inside a container, run tini or systemd, or use docker run --init.
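
To find out which parent is failing to reap, it helps to list each zombie next to its parent PID. The sketch below assumes a Linux /proc filesystem and an available awk; list_zombies is a hypothetical helper name. It prints one "PID PPID" pair per zombie, and the PPID column points at the process that should be calling wait().

```shell
#!/usr/bin/env bash
# Print "PID PPID" for every zombie, reading Linux's /proc directly.
list_zombies() {
    local dir ppid
    for dir in /proc/[0-9]*; do
        # A process may vanish between the glob and the read; errors are discarded.
        ppid=$(awk '/^State:/ {s = $2} /^PPid:/ {p = $2} END {if (s == "Z") print p}' \
                   "$dir/status" 2>/dev/null)
        if [ -n "$ppid" ]; then
            printf '%s %s\n' "${dir#/proc/}" "$ppid"
        fi
    done
}

list_zombies
```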

Common Mistakes

Reaching for kill -9 as the first move. SIGKILL skips the process's cleanup entirely — unflushed writes are lost, lock files and temp files are orphaned, and a database is forced into crash recovery on restart. Send SIGTERM and give it a few seconds first.
Assuming SIGKILL is instantaneous. A process stuck in uninterruptible sleep (state D) on dead NFS or a failing disk will not die until that I/O returns; the signal sits pending and the process stays in ps no matter how many times you send it.
Trying to install a handler for SIGKILL or SIGSTOP. The kernel forbids it, since those two are uncatchable by design: the sigaction() call fails with EINVAL, so unless the program checks that return value the handler silently never runs and you are left wondering why your cleanup code did not fire.
Confusing kill 1234 with kill %1. The first signals PID 1234; the second signals shell job number 1, an entirely different process. In a script with no job control, %1 simply errors.
Signaling a stale PID after reuse. PIDs wrap around and get recycled; a script that records a PID, waits, then blindly kills it can hit a completely unrelated process now holding that number. Verify with kill -0 and check the process identity first.
Forgetting that SIGHUP kills the session's jobs, background ones included, when the terminal closes. A long task started over SSH dies when the session drops unless you launched it under nohup, disowned it, ran it in tmux, or made it a systemd unit.
Running an application as PID 1 in a container with no init. It never reaps its children, zombies pile up, and SIGTERM from docker stop is often ignored because the kernel does not apply default actions to PID 1 of a PID namespace: a signal with no installed handler is simply dropped. Use --init or tini.
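
For the stale-PID mistake above, one way to check process identity is to record the start time alongside the PID and compare it before signalling. The sketch below assumes Linux, where field 22 of /proc/PID/stat holds the start time in clock ticks since boot; proc_start_time and term_if_same are hypothetical helper names. It narrows the reuse window but does not close it completely.

```shell
#!/usr/bin/env bash
# proc_start_time PID: print the start time (field 22 of /proc/PID/stat).
# A recycled PID will carry a different value.
proc_start_time() {
    local stat rest
    stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    rest=${stat##*") "}        # drop "pid (comm) "; comm may itself contain spaces or ")"
    set -- $rest               # rest starts at field 3, so field 22 is now ${20}
    printf '%s\n' "${20}"
}

# term_if_same PID START: send SIGTERM only if PID still has the recorded start time.
# Returns 1 if the process is gone, 2 if the PID now belongs to something else.
term_if_same() {
    local pid=$1 recorded=$2 current
    current=$(proc_start_time "$pid") || return 1
    [ "$current" = "$recorded" ] || return 2
    kill -TERM "$pid"          # a small window remains between the check and the kill
}

sleep 300 &                    # stand-in for a long-running worker
pid=$!
started=$(proc_start_time "$pid")
term_if_same "$pid" "$started" && echo "signalled $pid"
```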

Best Practices

Send SIGTERM first and wait. Reserve SIGKILL for processes that demonstrably ignored the polite request — never as the opening move.
Manage daemons with systemd, not raw kill. It signals the whole cgroup rather than a single PID, so every forked child is stopped with no tracking gaps.
Tune TimeoutStopSec to match real shutdown time, and never set it to 0 or infinity: an unbounded timeout lets a wedged process hang every reboot.
Preview every name-based signal with pgrep -f before running pkill -f or killall, and prefer exact-match -x over substring matching to avoid signaling unrelated processes.
Handle SIGTERM in any long-running service you write — drain in-flight work, flush buffers, release locks, then exit — so orchestrators can stop it cleanly.
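Use kill -0 PID to test for existence in scripts instead of grepping ps; it avoids fragile text matching and also confirms you have permission to signal the target. It does not protect against PID reuse, so pair it with an identity check when the PID was recorded earlier.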
Run a real init (tini, systemd, or docker run --init) as PID 1 in containers so children are reaped and SIGTERM is honored on shutdown.

Comparable tools

Windows: no POSIX signals; lifecycle uses WM_CLOSE/Ctrl-C for graceful exit and TerminateProcess as the uncatchable equivalent of SIGKILL
macOS / BSD: the same POSIX signals and kill semantics; launchd plays the systemd role for graceful service stop
Windows Services: the SCM stop request is the rough analog of SIGTERM to a unit, with its own service-stop timeout

Knowledge Check

Why is SIGTERM the right first choice over SIGKILL when stopping a service?

SIGTERM is catchable, so the process can run its shutdown handler — flush buffers, release locks, finish in-flight work — whereas SIGKILL gives it no chance to clean up
SIGTERM is delivered noticeably faster than SIGKILL because its lower signal number gives it higher priority in the kernel queue and it is dispatched ahead of higher-numbered signals
SIGTERM stops the whole process group at once while SIGKILL reaches only the single PID you name on the command line
SIGTERM is the only signal systemd is able to send to a unit during a managed stop sequence

A process is stuck in uninterruptible sleep (state D) and survives repeated kill -9. Why?

SIGKILL can only be delivered when the process is scheduled to run; a process blocked in the kernel on disk or NFS I/O will not act on it until that I/O completes or fails
A process in state D has installed a private signal handler that quietly intercepts the incoming SIGKILL and discards it before the kernel ever gets a chance to act on the delivery
kill -9 only works on processes in the foreground process group, and a D-state process has already dropped out of that group
The kernel automatically converts an incoming SIGKILL into a harmless SIGSTOP for any process currently waiting on I/O

Why can a program install handlers for SIGTERM and SIGHUP but not for SIGKILL or SIGSTOP?

The kernel forbids catching, blocking, or ignoring SIGKILL and SIGSTOP so an administrator always retains a guaranteed way to kill or suspend any process
SIGKILL and SIGSTOP are both classified as real-time signals, and the kernel categorically forbids registering a handler for any signal that falls in the real-time range
Handlers for any signal numbered below 10 are disallowed by the kernel, and both of these fall inside that low range
Only the process that sent the signal may install a handler for it, and these come from the kernel

What does systemd's TimeoutStopSec control during a unit stop?

How long systemd waits after sending SIGTERM before escalating to SIGKILL on the unit's cgroup
How long the unit is allowed to take to start up before systemd gives up and marks the activation as failed
The interval systemd waits between automatic restarts after the unit crashes
How long systemd keeps the unit's logs before rotating them

Why does an application run as PID 1 in a container often ignore docker stop and accumulate zombies?

PID 1 has no default signal disposition, so an unhandled SIGTERM does nothing, and with no init to call wait() exited children are never reaped
Docker sends SIGSTOP rather than SIGTERM on a stop request, which merely suspends the process in place instead of asking it to shut down and exit cleanly
PID 1 is always pinned in an uninterruptible sleep state inside containers, so the kernel holds every signal pending
Containers strip the kill capability away from PID 1 at startup, so no signal sent from outside can ever reach it
