Exit Codes and Error Handling
Every command returns an integer status when it finishes: 0 means success, and 1 through 255 mean failure. The shell stores that number in $?, and you read it to decide what happens next. A script that never checks it is a script that keeps running after a step has already failed.
That is how a backup "succeeds" while writing an empty archive, and how a deploy continues past a failed migration. Error handling in shell is the discipline of checking the status of every command that matters and of turning silent failures into loud ones with set -e, set -o pipefail, and trap.
Exit Status Conventions
The status of the last foreground command lives in $?. It is a single byte, so the range is 0 to 255. Reading $? consumes nothing, but the next command overwrites it — so $? after an echo reports the echo, not the command you cared about. Capture it into a variable the instant you need it later.
Several codes carry fixed meanings, and knowing them turns an opaque number into a diagnosis. 127 means the command was misspelled or absent from $PATH; 126 means the file exists but is not executable; and any code of the form 128 + N means the process was killed by signal N, so 130 is a Ctrl-C (SIGINT, signal 2) and 137 is a SIGKILL (signal 9).
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error (catch-all) |
| 2 | Misuse of a shell builtin |
| 126 | Command found but not executable |
| 127 | Command not found |
| 128+N | Killed by signal N (130 = SIGINT, 137 = SIGKILL) |
    cp backup.tar.gz /mnt/archive/
    status=$?   # capture before the next command overwrites $?
    if [ "$status" -ne 0 ]; then
        echo "copy failed (status $status)" >&2
    fi
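The conventions in the table can be folded into a small lookup. This is a minimal sketch that assumes bash; describe_status is a hypothetical helper name, not a standard command, and it leans on the kill -l builtin to turn a signal number into a name.

```bash
#!/usr/bin/env bash
# describe_status: hypothetical helper that maps an exit status to the
# conventional meaning listed in the table.
describe_status() {
  local status=$1 signal name
  case "$status" in
    0)   echo "success" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$status" -gt 128 ] && [ "$status" -le 255 ]; then
        signal=$((status - 128))
        # kill -l N prints the name of signal N; omit the name if N is unknown.
        if name=$(kill -l "$signal" 2>/dev/null); then
          echo "killed by signal $signal (SIG$name)"
        else
          echo "killed by signal $signal"
        fi
      else
        echo "failed with status $status"
      fi
      ;;
  esac
}

describe_status 130
describe_status 127
```

In a real script the argument would be a captured status, for example describe_status "$status" right after status=$?.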
The errexit Option
The set -e option, also spelled set -o errexit, tells the shell to exit the moment a command returns a non-zero status. It converts "keep going no matter what" into "stop at the first failure," which is the right default for almost every non-interactive script.
The option has sharp, surprising exceptions. In a pipeline only the last command's status counts, so false | true exits 0 even under errexit. A command tested in an if, while, or until condition, or anywhere in an && or || list except the final command, never triggers the exit. And a function called inside any of those conditional contexts loses errexit for its entire body. Anyone who assumes set -e catches everything eventually gets burned by one of these.
    set -e
    cat /var/log/app.log | grep ERROR | wc -l
    echo done   # prints: only wc's status reaches $?, and wc succeeds
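The exceptions are easier to believe once you watch them happen. The sketch below assumes bash and runs each scenario in its own child shell so that an errexit abort in one cannot stop the demonstration; check_config is a hypothetical function whose first step fails.

```bash
#!/usr/bin/env bash
# check_config is a hypothetical function: its first command fails, and
# the echo shows whether execution continued past that failure.
define_fn='check_config() { false; echo "kept going"; }'

# 1. Plain statement: errexit aborts inside the function, so nothing prints.
as_statement=$(bash -c "$define_fn; set -e; check_config; echo reached") || true

# 2. if condition: errexit is suspended for the whole function body.
in_condition=$(bash -c "$define_fn; set -e; if check_config; then echo reached; fi") || true

# 3. Pipeline without pipefail: only the last stage's status counts.
in_pipeline=$(bash -c 'set -e; false | true; echo reached') || true

printf 'statement: [%s]\n' "$as_statement"
printf 'condition: [%s]\n' "$in_condition"
printf 'pipeline:  [%s]\n' "$in_pipeline"
```

Only the first scenario stops at the failure. In the second, the function runs to completion and its final echo makes the condition true, which is exactly how a failed step gets reported as success.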
The pipefail and nounset Options
set -o pipefail closes the pipeline gap: a pipeline returns the status of the rightmost command that failed, not just the last stage. With it set, the grep pipeline above fails when grep finds nothing or cat cannot open the file, instead of being masked by a successful wc.
set -u, or set -o nounset, aborts the script when it expands a variable that was never set. That catches a typo like $DEST_DIRR before it expands to the empty string and turns rm -rf "$DEST_DIRR"/* into rm -rf /*. The three combine into the standard header set -euo pipefail, which most production scripts open with.
    #!/usr/bin/env bash
    set -euo pipefail   # errexit + nounset + pipefail
    # Now a failed command, an unset variable, and a broken
    # pipeline stage each stop the script immediately.
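To see what each option changes, compare the same commands with and without it. This is a sketch assuming bash; each case runs in a child shell, false stands in for a failing first stage, and DEST_DIRR is the deliberately misspelled name from the paragraph above.

```bash
#!/usr/bin/env bash
# Each case runs in a child bash. "|| status=$?" records the child's
# exit status without letting the failure stop this script.

# Without pipefail, the failed first stage is masked by a successful wc.
status=0
bash -c 'false | wc -l >/dev/null' || status=$?
without_pipefail=$status

# With pipefail, the pipeline reports the stage that failed.
status=0
bash -c 'set -o pipefail; false | wc -l >/dev/null' || status=$?
with_pipefail=$status

# With nounset, expanding the misspelled DEST_DIRR aborts the child
# instead of silently producing an empty string.
status=0
bash -c 'set -u; DEST_DIR=/tmp/demo; echo "target: $DEST_DIRR"' 2>/dev/null || status=$?
nounset_status=$status

echo "without pipefail: $without_pipefail"
echo "with pipefail:    $with_pipefail"
echo "nounset status:   $nounset_status"
```

The first case reports 0 despite the failure, and the other two report a non-zero status.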
Traps and Cleanup
The trap builtin runs a command when the shell receives a signal or reaches a pseudo-signal. The most useful is EXIT, which in bash fires whenever the script ends: on normal completion, on an errexit abort, or on a catchable fatal signal such as SIGINT or SIGTERM. SIGKILL is the exception, because it cannot be trapped at all. A single trap ... EXIT therefore guarantees cleanup runs on the failure paths, not just the happy path.
Make the cleanup idempotent so it is safe regardless of how far the script got before it ran. Add trap ... ERR to report which line failed during debugging, and trap INT and TERM separately when a Ctrl-C or a kill needs handling distinct from a normal exit.
    set -euo pipefail
    tmp=$(mktemp -d)
    cleanup() { rm -rf "$tmp"; }   # idempotent: safe even if $tmp is gone
    trap cleanup EXIT              # runs on success, error, or signal
    trap 'echo "failed at line $LINENO" >&2' ERR
    tar czf "$tmp/out.tar.gz" /etc
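The claim that cleanup runs on the failure path can be checked directly. The sketch below assumes bash and runs a worker in a child shell that aborts under errexit; marker is a hypothetical file that exists only so the parent can observe that the trap fired, and false stands in for a real failing step.

```bash
#!/usr/bin/env bash
# The worker runs in a child bash so its errexit abort does not end this
# demonstration. marker is a hypothetical file used only so the parent
# can observe that cleanup ran.
marker=$(mktemp)

worker='
set -euo pipefail
marker=$1
tmp=$(mktemp -d)
cleanup() {
  rm -rf "$tmp"                    # -f keeps a repeat run harmless
  echo "cleanup ran" > "$marker"
}
trap cleanup EXIT
echo "$tmp"                        # report the temp dir to the parent
false                              # simulated failing step: errexit aborts here
echo "never printed"
'

status=0
tmp_path=$(bash -c "$worker" worker "$marker") || status=$?
marker_text=$(cat "$marker")
if [ -d "$tmp_path" ]; then tmp_removed=no; else tmp_removed=yes; fi
rm -f "$marker"

echo "worker status: $status"
echo "marker: $marker_text"
echo "temp dir removed: $tmp_removed"
```

The worker still exits non-zero, because an EXIT trap that does not call exit itself leaves the original status intact.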
Explicit Error Handling and Custom Codes
Sometimes you want to handle a failure rather than abort on it. The || operator runs its right side only when the left side fails, which gives the common command || die "message" pattern; && is the mirror image, running its right side only on success. Inside a function, return N sets the function's status without leaving the script, while exit N terminates the whole process. A sourced file runs in the shell that sourced it, so an exit buried in a sourced library kills that shell rather than returning to the caller.
Give your own scripts meaningful exit codes and document them, so callers and monitoring can tell a config error from a network timeout instead of seeing a generic 1. Stay inside 1 to 125, since 126, 127, and 128 + N are reserved by the shell, and any code above 255 wraps modulo 256 — exit 256 becomes 0 and reads as success.
    die() { echo "$*" >&2; exit 1; }

    deploy() {
        systemctl restart app || return 3   # report a specific failure to the caller
    }

    deploy || die "deploy failed, aborting"
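Documented codes pay off on the caller's side, where a case statement can branch on them. This is a sketch assuming bash: E_CONFIG and E_NETWORK are hypothetical names and numbers chosen for illustration, and run_deploy is a stand-in that simply exits with the code it is given.

```bash
#!/usr/bin/env bash
# Hypothetical exit codes for an imagined deploy script; the names and
# numbers are illustrative, picked from the 1-125 range.
E_CONFIG=3
E_NETWORK=4

# run_deploy stands in for a real deploy command: the child shell exits
# with whatever code it is handed.
run_deploy() { bash -c 'exit "$1"' run_deploy "$1"; }

# The caller branches on the documented codes instead of a generic 1.
classify() {
  local status=0
  run_deploy "$1" || status=$?
  case "$status" in
    0)            echo "ok" ;;
    "$E_CONFIG")  echo "config error" ;;
    "$E_NETWORK") echo "network timeout" ;;
    *)            echo "unexpected status $status" ;;
  esac
}

classify 3
classify 4
classify 256   # wraps modulo 256 to 0, so the caller sees success
```

The last call shows why out-of-range codes are dangerous: the failure is indistinguishable from a clean run.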
Common Pitfalls
- Relying on set -e to catch pipeline failures: without pipefail, only the last stage's status counts, so generate | tee out.log reports success even when generate crashed.
- Reading $? after an intervening echo or [ test: the second command overwrites it, so you branch on the status of the wrong command. Capture status=$? first.
- Assuming set -e catches everything: a function called inside an if condition, or any command in an && / || list other than the last, runs with errexit disabled and silently swallows failures.
- Calling exit in a script meant to be sourced: it terminates the interactive shell that sourced it instead of returning to the caller. Use return in sourced code.
- Writing non-idempotent cleanup in trap ... EXIT: if the temp dir was never created, the trap itself errors and masks the original failure.
- Returning a status above 255 or reusing reserved codes (126, 127, 128 + N) for your own errors: the value wraps modulo 256 or collides with shell-defined meanings.
- Treating exit code 127 as a generic bug instead of "command not found," which almost always means a typo or a missing package, not faulty logic.
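The sourcing pitfall in the list above is simple to reproduce. The sketch below assumes bash; lib_return.sh and lib_exit.sh are hypothetical one-line files written to a temporary directory purely for the demonstration, and a child shell plays the role of the shell doing the sourcing.

```bash
#!/usr/bin/env bash
# lib_return.sh and lib_exit.sh are hypothetical libraries created only
# for this demonstration.
dir=$(mktemp -d)
printf 'return 1\n' > "$dir/lib_return.sh"
printf 'exit 1\n'   > "$dir/lib_exit.sh"

# return hands control back, so the caller's next command still runs.
after_return=$(bash -c 'source "$1"; echo "caller survived, source returned $?"' caller "$dir/lib_return.sh") || true

# exit ends the sourcing shell itself, so the echo never runs.
after_exit=$(bash -c 'source "$1"; echo "caller survived"' caller "$dir/lib_exit.sh") || true

rm -rf "$dir"

printf 'with return: [%s]\n' "$after_return"
printf 'with exit:   [%s]\n' "$after_exit"
```

In an interactive session the second case would close the terminal's shell, which is why library code should report failure with return.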
Best Practices
- Open every non-trivial script with set -euo pipefail so failed commands, unset variables, and broken pipeline stages all stop execution.
- Register cleanup with trap cleanup EXIT and write the cleanup function to be safe even when nothing has been created yet.
- Capture status=$? on the line immediately after the command you care about, before any other command can overwrite it.
- Use command || die "message" for fatal steps and send the error to stderr with >&2, never to stdout where it pollutes parseable output.
- Return status from functions with return N and let the caller decide; reserve exit for the top-level script, especially in anything that may be sourced.
- Add trap 'echo "error on line $LINENO" >&2' ERR while debugging to pinpoint exactly where a failure happened.
- Assign your own exit codes in the 1–125 range, document what each one means, and keep them stable so monitoring can distinguish failure modes.
Knowledge Check
A script runs cat file | grep ERROR | wc -l under set -e, but file is missing. Why does the script keep going?
- Without pipefail, only the last stage's status reaches the shell, and wc succeeds, so errexit sees a 0
- set -e carries a built-in exception that quietly ignores the status of any command that happens to read its input from a pipe
- cat returns a status of 0 even when the file it is given does not exist
- grep always returns 0 whenever it produces no matching output lines
Why prefer trap cleanup EXIT over simply calling cleanup as the last line of the script?
- EXIT fires on normal completion, an errexit abort, and signals, so cleanup also runs on the failure paths that never reach the last line
- A trap dispatches the cleanup faster than calling the function directly as the final line
- An EXIT trap can return a status greater than 255 to the parent shell, where a normally called function cannot
- Without a trap, the cleanup function would run in its own scope and lose all access to the script's variables and temporary file paths when it finally runs
A helper is loaded with source lib.sh, and on an error it calls exit 1. What happens?
- The shell that sourced the file terminates, because exit ends the current process rather than returning to the caller
- Only the library's own subshell exits with status 1, and the interactive shell that sourced the file simply continues running unaffected
- The shell silently converts the sourced file's exit into a return so the caller survives
- The caller receives exit code 0 because sourcing suppresses failures
A command finishes and the shell reports exit code 130. What does that tell you?
- The command was killed by SIGINT — the signal Ctrl-C sends — because 130 is 128 + 2
- The named command could not be located anywhere on the current $PATH
- The command file existed on disk but was not marked with the executable permission bit
- A shell builtin was invoked with the wrong number or kind of arguments
Why should a script's custom error codes stay in the 1–125 range?
- 126, 127, and 128 + N are reserved by the shell, and any value above 255 wraps modulo 256 — so reusing them creates ambiguous or false-success statuses
- The shell flatly rejects any exit argument greater than 125 with a syntax error before the script can even terminate, so codes in that range are simply not allowed to be written
- Any code above 125 is automatically reset to a plain 1 by set -e before it reaches the caller
- Only codes 0 through 125 can be read back from $?