Topic 14

Pipes and Redirection

Shell

Every process starts life with three open file descriptors: standard input (fd 0), standard output (fd 1), and standard error (fd 2). A program reads from fd 0, writes its results to fd 1, and writes diagnostics to fd 2. The kernel attaches those descriptors to whatever the shell points them at — a terminal, a file, or another process — and the program neither knows nor cares which.

Redirection and pipes are how the shell rewires those three descriptors before the command ever runs. That wiring is the whole foundation of the Unix tool model: each command does one thing, reads stdin, writes stdout, and you compose them. Because stdout and stderr are separate descriptors, you can capture a command's results in a file while still seeing its errors on screen — but only if you get the ordering right, which is where most redirection bugs live.

The Three Standard Streams

A file descriptor is a small non-negative integer that indexes the kernel's per-process table of open files. When a shell launches a command, the command inherits descriptors 0, 1, and 2 already open unless a redirection closes or re-points them. On an interactive terminal all three point at the same device, the controlling TTY, which is why typed input, normal output, and error messages all share one window.

The streams are also reachable by path through /dev/stdin, /dev/stdout, and /dev/stderr, which the kernel maps back to descriptors 0, 1, and 2. Higher descriptors exist too: a shell or script can open fd 3 and above for extra channels, which is how a script keeps a dedicated log file open alongside its normal output.

# inspect the descriptors the shell handed this process (Linux /proc)
ls -l /proc/self/fd
# 0 -> /dev/pts/0   (stdin, the terminal)
# 1 -> /dev/pts/0   (stdout, the terminal)
# 2 -> /dev/pts/0   (stderr, the terminal)
# 3 -> /proc/<pid>/fd   (ls's own handle on the directory it is listing)
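The dedicated log channel on fd 3 mentioned above can be sketched roughly as follows. The scratch file and the messages are hypothetical stand-ins; a real script would point fd 3 at its own log path.

```shell
# Hypothetical scratch file standing in for a real log path.
log_file=$(mktemp)

exec 3>>"$log_file"        # open fd 3 for appending; it stays open until closed
echo "result line"         # normal output still goes to fd 1
echo "job started" >&3     # diagnostics go to the dedicated channel
echo "job finished" >&3
exec 3>&-                  # close fd 3 when done

logged=$(cat "$log_file")  # holds the two lines written through fd 3
```

Opening the descriptor once with exec avoids reopening the log file for every message and keeps log lines out of stdout entirely.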

Redirection and Truncation versus Append

Redirection points a descriptor at a file instead of the terminal. > sends stdout to a file, truncating it to zero length first; >> appends instead of truncating; < feeds a file into stdin. A descriptor number sits immediately before the operator with no space, so 1> is the same as > and 0< is the same as <.

The truncate-versus-append distinction is the one to internalize. echo line > log replaces the entire file on every run; echo line >> log adds a line and keeps the rest. A common data-loss event is running cmd > file inside a loop expecting the output to accumulate, then finding only the last iteration. On Debian and RHEL alike, bash also offers noclobber (set -C), which makes a bare > refuse to overwrite an existing file; >| overrides it when you do mean to clobber.

sort access.log > sorted.log      # truncate, then write stdout
date >> sorted.log            # append, keep existing contents
wc -l < sorted.log            # feed the file in as stdin
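A small experiment makes the loop pitfall and noclobber concrete. This is only a sketch: the scratch file and variable names are hypothetical.

```shell
out=$(mktemp)   # hypothetical scratch file

# '>' inside the loop truncates on every iteration: only the last line survives.
for i in 1 2 3; do echo "run $i" > "$out"; done
truncated_lines=$(( $(wc -l < "$out") ))

# '>>' inside the loop accumulates (after emptying the file once).
: > "$out"
for i in 1 2 3; do echo "run $i" >> "$out"; done
appended_lines=$(( $(wc -l < "$out") ))

# With noclobber on, a bare '>' refuses to overwrite an existing file.
set -C
if { echo "clobbered" > "$out"; } 2>/dev/null; then refused=no; else refused=yes; fi
echo "forced" >| "$out"    # '>|' overrides noclobber deliberately
set +C
forced=$(cat "$out")
```

Here truncated_lines ends up as 1, appended_lines as 3, and refused as yes. Another way to collect loop output is to redirect the whole loop once, as in done > file, so the file is opened a single time.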

Redirecting Standard Error

A bare > moves only stdout; stderr still reaches the terminal. To redirect errors, name descriptor 2 explicitly — 2> truncates, 2>> appends. To send both streams to one place, duplicate one descriptor onto the other with 2>&1, which means "make fd 2 point wherever fd 1 currently points".

Order is the trap, because redirections apply left to right and 2>&1 copies the current target of fd 1 at the moment it is read. In cmd >file 2>&1 stdout is sent to the file first, then stderr is pointed at the same place — both land in the file. In cmd 2>&1 >file stderr is pointed at the terminal, where fd 1 still points, and only afterward is stdout moved to the file — so errors stay on screen. The bash shorthand &>file redirects both unambiguously, but it is a bashism — not POSIX — so a #!/bin/sh script under dash needs the explicit >file 2>&1 form.

# both stdout and stderr into build.log
make >build.log 2>&1
# WRONG: errors still hit the terminal; only stdout is captured
make 2>&1 >build.log
# bash shorthand: both streams to the file
make &>build.log
# discard stderr, keep stdout on screen
make 2>/dev/null
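The ordering rule can be checked without a real build. The sketch below uses a hypothetical emit function that writes one line to each stream; command substitution stands in for "the terminal" so the leaked stream can be captured and inspected.

```shell
# Hypothetical helper: one line on stdout, one on stderr.
emit() { echo "out"; echo "err" >&2; }

both=$(mktemp)       # scratch files, names are illustrative
only_out=$(mktemp)

# Correct order: stdout goes to the file, then stderr follows it there.
emit >"$both" 2>&1

# Wrong order: stderr copies fd 1 while it still points at the outer
# destination (here the command-substitution pipe), then stdout moves.
leaked=$(emit 2>&1 >"$only_out")
```

After this runs, the first file holds both lines, the second holds only out, and leaked holds err, which is the stream that would have stayed on screen.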

Pipes, Pipelines, and SIGPIPE

A pipe, written |, connects the stdout of one command to the stdin of the next through an in-kernel buffer (64 KB by default on Linux). Each stage of a pipeline runs as a separate process and they run concurrently — the reader consumes bytes as the writer produces them, so a pipeline streams rather than staging the whole output in memory or on disk. A pipe carries only stdout; each stage's stderr still goes to the terminal unless you redirect it.

When the reader exits early, the kernel sends SIGPIPE to the writer on its next write to the closed pipe. This is why producer | head -n 5 stops the producer shortly after the fifth line: head closes the pipe, the producer's next write triggers SIGPIPE, and the default action terminates it. Because each stage is its own process, a pipeline runs in subshells — variables set inside cmd | while read are lost when the pipeline ends, since the loop body never ran in the current shell.

# three concurrent processes; data streams through the buffer
journalctl -u nginx | grep ' 500 ' | wc -l
# head closes the pipe early -> producer gets SIGPIPE and stops
seq 1 100000000 | head -n 5
# tee: write to a file AND pass the stream on
dmesg | tee /tmp/boot.log | grep -i error
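That a pipe carries only stdout is easy to confirm. In this sketch the emit function is a hypothetical producer, and tr acts as a visible marker: anything that actually travelled through the pipe comes out upper-cased.

```shell
# Hypothetical producer: one line on stdout, one on stderr.
emit() { echo "out"; echo "err" >&2; }

# stderr bypasses the pipe, so tr upper-cases only the stdout line.
# The outer 2>&1 just captures the bypassed line for inspection.
bypassed=$( { emit | tr '[:lower:]' '[:upper:]'; } 2>&1 )

# Merging stderr into stdout before the pipe sends both lines through tr.
merged=$( emit 2>&1 | tr '[:lower:]' '[:upper:]' )
```

The first capture contains OUT and an untouched err (their relative order is not guaranteed); the second contains OUT followed by ERR.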

Pipeline Exit Status and pipefail

By default a pipeline's exit status is the status of the last command only. grep pattern bigfile | head reports success whenever head succeeds, even if grep matched nothing or bigfile never existed. Scripts that test $? after a pipeline silently miss failures in every stage but the last.

Bash offers two fixes. set -o pipefail makes the pipeline return the status of the rightmost command that exited non-zero, so an upstream failure propagates. The PIPESTATUS array holds each stage's individual exit status, letting a script pinpoint which stage failed. PIPESTATUS is a bashism dash does not implement, and the dash shipped in Debian 12 and Ubuntu 24.04 has no pipefail either — its set builtin rejects the option. Dash gained pipefail only in a later revision (Debian 13 "trixie"), so a #!/bin/sh script that needs either still wants a #!/bin/bash shebang.

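# without pipefail the status would be head's (0) even though grep failed;
# with it, grep's failure propagates and the || branch runs
set -o pipefail
grep missing bigfile | head || echo "a stage failed"
# per-stage status: "1 0" means grep found no match, head succeeded
# (grep exits 2 rather than 1 if bigfile cannot be read)
grep missing bigfile | head
echo "${PIPESTATUS[@]}"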
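The three behaviours can be compared side by side with stages whose exit codes are fixed. This is a bash-only sketch; the subshells (exit 3) and (exit 5) are hypothetical stand-ins for failing commands.

```shell
# Default: only the last stage counts.
(exit 3) | (exit 5) | true
default_status=$?

# PIPESTATUS must be copied immediately; the next command overwrites it.
(exit 3) | (exit 5) | true
stages=("${PIPESTATUS[@]}")

# pipefail: the rightmost non-zero status becomes the pipeline's status.
set -o pipefail
if (exit 3) | (exit 5) | true; then
  pipefail_status=0
else
  pipefail_status=$?
fi
set +o pipefail
```

Here default_status is 0, stages holds 3 5 0, and pipefail_status is 5, the status of the rightmost stage that failed.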

Here-docs, Here-strings, and Process Substitution

A here-document feeds a literal block into stdin. <<EOF reads lines until one consisting of exactly the delimiter; <<-EOF additionally strips leading tab characters so the block can be indented. Quoting the delimiter (<<'EOF') disables variable and command expansion inside the body, which is essential when the block contains literal $ signs. A here-string, <<<, feeds a single string into stdin without you having to create a temporary file.

Process substitution is a bashism that turns a command's output into a filename. <(cmd) exposes cmd's stdout as a path — a /dev/fd entry — that another command can open for reading; >(cmd) does the reverse for writing. It beats a temporary file for tools that demand filename arguments rather than reading stdin, such as diff comparing the output of two commands: no temp file to create, name, secure, or clean up. It also escapes the subshell trap, so while read x; do ...; done < <(cmd) keeps the variables the loop sets.

# here-doc: literal block to a config file, no expansion
cat <<'EOF' > /etc/motd
Maintenance window: Sunday 02:00 UTC
EOF
# here-string: a string (here, a command's output) straight into stdin
grep root <<< "$(getent passwd)"
# process substitution: diff two commands, no temp files
diff <(ssh web01 dpkg -l) <(ssh web02 dpkg -l)
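Two of the claims above lend themselves to a quick check in bash: quoting the delimiter suppresses expansion, and process substitution keeps a loop in the current shell. The variable names and sample data in this sketch are hypothetical.

```shell
name="world"   # hypothetical variable

# Unquoted delimiter: $name is expanded inside the body.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: the body is passed through literally.
literal=$(cat <<'EOF'
hello $name
EOF
)

# Pipe: the loop runs in a subshell, so the total is lost afterwards.
total=0
printf '1\n2\n3\n' | while read -r n; do total=$((total + n)); done
piped_total=$total

# Process substitution: the loop runs in the current shell.
total=0
while read -r n; do total=$((total + n)); done < <(printf '1\n2\n3\n')
subst_total=$total
```

With bash defaults (lastpipe off), piped_total stays 0 while subst_total is 6; expanded reads hello world and literal keeps the dollar sign untouched.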

2>&1 >file vs >file 2>&1

>file 2>&1 — stdout is redirected to the file first, then stderr is duplicated onto the same target. Both streams end up in the file. This is what you want when you mean "capture everything".

2>&1 >file — stderr is duplicated onto fd 1 while fd 1 still points at the terminal, then stdout is moved to the file. Errors stay on the terminal; only normal output is captured. Almost always a bug, and the single most common redirection mistake.

&>file — the bash shorthand that sends both streams to the file with no ordering to get wrong. Prefer it when the target is bash, but remember it is not POSIX, so dash and #!/bin/sh scripts need the explicit two-operator form.

Common Mistakes

Writing cmd 2>&1 >file to capture both streams. Redirections apply left to right, so stderr is duplicated onto the terminal before stdout moves to the file, and errors never reach it. Use >file 2>&1 or &>file.
Running cmd > file and expecting error messages in the file. A bare > redirects only stdout; stderr still hits the terminal, so the log silently omits the very messages you needed during an incident.
Testing $? after a pipeline without pipefail. The status is the last stage's only, so a failed grep or a missing input file upstream goes completely unnoticed.
Piping into a while read loop and reading its variables afterward. The loop runs in a subshell, so anything it sets is discarded when the pipeline ends. Use < <(cmd) process substitution or enable shopt -s lastpipe, which takes effect only when job control is off, as it is in a script.
Overwriting a file with > by accident because noclobber is off — sort f > f truncates f to empty before sort can read it, destroying the data irrecoverably.
Relying on PIPESTATUS in a #!/bin/sh script. On Debian and Ubuntu /bin/sh is dash, which has no PIPESTATUS array (and on Debian 12 and Ubuntu 24.04 has no set -o pipefail either), so the script errors or silently behaves differently. Use #!/bin/bash.
Prefixing a file-reading command with cat, as in cat access.log | grep 500. The extra process and pipe add nothing; grep 500 access.log finds the same matches and lets grep see the filename, which it can print with -H or when searching several files.
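The sort f > f mistake can be reproduced safely on scratch files. The sketch below also shows one possible remedy that is not covered above: the standard -o option of sort, which may name an input file as its output. File names here are hypothetical.

```shell
f=$(mktemp); g=$(mktemp)          # hypothetical scratch files
printf 'b\na\n' > "$f"
printf 'b\na\n' > "$g"

# The shell truncates $f before sort starts, so sort reads an empty file.
sort "$f" > "$f"
bytes_left=$(( $(wc -c < "$f") ))

# sort -o reads all input before it opens the output, so in-place is safe.
sort -o "$g" "$g"
sorted=$(cat "$g")
```

After this runs, bytes_left is 0 and the second file holds its lines in sorted order.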

Best Practices

Start every bash script with set -euo pipefail so a failure in any pipeline stage stops execution instead of being masked by the last stage's success.
Combine streams with >file 2>&1 or &>file and never with 2>&1 >file; keep them separate when you want diagnostics visible and only results captured.
Enable noclobber interactively with set -C so a stray > refuses to overwrite an existing file, and use >| when you genuinely intend to clobber.
Read PIPESTATUS immediately after a pipeline when you need to know which specific stage failed — the next command overwrites the array.
Reach for process substitution <(cmd) instead of creating, naming, securing, and deleting a temporary file whenever a tool needs a filename argument.
Feed loops with while read x; do ...; done < <(cmd) rather than cmd | while read, so the loop body runs in the current shell and keeps the variables it sets.
Use a quoted here-doc (<<'EOF') for multi-line literal input such as config snippets, so dollar signs and backticks in the body are not expanded by the shell.

Comparable tools

PowerShell: a pipeline that passes structured objects, not text streams; Select-Object and Where-Object filter properties rather than re-parsing lines
cmd.exe: supports >, >>, <, |, and 2>&1, but no here-docs, no pipefail, and far weaker composition
named pipes: mkfifo creates a persistent FIFO on disk for connecting unrelated processes, where | only wires up one command line

Knowledge Check

You run make 2>&1 >build.log and are surprised that compiler errors still scroll past on your terminal while only normal output lands in build.log. Why?

Redirections apply left to right, so stderr is duplicated onto the terminal before stdout is moved to the file
2>&1 only takes effect when it follows a pipe, and is silently ignored when it follows a plain file redirect
make writes its compiler errors straight to /dev/tty, bypassing file descriptor 2 so the redirection never catches them
build.log was opened in append mode, which silently discards the stderr stream

A script runs grep ERROR app.log | tail -n 20 and checks $? to decide whether any errors were found. The check reports success even on a log with no ERROR lines. What is the correct explanation?

A pipeline's exit status is the last command's by default, and tail succeeds regardless of what grep matched
grep returns 0 whenever the input file simply exists and is readable, independent of whether any line actually matched
The pipe operator explicitly resets $? back to 0 once the pipeline ends, before the script can read it
tail actively suppresses any non-zero exit status produced by an earlier stage of the pipeline

You need a while read loop to build a running total in a variable and use that total after the loop. Which construction keeps the variable's value?

while read n; do total=$((total+n)); done < <(cat data) — process substitution keeps the loop in the current shell
cat data | while read n; do total=$((total+n)); done — the pipe preserves the loop's variables
Either form works identically in bash and preserves the total, so the choice between them is purely a matter of style
Only a here-string fed with <<< can carry a variable's accumulated value back out of the loop body

Why is diff <(ssh a dpkg -l) <(ssh b dpkg -l) preferable to writing the two package lists to temp files first?

It exposes each command's stdout as a /dev/fd path, so there are no temp files to name, secure, or clean up
Process substitution runs the two ssh commands sequentially, which is faster than parallel temp-file writes
diff cannot read ordinary disk files at all, only the special /dev/fd paths created by the <(...) syntax
Writing to temp files first would route each command's output through stderr, which diff silently ignores

Your script begins #!/bin/sh on Ubuntu and reads ${PIPESTATUS[0]} after a pipeline, but the expansion fails with a "Bad substitution" error instead of yielding a status. What is the cause?

On Debian and Ubuntu /bin/sh is dash, which does not implement the PIPESTATUS array; the script needs #!/bin/bash
PIPESTATUS must first be turned on with a shopt option rather than read directly, regardless of which shell runs
The correct array name is actually spelled PIPE_STATUS with an underscore, and that underscore was mistakenly left out
PIPESTATUS is only populated by the shell when the script happens to be executed with root privileges
