Loops and Functions
Topic 61

Loops and functions are the two constructs that turn a linear list of commands into a program. A loop repeats a block of work over a list of items or a stream of lines; a function names a block so you can call it many times and from many places. Together they are what separates a copy-pasted sequence of commands from something you can maintain.

In bash, both have edges that cut. An unquoted expansion in a for loop's list is word-split, so a filename with a space becomes two iterations. A while read on the right of a pipe runs in a subshell, so any variable it sets vanishes when the loop ends. And a function has no real return value: return only sets an exit status from 0 to 255. Knowing where these traps sit is the difference between a script that works on your laptop and one that survives a directory of real-world filenames in production.

for, while, and until

A for loop iterates over a list of words. The list is whatever the shell produces after expansion (a glob, an array, the output of a command substitution), and the loop runs its body once per word. For numeric ranges, prefer the C-style form for (( i=0; i<n; i++ )) over brace expansion: brace expansion happens before variables are expanded, so {1..$n} never becomes a range, while the C-style bound can be a variable. Use a glob, not ls, when you want to walk files: the glob produces real filenames as separate words, with no parsing in between.

A while loop runs as long as its condition exits 0, which makes it the right tool for consuming a stream. The canonical pattern, while IFS= read -r line, reads one line at a time from standard input and is the standard safe way to process file contents line by line. until is the inverse: it runs until its condition succeeds, and its natural home is a retry loop that keeps trying until a service comes up.

# glob, not ls — each file is one word even with spaces
for f in /var/log/*.log; do
  echo "rotating $f"
done

# C-style numeric loop with a variable bound
for (( i=0; i<5; i++ )); do
  echo "attempt $i"
done

# until: retry with a growing delay and an attempt cap
n=0
until curl -fsS http://localhost:8080/healthz; do
  n=$((n+1))
  if [ "$n" -ge 10 ]; then echo "gave up after $n attempts" >&2; exit 1; fi
  sleep "$((n*2))"
done
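
The point about variable bounds can be checked directly. The following is a minimal sketch, assuming bash with default options; the variable n and the two array names are made up for the illustration.

```shell
#!/usr/bin/env bash
# n stands in for a bound that is only known at run time (illustrative value).
n=3

# Brace expansion runs before variable expansion, so {1..$n} is never turned
# into a range: the loop body runs once with the literal word "{1..3}".
brace_items=()
for i in {1..$n}; do
  brace_items+=("$i")
done

# The C-style form evaluates n arithmetically on every pass.
cstyle_items=()
for (( i=1; i<=n; i++ )); do
  cstyle_items+=("$i")
done

printf 'brace form:   %d iteration(s): %s\n' "${#brace_items[@]}" "${brace_items[*]}"
printf 'C-style form: %d iteration(s): %s\n' "${#cstyle_items[@]}" "${cstyle_items[*]}"
```

The brace form does not fail loudly; it simply loops once over a word that looks like a range, which is why the C-style loop is the safer default when the bound is computed.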

Reading Input Safely

One of the most common scripting bugs is for line in $(cat file). It does not iterate over lines; it iterates over words, because the unquoted command substitution is split on every character in IFS (space, tab, newline) and each resulting word is then glob-expanded. A line containing two words becomes two iterations, and a line containing * expands to your directory listing. The fix is not a tweak; it is a different construct entirely.

Use while IFS= read -r line. Setting IFS= for that one command disables the leading and trailing whitespace trimming that read does by default, so the line arrives intact. The -r flag stops read from treating backslashes as escapes, so a path like C:\dir survives. For filenames specifically, even newlines are legal characters, so the safe pattern is find -print0 piped into read -d '', which delimits on the NUL byte that cannot appear in a path.

# wrong: word-splits and glob-expands every line
for line in $(cat hosts.txt); do ssh "$line" uptime; done

# right: one line per iteration, intact
# (-n keeps ssh from reading stdin and swallowing the rest of hosts.txt)
while IFS= read -r host; do
  ssh -n "$host" uptime
done < hosts.txt

# NUL-delimited: survives spaces AND newlines in filenames
find . -type f -name '*.bak' -print0 |
while IFS= read -r -d '' f; do
  rm -- "$f"
done
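
To see what IFS= and -r each protect against, it may help to run the three constructs over the same input. This is a small sketch, not a definitive test suite: the temporary file and its two lines are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Build a throwaway two-line input file (contents are illustrative).
input=$(mktemp)
printf '%s\n' '  web 01  ' 'C:\dir' > "$input"

# for + command substitution: iterates over words, not lines.
words=()
for w in $(cat "$input"); do
  words+=("$w")
done

# Plain read: trims surrounding whitespace and consumes the backslash.
plain=()
while read line; do
  plain+=("$line")
done < "$input"

# IFS= read -r: each line arrives unchanged.
intact=()
while IFS= read -r line; do
  intact+=("$line")
done < "$input"

rm -f -- "$input"

printf 'for loop saw %d words\n' "${#words[@]}"
printf 'plain read:   [%s]\n' "${plain[@]}"
printf 'IFS= read -r: [%s]\n' "${intact[@]}"
```

The brackets in the output make the trimmed whitespace visible: the plain read prints [web 01] and [C:dir], while the IFS= read -r form keeps the padding and the backslash.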

Functions and Scope

A function is defined with name() { ...; } and called by name like any command. Its arguments arrive as the positional parameters $1, $2, and so on, with the whole set available as "$@", the same mechanism a script uses for its own arguments. Always expand the full set as "$@" in double quotes: unquoted $@ re-splits each argument on IFS and glob-expands the pieces, while "$@" preserves every argument exactly as passed, even ones containing spaces.

Variables in bash are global by default, including inside functions — assign to x in a function and you have silently overwritten the caller's x. Declare every function-local variable with local to scope it to that call. The deeper point is that bash functions do not return values. return sets an exit status in the range 0 to 255 and nothing else; to hand back data, you print it to stdout and let the caller capture it with command substitution, reserving the exit status for success-or-failure signaling.

# data comes back on stdout; status signals success/failure
get_pid() {
  local name="$1"
  local pid
  pid=$(pgrep -x "$name") || return 1
  echo "$pid"
}

if pid=$(get_pid nginx); then
  echo "nginx is $pid"
else
  echo "nginx not running" >&2
fi
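
The two scoping rules above, quoting "$@" and declaring local, can be illustrated side by side. The sketch below uses hypothetical helper names (count_args, forward_quoted, forward_unquoted, leaky, scoped) that exist only for this example.

```shell
#!/usr/bin/env bash
# count_args: hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Two wrappers that forward their arguments, with and without quotes.
forward_quoted()   { count_args "$@"; }
forward_unquoted() { count_args $@; }

quoted=$(forward_quoted "my file.txt" "notes.md")
unquoted=$(forward_unquoted "my file.txt" "notes.md")
echo "quoted forwarded $quoted args, unquoted forwarded $unquoted args"

# Scope: the same assignment with and without local.
x="caller"
scoped() { local x="private"; }   # stays inside the function
leaky()  { x="overwritten"; }     # writes to the caller's x

scoped
after_scoped=$x
leaky
after_leaky=$x
echo "after scoped: $after_scoped, after leaky: $after_leaky"
```

The unquoted wrapper turns two arguments into three because "my file.txt" is split at the space, and the leaky function changes x in the caller even though nothing in the call site suggests it.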

Loop Control and Pitfalls

break leaves the loop entirely; continue skips to the next iteration. Both take an optional level — break 2 exits two nested loops at once — which is cleaner than a flag variable when you need to bail out of an inner loop and its parent together.
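
As a rough illustration of the level argument, the sketch below walks a small grid of made-up labels, uses continue to skip one column, and uses break 2 to leave both loops as soon as the target is found.

```shell
#!/usr/bin/env bash
# Search a small grid of made-up labels for "b3", skipping every column 2.
visited=()
found=""
for row in a b c; do
  for col in 1 2 3; do
    if [ "$col" -eq 2 ]; then
      continue            # skip to the next col in the inner loop
    fi
    if [ "$row$col" = "b3" ]; then
      found="$row$col"
      break 2             # leave the inner AND the outer loop
    fi
    visited+=("$row$col")
  done
done
echo "visited: ${visited[*]}"
echo "found: $found"
```

Row c is never visited: break 2 ends the outer loop too, with no flag variable to test after the inner loop.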

The pitfall that bites everyone is the piped while. When you write cmd | while read line; do ...; done, the right-hand side of the pipe runs in a subshell, so a counter or array you build inside the loop is gone the instant the loop ends; the parent shell never saw the assignments. There are two real fixes: feed the loop with a redirection or process substitution instead of a pipe, so it runs in the current shell, or set shopt -s lastpipe (bash 4.2 or later, with job control off) so the last command of a pipeline runs in the current shell. Redirection from a file is the portable choice; process substitution and lastpipe are bash-specific.

# BROKEN: count is 0 here because the loop ran in a subshell
# (-h drops the "file:" prefix so each line is a bare number)
count=0
grep -ch ERROR *.log | while read -r n; do count=$((count+n)); done
echo "$count"   # prints 0

# FIXED: process substitution keeps the loop in the current shell
count=0
while read -r n; do count=$((count+n)); done < <(grep -ch ERROR *.log)
echo "$count"   # prints the real total
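
The lastpipe route can be sketched as well. This assumes bash 4.2 or later running as a script (job control off); the three input numbers are placeholders standing in for real command output.

```shell
#!/usr/bin/env bash
set +m               # job control off; already the default in scripts
shopt -s lastpipe    # run the last pipeline stage in the current shell

total=0
printf '%s\n' 3 4 5 | while IFS= read -r n; do
  total=$((total + n))
done
echo "$total"
```

With lastpipe enabled the loop's assignments to total are visible after the pipeline. In an interactive shell, where job control is normally on, the option has no effect, which is one reason the redirection form is easier to rely on.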

for f in $(ls) vs for f in * vs while read

for f in $(ls) — broken. The command substitution is word-split on whitespace and glob-expanded, so any filename with a space or a * derails the loop. Never use it; there is no input type for which this is the right choice.

for f in * — the glob-safe form. The shell expands the pattern into real filenames as separate words, with no parsing step in between, so spaces and special characters survive. Use it whenever you are iterating over files in a directory.

while IFS= read -r line — the line-safe form. Use it when the input is a stream of lines from a file, a pipe, or a command — one intact line per iteration, which a for loop can never give you.
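
One caveat of the glob form is worth seeing in action: by default a pattern that matches nothing is passed through literally, so the loop runs once with the pattern itself. The sketch below, which uses a temporary directory and invented filenames, shows the default behavior and the nullglob option that changes it.

```shell
#!/usr/bin/env bash
# Scratch directory with two illustrative files, one containing a space.
dir=$(mktemp -d)
touch "$dir/app server.log" "$dir/db.log"

# Matching glob: one iteration per file, spaces intact.
names=()
for f in "$dir"/*.log; do
  names+=("${f##*/}")       # strip the directory part
done

# Non-matching glob, default options: the literal pattern is the only word.
unmatched=()
for f in "$dir"/*.bak; do
  unmatched+=("$f")
done

# Non-matching glob with nullglob: the list is empty, the body never runs.
shopt -s nullglob
empty=()
for f in "$dir"/*.bak; do
  empty+=("$f")
done
shopt -u nullglob

rm -rf -- "$dir"

printf 'matched: %d, unmatched default: %d, unmatched nullglob: %d\n' \
  "${#names[@]}" "${#unmatched[@]}" "${#empty[@]}"
```

An alternative to nullglob is to test each item inside the loop, for example with [ -e "$f" ] || continue, which leaves shell options untouched.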

Common Mistakes

  • for f in $(ls) or for f in $(cat list): the unquoted substitution word-splits and glob-expands, so a filename with a space becomes two iterations and a name containing * explodes into a directory listing. Use a glob or find -print0 instead.
  • Piping into while read and then reading a variable set inside the loop. The loop body ran in a subshell, so the counter, array, or flag is empty afterward — and the bug is silent, the script just produces wrong totals.
  • Using return to pass data out of a function. return only sets an exit status from 0 to 255; return 300 wraps to 44, and a non-numeric argument such as a string is rejected with an error. Print the data to stdout and capture it instead.
  • Dropping -r from read, so backslashes are interpreted as escapes. A Windows path or a line ending in \ gets mangled or silently joined to the next line.
  • Writing $@ instead of "$@" when forwarding arguments. Unquoted, each argument is re-split on whitespace, so a single path with a space is passed along as two separate arguments.
  • A while or until retry loop with no sleep and no attempt cap. It hammers the target thousands of times a second and, if the target never recovers, spins forever pinning a CPU.
  • Forgetting local on a function variable, so an assignment inside the function clobbers a same-named variable in the caller — a bug that only surfaces when the names happen to collide.
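
The return mistake in the list above is easy to reproduce. In this sketch too_big and port_for are hypothetical functions written only for the demonstration: the first shows the wrap-around, the second shows data on stdout with the exit status reserved for failure.

```shell
#!/usr/bin/env bash
# too_big: hypothetical function that tries to "return" a number above 255.
too_big() { return 300; }

# port_for: hypothetical lookup that hands its result back on stdout.
port_for() {
  local service="$1"
  case "$service" in
    http)  echo 80 ;;
    https) echo 443 ;;
    *)     return 1 ;;      # unknown service: fail with no output
  esac
}

status=0
too_big || status=$?        # only the low 8 bits survive: 300 - 256 = 44

port=$(port_for https)      # data travels on stdout

unknown="accepted"
if ! port_for gopher >/dev/null; then
  unknown="rejected"        # failure travels in the exit status
fi

echo "status=$status port=$port unknown=$unknown"
```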

Best Practices

  • Read streams with while IFS= read -r line by default. It gives you one intact line per iteration, with no whitespace trimming or backslash mangling.
  • Quote every expansion: "$var", "$@", "$f" inside loops. Quoting is what makes the difference between handling a filename with a space and corrupting your data.
  • Declare local for every variable a function uses, so the function cannot leak into or stomp on the caller's namespace.
  • Return data on stdout and signal success or failure through the exit status. Let the caller use x=$(fn) for the value and if fn; then for the outcome.
  • Walk files with a glob or find -print0 | while IFS= read -r -d '', never by parsing ls. Globs and NUL delimiters are the only forms that survive arbitrary filenames.
  • Give every retry loop a max-attempts counter and a sleep with backoff, so a down dependency degrades gracefully instead of turning into a busy-wait.
  • Feed a counting or accumulating while loop with a redirection or process substitution (done < <(cmd)) rather than a pipe, so it runs in the current shell and its variables survive.
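
Several of these practices combine naturally into one reusable helper. The following is one possible sketch, not a standard utility: the function name retry, its argument order, and the flaky stand-in command are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS FIRST_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is hit,
# doubling the delay after each failure.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: gave up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Stand-in for a flaky dependency: fails twice, then succeeds.
calls=0
flaky() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

if retry 5 0 flaky; then
  echo "succeeded after $calls calls"
fi
```

Against a real service the call might look like retry 10 1 curl -fsS http://localhost:8080/healthz. The helper uses local for its state, forwards the command with "$@", and reports failure through its exit status rather than by printing a value.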

Comparable tools

  • PowerShell: foreach and function, with real typed return values and objects on the pipeline instead of word-split text
  • Python: for/def with real return values, exceptions, and lexical scope; the tool to reach for when a bash loop or function grows past readability
  • awk: its own for loop and function syntax built for line-and-field stream processing, often replacing a while read loop entirely

Knowledge Check

Why is for line in $(cat file) the wrong way to read a file line by line?

  • The unquoted command substitution is word-split on IFS and glob-expanded, so it iterates over words (and expands any *), not over lines
  • cat buffers the entire file into memory before the loop body can start, so the construct fails outright on any file that happens to be larger than the available RAM
  • for can only iterate over a fixed numeric range, never over the output of a command like cat
  • It reads the lines correctly but quietly strips the trailing newline off the final line

After cmd | while read n; do count=$((count+n)); done, why does $count read as 0 in the parent shell?

  • The right side of the pipe runs in a subshell, so assignments inside the loop never propagate back to the parent
  • read resets count to 0 at the start of every iteration
  • Arithmetic with $(( )) cannot accumulate a running total across successive loop iterations
  • The pipe discards the loop's standard output when it closes at the end, and the accumulated count variable is carried away along with it

A function needs to hand a hostname back to its caller. What is the correct mechanism in bash?

  • Print the hostname to stdout and have the caller capture it with h=$(fn), reserving the exit status for success or failure
  • Pass the hostname string to return, e.g. return "$host", and read the value back from $? in the caller
  • Assign the hostname directly to $1 inside the function body so that the caller sees the updated positional parameter once the function returns
  • Use exit "$host" so the value lands in the caller's exit code

What does setting IFS= and adding -r in while IFS= read -r line accomplish?

  • IFS= stops leading/trailing whitespace from being trimmed and -r stops backslashes from being treated as escapes, so the line arrives byte-for-byte intact
  • It makes read split the line into an array of fields on whitespace
  • It forces read to consume the entire file in one call rather than one line
  • It enables NUL-delimited reading so that the loop can safely handle even the pathological filenames that happen to contain embedded newline characters of their own
