Expansion and Globbing
Before bash runs a command, it rewrites the line through a fixed sequence of expansions: brace expansion first; then tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (these happen together in a single left-to-right pass); then word splitting; and finally pathname expansion, better known as globbing. Quote removal comes last. The program you are calling never sees *.log or $HOME or {a,b}; it sees the already-expanded list of words the shell handed it. By the time execve() runs, the original text is gone.
That order is the single most useful thing to internalize, because almost every "weird" shell result is just expansion happening in a stage you did not expect — or not happening because something was quoted. Word splitting runs after variables expand and before globbing, which is exactly why an unquoted variable holding a filename with spaces turns into multiple arguments and then gets glob-matched against the directory. Quote it and the whole chain collapses to a single literal word.
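The effect of that ordering can be observed directly. The following is a minimal sketch, assuming a scratch directory from mktemp; count_args is a hypothetical helper (not a standard command) that only reports how many words the shell handed it.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

workdir="$(mktemp -d)"        # scratch directory so the demo touches nothing real
cd "$workdir" || exit 1
touch a.log b.log

pattern='*.log'
name='my file.log'

globbed="$(count_args $pattern)"      # unquoted: expanded, then glob-matched -> a.log b.log
literal="$(count_args "$pattern")"    # quoted: one literal word, *.log
split="$(count_args $name)"           # unquoted: word splitting yields my and file.log
whole="$(count_args "$name")"         # quoted: a single argument

echo "globbed=$globbed literal=$literal split=$split whole=$whole"
```

In a directory holding exactly those two files this prints globbed=2 literal=1 split=2 whole=1: the same variable produces a different argument count depending only on the quotes.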
Pathname Expansion (Globbing)
Globbing is the shell matching unquoted patterns against existing filenames. Three operators do the work: * matches any string including the empty string (but not a leading dot and not /), ? matches exactly one character, and [...] matches one character from a set or range, such as [0-9] or [!a-z] for negation. The shell expands the pattern into a sorted list of matching paths and passes that list to the command — the command itself does no matching.
This is why ls *.log and rm *.log behave identically with respect to which files are chosen: ls and rm both receive the same already-expanded argument list. It is also why a leading dot is protected — * deliberately skips dotfiles, so cp * /backup will not copy .bashrc or .ssh. To include hidden files you need an explicit pattern or the dotglob option, covered below.
    # the shell expands these; the command receives the file list
    ls *.log              # every .log in the current dir
    ls log-?.txt          # log-1.txt, log-a.txt: exactly one char
    ls access-[0-9].log   # access-0.log .. access-9.log
    ls *.[ch]             # .c and .h files

    # prove it: echo runs no matching of its own
    echo *.log            # prints the same expanded list ls would get
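For a check that does not depend on whatever happens to be in your current directory, here is a small sketch that builds its own files in a scratch directory. The file names are made up for illustration, and the arrays are used only as a convenient way to count matches.

```bash
#!/usr/bin/env bash
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
touch log-1.txt log-a.txt log-10.txt .hidden

one_char=(log-?.txt)        # log-1.txt log-a.txt (log-10.txt has two characters there)
one_digit=(log-[0-9].txt)   # log-1.txt only
visible=(*)                 # the three log files; .hidden is skipped
with_dot=(.h*)              # an explicit leading dot does match .hidden

printf '%s\n' "${#one_char[@]}" "${#one_digit[@]}" "${#visible[@]}" "${#with_dot[@]}"
```

The four counts come out as 2, 1, 3, and 1, which matches the rules above: ? is exactly one character, a bracket set is exactly one character from the set, and * ignores names that start with a dot unless the pattern spells the dot out.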
Brace Expansion
Brace expansion looks like globbing but is fundamentally different: it generates text, and it does not touch the filesystem at all. {a,b,c} expands to three words; {1..10} is a sequence; {01..10} zero-pads; {a..z} walks the alphabet; {1..10..2} takes a step. Whether the resulting names exist is irrelevant — brace expansion happens first, before any file is consulted.
That distinction has a practical consequence. mkdir {src,test,docs} creates three directories that did not exist a moment ago, because brace expansion produces the names regardless of the filesystem. A glob could never do that: * can only ever return things that already exist. The classic quick-backup idiom, cp config.yaml{,.bak}, expands to cp config.yaml config.yaml.bak for the same reason.
    # brace expansion generates words; files need not exist
    mkdir -p project/{src,test,docs}
    cp nginx.conf{,.bak}        # -> cp nginx.conf nginx.conf.bak
    touch log-{01..05}.txt      # log-01.txt .. log-05.txt
    echo {a..e}                 # a b c d e

    # brace first, then glob the generated words against real files
    ls /var/log/{syslog,auth.log}*
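One consequence of brace expansion running first is that a sequence cannot take its bounds from a variable. The sketch below illustrates this, assuming bash 4 or newer for the zero-padded form; the arithmetic for loop is shown as one possible alternative, not the only one.

```bash
#!/usr/bin/env bash
n=3

fixed="$(echo {1..3})"          # 1 2 3
padded="$(echo {01..03})"       # 01 02 03
names="$(echo file-{a,b}.txt)"  # file-a.txt file-b.txt, whether or not they exist
from_var="$(echo {1..$n})"      # {1..3}: braces were processed before $n was expanded

# One alternative when the bound lives in a variable: an arithmetic for loop.
looped=""
for ((i = 1; i <= n; i++)); do
    looped+="$i "
done
looped="${looped% }"            # drop the trailing space -> 1 2 3

echo "$from_var vs $looped"
```

Because {1..$n} is not a valid sequence at the moment brace expansion runs, it is left alone, and only afterwards does $n become 3, leaving the literal text {1..3}.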
Tilde, Parameter, and Command Substitution
The remaining expansions fill in values rather than match files. Tilde expansion turns ~ into $HOME and ~user into that user's home directory, but only when the tilde is unquoted and at the start of a word, which is why "~/file" in a script is a common surprise that leaves a literal tilde. Parameter expansion is $VAR or ${VAR}, and the brace form unlocks defaults and slicing: ${VAR:-default} substitutes a fallback when VAR is unset or empty, ${VAR:?msg} aborts with an error when VAR is unset or empty, and ${PATH##*:} strips the longest prefix matching *:, leaving only the last PATH entry.
Command substitution, $(command), runs the command and replaces the construct with its stdout, with trailing newlines stripped. The older backtick form `command` does the same but does not nest cleanly, so prefer $(...). Arithmetic expansion, $((expr)), evaluates integer math. Parameter expansion, command substitution, and arithmetic expansion all feed their result back into the line and, if unquoted, into word splitting and globbing, which is the source of most substitution bugs.
    # parameter expansion with defaults and string ops
    echo "${DEPLOY_ENV:-staging}"      # fallback if unset or empty
    port="${PORT:?PORT must be set}"   # abort if unset or empty
    echo "${file##*/}"                 # basename via prefix strip

    # command and arithmetic substitution: quote the result
    now="$(date +%F)"
    count="$(wc -l < "$file")"         # line count of "$file", no ls parsing needed
    echo "$(( count * 2 ))"
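The following sketch pins down a few of these expansions with concrete values. The path is an example string only (no such file needs to exist), and DEPLOY_ENV is unset on purpose so the fallback branch is the one exercised.

```bash
#!/usr/bin/env bash
path='/var/log/nginx/access.log'   # example value; nothing is read from disk

base="${path##*/}"          # longest prefix matching */ removed -> access.log
dir="${path%/*}"            # shortest suffix matching /* removed -> /var/log/nginx
stem="${base%.log}"         # access
unset DEPLOY_ENV
env_name="${DEPLOY_ENV:-staging}"   # unset, so the fallback is used -> staging

quoted_tilde="~/notes"      # quoted: stays the literal text ~/notes
bare_tilde=~/notes          # unquoted at the start of the value: home directory + /notes

trimmed="$(printf 'done\n\n\n')"    # trailing newlines stripped -> done

echo "$base $dir $stem $env_name $quoted_tilde $trimmed"
```

The quoted_tilde line is the surprise mentioned above: inside quotes the tilde is just a character, so scripts that need the home directory in a quoted string are usually better served by "$HOME/notes".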
Word Splitting and IFS
After variable and command substitution, bash splits the result into words on the characters in IFS — by default space, tab, and newline. This step runs only on unquoted expansions. So files=$(ls) followed by for f in $files breaks on every whitespace character, which destroys any filename containing a space and is the reason "parse ls" is a perennial bug. The fix is to quote: "$var" is never word-split.
Arrays are the correct tool when you genuinely have a list. "${arr[@]}" expands to one word per element with no splitting inside elements, preserving spaces in each filename. For iterating files, a glob in a for loop — for f in *.log — is safe because pathname expansion produces properly separated words without going through whitespace splitting. Setting IFS=$'\n' changes the split character, but quoting and arrays are the answer that holds up against spaces and newlines in filenames.
    # unquoted expansion is word-split AND globbed, usually a bug
    name="my report.txt"
    rm $name      # tries: rm my report.txt (two args!)
    rm "$name"    # correct: one argument

    # safe iteration: glob directly, quote the variable
    for f in *.log; do
        gzip "$f"
    done
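The array form described above can be sketched as follows. The file names are invented for the demonstration and live in a scratch directory; the unquoted loop at the end is included only to show what quoting prevents.

```bash
#!/usr/bin/env bash
workdir="$(mktemp -d)"
touch "$workdir/my report.txt" "$workdir/notes.txt"

files=("$workdir"/*.txt)        # one array element per matching file
count="${#files[@]}"            # 2, even though one name contains a space

seen=0
for f in "${files[@]}"; do      # quoted [@]: one word per element, no re-splitting
    [ -f "$f" ] && seen=$((seen + 1))
done

unquoted=0
for f in ${files[@]}; do        # unquoted: "my report.txt" is split into two words
    unquoted=$((unquoted + 1))
done

echo "count=$count seen=$seen unquoted=$unquoted"
```

The quoted loop visits two real files, while the unquoted one sees more words than there are files, and none of the fragments of the split name exists on disk.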
nullglob, globstar, and dotglob
By default, a glob that matches nothing is passed through literally. ls *.xml in a directory with no XML files runs ls with the literal argument *.xml, producing a confusing "No such file" error — or worse, a loop body that runs once with a nonexistent path. The nullglob shell option changes this so a non-matching pattern expands to nothing, which is what you almost always want inside a script. failglob goes further and raises an error instead.
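The difference is easy to measure by counting loop iterations in an empty directory. This is a minimal sketch assuming bash's default settings (nullglob and failglob both off at the start); the [ -e ] guard at the end is one common workaround when changing shell options is not desirable.

```bash
#!/usr/bin/env bash
workdir="$(mktemp -d)"          # empty scratch directory: no .xml files here
cd "$workdir" || exit 1

shopt -u nullglob               # bash default
default_runs=0
default_value=""
for f in *.xml; do
    default_runs=$((default_runs + 1))
    default_value="$f"          # the literal pattern *.xml
done

shopt -s nullglob
nullglob_runs=0
for f in *.xml; do
    nullglob_runs=$((nullglob_runs + 1))
done
shopt -u nullglob               # restore the default for anything that follows

# Without nullglob, an existence check inside the loop skips the literal pattern.
guarded_runs=0
for f in *.xml; do
    [ -e "$f" ] || continue
    guarded_runs=$((guarded_runs + 1))
done

echo "default=$default_runs nullglob=$nullglob_runs guarded=$guarded_runs"
```

The default loop runs once with f holding the text *.xml, while the nullglob and guarded loops run zero times.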
globstar enables ** for recursive matching: **/*.log matches log files at any depth below the current directory. It is off by default and must be enabled per shell with shopt -s globstar. dotglob makes * include dotfiles, which is occasionally wanted and frequently dangerous — rm * with dotglob set will also delete .gitignore and .env. These are bash-specific shopt options, distinct from the POSIX set options.
    # shopt options change globbing behavior (bash-specific)
    shopt -s nullglob    # non-matching glob -> empty, not literal
    shopt -s globstar    # enable ** for recursion
    shopt -s dotglob     # * also matches dotfiles (careful)

    # with globstar on, descend the whole tree
    for f in **/*.log; do gzip "$f"; done

    # disable globbing entirely for a risky line
    set -f               # noglob: * stays literal
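To see what globstar and dotglob actually change, the sketch below builds a small tree in a scratch directory and counts matches before and after each option. It assumes bash 4.0 or newer, since older releases have no globstar; the directory and file names are invented.

```bash
#!/usr/bin/env bash
# Assumes bash 4.0 or newer; globstar does not exist in older releases.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
mkdir -p app/logs/archive
touch top.log app/logs/app.log app/logs/archive/old.log app/readme.txt .env

flat=(*.log)                    # top.log only: plain * never crosses a /
shopt -s globstar
deep=(**/*.log)                 # top.log plus the two files under app/
shopt -u globstar

visible=(*)                     # app top.log
shopt -s dotglob
everything=(*)                  # .env app top.log
shopt -u dotglob

echo "flat=${#flat[@]} deep=${#deep[@]} visible=${#visible[@]} everything=${#everything[@]}"
```

Turning each option off again right after use keeps the wider matching from leaking into later lines, which matters most for dotglob in front of anything destructive.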
Globs vs. Regular Expressions
Globs are the shell's filename-matching language, and they are anchored to the whole name. * means "any string", ? means "one character", and there is no quantifier syntax. The shell applies them to any unquoted word that contains a pattern character, and they only ever expand to files that exist.
Regular expressions are a different, richer language used inside programs like grep, sed, and awk, never by the shell for filenames. Here * means "zero or more of the previous element", . means "any character", and matching is unanchored by default. The glob *.log corresponds roughly to the regex ^.*\.log$ (setting aside the glob's refusal to match a leading dot): the symbols overlap but mean different things.
The practical rule: if the pattern is a bare argument to a command, it is a glob and the shell expands it; if it is quoted and handed to grep or find -regex, it is a regex and that program interprets it. Quoting a regex (grep "ab*c") also stops the shell from mistaking it for a glob.
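The two readings of the same characters can be compared side by side inside bash itself. This sketch uses the [[ ]] construct, whose == operator applies glob-style patterns to a string and whose =~ operator applies a POSIX extended regex; glob_match and regex_match are hypothetical helper names, and the strings are tested as plain text rather than as filenames.

```bash
#!/usr/bin/env bash
# glob_match and regex_match are hypothetical helper names for this sketch.
glob_match()  { [[ $1 == $2 ]]; }   # unquoted right side: shell pattern, whole string
regex_match() { [[ $1 =~ $2 ]]; }   # right side: extended regex, unanchored

pattern='ab*c'

for s in abc abXYZc ac xxabcxx; do
    g=no; r=no
    glob_match  "$s" "$pattern" && g=yes
    regex_match "$s" "$pattern" && r=yes
    echo "$s glob=$g regex=$r"
done
```

Read as a glob, ab*c accepts abc and abXYZc but rejects ac and xxabcxx; read as a regex, it accepts abc, ac, and xxabcxx but rejects abXYZc. Only abc satisfies both interpretations.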
Common Pitfalls
- Leaving a variable unquoted, as in rm $file, when $file holds a name with spaces: word splitting turns one path into several arguments, and any * in the value then gets glob-expanded against the directory. Always write rm "$file".
- Treating * as a regex. ls *.conf matches files ending in .conf, but in grep the same * means "repeat the previous character zero or more times", so grep *.conf file means something entirely different, and the shell may glob it first.
- Looping over a glob without nullglob: for f in *.bak runs once with the literal string *.bak when no .bak files exist, so the loop body operates on a nonexistent file instead of skipping.
- Running rm * in the wrong directory, or after a cd that silently failed. The glob expands to every visible file before rm sees it, and there is no undo. Verify with echo * first, or use cd target && rm *.
- Parsing files=$(ls) and iterating $files: command substitution plus word splitting mangles any name containing a space, tab, or newline. Glob directly in the loop instead.
- Assuming * includes hidden files. It never does without dotglob, so a cp * dest "backup" silently omits .env, .ssh, and every dotfile.
- Forgetting that brace expansion does not check the filesystem: mv report-{2023,2024}.csv dest/ hands both names to mv even if report-2023.csv does not exist, so mv reports an error for the missing file and exits non-zero.
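Two of these pitfalls, parsing ls and an unguarded cd, can be reproduced harmlessly in a scratch directory. In this sketch the file names are invented, /no/such/dir is a placeholder path assumed not to exist, and the destructive step is replaced by an echo so nothing is ever deleted.

```bash
#!/usr/bin/env bash
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
touch 'my report.txt' 'notes.txt'

parsed=0
for f in $(ls); do              # splits on whitespace: my, report.txt, notes.txt
    parsed=$((parsed + 1))
done

globbed=0
for f in *; do                  # one word per file, spaces intact
    globbed=$((globbed + 1))
done

# Guarded cd: the block after && only runs if the directory change succeeded.
# /no/such/dir is a placeholder path assumed not to exist.
cleaned=no
cd /no/such/dir 2>/dev/null && { echo "would remove:" *; cleaned=yes; }

echo "parsed=$parsed globbed=$globbed cleaned=$cleaned"
```

Two files turn into three loop iterations when ls output is word-split, and the guarded block is skipped entirely because the cd failed, leaving the files in the scratch directory untouched.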
Best Practices
- Quote every variable and command substitution ("$var", "$(cmd)") unless you have a specific reason to want word splitting or globbing on that expansion.
- Preview any destructive glob with echo first: echo rm *.tmp shows the exact argument list before you commit to running it.
- Set shopt -s nullglob near the top of scripts that loop over globs, so a non-matching pattern skips the loop instead of running on a literal pattern string.
- Enable shopt -s globstar and use **/*.ext for recursive matching instead of shelling out to find when you only need a simple depth-agnostic file list.
- Use brace expansion for generating names that may not exist yet (mkdir -p {src,test,docs}, cp file{,.bak}) and reserve globs for matching names that do exist.
- Quote any pattern passed to a program that does its own matching (find -name '*.log'). Where quoting is not possible, disable globbing with set -f before the line, then re-enable with set +f.
- Iterate files with arrays or a direct glob in for f in *.log; never parse the output of ls, which breaks on whitespace and special characters in names.
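Several of these practices fit together in one small routine. The sketch below is illustrative only: clean_tmp is a hypothetical helper, the .tmp suffix and file names are made up, and the function body is written as a subshell so that its cd and shopt calls do not leak into the caller.

```bash
#!/usr/bin/env bash
# clean_tmp is a hypothetical helper: remove *.tmp files from one directory.
# The body is a subshell ( ... ), so cd and shopt stay local to the call.
clean_tmp() (
    cd "$1" || exit 1               # stop here rather than glob in the wrong place

    shopt -s nullglob               # no matches -> empty array, not a literal *.tmp
    files=(*.tmp)

    if ((${#files[@]} == 0)); then
        echo "nothing to remove in $1"
        exit 0
    fi
    echo "removing:" "${files[@]}"  # preview the exact argument list
    rm -- "${files[@]}"             # -- keeps names starting with - from being read as options
)

workdir="$(mktemp -d)"
touch "$workdir/a b.tmp" "$workdir/c.tmp" "$workdir/keep.txt"
clean_tmp "$workdir"
```

Calling clean_tmp a second time on the same directory reports that there is nothing to remove instead of failing on a literal *.tmp, and calling it on a missing directory returns a non-zero status before any glob is expanded.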
Other Shells
- PowerShell: wildcards (*, ?, [...]) are matched by the cmdlets and the provider, not pre-expanded by a shell as in bash
- zsh: extended globbing with qualifiers (*(.), ** on by default), far richer than bash's
- fish: globbing with no word splitting on variables, so unquoted expansions are safe by design
Knowledge Check
In rm *.log, which component decides which files are deleted?
- The shell expands *.log into the matching file list before rm starts; rm only receives and deletes that list
- rm receives the literal pattern *.log as its argument and runs its own internal filename matching against the directory
- The kernel performs the pattern matching at the moment rm invokes the unlink() system call on the wildcard
- The filesystem driver interprets the * as a wildcard down at the path-lookup layer and returns every match
Why can mkdir {src,test,docs} create directories that do not yet exist, while ls *.txt can only list files that already exist?
- Brace expansion generates text without consulting the filesystem; globbing only ever expands to existing matching paths
- Brace expansion runs after globbing has finished, so it operates on an already-cleaned directory listing and can add new names
- mkdir has a built-in special mode that parses and interprets the braces itself, whereas ls lacks that mode
- Both forms are really globs, but the {} syntax implicitly turns on nullglob so unmatched names still appear
A script sets name="my file.log" then runs gzip $name unquoted. What goes wrong?
- Word splitting on the space passes two arguments, my and file.log, so gzip operates on the wrong (nonexistent) names
- Nothing goes wrong: bash preserves the embedded space because the variable was already quoted back at the assignment line
- gzip receives the literal text $name as its argument because variables are not expanded when used as command arguments
- The embedded space triggers brace expansion, which splits the value and ends up creating two separate output files
Without nullglob, what happens when for f in *.bak runs in a directory containing no .bak files?
- The pattern is passed through literally, so the loop runs once with f set to the string *.bak
- The loop body is skipped entirely because the glob matched nothing, so f is never assigned a value
- Bash raises a fatal "no match" error on the unmatched pattern and exits the script immediately
- The glob falls back to expanding into every file in the current directory, iterating over all of them
You need to match .log files at any depth below the current directory using a glob. What enables this?
- shopt -s globstar, then **/*.log; recursion is off by default and must be enabled
- *.log already recurses on its own; the shell descends into every subdirectory automatically when matching it
- set -f, which switches the * wildcard out of single-level mode and into recursive whole-tree matching
- shopt -s dotglob, which lets the * wildcard cross directory boundaries and reach into nested folders