Navigating and Managing Files
Topic 08

The everyday verbs are a short list — ls, cd, pwd, cp, mv, rm, mkdir, touch — and you will type some of them thousands of times. They sit on top of two ideas that decide what they actually do: how a path is resolved (absolute versus relative, with ., .., and ~ as shorthands), and the fact that the shell, not the command, expands the wildcards you type. Get those two right and the commands are predictable.

The flags are where the cost lives. cp without -a silently resets ownership and timestamps, filters the mode through your umask, and follows symlinks instead of copying them as links; rm -r with one stray space deletes the wrong tree; mv across a filesystem boundary is not an atomic rename but a copy-then-delete that can leave a half-written file if it fails. There is no recycle bin and no undo: once the link count hits zero and no process holds the file open, a mistakenly removed file is gone.

The Path Model

An absolute path starts at the root and is unambiguous from anywhere: /var/log/syslog means the same thing in every directory. A relative path is resolved against your current working directory — log/syslog only works if you are sitting in /var. The shell tracks where you are in $PWD and remembers the previous directory in $OLDPWD, which is what cd - jumps back to.

Three shorthands cover almost every move. . is the current directory, .. is the parent, and ~ expands to your home directory (~user to that user's home). These are not magic files — . and .. are real directory entries present in every directory, while ~ is tilde expansion done by the shell before the command runs. That distinction matters in scripts: a command never sees the literal ~, only the path it expanded to.

cd /var/log        # absolute: same target from anywhere
cd ../cache        # relative: up one, then into cache → /var/cache
cd -              # jump back to $OLDPWD
echo "$PWD"       # where am I right now
cd ~deploy/app     # the deploy user's home, then app/
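
To make the difference between directory entries and shell expansion concrete, here is a minimal sketch that runs inside a throwaway directory. The variable names (sandbox, here, prev, expanded, literal) are illustrative only, and it assumes a POSIX-style shell with HOME set.

```shell
# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d) || exit 1
start=$PWD
mkdir -p "$sandbox/var/log" "$sandbox/var/cache"

cd "$sandbox/var/log" || exit 1
cd ../cache || exit 1            # relative: resolved against $PWD

here=$(basename "$PWD")          # directory we are in now
prev=$(basename "$OLDPWD")       # directory that "cd -" would return to

# An unquoted ~ is expanded by the shell before printf runs;
# a quoted "~" reaches printf as a literal character.
expanded=$(printf '%s' ~)
literal=$(printf '%s' "~")

printf 'now in: %s, previous: %s\n' "$here" "$prev"
printf 'unquoted: %s, quoted: %s\n' "$expanded" "$literal"

cd "$start" || exit 1
rm -rf -- "$sandbox"
```

The quoted case is the one that tends to surprise people in scripts: a path such as "~/app" inside double quotes is never expanded, so the command receives a directory literally named ~.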

Listing and Inspecting

ls on its own hides almost everything you need on a server. ls -l gives the long form: type and permission bits, link count, owner, group, size, and mtime. Add -a to show dotfiles, -h for human-readable sizes (K, M, G suffixes instead of raw bytes), and -i to print the inode number: two names on the same filesystem that share an inode number are hard links to the same file. The combination ls -lah is muscle memory for a reason.

When you need facts ls rounds off, drop to stat: it prints exact byte size, all three timestamps, the inode, link count, and the permission bits in both octal and symbolic form. file answers a different question — what a file actually is — by reading its magic bytes rather than trusting the extension, which is how you avoid cat-ing a binary and scrambling your terminal.

ls -lah /etc            # long, hidden, human sizes
ls -li                  # show inode numbers (spot hard links)
stat deploy.tar.gz      # exact size, atime/mtime/ctime, links
file /usr/bin/python3   # reads magic bytes, not the name
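
The inode check can be tried safely in a scratch directory. The sketch below is an illustration, not a canonical recipe: inode_of is a hypothetical helper built on ls -i, and the file names are made up.

```shell
sandbox=$(mktemp -d) || exit 1

printf 'hello\n' > "$sandbox/original"
ln "$sandbox/original" "$sandbox/hardlink"   # second name for the same inode
cp "$sandbox/original" "$sandbox/copy"       # separate file with its own inode

# Hypothetical helper: print only the inode number of one path.
inode_of() { ls -di -- "$1" | awk '{print $1}'; }

ino_original=$(inode_of "$sandbox/original")
ino_hardlink=$(inode_of "$sandbox/hardlink")
ino_copy=$(inode_of "$sandbox/copy")

# Field 2 of "ls -l" is the link count.
links=$(ls -l "$sandbox/original" | awk '{print $2}')

printf 'original=%s hardlink=%s copy=%s links=%s\n' \
    "$ino_original" "$ino_hardlink" "$ino_copy" "$links"

rm -rf -- "$sandbox"
```

The original and the hard link should report the same inode and a link count of 2, while the copy reports a different inode.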

Copying and Moving

A plain cp src dst creates a new file owned by you, with a fresh timestamp, the source's permission bits filtered through your umask, and any symlink in the source dereferenced into a full copy. For configuration trees, backups, or anything where metadata is part of the content, that is data loss in slow motion. cp -a (archive) is the safe default: it recurses, preserves mode and timestamps, preserves owner and group when you have the privilege to set them (typically as root), and copies symlinks as symlinks instead of following them.
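
A small experiment shows the difference for timestamps and symlinks. This is a sketch under assumptions: the file names are invented, the source is backdated with touch -t so a reset mtime is easy to spot, and ownership is not tested because that needs root.

```shell
sandbox=$(mktemp -d) || exit 1

printf 'key=value\n' > "$sandbox/app.conf"
touch -t 202001010000 "$sandbox/app.conf"    # backdate the source file
ln -s app.conf "$sandbox/current.conf"       # relative symlink to it

# Copy the regular file both ways.
cp    "$sandbox/app.conf" "$sandbox/plain.conf"
cp -a "$sandbox/app.conf" "$sandbox/archive.conf"

# Copy the symlink both ways.
cp    "$sandbox/current.conf" "$sandbox/plain-link.conf"
cp -a "$sandbox/current.conf" "$sandbox/archive-link.conf"

# find -newer prints the path only if it is newer than the reference file.
if [ -n "$(find "$sandbox/plain.conf" -newer "$sandbox/app.conf")" ]; then
    plain_mtime=reset
else
    plain_mtime=kept
fi
if [ -n "$(find "$sandbox/archive.conf" -newer "$sandbox/app.conf")" ]; then
    archive_mtime=reset
else
    archive_mtime=kept
fi

if [ -L "$sandbox/plain-link.conf" ]; then plain_link=symlink; else plain_link=file; fi
if [ -L "$sandbox/archive-link.conf" ]; then archive_link=symlink; else archive_link=file; fi

printf 'cp:    mtime %s, link copied as %s\n' "$plain_mtime" "$plain_link"
printf 'cp -a: mtime %s, link copied as %s\n' "$archive_mtime" "$archive_link"

rm -rf -- "$sandbox"
```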

mv behaves in two completely different ways depending on geography. Within one filesystem it is a rename — it rewrites a directory entry, touches no data blocks, and is atomic: the file is either fully at the old name or fully at the new one, never both, never neither. Across a filesystem boundary it cannot rename, so it falls back to copy-then-unlink. That cross-device move is not atomic, takes time proportional to the data, and can leave a partial file at the destination if it is interrupted — which is exactly why deploy scripts stage to a temp file on the same filesystem and then rename into place.

| Command | Permissions / owner | Timestamps | Symlinks |
|---|---|---|---|
| cp src dst | mode filtered through your umask; owner becomes you | set to now | followed (copied as files) |
| cp -a src dst | preserved | preserved | kept as symlinks |
| mv (same fs) | unchanged (rename only) | unchanged | unchanged |
| mv (cross-device) | copied, then original removed | copied | copied as symlinks |
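
The stage-then-rename pattern used by deploy scripts can be sketched as follows. publish is a hypothetical helper name, not a standard tool; it assumes the new content arrives on stdin and that the temp file is created in the same directory as the target, which keeps the final mv on one filesystem.

```shell
# publish TARGET: write stdin to a temp file next to TARGET, then rename it
# into place. Because both names live in the same directory, the mv is a
# rename rather than a copy-then-delete.
publish() {
    target=$1
    tmp=$(mktemp "$(dirname "$target")/.publish.XXXXXX") || return 1
    if cat > "$tmp"; then
        mv -f -- "$tmp" "$target"
    else
        rm -f -- "$tmp"          # staging failed: leave the old file untouched
        return 1
    fi
}

sandbox=$(mktemp -d) || exit 1
printf 'v1\n' > "$sandbox/release.txt"
printf 'v2\n' | publish "$sandbox/release.txt"

published=$(cat "$sandbox/release.txt")
entries=$(ls -A "$sandbox")      # only release.txt should remain

printf 'content: %s, directory holds: %s\n' "$published" "$entries"
rm -rf -- "$sandbox"
```

One design caveat: mktemp creates the staging file readable only by its owner, so a real script would likely set the intended mode on the temp file before the rename.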

Creating and Removing

mkdir -p path/to/dir creates every missing parent in one call and, critically, does not error if the directory already exists — which makes it idempotent and safe to run repeatedly in scripts. touch creates an empty file if it is absent and updates the mtime if it is present; that second behavior is useful for triggering rebuilds and rotating sentinel files, not just for making blank files.
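
Both behaviors are easy to confirm in a scratch directory. The following is a minimal sketch with invented paths; it backdates two files with touch -t so that the mtime bump is visible through find -newer.

```shell
sandbox=$(mktemp -d) || exit 1

# mkdir -p: the second run succeeds even though the directory exists.
mkdir -p "$sandbox/releases/2026/05"
if mkdir -p "$sandbox/releases/2026/05"; then rerun_p=ok; else rerun_p=failed; fi

# Plain mkdir on the same path fails because the directory already exists.
if mkdir "$sandbox/releases/2026/05" 2>/dev/null; then rerun_plain=ok; else rerun_plain=failed; fi

# touch on an existing file: content stays, mtime moves to now.
printf 'data\n' > "$sandbox/marker"
touch -t 202001010000 "$sandbox/marker"      # backdate the file
touch -t 202001010000 "$sandbox/reference"   # reference with the same old mtime
touch "$sandbox/marker"                      # bump mtime

if [ -n "$(find "$sandbox/marker" -newer "$sandbox/reference")" ]; then
    bumped=yes
else
    bumped=no
fi
content=$(cat "$sandbox/marker")

printf 'mkdir -p rerun: %s, mkdir rerun: %s\n' "$rerun_p" "$rerun_plain"
printf 'mtime bumped: %s, content: %s\n' "$bumped" "$content"

rm -rf -- "$sandbox"
```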

rm deletes a directory entry; rm -r walks a tree and deletes all of it; rm -f suppresses prompts and ignores missing files. The danger is that the shell expands wildcards and variables before rm runs, so rm -rf $DIR/* with DIR unset becomes rm -rf /*, and rm -rf /var/tmp /old with an accidental space targets two trees instead of one. GNU rm enables --preserve-root by default, so a bare rm -rf / is refused — but that only guards the literal /: the glob /* expands to /bin, /etc, and every other top-level entry, which rm deletes one by one. There is no undelete — recovery means restoring from backup.

mkdir -p releases/2026/05   # makes all parents, no error if present
touch .rebuild              # create, or bump mtime if it exists
ls -d /tmp/cache-*          # LIST the glob first
rm -r -- /tmp/cache-*       # then delete what you just confirmed
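
One way to keep an empty variable from collapsing a path to /* is the ${var:?} expansion, which aborts when the variable is unset or empty. The sketch below is illustrative: purge_contents is a hypothetical helper, and each call runs in a subshell because a failed ${...:?} exits a non-interactive shell.

```shell
# purge_contents DIR: delete everything inside DIR (dotfiles excluded).
purge_contents() {
    # Aborts with an error if the argument is unset OR empty, so the glob
    # below can never become /*.
    dir=${1:?purge_contents: directory argument is empty}
    rm -rf -- "$dir"/*
}

sandbox=$(mktemp -d) || exit 1
mkdir -p "$sandbox/cache"
touch "$sandbox/cache/a" "$sandbox/cache/b"

# Simulate the bug: the variable that should hold the path is empty.
broken=""
if ( purge_contents "$broken" ) 2>/dev/null; then guard=missed; else guard=aborted; fi

# The normal case still works.
( purge_contents "$sandbox/cache" )
remaining=$(ls -A "$sandbox/cache")

printf 'empty argument: %s, files left in cache: [%s]\n' "$guard" "$remaining"
rm -rf -- "$sandbox"
```

Note that set -u alone would not catch this case, because an empty string counts as set.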

Wildcards in Practice

The single most important fact about wildcards is that rm, cp, and mv never see them. The shell expands *.log into the matching filenames first and hands the command the finished list; the program receives a hundred arguments, not one pattern. That is why rm *.log in an empty directory passes the literal *.log through (and errors on a missing file), and why a filename with a leading dash can look like a flag once the glob expands.

The operational consequence is a habit, not a flag: preview every destructive glob with ls or echo first, because that shows you the exact same list the dangerous command will act on. Use -- to mark the end of options so a file named -rf or -i is treated as a filename rather than a switch. The full mechanics of brace, tilde, and pathname expansion get their own topic later — here the rule is simply: look before you delete.
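
To see what a command actually receives, count its arguments. This sketch assumes bash or a POSIX sh with default globbing (no nullglob or failglob); count_args is a hypothetical helper and all file names are invented.

```shell
sandbox=$(mktemp -d) || exit 1
start=$PWD
mkdir "$sandbox/logs" "$sandbox/empty"
touch "$sandbox/logs/a.log" "$sandbox/logs/b.log" "$sandbox/logs/c.log"

# Reports how many arguments it was handed after the shell expanded the glob.
count_args() { echo "$#"; }

cd "$sandbox/logs" || exit 1
matched=$(count_args *.log)                 # one argument per matching file

cd "$sandbox/empty" || exit 1
unmatched=$(count_args *.log)               # no match: the pattern stays literal
passed=$(set -- *.log; printf '%s' "$1")    # what the command would see

# A file whose name looks like an option: -- ends option parsing.
cd "$sandbox/logs" || exit 1
touch -- -rf
rm -- -rf
if [ -e ./-rf ]; then dash_file=present; else dash_file=removed; fi

printf 'matched=%s unmatched=%s passed=%s dash_file=%s\n' \
    "$matched" "$unmatched" "$passed" "$dash_file"

cd "$start" || exit 1
rm -rf -- "$sandbox"
```

With the defaults assumed above, the populated directory yields three arguments and the empty one yields a single literal *.log.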

Common Mistakes
  • rm -rf "$DIR"/* with DIR unset or empty: the shell expands it to rm -rf /*, which the default --preserve-root does not stop because each top-level entry is its own argument. Quoting alone does not help here, and set -u catches only the unset case, so write "${DIR:?}"/* to abort on both unset and empty values. Separately, an accidental space in rm -rf /path /old deletes two trees instead of one.
  • Copying a config tree with plain cp instead of cp -a — permissions, ownership, and timestamps reset to your umask defaults and symlinks turn into full copies, so the restored tree no longer matches what the service expects.
  • Assuming mv is always atomic, then crossing a filesystem boundary (e.g. /tmp to /data on separate mounts) where it becomes copy-then-delete and can leave a half-written destination file if interrupted.
  • Running rm by glob without listing it first — ls the same pattern, see the actual matches, then delete. Skipping that step is how a stray * in the wrong directory becomes an outage.
  • Forgetting that the shell expands wildcards before the command runs, so rm *.log in an empty directory passes the literal *.log and errors, and a file named -rf gets read as a flag — use -- to end option parsing.
  • Treating touch as harmless: it bumps mtime on an existing file, which can falsely trigger a rebuild system, cache invalidation, or a backup tool that keys on modification time.

Best Practices
  • Preview every destructive glob with ls -d or echo first — the output is the exact argument list the rm or mv will receive.
  • Use cp -a whenever metadata matters (config, backups, deploy artifacts); it preserves owner, group, mode, and timestamps and keeps symlinks as symlinks.
  • Use mkdir -p in scripts for idempotency — it creates missing parents and succeeds silently when the directory already exists.
  • Stage cross-filesystem writes to a temp file on the destination filesystem, then mv into place so the final step is an atomic same-filesystem rename.
  • Quote every path variable ("$f") and add set -u to scripts so an unset variable aborts instead of expanding into a path you never meant to touch; where an empty value would be just as dangerous, use "${f:?}" so that case aborts too.
  • Put -- before filename arguments to rm, cp, and mv so names beginning with a dash are treated as files, not options.
  • Reach for stat and file instead of guessing: stat for exact size and all three timestamps, file to identify content by magic bytes before opening it.

Comparable tools
  • Windows: copy/move/del, and robocopy for metadata-preserving recursive copies
  • macOS: the same coreutils verbs as BSD variants, where some flags (e.g. cp, stat) differ from GNU
  • PowerShell: Copy-Item/Move-Item/Remove-Item, object-oriented rather than text-stream

Knowledge Check

Why is mv within one filesystem instant and atomic, but slow and non-atomic across two filesystems?

  • On one filesystem it only rewrites a directory entry; across filesystems it can't rename, so it copies the data then unlinks the original — which takes time and can leave a partial file
  • Cross-filesystem moves are always routed through the kernel's network stack, even between two local disks, and that per-packet round trip is what adds the latency you observe on every single byte transferred
  • mv always copies and deletes; the same-filesystem case just happens to have a faster disk
  • The kernel locks both filesystems during a cross-device move, serializing it behind other I/O

What does cp -a give you that a plain cp does not?

  • It recurses and preserves owner, group, mode, and timestamps, and copies symlinks as symlinks instead of following them
  • It transparently compresses every copied file in flight, so the archived tree lands at the destination taking far less space than the source
  • It creates hard links instead of new files, so the copy shares the original's inode
  • It verifies each copied file with a checksum and retries on mismatch

Before running rm -rf $DIR/* in a script, which precaution actually prevents wiping /* when DIR is accidentally unset?

  • Quoting the variable and enabling set -u, so an unset DIR aborts the script instead of expanding to nothing
  • Adding -i so rm interactively prompts for confirmation before each deletion, a guard that scripts always honor at run time
  • Relying on --preserve-root, which would block deletion of the top-level entries the glob /* expands into
  • Running the script as a non-root user, since rm cannot remove files owned by root

When you type rm *.log, what does the rm program actually receive as its arguments?

  • The list of filenames the shell already expanded the glob into — rm never sees the * pattern itself
  • The literal string *.log, which rm expands internally against the current directory
  • A regular expression that rm compiles internally and then matches against every filename in the current working directory
  • A file descriptor pointing at the directory, which rm scans for matches
