Topic 18

Viewing and Paging Files


Reading a file or a stream is one of the most common things you do on a server, and the right tool depends mostly on two facts: how big the file is and whether it is still growing. cat dumps a file whole and is for small, finished files. less pages through a large one without loading it all into memory. head and tail show only the ends. tail -f follows a file as new lines are appended, which is how you watch a log during an incident.

Picking the wrong one has real consequences. cat on a 2 GB log floods your scrollback and ties up a slow terminal until it finishes; cat on a binary sends control bytes that can leave your terminal unreadable. tail -f on a logfile that gets rotated keeps following the old inode, which logrotate has renamed (and may later compress or delete), and shows you nothing while the action continues in a new file. None of these tools modifies the file; they are read-only views, which is exactly why you reach for them instead of opening an editor on a live config or a multi-gigabyte log.

cat and the Useless-cat Anti-pattern

cat concatenates files to standard output. Its real job is joining several files in order — cat part1 part2 part3 > whole — or dumping one short file so you can read it at a glance. For a file that fits on a screen or two, that is the fastest possible read. Past a few hundred lines it scrolls off the top before you can read it, and on a multi-gigabyte file it is actively harmful: the whole thing streams to the terminal and you wait for it.
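A minimal bash sketch of cat's real job, joining files in order. The part1 to part3 names and the throwaway temp directory are illustrative assumptions, not anything from a real system.

```shell
# Sketch (bash): cat joining files in order. part1..part3 are hypothetical names.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf 'alpha\n' > "$work/part1"
printf 'beta\n'  > "$work/part2"
printf 'gamma\n' > "$work/part3"

# Output order follows argument order, so the pieces are joined as listed.
cat "$work/part1" "$work/part2" "$work/part3" > "$work/whole"

# whole is three lines long, so dumping it with cat is the right call here.
cat "$work/whole"
```

List the parts explicitly when their order matters; a glob such as part* expands in sorted order, which is only safe when the names sort the way you intend.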

The widespread misuse is feeding a single file into another command through cat when that command already reads files directly. grep, awk, sort, wc, and almost every filter take filenames as arguments, so the cat in front of them spawns an extra process and an extra pipe for nothing. It also hides the filename: reading from a pipe, wc -l can no longer print the name next to its count, and grep -H (or grep over several files) labels matches "(standard input)" instead of the real name.

# Useless cat: an extra process and pipe for no benefit
cat /var/log/syslog | grep "error"
# Direct: grep opens the file itself, with no extra process or pipe
grep "error" /var/log/syslog
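To see the filename point concretely, here is a small bash sketch on a made-up sample log (syslog.sample is hypothetical). It assumes GNU coreutils wc and GNU grep, whose -H option labels piped input as "(standard input)".

```shell
# Sketch (bash, GNU coreutils and GNU grep): what a filter loses when it
# reads a pipe instead of a named file. syslog.sample is a made-up log.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
log="$work/syslog.sample"
printf 'ok\nerror: disk full\nok\nerror: timeout\n' > "$log"

wc -l "$log"             # count followed by the file's path
cat "$log" | wc -l       # count only; wc never sees a name

grep -H "error" "$log"           # each match prefixed with the path
cat "$log" | grep -H "error"     # prefixed with "(standard input)" instead
```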

less and Interactive Paging

less is the pager you want for anything larger than a screenful. It reads the file lazily rather than loading the whole thing into memory, so opening a 10 GB file is instant and uses almost no RAM. Inside it you navigate with the arrow keys, Space and b for page forward and back, g and G to jump to the top and bottom, and /pattern to search forward (?pattern backward, n for the next match). Press q to quit. These are the same keys man uses, because on most Linux systems man displays pages through less.

The name is a pun on its predecessor, more, but the difference is real: less can scroll backward, while more historically could only go forward. less also starts displaying before it has read the entire file, searches with regular expressions, and on Debian and Ubuntu can pipe a file through lesspipe (hooked in through the LESSOPEN variable, which Ubuntu's default .bashrc sets up) so that less archive.tar.gz shows the listing instead of binary garbage. There is no reason to use more on a modern system except habit.

Key in less	Action
`/pattern` · `n`	Search forward, jump to next match
`G` · `g`	Jump to end of file · jump to start
`F`	Follow new lines like `tail -f`, with Ctrl-C to stop and scroll
`&pattern`	Show only lines matching the pattern
`q`	Quit

head and tail for the Ends

Often you only care about the ends of a file. head prints the first ten lines by default, tail the last ten; -n changes the count, so tail -n 50 access.log gives the last fifty lines. head -n -5 with GNU coreutils prints everything except the last five lines, and tail -n +5 prints from line five to the end — the leading + means "starting at", which is how you skip a header row before piping into another tool.
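The counting rules above can be checked on a 10-line file generated with seq. A minimal bash sketch, assuming GNU coreutils (the negative count in head -n -7 is a GNU extension):

```shell
# Sketch (bash + GNU coreutils): head and tail line counts on a 10-line file.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
seq 1 10 > "$work/nums.txt"

head -n 3  "$work/nums.txt"   # lines 1-3
tail -n 3  "$work/nums.txt"   # lines 8-10
tail -n +8 "$work/nums.txt"   # from line 8 to the end: lines 8-10
head -n -7 "$work/nums.txt"   # all but the last 7 lines: lines 1-3 (GNU only)
```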

Both also work in bytes with -c: head -c 512 file reads the first 512 bytes, useful for peeking at a file header without pulling the whole thing. Because tail seeks to the end rather than reading from the start, tail -n 100 on a huge log is effectively instant — it does not scan the gigabytes in front of the part you asked for.

# Last 50 lines of a log, then skip a one-line CSV header
tail -n 50 /var/log/nginx/access.log
tail -n +2 data.csv | awk -F',' '{print $3}'
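The byte options and the seek-from-the-end behavior can be sketched the same way. This assumes bash and GNU coreutils on Linux; truncate -s makes a sparse file on most Linux filesystems, so the 1 GiB test log takes almost no real disk space. The file names are made up.

```shell
# Sketch (bash + GNU coreutils): -c byte counts, and tail reading from the end.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf 'MAGIC123 rest of the payload\n' > "$work/blob"
head -c 8 "$work/blob"; echo   # first 8 bytes, a pretend header: MAGIC123
tail -c 8 "$work/blob"         # last 8 bytes: "payload" plus the newline

# A 1 GiB file of NUL bytes (sparse) with one real line appended at the end.
truncate -s 1G "$work/big.log"
printf '\nlast line\n' >> "$work/big.log"

# tail works backward from the end of a regular file, so this returns at
# once instead of scanning the gigabyte in front of the line.
tail -n 1 "$work/big.log"
```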

Following Live Files

tail -f keeps the file open and prints new lines as they are appended; it is the standard way to watch a log in real time. The trap is log rotation. When logrotate renames app.log to app.log.1 and the service opens a fresh app.log, plain tail -f stays attached to the old inode by file descriptor and goes silent, because the new lines are landing in a different file. tail -F (capital F; in GNU coreutils it is shorthand for --follow=name --retry) watches the path by name instead: when the file is replaced it reopens the new one and keeps printing. On a server with rotating logs, -F is almost always the flag you want.

Two other ways to follow a file are worth knowing. less +F follows like tail -f but lets you press Ctrl-C to stop, scroll back through history with the full less keymap, then press F to resume following, which is useful when something scrolls past and you need to read what just went by. On a systemd host, application logs often go to the journal rather than a flat file, so journalctl -u nginx -f follows a unit's output the same way; that lives in the systemd chapter, but it is the modern equivalent for service logs.

# -f stays on the old inode after rotation and goes silent
tail -f /var/log/app.log
# -F re-opens the path when logrotate replaces the file
tail -F /var/log/app.log
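The rotation trap can be reproduced without logrotate by doing what it does by hand: rename the file, then create a fresh one. A sketch assuming bash and GNU coreutils (timeout and tail); the short sleeps are timing guesses that may need raising on a slow machine, and the file names are throwaway examples.

```shell
# Sketch (bash + GNU coreutils): mimic a logrotate-style rename and compare
# what tail -f and tail -F each print.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
echo "before rotation" > app.log

# Follow the same path both ways for 4 seconds; timeout then stops each tail.
timeout 4 tail -n +1 -f app.log > seen_by_f.txt 2>/dev/null &
timeout 4 tail -n +1 -F app.log > seen_by_F.txt 2>/dev/null &
sleep 1

# What logrotate does without copytruncate: rename the file, then the
# service opens a fresh app.log and keeps writing.
mv app.log app.log.1
echo "after rotation" > app.log

wait   # let both timeouts expire
echo "tail -f saw:"; cat seen_by_f.txt   # should hold only the pre-rotation line
echo "tail -F saw:"; cat seen_by_F.txt   # should hold both lines
```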

Viewing Non-text Files

Not everything in a log directory is text, and sending a binary to the terminal with cat spews escape sequences that can switch the character set and leave your prompt showing gibberish — reset or tput reset fixes it afterward. Before dumping an unknown file, run file mystery.dat: it reads the magic bytes and tells you whether it is ELF, gzip, a JPEG, or UTF-8 text, so you know which tool to reach for.
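file reads magic bytes for you, but the idea is simple enough to sketch with coreutils alone. This bash sketch uses od rather than xxd because xxd ships with vim and is not always installed; magic_hex and is_elf are hypothetical helper names, and it assumes a Linux host where /bin/sh resolves to an ELF executable.

```shell
# Sketch (bash + coreutils): read a file's first four bytes, the "magic"
# that file(1) matches on, without printing the file itself.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'just some text\n' > "$work/note.txt"

# Hypothetical helper: first 4 bytes as lowercase hex with spacing stripped.
magic_hex() {
  head -c 4 "$1" | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: ELF files start with 0x7f followed by E, L, F.
is_elf() {
  [ "$(magic_hex "$1")" = "7f454c46" ]
}

for f in /bin/sh "$work/note.txt"; do
  if is_elf "$f"; then
    echo "$f: ELF binary, look at it with xxd or strings, not cat"
  else
    echo "$f: not ELF, run file on it before deciding how to view it"
  fi
done
```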

For a binary you actually need to inspect, xxd file (or hexdump -C file) shows a hex dump with the printable bytes alongside, which is how you check a file header or a corrupted record. strings file pulls out the runs of printable characters and skips the rest, so strings /usr/bin/ssh | grep -i version surfaces embedded text from an executable without flooding the terminal. These three — file, xxd, and strings — are the safe way to look at anything you are not sure is text.

# Identify before you dump
file /usr/bin/ssh
# Hex + ASCII view of the first bytes; readable strings only
xxd /usr/bin/ssh | head
strings /usr/bin/ssh | grep -i version
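On a minimal host where binutils, and so strings, is not installed, a rough stand-in can be built from tr and grep. This is an approximation, not strings itself: it keeps runs of four or more printable ASCII bytes (strings' default minimum length) and ignores the finer options the real tool has. printable_runs and sample.bin are hypothetical names.

```shell
# Sketch (bash + coreutils + grep): approximate strings(1) on a tiny binary.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Build a small "binary" with control bytes around two readable strings.
printf '\000\001hello world\002\003ab\000version 1.2\n' > "$work/sample.bin"

# Hypothetical helper: turn every non-printable byte into a newline,
# then keep only lines of 4 or more characters.
printable_runs() {
  LC_ALL=C tr -c '[:print:]' '\n' < "$1" | LC_ALL=C grep -E '.{4,}'
}

printable_runs "$work/sample.bin"   # "ab" is too short and is dropped
```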

Common Mistakes

Piping a single file through cat into grep, awk, or wc: cat file | grep x spawns an extra process and pipe, hides the filename from wc and grep -H, and does more work than grep x file, which opens the file itself.
cat-ing a multi-gigabyte log and waiting while it floods scrollback, when less would have opened it instantly and let you search — cat reads the whole file, less reads lazily.
Using tail -f on a logfile that logrotate renames every night — the lowercase -f follows the old inode and goes silent after rotation, so you watch a dead file while the live lines land elsewhere. Use tail -F.
cat-ing a binary and corrupting the terminal with embedded control bytes — run file first, and recover a garbled terminal with reset.
Opening a live, multi-gigabyte log in a full editor like vim or nano: it reads the whole file into memory and can stall for a long time or exhaust RAM, where less or tail -F would read only what you need.
Reaching for more out of habit and then being unable to scroll back to a line that already passed — less scrolls both directions and searches with regex; more on old systems only went forward.
Assuming GNU-only forms such as head -n -5 or tail --follow=name work everywhere: on macOS and the BSDs, head rejects a negative count, tail may not accept GNU long options, and -F behavior differs from GNU coreutils, so a script tested on Ubuntu can misbehave there.

Best Practices

Default to less for any file you cannot eyeball in a screen or two — it opens large files instantly, searches with /pattern, and never loads the whole file into memory.
Pass filenames straight to the command that consumes them (write grep error syslog, not cat syslog | grep error) to drop the extra process and let tools such as wc and grep -H report the filename.
Use tail -F, not tail -f, on any rotated log so following survives the moment logrotate replaces the file.
Run file on anything you are not certain is text before you cat it, and inspect real binaries with xxd or strings instead of dumping them raw.
Reach for less +F when you want to follow a log but still scroll back through what just passed — Ctrl-C pauses, F resumes following.
Strip a header before piping with tail -n +2 file rather than a fragile grep -v, and grab only the recent end of a huge log with tail -n 200, which seeks rather than scans.
For service logs on a systemd host, follow the journal with journalctl -u <unit> -f instead of guessing which flat file the daemon writes to.

Comparable tools
Windows: more for paging and Get-Content -Wait to follow a growing file, the rough equivalent of tail -f
macOS / BSD: the same cat, less, head, and tail, but BSD tail flags and -F behavior differ from GNU coreutils
PowerShell: Get-Content reads files and -Tail / -Wait cover the last-N and follow cases

Knowledge Check

A logfile is rotated nightly by logrotate. You leave tail -f /var/log/app.log running and notice it shows nothing after midnight even though traffic continues. What fixes it?

Use tail -F, which follows the path by name and reopens the new file when the old one is rotated away
Add -n 0 so tail starts reading from the very end and then picks up the freshly rotated file automatically
Pipe it through cat first, since cat transparently reopens the underlying file on each new line of output
Run the whole command as root, because tail -f loses permission to the inode once logrotate moves it aside

Why is cat file | grep error considered an anti-pattern compared with grep error file?

grep opens files directly, so the cat adds an extra process and pipe for no benefit and hides the filename from grep -H and wc
grep cannot read a file passed directly as an argument, so it only works correctly when the contents are fed to it through a pipe
cat buffers the entire file in memory before sending it on, which makes the pipeline run out of RAM on large multi-gigabyte logs
The pipe can reorder lines under load, so grep ends up missing matches that cat happened to send out of sequence

You need to read a 4 GB log and search it for a string. Why is less the right tool over cat?

less reads the file lazily and searches with /pattern, so it opens instantly and uses little memory; cat streams all 4 GB to the terminal
less transparently compresses the file in memory as it loads it, so even a 4 GB log becomes small enough to hold entirely in RAM all at once
cat cannot open any file larger than the 2 GB offset limit, whereas less has no such size restriction at all
less edits the file in place to write a persistent search index into it, which then speeds up every later read

You want to inspect an unknown file in /var/log that might be binary. What is the safe first step?

Run file on it to identify the type from its magic bytes, then use xxd or strings if it turns out to be binary
cat it directly to the screen — the terminal automatically filters out the control bytes whenever it detects a binary file
Open it straight in vim, since it is the only tool that can render arbitrary binary content safely
Run head -c 0 on it first, which previews the file's type without actually reading any of its bytes
