Topic 20

sed

Text Processing

sed is a non-interactive stream editor: it reads input one line at a time, applies an editing script to each line, and writes the result to stdout. There is no file open, no cursor, no undo — the program runs the same script against every line and exits. That model is why sed shows up in pipelines, init scripts, and config-management glue everywhere there is text to rewrite without a human in the loop.

The operational consequence: sed is the standard tool for scripted, repeatable text transformations, but it carries two sharp edges. Its regular-expression dialect is POSIX BRE or ERE, not the Perl regex you may expect, and its in-place editing rewrites files with no safety net unless you ask for one. Both bite hardest at scale, when one script runs across thousands of files.

The Substitute Command

Substitution is the command you will reach for most. The form is s/regex/replacement/flags. Without flags it replaces only the first match on each line. The g flag replaces every match on the line; a numeric flag like 2 replaces only the second match; p prints the line when a substitution happened; I makes the match case-insensitive in GNU sed.

The delimiter after s does not have to be a slash. When your pattern or replacement contains slashes — file paths are the common case — pick another character to avoid escaping every one. In the replacement, & inserts the whole matched text.

# Replace all occurrences on each line with the g flag
sed 's/localhost/127.0.0.1/g' hosts.conf

# Use | as the delimiter so paths need no escaping
sed 's|/var/log/old|/var/log/new|g' rsyslog.conf

# & reuses the whole match: wrap each port number in brackets
sed -E 's/[0-9]+/[&]/g' ports.txt
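The flag behaviour described above is easy to confirm on a single line. The sketch below uses a made-up input string and feeds it through printf, so no files are needed; the variable names are illustrative only.

```shell
# Made-up sample line with several occurrences of "cat"
line='cat cat Cat cat'

# No flag: only the first match on the line changes
first=$(printf '%s\n' "$line" | sed 's/cat/dog/')

# Numeric flag 2: only the second match changes
second=$(printf '%s\n' "$line" | sed 's/cat/dog/2')

# g: every match changes ("Cat" stays, because matching is case-sensitive)
all=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

# -n plus the p flag: print only lines where a substitution happened
changed=$(printf 'cat\nbird\n' | sed -n 's/cat/dog/p')

printf '%s\n' "$first" "$second" "$all" "$changed"
```

The last command pairs the p flag with -n, so unchanged lines such as "bird" are dropped from the output.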

Addresses and Ranges

By default a command applies to every line. An address restricts it. An address can be a line number (10), the last line ($), or a regex (/pattern/). A pair addr1,addr2 selects an inclusive range, and a trailing ! negates the address so the command runs on every line the address did not match.

Two range forms matter in practice. A regex-to-regex range /start/,/end/ selects a block between markers. The GNU form 0,/re/ matches from the start of input up to and including the first line that matches the regex, which is the clean way to edit only the first occurrence in a file.

# Delete lines 1 through 5
sed '1,5d' file.txt

# Comment out only the lines inside a [backend] block, leaving both section headers alone
sed '/^\[backend\]/,/^\[/ { /^\[/!s/^/#/; }' app.ini

# Negation: print every line that is NOT blank
sed -n '/^$/!p' notes.txt
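The difference between 0,/re/ and 1,/re/ only shows up when the first match is on line 1. The following sketch assumes GNU sed (the 0 address is a GNU extension) and uses a made-up three-line config.

```shell
# Hypothetical config with the same key repeated
cfg=$(printf 'port=80\nname=a\nport=80\n')

# GNU sed only: 0,/re/ can end the range on line 1,
# so only the first port line is rewritten
first_only=$(printf '%s\n' "$cfg" | sed '0,/^port=/ s/^port=.*/port=8080/')

# For contrast: with 1,/re/ the end regex is checked only from line 2 onward,
# so the range runs to the second port line and both are rewritten
from_one=$(printf '%s\n' "$cfg" | sed '1,/^port=/ s/^port=.*/port=8080/')

printf '%s\n---\n%s\n' "$first_only" "$from_one"
```

On BSD sed the 0 address is rejected, so a script that must run there needs a different approach to editing only the first occurrence.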

In-Place Editing

The -i flag rewrites the input file instead of printing to stdout. It is convenient and dangerous. GNU sed writes a temporary file and renames it over the original, so a clean run is atomic, but a script that matches nothing still rewrites the file, and a script that is syntactically valid but wrong leaves you with content you did not intend and no copy of the original.

Give -i a suffix and it keeps a backup: -i.bak writes file.bak before editing. Two portability traps wait here. GNU and BSD sed disagree on the syntax — on macOS and the BSDs you must write -i '' with an explicit empty argument for no backup. And in-place editing replaces a symlink with a regular file, breaking the link.

# GNU and BSD sed: suffix attached to -i, keeps file.bak as the original
sed -i.bak 's/DEBUG/INFO/g' log4j.properties

# macOS / BSD sed: empty suffix is REQUIRED for no backup
sed -i '' 's/DEBUG/INFO/g' log4j.properties
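One way to sidestep the -i syntax difference altogether is to skip -i: write the result to a scratch file, review the diff, then copy a backup and move the result into place. The sketch below is one possible workflow, not the only one. It runs in a throwaway directory with made-up file contents, and like -i it still replaces a symlink with a regular file.

```shell
# Throwaway directory; the file name and contents are made up
dir=$(mktemp -d)
f="$dir/log4j.properties"
printf 'level=DEBUG\nname=app\n' > "$f"

script='s/DEBUG/INFO/g'

# 1. Dry run: write the result to a scratch file and review the diff
sed "$script" "$f" > "$f.new"
preview=$(diff "$f" "$f.new" || true)   # diff exits 1 when the files differ

# 2. Apply: keep the original as a backup, then move the result into place
cp -p "$f" "$f.bak"
mv "$f.new" "$f"

after=$(cat "$f")
backup=$(cat "$f.bak")
printf '%s\n' "$preview"

# Clean up the throwaway directory
rm -rf "$dir"
```

Because the script lives in a shell variable here, it is expanded inside double quotes on purpose; the variable itself was assigned with single quotes, so sed still sees it untouched.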

Beyond Substitution

Substitution is not the whole language. sed processes input in a cycle: it reads the next line into the pattern space, runs every command against it, then — by default — prints the pattern space and starts again. The -n flag suppresses that automatic print, so you emit only what you explicitly ask for, usually with the p command. That pairing is how sed becomes a filter that returns a subset of lines.

The line commands round out the toolkit. d deletes the pattern space and starts the next cycle; a, i, and c append, insert, and change whole lines; y/abc/xyz/ transliterates characters like tr; and q quits after the current line, which makes sed '100q' a fast head that stops reading at line 100.

# Without -n this duplicates matches; with -n it filters
sed -n '/error/p' app.log

# Insert a header before line 1, then delete blank lines (GNU one-line form of i).
# The insert comes first so that a blank line 1 cannot skip it.
sed -e '1i\# generated file' -e '/^$/d' config.txt
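The commands named above can be tried side by side on a small input. This sketch uses made-up data and only commands that both GNU and BSD sed accept in this form.

```shell
# Made-up input with one blank line
input=$(printf 'alpha\nbeta\n\ngamma\n')

# d: delete blank lines
no_blank=$(printf '%s\n' "$input" | sed '/^$/d')

# -n with p: print only the first and last lines
ends=$(printf '%s\n' "$input" | sed -n '1p;$p')

# y: map characters one-for-one, like tr
upper=$(printf '%s\n' "$input" | sed 'y/abg/ABG/')

# q: print through line 2, then quit without reading further
head2=$(printf '%s\n' "$input" | sed '2q')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$no_blank" "$ends" "$upper" "$head2"
```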

Capture Groups and Back-references

A capture group records part of the match so the replacement can reuse it. In POSIX BRE, sed's default dialect, you write a group as \(...\) and refer back to it with \1 through \9. The -E flag (also -r in GNU sed) switches to Extended Regular Expressions, where groups are plain (...) and the operators +, ?, and | need no backslash. Use -E and the regex reads the way you expect.

What neither BRE nor ERE gives you is Perl shorthand. \d, \w, and lookahead do not exist in POSIX sed. GNU sed adds \w and \s as extensions, but those are not portable to BSD sed; use POSIX bracket classes like [[:digit:]] and [[:space:]] when the script must run on more than one platform.

# Swap "Last, First" into "First Last" using two groups (\w is a GNU extension)
sed -E 's/^(\w+), (\w+)/\2 \1/' names.txt

# Portable BRE form: groups need backslashes
sed 's/^\([^,]*\), \(.*\)/\2 \1/' names.txt
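For scripts that must run on more than one platform, the bracket classes slot in wherever \d, \w, or \s would otherwise go. The sketch below uses a made-up log line and shows the same back-reference swap in ERE and in BRE.

```shell
# Hypothetical log line
line='user42   logged in at 10:15'

# ERE with a POSIX class: collapse each run of whitespace to one space
squeezed=$(printf '%s\n' "$line" | sed -E 's/[[:space:]]+/ /g')

# ERE groups and back-references: swap the two numbers around the colon
swapped=$(printf '%s\n' "$line" | sed -E 's/([[:digit:]]+):([[:digit:]]+)/\2:\1/')

# The same swap in BRE: groups and the interval need backslashes
swapped_bre=$(printf '%s\n' "$line" | sed 's/\([[:digit:]]\{1,\}\):\([[:digit:]]\{1,\}\)/\2:\1/')

printf '%s\n' "$squeezed" "$swapped" "$swapped_bre"
```

The BRE line produces the same result as the ERE one; the only difference is how much escaping the reader has to parse.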

sed vs awk vs perl -pe

sed — line-oriented edits: substitution, deletion, range selection. Reach for it for simple find-and-replace and stream filtering, where the unit of work is a line and the logic fits on one line of script.

awk — field- and record-aware processing with variables, arithmetic, and conditionals. Choose it the moment the task is about columns, sums, or per-record state rather than character-level edits.

perl -pe — sed's line-loop with the full Perl regex engine and language behind it. Use it when you need PCRE features sed lacks — lookahead, \d, non-greedy *? — or logic too involved for either sed or awk.
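To make the sed and awk split concrete, the sketch below runs one character-level edit and one column sum over the same made-up report. The data and variable names are illustrative assumptions.

```shell
# Made-up report: name, region, amount
report=$(printf 'a east 10\nb west 5\nc east 7\n')

# sed: a character-level edit applied to each line
renamed=$(printf '%s\n' "$report" | sed 's/east/EAST/')

# awk: field-aware arithmetic, summing the third whitespace-delimited column
total=$(printf '%s\n' "$report" | awk '{ sum += $3 } END { print sum }')

printf '%s\n' "$renamed"
printf 'total=%s\n' "$total"
```

The awk program keeps a running variable across records, which is the per-record state that sed has no direct way to express.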

Common Mistakes

Running -i with no backup against a regex that matches more than intended, overwriting an irreplaceable file with no copy to recover from.
Assuming GNU-style sed -i 's/x/y/' works on macOS or BSD, where the suffix argument is mandatory and -i '' is required for no backup. BSD sed takes the script as the backup suffix and then tries to parse the first filename as the script, which usually aborts with a confusing error.
Leaving / as the delimiter when the pattern contains file paths, so every slash needs escaping and one missed backslash silently changes the match.
Using an unanchored .* that matches greedily from the first character to the last on the line, rewriting text you meant to leave alone.
Writing the script in double quotes or unquoted, so the shell expands $, &, and backticks before sed sees them and the substitution does the wrong thing.
Relying on \d or \w Perl classes that POSIX BRE and ERE do not define, so the pattern matches a literal d or nothing at all.
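Two of these mistakes are easy to reproduce in isolation. The sketch below uses made-up strings: the first pair shows the shell expanding a variable inside a double-quoted script, and the second shows a greedy .* swallowing more than intended.

```shell
# Quoting: the template contains a literal "$NAME" placeholder, and a shell
# variable with the same name happens to exist
tmpl='hello $NAME'
NAME='world'

# Single quotes: sed sees \$NAME and matches the literal placeholder
right=$(printf '%s\n' "$tmpl" | sed 's/\$NAME/Ann/')

# Double quotes: the shell turns the script into s/world/Ann/, which matches nothing
wrong=$(printf '%s\n' "$tmpl" | sed "s/$NAME/Ann/")

# Greedy matching: .* runs from the first <b> to the last </b> on the line
html='<b>one</b> and <b>two</b>'
greedy=$(printf '%s\n' "$html" | sed 's|<b>.*</b>|X|')

# A negated bracket expression stops each match at the next "<"
bounded=$(printf '%s\n' "$html" | sed 's|<b>[^<]*</b>|X|g')

printf '%s\n' "$right" "$wrong" "$greedy" "$bounded"
```

Both substitutions on the HTML line use | as the delimiter, so the slash in the closing tag needs no escaping.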

Best Practices

Single-quote every sed script so the shell passes it through untouched and only sed interprets the metacharacters.
Run the script without -i first and read the stdout output; add -i.bak only once the transformation is confirmed correct.
Choose a non-/ delimiter such as | or # when the pattern contains paths, instead of escaping every slash.
Prefer -E for any regex with groups or alternation; the unescaped form is far easier to read and review.
Anchor patterns with ^ and $, and bound wildcards with a negated class such as [^,]* instead of a bare .*, to stop a match from running across the line.
Reach for awk or perl -pe the moment a task needs multi-line state, arithmetic, or field-aware logic; sed's hold-space scripts for that become unmaintainable.

Comparable tools

awk: field- and record-oriented processing for column data and per-record state
perl -pe: the same line loop with the full Perl/PCRE regex engine
PowerShell: the -replace operator for regex substitution on Windows

Knowledge Check

You run sed '/error/p' app.log without -n. What appears in the output?

Every line, with lines containing "error" printed twice — the auto-print plus the explicit p
Only the lines that contain "error", each printed exactly once
No output at all, because the p command needs -n before it will function
Only the first matching line that contains "error", with every subsequent match skipped entirely

A teammate's sed -i 's/x/y/' f.conf works on Ubuntu but fails on macOS. Why?

BSD sed requires a suffix argument after -i, so it reads the script as the backup suffix; no-backup needs -i ''
macOS sed does not implement the s substitution command at all, so the whole script is silently rejected and the file mangled
macOS sed will only edit files whose name ends in .txt, and silently corrupts anything else
The single quotes around the script are interpreted differently by the default macOS shell

Your pattern uses \d to match a digit and it never matches. What is the portable fix?

Use a POSIX class like [[:digit:]], since BRE and ERE do not define \d
Add the -i flag, which turns on the Perl character classes that define \d
Switch the substitution delimiter from / to | so the class is parsed
Double-escape the sequence as \\d so sed passes it through to the engine

A task needs to sum the third whitespace-delimited column of a report. Why prefer awk over sed?

awk is field-aware with arithmetic and variables, while sed has no concept of columns or running totals
sed cannot read a report from a named file argument at all and will only ever accept its input piped in on stdin
sed silently truncates any input line longer than 1024 characters, dropping the column
awk runs faster than sed on every possible input by design, so it finishes the sum first

Why give -i a suffix, as in -i.bak, on an important file?

It writes the original to file.bak before editing, giving you a copy to restore if the script matched more than intended
It makes the in-place edit fully atomic against a crash or power loss mid-write, which a plain -i with no suffix otherwise can never guarantee
It forces sed to skip writing back any line that the script would not actually change
It enables case-insensitive matching across every substitution in the whole script
