grep and Regular Expressions
grep reads its input line by line and prints the lines that match a pattern. The pattern is a regular expression — a small language for describing sets of strings — and the same language drives sed, awk, and the search in every editor you will touch. Learn the regex model once and it pays off across the entire toolset, which is why this topic is worth more than the size of a single command suggests.
The trap is that there is no single regex. grep defaults to Basic Regular Expressions (BRE), where +, ?, |, and () are literal characters until you backslash-escape them; grep -E switches to Extended Regular Expressions (ERE), where those metacharacters are active without escaping; and grep -P hands the pattern to a Perl-compatible engine with lookarounds and non-greedy quantifiers. A pattern that works in one dialect silently matches the wrong thing — or nothing — in another. Knowing which dialect you are in is half of using grep correctly.
Basic grep
At its simplest grep PATTERN FILE prints matching lines, and with no file argument it reads standard input, which is how it lives in pipelines. The flags that matter day to day are few. -i makes the match case-insensitive, -v inverts it to print non-matching lines, -n prefixes each hit with its line number, -c prints only the count of matching lines, and -o prints just the matched text rather than the whole line — useful when you want to extract tokens, not read context.
    # lines mentioning the word, case-insensitive, with line numbers
    grep -in error /var/log/syslog
    # everything that is NOT a comment or a blank line
    grep -v '^#\|^$' /etc/ssh/sshd_config
    # count failed SSH logins; -o pulls out just the matched IPs
    grep -c 'Failed password' /var/log/auth.log
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' /var/log/auth.log
grep is line-oriented to its core: the unit of matching is a single line, and a pattern can never span a newline. Searching for foo.*bar finds foo and bar only when they sit on the same line. This is the most common surprise for people coming from editor search, and it shapes how you build pipelines — you filter lines first, then reach for sed or awk when the logic crosses line boundaries.
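The line-at-a-time behaviour is easy to confirm with throwaway input. The following is a minimal sketch, assuming any POSIX grep; the sample text and the variable names are made up for illustration.

```shell
# Two inputs: foo and bar on the same line, then split across two lines.
# "|| true" keeps a zero-match exit status from aborting a strict shell.
same_line=$(printf 'foo then bar\n' | grep -c 'foo.*bar' || true)
split_lines=$(printf 'foo\nbar\n' | grep -c 'foo.*bar' || true)

echo "same line:   $same_line matching line(s)"
echo "split lines: $split_lines matching line(s)"
```

The second count is zero because each line is tested on its own, and neither foo nor bar alone satisfies the pattern.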
Regex Building Blocks
A regular expression is built from a handful of pieces. Anchors tie a match to a position: ^ is the start of the line, $ is the end, so ^root: matches only lines that begin with root:. Character classes match one character from a set: [0-9] is any digit, [a-zA-Z] any letter, [^0-9] any non-digit, and POSIX names like [[:space:]] and [[:alnum:]] stay correct across locales. A bare . matches any single character except newline.
Quantifiers control repetition. * means zero or more of the preceding atom, + one or more, ? zero or one, and {m,n} a bounded count — [0-9]{1,3} matches one to three digits. Grouping with () applies a quantifier to a whole subexpression and captures it for back-references, while alternation with | matches one branch or another: (GET|POST|PUT) matches any of the three HTTP methods. These atoms compose: ^([0-9]{1,3}\.){3}[0-9]{1,3}$ is a rough IPv4 matcher.
| Construct | Meaning | Example |
|---|---|---|
| ^ $ | start / end of line | ^Jan, denied$ |
| . [...] | any char / one from a set | [0-9], [^/] |
| * + ? | 0+, 1+, 0-or-1 repetitions | ab*, colou?r |
| {m,n} | bounded repetition | [0-9]{1,3} |
| ( ) \| | group / alternation | (cat\|dog)s? |
| \b \w | word boundary / word char (PCRE) | \bsudo\b |
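To see anchors, classes, and quantifiers working together, the sketch below runs two of the patterns from this section against a few hand-written lines. It assumes a grep that supports -E; the sample data and variable names are hypothetical.

```shell
# Hypothetical sample data: one candidate per line.
sample=$(printf '%s\n' '192.168.1.10' 'host 10.0.0.1 up' '1234.1.1.1' 'colour' 'color')

# Anchored IPv4-shaped match: the whole line must be four dot-separated
# groups of one to three digits.
ip_lines=$(printf '%s\n' "$sample" | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$')

# Optional character: the u may appear zero or one time.
spellings=$(printf '%s\n' "$sample" | grep -cE '^colou?r$')

echo "IP-shaped lines: $ip_lines"
echo "colour/color lines: $spellings"
```

The matcher is only rough: it rejects the embedded address and the four-digit group, but it would still accept 999.999.999.999, because the pattern checks shape and not numeric range.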
BRE versus ERE versus PCRE
The dialect decides what needs escaping. In BRE — plain grep — the characters +, ?, {, |, (, and ) are ordinary literals, and you must write \+, \?, \{, \|, \(, \) to make them act as metacharacters. ERE — grep -E — reverses this: those characters are active by default, and you backslash them only when you want the literal. ERE is what most people picture when they think "regex," and it is the readable default for anything beyond a fixed string.
    # BRE: groups and alternation need backslashes
    grep '\(GET\|POST\)' access.log
    # ERE: the same pattern, no backslashes
    grep -E '(GET|POST)' access.log
    # PCRE: lookbehind and non-greedy, unavailable in BRE/ERE
    grep -P '(?<=user=)\w+' auth.log
grep -P is a different engine entirely: PCRE, the Perl-compatible library. It adds lookahead and lookbehind ((?=...), (?<=...)), non-greedy quantifiers (*?, +?), and the \d shorthand; \w, \s, and \b also work there, though GNU grep already accepts those three in BRE and ERE as extensions. The cost is portability: -P is a GNU extension, the manual flags it as experimental when combined with -z, and minimal images may ship a grep without it. On macOS and the BSDs the default grep is the BSD implementation, where -P is unsupported unless you install GNU grep. Reach for -P when ERE genuinely cannot express what you need, and not before.
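One way to live with the portability cost is to probe for -P at run time and fall back to ERE plus sed when it is missing. This is a sketch under stated assumptions, not a canonical recipe: the log line and the variable names are hypothetical, and -o, while widely supported by GNU, BSD, and BusyBox grep, is not part of POSIX.

```shell
# Probe whether this grep accepts -P by running a trivial PCRE match.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    have_pcre=yes
else
    have_pcre=no
fi

line='session opened user=alice uid=1000'   # hypothetical log line

if [ "$have_pcre" = yes ]; then
    # Lookbehind keeps "user=" out of the printed match.
    user=$(printf '%s\n' "$line" | grep -oP '(?<=user=)\w+')
else
    # Fallback: match the prefix too in ERE, then strip it with sed.
    user=$(printf '%s\n' "$line" | grep -oE 'user=[[:alnum:]_]+' | sed 's/^user=//')
fi

echo "$user"
```

Both branches print the same user name, so a script built this way behaves identically whether or not PCRE support is present.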
Context and Recursive Search
A matching line is often useless without the lines around it. -A N prints N lines after each match, -B N N lines before, and -C N N lines on both sides — indispensable for reading a stack trace or a log event that spreads across several lines. For locating files rather than lines, -l prints only the names of files that contain a match and -L the names of files that do not.
    # show the matching line plus 3 lines of trailing context
    grep -A3 'Traceback' app.log
    # recursive code search, only in Python files, case-insensitive
    grep -rn --include='*.py' -i 'todo' ./src
    # which config files reference the old hostname
    grep -rl 'db-old.internal' /etc
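The context flags can be combined. The sketch below builds a small made-up log in a temporary file and asks for one line before and two lines after the match; the file contents and variable names are hypothetical, and -A/-B are assumed to be available, as they are in GNU, BSD, and BusyBox grep.

```shell
# Build a small hypothetical log in a temporary file.
log=$(mktemp)
printf '%s\n' 'INFO start' 'INFO request' 'Traceback (most recent call last):' \
    '  File "app.py", line 3' 'ValueError: bad input' 'INFO done' > "$log"

# One line of leading context and two lines of trailing context.
context=$(grep -B1 -A2 'Traceback' "$log")
context_lines=$(printf '%s\n' "$context" | wc -l | tr -d ' ')

echo "$context"
echo "lines printed: $context_lines"
rm -f "$log"
```

The output starts at the request line just before the traceback and stops at the exception message, which is usually the slice you need to act on.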
grep -r walks a directory tree, and --include and --exclude globs narrow it to the file types you care about, while --exclude-dir=.git keeps you out of version-control noise. For searching a whole codebase this is workable, but it has no concept of a .gitignore and scans every byte of every matched file, which is exactly where the purpose-built code searchers pull ahead.
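The scoping flags are easiest to judge on a throwaway tree. The following sketch assumes GNU grep (BSD grep accepts the same long options; BusyBox grep may not), and every path and file name in it is invented for the demonstration.

```shell
# Throwaway tree: one source file, one non-Python file, and two
# directories that should be skipped.
root=$(mktemp -d)
mkdir -p "$root/src" "$root/.git" "$root/node_modules/pkg"
printf '# TODO: refactor\n' > "$root/src/app.py"
printf 'TODO in notes\n'    > "$root/src/notes.txt"
printf 'TODO inside git\n'  > "$root/.git/hook.py"
printf '# todo vendored\n'  > "$root/node_modules/pkg/lib.py"

# Only *.py files, skipping .git and node_modules, case-insensitive,
# printing file names rather than matching lines.
hits=$(grep -rli --include='*.py' --exclude-dir=.git --exclude-dir=node_modules \
    'todo' "$root")

echo "$hits"
rm -rf "$root"
```

Four files contain the word, but only the one under src with a .py extension is reported.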
Fixed Strings and Performance
grep -F, historically the fgrep command, turns off regex entirely and treats the pattern as a literal string. Use it whenever your needle contains regex metacharacters that you mean literally: searching for an IP address 10.0.0.1, a version string 1.2.3, or a path /usr/lib/x86_64. Without -F, every . in those is a wildcard, so grep 10.0.0.1 also matches 10x0y0z1. -F makes the match both correct and faster, because the engine can use a plain string search instead of compiling an automaton.
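The false positive is simple to reproduce. This sketch counts matching lines with and without -F; the two input lines and the variable names are hypothetical.

```shell
# Hypothetical log lines: one real address, one look-alike.
lines=$(printf '%s\n' 'client 10.0.0.1 connected' 'token 10x0y0z1 issued')

# As a regex, each dot matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$lines" | grep -c '10.0.0.1')

# As a fixed string, only the real address matches.
fixed_hits=$(printf '%s\n' "$lines" | grep -cF '10.0.0.1')

echo "regex: $regex_hits, fixed: $fixed_hits"
```

If -F is not an option, escaping each dot as \. gives the same result within a regular expression.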
Performance also turns on the regex itself. Patterns that force backtracking, such as greedy .* runs under -P or any pattern with back-references, can grind on long lines, and a poorly bounded pattern can take seconds where a tight one takes milliseconds. Anchoring with ^ lets the engine reject most lines after the first character, and a fixed-string prefix lets it skip whole lines outright. Setting LC_ALL=C before a byte-oriented search drops Unicode collation work and speeds up large scans measurably, at the cost of locale-correct case folding, so reserve it for ASCII data.
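As a sketch of how the two tips combine, the snippet below runs an anchored pattern under LC_ALL=C against a generated ASCII-only log. The log contents and variable names are hypothetical, and the file is far too small to show a timing difference; prefix the grep with time on a real log to measure one.

```shell
# Hypothetical ASCII-only log generated on the fly.
log=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    echo "2024-01-01 INFO request $i"
    i=$((i + 1))
done > "$log"
echo "2024-01-01 ERROR disk full" >> "$log"

# Anchored pattern, byte-oriented locale. Setting LC_ALL on the command
# line affects only this one grep invocation, not the rest of the shell.
errors=$(LC_ALL=C grep -c '^2024-01-01 ERROR' "$log")

echo "error lines: $errors"
rm -f "$log"
```

Scoping LC_ALL=C to the single command keeps locale-aware behaviour intact for everything else in the script.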
BRE: the default for plain grep and sed. +, ?, {}, |, and () are literal until escaped with a backslash. Fine for fixed-shape patterns; reach for it when you are stuck with portable, POSIX-only tooling.
ERE: grep -E (and sed -E, awk). Those same metacharacters are active without backslashes, so patterns read the way most people expect. Make this your default for any pattern with grouping or alternation.
PCRE: grep -P, a separate Perl-compatible engine. Adds lookaround, non-greedy *?, and the \d shorthand, alongside \w and \b. Use it only when ERE cannot express the match, and only where it is installed, since BSD and macOS grep lack it.
- Writing a multiline pattern like foo.*bar and expecting it to match across lines: grep matches one line at a time, so a match spanning a newline never fires, and you silently get zero hits.
- Using +, ?, or | in plain grep and wondering why nothing matches: in BRE they are literal characters; either escape them (\+) or switch to grep -E.
- Searching for a literal string full of dots, such as grep 10.0.0.1, without -F: each . is a wildcard, so the pattern also matches 10x0y0z1 and produces false positives.
- Leaning on grep -P in a script that runs on Alpine, BSD, or macOS: the default grep there has no PCRE support, and the script fails with an unrecognized-option error in production.
- Catastrophic greedy patterns: .* against very long lines forces heavy backtracking, turning a one-second search into a hang on large logs.
- Recursing with grep -r straight through .git, node_modules, and binary blobs: results drown in noise and the scan crawls, when --exclude-dir and --include would have scoped it.
- Use grep -F whenever the pattern is a literal string with dots, slashes, or brackets: it is both correct and faster than letting them act as metacharacters.
- Default to grep -E for any pattern with grouping or alternation; the absence of backslash-escaping makes the intent readable and reduces dialect mistakes.
- Anchor patterns with ^ or $ when you know the position: it kills false matches and lets the engine reject most lines after the first character.
- Scope code searches with grep -rn --include='*.ext' --exclude-dir=.git rather than a bare -r, so you read signal instead of binaries and vendored trees.
- Pull context with -A, -B, or -C when reading logs and stack traces: a bare matching line rarely carries enough to act on.
- Reserve grep -P for lookaround or non-greedy needs, and confirm GNU grep is present before depending on it in a portable script.
- Prefix heavy ASCII scans with LC_ALL=C to skip Unicode collation, but keep the locale default when correct case folding on non-ASCII text matters.
Purpose-built code searchers: respect .gitignore and skip binaries by default
Windows findstr: the built-in line filter, with a limited, non-POSIX pattern syntax
Knowledge Check
Why does grep '(GET|POST)' access.log fail to match either method, while grep -E '(GET|POST)' works?
- Plain grep uses BRE, where (, ), and | are literal characters; -E switches to ERE, where they act as grouping and alternation
- The first form is case-sensitive by default, so the method names will only ever match once you also pass the -i flag to ignore case
- -E quietly enables recursive directory search, which grep requires before alternation will work at all
- BRE is unable to match uppercase letters at all unless they are wrapped in an explicit character class
You want to find the literal string 10.0.0.1 in a log. Why is grep -F '10.0.0.1' the better choice than grep '10.0.0.1'?
- Without -F each . is a wildcard, so the pattern also matches strings like 10x0y0z1; -F treats it as a literal and avoids false positives
- -F switches the whole search to case-insensitive matching, which is exactly what you need to match dotted IP addresses reliably across a log
- Without the -F flag, grep only ever scans the very first line of the file and stops there
- -F is the flag that is required before grep will print the line number alongside each match
A script using grep -P works on your Ubuntu workstation but errors out on an Alpine container. What is the most likely reason?
- -P needs the PCRE-backed GNU grep; Alpine's BusyBox grep lacks PCRE support, so the option is unrecognized
- PCRE patterns are only ever treated as line-oriented on Debian-based systems, which is precisely why Alpine rejects them
- -P requires root privileges that the unprivileged Alpine container does not grant by default
- Alpine disables regular expressions entirely to save image space, so every grep pattern fails there
Why does grep 'foo.*bar' app.log miss a case where foo appears on one line and bar on the next?
- grep matches one line at a time, so .* never crosses a newline; a match spanning two lines cannot fire
- .* is greedy and overshoots past bar whenever the two words sit too far apart on the line
- The pattern silently requires the -z flag by default and fails without it on essentially every input file
- grep stops scanning at the very first match it finds and so never reads as far as the second line