The Commit Graph
Git history is a directed acyclic graph of commits. Each commit points back at its parent, or parents, and a branch is nothing more than a movable pointer to one node in that graph. HEAD, in turn, points at whichever branch you currently have checked out. Once you can see the graph instead of a list, every rewrite, merge, and recovery in this chapter becomes a question of which nodes are reachable from which refs.
This is the topic that turns Git from a set of memorized commands into something you can reason about. The plumbing commands — git rev-parse, git cat-file, git merge-base — let you read the graph directly, and the ancestry notation lets you name any node relative to another without copying a single hash by hand.
Commits as Snapshots, Not Diffs
Every commit stores a pointer to a full tree object plus the SHA of its parent or parents. It does not store a diff against the previous commit; the diff you see in git show is computed on demand by comparing two trees. This is why a commit is conceptually a complete snapshot of the project, yet storage stays small: unchanged files in the new tree point at the exact same blob SHAs as before, so identical content is shared rather than copied.
Run git cat-file -p HEAD and you see the raw commit: a tree line, one or more parent lines, the author and committer, then the message. Nowhere is there a delta. The snapshot model is the literal on-disk truth, not a teaching simplification.
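A quick way to check the snapshot model yourself is sketched below. It assumes a throwaway repository in a temporary directory; the file names keep.txt and change.txt and the demo identity are made up for illustration. It prints the raw commit object, then compares the blob IDs of each file across two commits.

```shell
#!/bin/sh
set -e
# Throwaway repository; identity values are placeholders for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo "stable" > keep.txt
echo "v1" > change.txt
git add .
git commit -q -m "first"

echo "v2" > change.txt
git commit -q -am "second"

# Raw commit object: tree, parent, author, committer, message.
git cat-file -p HEAD

# Blob IDs of each file in the two snapshots.
keep_old=$(git rev-parse HEAD~1:keep.txt)
keep_new=$(git rev-parse HEAD:keep.txt)
change_old=$(git rev-parse HEAD~1:change.txt)
change_new=$(git rev-parse HEAD:change.txt)
echo "keep.txt:   $keep_old -> $keep_new"
echo "change.txt: $change_old -> $change_new"
```

The untouched file should report the same blob ID in both commits, while the edited file gets a new one.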
Parents and Merge Commits
A normal commit has exactly one parent. A merge commit has two or more. The very first commit in a repository — the root — has zero. Those parent pointers are what make the history a graph rather than a flat line, and specifically a DAG: you can never create a cycle, because a commit's hash depends on its parents' hashes, so a commit cannot be its own ancestor.
The number of parents also explains a common surprise: git show of a merge commit shows a combined diff (the --cc format) by default — only the hunks that differ from every parent at once, which on a conflict-free merge is often very little. Pass --first-parent to see the change relative to the mainline only, or -m for a separate diff against each parent. (git log, by contrast, omits a merge's diff entirely unless you ask for it.)
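The parent counts can be inspected directly. The sketch below assumes a scratch repository with a hypothetical feature branch merged into main; parent_count is a helper defined only for this example. It uses git diff against the first parent as a version-independent way to see what the merge brought into the mainline.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the unborn branch "main"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > base.txt
git add . && git commit -q -m "root"
git checkout -q -b feature
echo f > feature.txt
git add . && git commit -q -m "feature work"
git checkout -q main
echo m > main.txt
git add . && git commit -q -m "main work"
git merge -q -m "merge feature" feature

parent_count() {
  # rev-list --parents prints the commit hash followed by each parent hash.
  git rev-list --parents -n 1 "$1" | awk '{ print NF - 1 }'
}

root=$(git rev-list --max-parents=0 HEAD)
n_root=$(parent_count "$root")
n_normal=$(parent_count HEAD^1)
n_merge=$(parent_count HEAD)
echo "root=$n_root normal=$n_normal merge=$n_merge"

# Default view of the merge: a combined diff, expected to be sparse here.
git show --format=%s HEAD

# What the merge changed relative to the mainline (its first parent).
mainline_files=$(git diff --name-only HEAD^1 HEAD)
echo "changed vs first parent: $mainline_files"
```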
Refs, HEAD, and Detached HEAD
Branches and tags are tiny files under .git/refs, each holding one 40-character SHA (Git may later consolidate them into the single .git/packed-refs file, one line per ref). HEAD is normally a symbolic ref, a file containing ref: refs/heads/main rather than a hash, which is how Git knows which branch a new commit should advance. Check out a raw commit instead of a branch and HEAD goes detached: it now holds a SHA directly, and any commit you make is reachable only from HEAD itself.
Detached-HEAD work is real and saved on disk, but the moment you check out a branch again, no branch or tag points at those commits. They survive only via the reflog until its entries expire and garbage collection eventually reclaims them.
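The detached-HEAD scenario can be reproduced safely in a scratch repository. The sketch below is illustrative only: the branch name rescue and the file names are assumptions, and in real use you would look up the lost hash with git reflog instead of saving it in a variable beforehand.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > a.txt
git add . && git commit -q -m "on main"

# Attached: HEAD is a symbolic ref naming a branch.
attached=$(git symbolic-ref HEAD)
echo "HEAD -> $attached"

# Detach, then commit. Only HEAD points at the new commit.
git checkout -q --detach HEAD
echo b > b.txt
git add . && git commit -q -m "detached work"
orphan=$(git rev-parse HEAD)
state=$(git rev-parse --abbrev-ref HEAD)   # prints "HEAD" when detached

# Switch back: no branch contains the commit any more.
git checkout -q main
on_branches=$(git branch --contains "$orphan")

# The reflog still records it, so it can be given a name again.
git reflog -n 3
git branch rescue "$orphan"
```

Creating a branch at the saved hash makes the commit reachable from a ref again, which takes it out of garbage collection's reach.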
Reachability and Garbage Collection
An object is alive if it is reachable by following pointers from any ref — a branch, a tag, the reflog, the index. Everything unreachable is a candidate for garbage collection. This single rule underlies pruning, fetch negotiation, and what git log will show you. Deleting a branch does not delete its commits; it removes one pointer, and the commits live on as long as they remain reachable from another ref or the reflog.
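To watch reachability at work, the following sketch deletes a branch and then asks Git where the old tip can still be found. The repository and the branch name topic are hypothetical; the commands are standard git rev-list and git fsck invocations.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > a.txt
git add . && git commit -q -m "base"
git checkout -q -b topic
echo t > t.txt
git add . && git commit -q -m "topic work"
tip=$(git rev-parse HEAD)

git checkout -q main
git branch -q -D topic

# The object is still in the object database.
kind=$(git cat-file -t "$tip")

# Reachable from branches, tags, or HEAD? (grep -c prints the match count.)
in_refs=$(git rev-list --all | grep -c "$tip" || true)

# Reachable once reflog entries are counted as starting points?
in_reflog=$(git rev-list --all --reflog | grep -c "$tip" || true)

echo "type=$kind refs=$in_refs refs+reflog=$in_reflog"

# Ignoring reflogs, fsck reports the old tip as unreachable.
git fsck --unreachable --no-reflogs | grep "$tip"
```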
Reading the Graph
The default view worth memorizing is git log --oneline --graph --decorate --all, which draws the branch topology before you touch anything. To name nodes, learn the two ancestry operators: ~ walks first-parents (so HEAD~2 is two commits straight back), while ^ selects a specific parent (so HEAD^2 is the second parent of a merge). The range syntax answers divergence questions directly: git log B..A lists commits on A but not B, and git merge-base A B names their common ancestor.
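The operators above are easiest to trust after trying them on a tiny graph. This sketch builds one in a scratch repository: commits A, B, C on main, a feature branch with commit F forked from A, and a merge M. The commit labels and the snap helper are invented for the example.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

snap() {
  # Create one commit whose subject and file name are both "$1".
  echo "$1" > "$1.txt"
  git add .
  git commit -q -m "$1"
}

snap A
git checkout -q -b feature
snap F
git checkout -q main
snap B
snap C

# Divergence questions, asked before the merge.
base=$(git merge-base main feature)
only_main=$(git rev-list --count feature..main)   # on main but not feature

git merge -q -m M feature

# ~2 walks two first-parents back; ^2 picks the merge's second parent.
two_back=$(git log -1 --format=%s HEAD~2)
second_parent=$(git log -1 --format=%s HEAD^2)

git log --oneline --graph --decorate --all
echo "merge-base=$(git log -1 --format=%s "$base") HEAD~2=$two_back HEAD^2=$second_parent"
```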
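One caveat when reading git log: the printed sequence is not a timeline of the dates it shows. With --graph the order is topological (--graph implies --topo-order), and without it commits come out roughly newest-first by committer date, while the Date: line in the default format is the author date. A rebase keeps each author date but stamps a new committer date, so author dates can run backwards relative to commit order. Do not treat the printed sequence as a clock.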
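The effect is easy to simulate without an actual rebase. The sketch below sets author and committer dates by hand through Git's GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables; the specific dates are arbitrary and chosen only so that the newer commit carries the older author date.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo 1 > one.txt
git add .
GIT_AUTHOR_DATE="2024-02-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2024-03-01 12:00:00 +0000" \
  git commit -q -m "first"

# Written earlier (January) but committed later, as after a rebase.
echo 2 > two.txt
git add .
GIT_AUTHOR_DATE="2024-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2024-03-02 12:00:00 +0000" \
  git commit -q -m "second"

# Print both dates side by side for every commit.
git log --date=short --format='%s  author=%ad  committer=%cd'

tip_author=$(git log -1 --date=short --format=%ad HEAD)
parent_author=$(git log -1 --date=short --format=%ad HEAD~1)
```

Here the tip commit shows an author date a month earlier than its parent's, even though it sits on top of it in the graph.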
Common Pitfalls
- Treating HEAD~2 and HEAD^2 as the same thing: ~2 walks two first-parents back, while ^2 selects the second parent of a single merge, so the two resolve to entirely different commits.
- Assuming a commit stores the diff from its parent, then being surprised that git show of a clean merge shows almost nothing: its default combined (--cc) diff lists only hunks that differ from every parent, and a conflict-free merge has few or none.
- Committing while on a detached HEAD and then checking out a branch, losing the work because no ref pointed at the new commit and only the reflog still holds it.
- Believing that deleting a branch deletes its commits: the commits survive as long as they stay reachable from another ref or the reflog, and only a later garbage collection removes them.
- Reading git log output as a timeline of the dates it prints when its order does not follow author dates, so author dates appear to run backwards after a rebase and you misjudge what happened when.
Best Practices
- Use git rev-parse --short HEAD in scripts to capture the current commit, rather than parsing git log output that can change format.
- Inspect any object's contents with git cat-file -p <sha> and confirm its type with git cat-file -t <sha> before acting on it.
- Visualize topology with git log --graph --oneline --decorate --all before any rewrite, so you know what the graph looked like going in.
- Answer "which commits are on A but not B" with the range syntax git log B..A instead of eyeballing two logs.
- Name the divergence point explicitly with git merge-base A B when reasoning about how two branches relate.
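As a small illustration of the first two practices, a script might capture and verify a commit like this. It is a sketch under assumptions: the scratch repository and the build- label are invented for the example.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo a > a.txt
git add . && git commit -q -m "initial"

# Stable, script-friendly way to name the current commit.
short=$(git rev-parse --short HEAD)
full=$(git rev-parse HEAD)

# Confirm the object type before acting on the hash.
kind=$(git cat-file -t "$short")
if [ "$kind" = "commit" ]; then
  git cat-file -p "$short"
fi

echo "build-$short"
```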
Knowledge Check
A commit is a full snapshot, yet repository storage stays small. Why?
- The new tree reuses the same blob SHAs for unchanged files, so identical content is shared rather than copied
- Each commit stores only a compressed line-by-line diff computed against its immediate parent, never a full tree
- Git keeps only the latest snapshot on disk and quietly discards the older ones once a newer commit lands
- The remote server deduplicates the repeated content for you when you push, shrinking what stays locally
What does A...B (three-dot) select that A..B (two-dot) does not?
- The symmetric difference — commits reachable from either A or B but not both, rather than just those on B and not A
- Exactly the same set of commits as two-dot, only printed in reverse chronological order
- Only the merge commits that sit between A and B, filtering out every ordinary single-parent commit
- Every commit reachable from both A and B at the same time, that is, only their shared common ancestry below the divergence point
You commit on a detached HEAD, then check out main. What is the status of that commit?
- It exists on disk and is reachable only through the reflog until garbage collection eventually reclaims it
- It was never written to disk at all, because Git blocks the commit outright whenever HEAD is detached and refuses to create the object
- It is automatically grafted onto main, which advances its pointer to include the new commit
- It is deleted instantly and unrecoverably the moment you switch branches away from it, with no reflog entry left behind
What decides whether an object survives garbage collection?
- Whether it is reachable by following pointers from any ref, including the reflog
- Whether the object was created within the last 24 hours
- Whether it has already been pushed to at least one remote
- Whether its SHA is mentioned anywhere in the text of a recent commit message on the current branch