Topic 23

The Index in Depth

Internals

The index — the staging area — is a single binary file at .git/index. It holds a flat, sorted list of every tracked path along with that path's blob SHA, file mode, and a cache of stat data. It is the proposed contents of your next commit, and it is also the data structure that makes git status fast enough to run on every prompt.

Most people meet the index as "the thing git add writes to." Looking at what it actually stores explains a lot: why status is quick, why a file edited after add is not staged, and why two confusingly similar flags exist for "stop noticing my local changes."

What the Index Actually Stores

Each entry is a path with its mode, the SHA of the blob holding that path's content, and cached stat fields — modification time, size, inode. Critically, the index references a blob SHA; it does not store the file's bytes. The content already lives in the object database, and the index just names which blob represents each path right now.
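
One way to see that an entry names a blob rather than holding bytes is to compare the SHA recorded in the index with the SHA Git computes for the file itself. The following is a minimal sketch in a throwaway repository; the file name notes.txt and the temporary directory are illustrative assumptions, not part of any real project.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'hello index\n' > notes.txt
git add notes.txt

# The second field of `git ls-files --stage` is the blob SHA the entry points at.
index_sha=$(git ls-files --stage notes.txt | awk '{print $2}')

# `git hash-object` computes the blob SHA for the file's current bytes.
file_sha=$(git hash-object notes.txt)

echo "index entry: $index_sha"
echo "file hash:   $file_sha"

# The bytes themselves are read back from the object database by SHA.
git cat-file -p "$index_sha"
```

If the two SHAs match, the index entry and the file on disk refer to the same blob; the content printed at the end comes from the object store, not from .git/index.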

The stat cache is the performance trick. To answer "did this file change?", Git compares the file's current stat data against the cached values; only if they differ does it bother re-hashing the content. That is how git status stays fast in a large tree — it skips reading files that look untouched.
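
The cached stat fields can be inspected with git ls-files --debug. Git documents that output as subject to change, so treat the exact layout as an assumption. The sketch below (the file name data.txt is hypothetical) prints the cached values, edits the file, and shows that status notices the change while the index entry keeps naming the previously staged blob.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'v1\n' > data.txt
git add data.txt

# Dump the cached stat fields (ctime, mtime, inode, size, ...) for the entry.
debug_out=$(git ls-files --debug data.txt)
echo "$debug_out"

staged_sha=$(git ls-files --stage data.txt | awk '{print $2}')

# Change the content; the size changes, so the cached stat data no longer matches.
printf 'version two\n' > data.txt

# status reports "AM": added in the index, modified in the working tree.
status_line=$(git status --porcelain data.txt)
echo "$status_line"

# The index entry still names the blob that was staged earlier.
still_staged_sha=$(git ls-files --stage data.txt | awk '{print $2}')
echo "staged blob before edit: $staged_sha"
echo "staged blob after edit:  $still_staged_sha"
```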

The Three Trees and the Index Between Them

The index sits between HEAD (your last commit) and the working tree (files on disk). The two diff commands target the two boundaries: git diff compares the working tree against the index, showing unstaged changes, while git diff --cached compares the index against HEAD, showing what is staged for the next commit. Knowing which boundary you are looking at is what stops a staged change from appearing to vanish.
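
The two boundaries can be watched directly. This is a small sketch in a scratch repository; the file name file.txt and the commit identity passed with -c are placeholders chosen for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'one\n' > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "initial"

# Edit the file without staging it: the change exists only in the working tree.
printf 'two\n' >> file.txt
unstaged_before=$(git diff --name-only)            # working tree vs index
staged_before=$(git diff --cached --name-only)     # index vs HEAD

# Stage the edit: it moves from the first boundary to the second.
git add file.txt
unstaged_after=$(git diff --name-only)
staged_after=$(git diff --cached --name-only)

echo "before add: unstaged='$unstaged_before' staged='$staged_before'"
echo "after add:  unstaged='$unstaged_after' staged='$staged_after'"
```

After git add, plain git diff goes quiet and git diff --cached starts reporting the file, which is the "vanishing" change described above.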

Plumbing Around the Index

Three plumbing commands operate on the index directly. git ls-files --stage dumps every entry with its mode and blob SHA, git update-index manipulates entries, and git write-tree turns the current index into a tree object — exactly the step git commit performs internally before recording the commit.

$ git ls-files --stage
100644 8d0e41234f5a6b7c... 0	README.md
100644 1a2b3c4d5e6f7a8b... 0	src/main.py
$ git write-tree
7a1c9e4f2b8d6c0a3f5e1b9d8c2a4e6f0b3d7c5a
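
To check the claim that git commit records the same tree git write-tree would produce, the sketch below writes the tree by hand, then commits and compares. The repository, file contents, and commit identity are all made up for the illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

printf '# demo\n' > README.md
git add README.md

# Turn the current index into a tree object and show its entries.
tree_from_index=$(git write-tree)
git cat-file -p "$tree_from_index"

# Commit normally, then read back the tree that the commit points at.
git -c user.name=Example -c user.email=example@example.com commit -q -m "initial"
tree_from_commit=$(git rev-parse 'HEAD^{tree}')

echo "write-tree: $tree_from_index"
echo "commit:     $tree_from_commit"
```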

assume-unchanged and skip-worktree

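Two flags look similar and are constantly confused. git update-index --assume-unchanged <file> is a performance hint: you promise not to edit the file so Git can stop checking it for changes, but Git is free to ignore the promise, and a checkout or merge that touches the file can reset the flag and overwrite your edits. git update-index --skip-worktree <file> is the right tool for "keep my local edits to a tracked config file": it tells Git to leave the working-tree copy alone, and the flag survives far more operations. Even so, Git's documentation cautions that neither flag is a fully supported way to ignore changes to tracked files, so expect to clear the flag by hand when an incoming change touches the same file.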

The practical rule: reach for --skip-worktree when you want local edits to a tracked file to stick, and treat --assume-unchanged as nothing more than a speed hint you cannot rely on.
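
Both flags are set with git update-index and can be listed with git ls-files -v, which tags skip-worktree entries with "S" and assume-unchanged entries with a lowercase letter. The following sketch uses hypothetical files app.conf and big.dat in a scratch repository.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'debug=false\n' > app.conf
printf 'x\n' > big.dat
git add app.conf big.dat
git -c user.name=Example -c user.email=example@example.com commit -q -m "initial"

git update-index --skip-worktree app.conf
git update-index --assume-unchanged big.dat

# "S" marks skip-worktree; lowercase "h" marks assume-unchanged.
flags=$(git ls-files -v)
echo "$flags"

# A local edit to the skip-worktree file is not reported by status.
printf 'debug=true\n' > app.conf
status_hidden=$(git status --porcelain)

# Clearing the flag makes the edit visible again.
git update-index --no-skip-worktree app.conf
status_visible=$(git status --porcelain)
echo "$status_visible"
```

The matching switch for the other flag is --no-assume-unchanged.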

The Sparse Index

On a monorepo, sparse-checkout cone mode plus a sparse index lets the index store a single collapsed tree entry for whole directories outside your cone, instead of one entry per file. That keeps operations cheap when millions of paths exist outside the slice you actually work in. Enable it with git sparse-checkout set --cone <dirs> and index.sparse=true; without cone mode the index cannot collapse those entries and you get none of the benefit.

assume-unchanged vs skip-worktree

assume-unchanged — a performance hint that Git may disregard, and that checkout or merge will reset, overwriting your local edits. Use it only to speed up status on files you genuinely will not touch, never to protect local changes.

skip-worktree — designed to persist local edits to a tracked file and to survive most operations. This is the correct flag for "ignore my local changes to a committed config file." Choose it whenever the goal is keeping your edits.

Common Mistakes

Using --assume-unchanged to hide local config edits, then losing them when a pull or merge resets the flag and overwrites the file.
Believing the index stores file content and reasoning about its size accordingly — it holds blob SHAs and stat data; the content is in the object store.
Letting a script write files directly and then committing with a stale index, so the changes silently miss the commit because they were never staged.
Enabling the sparse index without cone mode and getting none of the index-collapsing benefit, since only cone mode can collapse out-of-cone trees.
Reading plain git diff after staging and seeing nothing, then assuming the change was lost, when git diff --cached is the view that compares the index against HEAD and shows what was staged.

Best Practices

Persist local edits to a tracked file with git update-index --skip-worktree <file>, not --assume-unchanged.
Inspect exactly what is staged with git ls-files --stage.
Distinguish unstaged from staged changes using git diff versus git diff --cached.
Enable the sparse index on a large monorepo with git sparse-checkout set --cone <dirs> plus index.sparse=true.
Materialize the current index into a tree for inspection with git write-tree.

Comparable tools

Mercurial: commits straight from the working directory; dirstate caches status but is not a staging tree.
Subversion: commits the working copy; there is no staging index.
Perforce: changelists group files but are not a content snapshot.

Knowledge Check

What does each index entry reference for a path's content?

The blob's SHA, along with the mode and cached stat data — not the bytes themselves
A full inline copy of the file's current content, stored verbatim inside the index entry itself
A line-level diff of the path against its version in HEAD
The working-tree path only, with content resolved lazily on read

Why does the index cache stat data like mtime and size?

So git status can skip re-hashing files whose stat data is unchanged, keeping it fast
To keep a redundant backup copy of each file's metadata for later recovery after corruption
To let Git restore each file's original timestamps on the next checkout
Because the resulting commit object requires the stat fields

You want your local edits to a tracked config file to stick. Which flag?

git update-index --skip-worktree <file>
git update-index --assume-unchanged <file>
git rm --cached <file>
git add --intent-to-add <file>

What does git diff --cached compare?

The index against HEAD — what is staged for the next commit
The working tree against the index — your as-yet unstaged edits to tracked files
The working tree directly against HEAD, ignoring the index entirely
Two arbitrary commits passed explicitly as arguments

Why does a sparse index need cone mode to help?

Only cone mode lets the index collapse whole out-of-cone directories into a single tree entry
Cone mode encrypts the out-of-cone paths so they take up far less space on disk
Without cone mode the index is stored as plain text and so it cannot be made sparse
Cone mode disables the per-file stat cache entirely, and that cache is the thing that was slowing status down
