The Object Model
Git's database stores exactly four kinds of object — blob, tree, commit, and annotated tag — and every one of them is addressed by the hash of its own content. Once you can name those four and trace how a commit resolves down to the bytes of a file, the rest of Git stops being magic. Branches, merges, even history rewriting are all operations on this one small set of immutable, content-addressed objects.
The payoff of learning the model directly is that the plumbing commands — git cat-file, git hash-object, git ls-tree — let you see precisely what Git stored, rather than guessing from the porcelain. When something looks wrong, the object database is the ground truth you can always fall back to.
The Four Object Types
A blob holds the raw bytes of a file with no name and no metadata — just content. A tree is a directory listing: a sorted set of entries, each pairing a name with a file mode and the SHA of a blob or sub-tree. A commit points at one root tree, lists zero or more parents, and carries the author, committer, timestamps, and message. An annotated tag is a named, optionally signed pointer to another object, usually a commit.
The split between blob and tree is the part worth dwelling on: a tree records names, modes, and child SHAs, but never the file content itself. The content lives only in blobs. That separation is exactly why renaming a file is cheap: the blob is untouched, and only the tree entry that names it changes (along with the IDs of the trees above it).
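To make the rename argument concrete, here is a minimal Python sketch that hashes a blob and two one-entry trees by hand. The helper names object_id and tree_body are hypothetical, and the sketch assumes the SHA-1 object format and a tree with no sub-tree entries (Git's real sort order treats directory names specially).

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash '<kind> <size>' + NUL + body, the way Git names an object (SHA-1 format)."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict[str, tuple[str, str]]) -> bytes:
    """Serialize {name: (mode, hex_id)} as '<mode> <name>' + NUL + 20 raw ID bytes.

    Simplification: entries are sorted by raw name bytes, which matches Git
    only when no entry is a sub-tree.
    """
    out = b""
    for name in sorted(entries, key=lambda n: n.encode()):
        mode, hex_id = entries[name]
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return out


content = b"hello world\n"
blob_id = object_id("blob", content)

# Same blob, two different names: only the tree changes.
tree_before = object_id("tree", tree_body({"README.md": ("100644", blob_id)}))
tree_after = object_id("tree", tree_body({"README.txt": ("100644", blob_id)}))

print(blob_id)                    # one blob ID serves both trees
print(tree_before != tree_after)  # the tree ID follows the entry name
```

The blob ID never sees the file name, so a rename can only ever show up in tree objects.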
Content-Addressable Storage
Every object's ID is the hash of a small header plus its content, so the name is a digest of the bytes. git hash-object <file> computes that ID without writing anything; git cat-file -p <sha> reads the object back. Because identical bytes always hash to the same value, the same content is stored exactly once no matter how many commits or paths reference it.
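The "small header" can be reproduced outside Git. The sketch below assumes the default SHA-1 object format and the blob header form "blob <size>" followed by a NUL byte; the function name blob_id is illustrative, not part of any Git API.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID Git would assign to a blob holding these bytes (SHA-1 format)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# Identical bytes give one ID, whatever path they came from.
print(blob_id(b"hello world\n") == blob_id(b"hello " + b"world\n"))
```

Running git hash-object on a file with the same bytes should print the same ID, which is a quick way to check the sketch against a real installation.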
This is also what makes corruption detectable. Flip a single byte in a stored object and its hash no longer matches its name, so git fsck can spot the mismatch. Silent rot cannot survive that check, and tampering would require producing different bytes that hash to the same ID.
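The check itself is simple enough to sketch. The toy store below is an assumption for illustration only: a Python dict stands in for the object database, objects are zlib-compressed the way loose objects are, and find_corrupt is a hypothetical stand-in for the part of git fsck that re-hashes objects.

```python
import hashlib
import zlib


def store(db: dict[str, bytes], kind: str, body: bytes) -> str:
    """Compress header + body and file it under its own SHA-1."""
    raw = f"{kind} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    db[oid] = zlib.compress(raw)
    return oid


def find_corrupt(db: dict[str, bytes]) -> list[str]:
    """Return IDs whose stored bytes no longer hash to their name."""
    bad = []
    for oid, packed in db.items():
        try:
            raw = zlib.decompress(packed)
        except zlib.error:
            bad.append(oid)  # damaged compressed stream
            continue
        if hashlib.sha1(raw).hexdigest() != oid:
            bad.append(oid)  # bytes changed, name did not
    return bad


db: dict[str, bytes] = {}
oid = store(db, "blob", b"hello world\n")
clean = find_corrupt(db)

# Simulate bit rot: flip one bit in the stored content, then recompress.
raw = bytearray(zlib.decompress(db[oid]))
raw[-1] ^= 0x01
db[oid] = zlib.compress(bytes(raw))
damaged = find_corrupt(db)

print(clean)             # nothing flagged before the flip
print(damaged == [oid])  # the altered object is flagged afterwards
```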
How a Commit Resolves to Files
A commit does not contain files; it contains a pointer to a root tree, which points at sub-trees and blobs. You can walk that chain by hand. Start at the commit, read its tree SHA, list the tree, then print a blob:
    $ git cat-file -p HEAD
    tree 9d2f8c41a7b6e0d3c5f1a9b8e2d7c4a06f3b5e1d
    parent a1b2c3d4e5f6...
    author Sergey <s@example.com> 1700000000 +0000
    committer Sergey <s@example.com> 1700000000 +0000

    Add parser
    $ git ls-tree HEAD
    100644 blob 8d0e41234f...    README.md
    040000 tree 9a7b3c2d1e...    src
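The same walk can be sketched in Python against a toy in-memory store. Everything here is an assumption for illustration: db, put, parse_tree, and read_path are hypothetical names, the file contents are made up, and the store skips compression and packing. The commit and tree layouts follow the formats Git uses for the SHA-1 object format.

```python
import hashlib

# Toy object database: ID -> (type, body). Not how Git stores data on disk.
db: dict[str, tuple[str, bytes]] = {}


def put(kind: str, body: bytes) -> str:
    """Store an object under the SHA-1 of '<kind> <size>' + NUL + body."""
    oid = hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()
    db[oid] = (kind, body)
    return oid


def parse_tree(body: bytes) -> dict[str, tuple[str, str]]:
    """Decode '<mode> <name>' + NUL + 20 raw ID bytes, repeated, into {name: (mode, id)}."""
    entries = {}
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        entries[name] = (mode, body[nul + 1:nul + 21].hex())
        pos = nul + 21
    return entries


def read_path(commit_id: str, path: str) -> bytes:
    """Follow commit -> root tree -> sub-trees -> blob and return the blob bytes."""
    kind, body = db[commit_id]
    assert kind == "commit"
    # The first header line of a commit is "tree <hex id>".
    current = body.split(b"\n", 1)[0].decode().split(" ", 1)[1]
    for part in path.split("/"):
        _mode, current = parse_tree(db[current][1])[part]
    return db[current][1]


# Build a two-level snapshot by hand.
main_py = put("blob", b"print('hi')\n")
readme = put("blob", b"# Demo\n")
src = put("tree", b"100644 main.py\0" + bytes.fromhex(main_py))
root = put("tree", b"100644 README.md\0" + bytes.fromhex(readme)
           + b"40000 src\0" + bytes.fromhex(src))
commit = put("commit", (
    f"tree {root}\n"
    "author Sergey <s@example.com> 1700000000 +0000\n"
    "committer Sergey <s@example.com> 1700000000 +0000\n"
    "\n"
    "Add parser\n"
).encode())

print(read_path(commit, "src/main.py"))
```

Every step in read_path is a dictionary lookup by ID, mirroring the hash lookups that git cat-file and git ls-tree perform.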
Each step is a direct hash lookup, which is why checking out any commit is reading a snapshot rather than replaying diffs. The diff you see in git show is computed on demand by comparing two trees, not stored anywhere.
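Computing a diff on demand amounts to comparing two name-to-ID mappings. The sketch below is a simplification under stated assumptions: each snapshot is flattened to a dict of path to blob ID, and diff_trees and blob_id are hypothetical helpers. Real Git compares tree objects recursively and can skip any sub-tree whose ID is unchanged.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 of 'blob <size>' + NUL + content."""
    return hashlib.sha1(b"blob " + str(len(content)).encode() + b"\0" + content).hexdigest()


def diff_trees(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Compare two {path: blob ID} snapshots; report (A)dded, (D)eleted, (M)odified."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.append(("A", path))
        elif path not in new:
            changes.append(("D", path))
        elif old[path] != new[path]:
            changes.append(("M", path))  # same name, different content hash
    return changes


old = {"README.md": blob_id(b"v1\n"), "src/main.py": blob_id(b"x\n")}
new = {"README.md": blob_id(b"v2\n"), "src/main.py": blob_id(b"x\n"),
       "LICENSE": blob_id(b"MIT\n")}

print(diff_trees(old, new))  # [('A', 'LICENSE'), ('M', 'README.md')]
```

No file content is read to decide what changed; comparing IDs is enough, and the unchanged src/main.py is never touched.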
SHA-1 and SHA-256
Git's default object format is still SHA-1, but since 2.13 Git has used a collision-detecting SHA-1 implementation that hardens it against the SHAttered collision attack. Git also supports SHA-256 repositories, chosen at creation with git init --object-format=sha256. The two formats are not yet interoperable: a SHA-1 repo and a SHA-256 repo cannot exchange objects directly, so SHA-256 only makes sense for a greenfield repo with no SHA-1 interop needs.
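One way to see why the two formats cannot share objects is to hash the same blob both ways. This sketch assumes that a SHA-256 repository names a blob by hashing the same "blob <size>" plus NUL header with SHA-256 instead of SHA-1; the function and its object_format parameter are illustrative names, not a Git API.

```python
import hashlib


def blob_id(content: bytes, object_format: str = "sha1") -> str:
    """Hash 'blob <size>' + NUL + content with the chosen object format."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.new(object_format, header + content).hexdigest()


sha1_id = blob_id(b"hello world\n", "sha1")
sha256_id = blob_id(b"hello world\n", "sha256")

print(sha1_id)
print(sha256_id)
print(len(sha1_id), len(sha256_id))  # 40 64
```

The bytes are identical, yet the two names have nothing in common, so every tree and commit that refers to the blob would differ as well.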
Common Pitfalls
- Treating a tree object as if it held file content, then being surprised that git cat-file -p <tree> shows names and SHAs but no bytes. The bytes live in the blobs the tree points at.
- Assuming a lightweight tag creates a tag object, then looking for tagger metadata that does not exist. Only git tag -a (or a signed variant such as git tag -s) writes an annotated tag object; a lightweight tag is just a ref.
- Believing that renaming a file produces a new blob and inflates the repo. The content is identical, so the blob SHA is unchanged and only the tree entry moves.
- Expecting a SHA-1 and a SHA-256 repository to push and fetch between each other today. The formats are not yet interoperable.
- Reasoning that two files with identical content cost twice the storage. Content-addressing stores that blob exactly once.
Best Practices
- Check any object's type with git cat-file -t <sha> and dump its content with git cat-file -p <sha> before guessing what it is.
- Compute an object ID without writing it using git hash-object <file> when you want to predict a SHA.
- Trace how a path is stored by walking commit to tree to blob with git ls-tree HEAD.
- Pick SHA-256 with git init --object-format=sha256 only for a new repo that will never need SHA-1 interop.
- Run git fsck periodically to verify object integrity across the database.
Knowledge Check
Where do a file's actual bytes live in Git's object model?
- In a blob object; the tree only records the name, mode, and the blob's SHA
- In the tree object, stored inline alongside that directory's listing of names
- In the commit object, packed in right next to the log message
- In the index, which the commit then copies in verbatim
Why does renaming a file not create a new blob?
- The content is unchanged, so its hash is unchanged; only the tree entry naming that blob changes
- Git records the rename as a dedicated move entry in the tree rather than touching the underlying blob at all
- The old blob is deleted and a byte-identical one is rewritten under the new name
- Renames are tracked only as a note in the commit message text
What distinguishes an annotated tag from a lightweight tag?
- An annotated tag is a real object with tagger, date, message, and optional signature; a lightweight tag is just a ref
- A lightweight tag is the one that gets cryptographically signed with your key, whereas an annotated tag never carries a signature
- Annotated tags are purely local and cannot be pushed to a remote
- There is no real difference; the two names are just aliases
Why can content-addressing detect silent corruption?
- The object's name is the hash of its content, so any byte change makes the name stop matching the bytes
- Git keeps a separate per-object checksum file on disk that it reads back and compares against on every single access
- The underlying operating system flags any modified object files automatically
- Each object records its own last-modified timestamp for comparison