Large Files and Partial Clones
Git stores every version of every file forever, which is exactly what you want for source code and exactly what you do not want for a 200 MB binary that changes every week. The same design that makes history cheap for text makes it ruinous for large or numerous binaries, and the bloat is permanent once it is committed.
There are two separate problems and two separate families of fixes. Git LFS keeps large files out of the packfiles entirely by storing pointers; shallow clone, partial clone, and sparse-checkout each fetch less of the history or tree so you do not download data you will never use.
Why Big Files Hurt
Git's storage relies on delta compression: similar versions of a text file are stored as small diffs against each other. Binaries (images, compiled artifacts, video) delta poorly, often because they are already compressed, so each version is stored more or less in full. Because Git keeps every version forever, a binary edited fifty times bakes roughly fifty full copies into history, and every full clone downloads all of them.
The only real fix is to keep those blobs out of the packfiles in the first place; removing them after the fact requires rewriting history, which is disruptive for everyone.
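The growth described above can be reproduced in a scratch repository. This is a minimal sketch, not a benchmark: the file name asset.bin, the demo identity, and the 1 MiB size are made up for illustration, and fully random bytes stand in for an already-compressed binary, which is the worst case for delta compression.

```shell
#!/bin/sh
# Sketch: commit five unrelated 1 MiB versions of one binary and measure the pack.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

i=1
while [ "$i" -le 5 ]; do
    # Random bytes neither delta against the previous version nor zlib-compress.
    head -c 1048576 /dev/urandom > asset.bin
    git add asset.bin
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "asset v$i"
    i=$((i + 1))
done

# Repack so every object lands in a packfile, then read its size in KiB.
git gc --quiet
size_pack_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')

echo "working tree holds 1024 KiB; packed history holds ${size_pack_kib} KiB"
```

The working tree only ever contains one copy, yet the pack carries all five, which is what every full clone would download.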
Git LFS
git lfs track "*.psd" writes a filter rule into .gitattributes. From then on, committing a matching file stores a small text pointer in Git while the actual blob goes into a separate LFS store, uploaded when you push; on checkout, LFS fetches the real content back. Your history stays small because it carries pointers, not bytes.
The catch is that the filter lives in .gitattributes: if you do not commit that file, other clones never see the rule and store the raw binaries instead of pointers, defeating the whole arrangement.
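To see what the rule looks like and which paths it catches, the sketch below writes by hand the line that git lfs track "*.psd" adds to .gitattributes, then asks Git which filter applies to a path. Writing the line manually is an assumption made so the example runs without git-lfs installed; the file names mockup.psd and notes.txt are hypothetical and do not need to exist.

```shell
#!/bin/sh
# Sketch: inspect the attribute rule that routes *.psd through the LFS filter.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# The same line `git lfs track "*.psd"` would append.
printf '%s\n' '*.psd filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# check-attr only matches paths against the rules; the files need not exist.
psd_filter=$(git check-attr filter -- mockup.psd)
txt_filter=$(git check-attr filter -- notes.txt)

echo "$psd_filter"   # mockup.psd: filter: lfs
echo "$txt_filter"   # notes.txt: filter: unspecified
```

A clone that never receives this .gitattributes file reports unspecified for every path, so nothing is routed to LFS there.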
Shallow Clone
git clone --depth 1 fetches only the most recent commit (or the last N with a larger depth), truncating history. The clone is fast and small, which is ideal for a throwaway checkout. The cost is that the missing history is genuinely absent until you deepen the clone (for example with git fetch --unshallow): git log stops at the cut, git blame pins older lines on the boundary commit, and merges whose common ancestor lies beyond the boundary can fail outright.
That makes shallow clone a tool for ephemeral environments, not for a working copy where you will investigate history or integrate branches.
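The truncation is easy to observe locally. The following sketch builds a three-commit source repository and clones it with --depth 1; the directory names src and shallow, the file name, and the demo identity are placeholders chosen for the example.

```shell
#!/bin/sh
# Sketch: compare commit counts between a full repository and a --depth 1 clone.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q src

for n in 1 2 3; do
    echo "revision $n" > src/file.txt
    git -C src add file.txt
    git -C src -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $n"
done

# Use a file:// URL: Git ignores --depth when cloning from a plain local path.
git clone -q --depth 1 "file://$work/src" shallow

full_count=$(git -C src rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
is_shallow=$(git -C shallow rev-parse --is-shallow-repository)

echo "source commits: $full_count"
echo "shallow clone commits: $shallow_count"
echo "is shallow: $is_shallow"
```

Everything older than the tip commit is simply not in the clone's object store, which is why history-dependent commands have nothing to work with.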
Partial Clone
git clone --filter=blob:none keeps the full commit history but skips downloading file contents until something actually needs them, fetching blobs on demand. A variant, --filter=blob:limit=1m, downloads small blobs up front and defers only the large ones.
This is the right choice when you need real history — for blame and cross-history merges — but do not want to pull down every blob of every version up front, as in a large repo with heavy assets.
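A partial clone can be inspected the same way. This sketch assumes a local source repository served over file://, which has to opt in to filtering through uploadpack.allowFilter; hosted services typically enable this on their side. The names src and partial and the demo identity are illustrative only, and --no-checkout is used so that no blob is fetched at all.

```shell
#!/bin/sh
# Sketch: a blob:none clone has every commit but none of the file contents yet.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q src
# Let this local "server" honour --filter requests.
git -C src config uploadpack.allowFilter true

for n in 1 2 3; do
    echo "revision $n" > src/file.txt
    git -C src add file.txt
    git -C src -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $n"
done

git clone -q --no-checkout --filter=blob:none "file://$work/src" partial

commit_count=$(git -C partial rev-list --count HEAD)
# --missing=print lists absent objects with a leading "?" instead of fetching them.
missing_blobs=$(git -C partial rev-list --objects --missing=print HEAD | grep -c '^?' || true)

echo "commits available: $commit_count"
echo "blobs not yet downloaded: $missing_blobs"
```

All three commits are present, while the three versions of file.txt remain on the remote until a checkout, diff, or blame asks for them.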
Sparse-Checkout
git sparse-checkout set --cone <dirs> populates only the directories you name from a monorepo, leaving everything else out of the working tree while history stays intact. Cone mode restricts patterns to whole directories, which lets Git match them quickly.
Without cone mode, sparse-checkout falls back to full pattern matching against every path, which is noticeably slow on a huge repository.
Shallow clone (--depth 1) truncates history: you lose old commits, meaningful blame, and cross-history merges. Fast and tiny, but a dead end for anything that reaches back, short of deepening the clone with git fetch --unshallow.
Partial clone (--filter=blob:none) keeps full history but defers downloading file contents until accessed. Use shallow for throwaway CI checkouts, partial clone when you need history but not every blob up front.
- Committing large binaries before setting up LFS, baking the blobs into history so removing them needs a full history rewrite.
- Using a --depth 1 shallow clone in CI, then running git blame or merging across the cut and hitting "shallow" errors.
- Setting up LFS but forgetting to commit .gitattributes, so other clones store the raw files instead of pointers.
- Running out of LFS storage or bandwidth quota mid-release because the budget was never planned.
- Using sparse-checkout without cone mode on a huge repo and suffering slow non-cone pattern matching.
- Track binaries with git lfs track and commit the resulting .gitattributes before adding the files.
- Use git clone --filter=blob:none for large repos where you need history but not all content up front.
- Reserve --depth 1 for ephemeral CI checkouts that never blame or merge across history.
- Scope a monorepo working tree with git sparse-checkout set --cone <dirs>.
- Audit bloat with git lfs ls-files and git count-objects -vH before it becomes unmanageable.
Knowledge Check
Why do binary files defeat Git's delta compression?
- They do not delta well, so each version is stored in full, and Git keeps every version forever
- Git refuses to store binaries at all and silently drops them on commit, recording only their file paths while discarding the actual bytes during the staging step
- Binaries are kept only on the server and never copied to local clones, which fetch lightweight placeholders instead and stream the real bytes on demand
- Git compresses binaries so well that the repository history shrinks instead
What does a shallow clone omit that a partial clone does not?
- Shallow omits old history; partial keeps full history but defers downloading blob contents
- Shallow omits the blob contents and fetches them lazily; partial omits the older history beyond a depth boundary
- They both omit exactly the same commits and objects, applying one identical truncation rule so the two clone types end up with the same trimmed-down history
- Shallow omits the working tree; partial omits the staging index
Why does LFS require committing .gitattributes to work for everyone?
- The track rule lives in .gitattributes; without it, other clones store raw files instead of pointers
- It holds the LFS server password and authentication tokens, so without committing it other clones cannot log in to the large-file store and download the real binaries
- Git rejects any commit made in a repo lacking the file, aborting the commit with an error until a valid .gitattributes is present at the repository root
- It is only needed on the central server, never in clones
When does a shallow clone break blame and merge?
- When the operation reaches across the truncated history that the --depth cut left out
- Only when the repository also stores files in LFS, since the missing pointer objects are what trip up blame and merge on a truncated clone
- Never; shallow clones fully support every history operation, reconstructing any missing commits on the fly whenever blame or merge needs them
- Only on the very first commit you make after cloning
What does cone mode optimize in sparse-checkout?
- It restricts patterns to whole directories so Git can match them quickly instead of scanning every path
- It compresses the working tree into a single cone-shaped pack file on disk, replacing the loose checked-out files with one bundled archive that Git unpacks on access
- It downloads file blobs lazily on demand like a partial clone
- It removes all commit history older than the cone depth, truncating the log at a boundary so the repository keeps only the most recent slice of commits