Rewriting History at Scale
Sometimes the rewrite you need is not one commit but all of them: purge a leaked secret from every commit that ever contained it, strip giant binaries out of the entire history, or remap an author email across the whole repository. git filter-repo — and the older BFG Repo-Cleaner — rewrite every affected commit and remap all their descendants in a single streaming pass, far faster and safer than the deprecated git filter-branch.
Two facts dominate this topic. First, removing a file in a new commit does not remove it from history — the old blob is still there and still clonable, so a real purge requires rewriting every commit that held it. Second, and non-negotiable: a leaked secret that ever touched a pushed repository is compromised, so you rotate the credential regardless of how cleanly you rewrite. The rewrite stops future leaks of that blob; it does not un-leak what already escaped.
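The first fact is easy to check in a throwaway repository. The sketch below uses only plain git; the file name, the fake key, and the demo identity are placeholders chosen for illustration.

```shell
#!/bin/sh
# Demo: deleting a file in a new commit leaves the old blob in history.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Placeholder identity so the commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commit a (fake) secret, then delete it in a follow-up commit.
printf 'API_KEY=hunter2\n' > secrets.env
git add secrets.env
git commit -q -m "add config"
git rm -q secrets.env
git commit -q -m "remove secret"

# The working tree no longer has the file...
test ! -e secrets.env

# ...but the earlier commit still serves its full content.
leaked=$(git show HEAD~1:secrets.env)
echo "still recoverable: $leaked"

# Count the commits that touched the path (the add and the delete).
touching=$(git log --oneline --all -- secrets.env | wc -l | tr -d ' ')
echo "commits touching secrets.env: $touching"
```

Anyone who clones this repository receives the first commit, and with it the secret, which is why only a rewrite of that commit removes it.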
Why filter-branch Is Out
git filter-branch forks a shell process per commit, which makes it punishingly slow on any real history, and its many sharp edges make subtle corruption easy. Git's own documentation now actively discourages it and points users to git filter-repo instead. Treat filter-branch as legacy you may encounter in old runbooks, not as a tool to reach for.
git filter-repo
git filter-repo is a single Python tool that rewrites and remaps the whole repository in one streaming pass. Its common levers are --path and --invert-paths (keep or remove paths), --strip-blobs-bigger-than 10M (drop oversized files), --replace-text (scrub secret strings), and --mailmap (rewrite author and committer identities). Because it processes the object stream directly rather than spawning a process per commit, it finishes in a fraction of filter-branch's time.
git filter-repo --invert-paths --path secrets.env
git filter-repo --strip-blobs-bigger-than 10M
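The --replace-text and --mailmap levers each read a small input file. The sketch below shows one plausible shape for both; every secret, name, and email in it is a placeholder, and the RUN_REWRITE guard is a hypothetical safety switch added here so the script never rewrites anything by accident.

```shell
#!/bin/sh
# Sketch: input files for --replace-text and --mailmap (all values are placeholders).
set -eu
work=$(mktemp -d)

# One rule per line. Text is matched literally unless prefixed with "regex:".
# "==>" introduces the replacement; without it, filter-repo substitutes
# its default marker ***REMOVED***.
cat > "$work/expressions.txt" <<'EOF'
hunter2==>REDACTED
regex:AKIA[0-9A-Z]{16}==>REDACTED_AWS_KEY
EOF

# Standard git mailmap syntax: "Proper Name <proper@email> <old@email>".
cat > "$work/mailmap.txt" <<'EOF'
Jane Doe <jane@example.com> <jdoe@old-corp.example>
EOF

# Hypothetical guard: run only when filter-repo is installed AND the caller
# opts in, ideally from inside a fresh clone.
if command -v git-filter-repo >/dev/null 2>&1 && [ "${RUN_REWRITE:-0}" = 1 ]; then
  git filter-repo --replace-text "$work/expressions.txt" --mailmap "$work/mailmap.txt"
fi
```

Keeping the rules in files rather than on the command line also keeps the secret strings out of shell history.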
BFG Repo-Cleaner
BFG is a JVM tool built for exactly two jobs: deleting big blobs with --strip-blobs-bigger-than and scrubbing secrets with --replace-text. Within that narrow scope it is fast and simple. It cannot do the path, identity, and ref surgery that filter-repo handles, so it is the right choice only when those two tasks are all you need.
The Aftermath
Any rewrite changes the SHA of every commit downstream of the earliest edit, because a commit's hash depends on its parents'. That means all existing forks and clones diverge from the rewritten history. You will force-push the rewritten refs, and every collaborator must re-clone — a merge or pull from an old clone would simply reintroduce the purged content under the old SHAs. This is why a history rewrite on a shared branch is a coordinated event, not a quiet cleanup.
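The hash cascade can be reproduced with git's plumbing commands. In this sketch, which assumes nothing beyond plain git, two child commits share the same tree, message, author, and pinned timestamps; only the parent differs.

```shell
#!/bin/sh
# Demo: a commit's hash covers its parent's hash, so rewriting an ancestor
# forces new hashes on every descendant.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Pin both timestamps so the parent is the only input that varies.
export GIT_AUTHOR_DATE='1704067200 +0000' GIT_COMMITTER_DATE='1704067200 +0000'

echo hello > file.txt
git add file.txt
tree=$(git write-tree)

# Two root commits over the same tree; only the message differs.
root_a=$(git commit-tree "$tree" -m "original root")
root_b=$(git commit-tree "$tree" -m "rewritten root")

# Children with identical content and metadata but different parents.
child_a=$(git commit-tree "$tree" -p "$root_a" -m "child")
child_b=$(git commit-tree "$tree" -p "$root_b" -m "child")

# Same inputs again: hashing is deterministic, so the result repeats.
child_a_again=$(git commit-tree "$tree" -p "$root_a" -m "child")

echo "child on original root:  $child_a"
echo "child on rewritten root: $child_b"
```

The two children print different hashes even though nothing in them changed, which is exactly why old clones cannot be reconciled with the rewritten branch.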
Finishing the Cleanup
The rewrite makes the old objects unreachable, but they linger until you reclaim them. Run git reflog expire --expire=now --all to drop the reflog entries still referencing them, then git gc --prune=now --aggressive to actually delete the unreachable objects and shrink the repository. Skip this step and you will wonder why the repo size did not drop: the purged blobs are still on disk, merely unreferenced.
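The effect of the two cleanup commands can be observed in a scratch repository. This sketch simulates a rewrite by deleting the only branch that held a blob, so no filter tool is needed; the branch names and file contents are placeholders.

```shell
#!/bin/sh
# Demo: an orphaned blob survives until the reflog is expired and gc prunes it.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo keep > README
git add README
git commit -q -m "base"

# Commit a blob on a side branch, then delete the branch to orphan it.
git checkout -q -b leak
printf 'API_KEY=hunter2\n' > secrets.env
git add secrets.env
git commit -q -m "oops"
blob=$(git rev-parse HEAD:secrets.env)
git checkout -q main
git branch -q -D leak

# No branch reaches the blob, yet it is still on disk: HEAD's reflog pins it.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop the reflog entries, then prune everything unreachable.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before cleanup: $before, after cleanup: $after"
```

The same check (git cat-file -e on a known blob hash) is a quick way to confirm that a real purge actually removed the object from a given clone.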
git filter-repo — general-purpose and the official recommendation. Handles paths, content replacement, author rewriting, and ref surgery in one streaming pass; reach for it for anything beyond the two narrow cleanup jobs.
BFG Repo-Cleaner — a JVM tool limited to stripping oversized blobs and replacing secret text. Faster and simpler for those two tasks, but cannot do path or identity rewrites; choose it only when its narrow scope is all you need.
- Believing a git rm plus a normal commit removes a leaked secret. The old blob remains in history and stays clonable, so a full rewrite is required.
- Rewriting history but not rotating the leaked credential, when the secret already left the building and must be treated as compromised regardless.
- Force-pushing rewritten history without warning the team, leaving everyone's clones diverged and their next pull reintroducing the purged content.
- Skipping the git reflog expire plus git gc --prune=now step and then wondering why the repository size never dropped.
- Running a rewrite on a main shared by CI and release tooling without coordinating, breaking pipelines that pinned the now-changed old SHAs.
- Treat any leaked secret as compromised and rotate it immediately, regardless of how cleanly the history rewrite goes.
- Prefer git filter-repo --invert-paths --path secrets.env over the deprecated git filter-branch for removing a path.
- Strip oversized files from all history with git filter-repo --strip-blobs-bigger-than 10M rather than committing a deletion.
- Run the rewrite on a fresh clone first and verify the result before touching the canonical repository.
- Finish with git reflog expire --expire=now --all && git gc --prune=now to reclaim space, then force-push and tell the team to re-clone.
Mercurial: hg convert and filter extensions for similar bulk rewrites
Subversion: svndumpfilter over a dump/reimport, treating history as otherwise immutable
Perforce: no equivalent; history is immutable
Fossil: no equivalent; history is immutable by design
Knowledge Check
Why does deleting a file in a new commit fail to remove its data from history?
- The old blob still exists in earlier commits and remains reachable and clonable; only rewriting those commits removes it
- Git keeps a hidden protected copy of every deleted file forever by design, so a deletion commit can never reach it
- The deletion is staged but only takes its real effect after the next scheduled garbage collection finally sweeps the old blob away
- A normal commit can only add or modify files in the tree, never record a deletion, so the file simply stays
Why does every descendant commit's SHA change after a history rewrite?
- A commit's hash depends on its parents' hashes, so changing an ancestor cascades new hashes through all of its descendants
- Git randomly reassigns a fresh SHA to every commit each time filter-repo runs, regardless of whether the content changed
- Only the single rewritten commit gets a new SHA; its descendants keep their original hashes since their content is untouched
- The remote server assigns brand-new SHAs to the whole branch during the force-push step
When is BFG the right tool instead of filter-repo?
- When the job is only stripping oversized blobs or replacing secret text, which is BFG's narrow, fast specialty
- When you need to rewrite every author and committer email address consistently across the whole repository history
- When you must keep or remove specific paths by glob pattern, reshaping the directory layout across all commits
- Always, since filter-repo has been deprecated and BFG is now the officially recommended replacement
Why must you still rotate a leaked credential after a clean history rewrite?
- The secret already left the repository and must be treated as compromised; the rewrite removes the blob but cannot un-leak what was exposed
- The rewrite leaves the secret sitting in the local reflog permanently, where a reflog expire cannot reach it
- filter-repo keeps a readable backup copy of every removed secret under refs/original that anyone who clones the repository can still recover
- Rotation becomes optional once the rewrite plus gc complete, because the credential is then fully scrubbed
What does the git reflog expire plus git gc --prune=now step accomplish?
- It drops the reflog references to the purged objects and deletes them from disk, so the repository actually shrinks
- It uploads the rewritten history to the remote and updates the branch pointer there in a single step
- It restores the original pre-rewrite commits from the reflog in case the rewrite turned out to be wrong
- It automatically re-clones a fresh copy of the repository for every collaborator on the team so that their old objects get purged too