Topic 01

What Version Control Is

Concept

Version control is a system that records every change to a set of files over time, so you can recover any past state, see who changed what and why, and let several people work on the same code without overwriting each other. Without it, your history is a folder of report_final_v2_REAL_fixed.zip files and a Slack thread nobody can reconstruct.

That recovery-and-collaboration guarantee is the whole point. The tool stores not just the latest bytes but the full sequence of states that produced them, each labelled with an author, a time, and — if you do it right — a reason. Git is the version control system that won; the rest of this course is about using it deliberately rather than by superstition.

The Cost of No Version Control

Picture the alternative honestly. Backups are manual, so they are skipped under deadline. Two people edit the same file and reconcile it by emailing copies back and forth. A change breaks production and nobody can answer the only question that matters — what changed, and when — because the previous version was overwritten in place. Every one of these is a failure of memory, and version control exists to give a project a memory that does not depend on anyone's discipline.

The subtler cost is the one you feel six months later. A line of code looks wrong, you are about to delete it, and there is no record of why it was added. With version control that line carries its history: the commit that introduced it, the message explaining the bug it fixed, the change that shipped alongside it. Delete it now and you may reintroduce the exact bug it was written to prevent.

Centralized vs Distributed Models

Two designs dominate. A centralized system — Subversion, Perforce — keeps the single authoritative history on a server. To commit, view history, or branch, you talk to that server. A distributed system — Git, Mercurial — gives every clone the entire history. You commit, branch, diff, and read the log offline, then synchronize with others when you choose.

The practical consequences are large. In Git, committing on a plane is normal because the repository is on your disk, not behind a network call. Branching is nearly free because a branch is a pointer, not a server-side copy. The cost of the distributed model is the flip side of that benefit: every clone holds the full history, which is exactly why a repository stuffed with large binaries becomes painful. Everyone pays for every version forever.
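The two claims above, that a clone carries the whole history and that a branch is only a pointer, can be sketched with a toy model. This is an illustration under stated assumptions, not Git's real storage format: the ToyRepo class, its method names, and the commit ids c1 to c3 are all hypothetical.

```python
import copy


class ToyRepo:
    """Hypothetical toy model of a distributed repository (not Git's real format)."""

    def __init__(self):
        self.commits = {}               # commit id -> parent commit id (None for the first)
        self.branches = {"main": None}  # branch name -> commit id it points at
        self.current = "main"

    def commit(self, commit_id):
        # Purely local: record the commit, then advance the current branch pointer.
        self.commits[commit_id] = self.branches[self.current]
        self.branches[self.current] = commit_id

    def branch(self, name):
        # Creating a branch adds one pointer; no commit is copied.
        self.branches[name] = self.branches[self.current]

    def switch(self, name):
        self.current = name

    def clone(self):
        # A clone receives the entire history, not just the latest state.
        return copy.deepcopy(self)


origin = ToyRepo()
origin.commit("c1")
origin.commit("c2")

laptop = origin.clone()       # the full history now lives on the laptop
laptop.branch("experiment")   # one new pointer, nothing else
laptop.switch("experiment")
laptop.commit("c3")           # no network involved

print(len(laptop.commits))    # 3
print(laptop.branches)        # {'main': 'c2', 'experiment': 'c3'}
print(len(origin.commits))    # 2: origin is unchanged until the two sides synchronize
```

In this sketch the laptop clone keeps working with no reference to origin at all, which is the property that makes offline commits ordinary. Synchronizing the two histories afterwards is deliberately left out.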

Centralized vs Distributed: When to Choose Each

Centralized (SVN, Perforce) — one server holds history; commits and history browsing need the network, and the server can lock individual files. Choose it when you must lock un-mergeable binary assets or enforce a single point of control.

Distributed (Git, Mercurial) — every clone has the full history and works offline; branching and merging are cheap. Choose it for almost all source code, and accept that multi-gigabyte binary histories are its weak spot.

What a Commit Records

A commit is not a diff in a mailbox. It records a complete snapshot of the tracked files at one moment, plus metadata: who made it, when, which commit it followed, and a message. The snapshot is what lets you restore any past state exactly; the parent link is what turns a pile of commits into a navigable history; the message is the part future-you actually reads.
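As a rough illustration of those parts, a commit can be modeled as a small record. The sketch below is a simplification and not how Git lays out objects on disk: the ToyCommit class, the history helper, the author name, the timestamps, and the file contents are all hypothetical, and the short hash is only a stand-in for a real commit id.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class ToyCommit:
    snapshot: Dict[str, str]        # path -> complete file content at this moment
    author: str                     # who made it
    timestamp: str                  # when
    parent: Optional["ToyCommit"]   # the commit it followed; None for the first one
    message: str                    # why the change was made

    def commit_id(self) -> str:
        # Derive an id from everything the commit records, including its parent's id.
        parent_id = self.parent.commit_id() if self.parent else ""
        payload = repr((sorted(self.snapshot.items()), self.author,
                        self.timestamp, parent_id, self.message))
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:10]


def history(tip: Optional[ToyCommit]) -> Iterator[ToyCommit]:
    """Follow parent links from the newest commit back to the first."""
    while tip is not None:
        yield tip
        tip = tip.parent


first = ToyCommit({"app.py": "print('hi')\n"},
                  "Ada", "2024-01-01T09:00", None,
                  "Add greeting script")
second = ToyCommit({"app.py": "print('hello')\n", "README": "Demo\n"},
                   "Ada", "2024-01-02T09:00", first,
                   "Use a friendlier greeting and add a README")

# The parent links turn two separate records into a navigable history.
for c in history(second):
    print(c.commit_id(), c.author, c.timestamp, c.message)

# Restoring a past state is just reading that commit's snapshot.
print(first.snapshot["app.py"], end="")
```

Because each snapshot is complete, restoring the first state needs nothing from the second commit. Because the id covers the parent's id, changing any earlier commit would change every id after it.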

That message is where most of the durable value lives, and it is the part most often wasted. "fix" tells you nothing in a month. A message that explains why the change was made — the bug, the constraint, the decision — turns the history into documentation that can never drift from the code, because it ships with it.

Git as the De Facto Standard

Git is not the only version control system, and it is not the friendliest. It won anyway. The overwhelming majority of new projects use it, which means the tooling, the hosting, the integrations, and the hiring pool all assume it. Learning Git well is no longer optional knowledge for a working engineer; it is the substrate everything else sits on.

That dominance is the practical reason to invest in the mental model rather than memorizing commands. The same Git concepts power GitHub, GitLab, your CI system, and your editor's source-control panel. Understand how Git stores history once, and every tool built on top of it stops being magic.

Common Mistakes

Treating version control as a backup tool and never writing real commit messages. Six months later no commit explains why a line exists, and the team has thrown away the one benefit that does not degrade over time.
Committing generated artifacts and dependencies like node_modules/ or build output. The repository can bloat to gigabytes, and a fresh clone can take minutes instead of seconds.
Storing large binaries — video, datasets, game assets — directly in Git, where every version is kept forever in every clone and the history cannot be shrunk without a disruptive rewrite.
Assuming a hosting service like GitHub is the version control. The service is one remote copy; the version control is local and keeps working when the service is down.
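The large-binary mistake above comes down to arithmetic. The sketch below assumes the worst case, in which every committed version of a binary is stored in full. That is a reasonable approximation for already-compressed media, which tends to delta-compress poorly. The function names and the figures (a 50 MB file, 40 versions, 10 clones) are hypothetical.

```python
def history_size_mb(file_size_mb, versions):
    """Worst case: each committed version of the binary is stored in full."""
    return file_size_mb * versions


def team_total_mb(file_size_mb, versions, clones):
    """Every clone carries the whole history, so the cost repeats per clone."""
    return history_size_mb(file_size_mb, versions) * clones


# One 50 MB asset revised 40 times, on a team with 10 clones.
print(history_size_mb(50, 40))     # 2000  (MB in each clone)
print(team_total_mb(50, 40, 10))   # 20000 (MB across the team)
```

Deleting the file in a later commit does not reduce either number, because the earlier versions remain part of the history.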

Best Practices

Run git init on day one, before the first line of real code, so the history starts clean rather than with one giant "initial dump" commit.
Commit in logical units, each with a message that explains why the change was made, not what the diff already shows.
Keep one concern per repository; resist the urge to dump unrelated projects into a single repo to save setup time.
Route binaries larger than a few megabytes to Git LFS or a separate artifact store instead of committing them straight into history.

Comparable tools

Mercurial: distributed, like Git
Subversion: centralized
Perforce: centralized, strong on large binaries
Fossil: distributed, bundles issues and wiki

Knowledge Check

Why can you commit to a Git repository while offline on a plane, but not to a Subversion one?

Git is distributed, so the full history lives in your local clone; SVN is centralized and a commit must reach the server
Git compresses each commit and holds it temporarily in memory so that it fits until you reconnect back to the central server
SVN can commit offline into a local cache on disk too; it just syncs each of those commits up to the server more slowly afterward
Git only queues the commit in a buffer while you are offline and really records it for good once a network connection returns

What is actually lost when a team skips writing meaningful commit messages?

The durable record of why each change was made — the one benefit of version control the code itself cannot reconstruct
The ability to roll back to an earlier state, since checking out any past commit depends entirely on parsing its message text
The snapshots of file content themselves, which Git reconstructs on demand from the exact wording of each commit message
Nothing of any real consequence; commit messages are purely decorative text that carries no later value to anyone

Why does committing node_modules/ hurt a repository more than it helps?

Every version of thousands of dependency files is kept forever in every clone, bloating the repo and slowing clones
Git flatly refuses to diff deeply nested directories, so the folder permanently breaks git status for the whole repository
A dependency folder cannot be committed into the repository at all unless you first enable Git LFS to track every single file
It exposes the project's full dependency list to anyone holding a clone, a list that would otherwise have stayed private

A teammate says "GitHub is our version control." What is the precise correction?

Git is the version control and runs locally; GitHub is one remote host for it, and work continues if GitHub is down
They are genuinely the very same product, simply marketed under two interchangeable names by one and the same company
GitHub is the actual version control engine; Git is merely its bundled command-line client used for talking to that engine
Neither one is the real version control; the .git folder sitting on your disk is the actual system that does the work
