History & Why Git Won
On April 3, 2005, Linus Torvalds wrote the first lines of a new version control system. Four days later, on April 7, Git was self-hosting. Within three months, it managed the entire Linux kernel. The speed of that development was not accidental: it was born from frustration and necessity.
The Linux kernel had been using BitKeeper, a proprietary distributed version control system, under a special free-of-charge license since 2002. When Andrew Tridgell reverse-engineered parts of the BitKeeper protocol in early 2005, BitMover (the company behind BitKeeper) revoked the free license. Torvalds needed a replacement immediately, and none of the existing open-source tools met his requirements: speed, distributed operation, strong integrity guarantees, and the ability to handle a project the size of the Linux kernel (then around 6.5 million lines of code across 20,000 files).
Think of version control like a library's card catalog system. In the early days of centralized systems like CVS (1990) and Subversion (2000), there was one master catalog in a single building. Every librarian had to phone the central office to check out or return a book. If the phone line went down, nobody could work. Git changed this by giving every librarian a complete copy of the entire catalog. Each librarian can work independently, reorganize the local catalog at will, and synchronize with others when convenient.
The Timeline of Version Control
| Year | System | Type | Key Innovation |
|---|---|---|---|
| 1972 | SCCS (Source Code Control System) | Local | First VCS; stored deltas of individual files |
| 1982 | RCS (Revision Control System) | Local | Reverse deltas for faster access to latest version |
| 1990 | CVS (Concurrent Versions System) | Centralized | Network access; concurrent editing with merge |
| 2000 | Subversion (SVN) | Centralized | Atomic commits; directory versioning |
| 2000 | BitKeeper | Distributed | First widely-used DVCS; inspired Git's model |
| 2005 | Git | Distributed | Content-addressable storage; SHA-1 integrity; extreme speed |
| 2005 | Mercurial | Distributed | Simpler CLI; revlog storage format |
Why Distributed Won
The fundamental difference between centralized and distributed version control is where the history lives. In SVN, the server holds the complete history and each developer has only a working copy. In Git, every clone is a full repository with complete history. This has profound consequences:
- Speed: Nearly every operation is local. A git log on the Linux kernel takes milliseconds; the equivalent SVN command requires a network round trip.
- Offline work: You can commit, branch, merge, and view history on an airplane. You synchronize when you reconnect.
- Redundancy: Every clone is a backup. If the central server burns down, any developer's machine holds the entire project history.
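- Branching cost: Creating a branch in Git writes a 41-byte file (a 40-character hash plus a newline). In SVN, a branch is a server-side copy of a directory: cheap to store, but creating or switching to one requires a network round trip.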
The 2022 Stack Overflow Developer Survey reported Git use at roughly 94% of respondents. GitHub hosts over 200 million repositories. GitLab, Bitbucket, and Azure DevOps all built their platforms on Git. The war is over. Git won.
The Git Object Model
Understanding Git's internal object model is the single most important thing you can learn to move from "memorizing commands" to "understanding what is actually happening." Every file, every directory, every commit in Git is stored as one of four object types in a content-addressable database.
Think of Git's object store like a post office with numbered mailboxes. When you hand the post office a package, they weigh it, measure it, generate a unique ID based on its contents, and place it in the corresponding mailbox. If someone else brings an identical package, it gets the same ID — and the post office realizes it already has one, so it does not store a duplicate. This is content-addressable storage: the address is derived from the content itself.
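The deduplication described above can be observed directly. The following is a minimal sketch, assuming a scratch directory and three hypothetical file names (a.txt, b.txt, c.txt). Here git hash-object only computes the ID and stores nothing:

```shell
# Two files with identical contents get the same object ID,
# because the ID depends only on the contents, not on the file name.
workdir=$(mktemp -d)
printf 'same contents\n' > "$workdir/a.txt"
printf 'same contents\n' > "$workdir/b.txt"
printf 'different contents\n' > "$workdir/c.txt"

id_a=$(git hash-object "$workdir/a.txt")
id_b=$(git hash-object "$workdir/b.txt")
id_c=$(git hash-object "$workdir/c.txt")

echo "a.txt: $id_a"
echo "b.txt: $id_b"   # same ID as a.txt
echo "c.txt: $id_c"   # different contents, different ID
```

Adding -w to git hash-object inside a repository would also write the blob to the object store; identical content would still occupy a single object.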
The Four Object Types
| Object Type | What It Stores | Analogy | Created By |
|---|---|---|---|
| Blob | File contents (no filename, no metadata) | A page of text with no title | git add |
| Tree | Directory listing: filenames, permissions, pointers to blobs/trees | A folder's table of contents | git commit |
| Commit | Pointer to root tree, parent commit(s), author, committer, message | A snapshot label with a "previous snapshot" link | git commit |
| Tag | Pointer to a commit, tagger info, annotation message, optional GPG signature | A named bookmark with a sticky note | git tag -a |
Examining Objects Directly
You can inspect Git's internal objects using low-level "plumbing" commands:
# See what type an object is
git cat-file -t HEAD
# Output: commit
# See the content of the HEAD commit
git cat-file -p HEAD
# Output:
# tree 4b825dc642cb6eb9a060e54bf899d69f4ef8c39e
# parent 8a2fb3c1d5e7a4b9c0e1f2a3b4c5d6e7f8a9b0c1
# author Wasil Zafar <wasil@example.com> 1711929600 +0000
# committer Wasil Zafar <wasil@example.com> 1711929600 +0000
#
# Add user authentication module
# See the root tree of a commit
git cat-file -p HEAD^{tree}
# Output:
# 100644 blob a1b2c3d4... .gitignore
# 100644 blob e5f6a7b8... README.md
# 040000 tree 9c0d1e2f... src
# Inspect a specific blob
git cat-file -p a1b2c3d4
# Output: (contents of .gitignore)
How Commits Form a DAG
Every commit points to its parent commit (or parents, in the case of a merge). This forms a Directed Acyclic Graph (DAG) — a chain of snapshots where each one knows where it came from, but the chain never loops back on itself.
| Commit | Parent(s) | Description |
|---|---|---|
| a1b2c3d | (none; initial commit) | Initial project setup |
| d4e5f6a | a1b2c3d | Add login page |
| b7c8d9e | a1b2c3d | Add API endpoint (branched) |
| f0a1b2c | d4e5f6a, b7c8d9e | Merge: combine login + API |
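A history with the same shape as the table can be rebuilt in a scratch repository to see the parent links. This is a sketch under assumptions: the identity, file names, and branch name are placeholders, and the hashes Git generates will differ from the abbreviated ones in the table.

```shell
# Build a four-commit history: root, two children of the root, and a merge.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

echo "setup" > README.md
git add README.md && git commit -q -m "Initial project setup"

echo "login" > login.html
git add login.html && git commit -q -m "Add login page"

# Branch from the initial commit so the two lines of work diverge.
git checkout -q -b feature/api HEAD~1
echo "api" > api.js
git add api.js && git commit -q -m "Add API endpoint (branched)"

git checkout -q main
git merge -q -m "Merge: combine login + API" feature/api

# Each output line is a commit hash followed by the hashes of its parents.
git rev-list --parents HEAD

# The merge line has three fields (commit plus two parents).
merge_fields=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
echo "fields on the merge line: $merge_fields"
git log --oneline --graph
```

The initial commit appears alone on its line because it has no parent, which is what keeps the graph acyclic: links only ever point backward in time.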
SHA-1 and Content Integrity
Every object in Git is identified by the SHA-1 hash of its contents (prefixed with the object type and size). This means that if even a single byte of any file, commit message, or author name changes, the hash changes. And because every commit includes its parent's hash, changing any historical commit would cascade hash changes through every subsequent commit. This makes Git's history tamper-evident by design.
# Compute the hash Git would assign to a string
# (plain echo appends a newline, which is part of the hashed content)
echo "hello" | git hash-object --stdin
# Output: ce013625030ba8dba906f756967f9e9ca394464a
# Verify repository integrity
git fsck --full
# Checks every object's hash against its contents
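To make the "prefixed with the object type and size" detail concrete, the blob hash can be reproduced without Git. This is a sketch, assuming sha1sum from coreutils is available (on macOS, shasum plays the same role) and that the repository uses SHA-1 object IDs:

```shell
# Git hashes a header ("blob <size>" followed by a NUL byte) and then the content.
content='hello'

# Size in bytes of the content plus its trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash header + content by hand.
manual_hash=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)
echo "$manual_hash"

# For comparison, run: printf 'hello\n' | git hash-object --stdin
```

Because the size and type are part of the hashed bytes, a blob and a tree can never collide merely by having the same raw payload.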
Refs: Human-Readable Pointers
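A "ref" is a name that resolves to an object hash. Most refs are small files under .git/refs/ containing a 40-character SHA-1 hash (Git may also consolidate them into .git/packed-refs, in which case the individual file is absent and git rev-parse is the reliable way to resolve the name). Branches and tags are refs of this kind; HEAD is usually a symbolic ref that names the current branch instead of holding a hash directly: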
# See where HEAD points
cat .git/HEAD
# Output: ref: refs/heads/main
# See where the 'main' branch points
cat .git/refs/heads/main
# Output: f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9
# List all refs
git show-ref
# Shows every branch, tag, and remote-tracking ref with its hash
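The claim that a branch is only a pointer can be checked in a scratch repository. This is a minimal sketch, assuming SHA-1 object IDs and the default file-based ref storage; the identity and the branch name feature/demo are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo "hello" > file.txt
git add file.txt && git commit -q -m "First commit"

# HEAD is a symbolic ref: it names a branch rather than a commit.
git symbolic-ref HEAD

# A freshly written branch ref is a loose file: 40 hex digits plus a newline.
ref_size=$(wc -c < .git/refs/heads/main | tr -d ' ')
echo "main ref is $ref_size bytes"

# Creating a branch writes another ref that points at the same commit.
git branch feature/demo
main_id=$(git rev-parse main)
demo_id=$(git rev-parse feature/demo)
echo "main:         $main_id"
echo "feature/demo: $demo_id"
```

No file contents are copied when the branch is created, which is why branching cost does not grow with repository size.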
Essential Commands
This section covers the commands you will use daily. Rather than listing every flag, we focus on the commands and options that matter most in professional workflows.
Repository Initialization
# Create a new repository
git init my-project
cd my-project
# Clone an existing repository
git clone https://github.com/user/repo.git
# Clone with a specific branch and shallow history (faster)
git clone --branch develop --depth 1 https://github.com/user/repo.git
The Three Areas: Working Tree, Staging Area, Repository
Git has three areas where file changes live. Understanding these is essential:
- Working Tree: The actual files on your disk. This is what you edit.
- Staging Area (Index): A preview of your next commit. You explicitly choose what goes here with git add.
- Repository (.git): The permanent history. Contents arrive here via git commit.
# Check the status of all three areas
git status
# Add a specific file to the staging area
git add src/auth.js
# Add all changes in a directory
git add src/
# Stage parts of a file interactively (choose hunks)
git add -p src/auth.js
# Remove a file from staging (unstage) without deleting it
git restore --staged src/auth.js
# Commit with a message
git commit -m "Add JWT authentication to login endpoint"
# Commit with a multi-line message (opens editor)
git commit
Viewing History
# Standard log
git log
# Compact one-line format with graph
git log --oneline --graph --all
# Show commits by a specific author in the last 2 weeks
git log --author="Wasil" --since="2 weeks ago"
# Show commits that changed a specific file
git log --follow -- src/auth.js
# Show the actual diff for each commit
git log -p -3 # last 3 commits with diffs
# Search commit messages for a keyword
git log --grep="authentication"
# Search for when a string was added or removed (pickaxe)
git log -S "validateToken" --oneline
Comparing Changes
# Diff between working tree and staging area
git diff
# Diff between staging area and last commit
git diff --staged
# Diff between two branches
git diff main..feature/auth
# Diff with statistics only
git diff --stat main..feature/auth
# Show word-level diff (useful for prose)
git diff --word-diff
Stashing Work
Stash lets you save uncommitted changes temporarily without creating a commit. It is invaluable when you need to switch contexts quickly.
# Stash all uncommitted changes
git stash
# Stash with a descriptive message
git stash push -m "WIP: refactoring auth middleware"
# Stash including untracked files
git stash push -u -m "WIP: new test files"
# List all stashes
git stash list
# Output:
# stash@{0}: On feature/auth: WIP: refactoring auth middleware
# stash@{1}: On main: WIP: new test files
# Apply the most recent stash (keep it in the list)
git stash apply
# Apply and remove from the list
git stash pop
# Apply a specific stash
git stash apply stash@{1}
# View the diff of a stash
git stash show -p stash@{0}
Tip: Use git add -p (patch mode) religiously. It lets you stage individual hunks within a file, which means a single file with two unrelated changes can produce two clean, focused commits. Professional developers use this to keep commits atomic: each commit does exactly one thing.
Branching Strategies
A branching strategy defines how a team organizes parallel lines of development, integrates completed work, and delivers releases. Choosing the right strategy depends on your team size, release cadence, and deployment model.
Git Flow
Introduced by Vincent Driessen in 2010, Git Flow uses long-lived branches to separate concerns. It was designed for software with scheduled releases (e.g., desktop applications, mobile apps with app store review cycles).
- main: Always reflects production. Every commit is a release.
- develop: Integration branch. Features merge here first.
- feature/*: Short-lived branches for individual features, branched from develop.
- release/*: Created from develop when preparing a release. Bug fixes go here, then merge to both main and develop.
- hotfix/*: Emergency fixes branched from main, merged back to both main and develop.
# Start a new feature
git checkout develop
git checkout -b feature/user-dashboard
# Work on the feature...
git add .
git commit -m "Add dashboard layout component"
# Finish the feature
git checkout develop
git merge --no-ff feature/user-dashboard
git branch -d feature/user-dashboard
# Start a release
git checkout develop
git checkout -b release/2.1.0
# Fix bugs on the release branch, then finalize
git checkout main
git merge --no-ff release/2.1.0
git tag -a v2.1.0 -m "Release 2.1.0"
git checkout develop
git merge --no-ff release/2.1.0
git branch -d release/2.1.0
GitHub Flow
GitHub Flow is simpler: one long-lived branch (main) and short-lived feature branches. It was designed for web applications with continuous deployment.
- main: Always deployable. Protected by CI checks.
- Feature branches: Created from main, merged back via pull request after review and CI passes.
# Create a feature branch from main
git checkout main
git pull origin main
git checkout -b add-search-filter
# Work, commit, push
git add .
git commit -m "Add full-text search with Elasticsearch integration"
git push -u origin add-search-filter
# Open a pull request on GitHub, get review, merge via UI
# After merge, clean up locally
git checkout main
git pull origin main
git branch -d add-search-filter
Trunk-Based Development
In trunk-based development, all developers commit directly to a single branch (trunk/main), or use extremely short-lived feature branches (less than one day). This requires strong CI, feature flags, and disciplined small commits.
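A trunk-based change might look like the following sketch. It assumes a scratch repository, a hypothetical script checkout.sh, and a hypothetical environment variable NEW_CHECKOUT acting as the feature flag. A real team would typically also pull and rebase against the shared trunk before merging.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch "main"
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo 'echo "old checkout flow"' > checkout.sh
git add checkout.sh && git commit -q -m "Initial checkout flow"

# Short-lived branch: one small change, hidden behind a flag that defaults to off.
git checkout -q -b new-checkout
cat > checkout.sh <<'EOF'
if [ "${NEW_CHECKOUT:-off}" = "on" ]; then
    echo "new checkout flow"
else
    echo "old checkout flow"
fi
EOF
git commit -q -am "Add new checkout flow behind NEW_CHECKOUT flag"

# Integrate the same day; --ff-only refuses anything but a linear update.
git checkout -q main
git merge -q --ff-only new-checkout
git branch -q -d new-checkout

sh checkout.sh                    # flag off: existing behavior is unchanged
NEW_CHECKOUT=on sh checkout.sh    # flag on: new code path runs
```

The flag lets unfinished work reach main without being visible to users, which is what makes very short branch lifetimes practical.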
Comparison Table
| Aspect | Git Flow | GitHub Flow | Trunk-Based |
|---|---|---|---|
| Long-lived branches | main + develop | main only | main only |
| Feature branch lifespan | Days to weeks | Hours to days | Hours (or none) |
| Release process | Release branches | Deploy from main | Deploy from main |
| Best for | Scheduled releases, multiple versions | Web apps, continuous deployment | High-velocity teams, strong CI |
| Complexity | High | Low | Low (process), High (discipline) |
| Merge conflicts | Frequent (long branches) | Moderate | Rare (small, frequent merges) |
Merging vs Rebasing
This is the most debated topic in Git workflows. Both merge and rebase integrate changes from one branch into another, but they produce fundamentally different histories.
Merge: Preserve History As It Happened
A merge creates a new "merge commit" with two parents, preserving the fact that development happened in parallel.
# Merge feature branch into main
git checkout main
git merge feature/auth
# Force a merge commit even if fast-forward is possible
git merge --no-ff feature/auth
The resulting history shows the branch and the merge point:
* f0a1b2c (HEAD -> main) Merge branch 'feature/auth'
|\
| * d4e5f6a Add token refresh logic
| * b7c8d9e Add JWT validation middleware
|/
* a1b2c3d Previous commit on main
Rebase: Rewrite History to Be Linear
A rebase takes the commits from your branch and replays them on top of the target branch, creating new commits with new hashes but the same changes.
# Rebase feature branch onto latest main
git checkout feature/auth
git rebase main
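# The history becomes linear (assuming main gained an illustrative commit c3d4e5f after the branch point):
# * d4e5f6a' Add token refresh logic
# * b7c8d9e' Add JWT validation middleware
# * c3d4e5f New commit on main
# * a1b2c3d Previous commit on main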
Interactive Rebase: Sculpt Your History
Interactive rebase is the most powerful history-editing tool in Git. It lets you reorder, squash, edit, or drop commits before sharing them.
# Interactively rebase the last 4 commits
git rebase -i HEAD~4
# In the editor, you'll see:
# pick a1b2c3d Add login form
# pick d4e5f6a Fix typo in login form
# pick b7c8d9e Add password validation
# pick f0a1b2c Fix password regex
# Change to:
# pick a1b2c3d Add login form
# fixup d4e5f6a Fix typo in login form
# pick b7c8d9e Add password validation
# fixup f0a1b2c Fix password regex
# Result: 2 clean commits instead of 4
When to Use Each
| Situation | Recommendation | Reason |
|---|---|---|
| Integrating a shared branch (develop) into main | Merge | Preserves the historical record of parallel development |
| Updating your feature branch with latest main | Rebase | Keeps your branch clean and up-to-date without merge bubbles |
| Cleaning up messy WIP commits before PR | Interactive rebase | Produces a clean, reviewable commit history |
| Branch already pushed and shared with others | Merge | Rebase rewrites history, breaking other people's work |
Squash Merging
A squash merge takes all the commits from a feature branch and combines them into a single commit on the target branch. GitHub, GitLab, and Bitbucket all offer this as a merge option on pull requests.
# Squash merge: all feature/auth commits become one commit on main
git checkout main
git merge --squash feature/auth
git commit -m "Add JWT authentication with token refresh (#42)"
Conflict Resolution
Conflicts occur when two branches modify the same part of the same file in incompatible ways. Git is remarkably good at auto-merging — it handles the vast majority of cases silently. But when it cannot determine the correct result, it asks you to decide.
How Three-Way Merge Works
Git does not simply compare the two conflicting versions. It uses a three-way merge: it finds the common ancestor (the point where the branches diverged) and compares both sides against it. If only one side changed a particular section, Git takes that change automatically. Conflicts only arise when both sides changed the same section differently.
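The three-way logic can be tried on plain files with git merge-file, which takes the current version, the common ancestor, and the other version. This is a sketch with hypothetical file names; the command works outside a repository.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Common ancestor, plus one edited version per side.
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > base.txt
printf 'line 1 (ours)\nline 2\nline 3\nline 4\nline 5\n' > ours.txt
printf 'line 1\nline 2\nline 3\nline 4\nline 5 (theirs)\n' > theirs.txt

# Each side changed a different region, so both edits are taken automatically.
clean_status=0
git merge-file -p ours.txt base.txt theirs.txt > merged.txt || clean_status=$?
cat merged.txt

# Now both sides change line 3 differently: Git cannot pick a winner.
printf 'line 1\nline 2\nline 3 (ours)\nline 4\nline 5\n' > ours2.txt
printf 'line 1\nline 2\nline 3 (theirs)\nline 4\nline 5\n' > theirs2.txt
conflict_status=0
git merge-file -p ours2.txt base.txt theirs2.txt > conflicted.txt || conflict_status=$?
cat conflicted.txt

# The exit status of git merge-file is the number of conflicts (0 means clean).
echo "clean merge status: $clean_status, conflicted merge status: $conflict_status"
```

Without base.txt, Git could only see that the two versions differ; the ancestor is what tells it which side made each change.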
Anatomy of a Conflict Marker
function getGreeting(user) {
<<<<<<< HEAD
return `Welcome back, ${user.displayName}!`;
=======
return `Hello, ${user.firstName} ${user.lastName}!`;
>>>>>>> feature/user-profile
}
- <<<<<<< HEAD marks the start of your current branch's version
- ======= separates the two versions
- >>>>>>> feature/user-profile marks the end of the incoming branch's version
Resolving Conflicts Step by Step
# Step 1: Attempt the merge
git merge feature/user-profile
# Auto-merging src/greeting.js
# CONFLICT (content): Merge conflict in src/greeting.js
# Automatic merge failed; fix conflicts and then commit the result.
# Step 2: See which files have conflicts
git status
# both modified: src/greeting.js
# Step 3: Open the file, resolve the conflict manually
# Choose one version, combine them, or write something new.
# Remove all conflict markers (<<<, ===, >>>)
# Step 4: Mark as resolved by staging
git add src/greeting.js
# Step 5: Complete the merge
git commit
# Git will auto-populate the merge commit message
Using Merge Tools
# Configure a merge tool (VS Code example)
git config --global merge.tool vscode
git config --global mergetool.vscode.cmd 'code --wait --merge $REMOTE $LOCAL $BASE $MERGED'
# Launch the merge tool during a conflict
git mergetool
# Other popular merge tools
# - vimdiff (terminal)
# - meld (Linux GUI)
# - kdiff3 (cross-platform)
# - Beyond Compare (commercial)
Common Conflict Patterns and Solutions
| Pattern | Cause | Resolution Strategy |
|---|---|---|
| Both sides added code at same location | Two features touch adjacent areas | Usually keep both additions in the correct order |
| One side renamed, other side modified | Refactoring + feature work in parallel | Apply the modification to the renamed file |
| Auto-generated file conflicts (package-lock.json) | Both sides added different dependencies | Accept either version, then run npm install to regenerate |
| Conflicting formatting changes | One branch ran a formatter, the other did not | Accept the formatted version; prevent with pre-commit hooks |
Git Hooks & Automation
Git hooks are scripts that run automatically at specific points in the Git workflow. They live in .git/hooks/ and can enforce coding standards, run tests, validate commit messages, or trigger deployments.
Client-Side Hooks
| Hook | When It Runs | Common Use |
|---|---|---|
| pre-commit | Before commit is created | Lint code, run formatters, check for secrets |
| prepare-commit-msg | After default message, before editor opens | Prepend branch name or ticket number |
| commit-msg | After message is entered | Validate commit message format (Conventional Commits) |
| pre-push | Before push to remote | Run full test suite, check branch naming |
| post-commit | After commit is created | Notifications, update dashboards |
Writing a pre-commit Hook
#!/bin/sh
# .git/hooks/pre-commit
# Prevent committing to main directly
# (-q keeps symbolic-ref silent when HEAD is detached)
branch=$(git symbolic-ref --short -q HEAD)
if [ "$branch" = "main" ]; then
    echo "ERROR: Direct commits to main are not allowed."
    echo "Create a feature branch: git checkout -b feature/your-feature"
    exit 1
fi
# Run ESLint on staged JavaScript files
staged_js=$(git diff --cached --name-only --diff-filter=ACM -- '*.js')
if [ -n "$staged_js" ]; then
    echo "Running ESLint on staged files..."
    # -z with xargs -0 keeps file names containing spaces intact
    if ! git diff --cached --name-only -z --diff-filter=ACM -- '*.js' | xargs -0 npx eslint; then
        echo "ESLint failed. Fix the errors before committing."
        exit 1
    fi
fi
# Check added lines for secrets (API keys, passwords)
# POSIX character classes are used because \s and \x27 are not portable in grep -E
if git diff --cached --diff-filter=ACM | grep -qiE "^\+.*(api_key|secret|password|token)[[:space:]]*=[[:space:]]*[\"'][^[:space:]]+"; then
    echo "ERROR: Possible secret detected in staged changes!"
    echo "Review your changes: git diff --cached"
    exit 1
fi
exit 0
Commit Message Validation
#!/bin/sh
# .git/hooks/commit-msg
# Enforce Conventional Commits format
# Git passes the path of the file holding the proposed message as $1
commit_msg_file=$1
# Pattern: type(scope): description
pattern="^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\(.+\))?: .{1,72}$"
# Only the subject line is validated
first_line=$(head -n 1 "$commit_msg_file")
if ! printf '%s\n' "$first_line" | grep -qE "$pattern"; then
    echo "ERROR: Commit message does not follow Conventional Commits format."
    echo ""
    echo "Expected: type(scope): description"
    echo "Examples:"
    echo " feat(auth): add JWT token refresh"
    echo " fix(api): handle null response in user endpoint"
    echo " docs: update README with setup instructions"
    echo ""
    echo "Your message: $first_line"
    exit 1
fi
exit 0
Husky: Managing Hooks in Teams
Hooks in .git/hooks/ are not committed to the repository (the .git directory is not tracked). Husky solves this by storing hooks in the project directory and installing them via npm.
# Install Husky
npm install --save-dev husky
# Initialize Husky (creates .husky/ directory)
npx husky init
# Add a pre-commit hook
echo "npx lint-staged" > .husky/pre-commit
# Add a commit-msg hook with commitlint
echo "npx --no -- commitlint --edit \$1" > .husky/commit-msg
Combined with lint-staged, this runs linters only on the files that are actually being committed:
// package.json
{
"lint-staged": {
"*.{js,ts}": ["eslint --fix", "prettier --write"],
"*.css": ["stylelint --fix"],
"*.md": ["prettier --write"]
}
}
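For projects that do not use npm, the same problem (hooks living outside version control) can be approached with Git's own core.hooksPath setting, available since Git 2.9. This is a sketch of that alternative, not of Husky itself; the directory name .githooks and the rule enforced by the hook are assumptions chosen for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

# Keep hooks in a tracked directory so they travel with the repository.
mkdir .githooks
cat > .githooks/pre-commit <<'EOF'
#!/bin/sh
# Reject commits that stage a file whose name ends in .env
if git diff --cached --name-only | grep -q '\.env$'; then
    echo "pre-commit: refusing to commit .env files" >&2
    exit 1
fi
EOF
chmod +x .githooks/pre-commit

# Each clone opts in once; this setting is local and is not versioned.
git config core.hooksPath .githooks

echo "ok" > notes.txt
git add .githooks notes.txt
git commit -q -m "Add shared hooks"        # passes the hook

echo "API_KEY=example" > secrets.env
git add secrets.env
if git commit -q -m "Add env file"; then
    hook_result="allowed"
else
    hook_result="blocked"
fi
echo "second commit was $hook_result"
```

The trade-off is that every contributor must run the git config step after cloning, whereas Husky performs its equivalent setup automatically during npm install.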
Remote Collaboration
Git is a distributed system, but most teams use a central hosting service (GitHub, GitLab, Bitbucket) as the coordination point. Understanding how to work with remotes, pull requests, and forks is essential for professional development.
Working with Remotes
# List configured remotes
git remote -v
# origin https://github.com/you/project.git (fetch)
# origin https://github.com/you/project.git (push)
# Add a remote
git remote add upstream https://github.com/original/project.git
# Fetch all branches from a remote (does NOT merge)
git fetch upstream
# Pull = fetch + merge
git pull origin main
# Pull with rebase instead of merge
git pull --rebase origin main
# Push a branch and set it to track the remote
git push -u origin feature/search
The Pull Request Workflow
A pull request (PR) or merge request (MR in GitLab) is a formalized request to merge one branch into another. It is the centerpiece of collaborative development and serves multiple purposes: code review, discussion, CI validation, and documentation of why changes were made.
The typical PR workflow:
- Create a feature branch from main
- Make commits with clear, atomic changes
- Push the branch to the remote
- Open a pull request with a descriptive title and body
- CI runs automated tests and checks
- Team members review the code, leave comments
- Author addresses feedback with additional commits or force-pushed rebases
- Reviewer approves
- PR is merged (merge commit, squash, or rebase — per team policy)
- Feature branch is deleted
Forking and Upstream Sync
In open-source projects, contributors do not have write access to the main repository. Instead, they fork (create a personal copy), work on their fork, and submit pull requests to the original (upstream) repository.
# Fork the repo on GitHub, then clone your fork
git clone https://github.com/you/project.git
cd project
# Add the original repository as "upstream"
git remote add upstream https://github.com/original/project.git
# Sync your fork with upstream
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
# Create a feature branch and work
git checkout -b fix/typo-in-readme
# ... make changes ...
git push -u origin fix/typo-in-readme
# Open PR from your fork to the upstream repo
Code Review Best Practices
- Review the PR description first. Understand what the PR is trying to accomplish before reading code.
- Look at the diff, not the full files. Focus on what changed, not what already existed.
- Check for correctness, clarity, and consistency — in that order of priority.
- Approve with suggestions: If the PR is fundamentally sound but has minor issues, approve it with non-blocking suggestions rather than requesting changes.
- Use "nit:" prefix for trivial suggestions that should not block merging.
Advanced Git
Git Bisect: Binary Search for Bugs
When you know a bug exists in the current version but not in an older version, git bisect performs a binary search through the commit history to find the exact commit that introduced the bug.
# Start bisecting
git bisect start
# Mark the current commit as bad (has the bug)
git bisect bad
# Mark a known-good commit
git bisect good v2.0.0
# Git checks out a commit halfway between. Test it, then:
git bisect good # if this commit does NOT have the bug
# or
git bisect bad # if this commit DOES have the bug
# Git narrows the range and checks out the next candidate.
# Repeat until it finds the first bad commit.
# Automate with a test script:
git bisect start HEAD v2.0.0
git bisect run npm test
# When done, reset to your original branch
git bisect reset
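The automated form can be rehearsed end to end in a scratch repository. This is a sketch under assumptions: ten synthetic commits, a hypothetical file value.txt whose content flips to "bad" in commit 7, and a one-line test command standing in for a real test suite.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

# Ten commits; the "bug" arrives in commit 7 and stays.
echo "good" > value.txt
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -eq 7 ]; then echo "bad" > value.txt; fi
    echo "change $i" >> log.txt
    git add value.txt log.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

first_commit=$(git rev-list --max-parents=0 HEAD)

# HEAD is known bad, the first commit is known good.
git bisect start HEAD "$first_commit" >/dev/null

# The test command must exit 0 for a good commit and non-zero for a bad one.
git bisect run sh -c '! grep -q bad value.txt' >/dev/null

# After the run, refs/bisect/bad names the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

git bisect reset >/dev/null 2>&1
```

With ten commits the search needs only a handful of checkouts; the number of steps grows with the logarithm of the range, which is why bisect stays practical on histories with thousands of commits.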
Git Reflog: Your Safety Net
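The reflog records every change to HEAD: every commit, merge, rebase, reset, and checkout. Even if you lose commits through a bad rebase or reset, they remain recoverable through the reflog. By default, entries are kept for 90 days, or 30 days once the commit is no longer reachable from the current branch tip.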
# View the reflog
git reflog
# HEAD@{0}: commit: Add payment processing
# HEAD@{1}: rebase (finish): returning to refs/heads/feature/pay
# HEAD@{2}: rebase (pick): Add payment form
# HEAD@{3}: rebase (start): checkout main
# HEAD@{4}: commit: WIP: debug payment issue
# Recover a "lost" commit after a bad rebase
git checkout -b recovery-branch HEAD@{4}
# Or reset your branch to a previous state
git reset --hard HEAD@{4} # Use with caution!
Cherry-Pick: Surgical Commit Transfer
# Apply a specific commit from another branch to your current branch
git cherry-pick a1b2c3d
# Cherry-pick a range of commits (a1b2c3d itself is excluded; use a1b2c3d^..f0a1b2c to include it)
git cherry-pick a1b2c3d..f0a1b2c
# Cherry-pick without committing (stage the changes only)
git cherry-pick --no-commit a1b2c3d
Git Worktrees: Multiple Working Directories
Worktrees let you check out multiple branches simultaneously in separate directories, all sharing the same .git repository. This is invaluable when you need to work on a hotfix while keeping your feature branch state intact.
# Create a worktree for a hotfix branch
git worktree add ../project-hotfix hotfix/payment-bug
# Work in the new directory
cd ../project-hotfix
# ... fix the bug, commit, push ...
# Return to main work
cd ../project
# Remove the worktree when done
git worktree remove ../project-hotfix
Submodules: Repository-in-a-Repository
# Add a submodule
git submodule add https://github.com/lib/library.git vendor/library
# Clone a repo with submodules
git clone --recurse-submodules https://github.com/you/project.git
# Update all submodules to their latest commits
git submodule update --remote --merge
# Initialize submodules after a regular clone
git submodule init
git submodule update
Tip: If you keep forgetting to run git submodule update, add it to a post-checkout hook.
Git Filter-Repo: Rewriting History
When you need to remove a large file from the entire history, purge sensitive data, or restructure paths, git filter-repo (the modern replacement for the deprecated filter-branch) is the tool:
# Install git-filter-repo
pip install git-filter-repo
# Remove a file from all history
git filter-repo --path secrets.env --invert-paths
# Remove all files larger than 10MB from history
git filter-repo --strip-blobs-bigger-than 10M
# Move all files into a subdirectory (for monorepo migration)
git filter-repo --to-subdirectory-filter my-service/
Case Studies
Case Study 1: The Linux Kernel Workflow
The Linux kernel is the largest collaborative software project in history. As of 2024, it has over 35 million lines of code, more than 20,000 contributors, and accepts approximately 10,000 patches per release cycle (roughly every 9 weeks). Its Git workflow is a hierarchy of trust.
At the top, Linus Torvalds maintains the authoritative linux.git repository. Below him are approximately 100 subsystem maintainers (networking, file systems, drivers, etc.). Below them are thousands of individual contributors.
The workflow operates on email-based patches, not pull requests:
- A contributor writes a patch and sends it to the relevant mailing list using git format-patch and git send-email.
- The subsystem maintainer reviews the patch on the mailing list, applies it to their tree using git am, and tests it.
- During the two-week merge window, subsystem maintainers send pull requests (via email) to Torvalds, who merges their trees into mainline.
- After the merge window closes, only bug fixes are accepted for the next 7+ weeks (release candidates).
Key takeaway: even the largest software project in the world uses a simple, disciplined workflow. The complexity is in the social structure (maintainers, reviewers, mailing lists), not in Git branching gymnastics.
Case Study 2: Google's Monorepo
Google stores virtually all of its code (billions of lines, across tens of thousands of projects) in a single repository called "google3." While Google uses its custom VCS (Piper) rather than Git, the monorepo philosophy has influenced many Git-based teams.
Companies like Stripe, Airbnb, and Twitter (now X) have adopted Git-based monorepos. The key challenges and solutions:
- Scale: Git struggles with repositories over ~10 GB. Solutions include Git LFS for large files, sparse checkout for working on subsets, and Microsoft's VFS for Git, which virtualizes the working tree, along with its successor Scalar, which relies on partial clone and sparse checkout instead.
- CI/CD: Running all tests on every commit is impractical. Build systems like Bazel use dependency graphs to determine which tests are affected by a change.
- Code ownership: CODEOWNERS files in GitHub/GitLab automatically assign reviewers based on which files are modified.
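Of the scaling techniques listed above, sparse checkout is the easiest to try locally. This is a sketch assuming Git 2.25 or newer and two hypothetical service directories in a scratch repository:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

mkdir -p services/billing services/search
echo "billing code" > services/billing/main.txt
echo "search code" > services/search/main.txt
git add services && git commit -q -m "Add two services"

# Cone mode restricts the working tree to the listed directories
# (files at the repository root are always kept).
git sparse-checkout init --cone
git sparse-checkout set services/billing

# Only the billing service is materialized on disk.
ls services

# The index and history still track both files.
tracked=$(git ls-files | wc -l | tr -d ' ')
echo "tracked files: $tracked"
```

Running git sparse-checkout disable restores the full working tree. On a large monorepo this is usually combined with a partial clone so that unneeded file contents are not downloaded at all.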
Case Study 3: Open-Source PR Workflow (React)
Facebook's React library receives hundreds of external pull requests per month. Their workflow demonstrates professional open-source collaboration:
- Contributors fork the repository and create feature branches.
- A CLA (Contributor License Agreement) bot checks that every PR author has signed the CLA before review begins.
- CI runs a comprehensive test suite (unit tests, integration tests, bundle size checks) on every PR. The "Danger" bot comments with bundle size impact.
- At least one core team member must approve the PR.
- Squash merge is used to keep the main branch history clean — each PR becomes a single commit.
- Automated release tooling (using Changesets) generates changelogs from PR titles.
Exercises
Commit Archaeology
Clone the Express.js repository. Using only git log commands, answer the following questions: (1) Who made the first commit? (2) How many commits exist in total? (3) Find the commit that introduced the app.listen() method. (4) What is the average number of commits per month over the last year?
Hint: Use git log --oneline | wc -l for total count, git log -S "app.listen" for searching content, and git log --since="1 year ago" --format="%h" for date filtering.
Branch, Conflict, Resolve
Create a local repository with a file index.html containing a basic page. Create two branches: feature/header and feature/nav. On each branch, modify the same <body> section differently. Then merge both branches into main, resolving the conflict. Document the exact commands you used and the resolution strategy you chose. Bonus: Set up a commit-msg hook that enforces Conventional Commits format.
Rebase Surgery and Bisect Automation
Create a repository with 20 commits. Intentionally introduce a bug in commit #12 (e.g., a function that returns the wrong value). Then: (1) Use git bisect run with an automated test script to find the bad commit. (2) Use interactive rebase to rewrite the history: squash commits 1-5 into one, reorder commits 8 and 9, and edit the commit message of commit 15. (3) Verify the final history is clean with git log --oneline --graph.
Conclusion & Resources
Git is more than a tool — it is the foundation of modern software collaboration. We have covered the journey from Git's origin in 2005 through its internal object model, daily commands, branching strategies, merge vs rebase philosophy, conflict resolution, hook-based automation, remote collaboration patterns, and advanced techniques like bisect and worktrees.
The most important takeaways:
- Understand the object model. Once you know that branches are pointers, commits are snapshots, and the reflog is your safety net, Git stops being mysterious.
- Keep commits atomic. Each commit should do one thing. Use git add -p and interactive rebase to achieve this.
- Match your branching strategy to your deployment model. Git Flow for scheduled releases, GitHub Flow or trunk-based for continuous deployment.
- Automate quality with hooks. Pre-commit linting, commit message validation, and pre-push testing prevent entire categories of problems.
- Rebase local, merge shared. Clean up your history before sharing it, but never rewrite shared history.
Recommended Resources
- Pro Git (2nd Edition) by Scott Chacon and Ben Straub — free online at git-scm.com/book
- Git Internals — the Pro Git chapter on Git plumbing commands
- Conventional Commits specification at conventionalcommits.org
- DORA Metrics — dora.dev for research on engineering team performance
- Oh Shit, Git!?! at ohshitgit.com — practical recipes for fixing common mistakes