Foundations of Git
Understand Git's origins, its distributed snapshot architecture, and key concepts like objects, branches, and merging strategies.
Summary
Understanding Git: Version Control for Collaborative Development
What is Git?
Git is a distributed version control system designed to manage versions of source code and data across multiple developers. Rather than relying on a central server to store your project history, Git maintains a complete copy of the entire repository on each developer's computer. This fundamental difference shapes everything about how Git works.
The system was created by Linus Torvalds in 2005 and was designed with three core goals: speed, data integrity, and support for distributed, non-linear workflows. These design principles make Git particularly powerful for managing large collaborative projects with many parallel development branches.
How Git Stores Information
The Object Model
Git does not track files by name and revision number the way many older systems do. Instead, it keeps a content-addressable store of objects, each identified by the cryptographic SHA-1 hash of its content. Think of the hash as a fingerprint for that piece of data: if the content changes even slightly, the hash changes completely, so accidental corruption or tampering is detectable.
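As a rough illustration of the fingerprint idea, the sketch below reproduces how Git derives the ID of a blob: it hashes a short header (the word blob, the content length, and a NUL byte) followed by the file content. The function name git_blob_id is made up for this example; only the header layout is taken from Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content.

    git_blob_id is a hypothetical helper name used only in this sketch.
    """
    # Git hashes a header of the form b"blob <size>" + NUL, then the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"hello world\n")
changed = git_blob_id(b"hello world!\n")

# A one-character change produces an unrelated-looking ID.
print(original)
print(changed)
```

On a machine with Git installed, running git hash-object on a file containing the same bytes should print the same ID as the first line.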
Git uses four main types of objects:
Blobs contain the actual contents of your files. When you save a file to Git, the file's contents become a blob object.
Trees represent directory structures. A tree object contains references to blobs (files) and other trees (subdirectories), essentially describing what a folder looked like at a particular point in time.
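Commits are snapshots of your entire project. Each commit contains a reference to a tree object (representing the full state of your project), references to zero or more parent commits (linking it to previous states), and metadata such as the author, committer, and message. The very first commit in a repository has no parent; every later commit has at least one. These parent links create a historical chain.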
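Tags (more precisely, annotated tags) are objects that attach a name, a tagger, and a message to another object, usually a commit. They are useful for marking important versions like releases. A lightweight tag, by contrast, is only a named reference and creates no object of its own.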
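To make the relationships between blobs, trees, and commits concrete, here is a minimal sketch that serializes each object type and hashes it. The helper names (object_id, tree_body, commit_body) and the author line are invented for illustration. The tree serialization is also simplified: it handles only a flat list of files and ignores the special sorting rule Git applies to subdirectories.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash an object as Git does: '<kind> <size>' + NUL header, then the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict[str, tuple[str, str]]) -> bytes:
    """Serialize {name: (mode, object ID)} as tree entries, sorted by name.

    Simplification: flat file lists only (no subdirectory sort rule).
    """
    out = b""
    for name in sorted(entries):
        mode, oid = entries[name]
        # Each entry is '<mode> <name>' + NUL, then the 20 raw hash bytes.
        out += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(oid)
    return out


def commit_body(tree: str, parents: list[str], who: str, message: str) -> bytes:
    """Serialize a commit: one tree, zero or more parents, metadata, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


who = "Ada <ada@example.com> 0 +0000"  # hypothetical identity and timestamp

readme = object_id("blob", b"hello world\n")
root_tree = object_id("tree", tree_body({"README": ("100644", readme)}))
first = object_id("commit", commit_body(root_tree, [], who, "Initial commit"))
second = object_id("commit", commit_body(root_tree, [first], who, "Second commit"))

print("blob  ", readme)
print("tree  ", root_tree)
print("commit", first, "(no parent)")
print("commit", second, "(parent is the first commit)")
```

Because the parent ID is part of the bytes being hashed, a commit's ID depends on its entire history, not only on its own snapshot.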
The Snapshot Model
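This is an important distinction from some other version control systems: conceptually, each commit records a complete snapshot of your entire project directory rather than a set of differences (deltas) between file versions. Files that have not changed are not stored again, because identical content hashes to the same blob. This means Git can reconstruct any version of your project without replaying a series of file-level changes. (Git may delta-compress objects when it packs them for storage or transfer, but that is a storage optimization hidden beneath the snapshot model.)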
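The following toy model suggests why full snapshots stay cheap. It assumes a simple in-memory dictionary as the object store (the names store, put_blob, and snapshot are invented here), and shows that two snapshots sharing an unchanged file also share its blob.

```python
import hashlib

store: dict[str, bytes] = {}  # toy object store: blob ID -> content


def put_blob(content: bytes) -> str:
    """Store content under its Git blob ID and return that ID."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    oid = hashlib.sha1(header + content).hexdigest()
    store[oid] = content  # storing identical content again changes nothing
    return oid


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Record every file in the project as {path: blob ID}."""
    return {path: put_blob(data) for path, data in files.items()}


v1 = snapshot({"README": b"intro\n", "main.py": b"print('v1')\n"})
v2 = snapshot({"README": b"intro\n", "main.py": b"print('v2')\n"})

# Each snapshot lists both files, yet only three blobs exist in the store,
# because the unchanged README maps to the same ID in both.
print(len(store))
print(v1["README"] == v2["README"])
```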
References and Branches
A reference (or ref) in Git is a name that points to an object's SHA-1 hash, almost always a commit's. Branches are named references: pointers to commits that help you organize your work.
When you initialize a new repository with git init, Git sets up a default branch, traditionally named "master". This name is not technically special; it is only a convention, and since Git 2.28 it can be changed with the init.defaultBranch setting (many hosting services now default to "main"). You can create as many branches as you need, each pointing to different commits in your project history.
Understanding this is crucial: branches aren't separate copies of your code. They're lightweight pointers, which is why Git makes branching so fast and easy compared to other version control systems.
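A toy model makes the "lightweight pointer" idea tangible. In this sketch, which is an illustration and not Git's actual implementation, commits and refs are plain dictionaries, the commit IDs c1 to c3 are placeholders for real hashes, and create_branch and commit are hypothetical helpers.

```python
# Toy model: commit ID -> parent IDs (placeholder IDs, not real hashes).
commits: dict[str, list[str]] = {"c1": [], "c2": ["c1"]}

# A repository's branches are just a mapping from name to commit ID.
refs: dict[str, str] = {"master": "c2"}
head = "master"  # the branch currently checked out


def create_branch(name: str) -> None:
    """Creating a branch records one more name for an existing commit."""
    refs[name] = refs[head]


def commit(new_id: str) -> None:
    """A new commit points at the old tip, then the current branch moves to it."""
    commits[new_id] = [refs[head]]
    refs[head] = new_id


create_branch("feature")  # no files are copied, only a pointer is added
head = "feature"          # switch to the new branch
commit("c3")

print(refs)
```

After the new commit, only the feature pointer has moved; master still names the commit it pointed to before.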
The Commit Graph
Commits form a directed acyclic graph (DAG): each commit points to its parent commits (none for the first commit, one for an ordinary commit, two or more for a merge). Because a commit can only reference commits that already exist, the graph never contains a cycle.
In most cases, a commit has exactly one parent (the previous commit). However, when you merge two branches together, you create a merge commit with multiple parents—it has both the commit from one branch and the commit from another branch as its ancestors. This preserves the complete history of how development happened.
The DAG structure is what enables Git to handle "non-linear workflows"—meaning you can have multiple independent lines of development (branches) that later converge through merges.
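The commit graph can be sketched as a mapping from each commit to its parents. The example below uses single-letter placeholder IDs and a hypothetical history function to walk everything reachable from a branch tip, including both sides of a merge.

```python
# commit ID -> list of parent IDs (placeholder IDs, not real hashes)
parents = {
    "A": [],          # root commit: no parents
    "B": ["A"],
    "C": ["B"],       # tip of one branch before the merge
    "D": ["B"],       # tip of the other branch
    "M": ["C", "D"],  # merge commit: two parents
}


def history(tip: str) -> list[str]:
    """Return every commit reachable from tip, visiting each commit once."""
    seen: set[str] = set()
    order: list[str] = []
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue  # shared ancestors are reached by more than one path
        seen.add(commit)
        order.append(commit)
        stack.extend(parents[commit])
    return order


print(sorted(history("M")))  # the merge sees commits from both branches
print(sorted(history("D")))  # the side branch never sees commit C
```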
Distributed Architecture
Local Repositories
Here's what makes Git truly different: every developer gets a complete, independent copy of the entire repository on their machine. This means:
You can work completely offline
Local operations (commits, viewing history, branching) are fast because they don't require network access
You have the full history available without contacting a server
Each developer's machine is essentially a backup of the entire project
There is no required central server—though many teams use one (like GitHub or GitLab) as a convenient place to share work and coordinate among developers.
Script-Oriented Design
Git was intentionally built as a collection of small, focused scripts and commands that can be combined to perform complex version control operations. This modular design allows developers to automate workflows and extend Git's functionality using shell scripts and other tools.
<extrainfo>
This design philosophy comes from Unix principles: do one thing and do it well. Each Git command is relatively simple, but they combine powerfully.
</extrainfo>
Merging Strategies
When you integrate changes from different branches, Git needs a strategy for combining them. Git supports several approaches:
The "resolve" strategy uses the traditional three-way merge algorithm. It compares three versions: the common ancestor (the last commit both branches shared), your version, and the other version. This helps Git intelligently determine what changed and how to combine the changes.
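The "recursive" strategy was Git's default for merging two branches for many years; since Git 2.34 the default is "ort", a faster reimplementation of the same approach. It is more sophisticated than resolve: when the two branches have more than one common ancestor, it first merges those ancestors into a temporary tree and uses that tree as the base for the final three-way merge. This tends to produce fewer conflicts in histories that contain criss-cross merges.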
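The "octopus" strategy is used by default when you merge more than two heads in one operation (for example, merging two topic branches into the current branch at once). It is less common in typical workflows, and it refuses to perform a merge that would need manual conflict resolution.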
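The three-way comparison described for the "resolve" strategy can be sketched at whole-file granularity. This is a simplified illustration, not Git's implementation: each snapshot is a dictionary of path to content, the function name three_way_merge is invented, and any file changed differently on both sides is reported as a conflict instead of being merged line by line as Git would attempt.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge {path: content} snapshots; return (merged, conflicting paths)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (this includes both deleting)
        elif o == b:
            result = t  # only their side changed the file
        elif t == b:
            result = o  # only our side changed the file
        else:
            conflicts.append(path)  # both sides changed it differently
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

The common ancestor is what lets the merge tell "we changed it" apart from "they changed it"; with only two versions, every difference would look like a conflict.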
<extrainfo>
Different merge strategies exist because different projects have different needs. Most of the time, you'll use the default strategy ("recursive" in older releases, "ort" since Git 2.34) without even thinking about it. You'd only need to explicitly choose a different strategy for special cases.
</extrainfo>
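The multiple-common-ancestor case handled by the "recursive" strategy typically arises in a criss-cross history, where two branches have each merged the other. The sketch below, which uses placeholder commit IDs and hypothetical helper names, finds the "best" common ancestors of two commits: those that are not themselves ancestors of another common ancestor.

```python
# commit ID -> list of parent IDs (placeholder IDs, not real hashes)
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],  # one branch merged the other...
    "E": ["C", "B"],  # ...and the other merged it back: a criss-cross
}


def ancestors(commit: str) -> set[str]:
    """Return the commit plus everything reachable through parent links."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(x: str, y: str) -> set[str]:
    """Common ancestors that are not an ancestor of another common ancestor."""
    common = ancestors(x) & ancestors(y)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other) for other in common)
    }


print(sorted(merge_bases("B", "C")))  # simple fork: a single base
print(sorted(merge_bases("D", "E")))  # criss-cross: two equally good bases
```

With two equally good bases, a plain three-way merge would have to pick one arbitrarily; merging the bases first gives the final merge a single reference tree.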
Why This Design Matters
The design choices discussed here—the object model, the snapshot approach, the distributed architecture, and the DAG structure—work together to achieve Git's stated goals. The speed comes from local operations on your complete repository. Data integrity comes from cryptographic hashing. And support for complex, non-linear workflows comes from the DAG structure and powerful merging capabilities.
Understanding these foundations will help you predict how Git will behave and use it more effectively in your own projects.
Flashcards
How is Git defined in terms of its software system type and purpose?
A distributed version control system that manages versions of source code or data
What are the primary design goals of Git?
Speed
Data integrity
Support for distributed non-linear workflows
What does Git maintain on each computer to ensure full history and version-tracking without network access?
A complete local copy of the entire repository
By what mechanism are Git data objects identified?
Cryptographic SHA-1 hashes
What are the four categories of objects in the Git data storage model?
Blobs (file contents)
Trees (directory structures)
Commits (snapshots)
Tags (named references)
In Git, what is a reference?
A pointer to a commit hash
What mathematical structure do Git commits form?
A directed acyclic graph (DAG)
What characterizes a merge commit in the Git commit graph?
It has multiple parent commits
Instead of storing deltas for individual files, how does Git record directory trees?
As snapshots
How does Git determine file history if it does not store explicit file-level revision relationships?
By walking the global commit history
How does Git's "recursive" merge strategy (the long-time default) reduce merge conflicts?
By merging multiple common ancestors into a single tree and using it as the base of the three-way merge
Quiz
Foundations of Git Quiz Question 1: What prompted Linus Torvalds to begin developing Git?
- The Linux kernel project lost free use of BitKeeper and needed a replacement. (correct)
- He wanted a tool to manage personal projects.
- He aimed to create a graphical user interface for version control.
- He needed to replace Subversion due to performance issues.
Foundations of Git Quiz Question 2: Which of the following best describes Git?
- A distributed version control system for source code and data. (correct)
- A centralized database management system for web applications.
- A cloud-based file sharing platform for documents.
- An Integrated Development Environment (IDE) for Java.
Foundations of Git Quiz Question 3: What design principle characterizes Git's architecture?
- It is built as a collection of small, composable scripts. (correct)
- It relies on a monolithic executable handling all tasks.
- It uses a proprietary binary format inaccessible to users.
- It requires a graphical interface for all operations.
Foundations of Git Quiz Question 4: How does Git store the history of a project?
- As full snapshots of the entire directory tree at each commit. (correct)
- As incremental file-level diffs (deltas) between successive versions.
- As a single linear log stored on a remote server.
- As compressed archives of only changed files.
Foundations of Git Quiz Question 5: Which of the following is NOT a primary design goal of Git?
- Centralized repository management (correct)
- Speed
- Data integrity
- Support for distributed non‑linear workflows
Foundations of Git Quiz Question 6: In Git’s data storage model, which object type is used to represent directory structures?
- Tree (correct)
- Blob
- Commit
- Tag
Foundations of Git Quiz Question 7: Which Git merge strategy is used by default when merging more than two heads?
- octopus (correct)
- resolve
- recursive
- ours
Foundations of Git Quiz Question 8: Which project was Git originally created to serve, managing its releases with Git from 2005?
- Linux kernel (correct)
- GCC and Binutils
- Apache HTTP Server
- MySQL database
Foundations of Git Quiz Question 9: What capability does each local Git repository provide regarding version history?
- It contains the full project history and works without network access (correct)
- It stores only the latest commit and requires a server to view older versions
- It relies on a central repository for all history queries
- It tracks only changed files and discards unchanged ones
Foundations of Git Quiz Question 10: What is the result of Git storing a complete repository on every developer’s machine?
- Most operations can be performed without a central server (correct)
- Developers must constantly sync with a remote repository
- Only a single copy of the repository exists on a central host
- Branching requires network access to a central server
Foundations of Git Quiz Question 11: How does Git determine the history of an individual file?
- By walking the global commit graph (correct)
- By storing explicit file‑level revision links
- By consulting a separate file‑history database
- By referencing a linear list of file snapshots
Key Concepts
Git Fundamentals
Git
Distributed version control
Git data storage model
History of Git
Git Operations
Git commit
Git branch
Git snapshot model
Git merging strategies
Definitions
Git
A distributed version control system created by Linus Torvalds for tracking changes in source code and data.
Distributed version control
A model where each developer’s machine holds a complete copy of the repository, enabling offline work and fast local operations.
Git commit
An immutable snapshot object identified by a SHA‑1 hash that records the state of the entire project at a point in time.
Git branch
A lightweight named reference that points to a specific commit, allowing parallel lines of development.
Git data storage model
A content‑addressable system that stores objects (blobs, trees, commits, tags) identified by cryptographic SHA‑1 hashes.
Git snapshot model
The approach of recording whole directory trees as complete snapshots rather than storing file‑level deltas.
Git merging strategies
Algorithms such as resolve, recursive, and octopus that combine divergent commit histories into a unified result.
History of Git
The origin of Git in 2005, when the Linux kernel project lost free use of BitKeeper and Linus Torvalds wrote a replacement, followed by its adoption well beyond the kernel.