When you run git init, Git creates a .git/ directory with some essential files and subdirectories.
.git/
├── objects/
├── refs/
└── HEAD
objects/
Stores all Git objects (blobs, trees, commits) in a compressed format.
refs/
Contains references to commits, such as branches and tags.
HEAD
Points to the current branch. For a new repository, it contains:
ref: refs/heads/main
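As a rough illustration, the layout above can be reproduced with a few lines of Python. This is a sketch, not the mygit implementation: the names `init_repo` and `current_branch_ref` are hypothetical, and real `git init` also creates files (config, hooks, and so on) that are left out here.

```python
import os

def init_repo(root: str, branch: str = "main") -> str:
    """Create a minimal .git/ layout under root and return its path."""
    git_dir = os.path.join(root, ".git")
    os.makedirs(os.path.join(git_dir, "objects"), exist_ok=True)
    # Branches and tags live in subdirectories of refs/.
    os.makedirs(os.path.join(git_dir, "refs", "heads"), exist_ok=True)
    os.makedirs(os.path.join(git_dir, "refs", "tags"), exist_ok=True)
    # HEAD is a symbolic ref: it names a branch, not a commit.
    with open(os.path.join(git_dir, "HEAD"), "w", newline="\n") as f:
        f.write(f"ref: refs/heads/{branch}\n")
    return git_dir

def current_branch_ref(git_dir: str) -> str:
    """Return the ref that HEAD points to, e.g. 'refs/heads/main'."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        content = f.read().strip()
    if not content.startswith("ref: "):
        raise ValueError("HEAD holds a raw SHA instead of a ref (detached HEAD)")
    return content[len("ref: "):]
```

Note that `refs/heads/main` does not exist as a file until the first commit is written; HEAD simply names where that commit's SHA will be stored.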
📚 Learn more: What’s in .git/
Git uses three primary object types to represent and track your data. All objects are stored in the .git/objects/ directory.
| Type | Description | mygit Command |
|---|---|---|
| 🟡 Blob | Stores raw file content, but not the filename or permissions. | hash-object |
| 🟢 Tree | Represents a directory, containing pointers to blobs and other trees. | write-tree, ls-tree |
| 🔴 Commit | A snapshot of the project, pointing to a single root tree and parent commit(s). | commit-tree |
To inspect the contents of any Git object, you can use mygit cat-file. This command is invaluable for understanding how Git stores information.
# Pretty-print the contents of any object given its SHA-1 hash
$ mygit cat-file -p <object-sha>
The implementation in src/commands/cat_file.cpp performs these steps:
- Finds the object file in .git/objects/ based on the provided SHA.
- Decompresses the file content using Zlib.
- Reads past the header (e.g., blob 11\0) and prints only the actual content to the console.
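The three steps above can be sketched in Python. This is an illustration of the technique, not the C++ code in cat_file.cpp; the function name `read_object` is hypothetical, and the size check is an extra safeguard that the original description does not mention.

```python
import os
import zlib
from typing import Tuple

def read_object(git_dir: str, sha: str) -> Tuple[str, bytes]:
    """Return (type, content) for the loose object named by a 40-char hex SHA."""
    # Step 1: locate the file from the SHA (2-char directory + 38-char filename).
    path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
    # Step 2: decompress the whole file with zlib.
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # Step 3: split off the "<type> <size>" header at the first null byte.
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError(f"size mismatch: header says {size}, found {len(content)}")
    return obj_type, content
```

Printing `content` for a blob gives the original file bytes; a tree's content is binary and needs further parsing before it is readable.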
All objects are stored in .git/objects/, with hash-based paths to avoid too many files in one directory.
For an object with this SHA-1 hash:
e88f7a929cd70b0274c4ea33b209c97fa845fdbc
The object file is stored at:
.git/objects/e8/8f7a929cd70b0274c4ea33b209c97fa845fdbc
- e8/ = First 2 characters (directory)
- 8f7a... = Remaining 38 characters (filename)
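The path rule is small enough to express directly. In this sketch the helper name `object_path` is hypothetical, and the validation step is an assumption about what a careful implementation might check.

```python
import os

def object_path(git_dir: str, sha: str) -> str:
    """Map a 40-character hex SHA-1 to its loose-object path."""
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError("expected a 40-character lowercase hex SHA-1")
    # First 2 characters name the directory, the remaining 38 name the file.
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])

print(object_path(".git", "e88f7a929cd70b0274c4ea33b209c97fa845fdbc"))
```

Splitting on the first two hex digits spreads objects across at most 256 subdirectories.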
blob <size>\0<content>
- <size> = size in bytes of the content
- \0 = null byte separator
- <content> = actual file content
For a file with the contents:
hello world
The blob object (after decompression) looks like:
blob 11\0hello world
- 11 = byte length of "hello world" (includes the space)
The name of this blob will be calculated like this, using SHA-1:
sha1("blob 11\0hello world")
The content stored in .git/objects/... is the result of:
zlibCompress("blob 11\0hello world")
# Create a new file
$ echo "hello world" > hello.txt
# Create a blob object from the file and write it to the .git/objects database
$ mygit hash-object -w hello.txt
3b18e512dba79e45b138245893a07c91355b1b4d
Note that echo appends a trailing newline, so hello.txt holds 12 bytes and the object written here is blob 12\0hello world\n. The hash above names that 12-byte blob, not the 11-byte example shown earlier.
Git tree objects represent directories and capture the structure of the project at a given point in time. While blob objects store file contents, tree objects record which files and folders exist, their names, permissions, and how they map to other objects (blobs or trees).
A tree object consists of a series of entries, each representing a file or subdirectory:
<mode> <filename>\0<20-byte binary SHA-1 hash>
- <mode>: File mode (Unix-style):
  - 100644 – regular file
  - 100755 – executable file
  - 120000 – symbolic link
  - 40000 – directory (i.e., a tree)
- <filename>: Name of the file or directory
- \0: Null byte separator
- <20-byte SHA-1>: Binary SHA-1 hash of the referenced object (blob or tree)
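This is what a real tree object might look like after decompression using Zlib. The SHA-1 hashes are shown in hex for readability; in the stored object each one is 20 raw bytes that follow the null byte directly, with no space: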
tree 192\0
40000 octopus-admin\0 a84943494657751ce187be401d6bf59ef7a2583c
40000 octopus-deployment\0 14f589a30cf4bd0ce2d7103aa7186abe0167427f
40000 octopus-product\0 ec559319a263bc7b476e5f01dd2578f255d734fd
100644 pom.xml\0 97e5b6b292d248869780d7b0c65834bfb645e32a
40000 src\0 6e63db37acba41266493ba8fb68c76f83f1bc9dd
- This tree represents a directory with:
  - 4 subdirectories (octopus-admin, octopus-deployment, octopus-product, src) → each one references another tree object
  - 1 file: pom.xml → references a blob object
- 192 is the total number of bytes in the uncompressed tree object content (everything after tree 192\0)
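The figure of 192 can be checked by encoding the five entries as bytes. The sketch below is illustrative only; `encode_entry` is a hypothetical helper, and each hash is converted from hex to its 20-byte binary form as the entry format requires.

```python
entries = [
    ("40000", "octopus-admin", "a84943494657751ce187be401d6bf59ef7a2583c"),
    ("40000", "octopus-deployment", "14f589a30cf4bd0ce2d7103aa7186abe0167427f"),
    ("40000", "octopus-product", "ec559319a263bc7b476e5f01dd2578f255d734fd"),
    ("100644", "pom.xml", "97e5b6b292d248869780d7b0c65834bfb645e32a"),
    ("40000", "src", "6e63db37acba41266493ba8fb68c76f83f1bc9dd"),
]

def encode_entry(mode: str, name: str, sha_hex: str) -> bytes:
    """Encode one entry as '<mode> <filename>\\0<20-byte binary SHA-1>'."""
    raw_sha = bytes.fromhex(sha_hex)
    if len(raw_sha) != 20:
        raise ValueError("a SHA-1 hash must be exactly 20 bytes")
    return mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + raw_sha

content = b"".join(encode_entry(*entry) for entry in entries)
tree_object = b"tree " + str(len(content)).encode("ascii") + b"\0" + content

print([len(encode_entry(*entry)) for entry in entries])  # [40, 45, 42, 35, 30]
print(len(content))                                      # 192
```

Each entry costs the length of its mode and name plus 22 bytes: one space, one null byte, and the 20-byte hash.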
Git stores this object compressed in .git/objects/ under a path based on its SHA-1:
For example, if the SHA-1 of the tree object is:
8a58e0e2a65b315d2b61f9123b93c2a36b44c4b9
It will be saved at:
.git/objects/8a/58e0e2a65b315d2b61f9123b93c2a36b44c4b9
Before storing, Git computes the SHA-1 hash of the object like this:
sha1("tree 192\0<raw tree content>")
Then compresses the full content with Zlib:
zlib.compress(b"tree 192\0" + b"<binary tree content>")
📘 Pro tip: This is how Git tracks not just the content of files (via blobs), but also their hierarchy and structure at each commit.
The write-tree command scans the current directory and creates a tree object that represents its state.
# After creating files and adding them with hash-object...
# Create a tree object representing the current directory
$ mygit write-tree
b123a9e...
The writeTreeFromDirectory function in src/commands/write_tree.cpp implements this by:
- Scanning the Directory: It iterates through all files and subdirectories, ignoring .git.
- Creating Objects Recursively:
  - For each file, it calls createBlobAndGetRawSha to create a blob object.
  - For each subdirectory, it calls itself recursively to create a subtree object.
- Assembling the Tree: It constructs the binary tree content by concatenating entries in the format <mode> <filename>\0<20-byte-sha>. To ensure the final tree hash is deterministic, it sorts the entries by filename.
- Writing the Tree Object: It prepends the tree <size>\0 header and calls writeGitObject to save the new tree object to the database.
The ls-tree command displays the contents of a tree object in a readable format, similar to the ls command.
# List the contents of the tree created earlier
$ mygit ls-tree b123a9e...
100644 blob 3b18e512dba79e45b138245893a07c91355b1b4d hello.txt
40000 tree 9a8b7c6... docs
# List only the filenames
$ mygit ls-tree --name-only b123a9e...
hello.txt
docs
The handleLsTree function in src/commands/ls_tree.cpp achieves this by:
- Reading the tree object with readGitObject.
- Using a parseTreeObject utility to parse the binary data into a list of TreeEntry structs.
- Iterating through the entries and printing the mode, type, SHA, and filename for each one.
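Parsing is the inverse of the entry format, and a Python sketch makes the byte offsets explicit. The `TreeEntry` fields, `parse_tree`, and `format_entries` below are assumptions made for illustration and may not match the structs in ls_tree.cpp; the object type is inferred from the mode, which ignores less common modes such as submodules.

```python
from typing import List, NamedTuple

class TreeEntry(NamedTuple):
    mode: str
    type: str
    sha: str
    name: str

def parse_tree(content: bytes) -> List[TreeEntry]:
    """Parse tree content (everything after the 'tree <size>\\0' header)."""
    entries = []
    pos = 0
    while pos < len(content):
        space = content.index(b" ", pos)     # end of the mode
        nul = content.index(b"\0", space)    # end of the filename
        raw_sha = content[nul + 1:nul + 21]  # the 20 bytes after the null byte
        if len(raw_sha) != 20:
            raise ValueError("truncated tree entry")
        mode = content[pos:space].decode("ascii")
        name = content[space + 1:nul].decode("utf-8")
        obj_type = "tree" if mode == "40000" else "blob"
        entries.append(TreeEntry(mode, obj_type, raw_sha.hex(), name))
        pos = nul + 21
    return entries

def format_entries(entries: List[TreeEntry], name_only: bool = False) -> str:
    """Render entries in the style of the ls-tree output shown above."""
    if name_only:
        return "\n".join(e.name for e in entries)
    return "\n".join(f"{e.mode} {e.type} {e.sha}    {e.name}" for e in entries)
```

The parser cannot split on newlines or spaces alone, because the binary hash may contain any byte value; it has to walk the buffer and skip exactly 20 bytes after each null byte.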
A commit object is Git’s way of capturing a complete snapshot of your project — content and history. While blobs record file contents and trees capture directory structure, commits add time, authorship, and lineage.
commit <size>\0
tree <TREE_SHA>
parent <PARENT_SHA> # 0 – many lines (first commit has none)
author <Name> <email> <timestamp> <timezone>
committer <Name> <email> <timestamp> <timezone>

<commit-message>\n
| Field | Purpose |
|---|---|
| tree | Root tree object for this snapshot (represents the project directory). |
| parent | SHA-1(s) of previous commit(s). A merge commit has multiple parent lines. |
| author | Original creator: name, email, Unix seconds since epoch, and timezone offset. |
| committer | Who actually wrote the commit to the repo (often the same as author). |
| (blank line) | Separator between headers and the message. |
| message | The human-written commit message, ending with a newline. |
commit 247\0
tree 8a58e0e2a65b315d2b61f9123b93c2a36b44c4b9
parent 14f589a30cf4bd0ce2d7103aa7186abe0167427f
author Alice Example <alice@example.com> 1720752000 +0200
committer Alice Example <alice@example.com> 1720752000 +0200

Add README with project overview
- 247 – byte count of everything after commit 247\0, including the blank line and the message's trailing newline
- tree ... – root tree of the project snapshot
- parent ... – previous commit in history
- The timestamp 1720752000 translates to July 12 2024 02:40:00 UTC (Git stores raw seconds; the timezone +0200 converts this to 04:40:00 local time)
- Descriptive commit message follows the blank line.
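Both the size field and the timestamp can be checked mechanically. The sketch below rebuilds the example body byte for byte, assuming one blank line before the message and one trailing newline after it, and converts Git's seconds-plus-offset pair into a readable time. `parse_git_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

body = (
    "tree 8a58e0e2a65b315d2b61f9123b93c2a36b44c4b9\n"
    "parent 14f589a30cf4bd0ce2d7103aa7186abe0167427f\n"
    "author Alice Example <alice@example.com> 1720752000 +0200\n"
    "committer Alice Example <alice@example.com> 1720752000 +0200\n"
    "\n"
    "Add README with project overview\n"
).encode("utf-8")

def parse_git_time(timestamp: int, offset: str) -> datetime:
    """Convert Git's '<seconds> <+HHMM>' pair into a timezone-aware datetime."""
    sign = -1 if offset[0] == "-" else 1
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    return datetime.fromtimestamp(timestamp, tz=timezone(sign * delta))

local = parse_git_time(1720752000, "+0200")
print(len(body))                                   # the <size> in the header
print(local.isoformat())                           # local time of the commit
print(local.astimezone(timezone.utc).isoformat())  # the same instant in UTC
```

The offset never changes the stored number of seconds; it only records which local clock the author was using.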
If the SHA-1 of the commit object is:
e1d9e71790b6f97f1b930d8d8d0bac8e7d5bd0a4
Git writes it (zlib-compressed) to:
.git/objects/e1/d9e71790b6f97f1b930d8d8d0bac8e7d5bd0a4
Directory e1/ comes from the first 2 hex digits; the file name is the remaining 38.
- Concatenate the header, null byte, and raw content (exact bytes shown above).
- SHA-1 hash the full string: sha1(b"commit <size>\0" + raw_commit_content)
- Compress with zlib before writing to .git/objects/….
Because the hash covers the tree SHA and the parent SHA, any change to a file, directory, or earlier commit changes the hash of that commit and of every commit descended from it, which protects integrity all the way down the chain.
- Snapshot Integrity – The commit hash “seals” the exact state of your project and its ancestry.
- History Graph – parent links form Git’s famous directed-acyclic graph (DAG).
- Metadata – Author/committer lines track who changed what and when, powering commands like git log --author.
- Immutability – Altering any part (message, tree, or parent) yields a brand-new SHA, guaranteeing tamper-evidence.
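The chaining effect is easy to demonstrate with two tiny histories. In this sketch `commit_sha` is a hypothetical helper that hashes a commit without storing it, and the identity line and tree SHA are reused from the example above purely as sample data.

```python
import hashlib
from typing import List

IDENT = "Alice Example <alice@example.com> 1720752000 +0200"
TREE = "8a58e0e2a65b315d2b61f9123b93c2a36b44c4b9"

def commit_sha(tree: str, parents: List[str], message: str) -> str:
    """Hash a commit built from a tree, zero or more parents, and a message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {IDENT}", f"committer {IDENT}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# Original history: root <- child
root = commit_sha(TREE, [], "Initial commit")
child = commit_sha(TREE, [root], "Add README")

# Same history, but with the root's message reworded
edited_root = commit_sha(TREE, [], "Initial commit (reworded)")
edited_child = commit_sha(TREE, [edited_root], "Add README")

print(root != edited_root)    # True: the edited commit gets a new hash
print(child != edited_child)  # True: so does its descendant, via the parent line
```

The child's own tree, message, and author are identical in both histories; only its parent line differs, and that alone is enough to change its hash.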
commit-tree is a lower-level command that creates a single commit object.
# 1. Get the SHA of the root tree of our project
$ mygit write-tree
e9f5068de1389c935613861234950346c1e55041
# 2. Get the SHA of the parent commit (optional, for the first commit there is no parent)
$ cat .git/refs/heads/main
f1a2b3c...
# 3. Create the commit object
$ mygit commit-tree e9f5068... -p f1a2b3c... -m "Add project structure"
a4b5c6d...
The handleCommitTree function in src/commands/commit_tree.cpp assembles the commit's plain-text content:
tree e9f5068de1389c935613861234950346c1e55041
parent f1a2b3c...
author Mathis-L <mathislafon@gmail.com> 1720752000 +0200
committer Mathis-L <mathislafon@gmail.com> 1720752000 +0200

Add project structure
It then prepends the commit <size>\0 header and calls writeGitObject to save it, returning the new commit's SHA-1.
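For comparison, here is a Python sketch of the same assembly. It is not a translation of commit_tree.cpp: `format_ident` and `build_commit` are hypothetical names, the identity is passed in rather than read from configuration, and the object is returned instead of being written to disk.

```python
import hashlib
from typing import List, Tuple

def format_ident(name: str, email: str, timestamp: int, tz_offset_minutes: int) -> str:
    """Format '<Name> <email> <timestamp> <timezone>' as used on author/committer lines."""
    sign = "+" if tz_offset_minutes >= 0 else "-"
    hours, minutes = divmod(abs(tz_offset_minutes), 60)
    return f"{name} <{email}> {timestamp} {sign}{hours:02d}{minutes:02d}"

def build_commit(tree_sha: str, parent_shas: List[str], message: str,
                 ident: str) -> Tuple[str, bytes]:
    """Return (hex SHA-1, uncompressed object bytes) for a new commit."""
    lines = [f"tree {tree_sha}"]
    lines += [f"parent {p}" for p in parent_shas]  # none for the first commit
    lines += [f"author {ident}", f"committer {ident}"]
    lines += ["", message.rstrip("\n")]            # blank line, then the message
    content = ("\n".join(lines) + "\n").encode("utf-8")
    obj = b"commit " + str(len(content)).encode("ascii") + b"\0" + content
    return hashlib.sha1(obj).hexdigest(), obj

ident = format_ident("Alice Example", "alice@example.com", 1720752000, 120)
sha, obj = build_commit("e9f5068de1389c935613861234950346c1e55041", [],
                        "Add project structure", ident)
print(sha)
print(obj.split(b"\0", 1)[1].decode("utf-8"))
```

Passing one SHA in `parent_shas` produces an ordinary commit and passing two produces a merge commit; the only difference in the object is the number of parent lines.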