Commit 2140697

Merge branch 'je/doc-data-model' into seen
Add a new manual that describes the data model.

* je/doc-data-model:
  SQUASH??? work around AsciiDoc xml that does not validate
  doc: add a explanation of Git's data model
2 parents 0d40db4 + ea4d919 commit 2140697

4 files changed

Lines changed: 287 additions & 2 deletions

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ MAN7_TXT += gitcli.adoc
 MAN7_TXT += gitcore-tutorial.adoc
 MAN7_TXT += gitcredentials.adoc
 MAN7_TXT += gitcvs-migration.adoc
+MAN7_TXT += gitdatamodel.adoc
 MAN7_TXT += gitdiffcore.adoc
 MAN7_TXT += giteveryday.adoc
 MAN7_TXT += gitfaq.adoc

Documentation/gitdatamodel.adoc

Lines changed: 283 additions & 0 deletions
@@ -0,0 +1,283 @@
gitdatamodel(7)
===============

NAME
----
gitdatamodel - Git's core data model

SYNOPSIS
--------
gitdatamodel

DESCRIPTION
-----------

It's not necessary to understand Git's data model to use Git, but it's
very helpful when reading Git's documentation so that you know what it
means when the documentation says "object", "reference" or "index".

Git's core operations use 4 kinds of data:

1. <<objects,Objects>>: commits, trees, blobs, and tag objects
2. <<references,References>>: branches, tags,
remote-tracking branches, etc.
3. <<index,The index>>, also known as the staging area
4. <<reflogs,Reflogs>>: logs of changes to references ("ref log")

[[objects]]
OBJECTS
-------

Commits, trees, blobs, and tag objects are all stored in Git's object database.
Every object has:

[[object-id]]
1. an *ID* (aka "object name"), which is a cryptographic hash of its
type and contents.
It's fast to look up a Git object using its ID.
This is usually represented in hexadecimal, like
`1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
2. a *type*. There are 4 types of objects:
<<commit,commits>>, <<tree,trees>>, <<blob,blobs>>,
and <<tag-object,tag objects>>.
3. *contents*. The structure of the contents depends on the type.
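
As a rough illustration of how an ID depends on both type and contents, the sketch below hashes a small header (`<type> <size>` followed by a NUL byte) together with the contents. It assumes a repository that uses SHA-1 object IDs; the function name `git_object_id` is made up for this example and is not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, contents: bytes) -> str:
    """Hash the object type, content length, and contents (SHA-1 repositories)."""
    # Assumption: the hashed data is "<type> <size>\0" followed by the contents.
    header = f"{obj_type} {len(contents)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + contents).hexdigest()


# Two blobs with different contents get different IDs.
empty_blob_id = git_object_id("blob", b"")
hello_blob_id = git_object_id("blob", b"hello\n")
```

Because the type is part of the hashed data, a blob and a tree with identical bytes would still get different IDs.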

Once an object is created, it can never be changed.
Here are the 4 types of objects:

[[commit]]
commits::
A commit contains these required fields
(though there are other optional fields):
+
1. All the *files* in the commit, stored as the *<<tree,tree>>* ID of
the commit's base directory.
2. Its *parent commit ID(s)*. The first commit in a repository has 0 parents,
regular commits have 1 parent, merge commits have 2 or more parents.
3. An *author* and the time the commit was authored
4. A *committer* and the time the commit was committed.
If you cherry-pick (linkgit:git-cherry-pick[1]) someone else's commit,
then they will be the author and you'll be the committer.
5. A *commit message*
+
Here's how an example commit is stored:
+
----
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
----
+
Like all other objects, commits can never be changed after they're created.
For example, "amending" a commit with `git commit --amend` creates a new
commit with the same parent.
+
Git does not store the diff for a commit: when you ask Git for a
diff it calculates it on the fly.
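
To make the field layout of the example commit concrete, here is a minimal sketch that splits that text into its header fields and message. `parse_commit` is a hypothetical helper, not a Git API, and it only handles the single-line required fields shown above; optional multi-line fields are out of scope.

```python
def parse_commit(text: str) -> dict:
    """Split `git cat-file -p` commit output into header fields and message."""
    # Header lines come first; the first blank line separates them from the message.
    head, _, message = text.partition("\n\n")
    fields = {"parents": []}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)  # 0, 1, or several parents
        else:
            fields[key] = value
    fields["message"] = message
    return fields


example = (
    "tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a\n"
    "parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647\n"
    "author Maya <maya@example.com> 1759173425 -0400\n"
    "committer Maya <maya@example.com> 1759173425 -0400\n"
    "\n"
    "Add README\n"
)
commit = parse_commit(example)
```

Keeping parents in a list reflects that a commit can have zero, one, or several of them.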

[[tree]]
trees::
A tree is how Git represents a directory. It lists, for each item in
the tree:
+
1. The *file mode*, for example `100644`.
+
[[file-mode]]
The format is inspired by Unix
permissions, but Git's modes are much more limited. Git only supports these file modes:
+
- `100644`: regular file (with type `blob`)
- `100755`: executable file (with type `blob`)
- `120000`: symbolic link (with type `blob`)
- `040000`: directory (with type `tree`)
- `160000`: gitlink, for use with submodules (with type `commit`)

2. The *type*: either <<blob,`blob`>> (a file), `tree` (a directory),
or <<commit,`commit`>> (a Git submodule, which is a
commit from a different Git repository)
3. The <<object-id,*object ID*>>
4. The *filename*
+
For example, this is how a tree containing one directory (`src`) and one file
(`README.md`) is stored:
+
----
100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
----
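
The four fields of a tree entry, and the link between file mode and type, can be sketched as follows. This parses the pretty-printed listing shown above rather than Git's binary tree format; `MODE_TO_TYPE` and `parse_tree_listing` are illustrative names, not part of Git.

```python
# File modes listed above, mapped to the object type each one implies.
MODE_TO_TYPE = {
    "100644": "blob",
    "100755": "blob",
    "120000": "blob",
    "040000": "tree",
    "160000": "commit",
}


def parse_tree_listing(text: str) -> list:
    """Parse pretty-printed tree lines: '<mode> <type> <object ID> <filename>'."""
    entries = []
    for line in text.splitlines():
        # Split on the first three runs of whitespace; the rest is the filename.
        mode, obj_type, object_id, filename = line.split(None, 3)
        if MODE_TO_TYPE.get(mode) != obj_type:
            raise ValueError(f"mode {mode} does not match type {obj_type}")
        entries.append((mode, obj_type, object_id, filename))
    return entries


listing = (
    "100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md\n"
    "040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src\n"
)
entries = parse_tree_listing(listing)
```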


[[blob]]
blobs::
A blob is how Git represents a file. A blob object contains the
file's contents.
+
When you make a new commit, Git only needs to store new versions of
files which were changed in that commit. This means that commits
can use relatively little disk space even in a very large repository.
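
One way to see why unchanged files cost nothing extra: if blobs are keyed by a hash of their contents, a file that did not change maps to an ID that is already stored. The sketch below models this with a plain dictionary; `blob_id`, `snapshot`, and the in-memory `store` are assumptions for illustration and say nothing about how Git lays out objects on disk.

```python
import hashlib


def blob_id(contents: bytes) -> str:
    # Assumption: same hashing scheme as blobs in a SHA-1 repository.
    return hashlib.sha1(b"blob %d\x00" % len(contents) + contents).hexdigest()


def snapshot(store: dict, files: dict) -> dict:
    """Store each file's contents by ID and return a {path: blob ID} mapping."""
    ids = {}
    for path, contents in files.items():
        oid = blob_id(contents)
        store.setdefault(oid, contents)  # unchanged contents are not stored again
        ids[path] = oid
    return ids


store = {}
first = snapshot(store, {"README.md": b"hello\n", "src/hello.py": b"print('hi')\n"})
second = snapshot(store, {"README.md": b"hello, world\n", "src/hello.py": b"print('hi')\n"})
```

After both snapshots, `store` holds three blobs rather than four, because `src/hello.py` did not change.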

[[tag-object]]
tag objects::
Tag objects contain these required fields
(though there are other optional fields):
+
1. The *ID* and *type* of the object (often a commit) that they reference
2. The *tagger* and tag date
3. A *tag message*, similar to a commit message

Here's how an example tag object is stored:

----
object 750b4ead9c87ceb3ddb7a390e6c7074521797fb3
type commit
tag v1.0.0
tagger Maya <maya@example.com> 1759927359 -0400

Release version 1.0.0
----

NOTE: All of the examples in this section were generated with
`git cat-file -p <object-id>`, which shows the contents of a Git object.

[[references]]
REFERENCES
----------

References are a way to give a name to a commit.
It's easier to remember "the changes I'm working on are on the `turtle`
branch" than "the changes are in commit bb69721404348e".
Git often uses "ref" as shorthand for "reference".

References can either be:

1. References to an object ID, usually a <<commit,commit>> ID
2. References to another reference. This is called a "symbolic reference".

References are stored in a hierarchy, and Git handles references
differently based on where they are in the hierarchy.
Most references are under `refs/`. Here are the main types:

[[branch]]
branches: `refs/heads/<name>`::
A branch is a name for a commit ID.
That commit is the latest commit on the branch.
+
To get the history of commits on a branch, Git will start at the commit
ID the branch references, and then look at the commit's parent(s),
the parent's parent, etc.
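
The walk described above can be sketched over a toy history. The commit IDs `c1` to `c3` and the `PARENTS` mapping are invented for the example, and the traversal only shows reachability; it does not try to reproduce the order in which Git would list commits.

```python
# Hypothetical in-memory history: commit ID -> list of parent IDs.
PARENTS = {
    "c3": ["c2"],
    "c2": ["c1"],
    "c1": [],  # the first commit has no parents
}


def history(start: str, parents: dict) -> list:
    """List every commit reachable from `start` by following parent links."""
    seen, order, pending = set(), [], [start]
    while pending:
        commit_id = pending.pop()
        if commit_id in seen:
            continue  # merge commits can reach the same ancestor twice
        seen.add(commit_id)
        order.append(commit_id)
        pending.extend(parents[commit_id])
    return order


# A branch pointing at "c3" has "c3", "c2", and "c1" in its history.
branch_history = history("c3", PARENTS)
```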

[[tag]]
tags: `refs/tags/<name>`::
A tag is a name for a commit ID, tag object ID, or other object ID.
Tags that reference a tag object ID are called "annotated tags",
because the tag object contains a tag message.
Tags that reference a commit, blob, or tree ID are
called "lightweight tags".
+
Even though branches and tags are both "a name for a commit ID", Git
treats them very differently.
Branches are expected to change over time: when you make a commit, Git
will update your <<HEAD,current branch>> to reference the new changes.
Tags are usually not changed after they're created.

[[HEAD]]
HEAD: `HEAD`::
`HEAD` is where Git stores your current <<branch,branch>>.
`HEAD` can either be:
1. A symbolic reference to your current branch, for example `ref:
refs/heads/main` if your current branch is `main`.
2. A direct reference to a commit ID. This is called "detached HEAD
state", see the DETACHED HEAD section of linkgit:git-checkout[1] for more.
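
The two forms `HEAD` can take can be told apart with a few lines of code. This is a minimal sketch that assumes `HEAD` is stored as a plain text file, as in the `ref: refs/heads/main` example above; `describe_head` is a hypothetical helper.

```python
def describe_head(head_contents: str) -> tuple:
    """Classify the contents of a HEAD file as symbolic or detached."""
    value = head_contents.strip()
    if value.startswith("ref: "):
        # Symbolic reference: HEAD names another reference.
        return ("symbolic", value[len("ref: "):])
    # Otherwise HEAD holds a commit ID directly (detached HEAD state).
    return ("detached", value)


on_branch = describe_head("ref: refs/heads/main\n")
detached = describe_head("4ccb6d7b8869a86aae2e84c56523f8705b50c647\n")
```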

[[remote-tracking-branch]]
remote-tracking branches: `refs/remotes/<remote>/<branch>`::
A remote-tracking branch is a name for a commit ID.
It's how Git stores the last-known state of a branch in a remote
repository. `git fetch` updates remote-tracking branches. When
`git status` says "you're up to date with origin/main", it's looking at
this.
+
`refs/remotes/<remote>/HEAD` is a symbolic reference to the remote's
default branch. This is the branch that `git clone` checks out by default.

[[other-refs]]
Other references::
Git tools may create references anywhere under `refs/`.
For example, linkgit:git-stash[1], linkgit:git-bisect[1],
and linkgit:git-notes[1] all create their own references
in `refs/stash`, `refs/bisect`, etc.
Third-party Git tools may also create their own references.
+
Git may also create references other than `HEAD` at the base of the
hierarchy, like `ORIG_HEAD`.
+
NOTE: By default, Git references are stored as files in the `.git` directory.
For example, the branch `main` is stored in `.git/refs/heads/main`.
This means that you can't have branches named both `maya` and `maya/some-task`,
because there can't be a file and a directory with the same name.
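
The file-versus-directory clash in the note above comes down to a prefix check on names. The sketch below is only a model of that rule under the default file-based storage; `conflicts` is an invented helper and Git's own validation is not shown here.

```python
def conflicts(existing: str, new: str) -> bool:
    """Return True if one branch name would need the other's file as a directory."""
    return new.startswith(existing + "/") or existing.startswith(new + "/")


# `maya` is a file, so `maya/some-task` would need `maya` to be a directory.
maya_conflict = conflicts("maya", "maya/some-task")
# A name that merely shares a prefix is fine.
no_conflict = conflicts("maya", "maya-some-task")
```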

[[index]]
THE INDEX
---------

The index, also known as the "staging area", contains a list of every
file in the repository and its contents. When you commit, the files in
the index are used as the files in the next commit.

You can add files to the index or update the version in the index with
linkgit:git-add[1]. Adding a file to the index or updating its version
is called "staging" the file for commit.

Unlike a <<tree,tree>>, the index is a flat list of files.
Each index entry has 4 fields:

1. The *<<file-mode,file mode>>*
2. The *<<blob,blob>> ID* of the file
3. The *file path*, for example `src/hello.py`
4. The *stage number*, either 0, 1, 2, or 3. This is normally 0, but if
there's a merge conflict there can be multiple versions of the same
filename in the index.

It's extremely uncommon to look at the index directly: normally you'd
run `git status` to see a list of changes between the index and <<HEAD,HEAD>>.
But you can use `git ls-files --stage` to see the index.
Here's the output of `git ls-files --stage` in a repository with 2 files:

----
100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
----
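
To show how the stage number lets one path appear more than once, the sketch below groups the listing above by path and stage. It reads the text output only, not the index file itself; `parse_stage_listing` is a hypothetical helper.

```python
from collections import defaultdict


def parse_stage_listing(text: str) -> dict:
    """Group '<mode> <blob ID> <stage> <path>' lines by path."""
    by_path = defaultdict(dict)
    for line in text.splitlines():
        mode, blob_id, stage, path = line.split(None, 3)
        # Several stages for one path would indicate a merge conflict.
        by_path[path][int(stage)] = (mode, blob_id)
    return dict(by_path)


listing = (
    "100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md\n"
    "100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py\n"
)
index = parse_stage_listing(listing)
# Paths whose entries are not just stage 0 are in conflict; none are here.
unmerged = [path for path, stages in index.items() if set(stages) != {0}]
```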

[[reflogs]]
REFLOGS
-------

Git stores a history called a "reflog" for every branch, remote-tracking
branch, and HEAD. This means that if you make a mistake and "lose" a
commit, you can generally recover the commit ID by running
`git reflog <reference>`.

Each reflog entry has:

1. Before/after *commit IDs*
2. *User* who made the change, for example `Maya <maya@example.com>`
3. *Timestamp* when the change was made
4. *Log message*, for example `pull: Fast-forward`

Reflogs only log changes made in your local repository.
They are not shared with remotes.

For example, here's how the reflog for `HEAD` in a repository with 2
commits is stored:

----
0000000000000000000000000000000000000000 4ccb6d7b8869a86aae2e84c56523f8705b50c647 Maya <maya@example.com> 1759173408 -0400 commit (initial): Initial commit
4ccb6d7b8869a86aae2e84c56523f8705b50c647 750b4ead9c87ceb3ddb7a390e6c7074521797fb3 Maya <maya@example.com> 1759173425 -0400 commit: Add README
----
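
The fields of a reflog entry can be pulled out of one stored line with a regular expression. This is a sketch based only on the two example lines above; `REFLOG_LINE` and `parse_reflog_line` are illustrative names, and the pattern accepts any whitespace before the message as an assumption about the separator.

```python
import re

# old ID, new ID, "Name <email>", Unix timestamp, timezone offset, message
REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) "
    r"(?P<user>.*? <[^>]*>) (?P<time>\d+) (?P<tz>[+-]\d{4})\s+(?P<message>.*)$"
)


def parse_reflog_line(line: str) -> dict:
    match = REFLOG_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized reflog line: {line!r}")
    entry = match.groupdict()
    entry["time"] = int(entry["time"])  # seconds since the Unix epoch
    return entry


line = (
    "4ccb6d7b8869a86aae2e84c56523f8705b50c647 "
    "750b4ead9c87ceb3ddb7a390e6c7074521797fb3 "
    "Maya <maya@example.com> 1759173425 -0400 commit: Add README"
)
entry = parse_reflog_line(line)
```

The `old` field of one entry matches the `new` field of the entry before it, which is what makes a "lost" commit ID recoverable.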

GIT
---
Part of the linkgit:git[1] suite

Documentation/glossary-content.adoc

Lines changed: 2 additions & 2 deletions
@@ -297,8 +297,8 @@ This commit is referred to as a "merge commit", or sometimes just a
 identified by its <<def_object_name,object name>>. The objects usually
 live in `$GIT_DIR/objects/`.
 
-[[def_object_identifier]]object identifier (oid)::
-Synonym for <<def_object_name,object name>>.
+[[def_object_identifier]]object identifier, object ID, oid::
+Synonyms for <<def_object_name,object name>>.
 
 [[def_object_name]]object name::
 The unique identifier of an <<def_object,object>>. The

Documentation/meson.build

Lines changed: 1 addition & 0 deletions
@@ -193,6 +193,7 @@ manpages = {
 'gitcore-tutorial.adoc' : 7,
 'gitcredentials.adoc' : 7,
 'gitcvs-migration.adoc' : 7,
+'gitdatamodel.adoc' : 7,
 'gitdiffcore.adoc' : 7,
 'giteveryday.adoc' : 7,
 'gitfaq.adoc' : 7,
