Summary 💡
Tracking a minimal implementation of partial clones and their corresponding promisor objects/remotes, as discussed in #1041.
What is a partial clone?
Partial clone is a recent(ish) feature of git that allows the client to fetch a subset of objects from a remote repository based on criteria (i.e. a "filter") of its choosing. The missing objects are referred to as "promisor objects", and "promisor remotes" are expected to provide them on demand after the clone, as needed.
The most common use cases for partial clone are those where the client requests a clone with either no historical blobs (e.g. --filter=blob:none) or only historical blobs under some size threshold (e.g. --filter=blob:limit=512k). Tree objects can also be filtered by a partial clone, but that use case is far less common.
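To make the filter forms concrete, here is a minimal sketch of how a client might parse them. It assumes git's filter-spec syntax (blob:none, blob:limit=<n>[kmg], tree:<depth>); the names ObjectFilter, parse_filter and parse_size are hypothetical and not part of any existing API.

```rust
/// Hypothetical representation of the object filters discussed above.
#[derive(Debug, PartialEq, Eq)]
pub enum ObjectFilter {
    /// `blob:none`: omit all blobs.
    BlobNone,
    /// `blob:limit=<n>[kmg]`: omit blobs of at least this many bytes.
    BlobLimit(u64),
    /// `tree:<depth>`: omit trees and blobs at or beyond this depth.
    TreeDepth(u64),
}

/// Parse a filter spec, returning `None` for anything this sketch does not handle.
pub fn parse_filter(spec: &str) -> Option<ObjectFilter> {
    if spec == "blob:none" {
        return Some(ObjectFilter::BlobNone);
    }
    if let Some(limit) = spec.strip_prefix("blob:limit=") {
        return parse_size(limit).map(ObjectFilter::BlobLimit);
    }
    if let Some(depth) = spec.strip_prefix("tree:") {
        return depth.parse().ok().map(ObjectFilter::TreeDepth);
    }
    None
}

/// Parse a byte count with an optional k/m/g suffix (powers of 1024).
fn parse_size(text: &str) -> Option<u64> {
    let (digits, multiplier) = match text.char_indices().last()? {
        (idx, 'k') | (idx, 'K') => (&text[..idx], 1024u64),
        (idx, 'm') | (idx, 'M') => (&text[..idx], 1024 * 1024),
        (idx, 'g') | (idx, 'G') => (&text[..idx], 1024 * 1024 * 1024),
        _ => (text, 1),
    };
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```

Other filter forms (such as sparse or combined filters) are deliberately left out and parse to None here.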
Lessons learned from git
Because partial clone was retrofitted into git, there are several performance gaps that have not yet been resolved. Operations like fetch and checkout behave as one would expect: the missing objects are fetched in a single transaction with the remote. Other operations, such as blame and rebase, do not do this and instead lazily fetch missing objects one at a time (each in a separate transaction with the remote), which slows them down significantly.
To implement partial clones efficiently, operations that traverse history and require inspecting the contents of blobs and trees need to:
- Determine the set of object IDs needed by the operation (typically by walking a commit graph)
- Fetch any missing objects in a single transaction to the remote
- Continue with their "business logic"
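The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed API: ObjectId, PromisorRemote, LocalOdb and prefetch_missing are all hypothetical names, object IDs are plain integers, and the "remote" is an in-memory map that counts round trips.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for an object hash (assumption: a real implementation uses a proper ID type).
pub type ObjectId = u32;

/// Hypothetical promisor remote that records how many transactions were made.
pub struct PromisorRemote {
    objects: HashMap<ObjectId, Vec<u8>>,
    pub round_trips: usize,
}

impl PromisorRemote {
    pub fn new(objects: HashMap<ObjectId, Vec<u8>>) -> Self {
        Self { objects, round_trips: 0 }
    }

    /// Fetch every requested object in a single transaction.
    pub fn fetch(&mut self, ids: &BTreeSet<ObjectId>) -> Vec<(ObjectId, Vec<u8>)> {
        self.round_trips += 1;
        ids.iter()
            .filter_map(|id| self.objects.get(id).map(|data| (*id, data.clone())))
            .collect()
    }
}

/// Hypothetical local object database of a partial clone.
#[derive(Default)]
pub struct LocalOdb {
    objects: HashMap<ObjectId, Vec<u8>>,
}

impl LocalOdb {
    pub fn contains(&self, id: ObjectId) -> bool {
        self.objects.contains_key(&id)
    }

    pub fn insert(&mut self, id: ObjectId, data: Vec<u8>) {
        self.objects.insert(id, data);
    }

    pub fn get(&self, id: ObjectId) -> Option<&[u8]> {
        self.objects.get(&id).map(|data| data.as_slice())
    }
}

/// Steps 1 and 2: given the IDs an operation needs, fetch the missing ones
/// in one batch. Returns the number of objects that were fetched.
pub fn prefetch_missing(
    odb: &mut LocalOdb,
    remote: &mut PromisorRemote,
    needed: &[ObjectId],
) -> usize {
    let missing: BTreeSet<ObjectId> = needed
        .iter()
        .copied()
        .filter(|id| !odb.contains(*id))
        .collect();
    if missing.is_empty() {
        return 0; // nothing to fetch, so no transaction at all
    }
    let fetched = remote.fetch(&missing);
    let count = fetched.len();
    for (id, data) in fetched {
        odb.insert(id, data);
    }
    count
}
```

After prefetch_missing returns, the operation's "business logic" (step 3) can read objects through LocalOdb::get without touching the remote again. A lazy implementation would instead call fetch once per missing object, so the round-trip count would grow with the number of missing objects.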
That said, this feature does not aim to implement the optimized approach to partial clones across the board. However, we would like to see APIs designed to facilitate the optimized approach, and possibly one implementation of it to serve as a reference and as proof that things can be made to work as expected.
Tasks
[...] gix fsck connectivity) [...] remote.<name>.partialclonefilter [...] remote.<name>.promisor [...] true if a partial clone filter was provided [...] gix CLI [...] git [...] partialclonefilter and promisor are set appropriately [...] promisor and partialclonefilter config settings on the remotes [...] promisor pack