15 changes: 12 additions & 3 deletions Documentation/git-backfill.adoc
@@ -9,7 +9,7 @@ git-backfill - Download missing objects in a partial clone
SYNOPSIS

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 4/15/2026 7:58 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> 302aff09223f (backfill: accept revision arguments, 2026-03-26) added
> support for passing revision arguments to 'git backfill' but documented
> them only with a prose sentence:
> 
>     You may also specify the commit limiting options from
>     git-rev-list(1).
> 
> No other command that accepts revision arguments documents them this
> way.  Commands like log, shortlog, and replay define a formal
> <revision-range> entry and include rev-list-options.adoc.  Commands like
> bundle, fast-export, and filter-branch, which pass arguments through to
> the revision machinery without including the full options file, still
> define a formal <git-rev-list-args> entry explaining what is accepted.
> 
> Add a formal <revision-range> entry in the synopsis and OPTIONS section,
> following the convention used by other commands, and mention that
> commit-limiting options from git-rev-list(1) are also accepted.

Thanks for your attention to detail here. I like this version.

-Stolee

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 4/15/2026 7:58 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>

> Add an extra --[no-]include-edges flag to allow grabbing blobs from
> edge commits.  Since the point of backfill is to prevent on-demand blob
> loading and these are common commands, default to --include-edges.

I like this option and your motivation for including it.

> @@ -116,6 +117,8 @@ static int do_backfill(struct backfill_context *ctx)
>  	/* Walk from HEAD if otherwise unspecified. */
>  	if (!ctx->revs.pending.nr)
>  		add_head_to_pending(&ctx->revs);
> +	if (ctx->include_edges)
> +		ctx->revs.edge_hint = 1;

This would still work if...

>  		.revs = REV_INFO_INIT,
> +		.include_edges = 1,

...this was initialized to -1 to allow for "no user option".

We don't need this change unless we were deciding to make a
config option that specified a different default. That seems
like overkill right now, so this doesn't need a change. Just
something that I like to think about.

I also like how your tests don't just verify the backfill
behavior but the ultimate behavior of 'git log' and friends
after the fact.

Thanks,
-Stolee
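
The tri-state idea in the comment above can be sketched in isolation. This is a minimal, self-contained illustration and not code from the patch: `OPT_UNSET`, `resolve_tristate()`, and the config-default parameter are hypothetical names, and the series does not add any such config option. With `-1` as the initial value, the existing `if (ctx->include_edges)` check would still enable edges, because any non-zero value is true in C.

```c
#include <assert.h>

/* Hypothetical sentinel: the option was not given on the command line. */
#define OPT_UNSET (-1)

/*
 * Resolve a tri-state boolean option.
 *   cli_value:       OPT_UNSET, 0 (--no-include-edges) or 1 (--include-edges)
 *   config_default:  OPT_UNSET, 0 or 1 (a hypothetical config setting)
 *   builtin_default: 0 or 1 (the hard-coded default)
 * The command line wins over config; config wins over the built-in default.
 */
static int resolve_tristate(int cli_value, int config_default,
			    int builtin_default)
{
	assert(cli_value >= OPT_UNSET && cli_value <= 1);
	assert(config_default >= OPT_UNSET && config_default <= 1);
	assert(builtin_default == 0 || builtin_default == 1);

	if (cli_value != OPT_UNSET)
		return cli_value;
	if (config_default != OPT_UNSET)
		return config_default;
	return builtin_default;
}
```

Under these assumptions, a caller would initialize the field to `OPT_UNSET`, parse options, and then call `resolve_tristate()` once before the walk. As the review notes, this only pays off if a configurable default is ever introduced.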

--------
[synopsis]
-git backfill [--min-batch-size=<n>] [--[no-]sparse]
+git backfill [--min-batch-size=<n>] [--[no-]sparse] [<revision-range>]

DESCRIPTION
-----------
@@ -43,7 +43,7 @@ smaller network calls than downloading the entire repository at clone
time.

By default, `git backfill` downloads all blobs reachable from the `HEAD`
-commit. This set can be restricted or expanded using various options.
+commit. This set can be restricted or expanded using various options below.

THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR MAY CHANGE IN THE FUTURE.

@@ -63,7 +63,16 @@ OPTIONS
current sparse-checkout. If the sparse-checkout feature is enabled,
then `--sparse` is assumed and can be disabled with `--no-sparse`.

-You may also specify the commit limiting options from linkgit:git-rev-list[1].
+`<revision-range>`::
+Backfill only blobs reachable from commits in the specified
+revision range. When no _<revision-range>_ is specified, it
+defaults to `HEAD` (i.e. the whole history leading to the
+current commit). For a complete list of ways to spell
+_<revision-range>_, see the "Specifying Ranges" section of
+linkgit:gitrevisions[7].
++
+You may also use commit-limiting options understood by
+linkgit:git-rev-list[1] such as `--first-parent`, `--since`, or pathspecs.

SEE ALSO
--------
25 changes: 24 additions & 1 deletion builtin/backfill.c
@@ -26,7 +26,7 @@
#include "path-walk.h"

static const char * const builtin_backfill_usage[] = {
-	N_("git backfill [--min-batch-size=<n>] [--[no-]sparse]"),
+	N_("git backfill [--min-batch-size=<n>] [--[no-]sparse] [<revision-range>]"),
NULL
};

@@ -78,6 +78,28 @@ static int fill_missing_blobs(const char *path UNUSED,
return 0;

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 4/15/2026 7:58 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Some rev-list options accepted by setup_revisions() are silently
> ignored or actively counterproductive when used with 'git backfill',
> because the path-walk API has its own tree-walking logic that bypasses
> the mechanisms these options rely on:
> 
>   * -S/-G (pickaxe) and --diff-filter work by computing per-commit
>     diffs in get_revision_1() and filtering commits whose diffs don't
>     match.  Since backfill's goal is to download all blobs reachable
>     from commits in the range, filtering out commits based on diff
>     content would silently skip blobs -- the opposite of what users
>     want.
> 
>   * --follow disables path pruning (revs->prune) and only makes
>     sense for tracking a single file through renames in log output.
>     It has no useful interaction with backfill.
> 
>   * -L (line-log) computes line-level diffs to track the evolution
>     of a function or line range.  Like pickaxe, it filters commits
>     based on diff content, which would cause blobs to be silently
>     skipped.

I think these make a lot of sense, especially because these
computations require downloading missing blobs in order to find
the diffs that justify some of the choices of commit filtering.
 
>   * --diff-merges controls how merge commit diffs are displayed.
>     The path-walk API walks trees directly and never computes
>     per-commit diffs, so this option would be silently ignored.

I think there are a few other "format" based options that were
silently ignored on purpose, because there's no output. Perhaps
we should change the use of options like this to a warning instead
of a failure?

>   * --filter (object filtering, e.g. --filter=blob:none) is used by
>     the list-objects traversal but is completely ignored by the
>     path-walk API, so it would silently do nothing.

This is correct to remove because while it doesn't work with
path-walk right now, it might in the future. We don't want the
filter to mess with the functionality of 'git backfill' that sets
its own scope for which blobs to download.

> Rather than letting users think these options are being honored,
> reject them with a clear error message.

I agree that the majority of these should be hard failures. As
mentioned, some could be soft warnings. That could be an
adjustment to make in the future, so is not blocking for this
patch.

> +static void reject_unsupported_rev_list_options(struct rev_info *revs)
> +{
> +	if (revs->diffopt.pickaxe)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    (revs->diffopt.pickaxe_opts & DIFF_PICKAXE_REGEX) ? "-G" : "-S");
> +	if (revs->diffopt.filter || revs->diffopt.filter_not)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "--diff-filter");
> +	if (revs->diffopt.flags.follow_renames)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "--follow");
> +	if (revs->line_level_traverse)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "-L");
> +	if (revs->explicit_diff_merges)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "--diff-merges");
> +	if (revs->filter.choice)
> +		die(_("'%s' cannot be used with 'git backfill'"),
> +		    "--filter");
> +}
> +

My only nit-pick suggestion is to make the translated string a
macro so it can be more obvious that it is repeated exactly.

Thanks,
-Stolee
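
The two suggestions in this review (a macro for the repeated message, and warnings rather than failures for some options) can be sketched together. This is a standalone illustration under stated assumptions, not the patch's code: `BACKFILL_UNSUPPORTED_OPT`, `classify_option()`, `format_rejection()`, and the table are hypothetical, `_()` is a stand-in for Git's gettext wrapper, and marking `--diff-merges` as a warning is only one reading of the review comment. The real function inspects parsed `struct rev_info` fields rather than option names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for Git's gettext wrapper; here it returns the string unchanged. */
#define _(s) (s)

/* Hypothetical macro name; the format string is the one used in the patch. */
#define BACKFILL_UNSUPPORTED_OPT _("'%s' cannot be used with 'git backfill'")

enum reject_kind { OPT_ALLOWED, OPT_WARN, OPT_ERROR };

struct rejected_option {
	const char *name;
	enum reject_kind kind;
};

/*
 * Illustrative table only. Treating --diff-merges as a warning is an
 * assumption drawn from the review; the patch dies for every entry.
 */
static const struct rejected_option rejected_options[] = {
	{ "-S", OPT_ERROR },
	{ "-G", OPT_ERROR },
	{ "--diff-filter", OPT_ERROR },
	{ "--follow", OPT_ERROR },
	{ "-L", OPT_ERROR },
	{ "--diff-merges", OPT_WARN },
	{ "--filter", OPT_ERROR },
};

/* Look up how an option name should be treated. */
static enum reject_kind classify_option(const char *name)
{
	size_t i;
	size_t nr = sizeof(rejected_options) / sizeof(rejected_options[0]);

	assert(name);
	for (i = 0; i < nr; i++)
		if (!strcmp(rejected_options[i].name, name))
			return rejected_options[i].kind;
	return OPT_ALLOWED;
}

/* Format the shared message into buf; returns snprintf()'s result. */
static int format_rejection(char *buf, size_t len, const char *name)
{
	assert(buf && name);
	return snprintf(buf, len, BACKFILL_UNSUPPORTED_OPT, name);
}
```

Keeping the format string in one macro means every call site produces the same translatable message, and a severity column leaves room to downgrade individual options later without touching the wording.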

}

+static void reject_unsupported_rev_list_options(struct rev_info *revs)
+{
+	if (revs->diffopt.pickaxe)
+		die(_("'%s' cannot be used with 'git backfill'"),
+		    (revs->diffopt.pickaxe_opts & DIFF_PICKAXE_REGEX) ? "-G" : "-S");
+	if (revs->diffopt.filter || revs->diffopt.filter_not)
+		die(_("'%s' cannot be used with 'git backfill'"),
+		    "--diff-filter");
+	if (revs->diffopt.flags.follow_renames)
+		die(_("'%s' cannot be used with 'git backfill'"),
+		    "--follow");
+	if (revs->line_level_traverse)
+		die(_("'%s' cannot be used with 'git backfill'"),
+		    "-L");
+	if (revs->explicit_diff_merges)
+		die(_("'%s' cannot be used with 'git backfill'"),
+		    "--diff-merges");
+	if (revs->filter.choice)
+		die(_("'%s' cannot be used with 'git backfill'"),
+		    "--filter");
+}
+
static int do_backfill(struct backfill_context *ctx)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
@@ -144,6 +166,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit

if (argc > 1)
die(_("unrecognized argument: %s"), argv[1]);
+	reject_unsupported_rev_list_options(&ctx.revs);

repo_config(repo, git_default_config, NULL);
