Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion Documentation/git-backfill.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -63,9 +63,12 @@ OPTIONS
current sparse-checkout. If the sparse-checkout feature is enabled,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -	repo_init_revisions(ctx->repo, &revs, "");
> -	handle_revision_arg("HEAD", &revs, 0, 0);

So we used to "cheat" and did an initialization without even knowing
in which directory we were started ...

> +	/* Walk from HEAD if otherwise unspecified. */
> +	if (!ctx->revs.pending.nr)
> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);

... but by initializing the revs correctly in the caller, we would
be correcting it.  Looking good.

> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  					 builtin_backfill_usage, options);
>  
>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
> -			     0);
> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
> +			     PARSE_OPT_KEEP_ARGV0 |
> +			     PARSE_OPT_KEEP_DASHDASH);
> +
> +	repo_init_revisions(repo, &ctx.revs, prefix);
> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);

OK.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Kristoffer Haugsbakk" wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026, at 01:29, Derrick Stolee via GitGitGadget wrote:
>[snip]
> diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
> index b8394dcf22..fdfe22d623 100644
> --- a/Documentation/git-backfill.adoc
> +++ b/Documentation/git-backfill.adoc
> @@ -63,9 +63,12 @@ OPTIONS
>  	current sparse-checkout. If the sparse-checkout feature is enabled,
>  	then `--sparse` is assumed and can be disabled with `--no-sparse`.
>
> +You may also specify the commit limiting options from linkgit:git-rev-list[1].
> +
>  SEE ALSO
>  --------
>  linkgit:git-clone[1].
> +linkgit:git-rev-list[1].

Should there be a comma between these two?

>[snip]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/18/26 11:37 AM, Kristoffer Haugsbakk wrote:
> On Tue, Mar 17, 2026, at 01:29, Derrick Stolee via GitGitGadget wrote:
>> [snip]
>> diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
>> index b8394dcf22..fdfe22d623 100644
>> --- a/Documentation/git-backfill.adoc
>> +++ b/Documentation/git-backfill.adoc
>> @@ -63,9 +63,12 @@ OPTIONS
>>   	current sparse-checkout. If the sparse-checkout feature is enabled,
>>   	then `--sparse` is assumed and can be disabled with `--no-sparse`.
>>
>> +You may also specify the commit limiting options from linkgit:git-rev-list[1].
>> +
>>   SEE ALSO
>>   --------
>>   linkgit:git-clone[1].
>> +linkgit:git-rev-list[1].
> > Should there be a comma between these two?

Good catch. Also there shouldn't be a hard stop, either.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/18/26 11:37 AM, Kristoffer Haugsbakk wrote:
> On Tue, Mar 17, 2026, at 01:29, Derrick Stolee via GitGitGadget wrote:
>> [snip]
>> diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
>> index b8394dcf22..fdfe22d623 100644
>> --- a/Documentation/git-backfill.adoc
>> +++ b/Documentation/git-backfill.adoc
>> @@ -63,9 +63,12 @@ OPTIONS
>>   	current sparse-checkout. If the sparse-checkout feature is enabled,
>>   	then `--sparse` is assumed and can be disabled with `--no-sparse`.
>>
>> +You may also specify the commit limiting options from linkgit:git-rev-list[1].
>> +
>>   SEE ALSO
>>   --------
>>   linkgit:git-clone[1].
>> +linkgit:git-rev-list[1].
> > Should there be a comma between these two?

Good catch. Also there shouldn't be a hard stop, either.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
> 
> The existing implementation of 'git backfill' only includes downloading
> missing blobs reachable from HEAD. Advanced uses may desire more general
> commit limiting options, such as '--all' for all references, specifying a
> commit range via negative references, or specifying a recency of use such as
> with '--since=<date>'.
> 
> All of these options are available if we use setup_revisions() to parse the
> unknown arguments with the revision machinery. This opens up a large number
> of possibilities, only a small set of which are tested here.
> 
> For documentation, we avoid duplicating the option documentation and instead
> link to the documentation of 'git rev-list'.
> 
> Note that these arguments currently allow specifying a pathspec, which
> modifies the commit history checks but does not limit the paths used in the
> backfill logic. This will be updated in a future change.

Makes me wonder whether reversing the order would have avoided this
slight awkwardness. But let's just stick with the current order, the end
result would be the same anyway.

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  Documentation/git-backfill.adoc |   3 +
>  builtin/backfill.c              |  19 ++--
>  t/t5620-backfill.sh             | 156 ++++++++++++++++++++++++++++++++
>  3 files changed, 172 insertions(+), 6 deletions(-)
> 
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..1b5595b27c 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>  		}
>  	}
>  
> -	repo_init_revisions(ctx->repo, &revs, "");
> -	handle_revision_arg("HEAD", &revs, 0, 0);
> +	/* Walk from HEAD if otherwise unspecified. */
> +	if (!ctx->revs.pending.nr)
> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);

Can we use `add_head_to_pending(&ctx->revs)` instead?

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/19/26 5:54 AM, Patrick Steinhardt wrote:
> On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The existing implementation of 'git backfill' only includes downloading
>> missing blobs reachable from HEAD. Advanced uses may desire more general
>> commit limiting options, such as '--all' for all references, specifying a
>> commit range via negative references, or specifying a recency of use such as
>> with '--since=<date>'.
>>
>> All of these options are available if we use setup_revisions() to parse the
>> unknown arguments with the revision machinery. This opens up a large number
>> of possibilities, only a small set of which are tested here.
>>
>> For documentation, we avoid duplicating the option documentation and instead
>> link to the documentation of 'git rev-list'.
>>
>> Note that these arguments currently allow specifying a pathspec, which
>> modifies the commit history checks but does not limit the paths used in the
>> backfill logic. This will be updated in a future change.
> > Makes me wonder whether reversing the order would have avoided this
> slight awkwardness. But let's just stick with the current order, the end
> result would be the same anyway.
True, we could have added the pathspec logic first, but we wouldn't be able
to test it right away because the parsing comes through the rev-list.

>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>> ---
>>   Documentation/git-backfill.adoc |   3 +
>>   builtin/backfill.c              |  19 ++--
>>   t/t5620-backfill.sh             | 156 ++++++++++++++++++++++++++++++++
>>   3 files changed, 172 insertions(+), 6 deletions(-)
>>
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..1b5595b27c 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>>   		}
>>   	}
>>   >> -	repo_init_revisions(ctx->repo, &revs, "");
>> -	handle_revision_arg("HEAD", &revs, 0, 0);
>> +	/* Walk from HEAD if otherwise unspecified. */
>> +	if (!ctx->revs.pending.nr)
>> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);
> > Can we use `add_head_to_pending(&ctx->revs)` instead?

Nice. We absolutely can and should.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..90c9d84793 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  					 builtin_backfill_usage, options);
>  
>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
> -			     0);
> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
> +			     PARSE_OPT_KEEP_ARGV0 |
> +			     PARSE_OPT_KEEP_DASHDASH);
> +
> +	repo_init_revisions(repo, &ctx.revs, prefix);
> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);

We should probably die here in case we still have unknown arguments.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/24/2026 3:59 AM, Patrick Steinhardt wrote:
> On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..90c9d84793 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>>  					 builtin_backfill_usage, options);
>>  
>>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
>> -			     0);
>> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
>> +			     PARSE_OPT_KEEP_ARGV0 |
>> +			     PARSE_OPT_KEEP_DASHDASH);
>> +
>> +	repo_init_revisions(repo, &ctx.revs, prefix);
>> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);
> 
> We should probably die here in case we still have unknown arguments.

That is indeed the fix for the bad test in patch 6. I'll make the
necessary update in v3's patch 6 along with the test for it.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
> 
> The existing implementation of 'git backfill' only includes downloading
> missing blobs reachable from HEAD. Advanced uses may desire more general
> commit limiting options, such as '--all' for all references, specifying a
> commit range via negative references, or specifying a recency of use such as
> with '--since=<date>'.
> 
> All of these options are available if we use setup_revisions() to parse the
> unknown arguments with the revision machinery. This opens up a large number
> of possibilities, only a small set of which are tested here.
> 
> For documentation, we avoid duplicating the option documentation and instead
> link to the documentation of 'git rev-list'.
> 
> Note that these arguments currently allow specifying a pathspec, which
> modifies the commit history checks but does not limit the paths used in the
> backfill logic. This will be updated in a future change.

Makes me wonder whether reversing the order would have avoided this
slight awkwardness. But let's just stick with the current order, the end
result would be the same anyway.

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  Documentation/git-backfill.adoc |   3 +
>  builtin/backfill.c              |  19 ++--
>  t/t5620-backfill.sh             | 156 ++++++++++++++++++++++++++++++++
>  3 files changed, 172 insertions(+), 6 deletions(-)
> 
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..1b5595b27c 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>  		}
>  	}
>  
> -	repo_init_revisions(ctx->repo, &revs, "");
> -	handle_revision_arg("HEAD", &revs, 0, 0);
> +	/* Walk from HEAD if otherwise unspecified. */
> +	if (!ctx->revs.pending.nr)
> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);

Can we use `add_head_to_pending(&ctx->revs)` instead?

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/19/26 5:54 AM, Patrick Steinhardt wrote:
> On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The existing implementation of 'git backfill' only includes downloading
>> missing blobs reachable from HEAD. Advanced uses may desire more general
>> commit limiting options, such as '--all' for all references, specifying a
>> commit range via negative references, or specifying a recency of use such as
>> with '--since=<date>'.
>>
>> All of these options are available if we use setup_revisions() to parse the
>> unknown arguments with the revision machinery. This opens up a large number
>> of possibilities, only a small set of which are tested here.
>>
>> For documentation, we avoid duplicating the option documentation and instead
>> link to the documentation of 'git rev-list'.
>>
>> Note that these arguments currently allow specifying a pathspec, which
>> modifies the commit history checks but does not limit the paths used in the
>> backfill logic. This will be updated in a future change.
> > Makes me wonder whether reversing the order would have avoided this
> slight awkwardness. But let's just stick with the current order, the end
> result would be the same anyway.
True, we could have added the pathspec logic first, but we wouldn't be able
to test it right away because the parsing comes through the rev-list.

>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>> ---
>>   Documentation/git-backfill.adoc |   3 +
>>   builtin/backfill.c              |  19 ++--
>>   t/t5620-backfill.sh             | 156 ++++++++++++++++++++++++++++++++
>>   3 files changed, 172 insertions(+), 6 deletions(-)
>>
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..1b5595b27c 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>>   		}
>>   	}
>>   >> -	repo_init_revisions(ctx->repo, &revs, "");
>> -	handle_revision_arg("HEAD", &revs, 0, 0);
>> +	/* Walk from HEAD if otherwise unspecified. */
>> +	if (!ctx->revs.pending.nr)
>> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);
> > Can we use `add_head_to_pending(&ctx->revs)` instead?

Nice. We absolutely can and should.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..90c9d84793 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  					 builtin_backfill_usage, options);
>  
>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
> -			     0);
> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
> +			     PARSE_OPT_KEEP_ARGV0 |
> +			     PARSE_OPT_KEEP_DASHDASH);
> +
> +	repo_init_revisions(repo, &ctx.revs, prefix);
> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);

We should probably die here in case we still have unknown arguments.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/24/2026 3:59 AM, Patrick Steinhardt wrote:
> On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..90c9d84793 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>>  					 builtin_backfill_usage, options);
>>  
>>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
>> -			     0);
>> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
>> +			     PARSE_OPT_KEEP_ARGV0 |
>> +			     PARSE_OPT_KEEP_DASHDASH);
>> +
>> +	repo_init_revisions(repo, &ctx.revs, prefix);
>> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);
> 
> We should probably die here in case we still have unknown arguments.

That is indeed the fix for the bad test in patch 6. I'll make the
necessary update in v3's patch 6 along with the test for it.

Thanks,
-Stolee

then `--sparse` is assumed and can be disabled with `--no-sparse`.

You may also specify the commit limiting options from linkgit:git-rev-list[1].

SEE ALSO
--------
linkgit:git-clone[1].
linkgit:git-clone[1],
linkgit:git-rev-list[1]

GIT
---
Expand Down
22 changes: 16 additions & 6 deletions builtin/backfill.c
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ struct backfill_context {
struct oid_array current_batch;
size_t min_batch_size;
int sparse;
struct rev_info revs;
};

static void backfill_context_clear(struct backfill_context *ctx)
Expand Down Expand Up @@ -80,7 +81,6 @@ static int fill_missing_blobs(const char *path UNUSED,

static int do_backfill(struct backfill_context *ctx)
{
struct rev_info revs;
struct path_walk_info info = PATH_WALK_INFO_INIT;
int ret;

Expand All @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
}
}

repo_init_revisions(ctx->repo, &revs, "");
handle_revision_arg("HEAD", &revs, 0, 0);
/* Walk from HEAD if otherwise unspecified. */
if (!ctx->revs.pending.nr)
add_head_to_pending(&ctx->revs);

info.blobs = 1;
info.tags = info.commits = info.trees = 0;

info.revs = &revs;
info.revs = &ctx->revs;
info.path_fn = fill_missing_blobs;
info.path_fn_data = ctx;

Expand All @@ -109,7 +110,6 @@ static int do_backfill(struct backfill_context *ctx)
download_batch(ctx);

path_walk_info_clear(&info);
release_revisions(&revs);
return ret;
}

Expand All @@ -121,6 +121,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
.current_batch = OID_ARRAY_INIT,
.min_batch_size = 50000,
.sparse = 0,
.revs = REV_INFO_INIT,
};
struct option options[] = {
OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
Expand All @@ -134,7 +135,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
builtin_backfill_usage, options);

argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
0);
PARSE_OPT_KEEP_UNKNOWN_OPT |
PARSE_OPT_KEEP_ARGV0 |
PARSE_OPT_KEEP_DASHDASH);

repo_init_revisions(repo, &ctx.revs, prefix);
argc = setup_revisions(argc, argv, &ctx.revs, NULL);

if (argc > 1)
die(_("unrecognized argument: %s"), argv[1]);

repo_config(repo, git_default_config, NULL);

Expand All @@ -143,5 +152,6 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit

result = do_backfill(&ctx);
backfill_context_clear(&ctx);
release_revisions(&ctx.revs);
return result;
}
43 changes: 43 additions & 0 deletions path-walk.c
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#include "list-objects.h"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Mon, Mar 23, 2026 at 11:40:17AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..0d640e2f24 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -206,6 +207,34 @@ static int add_tree_entries(struct path_walk_context *ctx,
>  				 match != MATCHED)
>  				continue;
>  		}
> +		if (ctx->revs->prune_data.nr) {
> +			struct pathspec *pd = &ctx->revs->prune_data;
> +			bool found = false;
> +
> +			/* remove '/' for these checks. */
> +			path.buf[path.len - 1] = 0;

Hm. Is this _always_ safe to do? We add the directory separator a few
lines further up, but only in the case where `type == OBJ_TREE`. So in
reverse this may mean that there are cases where we don't have a
trailing '/'.

Maybe we should instead:

    did_strip_suffix = strbuf_strip_suffix(path, "/");

    ...

    if (did_strip_suffix)
        strbuf_addch(path, "/");

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/24/2026 3:59 AM, Patrick Steinhardt wrote:
> On Mon, Mar 23, 2026 at 11:40:17AM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/path-walk.c b/path-walk.c
>> index 364e4cfa19..0d640e2f24 100644
>> --- a/path-walk.c
>> +++ b/path-walk.c
>> @@ -206,6 +207,34 @@ static int add_tree_entries(struct path_walk_context *ctx,
>>  				 match != MATCHED)
>>  				continue;
>>  		}
>> +		if (ctx->revs->prune_data.nr) {
>> +			struct pathspec *pd = &ctx->revs->prune_data;
>> +			bool found = false;
>> +
>> +			/* remove '/' for these checks. */
>> +			path.buf[path.len - 1] = 0;
> 
> Hm. Is this _always_ safe to do? We add the directory separator a few
> lines further up, but only in the case where `type == OBJ_TREE`. So in
> reverse this may mean that there are cases where we don't have a
> trailing '/'.
> 
> Maybe we should instead:
> 
>     did_strip_suffix = strbuf_strip_suffix(path, "/");
> 
>     ...
> 
>     if (did_strip_suffix)
>         strbuf_addch(path, "/");

This is much cleaner, too! Thanks.

-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Mon, Mar 23, 2026 at 11:40:17AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..0d640e2f24 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -206,6 +207,34 @@ static int add_tree_entries(struct path_walk_context *ctx,
>  				 match != MATCHED)
>  				continue;
>  		}
> +		if (ctx->revs->prune_data.nr) {
> +			struct pathspec *pd = &ctx->revs->prune_data;
> +			bool found = false;
> +
> +			/* remove '/' for these checks. */
> +			path.buf[path.len - 1] = 0;

Hm. Is this _always_ safe to do? We add the directory separator a few
lines further up, but only in the case where `type == OBJ_TREE`. So in
reverse this may mean that there are cases where we don't have a
trailing '/'.

Maybe we should instead:

    did_strip_suffix = strbuf_strip_suffix(path, "/");

    ...

    if (did_strip_suffix)
        strbuf_addch(path, "/");

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/24/2026 3:59 AM, Patrick Steinhardt wrote:
> On Mon, Mar 23, 2026 at 11:40:17AM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/path-walk.c b/path-walk.c
>> index 364e4cfa19..0d640e2f24 100644
>> --- a/path-walk.c
>> +++ b/path-walk.c
>> @@ -206,6 +207,34 @@ static int add_tree_entries(struct path_walk_context *ctx,
>>  				 match != MATCHED)
>>  				continue;
>>  		}
>> +		if (ctx->revs->prune_data.nr) {
>> +			struct pathspec *pd = &ctx->revs->prune_data;
>> +			bool found = false;
>> +
>> +			/* remove '/' for these checks. */
>> +			path.buf[path.len - 1] = 0;
> 
> Hm. Is this _always_ safe to do? We add the directory separator a few
> lines further up, but only in the case where `type == OBJ_TREE`. So in
> reverse this may mean that there are cases where we don't have a
> trailing '/'.
> 
> Maybe we should instead:
> 
>     did_strip_suffix = strbuf_strip_suffix(path, "/");
> 
>     ...
> 
>     if (did_strip_suffix)
>         strbuf_addch(path, "/");

This is much cleaner, too! Thanks.

-Stolee

#include "object.h"
#include "oid-array.h"
#include "path.h"
#include "prio-queue.h"
#include "repository.h"
#include "revision.h"
Expand Down Expand Up @@ -62,6 +63,8 @@ struct path_walk_context {
*/
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> Previously, walk_objects_by_path() silently ignored pathspecs containing
> wildcards or magic by clearing them. This caused all blobs to be
> downloaded regardless of the given pathspec. Wildcard pathspecs like
> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
> during 'git backfill').
>
> Support wildcard pathspecs by making three changes:
>
>  1. Add an 'exact_pathspecs' flag to path_walk_context. When the
>     pathspec has no wildcards or magic, set this flag and use the
>     existing fast-path prefix matching in add_tree_entries(). When
>     wildcards are present, skip that block since prefix matching
>     cannot handle glob patterns.
>
>  2. Disable revision-level commit pruning (revs->prune = 0) for
>     wildcard pathspecs. The revision walk uses the pathspec to filter
>     commits via TREESAME detection. For exact prefix pathspecs this
>     works well, but wildcard pathspecs may fail to match through
>     TREESAME because fnmatch with WM_PATHNAME does not cross directory
>     boundaries. Disabling pruning ensures all commits are visited and
>     their trees are available for the path-walk to filter.

Hmph, I wonder how significant an impact does it have on the
performance that we have to disable pruning here.  With the bog
standard tree traversal, wouldn't tree_entry_interesting() already
be capable of doing this, even with fnmatch / WM_PATHNAME ?

>  3. Add a match_pathspec() check in walk_path() to filter out blobs
>     whose full path does not match the pathspec. This provides the
>     actual blob-level filtering for wildcard pathspecs.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The latter person cannot sign DCO or vouch for the origin of what
they have written in this patch, can they?

> ---
>  path-walk.c         | 22 ++++++++++++++--------
>  t/t5620-backfill.sh |  7 +++----
>  2 files changed, 17 insertions(+), 12 deletions(-)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/17/2026 6:19 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> Previously, walk_objects_by_path() silently ignored pathspecs containing
>> wildcards or magic by clearing them. This caused all blobs to be
>> downloaded regardless of the given pathspec. Wildcard pathspecs like
>> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
>> during 'git backfill').
>>
>> Support wildcard pathspecs by making three changes:
>>
>>  1. Add an 'exact_pathspecs' flag to path_walk_context. When the
>>     pathspec has no wildcards or magic, set this flag and use the
>>     existing fast-path prefix matching in add_tree_entries(). When
>>     wildcards are present, skip that block since prefix matching
>>     cannot handle glob patterns.
>>
>>  2. Disable revision-level commit pruning (revs->prune = 0) for
>>     wildcard pathspecs. The revision walk uses the pathspec to filter
>>     commits via TREESAME detection. For exact prefix pathspecs this
>>     works well, but wildcard pathspecs may fail to match through
>>     TREESAME because fnmatch with WM_PATHNAME does not cross directory
>>     boundaries. Disabling pruning ensures all commits are visited and
>>     their trees are available for the path-walk to filter.
> 
> Hmph, I wonder how significant an impact does it have on the
> performance that we have to disable pruning here.  With the bog
> standard tree traversal, wouldn't tree_entry_interesting() already
> be capable of doing this, even with fnmatch / WM_PATHNAME ?

I will explore what's possible here and see what I can do.

>>  3. Add a match_pathspec() check in walk_path() to filter out blobs
>>     whose full path does not match the pathspec. This provides the
>>     actual blob-level filtering for wildcard pathspecs.
>>
>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
> 
> The latter person cannot sign DCO or vouch for the origin of what
> they have written in this patch, can they?
No they cannot. Sorry for this error.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/18/26 9:16 AM, Derrick Stolee wrote:
> On 3/17/2026 6:19 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Derrick Stolee <stolee@gmail.com>
>>>
>>> Previously, walk_objects_by_path() silently ignored pathspecs containing
>>> wildcards or magic by clearing them. This caused all blobs to be
>>> downloaded regardless of the given pathspec. Wildcard pathspecs like
>>> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
>>> during 'git backfill').
>>>
>>> Support wildcard pathspecs by making three changes:
>>>
>>>   1. Add an 'exact_pathspecs' flag to path_walk_context. When the
>>>      pathspec has no wildcards or magic, set this flag and use the
>>>      existing fast-path prefix matching in add_tree_entries(). When
>>>      wildcards are present, skip that block since prefix matching
>>>      cannot handle glob patterns.
>>>
>>>   2. Disable revision-level commit pruning (revs->prune = 0) for
>>>      wildcard pathspecs. The revision walk uses the pathspec to filter
>>>      commits via TREESAME detection. For exact prefix pathspecs this
>>>      works well, but wildcard pathspecs may fail to match through
>>>      TREESAME because fnmatch with WM_PATHNAME does not cross directory
>>>      boundaries. Disabling pruning ensures all commits are visited and
>>>      their trees are available for the path-walk to filter.
>>
>> Hmph, I wonder how significant an impact does it have on the
>> performance that we have to disable pruning here.  With the bog
>> standard tree traversal, wouldn't tree_entry_interesting() already
>> be capable of doing this, even with fnmatch / WM_PATHNAME ?
> > I will explore what's possible here and see what I can do.

I must have needed the 'revs->prune = 0' at some point during development
and left it even though it isn't actually necessary. Leaving it
implicitly at '1' should indeed be faster due to traversing fewer commits
and parsing fewer trees while still reaching all necessary blobs.

Only changes 1 and 3 are necessary.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/18/26 9:16 AM, Derrick Stolee wrote:
> On 3/17/2026 6:19 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Derrick Stolee <stolee@gmail.com>
>>>
>>> Previously, walk_objects_by_path() silently ignored pathspecs containing
>>> wildcards or magic by clearing them. This caused all blobs to be
>>> downloaded regardless of the given pathspec. Wildcard pathspecs like
>>> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
>>> during 'git backfill').
>>>
>>> Support wildcard pathspecs by making three changes:
>>>
>>>   1. Add an 'exact_pathspecs' flag to path_walk_context. When the
>>>      pathspec has no wildcards or magic, set this flag and use the
>>>      existing fast-path prefix matching in add_tree_entries(). When
>>>      wildcards are present, skip that block since prefix matching
>>>      cannot handle glob patterns.
>>>
>>>   2. Disable revision-level commit pruning (revs->prune = 0) for
>>>      wildcard pathspecs. The revision walk uses the pathspec to filter
>>>      commits via TREESAME detection. For exact prefix pathspecs this
>>>      works well, but wildcard pathspecs may fail to match through
>>>      TREESAME because fnmatch with WM_PATHNAME does not cross directory
>>>      boundaries. Disabling pruning ensures all commits are visited and
>>>      their trees are available for the path-walk to filter.
>>
>> Hmph, I wonder how significant an impact does it have on the
>> performance that we have to disable pruning here.  With the bog
>> standard tree traversal, wouldn't tree_entry_interesting() already
>> be capable of doing this, even with fnmatch / WM_PATHNAME ?
> > I will explore what's possible here and see what I can do.

I must have needed the 'revs->prune = 0' at some point during development
and left it even though it isn't actually necessary. Leaving it
implicitly at '1' should indeed be faster due to traversing fewer commits
and parsing fewer trees while still reaching all necessary blobs.

Only changes 1 and 3 are necessary.

Thanks,
-Stolee

struct prio_queue path_stack;
struct strset path_stack_pushed;

unsigned exact_pathspecs:1;
};

static int compare_by_type(const void *one, const void *two, void *cb_data)
Expand Down Expand Up @@ -206,6 +209,33 @@ static int add_tree_entries(struct path_walk_context *ctx,
match != MATCHED)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> The previous change allowed specifying revision arguments over the 'git
> backfill' command-line. This created the opportunity for pathspecs that
> specify a smaller set of starting commits, but otherwise did not restrict
> the blob paths that were downloaded.

"pathspecs that specify a smaller set of starting commits" is
puzzling, as starting commits would be coming from the revision
arguments.  "opportunity for pathspec to further filter commits
to those that touch only the matching paths...", or something?

> Update the path-walk API to accept certain kinds of pathspecs and to
> silently ignore anything too complex.

Hmph, "silently ignore", instead of "no, you cannot use that! and
die", or at least "sorry, I cannot do that, so the result may not be
what you wanted, you've been warned"?

> The current behavior focuses on
> pathspecs that match paths exactly. This includes exact filenames,
> including directory names as prefixes. Pathspecs containing wildcards
> or magic are cleared so the path walk downloads all blobs, as before.

Ah, "we punt and lift the limitation to grab everything, so at least
everything you wanted to have will become available to you, even
though we may download more than what you asked"?  OK, users would
survive that, and as we improve the pathspec support, the user
experience would only improve.  OK.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/17/2026 6:10 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The previous change allowed specifying revision arguments over the 'git
>> backfill' command-line. This created the opportunity for pathspecs that
>> specify a smaller set of starting commits, but otherwise did not restrict
>> the blob paths that were downloaded.
> 
> "pathspecs that specify a smaller set of starting commits" is
> puzzling, as starting commits would be coming from the revision
> arguments.  "opportunity for pathspec to further filter commits
> to those that touch only the matching paths...", or something?

You're right. I'm using "starting commits" incorrectly. My view was too
focused on how the path-walk API starts from the commits output by the
revision walk to get a list of root trees and then walks by path from
that point.

I'll reword to make this more clear.

>> Update the path-walk API to accept certain kinds of pathspecs and to
>> silently ignore anything too complex.
> 
> Hmph, "silently ignore", instead of "no, you cannot use that! and
> die", or at least "sorry, I cannot do that, so the result may not be
> what you wanted, you've been warned"?

The behavior when silently ignoring is to over-download. The revision
walk still filters commits, but the path-walk then walks paths beyond
that pathspec. This will be fixed in the next commit, so adding an
error case didn't seem worth it. I'll do a better job foreshadowing.

>> The current behavior focuses on
>> pathspecs that match paths exactly. This includes exact filenames,
>> including directory names as prefixes. Pathspecs containing wildcards
>> or magic are cleared so the path walk downloads all blobs, as before.
> 
> Ah, "we punt and lift the limitation to grab everything, so at least
> everything you wanted to have will become available to you, even
> though we may download more than what you asked"?  OK, users would
> survive that, and as we improve the pathspec support, the user
> experience would only improve.  OK.

Exactly. I can word things better.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Wed, Mar 18, 2026 at 09:15:00AM -0400, Derrick Stolee wrote:
> On 3/17/2026 6:10 PM, Junio C Hamano wrote:
> >> Update the path-walk API to accept certain kinds of pathspecs and to
> >> silently ignore anything too complex.
> > 
> > Hmph, "silently ignore", instead of "no, you cannot use that! and
> > die", or at least "sorry, I cannot do that, so the result may not be
> > what you wanted, you've been warned"?
> 
> The behavior when silently ignoring is to over-download. The revision
> walk still filters commits, but the path-walk then walks paths beyond
> that pathspec. This will be fixed in the next commit, so adding an
> error case didn't seem worth it. I'll do a better job foreshadowing.

I guess this is a fine tradeoff when documented properly. But I think in
that case we should make very clear that this behaviour may change in
the future if find a way to efficiently limit the pathwalk, too.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Wed, Mar 18, 2026 at 09:15:00AM -0400, Derrick Stolee wrote:
> On 3/17/2026 6:10 PM, Junio C Hamano wrote:
> >> Update the path-walk API to accept certain kinds of pathspecs and to
> >> silently ignore anything too complex.
> > 
> > Hmph, "silently ignore", instead of "no, you cannot use that! and
> > die", or at least "sorry, I cannot do that, so the result may not be
> > what you wanted, you've been warned"?
> 
> The behavior when silently ignoring is to over-download. The revision
> walk still filters commits, but the path-walk then walks paths beyond
> that pathspec. This will be fixed in the next commit, so adding an
> error case didn't seem worth it. I'll do a better job foreshadowing.

I guess this is a fine tradeoff when documented properly. But I think in
that case we should make very clear that this behaviour may change in
the future if find a way to efficiently limit the pathwalk, too.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026 at 12:29:20AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..e1ad4b0208 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -481,6 +524,18 @@ int walk_objects_by_path(struct path_walk_info *info)
>  	if (info->tags)
>  		info->revs->tag_objects = 1;
>  
> +	if (ctx.revs->prune_data.nr) {
> +		/*
> +		 * Only exact prefix pathspecs are currently supported.
> +		 * Clear any wildcard or magic pathspecs to avoid
> +		 * incorrect prefix matching.
> +		 */
> +		struct pathspec *pd = &ctx.revs->prune_data;
> +
> +		if (pd->has_wildcard || pd->magic)
> +			pd->nr = 0;
> +	}

Huh, curious. Won't this cause a leak? I guess we should rather use
`clear_pathspec()` here.

Also shows that this path is missing test coverage.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026 at 12:29:20AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..e1ad4b0208 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -206,6 +206,49 @@ static int add_tree_entries(struct path_walk_context *ctx,
>  				 match != MATCHED)
>  				continue;
>  		}
> +		if (ctx->revs->prune_data.nr) {
> +			struct pathspec *pd = &ctx->revs->prune_data;
> +			bool found = false;
> +
> +			for (int i = 0; i < pd->nr; i++) {
> +				struct pathspec_item *item = &pd->items[i];
> +
> +				/*
> +				 * Is this path a parent directory of
> +				 * the pathspec item?
> +				 */
> +				if (path.len < (size_t)item->len &&
> +				    !strncmp(path.buf, item->match, path.len) &&
> +				    item->match[path.len - 1] == '/') {
> +					found = true;
> +					break;
> +				}
> +
> +				/*
> +				 * Or, is the pathspec an exact match?
> +				 */
> +				if (path.len == (size_t)item->len &&
> +				    !strcmp(path.buf, item->match)) {
> +					found = true;
> +					break;
> +				}
> +
> +				/*
> +				 * Or, is the pathspec a directory prefix
> +				 * match?
> +				 */
> +				if (path.len > (size_t)item->len &&
> +				    !strncmp(path.buf, item->match, item->len) &&
> +				    path.buf[item->len] == '/') {
> +					found = true;
> +					break;
> +				}

Ah, one more thing: we could expose `dir_prefix()` from "path.c" and
reuse it here.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/19/26 6:15 AM, Patrick Steinhardt wrote:
> Ah, one more thing: we could expose `dir_prefix()` from "path.c" and
> reuse it here.

Good idea. This becomes

			/*
			 * Continue if either is a directory prefix
			 * of the other.
			 */
			if (dir_prefix(path.buf, item->match) ||
			    dir_prefix(item->match, path.buf)) {
				found = true;
				break;
			}

With the idea that we need to walk the parents of each prefix in
addition to walking all of their children.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026 at 12:29:20AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..e1ad4b0208 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -481,6 +524,18 @@ int walk_objects_by_path(struct path_walk_info *info)
>  	if (info->tags)
>  		info->revs->tag_objects = 1;
>  
> +	if (ctx.revs->prune_data.nr) {
> +		/*
> +		 * Only exact prefix pathspecs are currently supported.
> +		 * Clear any wildcard or magic pathspecs to avoid
> +		 * incorrect prefix matching.
> +		 */
> +		struct pathspec *pd = &ctx.revs->prune_data;
> +
> +		if (pd->has_wildcard || pd->magic)
> +			pd->nr = 0;
> +	}

Huh, curious. Won't this cause a leak? I guess we should rather use
`clear_pathspec()` here.

Also shows that this path is missing test coverage.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026 at 12:29:20AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..e1ad4b0208 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -206,6 +206,49 @@ static int add_tree_entries(struct path_walk_context *ctx,
>  				 match != MATCHED)
>  				continue;
>  		}
> +		if (ctx->revs->prune_data.nr) {
> +			struct pathspec *pd = &ctx->revs->prune_data;
> +			bool found = false;
> +
> +			for (int i = 0; i < pd->nr; i++) {
> +				struct pathspec_item *item = &pd->items[i];
> +
> +				/*
> +				 * Is this path a parent directory of
> +				 * the pathspec item?
> +				 */
> +				if (path.len < (size_t)item->len &&
> +				    !strncmp(path.buf, item->match, path.len) &&
> +				    item->match[path.len - 1] == '/') {
> +					found = true;
> +					break;
> +				}
> +
> +				/*
> +				 * Or, is the pathspec an exact match?
> +				 */
> +				if (path.len == (size_t)item->len &&
> +				    !strcmp(path.buf, item->match)) {
> +					found = true;
> +					break;
> +				}
> +
> +				/*
> +				 * Or, is the pathspec a directory prefix
> +				 * match?
> +				 */
> +				if (path.len > (size_t)item->len &&
> +				    !strncmp(path.buf, item->match, item->len) &&
> +				    path.buf[item->len] == '/') {
> +					found = true;
> +					break;
> +				}

Ah, one more thing: we could expose `dir_prefix()` from "path.c" and
reuse it here.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/19/26 6:15 AM, Patrick Steinhardt wrote:
> Ah, one more thing: we could expose `dir_prefix()` from "path.c" and
> reuse it here.

Good idea. This becomes

			/*
			 * Continue if either is a directory prefix
			 * of the other.
			 */
			if (dir_prefix(path.buf, item->match) ||
			    dir_prefix(item->match, path.buf)) {
				found = true;
				break;
			}

With the idea that we need to walk the parents of each prefix in
addition to walking all of their children.

Thanks,
-Stolee

continue;
}
if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
struct pathspec *pd = &ctx->revs->prune_data;
bool found = false;
int did_strip_suffix = strbuf_strip_suffix(&path, "/");


for (int i = 0; i < pd->nr; i++) {
struct pathspec_item *item = &pd->items[i];

/*
* Continue if either is a directory prefix
* of the other.
*/
if (dir_prefix(path.buf, item->match) ||
dir_prefix(item->match, path.buf)) {
found = true;
break;
}
}

if (did_strip_suffix)
strbuf_addch(&path, '/');

/* Skip paths that do not match the prefix. */
if (!found)
continue;
}

add_path_to_list(ctx, path.buf, type, &entry.oid,
!(o->flags & UNINTERESTING));
Expand Down Expand Up @@ -274,6 +304,13 @@ static int walk_path(struct path_walk_context *ctx,
return 0;
}

if (list->type == OBJ_BLOB &&
ctx->revs->prune_data.nr &&
!match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
path, strlen(path), 0,
NULL, 0))
return 0;

/* Evaluate function pointer on this data, if requested. */
if ((list->type == OBJ_TREE && ctx->info->trees) ||
(list->type == OBJ_BLOB && ctx->info->blobs) ||
Expand Down Expand Up @@ -481,6 +518,12 @@ int walk_objects_by_path(struct path_walk_info *info)
if (info->tags)
info->revs->tag_objects = 1;

if (ctx.revs->prune_data.nr) {
if (!ctx.revs->prune_data.has_wildcard &&
!ctx.revs->prune_data.magic)
ctx.exact_pathspecs = 1;
}

/* Insert a single list for the root tree into the paths. */
CALLOC_ARRAY(root_tree_list, 1);
root_tree_list->type = OBJ_TREE;
Expand Down
2 changes: 1 addition & 1 deletion path.c
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ static void strbuf_cleanup_path(struct strbuf *sb)
strbuf_remove(sb, 0, path - sb->buf);
}

static int dir_prefix(const char *buf, const char *dir)
int dir_prefix(const char *buf, const char *dir)
{
int len = strlen(dir);
return !strncmp(buf, dir, len) &&
Expand Down
6 changes: 6 additions & 0 deletions path.h
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,12 @@ const char *repo_submodule_path_replace(struct repository *repo,
const char *fmt, ...)
__attribute__((format (printf, 4, 5)));

/*
* Given a directory name 'dir' (not ending with a trailing '/'),
* determine if 'buf' is equal to 'dir' or has prefix 'dir'+'/'.
*/
int dir_prefix(const char *buf, const char *dir);

void report_linked_checkout_garbage(struct repository *r);

/*
Expand Down
1 change: 1 addition & 0 deletions revision.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include "commit.h"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> The REV_INFO_INIT macro includes a use of the DEFAULT_ABBREV macro, which is
> defined in object-name.h. Include it in revision.h so consumers of
> REV_INFO_INIT do not need to include this hidden dependency.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  revision.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/revision.h b/revision.h
> index b36acfc2d9..18c9bbd822 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -4,6 +4,7 @@
>  #include "commit.h"
>  #include "grep.h"
>  #include "notes.h"
> +#include "object-name.h"
>  #include "oidset.h"
>  #include "pretty.h"
>  #include "diff.h"

OK.  Other symbols REV_INFO_INIT needs are REV_SORT_IN_GRAPH_ORDER
(in <commit.h>), CMIT_FMT_DEFAULT (in <pretty.h>), and STRVEC_INIT
(in <strvec.h>), and all three are already included there.

Makes sense.

#include "grep.h"
#include "notes.h"
#include "object-name.h"
#include "oidset.h"
#include "pretty.h"
#include "diff.h"
Expand Down
Loading
Loading