Skip to content
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion Documentation/git-backfill.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -63,9 +63,12 @@ OPTIONS
current sparse-checkout. If the sparse-checkout feature is enabled,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -	repo_init_revisions(ctx->repo, &revs, "");
> -	handle_revision_arg("HEAD", &revs, 0, 0);

So we used to "cheat" and did an initialization without even knowing
in which directory we were started ...

> +	/* Walk from HEAD if otherwise unspecified. */
> +	if (!ctx->revs.pending.nr)
> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);

... but by initializing the revs correctly in the caller, we would
be correcting it.  Looking good.

> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  					 builtin_backfill_usage, options);
>  
>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
> -			     0);
> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
> +			     PARSE_OPT_KEEP_ARGV0 |
> +			     PARSE_OPT_KEEP_DASHDASH);
> +
> +	repo_init_revisions(repo, &ctx.revs, prefix);
> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);

OK.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Kristoffer Haugsbakk" wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026, at 01:29, Derrick Stolee via GitGitGadget wrote:
>[snip]
> diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
> index b8394dcf22..fdfe22d623 100644
> --- a/Documentation/git-backfill.adoc
> +++ b/Documentation/git-backfill.adoc
> @@ -63,9 +63,12 @@ OPTIONS
>  	current sparse-checkout. If the sparse-checkout feature is enabled,
>  	then `--sparse` is assumed and can be disabled with `--no-sparse`.
>
> +You may also specify the commit limiting options from linkgit:git-rev-list[1].
> +
>  SEE ALSO
>  --------
>  linkgit:git-clone[1].
> +linkgit:git-rev-list[1].

Should there be a comma between these two?

>[snip]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/18/26 11:37 AM, Kristoffer Haugsbakk wrote:
> On Tue, Mar 17, 2026, at 01:29, Derrick Stolee via GitGitGadget wrote:
>> [snip]
>> diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
>> index b8394dcf22..fdfe22d623 100644
>> --- a/Documentation/git-backfill.adoc
>> +++ b/Documentation/git-backfill.adoc
>> @@ -63,9 +63,12 @@ OPTIONS
>>   	current sparse-checkout. If the sparse-checkout feature is enabled,
>>   	then `--sparse` is assumed and can be disabled with `--no-sparse`.
>>
>> +You may also specify the commit limiting options from linkgit:git-rev-list[1].
>> +
>>   SEE ALSO
>>   --------
>>   linkgit:git-clone[1].
>> +linkgit:git-rev-list[1].
> > Should there be a comma between these two?

Good catch. Also there shouldn't be a hard stop, either.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/18/26 11:37 AM, Kristoffer Haugsbakk wrote:
> On Tue, Mar 17, 2026, at 01:29, Derrick Stolee via GitGitGadget wrote:
>> [snip]
>> diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
>> index b8394dcf22..fdfe22d623 100644
>> --- a/Documentation/git-backfill.adoc
>> +++ b/Documentation/git-backfill.adoc
>> @@ -63,9 +63,12 @@ OPTIONS
>>   	current sparse-checkout. If the sparse-checkout feature is enabled,
>>   	then `--sparse` is assumed and can be disabled with `--no-sparse`.
>>
>> +You may also specify the commit limiting options from linkgit:git-rev-list[1].
>> +
>>   SEE ALSO
>>   --------
>>   linkgit:git-clone[1].
>> +linkgit:git-rev-list[1].
> > Should there be a comma between these two?

Good catch. Also there shouldn't be a hard stop, either.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
> 
> The existing implementation of 'git backfill' only includes downloading
> missing blobs reachable from HEAD. Advanced uses may desire more general
> commit limiting options, such as '--all' for all references, specifying a
> commit range via negative references, or specifying a recency of use such as
> with '--since=<date>'.
> 
> All of these options are available if we use setup_revisions() to parse the
> unknown arguments with the revision machinery. This opens up a large number
> of possibilities, only a small set of which are tested here.
> 
> For documentation, we avoid duplicating the option documentation and instead
> link to the documentation of 'git rev-list'.
> 
> Note that these arguments currently allow specifying a pathspec, which
> modifies the commit history checks but does not limit the paths used in the
> backfill logic. This will be updated in a future change.

Makes me wonder whether reversing the order would have avoided this
slight awkwardness. But let's just stick with the current order, the end
result would be the same anyway.

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  Documentation/git-backfill.adoc |   3 +
>  builtin/backfill.c              |  19 ++--
>  t/t5620-backfill.sh             | 156 ++++++++++++++++++++++++++++++++
>  3 files changed, 172 insertions(+), 6 deletions(-)
> 
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..1b5595b27c 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>  		}
>  	}
>  
> -	repo_init_revisions(ctx->repo, &revs, "");
> -	handle_revision_arg("HEAD", &revs, 0, 0);
> +	/* Walk from HEAD if otherwise unspecified. */
> +	if (!ctx->revs.pending.nr)
> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);

Can we use `add_head_to_pending(&ctx->revs)` instead?

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/19/26 5:54 AM, Patrick Steinhardt wrote:
> On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The existing implementation of 'git backfill' only includes downloading
>> missing blobs reachable from HEAD. Advanced uses may desire more general
>> commit limiting options, such as '--all' for all references, specifying a
>> commit range via negative references, or specifying a recency of use such as
>> with '--since=<date>'.
>>
>> All of these options are available if we use setup_revisions() to parse the
>> unknown arguments with the revision machinery. This opens up a large number
>> of possibilities, only a small set of which are tested here.
>>
>> For documentation, we avoid duplicating the option documentation and instead
>> link to the documentation of 'git rev-list'.
>>
>> Note that these arguments currently allow specifying a pathspec, which
>> modifies the commit history checks but does not limit the paths used in the
>> backfill logic. This will be updated in a future change.
> > Makes me wonder whether reversing the order would have avoided this
> slight awkwardness. But let's just stick with the current order, the end
> result would be the same anyway.
True, we could have added the pathspec logic first, but we wouldn't be able
to test it right away because the parsing comes through the rev-list.

>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>> ---
>>   Documentation/git-backfill.adoc |   3 +
>>   builtin/backfill.c              |  19 ++--
>>   t/t5620-backfill.sh             | 156 ++++++++++++++++++++++++++++++++
>>   3 files changed, 172 insertions(+), 6 deletions(-)
>>
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..1b5595b27c 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>>   		}
>>   	}
>>   >> -	repo_init_revisions(ctx->repo, &revs, "");
>> -	handle_revision_arg("HEAD", &revs, 0, 0);
>> +	/* Walk from HEAD if otherwise unspecified. */
>> +	if (!ctx->revs.pending.nr)
>> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);
> > Can we use `add_head_to_pending(&ctx->revs)` instead?

Nice. We absolutely can and should.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..90c9d84793 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  					 builtin_backfill_usage, options);
>  
>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
> -			     0);
> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
> +			     PARSE_OPT_KEEP_ARGV0 |
> +			     PARSE_OPT_KEEP_DASHDASH);
> +
> +	repo_init_revisions(repo, &ctx.revs, prefix);
> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);

We should probably die here in case we still have unknown arguments.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/24/2026 3:59 AM, Patrick Steinhardt wrote:
> On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..90c9d84793 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>>  					 builtin_backfill_usage, options);
>>  
>>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
>> -			     0);
>> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
>> +			     PARSE_OPT_KEEP_ARGV0 |
>> +			     PARSE_OPT_KEEP_DASHDASH);
>> +
>> +	repo_init_revisions(repo, &ctx.revs, prefix);
>> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);
> 
> We should probably die here in case we still have unknown arguments.

That is indeed the fix for the bad test in patch 6. I'll make the
necessary update in v3's patch 6 along with the test for it.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
> 
> The existing implementation of 'git backfill' only includes downloading
> missing blobs reachable from HEAD. Advanced uses may desire more general
> commit limiting options, such as '--all' for all references, specifying a
> commit range via negative references, or specifying a recency of use such as
> with '--since=<date>'.
> 
> All of these options are available if we use setup_revisions() to parse the
> unknown arguments with the revision machinery. This opens up a large number
> of possibilities, only a small set of which are tested here.
> 
> For documentation, we avoid duplicating the option documentation and instead
> link to the documentation of 'git rev-list'.
> 
> Note that these arguments currently allow specifying a pathspec, which
> modifies the commit history checks but does not limit the paths used in the
> backfill logic. This will be updated in a future change.

Makes me wonder whether reversing the order would have avoided this
slight awkwardness. But let's just stick with the current order, the end
result would be the same anyway.

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  Documentation/git-backfill.adoc |   3 +
>  builtin/backfill.c              |  19 ++--
>  t/t5620-backfill.sh             | 156 ++++++++++++++++++++++++++++++++
>  3 files changed, 172 insertions(+), 6 deletions(-)
> 
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..1b5595b27c 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>  		}
>  	}
>  
> -	repo_init_revisions(ctx->repo, &revs, "");
> -	handle_revision_arg("HEAD", &revs, 0, 0);
> +	/* Walk from HEAD if otherwise unspecified. */
> +	if (!ctx->revs.pending.nr)
> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);

Can we use `add_head_to_pending(&ctx->revs)` instead?

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/19/26 5:54 AM, Patrick Steinhardt wrote:
> On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The existing implementation of 'git backfill' only includes downloading
>> missing blobs reachable from HEAD. Advanced uses may desire more general
>> commit limiting options, such as '--all' for all references, specifying a
>> commit range via negative references, or specifying a recency of use such as
>> with '--since=<date>'.
>>
>> All of these options are available if we use setup_revisions() to parse the
>> unknown arguments with the revision machinery. This opens up a large number
>> of possibilities, only a small set of which are tested here.
>>
>> For documentation, we avoid duplicating the option documentation and instead
>> link to the documentation of 'git rev-list'.
>>
>> Note that these arguments currently allow specifying a pathspec, which
>> modifies the commit history checks but does not limit the paths used in the
>> backfill logic. This will be updated in a future change.
> > Makes me wonder whether reversing the order would have avoided this
> slight awkwardness. But let's just stick with the current order, the end
> result would be the same anyway.
True, we could have added the pathspec logic first, but we wouldn't be able
to test it right away because the parsing comes through the rev-list.

>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>> ---
>>   Documentation/git-backfill.adoc |   3 +
>>   builtin/backfill.c              |  19 ++--
>>   t/t5620-backfill.sh             | 156 ++++++++++++++++++++++++++++++++
>>   3 files changed, 172 insertions(+), 6 deletions(-)
>>
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..1b5595b27c 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>>   		}
>>   	}
>>   >> -	repo_init_revisions(ctx->repo, &revs, "");
>> -	handle_revision_arg("HEAD", &revs, 0, 0);
>> +	/* Walk from HEAD if otherwise unspecified. */
>> +	if (!ctx->revs.pending.nr)
>> +		handle_revision_arg("HEAD", &ctx->revs, 0, 0);
> > Can we use `add_head_to_pending(&ctx->revs)` instead?

Nice. We absolutely can and should.

Thanks,
-Stolee

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..90c9d84793 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  					 builtin_backfill_usage, options);
>  
>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
> -			     0);
> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
> +			     PARSE_OPT_KEEP_ARGV0 |
> +			     PARSE_OPT_KEEP_DASHDASH);
> +
> +	repo_init_revisions(repo, &ctx.revs, prefix);
> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);

We should probably die here in case we still have unknown arguments.

Patrick

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derrick Stolee wrote on the Git mailing list (how to reply to this email):

On 3/24/2026 3:59 AM, Patrick Steinhardt wrote:
> On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..90c9d84793 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>>  					 builtin_backfill_usage, options);
>>  
>>  	argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
>> -			     0);
>> +			     PARSE_OPT_KEEP_UNKNOWN_OPT |
>> +			     PARSE_OPT_KEEP_ARGV0 |
>> +			     PARSE_OPT_KEEP_DASHDASH);
>> +
>> +	repo_init_revisions(repo, &ctx.revs, prefix);
>> +	argc = setup_revisions(argc, argv, &ctx.revs, NULL);
> 
> We should probably die here in case we still have unknown arguments.

That is indeed the fix for the bad test in patch 6. I'll make the
necessary update in v3's patch 6 along with the test for it.

Thanks,
-Stolee

then `--sparse` is assumed and can be disabled with `--no-sparse`.

You may also specify the commit limiting options from linkgit:git-rev-list[1].

SEE ALSO
--------
linkgit:git-clone[1].
linkgit:git-clone[1],
linkgit:git-rev-list[1]

GIT
---
Expand Down
19 changes: 13 additions & 6 deletions builtin/backfill.c
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ struct backfill_context {
struct oid_array current_batch;
size_t min_batch_size;
int sparse;
struct rev_info revs;
};

static void backfill_context_clear(struct backfill_context *ctx)
Expand Down Expand Up @@ -80,7 +81,6 @@ static int fill_missing_blobs(const char *path UNUSED,

static int do_backfill(struct backfill_context *ctx)
{
struct rev_info revs;
struct path_walk_info info = PATH_WALK_INFO_INIT;
int ret;

Expand All @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
}
}

repo_init_revisions(ctx->repo, &revs, "");
handle_revision_arg("HEAD", &revs, 0, 0);
/* Walk from HEAD if otherwise unspecified. */
if (!ctx->revs.pending.nr)
add_head_to_pending(&ctx->revs);

info.blobs = 1;
info.tags = info.commits = info.trees = 0;

info.revs = &revs;
info.revs = &ctx->revs;
info.path_fn = fill_missing_blobs;
info.path_fn_data = ctx;

Expand All @@ -109,7 +110,6 @@ static int do_backfill(struct backfill_context *ctx)
download_batch(ctx);

path_walk_info_clear(&info);
release_revisions(&revs);
return ret;
}

Expand All @@ -121,6 +121,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
.current_batch = OID_ARRAY_INIT,
.min_batch_size = 50000,
.sparse = 0,
.revs = REV_INFO_INIT,
};
struct option options[] = {
OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
Expand All @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
builtin_backfill_usage, options);

argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
0);
PARSE_OPT_KEEP_UNKNOWN_OPT |
PARSE_OPT_KEEP_ARGV0 |
PARSE_OPT_KEEP_DASHDASH);

repo_init_revisions(repo, &ctx.revs, prefix);
argc = setup_revisions(argc, argv, &ctx.revs, NULL);

repo_config(repo, git_default_config, NULL);

Expand All @@ -143,5 +149,6 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit

result = do_backfill(&ctx);
backfill_context_clear(&ctx);
release_revisions(&ctx.revs);
return result;
}
1 change: 1 addition & 0 deletions revision.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include "commit.h"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> The REV_INFO_INIT macro includes a use of the DEFAULT_ABBREV macro, which is
> defined in object-name.h. Include it in revision.h so consumers of
> REV_INFO_INIT do not need to include this hidden dependency.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  revision.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/revision.h b/revision.h
> index b36acfc2d9..18c9bbd822 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -4,6 +4,7 @@
>  #include "commit.h"
>  #include "grep.h"
>  #include "notes.h"
> +#include "object-name.h"
>  #include "oidset.h"
>  #include "pretty.h"
>  #include "diff.h"

OK.  Other symbols REV_INFO_INIT needs are REV_SORT_IN_GRAPH_ORDER
(in <commit.h>), CMIT_FMT_DEFAULT (in <pretty.h>), and STRVEC_INIT
(in <strvec.h>), and all three are already included there.

Makes sense.

#include "grep.h"
#include "notes.h"
#include "object-name.h"
#include "oidset.h"
#include "pretty.h"
#include "diff.h"
Expand Down
208 changes: 206 additions & 2 deletions t/t5620-backfill.sh
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ test_expect_success 'setup repo for object creation' '
git init src &&

mkdir -p src/a/b/c &&
mkdir -p src/d/e &&
mkdir -p src/d/f &&

for i in 1 2
do
Expand All @@ -26,8 +26,9 @@ test_expect_success 'setup repo for object creation' '
echo "Version $i of file a/b/$n" > src/a/b/file.$n.txt &&
echo "Version $i of file a/b/c/$n" > src/a/b/c/file.$n.txt &&
echo "Version $i of file d/$n" > src/d/file.$n.txt &&
echo "Version $i of file d/e/$n" > src/d/e/file.$n.txt &&
echo "Version $i of file d/f/$n" > src/d/f/file.$n.txt &&
git -C src add . &&
test_tick &&
git -C src commit -m "Iteration $n" || return 1
done
done
Expand All @@ -41,6 +42,53 @@ test_expect_success 'setup bare clone for server' '
git -C srv.bare config --local uploadpack.allowanysha1inwant 1
'

# Create a version of the repo with branches for testing revision
# arguments like --all, --first-parent, and --since.
#
# main: 8 commits (linear) + merge of side branch
# 48 original blobs + 4 side blobs = 52 blobs from main HEAD
# side: 2 commits adding s/file.{1,2}.txt (v1, v2), merged into main
# other: 1 commit adding o/file.{1,2}.txt (not merged)
# 54 total blobs reachable from --all
test_expect_success 'setup branched repo for revision tests' '
git clone src src-revs &&

# Side branch from tip of main with unique files
git -C src-revs checkout -b side HEAD &&
mkdir -p src-revs/s &&
echo "Side version 1 of file 1" >src-revs/s/file.1.txt &&
echo "Side version 1 of file 2" >src-revs/s/file.2.txt &&
test_tick &&
git -C src-revs add . &&
git -C src-revs commit -m "Side commit 1" &&

echo "Side version 2 of file 1" >src-revs/s/file.1.txt &&
echo "Side version 2 of file 2" >src-revs/s/file.2.txt &&
test_tick &&
git -C src-revs add . &&
git -C src-revs commit -m "Side commit 2" &&

# Merge side into main
git -C src-revs checkout main &&
test_tick &&
git -C src-revs merge side --no-ff -m "Merge side branch" &&

# Other branch (not merged) for --all testing
git -C src-revs checkout -b other main~1 &&
mkdir -p src-revs/o &&
echo "Other content 1" >src-revs/o/file.1.txt &&
echo "Other content 2" >src-revs/o/file.2.txt &&
test_tick &&
git -C src-revs add . &&
git -C src-revs commit -m "Other commit" &&

git -C src-revs checkout main &&

git clone --bare "file://$(pwd)/src-revs" srv-revs.bare &&
git -C srv-revs.bare config --local uploadpack.allowfilter 1 &&
git -C srv-revs.bare config --local uploadpack.allowanysha1inwant 1
'

# do basic partial clone from "srv.bare"
test_expect_success 'do partial clone 1, backfill gets all objects' '
git clone --no-checkout --filter=blob:none \
Expand Down Expand Up @@ -176,6 +224,162 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
test_line_count = 12 missing
'

test_expect_success 'backfill with revision range' '
test_when_finished rm -rf backfill-revs &&
git clone --no-checkout --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv.bare" backfill-revs &&

# No blobs yet
git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&

git -C backfill-revs backfill HEAD~2..HEAD &&

# 30 objects downloaded.
git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 18 missing
'

test_expect_success 'backfill with revisions over stdin' '
test_when_finished rm -rf backfill-revs &&
git clone --no-checkout --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv.bare" backfill-revs &&

# No blobs yet
git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&

cat >in <<-EOF &&
HEAD
^HEAD~2
EOF

git -C backfill-revs backfill --stdin <in &&

# 30 objects downloaded.
git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 18 missing
'

test_expect_success 'backfill with prefix pathspec' '
test_when_finished rm -rf backfill-path &&
git clone --bare --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv.bare" backfill-path &&

# No blobs yet
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&

# TODO: The pathspec should limit the downloaded blobs to
# only those matching the prefix "d/f", but currently all
# blobs are downloaded.
git -C backfill-path backfill HEAD -- d/f &&

git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 0 missing
'

test_expect_success 'backfill with multiple pathspecs' '
test_when_finished rm -rf backfill-path &&
git clone --bare --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv.bare" backfill-path &&

# No blobs yet
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&

# TODO: The pathspecs should limit the downloaded blobs to
# only those matching "d/f" or "a", but currently all blobs
# are downloaded.
git -C backfill-path backfill HEAD -- d/f a &&

git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 0 missing
'

test_expect_success 'backfill with wildcard pathspec' '
test_when_finished rm -rf backfill-path &&
git clone --bare --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv.bare" backfill-path &&

# No blobs yet
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&

# TODO: The wildcard pathspec should limit downloaded blobs,
# but currently all blobs are downloaded.
git -C backfill-path backfill HEAD -- "d/file.*.txt" &&

git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 0 missing
'

test_expect_success 'backfill with --all' '
test_when_finished rm -rf backfill-all &&
git clone --no-checkout --filter=blob:none \
"file://$(pwd)/srv-revs.bare" backfill-all &&

# All blobs from all refs are missing
git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
test_line_count = 54 missing &&

# Backfill from HEAD gets main blobs only
git -C backfill-all backfill HEAD &&

# Other branch blobs still missing
git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
test_line_count = 2 missing &&

# Backfill with --all gets everything
git -C backfill-all backfill --all &&

git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
test_line_count = 0 missing
'

test_expect_success 'backfill with --first-parent' '
test_when_finished rm -rf backfill-fp &&
git clone --no-checkout --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv-revs.bare" backfill-fp &&

git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 52 missing &&

# --first-parent skips the side branch commits, so
# s/file.{1,2}.txt v1 blobs (only in side commit 1) are missed.
git -C backfill-fp backfill --first-parent HEAD &&

git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 2 missing
'

test_expect_success 'backfill with --since' '
test_when_finished rm -rf backfill-since &&
git clone --no-checkout --filter=blob:none \
--single-branch --branch=main \
"file://$(pwd)/srv-revs.bare" backfill-since &&

git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 52 missing &&

# Use a cutoff between commits 4 and 5 (between v1 and v2
# iterations). Commits 5-8 still carry v1 of files 2-4 in
# their trees, but v1 of file.1.txt is only in commits 1-4.
SINCE=$(git -C backfill-since log --first-parent --reverse \
--format=%ct HEAD~1 | sed -n 5p) &&
git -C backfill-since backfill --since="@$((SINCE - 1))" HEAD &&

# 6 missing: v1 of file.1.txt in all 6 directories
git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 6 missing
'

. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd

Expand Down