Commit beb1c92

path-walk: support wildcard pathspecs for blob filtering

Previously, walk_objects_by_path() silently ignored pathspecs containing wildcards or magic by clearing them. This caused all blobs to be downloaded regardless of the given pathspec.

Wildcard pathspecs like "d/file.*.txt" are useful for narrowing which blobs to process (e.g., during 'git backfill').

Support wildcard pathspecs by making three changes:

1. Add an 'exact_pathspecs' flag to path_walk_context. When the pathspec has no wildcards or magic, set this flag and use the existing fast-path prefix matching in add_tree_entries(). When wildcards are present, skip that block since prefix matching cannot handle glob patterns.

2. Disable revision-level commit pruning (revs->prune = 0) for wildcard pathspecs. The revision walk uses the pathspec to filter commits via TREESAME detection. For exact prefix pathspecs this works well, but wildcard pathspecs may fail to match through TREESAME because fnmatch with WM_PATHNAME does not cross directory boundaries. Disabling pruning ensures all commits are visited and their trees are available for the path-walk to filter.

3. Add a match_pathspec() check in walk_path() to filter out blobs whose full path does not match the pathspec. This provides the actual blob-level filtering for wildcard pathspecs.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1 parent 977f62f commit beb1c92

2 files changed

Lines changed: 17 additions & 12 deletions

path-walk.c

Lines changed: 14 additions & 8 deletions
@@ -62,6 +62,8 @@ struct path_walk_context {
 	 */
 	struct prio_queue path_stack;
 	struct strset path_stack_pushed;
+
+	unsigned exact_pathspecs:1;
 };

 static int compare_by_type(const void *one, const void *two, void *cb_data)
@@ -206,7 +208,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
 				 match != MATCHED)
 				continue;
 		}
-		if (ctx->revs->prune_data.nr) {
+		if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
 			struct pathspec *pd = &ctx->revs->prune_data;
 			bool found = false;

@@ -317,6 +319,13 @@ static int walk_path(struct path_walk_context *ctx,
 		return 0;
 	}

+	if (list->type == OBJ_BLOB &&
+	    ctx->revs->prune_data.nr &&
+	    !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
+			    path, strlen(path), 0,
+			    NULL, 0))
+		return 0;
+
 	/* Evaluate function pointer on this data, if requested. */
 	if ((list->type == OBJ_TREE && ctx->info->trees) ||
 	    (list->type == OBJ_BLOB && ctx->info->blobs) ||
@@ -525,15 +534,12 @@ int walk_objects_by_path(struct path_walk_info *info)
 	info->revs->tag_objects = 1;

 	if (ctx.revs->prune_data.nr) {
-		/*
-		 * Only exact prefix pathspecs are currently supported.
-		 * Clear any wildcard or magic pathspecs to avoid
-		 * incorrect prefix matching.
-		 */
 		struct pathspec *pd = &ctx.revs->prune_data;

-		if (pd->has_wildcard || pd->magic)
-			pd->nr = 0;
+		if (!pd->has_wildcard && !pd->magic)
+			ctx.exact_pathspecs = 1;
+		else
+			ctx.revs->prune = 0;
 	}

 	/* Insert a single list for the root tree into the paths. */

t/t5620-backfill.sh

Lines changed: 3 additions & 4 deletions
@@ -307,12 +307,11 @@ test_expect_success 'backfill with wildcard pathspec' '
 	git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
 	test_line_count = 48 missing &&

-	# TODO: The wildcard pathspec should limit downloaded blobs,
-	# but currently all blobs are downloaded.
-	git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+	git -C backfill-path backfill HEAD -- "d/file.*.txt" 2>err &&
+	test_must_be_empty err &&

 	git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
-	test_line_count = 0 missing
+	test_line_count = 40 missing
 '

 test_expect_success 'backfill with --all' '
