Observed problem
canonicalizeRemote() in lib/gstack-memory-helpers.ts is documented to produce a canonical host/org/repo key with no scheme and no trailing .git. It is used as the cross-machine dedup key and to derive gbrain code source-ids (bin/gstack-gbrain-sync.ts deriveCodeSourceId/deriveLegacyCodeSourceId, bin/gstack-memory-ingest.ts).
When the configured origin URL ends with a trailing slash (e.g. git remote add origin https://github.com/garrytan/gstack.git/ — valid, clones fine), the .git suffix is not stripped:
canonicalizeRemote('https://github.com/garrytan/gstack.git/')
=> 'github.com/garrytan/gstack.git' // expected: 'github.com/garrytan/gstack'
Current behavior on upstream main (cab774c)
The two normalization steps run in the wrong order:
// strip trailing .git
s = s.replace(/\.git$/i, ""); // anchored on $ — fails when a slash follows
// strip trailing slash
s = s.replace(/\/+$/, "");
With a trailing slash present, the \.git$ anchor does not match (the string ends in /, not .git), so .git survives; the trailing slash is then removed, leaving ...gstack.git.
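The ordering problem can be reproduced in isolation. The sketch below is a reduction of only the two quoted steps; `stripSuffixesBuggyOrder` is a hypothetical name, and the input is assumed to have had its scheme removed already, since the rest of canonicalizeRemote() is not shown in this report.

```typescript
// Hypothetical reduction of the two normalization steps quoted above.
// Assumes the scheme has already been stripped by earlier (unshown) code.
function stripSuffixesBuggyOrder(s: string): string {
  s = s.replace(/\.git$/i, ""); // no match when the string ends in "/"
  s = s.replace(/\/+$/, "");    // removes the slash, exposing ".git" too late
  return s;
}

const withSlash = stripSuffixesBuggyOrder("github.com/garrytan/gstack.git/");
const withoutSlash = stripSuffixesBuggyOrder("github.com/garrytan/gstack.git");

console.log(withSlash);    // github.com/garrytan/gstack.git
console.log(withoutSlash); // github.com/garrytan/gstack
```

The two inputs name the same repository but come out as different keys, which is the defect described here.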
Downstream, deriveCodeSourceId joins the last two path segments, so the same repo yields different source-ids on different machines depending on whether origin has a trailing slash (gstack-code-garrytan-gstack-<hash> vs gstack-code-garrytan-gstack-git-<hash> after id sanitization). Federated cross-source search then splits/duplicates hits for one repo — exactly what the canonical key exists to prevent.
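To illustrate how the surviving .git suffix leaks into the source-id, here is a rough sketch. `sketchSourceSlug` is a hypothetical stand-in, not the real deriveCodeSourceId: the sanitization rule (replace every run of characters outside a-z0-9 with "-") is an assumption chosen to match the ids quoted above, and the `<hash>` suffix is left out entirely.

```typescript
// Hypothetical stand-in for the slug portion of deriveCodeSourceId.
// Assumption: sanitization replaces non-alphanumeric runs with "-".
// The real rules and the <hash> suffix are not reproduced here.
function sketchSourceSlug(canonicalKey: string): string {
  // Join the last two path segments (org and repo) of the canonical key.
  const lastTwo = canonicalKey.split("/").slice(-2).join("-");
  const sanitized = lastTwo
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // "." in "gstack.git" becomes "-"
    .replace(/^-+|-+$/g, "");    // trim leading/trailing dashes
  return `gstack-code-${sanitized}`;
}

const cleanSlug = sketchSourceSlug("github.com/garrytan/gstack");
const leakySlug = sketchSourceSlug("github.com/garrytan/gstack.git");

console.log(cleanSlug); // gstack-code-garrytan-gstack
console.log(leakySlug); // gstack-code-garrytan-gstack-git
```

Under these assumptions, one repository ends up with two id prefixes, so any index keyed by source-id treats it as two sources.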
Expected behavior
A remote written with a trailing slash must canonicalize to the same key as one without:
canonicalizeRemote('https://github.com/garrytan/gstack.git/')
=== canonicalizeRemote('https://github.com/garrytan/gstack.git')
=== 'github.com/garrytan/gstack'
Duplicate searches performed
- gh search prs --state open 'canonicalizeRemote' -> no open PRs (only merged v1.26.0.0 feat: V1 transcript ingest + per-skill gbrain manifests + retrieval surface #1298, which introduced it; closed fix(sync-gbrain): generate gbrain-valid source ids for repos with dots or long names #1330, unrelated source-id work)
- memory-helpers, canonical remote, dedup, remote url -> no matching open item for this trailing-slash defect
- test/gstack-memory-helpers.test.ts covers trailing-slash alone and .git alone, but not the combination
Candidate fix shape
Strip trailing slash(es) before stripping the .git suffix (one-line reorder), plus regression tests for the .git/ combination.
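A minimal sketch of the reorder together with the regression cases it should satisfy. `stripSuffixesFixedOrder` and the `cases` table are hypothetical; they cover only the two reordered steps on scheme-less input and do not use the actual test harness in test/gstack-memory-helpers.test.ts.

```typescript
// Hypothetical reduction of the proposed fix: same two steps, swapped.
function stripSuffixesFixedOrder(s: string): string {
  s = s.replace(/\/+$/, "");    // strip trailing slash(es) first
  s = s.replace(/\.git$/i, ""); // ".git" is now at the end, so $ can match
  return s;
}

// [input, expected] pairs; the first three exercise the .git/ combination.
const cases: Array<[string, string]> = [
  ["github.com/garrytan/gstack.git/", "github.com/garrytan/gstack"],
  ["github.com/garrytan/gstack.git//", "github.com/garrytan/gstack"],
  ["github.com/garrytan/gstack.GIT/", "github.com/garrytan/gstack"],
  ["github.com/garrytan/gstack.git", "github.com/garrytan/gstack"],
  ["github.com/garrytan/gstack/", "github.com/garrytan/gstack"],
  ["github.com/garrytan/gstack", "github.com/garrytan/gstack"],
];

// Collect every case whose output differs from the expected key.
const failures = cases.filter(
  ([input, expected]) => stripSuffixesFixedOrder(input) !== expected,
);

console.log(failures.length === 0 ? "all cases pass" : failures);
```

The last three cases are the already-covered paths (.git alone, trailing slash alone, neither) and are included to check that the reorder does not change them.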