canonicalizeRemote keeps the .git suffix when the remote URL has a trailing slash, splitting the dedup/source-id key #1895

@jbetala7

Observed problem

canonicalizeRemote() in lib/gstack-memory-helpers.ts is documented to produce a canonical host/org/repo key with no scheme and no trailing .git. It is used as the cross-machine dedup key and to derive gbrain code source-ids (bin/gstack-gbrain-sync.ts deriveCodeSourceId/deriveLegacyCodeSourceId, bin/gstack-memory-ingest.ts).

When the configured origin URL ends with a trailing slash (e.g. git remote add origin https://github.com/garrytan/gstack.git/ — valid, clones fine), the .git suffix is not stripped:

canonicalizeRemote('https://github.com/garrytan/gstack.git/')
  => 'github.com/garrytan/gstack.git'   // expected: 'github.com/garrytan/gstack'

Current behavior on upstream main (cab774c)

The two normalization steps run in the wrong order:

// strip trailing .git
s = s.replace(/\.git$/i, "");   // anchored on $ — fails when a slash follows
// strip trailing slash
s = s.replace(/\/+$/, "");

With a trailing slash present, the \.git$ anchor does not match (the string ends in /, not .git), so .git survives; the trailing slash is then removed, leaving ...gstack.git.
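The ordering problem can be reproduced in isolation. The sketch below copies only the two replace calls quoted above into a hypothetical helper, `stripSuffixCurrentOrder`; the real canonicalizeRemote() does more (such as removing the scheme), so the input here is already scheme-free.

```typescript
// Hypothetical helper: isolates the two normalization steps in their
// current order. Not the real canonicalizeRemote() implementation.
function stripSuffixCurrentOrder(input: string): string {
  let s = input;
  // Step 1: strip trailing .git (no match when the string ends in "/").
  s = s.replace(/\.git$/i, "");
  // Step 2: strip trailing slash(es), which exposes the surviving ".git".
  s = s.replace(/\/+$/, "");
  return s;
}

const buggyWithSlash = stripSuffixCurrentOrder("github.com/garrytan/gstack.git/");
const buggyWithoutSlash = stripSuffixCurrentOrder("github.com/garrytan/gstack.git");

console.log(buggyWithSlash);    // github.com/garrytan/gstack.git
console.log(buggyWithoutSlash); // github.com/garrytan/gstack
```

The two inputs name the same repository but produce different keys, which is the mismatch described in this report.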

Downstream, deriveCodeSourceId joins the last two path segments, so the same repo yields different source-ids on different machines depending on whether origin has a trailing slash: gstack-code-garrytan-gstack-<hash> without one, gstack-code-garrytan-gstack-git-<hash> with one (after id sanitization). Federated cross-source search then splits or duplicates hits for a single repo, which is exactly what the canonical key exists to prevent.
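To illustrate how the stray suffix reaches the source-id, here is a rough sketch of the slug derivation. `sketchSourceIdSlug` is a hypothetical stand-in, not the real deriveCodeSourceId: the sanitization rule (non-alphanumeric runs become `-`) is an assumption, and the hash suffix is omitted because its derivation is not described here.

```typescript
// Hypothetical stand-in for the slug portion of deriveCodeSourceId.
// Assumptions: sanitization replaces non-alphanumeric runs with "-";
// the real function's hash suffix is deliberately left out.
function sketchSourceIdSlug(canonicalKey: string): string {
  const segments = canonicalKey.split("/").filter((part) => part.length > 0);
  const lastTwo = segments.slice(-2).join("-"); // e.g. "garrytan-gstack.git"
  const sanitized = lastTwo.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  return `gstack-code-${sanitized}`;
}

const slugClean = sketchSourceIdSlug("github.com/garrytan/gstack");
const slugStray = sketchSourceIdSlug("github.com/garrytan/gstack.git");

console.log(slugClean); // gstack-code-garrytan-gstack
console.log(slugStray); // gstack-code-garrytan-gstack-git
```

Under these assumptions, a key that still carries `.git` turns into a slug ending in `-git`, so one repository ends up with two distinct ids.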

Expected behavior

A remote written with a trailing slash must canonicalize to the same key as one without:

canonicalizeRemote('https://github.com/garrytan/gstack.git/')
  === canonicalizeRemote('https://github.com/garrytan/gstack.git')
  === 'github.com/garrytan/gstack'

Duplicate searches performed

Candidate fix shape

Strip trailing slash(es) before stripping the .git suffix (one-line reorder), plus regression tests for the .git/ combination.
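A minimal sketch of the reorder, with the regression cases expressed as plain checks. `stripSuffixFixedOrder` and the case list are hypothetical and cover only the two suffix steps, not the rest of canonicalizeRemote(); a real patch would use the repository's own test framework.

```typescript
// Hypothetical helper: the same two steps, with the slash removed first
// so the \.git$ anchor can match.
function stripSuffixFixedOrder(input: string): string {
  let s = input;
  s = s.replace(/\/+$/, "");    // strip trailing slash(es)
  s = s.replace(/\.git$/i, ""); // then strip trailing .git
  return s;
}

// Regression cases: every spelling should collapse to the same key.
const regressionInputs: string[] = [
  "github.com/garrytan/gstack",
  "github.com/garrytan/gstack/",
  "github.com/garrytan/gstack.git",
  "github.com/garrytan/gstack.git/",
  "github.com/garrytan/gstack.git//",
  "github.com/garrytan/gstack.GIT/",
];

const expectedKey = "github.com/garrytan/gstack";
const fixedResults = regressionInputs.map(stripSuffixFixedOrder);

for (let i = 0; i < regressionInputs.length; i++) {
  if (fixedResults[i] !== expectedKey) {
    throw new Error(`${regressionInputs[i]} => ${fixedResults[i]}`);
  }
}
console.log("all spellings canonicalize to", expectedKey);
```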
