
Match SHAs embedded in URLs and markdown bodies #12

@amenocal


pkg/commitremap.replaceSHA only replaces JSON string values that match a
commit-map key exactly (whole-string match). Real archive JSON contains
SHAs embedded in larger strings:

  • API URLs: "url": "https://api.github.com/repos/o/r/commits/<sha>"
  • HTML URLs in _links.*.href
  • Markdown bodies referencing commits by SHA in PR/issue text
  • Diff URLs, compare URLs

Embedded SHAs are NOT rewritten today, so after migration PR/issue bodies
and link fields still point at commits from the pre-rewrite history.
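
To make the gap concrete, here is a minimal Go sketch of whole-string matching as described above. It is not the actual replaceSHA code: wholeStringReplace and the map shape (old SHA to new SHA) are hypothetical stand-ins used only for illustration.

```go
package main

// wholeStringReplace is a hypothetical stand-in for the whole-string
// behavior described in this issue. It returns the mapped SHA only when
// the entire value is a commit-map key; any other value is returned as-is.
func wholeStringReplace(value string, commitMap map[string]string) string {
	if newSHA, ok := commitMap[value]; ok {
		return newSHA
	}
	// A URL or markdown body that merely contains the SHA falls through
	// here and keeps the old SHA.
	return value
}
```

With this behavior, a value that is exactly an old SHA is rewritten, but "https://api.github.com/repos/o/r/commits/<sha>" is left unchanged because the full string is not a map key.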

Approach:

  1. For known URL fields, parse the URL and substitute the SHA path segment.
  2. For free-text fields (PR body, issue body, comment body), find 40-char
    hex sequences with a regex and replace those that are commit-map keys.
  3. Add a flag / option to disable embedded matching for callers who want
    the safer whole-string-only behavior.
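
The three steps above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: Options, MatchEmbedded, rewriteURL, rewriteText, and rewriteValue are hypothetical names, the commit map is assumed to be a map from old SHA to new SHA, and SHAs are assumed to be lowercase 40-char hex. Only the standard library (net/url, regexp, strings) is used.

```go
package main

import (
	"net/url"
	"regexp"
	"strings"
)

// shaPattern matches a standalone run of exactly 40 lowercase hex chars.
// The word boundaries keep it from matching inside longer hex strings.
var shaPattern = regexp.MustCompile(`\b[0-9a-f]{40}\b`)

// Options is a hypothetical option struct; MatchEmbedded=false keeps the
// whole-string-only behavior.
type Options struct {
	MatchEmbedded bool
}

// rewriteURL parses raw and replaces any path segment that is a
// commit-map key. Unparseable or unaffected URLs are returned unchanged.
func rewriteURL(raw string, commitMap map[string]string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	segments := strings.Split(u.Path, "/")
	changed := false
	for i, seg := range segments {
		if newSHA, ok := commitMap[seg]; ok {
			segments[i] = newSHA
			changed = true
		}
	}
	if !changed {
		return raw
	}
	u.Path = strings.Join(segments, "/")
	u.RawPath = "" // let net/url re-derive the escaped path from Path
	return u.String()
}

// rewriteText replaces each 40-char hex run that is a commit-map key and
// leaves every other match untouched.
func rewriteText(text string, commitMap map[string]string) string {
	return shaPattern.ReplaceAllStringFunc(text, func(match string) string {
		if newSHA, ok := commitMap[match]; ok {
			return newSHA
		}
		return match
	})
}

// rewriteValue tries the whole-string match first, then the embedded
// strategies when they are enabled. isURL marks known URL fields.
func rewriteValue(value string, isURL bool, commitMap map[string]string, opts Options) string {
	if newSHA, ok := commitMap[value]; ok {
		return newSHA
	}
	if !opts.MatchEmbedded {
		return value
	}
	if isURL {
		return rewriteURL(value, commitMap)
	}
	return rewriteText(value, commitMap)
}
```

Because only hex runs that are actual commit-map keys get replaced, unrelated 40-char hex strings in free text stay as they are. Note that segment-based matching in rewriteURL would not catch compare URLs of the form <sha>...<sha> or paths with a suffix after the SHA; those would likely need the regex pass applied to the path instead.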

This is documented as a known limitation in the CHANGELOG/README of v0.1.0.
