
Fixes #1063: Use github purl, repo and version for a github release archive in SBOM #1104

Merged
spoorcc merged 2 commits into main from feature/archive-github-purl
Apr 3, 2026

Conversation


@ben-edna ben-edna commented Apr 3, 2026

Fixes #1063

Summary by CodeRabbit

  • New Features
    • SBOM generation recognizes GitHub release archive downloads and emits GitHub-style package references with extracted versions.
  • Improvements
    • SBOM metadata now prefers the version parsed from the package identifier when available, improving version accuracy.
  • Tests
    • Added a BDD scenario validating SBOM output for GitHub-hosted archive dependencies.
  • Documentation
    • CHANGELOG updated with an unreleased "Release 0.14.0" entry.


coderabbitai Bot commented Apr 3, 2026

Walkthrough

Detect GitHub Releases asset URLs when mapping archive download URLs to PURLs and use the PURL-derived version when adding projects to SBOM components; add a BDD scenario and changelog entry for Release 0.14.0 documenting the GitHub purl metadata behavior.
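The mapping described in the walkthrough can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from the PR: the function name `archive_purl`, the pattern constant, and the layout of the generic fallback string are assumptions, and the release tag is kept verbatim here, whereas the PR normalizes it through its own `coerce()` helper.

```python
import re
import urllib.parse

# Hypothetical pattern, modelled on the /releases/download/<version>/ layout.
_GITHUB_RELEASE_PATH = re.compile(
    r"^/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/"
)


def archive_purl(download_url: str) -> str:
    """Return a PURL string for an archive URL (illustrative only)."""
    parsed = urllib.parse.urlparse(download_url)
    match = _GITHUB_RELEASE_PATH.match(parsed.path)
    if (parsed.hostname or "").lower() == "github.com" and match:
        org = match["org"].lower()
        repo = match["repo"].lower()
        # Assumption: tag kept as-is; the PR coerces it to a version first.
        version = match["version"]
        return f"pkg:github/{org}/{repo}@{version}"
    # Fallback: generic PURL that carries the URL as a qualifier.
    name = parsed.path.rsplit("/", 1)[-1] or "archive"
    quoted = urllib.parse.quote(download_url, safe="")
    return f"pkg:generic/{name}?download_url={quoted}"
```

With this sketch, a GitHub release asset URL yields a `pkg:github/...` identifier with lowercased org and repo, while any other URL falls through to the generic form.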

Changes

| Cohort / File(s) | Summary |
|---|---|
| GitHub Archive URL Parsing: `dfetch/vcs/archive.py` | Added detection for GitHub Releases asset download URLs (`/releases/download/<version>/...`) with regex, returning `PackageURL(type="github")` using lowercased org/repo and coerced version; falls back to previous generic archive PURL handling otherwise. |
| SBOM Component Version Resolution: `dfetch/reporting/sbom_reporter.py` | Changed `add_project()` to prefer `purl.version` when present (`version = purl.version or version`) so `component.version`, `bom_ref`, and concluded evidence reflect PURL-extracted version. |
| Tests and Changelog: `features/report-sbom-archive.feature`, `CHANGELOG.rst` | Added a BDD scenario asserting GitHub PURL and version extraction in SBOM output; added unreleased "Release 0.14.0" entry describing GitHub purl metadata change. |
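The version preference summarized for `add_project()` comes down to a single `or` expression. A small sketch, assuming a hypothetical `Purl` dataclass as a stand-in for the real PURL object and a hypothetical helper name `resolve_version`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Purl:
    """Hypothetical stand-in for a parsed package URL."""

    type: str
    name: str
    version: Optional[str] = None


def resolve_version(purl: Purl, version: str) -> str:
    """Prefer the version embedded in the PURL; fall back to the given one."""
    # A missing (None) or empty PURL version is falsy, so `or` falls through.
    return purl.version or version
```

Because `or` tests truthiness, an empty-string PURL version also falls back to the caller-supplied value, not only `None`.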

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Manifest as Manifest / Input
participant ArchiveParser as dfetch.vcs.archive
participant PURL as PackageURL
participant SBOM as SbomReporter
Manifest->>ArchiveParser: provide archive download_url
ArchiveParser->>ArchiveParser: match GitHub release pattern
alt GitHub release match
ArchiveParser->>PURL: build pkg:github/<org>/<repo>@<version>
ArchiveParser-->>Manifest: return github PURL
else no match
ArchiveParser->>PURL: build pkg:generic with qualifiers(download_url)
ArchiveParser-->>Manifest: return generic PURL
end
Manifest->>SBOM: add_project(name, purl, version)
SBOM->>SBOM: derive version = purl.version or provided version
SBOM-->>Manifest: include component with chosen version and bom_ref

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

development

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Linked Issues check | ⚠️ Warning | The PR changes do not implement the stated objectives from issue #1063, which focus on improving the freeze command with single-project freezing and in-place manifest editing. | The PR should implement freeze command improvements per #1063, including single project freezes and in-place manifest editing for git/SVN superprojects. |
| Out of Scope Changes check | ⚠️ Warning | All changes are focused on SBOM generation with GitHub purl support, which is unrelated to the freeze command improvements specified in the linked issue #1063. | The changes address SBOM/purl generation rather than freeze command improvements; clarify the correct linked issue or ensure changes match the stated objectives. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly addresses the issue #1063 and accurately summarizes the main change: using GitHub purl, repo, and version for GitHub release archives in SBOM generation. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/vcs/archive.py`:
- Around line 88-100: The code is incorrectly using str(current_version) or
prefix which yields "None" when current_version is None; update the logic in the
block that builds the PackageURL (the match handling that calls coerce(...)) to
use current_version only when it is not None (e.g., version =
str(current_version) if current_version is not None else prefix) so the
PackageURL(version=...) falls back to prefix instead of the literal "None";
change the version expression passed to PackageURL in the matched branch
accordingly.
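
The pitfall named in this finding is easy to reproduce in isolation. The two helper functions below are hypothetical and only contrast the flagged expression with the suggested one; they do not call the project's `coerce()`.

```python
from typing import Optional


def version_buggy(current_version: Optional[object], prefix: str) -> str:
    # str(None) is the non-empty string "None", so `or` never reaches prefix.
    return str(current_version) or prefix


def version_fixed(current_version: Optional[object], prefix: str) -> str:
    # Only stringify when a version was actually parsed.
    return str(current_version) if current_version is not None else prefix
```

Calling `version_buggy(None, "latest")` returns the literal string `"None"`, whereas `version_fixed(None, "latest")` returns `"latest"`.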

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5d15ac1e-1b1c-4cd9-aaf1-3129f5577784

📥 Commits

Reviewing files that changed from the base of the PR and between fa0c753 and 3806ee0.

📒 Files selected for processing (4)
  • CHANGELOG.rst
  • dfetch/reporting/sbom_reporter.py
  • dfetch/vcs/archive.py
  • features/report-sbom-archive.feature

Comment thread dfetch/vcs/archive.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dfetch/vcs/archive.py`:
- Around line 88-103: The GitHub detection currently runs re.search against the
whole download_url which can match GitHub-like strings in query/fragments; first
parse the URL (use parsed = urllib.parse.urlparse(download_url)) and then only
run the GitHub regex against parsed.netloc and parsed.path (or check
parsed.hostname == "github.com" and match parsed.path against
r"/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/"); keep
the same logic that calls coerce(match["version"]) and returns the PackageURL
with namespace, name, and version, but derive match from the parsed.path (or
combined host+path) to avoid false positives.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dcb875f4-7c4c-4b2f-a8e4-890f97d57231

📥 Commits

Reviewing files that changed from the base of the PR and between 3806ee0 and 62bb933.

📒 Files selected for processing (1)
  • dfetch/vcs/archive.py

Comment thread dfetch/vcs/archive.py
Comment on lines +88 to 103
    if match := re.search(
        r"https://github\.com/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/",
        download_url,
    ):
        prefix, current_version, _ = coerce(
            match["version"],
        )
        return PackageURL(
            type="github",
            namespace=match["org"].lower(),
            name=match["repo"].lower(),
            version=str(current_version) if current_version else prefix,
        )

    parsed = urllib.parse.urlparse(download_url)
    basename = os.path.basename(parsed.path)

⚠️ Potential issue | 🟠 Major

Anchor GitHub detection to parsed host/path to avoid false positives.

Using re.search on the full URL can incorrectly match GitHub patterns inside query parameters/fragments of non-GitHub URLs, resulting in wrong SBOM package identity.

Proposed fix
-    if match := re.search(
-        r"https://github\.com/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/",
-        download_url,
-    ):
+    parsed = urllib.parse.urlparse(download_url)
+    if (
+        (parsed.hostname or "").lower() == "github.com"
+        and (
+            match := re.match(
+                r"^/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/",
+                parsed.path,
+            )
+        )
+    ):
         prefix, current_version, _ = coerce(
             match["version"],
         )
         return PackageURL(
             type="github",
             namespace=match["org"].lower(),
             name=match["repo"].lower(),
             version=str(current_version) if current_version else prefix,
         )
 
-    parsed = urllib.parse.urlparse(download_url)
     basename = os.path.basename(parsed.path)
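
To see why anchoring matters, the two matching strategies can be compared on a URL that merely embeds a GitHub release link in its query string. This is a standalone sketch; the function names and the `mirror.example.com` URL are made up for illustration.

```python
import re
import urllib.parse

# Path portion of the release-asset pattern, shared by both strategies.
PATTERN = r"/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/"


def matches_unanchored(url: str) -> bool:
    """Search the whole URL string, as the reviewed code does."""
    return re.search(r"https://github\.com" + PATTERN, url) is not None


def matches_anchored(url: str) -> bool:
    """Check the parsed host first, then match only the path."""
    parsed = urllib.parse.urlparse(url)
    host_ok = (parsed.hostname or "").lower() == "github.com"
    return host_ok and re.match(PATTERN, parsed.path) is not None


# Hypothetical non-GitHub URL that embeds a GitHub link in its query.
tricky = (
    "https://mirror.example.com/get"
    "?src=https://github.com/org/repo/releases/download/v1.0/file.tar.gz"
)
genuine = "https://github.com/org/repo/releases/download/v1.0/file.tar.gz"
```

Here `matches_unanchored(tricky)` is true even though the host is `mirror.example.com`, while `matches_anchored(tricky)` is false; both accept `genuine`.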

@spoorcc spoorcc merged commit 2bd62cc into main Apr 3, 2026
41 checks passed
@spoorcc spoorcc deleted the feature/archive-github-purl branch April 3, 2026 17:29
@spoorcc spoorcc linked an issue Apr 3, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

  • TF-PSA-Crypto sbom should be improved
  • improve freeze command

2 participants