Fixes #1063: Use github purl, repo and version for a github release archive in SBOM #1104
Walkthrough
Detect GitHub Releases asset URLs when mapping archive download URLs to PURLs, and use the PURL-derived version when adding projects to SBOM components. Add a BDD scenario and a changelog entry for Release 0.14.0 documenting the GitHub purl metadata behavior.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/vcs/archive.py`:
- Around line 88-100: The code is incorrectly using str(current_version) or
prefix which yields "None" when current_version is None; update the logic in the
block that builds the PackageURL (the match handling that calls coerce(...)) to
use current_version only when it is not None (e.g., version =
str(current_version) if current_version is not None else prefix) so the
PackageURL(version=...) falls back to prefix instead of the literal "None";
change the version expression passed to PackageURL in the matched branch
accordingly.
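The fallback problem described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `version_buggy` and `version_fixed` are hypothetical helper names, and the `prefix` values are made up for illustration.

```python
from typing import Optional


def version_buggy(current_version: Optional[object], prefix: str) -> str:
    # str(None) is the non-empty string "None", so `or` never reaches prefix.
    return str(current_version) or prefix


def version_fixed(current_version: Optional[object], prefix: str) -> str:
    # Only stringify when a version was actually parsed; otherwise fall back.
    return str(current_version) if current_version is not None else prefix


print(version_buggy(None, "nightly"))  # None
print(version_fixed(None, "nightly"))  # nightly
print(version_fixed("1.2.3", "v"))     # 1.2.3
```

The explicit `is not None` check also keeps a falsy but valid version object from being replaced by the prefix, which a plain truthiness test would not guarantee.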
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5d15ac1e-1b1c-4cd9-aaf1-3129f5577784
📒 Files selected for processing (4)
- CHANGELOG.rst
- dfetch/reporting/sbom_reporter.py
- dfetch/vcs/archive.py
- features/report-sbom-archive.feature
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dfetch/vcs/archive.py`:
- Around line 88-103: The GitHub detection currently runs re.search against the
whole download_url which can match GitHub-like strings in query/fragments; first
parse the URL (use parsed = urllib.parse.urlparse(download_url)) and then only
run the GitHub regex against parsed.netloc and parsed.path (or check
parsed.hostname == "github.com" and match parsed.path against
r"/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/"); keep
the same logic that calls coerce(match["version"]) and returns the PackageURL
with namespace, name, and version, but derive match from the parsed.path (or
combined host+path) to avoid false positives.
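The host-anchored detection suggested above could look roughly like the sketch below. It uses only the standard library and returns a plain tuple instead of a `PackageURL`; the helper name `github_release_parts` and the example URLs are assumptions for illustration, and the version is returned as found in the URL, without the `coerce` step.

```python
import re
import urllib.parse
from typing import Optional, Tuple

# Matched against the URL path only, so query strings and fragments are ignored.
_RELEASE_PATH = re.compile(
    r"^/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/"
)


def github_release_parts(download_url: str) -> Optional[Tuple[str, str, str]]:
    """Return (org, repo, version) for a GitHub release asset URL, else None."""
    parsed = urllib.parse.urlparse(download_url)
    # Require the real host to be github.com before looking at the path.
    if (parsed.hostname or "").lower() != "github.com":
        return None
    match = _RELEASE_PATH.match(parsed.path)
    if match is None:
        return None
    return match["org"].lower(), match["repo"].lower(), match["version"]


print(github_release_parts(
    "https://github.com/Acme/Widget/releases/download/v2.0.0/widget.tar.gz"
))  # ('acme', 'widget', 'v2.0.0')
print(github_release_parts(
    "https://mirror.example.org/fetch?src=https://github.com/acme/widget/releases/download/v2.0.0/widget.tar.gz"
))  # None
```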
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dcb875f4-7c4c-4b2f-a8e4-890f97d57231
📒 Files selected for processing (1)
dfetch/vcs/archive.py
    if match := re.search(
        r"https://github\.com/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/",
        download_url,
    ):
        prefix, current_version, _ = coerce(
            match["version"],
        )
        return PackageURL(
            type="github",
            namespace=match["org"].lower(),
            name=match["repo"].lower(),
            version=str(current_version) if current_version else prefix,
        )

    parsed = urllib.parse.urlparse(download_url)
    basename = os.path.basename(parsed.path)
Anchor GitHub detection to parsed host/path to avoid false positives.
Using re.search on the full URL can incorrectly match GitHub patterns inside query parameters/fragments of non-GitHub URLs, resulting in wrong SBOM package identity.
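The false positive is easy to reproduce with the pattern from the reviewed code. In this sketch the mirror URL is a made-up example; only the regular expression is taken from the snippet under review.

```python
import re

# Pattern as used in the reviewed snippet, searched over the whole URL.
PATTERN = (
    r"https://github\.com/(?P<org>[^/]+)/(?P<repo>[^/]+)"
    r"/releases/download/(?P<version>[^/]+)/"
)

# Hypothetical non-GitHub host that carries a GitHub URL in its query string.
mirror_url = (
    "https://mirror.example.org/fetch"
    "?src=https://github.com/acme/widget/releases/download/v2.0.0/widget.tar.gz"
)

match = re.search(PATTERN, mirror_url)
print(match is not None)  # True
print(match["org"], match["repo"], match["version"])  # acme widget v2.0.0
```

Because the match succeeds, the archive served by `mirror.example.org` would be reported in the SBOM as `acme/widget` from GitHub.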
Proposed fix
-    if match := re.search(
-        r"https://github\.com/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/",
-        download_url,
-    ):
+    parsed = urllib.parse.urlparse(download_url)
+    if (
+        (parsed.hostname or "").lower() == "github.com"
+        and (
+            match := re.match(
+                r"^/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/",
+                parsed.path,
+            )
+        )
+    ):
         prefix, current_version, _ = coerce(
             match["version"],
         )
         return PackageURL(
             type="github",
             namespace=match["org"].lower(),
             name=match["repo"].lower(),
             version=str(current_version) if current_version else prefix,
         )
-    parsed = urllib.parse.urlparse(download_url)
     basename = os.path.basename(parsed.path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dfetch/vcs/archive.py` around lines 88 - 103, The GitHub detection currently
runs re.search against the whole download_url which can match GitHub-like
strings in query/fragments; first parse the URL (use parsed =
urllib.parse.urlparse(download_url)) and then only run the GitHub regex against
parsed.netloc and parsed.path (or check parsed.hostname == "github.com" and
match parsed.path against
r"/(?P<org>[^/]+)/(?P<repo>[^/]+)/releases/download/(?P<version>[^/]+)/"); keep
the same logic that calls coerce(match["version"]) and returns the PackageURL
with namespace, name, and version, but derive match from the parsed.path (or
combined host+path) to avoid false positives.
Fixes #1063