update SQL query in GetByAssetVersionPaged#1636
Conversation
Signed-off-by: rafi <refaei.shikho@hotmail.com>
There was a problem hiding this comment.
Pull request overview
Updates the SQL/GORM query used by GetByAssetVersionPaged to derive the paged/ordered list of component PURLs (packages) for an asset version.
Changes:
- Switch
package_nameselection/grouping fromcomponents.idtodependency_vulns.component_purl. - Change the join between
componentsanddependency_vulnsfromINNER JOINtoRIGHT JOINto include dependency vulns even when a matching component row is missing. - Enable GORM query debugging on the
packageNameQueryscan.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Select("SUM(dependency_vulns.raw_risk_assessment) as total_risk, AVG(dependency_vulns.raw_risk_assessment) as avg_risk, MAX(dependency_vulns.raw_risk_assessment) as max_risk, MAX(\"CVE\".cvss) as max_cvss, COUNT(dependency_vulns.id) as dependency_vuln_count, dependency_vulns.component_purl as package_name"). | ||
| Joins("RIGHT JOIN dependency_vulns ON components.id = dependency_vulns.component_purl AND dependency_vulns.asset_id = ? AND dependency_vulns.asset_version_name = ?", assetID, assetVersionName). | ||
| Joins("LEFT JOIN artifact_dependency_vulns ON artifact_dependency_vulns.dependency_vuln_id = dependency_vulns.id"). | ||
| Joins("INNER JOIN cves \"CVE\" ON dependency_vulns.cve_id = \"CVE\".cve"). |
There was a problem hiding this comment.
The aggregates in this grouped query can be inflated because of the LEFT JOIN artifact_dependency_vulns (a dependency vuln can have multiple artifacts, duplicating rows). This affects SUM, AVG, and COUNT(dependency_vulns.id) and can change sort order when clients sort by total_risk, avg_risk, or dependency_vuln_count. Consider aggregating over a de-duplicated set of dependency_vulns (e.g., via a DISTINCT subquery keyed by dependency_vulns.id) and/or using COUNT(DISTINCT dependency_vulns.id) for the count.
| if err := packageNameQuery.Debug().Scan(&res).Error; err != nil { | ||
| return shared.Paged[models.DependencyVuln]{}, map[string]int{}, err |
There was a problem hiding this comment.
Debug() should not be left enabled in repository code; it can significantly increase log volume and may expose SQL/parameters if logging is enabled in some environments. Please remove Debug() or gate it behind an explicit, environment-driven debug flag used consistently across the codebase.
| packageNameQuery := repository.GetDB(tx).Table("components"). | ||
| Select("SUM(dependency_vulns.raw_risk_assessment) as total_risk, AVG(dependency_vulns.raw_risk_assessment) as avg_risk, MAX(dependency_vulns.raw_risk_assessment) as max_risk, MAX(\"CVE\".cvss) as max_cvss, COUNT(dependency_vulns.id) as dependency_vuln_count, components.id as package_name"). | ||
| Joins("INNER JOIN dependency_vulns ON components.id = dependency_vulns.component_purl AND dependency_vulns.asset_id = ? AND dependency_vulns.asset_version_name = ?", assetID, assetVersionName). | ||
| Select("SUM(dependency_vulns.raw_risk_assessment) as total_risk, AVG(dependency_vulns.raw_risk_assessment) as avg_risk, MAX(dependency_vulns.raw_risk_assessment) as max_risk, MAX(\"CVE\".cvss) as max_cvss, COUNT(dependency_vulns.id) as dependency_vuln_count, dependency_vulns.component_purl as package_name"). | ||
| Joins("RIGHT JOIN dependency_vulns ON components.id = dependency_vulns.component_purl AND dependency_vulns.asset_id = ? AND dependency_vulns.asset_version_name = ?", assetID, assetVersionName). | ||
| Joins("LEFT JOIN artifact_dependency_vulns ON artifact_dependency_vulns.dependency_vuln_id = dependency_vulns.id"). |
There was a problem hiding this comment.
packageNameQuery uses RIGHT JOIN while starting from Table("components"). This is harder to reason about and can lead to less optimal plans; the query is fundamentally driven by dependency_vulns (all WHERE predicates are on that table). Consider rewriting it to start from dependency_vulns and LEFT JOIN components instead, which preserves the "include vulns even if component is missing" behavior without relying on a RIGHT JOIN.
Signed-off-by: rafi <refaei.shikho@hotmail.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if comp.PackageURL != "" { | ||
| // Unescape URL-encoded characters (e.g., %2B -> +) to match the format stored in the database | ||
| unescapedPurl, err := url.PathUnescape(comp.PackageURL) | ||
| if err == nil { | ||
| comp.PackageURL = unescapedPurl | ||
| } | ||
| } |
There was a problem hiding this comment.
Unescaping comp.PackageURL while keeping comp.BOMRef unchanged can create multiple component nodes (distinct BOMRefs) that collapse to the same Component.PackageURL after unescaping (e.g., one added as ...%2B... and another as ...+...). Downstream code uses Component.PackageURL as an identifier/key (e.g., ToMinimalTree builds dependencies keyed by parentNode.Component.PackageURL), so collisions will overwrite edges and produce incorrect/minimally-lossy output. Consider canonicalizing the node ID (BOMRef) to the same normalized PURL (and updating edges/dep refs accordingly), or keep the raw PackageURL and store a separate normalized/canonical PURL field used only for DB matching/export.
| if comp.PackageURL != "" { | |
| // Unescape URL-encoded characters (e.g., %2B -> +) to match the format stored in the database | |
| unescapedPurl, err := url.PathUnescape(comp.PackageURL) | |
| if err == nil { | |
| comp.PackageURL = unescapedPurl | |
| } | |
| } |
| node := g.nodes[encodedPurl] | ||
| assert.NotNil(t, node) | ||
| assert.Equal(t, expectedPurl, node.Component.PackageURL) |
There was a problem hiding this comment.
These tests reach into the unexported g.nodes map. This makes the tests brittle to internal refactors and also bypasses graph scoping/reachability rules (e.g., g.Node(id) would return nil unless the component is linked). Prefer asserting via public behavior (e.g., link the component into the graph with AddEdge and then assert via Components()/Node()), or iterate g.Components() and match on BOMRef as done later in this test file.
| if comp.PackageURL != "" { | ||
| // Unescape URL-encoded characters (e.g., %2B -> +) to match the format stored in the database | ||
| unescapedPurl, err := url.PathUnescape(comp.PackageURL) | ||
| if err == nil { | ||
| comp.PackageURL = unescapedPurl |
There was a problem hiding this comment.
PR title mentions "update SQL query in GetByAssetVersionPaged", but the changes here are focused on normalizing/unescaping component PURLs in SBOMGraph.AddComponent plus adding tests. Either the PR title/description should be updated to reflect the actual change, or the intended SQL/query changes are missing from this PR.
…s interface declaration Signed-off-by: rafi <refaei.shikho@hotmail.com>
…move unused code Signed-off-by: rafi <refaei.shikho@hotmail.com>
…r improved handling of package URLs Signed-off-by: rafi <refaei.shikho@hotmail.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
integrations/commonint/integration_helper.go:417
FindAllComponentOnlyPathsToPURLreturns component-only paths (no ROOT/artifact/info-source nodes), but the code below/comment expects to render a Mermaid graph that includes those structural nodes. With the current call, the rendered graph will omit root/artifact/info-source nodes.
Consider using a full-path function for visualization (or add an option/method that returns full paths including structural nodes).
paths := bom.FindAllComponentOnlyPathsToPURL(pURL, 0)
// we want to show fake nodes in the mermaid graph (root, artifact, info sources)
pathWithFakeNodes := make([][]string, 0, len(paths))
for _, path := range paths {
pathWithFakeNodes = append(pathWithFakeNodes, path.ToStringSlice())
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| dependencyVulnRouter := assetVersionGroup.Group.Group("/dependency-vulns") | ||
| dependencyVulnRouter.GET("/", dependencyVulnController.ListPaged) | ||
| dependencyVulnRouter.GET("/sync/", dependencyVulnController.ListByAssetIDWithoutHandledExternalEventsPaged) | ||
| dependencyVulnRouter.GET("/:dependencyVulnID/", dependencyVulnController.Read) |
There was a problem hiding this comment.
This change removes the GET /dependency-vulns/sync/ route entirely. If this endpoint is consumed by the UI, integrations, or external clients, this is a breaking API change.
If removal is intended, consider leaving a deprecated alias/redirect for at least one release cycle (or ensure all callers are migrated) and update any API/docs references accordingly.
|
|
||
| if componentPurl.String() != "" { | ||
| pathPattern = dtos.PathPattern{componentPurl.String(), dtos.PathPatternWildcard, purl.ToString()} | ||
| pathPattern = dtos.PathPattern{componentPurl.String(), dtos.PathPatternWildcard, purlString} |
There was a problem hiding this comment.
purlString is normalized/unescaped, but the metadata/root componentPurl is still inserted into the PathPattern via componentPurl.String(). If the root PURL contains URL-escaped characters (e.g. %2B), this can prevent PathPattern.MatchesSuffix from matching DependencyVuln.VulnerabilityPath (which now appears to use unescaped PURLs).
Normalize componentPurl the same way (or otherwise ensure all path elements use one canonical string form).
| pathPattern = dtos.PathPattern{componentPurl.String(), dtos.PathPatternWildcard, purlString} | |
| componentPurlString, err := normalize.PURLToString(componentPurl) | |
| if err != nil { | |
| slog.Info("failed to unescape component purl for path pattern, continuing anyway", "componentPurl", componentPurl.String(), "error", err) | |
| componentPurlString = componentPurl.String() | |
| } | |
| pathPattern = dtos.PathPattern{componentPurlString, dtos.PathPatternWildcard, purlString} |
|
|
||
| func TestFindAllPathsToPURL(t *testing.T) { | ||
| func TestFindAllComponentOnlyPathsToPURL(t *testing.T) { | ||
| t.Run("multiple information sources pointing to same component - should return multiple paths", func(t *testing.T) { |
There was a problem hiding this comment.
The t.Run description says "should return multiple paths", but the test now asserts a single deduplicated path (Len(paths, 1)). Please rename the test case (and the assertion message at line 628) to match the intended behavior so failures are not misleading.
| assert.Equal(t, paths[0].ToStringSliceComponentOnly(), paths[1].ToStringSliceComponentOnly()) | ||
| assert.Equal(t, []string{"pkg:npm/lodash@4.17.21"}, paths[0].ToStringSliceComponentOnly()) | ||
| // Should return a single path (reachable through different info sources) | ||
| assert.Len(t, paths, 1, "Should return separate paths for each info source") |
There was a problem hiding this comment.
Assertion message is inconsistent with the expectation: it asserts Len(paths, 1) but the message says "separate paths for each info source". Please update the message to reflect that deduplication is expected.
| // Should return two paths (one through each artifact) | ||
| assert.Len(t, paths, 2, "Should return separate paths through different artifacts") | ||
|
|
||
| // Both paths should have the same component-only representation | ||
| assert.Equal(t, paths[0].ToStringSliceComponentOnly(), paths[1].ToStringSliceComponentOnly()) | ||
| assert.Equal(t, []string{"pkg:npm/express@4.18.0"}, paths[0].ToStringSliceComponentOnly()) | ||
| assert.Len(t, paths, 1, "Should return separate paths through different artifacts") |
There was a problem hiding this comment.
The comment and assertion message still refer to multiple/separate paths, but the test expects a single deduplicated path (Len(paths, 1)). Please update the comment/message (and optionally the t.Run name) to match the new behavior.
No description provided.