LCORE-2631: Vulnerability report - stat processing#1973
Conversation
Walkthrough
ChangesDependabot Alert Statistics Pipeline
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/vulnerability_report.py`:
- Around line 232-245: The fill_in_days_statistic function does not handle the
case when the days list is empty, which causes a ZeroDivisionError on line 243
when calculating the average (sum(days) / len(days)) and a StatisticsError on
line 244 when calculating the median. Add a guard condition to check if the days
list is empty before performing these calculations, and return an appropriate
value for the empty case (such as setting avg and median to None or 0) to
prevent these exceptions when no fixed alerts exist.
- Around line 254-258: The fill_in_cve_created_dates function accesses
item["created_at"] without validating that the field exists or is properly
formatted, which causes KeyError or parsing failures. Add validation in the
load_dependabot_file function to ensure every item in the source data has the
required created_at field with a valid timestamp format before it is passed to
fill_in_cve_created_dates. This upstream validation will prevent errors when the
datetime.strptime call attempts to parse the date string.
- Around line 248-251: The fill_in_vulnerable_packages function performs unsafe
deep dictionary access on item["dependency"]["package"]["name"] without
validating that these nested keys exist, which can cause KeyError exceptions.
Add proper error handling to validate that each level of the dictionary
hierarchy exists before accessing it. You can either use the dict.get() method
with default values in the list comprehension, or add validation logic to check
for the presence of the "dependency", "package", and "name" fields before
attempting to access them. Ensure that items with missing fields are either
skipped gracefully or handled appropriately rather than causing the entire
function to fail.
- Line 261: The `prefix` parameter in the `process_dependabot_file` function is
declared but never used within the function body. Remove the `prefix` parameter
from the function signature of `process_dependabot_file` and update the call
site in the `main` function (and any other locations where
`process_dependabot_file` is invoked) to remove the corresponding prefix
argument being passed.
- Around line 125-140: The validation in this section is checking for the wrong
JSON path structure from the GitHub Dependabot API. In the validation block
where you retrieve the security advisory data using
`item.get("security_advisory")`, you need to replace this with
`item.get("security_vulnerability")` since the API structure places the
`severity` field under `security_vulnerability` not `security_advisory`. This
incorrect path also needs to be corrected throughout the file wherever
`security_advisory` is used to extract severity information, including the
downstream code that extracts and uses this data.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 39413dc1-00e4-4edf-95e7-3db234a7e7b5
📒 Files selected for processing (1)
scripts/vulnerability_report.py
📜 Review details
⏰ Context from checks skipped due to timeout. (16)
- GitHub Check: spectral
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: build-pr
- GitHub Check: integration_tests (3.13)
- GitHub Check: integration_tests (3.12)
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (3)
scripts/vulnerability_report.py (3)
34-37: LGTM!
201-216: LGTM!
300-302: LGTM!
| data = json.load(fin) | ||
|
|
||
| # perform sanity check | ||
| if not isinstance(data, list): | ||
| raise ValueError("Dependabot alerts file must contain a JSON array") | ||
| for idx, item in enumerate(data): | ||
| if not isinstance(item, dict): | ||
| raise ValueError(f"Alert at index {idx} must be an object") | ||
| if "state" not in item: | ||
| raise ValueError(f"Alert at index {idx} is missing 'state'") | ||
| sa = item.get("security_advisory") | ||
| if not isinstance(sa, dict) or "severity" not in sa: | ||
| raise ValueError( | ||
| f"Alert at index {idx} is missing 'security_advisory.severity'" | ||
| ) | ||
| return data |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check the structure of Dependabot alert JSON files to verify field names
# Find JSON files matching the pattern organization__repository.json
fd -e json -x bash -c '
file="$1"
echo "=== Checking: $file ==="
# Check if security_advisory exists
jq -e ".[0].security_advisory" "$file" >/dev/null 2>&1 && echo "✓ Has security_advisory"
# Check if security_vulnerability exists
jq -e ".[0].security_vulnerability" "$file" >/dev/null 2>&1 && echo "✓ Has security_vulnerability"
# Show the actual structure of the first alert
echo "First alert structure:"
jq ".[0] | keys" "$file" 2>/dev/null || echo "Failed to parse JSON"
# Show severity field location
echo "Severity field paths found:"
jq ".[0] | paths(type == \"string\" and . == \"severity\") | join(\".\")" "$file" 2>/dev/null || echo "No severity found"
echo ""
' _ {} \;Repository: lightspeed-core/lightspeed-stack
Length of output: 1041
🏁 Script executed:
# First, let's find and examine the vulnerability_report.py file
find . -name "vulnerability_report.py" -type fRepository: lightspeed-core/lightspeed-stack
Length of output: 204
🏁 Script executed:
# Search for references to security_advisory and security_vulnerability in the codebase
rg "security_advisory|security_vulnerability" --type py --type mdRepository: lightspeed-core/lightspeed-stack
Length of output: 170
🏁 Script executed:
# Look for the context file mentioned in the review
find . -path "*/.claude/commands/file-dependabot-cves.md" -type fRepository: lightspeed-core/lightspeed-stack
Length of output: 213
🏁 Script executed:
# Look for test fixtures with Dependabot-like JSON data
fd -e json | xargs rg -l "security_advisory|security_vulnerability" 2>/dev/null || echo "No matches found"Repository: lightspeed-core/lightspeed-stack
Length of output: 187
🏁 Script executed:
# Read the vulnerability_report.py file
cat -n ./scripts/vulnerability_report.py | head -150Repository: lightspeed-core/lightspeed-stack
Length of output: 5615
🏁 Script executed:
# Read the context file
cat -n ./.claude/commands/file-dependabot-cves.mdRepository: lightspeed-core/lightspeed-stack
Length of output: 5885
🏁 Script executed:
# Look for test files related to vulnerability_report
fd "test.*vulnerability|vulnerability.*test" -e pyRepository: lightspeed-core/lightspeed-stack
Length of output: 170
🏁 Script executed:
# Check all uses of security_advisory in the codebase to see what the code expects
rg "security_advisory" ./scripts/vulnerability_report.py -nRepository: lightspeed-core/lightspeed-stack
Length of output: 434
🏁 Script executed:
# Look for any test files or fixtures
find . -name "*test*" -type f | grep -i vulnerabilityRepository: lightspeed-core/lightspeed-stack
Length of output: 170
🏁 Script executed:
# Check the full vulnerability_report.py for how security_advisory is used downstream
sed -n '140,300p' ./scripts/vulnerability_report.py | cat -nRepository: lightspeed-core/lightspeed-stack
Length of output: 6725
Replace security_advisory.severity with security_vulnerability.severity throughout the file.
The validation and all downstream code reference security_advisory.severity, but the actual GitHub Dependabot API structure places severity in security_vulnerability, not security_advisory. Per the documented API format in .claude/commands/file-dependabot-cves.md, the correct extraction is severity: .security_vulnerability.severity. This mismatch causes validation to fail on all real Dependabot JSON exports and subsequent code to crash with KeyError at lines 190 and 228.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/vulnerability_report.py` around lines 125 - 140, The validation in
this section is checking for the wrong JSON path structure from the GitHub
Dependabot API. In the validation block where you retrieve the security advisory
data using `item.get("security_advisory")`, you need to replace this with
`item.get("security_vulnerability")` since the API structure places the
`severity` field under `security_vulnerability` not `security_advisory`. This
incorrect path also needs to be corrected throughout the file wherever
`security_advisory` is used to extract severity information, including the
downstream code that extracts and uses this data.
| def fill_in_days_statistic(source_data: DependabotAlerts) -> dict[str, Any]: | ||
| """Fill-in statistic about days needed to fix the CVEs.""" | ||
| fixed = filter_by(source_data, "state", "fixed") | ||
| days = [] | ||
| for item in fixed: | ||
| dt1 = parser.isoparse(item["created_at"]) | ||
| dt2 = parser.isoparse(item["fixed_at"]) | ||
| d = (dt2 - dt1).total_seconds() / 86400 | ||
| days.append(d) | ||
| days_stat = {} | ||
| days_stat["days"] = days | ||
| days_stat["avg"] = sum(days) / len(days) | ||
| days_stat["median"] = median(days) | ||
| return days_stat |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win
Critical: Division by zero when no alerts are fixed.
When filter_by returns an empty list (no fixed alerts), line 243 performs sum(days) / len(days) which raises ZeroDivisionError, and line 244's median(days) raises StatisticsError for an empty sequence.
🐛 Proposed fix to handle empty fixed list
def fill_in_days_statistic(source_data: DependabotAlerts) -> dict[str, Any]:
"""Fill-in statistic about days needed to fix the CVEs."""
fixed = filter_by(source_data, "state", "fixed")
days = []
for item in fixed:
dt1 = parser.isoparse(item["created_at"])
dt2 = parser.isoparse(item["fixed_at"])
d = (dt2 - dt1).total_seconds() / 86400
days.append(d)
days_stat = {}
days_stat["days"] = days
- days_stat["avg"] = sum(days) / len(days)
- days_stat["median"] = median(days)
+ if days:
+ days_stat["avg"] = sum(days) / len(days)
+ days_stat["median"] = median(days)
+ else:
+ days_stat["avg"] = 0
+ days_stat["median"] = 0
return days_stat📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def fill_in_days_statistic(source_data: DependabotAlerts) -> dict[str, Any]: | |
| """Fill-in statistic about days needed to fix the CVEs.""" | |
| fixed = filter_by(source_data, "state", "fixed") | |
| days = [] | |
| for item in fixed: | |
| dt1 = parser.isoparse(item["created_at"]) | |
| dt2 = parser.isoparse(item["fixed_at"]) | |
| d = (dt2 - dt1).total_seconds() / 86400 | |
| days.append(d) | |
| days_stat = {} | |
| days_stat["days"] = days | |
| days_stat["avg"] = sum(days) / len(days) | |
| days_stat["median"] = median(days) | |
| return days_stat | |
| def fill_in_days_statistic(source_data: DependabotAlerts) -> dict[str, Any]: | |
| """Fill-in statistic about days needed to fix the CVEs.""" | |
| fixed = filter_by(source_data, "state", "fixed") | |
| days = [] | |
| for item in fixed: | |
| dt1 = parser.isoparse(item["created_at"]) | |
| dt2 = parser.isoparse(item["fixed_at"]) | |
| d = (dt2 - dt1).total_seconds() / 86400 | |
| days.append(d) | |
| days_stat = {} | |
| days_stat["days"] = days | |
| if days: | |
| days_stat["avg"] = sum(days) / len(days) | |
| days_stat["median"] = median(days) | |
| else: | |
| days_stat["avg"] = 0 | |
| days_stat["median"] = 0 | |
| return days_stat |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/vulnerability_report.py` around lines 232 - 245, The
fill_in_days_statistic function does not handle the case when the days list is
empty, which causes a ZeroDivisionError on line 243 when calculating the average
(sum(days) / len(days)) and a StatisticsError on line 244 when calculating the
median. Add a guard condition to check if the days list is empty before
performing these calculations, and return an appropriate value for the empty
case (such as setting avg and median to None or 0) to prevent these exceptions
when no fixed alerts exist.
| def fill_in_vulnerable_packages(source_data: DependabotAlerts) -> Counter[Any]: | ||
| """Fill-in vulnerable packages with CVE frequency info.""" | ||
| package_names = [item["dependency"]["package"]["name"] for item in source_data] | ||
| return Counter(package_names) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Add error handling for missing dependency fields.
The list comprehension performs deep dict access to item["dependency"]["package"]["name"] without validation. If any level is missing, this raises KeyError. The validation in load_dependabot_file (lines 125-140) doesn't check for the dependency field hierarchy.
🛡️ Proposed fix to add validation in load_dependabot_file
Add this check inside the validation loop (around line 139):
sa = item.get("security_advisory")
if not isinstance(sa, dict) or "severity" not in sa:
raise ValueError(
f"Alert at index {idx} is missing 'security_advisory.severity'"
)
+ dep = item.get("dependency")
+ if not isinstance(dep, dict):
+ raise ValueError(f"Alert at index {idx} is missing 'dependency'")
+ pkg = dep.get("package")
+ if not isinstance(pkg, dict) or "name" not in pkg:
+ raise ValueError(
+ f"Alert at index {idx} is missing 'dependency.package.name'"
+ )
return data🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/vulnerability_report.py` around lines 248 - 251, The
fill_in_vulnerable_packages function performs unsafe deep dictionary access on
item["dependency"]["package"]["name"] without validating that these nested keys
exist, which can cause KeyError exceptions. Add proper error handling to
validate that each level of the dictionary hierarchy exists before accessing it.
You can either use the dict.get() method with default values in the list
comprehension, or add validation logic to check for the presence of the
"dependency", "package", and "name" fields before attempting to access them.
Ensure that items with missing fields are either skipped gracefully or handled
appropriately rather than causing the entire function to fail.
| def fill_in_cve_created_dates(source_data: DependabotAlerts) -> Counter[datetime]: | ||
| """Fill-in dates (freq.) when new CVE was detected by Dependabot.""" | ||
| dates_str = [item["created_at"][:10] for item in source_data] | ||
| dates = [datetime.strptime(date_str, "%Y-%m-%d") for date_str in dates_str] | ||
| return Counter(dates) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Add validation for the created_at timestamp field.
The function accesses item["created_at"] without validation (line 256). If created_at is missing or malformed, this raises KeyError or parsing errors. The validation in load_dependabot_file doesn't check for this required field.
🛡️ Proposed fix to add validation in load_dependabot_file
Add this check inside the validation loop (around line 133):
if "state" not in item:
raise ValueError(f"Alert at index {idx} is missing 'state'")
+ if "created_at" not in item:
+ raise ValueError(f"Alert at index {idx} is missing 'created_at'")
+ # For fixed alerts, fixed_at is also required
+ if item.get("state") == "fixed" and "fixed_at" not in item:
+ raise ValueError(f"Alert at index {idx} is missing 'fixed_at'")
sa = item.get("security_advisory")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/vulnerability_report.py` around lines 254 - 258, The
fill_in_cve_created_dates function accesses item["created_at"] without
validating that the field exists or is properly formatted, which causes KeyError
or parsing failures. Add validation in the load_dependabot_file function to
ensure every item in the source data has the required created_at field with a
valid timestamp format before it is passed to fill_in_cve_created_dates. This
upstream validation will prevent errors when the datetime.strptime call attempts
to parse the date string.
| return Counter(dates) | ||
|
|
||
|
|
||
| def process_dependabot_file(dependabot_file: str, prefix: str) -> dict[str, Any]: |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Remove unused prefix parameter.
The prefix parameter is declared but never used in the function body. It should either be used for its intended purpose or removed to reduce confusion.
♻️ Proposed fix to remove unused parameter
-def process_dependabot_file(dependabot_file: str, prefix: str) -> dict[str, Any]:
+def process_dependabot_file(dependabot_file: str) -> dict[str, Any]:
"""Read Dependabot alerts and prepare statistic info."""And update the call site in main:
- stat = process_dependabot_file(dependabot_file, prefix)
+ stat = process_dependabot_file(dependabot_file)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/vulnerability_report.py` at line 261, The `prefix` parameter in the
`process_dependabot_file` function is declared but never used within the
function body. Remove the `prefix` parameter from the function signature of
`process_dependabot_file` and update the call site in the `main` function (and
any other locations where `process_dependabot_file` is invoked) to remove the
corresponding prefix argument being passed.
Description
LCORE-2631: Vulnerability report - stat processing
Type of change
Tools used to create PR
Related Tickets & Documents
Summary by CodeRabbit
New Features