LCORE-2631: Vulnerability report - stat processing by tisnik · Pull Request #1973 · lightspeed-core/lightspeed-stack

tisnik · 2026-06-23T06:28:30Z

Description

LCORE-2631: Vulnerability report - stat processing

Type of change

Tools used to create PR

Assisted-by: N/A
Generated by: N/A

Related Tickets & Documents

Related Issue #LCORE-2631

Summary by CodeRabbit

New Features

Vulnerability reports now generate comprehensive statistics including counts by severity and state, vulnerability resolution times, affected packages, and CVE creation dates
Enhanced input validation for Dependabot alert data to ensure data integrity

coderabbitai · 2026-06-23T06:28:44Z

Walkthrough

scripts/vulnerability_report.py gains new imports (Counter, median, datetime), structural validation in load_dependabot_file, seven statistics-building helper functions, a fully implemented process_dependabot_file that assembles a stat dict, and a main update that captures and prints that result.

Changes

Dependabot Alert Statistics Pipeline

Layer / File(s)	Summary
Imports and JSON structural validation `scripts/vulnerability_report.py`	Adds `Counter`, `median`, `datetime`, and ISO-date parsing imports. `load_dependabot_file` validates the parsed JSON is a list of dicts, each having `state` and a `security_advisory` dict with a `severity` field; raises `ValueError` on any structural mismatch.
Statistics helpers and `process_dependabot_file` assembly `scripts/vulnerability_report.py`	Adds `filter_by`, `fill_in_state`, `fill_in_severity`, `fill_in_severities_set`, `fill_in_days_statistic`, `fill_in_vulnerable_packages`, and `fill_in_cve_created_dates`. `process_dependabot_file` is rewritten from a stub returning `{}` to assembling a `stat` dict with `count`, `state`, `severity`, `severities`, `days`, `packages`, and `dates`.
`main` output wiring `scripts/vulnerability_report.py`	`main` now assigns the return value of `process_dependabot_file` to `stat` and prints it, replacing the prior call that discarded the result.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

lightspeed-core/lightspeed-stack#1959: Introduced the vulnerability_report.py CLI skeleton that this PR extends with actual validation and statistics logic.
lightspeed-core/lightspeed-stack#1961: Added the stub process_dependabot_file returning {} that this PR replaces with the full statistics-building implementation.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically references the main change: implementing vulnerability report statistics processing functionality, which matches the changeset's primary focus.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/vulnerability_report.py`:
- Around line 232-245: The fill_in_days_statistic function does not handle the
case when the days list is empty, which causes a ZeroDivisionError on line 243
when calculating the average (sum(days) / len(days)) and a StatisticsError on
line 244 when calculating the median. Add a guard condition to check if the days
list is empty before performing these calculations, and return an appropriate
value for the empty case (such as setting avg and median to None or 0) to
prevent these exceptions when no fixed alerts exist.
- Around line 254-258: The fill_in_cve_created_dates function accesses
item["created_at"] without validating that the field exists or is properly
formatted, which causes KeyError or parsing failures. Add validation in the
load_dependabot_file function to ensure every item in the source data has the
required created_at field with a valid timestamp format before it is passed to
fill_in_cve_created_dates. This upstream validation will prevent errors when the
datetime.strptime call attempts to parse the date string.
- Around line 248-251: The fill_in_vulnerable_packages function performs unsafe
deep dictionary access on item["dependency"]["package"]["name"] without
validating that these nested keys exist, which can cause KeyError exceptions.
Add proper error handling to validate that each level of the dictionary
hierarchy exists before accessing it. You can either use the dict.get() method
with default values in the list comprehension, or add validation logic to check
for the presence of the "dependency", "package", and "name" fields before
attempting to access them. Ensure that items with missing fields are either
skipped gracefully or handled appropriately rather than causing the entire
function to fail.
- Line 261: The `prefix` parameter in the `process_dependabot_file` function is
declared but never used within the function body. Remove the `prefix` parameter
from the function signature of `process_dependabot_file` and update the call
site in the `main` function (and any other locations where
`process_dependabot_file` is invoked) to remove the corresponding prefix
argument being passed.
- Around line 125-140: The validation in this section is checking for the wrong
JSON path structure from the GitHub Dependabot API. In the validation block
where you retrieve the security advisory data using
`item.get("security_advisory")`, you need to replace this with
`item.get("security_vulnerability")` since the API structure places the
`severity` field under `security_vulnerability` not `security_advisory`. This
incorrect path also needs to be corrected throughout the file wherever
`security_advisory` is used to extract severity information, including the
downstream code that extracts and uses this data.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 39413dc1-00e4-4edf-95e7-3db234a7e7b5

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff72ff and c63f659.

📒 Files selected for processing (1)

scripts/vulnerability_report.py

📜 Review details

⏰ Context from checks skipped due to timeout. (16)

GitHub Check: spectral
GitHub Check: unit_tests (3.13)
GitHub Check: unit_tests (3.12)
GitHub Check: Pylinter
GitHub Check: build-pr
GitHub Check: integration_tests (3.13)
GitHub Check: integration_tests (3.12)
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🔇 Additional comments (3)

scripts/vulnerability_report.py (3)

34-37: LGTM!

201-216: LGTM!

300-302: LGTM!

coderabbitai · 2026-06-23T06:32:39Z

+        data = json.load(fin)
+
+    # perform sanity check
+    if not isinstance(data, list):
+        raise ValueError("Dependabot alerts file must contain a JSON array")
+    for idx, item in enumerate(data):
+        if not isinstance(item, dict):
+            raise ValueError(f"Alert at index {idx} must be an object")
+        if "state" not in item:
+            raise ValueError(f"Alert at index {idx} is missing 'state'")
+        sa = item.get("security_advisory")
+        if not isinstance(sa, dict) or "severity" not in sa:
+            raise ValueError(
+                f"Alert at index {idx} is missing 'security_advisory.severity'"
+            )
+    return data


🗄️ Data Integrity & Integration | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Check the structure of Dependabot alert JSON files to verify field names # Find JSON files matching the pattern organization__repository.json fd -e json -x bash -c ' file="$1" echo "=== Checking: $file ===" # Check if security_advisory exists jq -e ".[0].security_advisory" "$file" >/dev/null 2>&1 && echo "✓ Has security_advisory" # Check if security_vulnerability exists jq -e ".[0].security_vulnerability" "$file" >/dev/null 2>&1 && echo "✓ Has security_vulnerability" # Show the actual structure of the first alert echo "First alert structure:" jq ".[0] | keys" "$file" 2>/dev/null || echo "Failed to parse JSON" # Show severity field location echo "Severity field paths found:" jq ".[0] | paths(type == \"string\" and . == \"severity\") | join(\".\")" "$file" 2>/dev/null || echo "No severity found" echo "" ' _ {} \;

Repository: lightspeed-core/lightspeed-stack

Length of output: 1041

🏁 Script executed:

# First, let's find and examine the vulnerability_report.py file find . -name "vulnerability_report.py" -type f

Repository: lightspeed-core/lightspeed-stack

Length of output: 204

🏁 Script executed:

# Search for references to security_advisory and security_vulnerability in the codebase rg "security_advisory|security_vulnerability" --type py --type md

Repository: lightspeed-core/lightspeed-stack

Length of output: 170

🏁 Script executed:

# Look for the context file mentioned in the review find . -path "*/.claude/commands/file-dependabot-cves.md" -type f

Repository: lightspeed-core/lightspeed-stack

Length of output: 213

🏁 Script executed:

# Look for test fixtures with Dependabot-like JSON data fd -e json | xargs rg -l "security_advisory|security_vulnerability" 2>/dev/null || echo "No matches found"

Repository: lightspeed-core/lightspeed-stack

Length of output: 187

🏁 Script executed:

# Read the vulnerability_report.py file cat -n ./scripts/vulnerability_report.py | head -150

Repository: lightspeed-core/lightspeed-stack

Length of output: 5615

🏁 Script executed:

# Read the context file cat -n ./.claude/commands/file-dependabot-cves.md

Repository: lightspeed-core/lightspeed-stack

Length of output: 5885

🏁 Script executed:

# Look for test files related to vulnerability_report fd "test.*vulnerability|vulnerability.*test" -e py

Repository: lightspeed-core/lightspeed-stack

Length of output: 170

🏁 Script executed:

# Check all uses of security_advisory in the codebase to see what the code expects rg "security_advisory" ./scripts/vulnerability_report.py -n

Repository: lightspeed-core/lightspeed-stack

Length of output: 434

🏁 Script executed:

# Look for any test files or fixtures find . -name "*test*" -type f | grep -i vulnerability

Repository: lightspeed-core/lightspeed-stack

Length of output: 170

🏁 Script executed:

# Check the full vulnerability_report.py for how security_advisory is used downstream sed -n '140,300p' ./scripts/vulnerability_report.py | cat -n

Repository: lightspeed-core/lightspeed-stack

Length of output: 6725

Replace security_advisory.severity with security_vulnerability.severity throughout the file.

The validation and all downstream code reference security_advisory.severity, but the actual GitHub Dependabot API structure places severity in security_vulnerability, not security_advisory. Per the documented API format in .claude/commands/file-dependabot-cves.md, the correct extraction is severity: .security_vulnerability.severity. This mismatch causes validation to fail on all real Dependabot JSON exports and subsequent code to crash with KeyError at lines 190 and 228.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/vulnerability_report.py` around lines 125 - 140, The validation in this section is checking for the wrong JSON path structure from the GitHub Dependabot API. In the validation block where you retrieve the security advisory data using `item.get("security_advisory")`, you need to replace this with `item.get("security_vulnerability")` since the API structure places the `severity` field under `security_vulnerability` not `security_advisory`. This incorrect path also needs to be corrected throughout the file wherever `security_advisory` is used to extract severity information, including the downstream code that extracts and uses this data.

coderabbitai · 2026-06-23T06:32:39Z

+def fill_in_days_statistic(source_data: DependabotAlerts) -> dict[str, Any]:
+    """Fill-in statistic about days needed to fix the CVEs."""
+    fixed = filter_by(source_data, "state", "fixed")
+    days = []
+    for item in fixed:
+        dt1 = parser.isoparse(item["created_at"])
+        dt2 = parser.isoparse(item["fixed_at"])
+        d = (dt2 - dt1).total_seconds() / 86400
+        days.append(d)
+    days_stat = {}
+    days_stat["days"] = days
+    days_stat["avg"] = sum(days) / len(days)
+    days_stat["median"] = median(days)
+    return days_stat


🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Critical: Division by zero when no alerts are fixed.

When filter_by returns an empty list (no fixed alerts), line 243 performs sum(days) / len(days) which raises ZeroDivisionError, and line 244's median(days) raises StatisticsError for an empty sequence.

🐛 Proposed fix to handle empty fixed list

def fill_in_days_statistic(source_data: DependabotAlerts) -> dict[str, Any]: """Fill-in statistic about days needed to fix the CVEs.""" fixed = filter_by(source_data, "state", "fixed") days = [] for item in fixed: dt1 = parser.isoparse(item["created_at"]) dt2 = parser.isoparse(item["fixed_at"]) d = (dt2 - dt1).total_seconds() / 86400 days.append(d) days_stat = {} days_stat["days"] = days - days_stat["avg"] = sum(days) / len(days) - days_stat["median"] = median(days) + if days: + days_stat["avg"] = sum(days) / len(days) + days_stat["median"] = median(days) + else: + days_stat["avg"] = 0 + days_stat["median"] = 0 return days_stat

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def fill_in_days_statistic(source_data: DependabotAlerts) -> dict[str, Any]:

"""Fill-in statistic about days needed to fix the CVEs."""

fixed = filter_by(source_data, "state", "fixed")

days = []

for item in fixed:

dt1 = parser.isoparse(item["created_at"])

dt2 = parser.isoparse(item["fixed_at"])

d = (dt2 - dt1).total_seconds() / 86400

days.append(d)

days_stat = {}

days_stat["days"] = days

days_stat["avg"] = sum(days) / len(days)

days_stat["median"] = median(days)

return days_stat

def fill_in_days_statistic(source_data: DependabotAlerts) -> dict[str, Any]:

"""Fill-in statistic about days needed to fix the CVEs."""

fixed = filter_by(source_data, "state", "fixed")

days = []

for item in fixed:

dt1 = parser.isoparse(item["created_at"])

dt2 = parser.isoparse(item["fixed_at"])

d = (dt2 - dt1).total_seconds() / 86400

days.append(d)

days_stat = {}

days_stat["days"] = days

if days:

days_stat["avg"] = sum(days) / len(days)

days_stat["median"] = median(days)

else:

days_stat["avg"] = 0

days_stat["median"] = 0

return days_stat

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/vulnerability_report.py` around lines 232 - 245, The fill_in_days_statistic function does not handle the case when the days list is empty, which causes a ZeroDivisionError on line 243 when calculating the average (sum(days) / len(days)) and a StatisticsError on line 244 when calculating the median. Add a guard condition to check if the days list is empty before performing these calculations, and return an appropriate value for the empty case (such as setting avg and median to None or 0) to prevent these exceptions when no fixed alerts exist.

coderabbitai · 2026-06-23T06:32:39Z

+def fill_in_vulnerable_packages(source_data: DependabotAlerts) -> Counter[Any]:
+    """Fill-in vulnerable packages with CVE frequency info."""
+    package_names = [item["dependency"]["package"]["name"] for item in source_data]
+    return Counter(package_names)


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Add error handling for missing dependency fields.

The list comprehension performs deep dict access to item["dependency"]["package"]["name"] without validation. If any level is missing, this raises KeyError. The validation in load_dependabot_file (lines 125-140) doesn't check for the dependency field hierarchy.

🛡️ Proposed fix to add validation in load_dependabot_file

Add this check inside the validation loop (around line 139):

sa = item.get("security_advisory") if not isinstance(sa, dict) or "severity" not in sa: raise ValueError( f"Alert at index {idx} is missing 'security_advisory.severity'" ) + dep = item.get("dependency") + if not isinstance(dep, dict): + raise ValueError(f"Alert at index {idx} is missing 'dependency'") + pkg = dep.get("package") + if not isinstance(pkg, dict) or "name" not in pkg: + raise ValueError( + f"Alert at index {idx} is missing 'dependency.package.name'" + ) return data

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/vulnerability_report.py` around lines 248 - 251, The fill_in_vulnerable_packages function performs unsafe deep dictionary access on item["dependency"]["package"]["name"] without validating that these nested keys exist, which can cause KeyError exceptions. Add proper error handling to validate that each level of the dictionary hierarchy exists before accessing it. You can either use the dict.get() method with default values in the list comprehension, or add validation logic to check for the presence of the "dependency", "package", and "name" fields before attempting to access them. Ensure that items with missing fields are either skipped gracefully or handled appropriately rather than causing the entire function to fail.

coderabbitai · 2026-06-23T06:32:39Z

+def fill_in_cve_created_dates(source_data: DependabotAlerts) -> Counter[datetime]:
+    """Fill-in dates (freq.) when new CVE was detected by Dependabot."""
+    dates_str = [item["created_at"][:10] for item in source_data]
+    dates = [datetime.strptime(date_str, "%Y-%m-%d") for date_str in dates_str]
+    return Counter(dates)


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Add validation for the created_at timestamp field.

The function accesses item["created_at"] without validation (line 256). If created_at is missing or malformed, this raises KeyError or parsing errors. The validation in load_dependabot_file doesn't check for this required field.

🛡️ Proposed fix to add validation in load_dependabot_file

Add this check inside the validation loop (around line 133):

if "state" not in item: raise ValueError(f"Alert at index {idx} is missing 'state'") + if "created_at" not in item: + raise ValueError(f"Alert at index {idx} is missing 'created_at'") + # For fixed alerts, fixed_at is also required + if item.get("state") == "fixed" and "fixed_at" not in item: + raise ValueError(f"Alert at index {idx} is missing 'fixed_at'") sa = item.get("security_advisory")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/vulnerability_report.py` around lines 254 - 258, The fill_in_cve_created_dates function accesses item["created_at"] without validating that the field exists or is properly formatted, which causes KeyError or parsing failures. Add validation in the load_dependabot_file function to ensure every item in the source data has the required created_at field with a valid timestamp format before it is passed to fill_in_cve_created_dates. This upstream validation will prevent errors when the datetime.strptime call attempts to parse the date string.

coderabbitai · 2026-06-23T06:32:39Z

+    return Counter(dates)
+
+
 def process_dependabot_file(dependabot_file: str, prefix: str) -> dict[str, Any]:


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Remove unused prefix parameter.

The prefix parameter is declared but never used in the function body. It should either be used for its intended purpose or removed to reduce confusion.

♻️ Proposed fix to remove unused parameter

-def process_dependabot_file(dependabot_file: str, prefix: str) -> dict[str, Any]: +def process_dependabot_file(dependabot_file: str) -> dict[str, Any]: """Read Dependabot alerts and prepare statistic info."""

And update the call site in main:

- stat = process_dependabot_file(dependabot_file, prefix) + stat = process_dependabot_file(dependabot_file)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/vulnerability_report.py` at line 261, The `prefix` parameter in the `process_dependabot_file` function is declared but never used within the function body. Remove the `prefix` parameter from the function signature of `process_dependabot_file` and update the call site in the `main` function (and any other locations where `process_dependabot_file` is invoked) to remove the corresponding prefix argument being passed.

LCORE-2631: Vulnerability report - stat processing

c63f659

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

tisnik merged commit 6d4dc86 into lightspeed-core:main Jun 23, 2026
33 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-2631: Vulnerability report - stat processing#1973

LCORE-2631: Vulnerability report - stat processing#1973
tisnik merged 1 commit into
lightspeed-core:mainfrom
tisnik:lcore-2631-vulnerability-report-stat-processing

tisnik commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

coderabbitai Bot Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		return Counter(dates)


		def process_dependabot_file(dependabot_file: str, prefix: str) -> dict[str, Any]:

Conversation

tisnik commented Jun 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Summary by CodeRabbit

New Features

Uh oh!

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tisnik commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading