fix(importers): keep dedup stats in sync with API responses #13707
ivhorodko wants to merge 1 commit into DefectDojo:bugfix from
🔴 Risk threshold exceeded. This pull request modifies Dojo importer code and includes sensitive edits to dojo/importers/default_importer.py and dojo/importers/default_reimporter.py, and it also introduces a potential information disclosure where Finding objects may be fully serialized to JSON (exposing all concrete model fields) when sync=False. The sensitive paths and allowed authors can be configured in .dryrunsecurity.yaml to mitigate the codepath edits, and the serialization should explicitly limit fields to avoid leaking internal identifiers or other sensitive data.
🔴 Configured Codepaths Edit in
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
🔴 Configured Codepaths Edit in dojo/importers/default_reimporter.py
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
Information Disclosure via JSON Serialization in dojo/importers/default_importer.py
| Vulnerability | Information Disclosure via JSON Serialization |
|---|---|
| Description | When the sync parameter is set to False during the import process, Finding objects are serialized directly to JSON using Django's default serialize function without specifying which fields to include. This default behavior exposes all concrete fields of the Finding model, which are highly likely to contain sensitive information such as internal identifiers, detailed vulnerability descriptions, system paths, or internal notes, leading to unintended information disclosure. |
django-DefectDojo/dojo/importers/default_importer.py
Lines 288 to 290 in 0c7adea
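One way to address the serialization concern is to list the fields explicitly instead of relying on the serializer's default of emitting every concrete field. Django's `django.core.serializers.serialize` accepts a `fields` argument for this purpose. The sketch below illustrates the same allowlist idea in plain Python so that it runs without a Django project; `FindingRecord`, `SAFE_FIELDS`, and `serialize_findings` are hypothetical names and are not part of DefectDojo.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical stand-in for a model instance; not the real dojo Finding model.
@dataclass
class FindingRecord:
    id: int
    title: str
    severity: str
    file_path: str      # treated as an internal detail in this sketch
    internal_note: str  # treated as an internal detail in this sketch


# Assumed allowlist; which fields are safe to expose is a project decision.
SAFE_FIELDS = ("id", "title", "severity")


def serialize_findings(findings, fields=SAFE_FIELDS):
    """Serialize only the allowlisted fields of each record to JSON."""
    rows = [
        {name: value for name, value in asdict(f).items() if name in fields}
        for f in findings
    ]
    return json.dumps(rows)


if __name__ == "__main__":
    sample = [FindingRecord(1, "SQL injection", "High", "/srv/app/db.py", "triage pending")]
    print(serialize_findings(sample))
```

With an allowlist, a field added to the model later stays out of the payload until someone opts it in, which is the safer default for data that leaves the process.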
We've notified @mtesauro.
All finding details can be found in the DryRun Security Dashboard.
Description
Type: bugfix
Base branch: bugfix
Scope / Area: importers
Problem (what’s wrong)
As an API consumer I expect import post-processing (dedup, rules) to complete before the response is serialized so the counts match what the UI later shows. Currently `/api/v2/import-scan` and `/api/v2/reimport-scan` can return “new” findings that are immediately flipped to duplicates once Celery finishes, so before/after/delta disagree with the final state.
Evidence (API vs UI)
Immediate API Response
Actual final result
Steps to Reproduce
Root Cause (why it happens)
`DefaultImporter` and `DefaultReImporter` ignore the incoming `sync` kwarg when calling `we_want_async(...)`. Even when callers explicitly request synchronous mode, post-processing is dispatched to Celery and finishes after the serializer has already read `Test_Import.statistics`.
Fix (what changed)
`dojo/importers/default_importer.py`, `dojo/importers/default_reimporter.py`: cache `sync` as `sync_requested`, pass it to `we_want_async(...)`, and reuse it when deciding how to return the result. When `sync=True`, post-processing now stays inline so statistics always reflect the final deduped state. No API schema or behavior changes beyond improved consistency.
Impact / Risk
Touches import/reimport dedup flows. Main risk is around async scheduling, but async callers keep the existing behavior.
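The change described under "Fix (what changed)" and its effect on async callers can be sketched as follows. This is a simplified model, not DefectDojo's code: `we_want_async` here is a hypothetical stand-in that only looks at the `sync` flag, and `TinyImporter`, `post_process`, and the in-memory `task_queue` are invented for illustration. It shows why honoring `sync` keeps the returned statistics equal to the final deduplicated state, while callers that do not request it keep deferring the work.

```python
from collections import deque

# In-memory stand-in for a Celery queue (assumption for this sketch).
task_queue = deque()


def we_want_async(sync=False):
    """Hypothetical decision helper: run async unless sync was requested."""
    return not sync


class TinyImporter:
    def __init__(self, existing_titles):
        self.existing = set(existing_titles)

    def post_process(self, findings):
        # Dedup step: flag findings whose title is already known.
        for f in findings:
            if f["title"] in self.existing:
                f["duplicate"] = True
            self.existing.add(f["title"])

    def import_scan(self, titles, **kwargs):
        # Cache the caller's request once and reuse it for every decision.
        sync_requested = kwargs.get("sync", False)
        findings = [{"title": t, "duplicate": False} for t in titles]
        if we_want_async(sync=sync_requested):
            # Deferred: the statistics below are read before dedup runs.
            task_queue.append(lambda: self.post_process(findings))
        else:
            # Inline: dedup finishes before the statistics are computed.
            self.post_process(findings)
        new = sum(1 for f in findings if not f["duplicate"])
        return {"new": new, "total": len(findings)}, findings


if __name__ == "__main__":
    importer = TinyImporter(existing_titles=["XSS"])
    stats, _ = importer.import_scan(["XSS", "CSRF"], sync=True)
    print(stats)
```

In the deferred branch the same call reports both findings as new, and the count only drops once the queued task runs, which mirrors the mismatch between the API response and the UI described above.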
Test Results
Automated
Local run:
Result: OK (50 tests)
Manual validation
Scenarios validated:
- `/api/v2/import-scan/` and `sync=true`: API now returns counts that match the test’s final deduped state.
- `/api/v2/reimport-scan/`: statistics in the JSON response equal what the UI shows immediately after the call.
API Response after fix
Documentation
fix(importers): API statistics now reflect deduplicated results when sync execution is requested