fix(importers): keep dedup stats in sync with API responses #13707

Closed
ivhorodko wants to merge 1 commit into DefectDojo:bugfix from ivhorodko:bugfix/reimport-sync-stats

Conversation

@ivhorodko
Contributor

Description

Type: bugfix
Base branch: bugfix
Scope / Area: importers


Problem (what’s wrong)

As an API consumer I expect import post-processing (dedup, rules) to complete before the response is serialized, so that the counts match what the UI later shows. Currently /api/v2/import-scan and /api/v2/reimport-scan can report findings as “new” that are flipped to duplicates shortly afterwards, once Celery finishes, so the before/after/delta statistics in the response disagree with the final state.


Evidence (API vs UI)

Immediate API Response
"after": {
  "info": {
    "active": 0,
    "verified": 0,
    "duplicate": 0,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 0
  },
  "low": {
    "active": 2,
    "verified": 2,
    "duplicate": 0,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 2
  },
  "medium": {
    "active": 5,
    "verified": 5,
    "duplicate": 4,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 9
  },
  "high": {
    "active": 1,
    "verified": 1,
    "duplicate": 3,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 4
  },
  "critical": {
    "active": 0,
    "verified": 0,
    "duplicate": 1,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 1
  },
  "total": {
    "active": 8,
    "verified": 8,
    "duplicate": 8,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 16
  }
}
Actual final result: [image]

Steps to Reproduce

  1. Import or re-import a scan through the API using default user settings.
  2. Include findings that should deduplicate.
  3. Observe that the API response lists them as new, but after a short delay the UI/test history shows them as duplicates.

Root Cause (why it happens)

DefaultImporter and DefaultReImporter ignore the incoming sync kwarg when calling we_want_async(...). Even when callers explicitly request synchronous mode, post-processing is dispatched to Celery and finishes after the serializer has already read Test_Import.statistics.


Fix (what changed)

dojo/importers/default_importer.py, dojo/importers/default_reimporter.py: cache sync as sync_requested, pass it to we_want_async(...), and reuse it when deciding how to return the result. When sync=True, post-processing now stays inline so statistics always reflect the final deduped state. No API schema or behavior changes beyond improved consistency.
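
To illustrate the idea, here is a minimal, self-contained sketch of caching the `sync` flag and reusing it for the scheduling decision. It is not the DefectDojo code: `wants_async`, `SketchImporter`, and the in-memory `queued` list are hypothetical stand-ins for `we_want_async(...)`, the importers, and Celery, and the dedup logic is deliberately trivial.

```python
def wants_async(**kwargs) -> bool:
    # Hypothetical stand-in for we_want_async(...): a truthy `sync`
    # kwarg forces inline execution; otherwise work may be deferred.
    return not kwargs.get("sync", False)


class SketchImporter:
    """Toy importer showing the sync flag threaded through post-processing."""

    def __init__(self, **kwargs):
        # Cache the caller's request once so every later decision agrees.
        self.sync_requested = bool(kwargs.get("sync", False))
        self.queued = []      # work handed to the simulated task queue
        self.statistics = {}  # what the response serializer would read

    def _dedupe(self, findings):
        # Count repeated findings as duplicates (toy dedup on equality).
        seen, duplicates = set(), 0
        for finding in findings:
            if finding in seen:
                duplicates += 1
            else:
                seen.add(finding)
        self.statistics = {"duplicate": duplicates, "total": len(findings)}

    def process(self, findings):
        # Statistics before post-processing: nothing is marked duplicate yet.
        self.statistics = {"duplicate": 0, "total": len(findings)}
        if wants_async(sync=self.sync_requested):
            # Deferred: the caller sees the pre-dedup statistics.
            self.queued.append(findings)
        else:
            # Inline: statistics reflect the final deduped state.
            self._dedupe(findings)
        return self.statistics


print(SketchImporter(sync=True).process(["a", "a", "b"]))
print(SketchImporter().process(["a", "a", "b"]))
```

With `sync=True` the returned statistics already include the duplicate; without it the work is only queued, which mirrors the mismatch described under Problem.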


Impact / Risk

Touches import/reimport dedup flows. Main risk is around async scheduling, but async callers keep the existing behavior.


Test Results

Automated

  • Tests added/updated: no

Local run:

docker compose build && docker compose up -d
docker compose exec uwsgi bash -lc "python manage.py test unittests.test_importers_deduplication --keepdb"

Result: OK (50 tests)

Manual validation

Scenarios validated:

  • Import scan with duplicate findings via /api/v2/import-scan/ and sync=true: API now returns counts that match the test’s final deduped state.
  • Reimport scan with duplicate findings via /api/v2/reimport-scan/: statistics in the JSON response equal what UI shows immediately after the call.
API Response after fix
"after": {
  "info": {
    "active": 0,
    "verified": 0,
    "duplicate": 0,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 0
  },
  "low": {
    "active": 0,
    "verified": 0,
    "duplicate": 2,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 2
  },
  "medium": {
    "active": 0,
    "verified": 0,
    "duplicate": 9,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 9
  },
  "high": {
    "active": 0,
    "verified": 0,
    "duplicate": 4,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 4
  },
  "critical": {
    "active": 0,
    "verified": 0,
    "duplicate": 1,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 1
  },
  "total": {
    "active": 0,
    "verified": 0,
    "duplicate": 16,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 16
  }
}
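
For reviewers who want to compare two responses quickly, a small helper along these lines can diff the statistics. This is only a sketch: `diff_statistics` is a hypothetical helper, and the two dictionaries are copied from the "total" rows of the immediate response and the response after the fix.

```python
def diff_statistics(before: dict, after: dict) -> dict:
    """Return {field: after - before} for every field whose value changed."""
    return {
        key: after.get(key, 0) - value
        for key, value in before.items()
        if after.get(key, 0) != value
    }


# "total" row of the immediate API response (before the fix).
immediate_total = {
    "active": 8, "verified": 8, "duplicate": 8, "false_p": 0,
    "out_of_scope": 0, "is_mitigated": 0, "risk_accepted": 0, "total": 16,
}

# "total" row of the API response after the fix.
fixed_total = {
    "active": 0, "verified": 0, "duplicate": 16, "false_p": 0,
    "out_of_scope": 0, "is_mitigated": 0, "risk_accepted": 0, "total": 16,
}

print(diff_statistics(immediate_total, fixed_total))
# → {'active': -8, 'verified': -8, 'duplicate': 8}
```

The overall total stays at 16 in both responses; only the split between active/verified and duplicate changes.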

Documentation

  • Docs updates needed? no
  • Release notes: fix(importers): API statistics now reflect deduplicated results when sync execution is requested

@dryrunsecurity

DryRun Security

🔴 Risk threshold exceeded.

This pull request modifies Dojo importer code and includes sensitive edits to dojo/importers/default_importer.py and dojo/importers/default_reimporter.py, and it also introduces a potential information disclosure where Finding objects may be fully serialized to JSON (exposing all concrete model fields) when sync=False. The sensitive paths and allowed authors can be configured in .dryrunsecurity.yaml to mitigate the codepath edits, and the serialization should explicitly limit fields to avoid leaking internal identifiers or other sensitive data.

🔴 Configured Codepaths Edit in dojo/importers/default_importer.py
Vulnerability Configured Codepaths Edit
Description Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml.
🔴 Configured Codepaths Edit in dojo/importers/default_reimporter.py
Vulnerability Configured Codepaths Edit
Description Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml.
Information Disclosure via JSON Serialization in dojo/importers/default_importer.py
Vulnerability Information Disclosure via JSON Serialization
Description When the sync parameter is set to False during the import process, Finding objects are serialized directly to JSON using Django's default serialize function without specifying which fields to include. This default behavior exposes all concrete fields of the Finding model, which are highly likely to contain sensitive information such as internal identifiers, detailed vulnerability descriptions, system paths, or internal notes, leading to unintended information disclosure.

return [serialize("json", [finding]) for finding in new_findings]
return new_findings

We've notified @mtesauro.


All finding details can be found in the DryRun Security Dashboard.

@ivhorodko ivhorodko closed this Nov 14, 2025
@ivhorodko ivhorodko deleted the bugfix/reimport-sync-stats branch November 14, 2025 18:38