fix(importers): keep dedup stats in sync with API responses by ivhorodko · Pull Request #13707 · DefectDojo/django-DefectDojo

ivhorodko · 2025-11-14T12:54:01Z

Description

Type: bugfix
Base branch: bugfix
Scope / Area: importers

Problem (what’s wrong)

As an API consumer, I expect import post-processing (deduplication, rules) to complete before the response is serialized, so that the returned counts match what the UI later shows. Currently, /api/v2/import-scan/ and /api/v2/reimport-scan/ can report findings as new that are flipped to duplicates as soon as Celery finishes, so the before/after/delta statistics disagree with the final state.

Evidence (API vs UI)

Immediate API Response

"after": {
  "info": {
    "active": 0,
    "verified": 0,
    "duplicate": 0,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 0
  },
  "low": {
    "active": 2,
    "verified": 2,
    "duplicate": 0,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 2
  },
  "medium": {
    "active": 5,
    "verified": 5,
    "duplicate": 4,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 9
  },
  "high": {
    "active": 1,
    "verified": 1,
    "duplicate": 3,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 4
  },
  "critical": {
    "active": 0,
    "verified": 0,
    "duplicate": 1,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 1
  },
  "total": {
    "active": 8,
    "verified": 8,
    "duplicate": 8,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 16
  }
}

Actual final result

Steps to Reproduce

Import or re-import a scan through the API using default user settings.
Include findings that should deduplicate.
Observe that the API response lists them as new, but after a short delay the UI/test history shows them as duplicates.

Root Cause (why it happens)

DefaultImporter and DefaultReImporter ignore the incoming sync kwarg when calling we_want_async(...). Even when callers explicitly request synchronous mode, post-processing is dispatched to Celery and finishes after the serializer has already read Test_Import.statistics.
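A minimal sketch of the failure mode described above, not the actual importer code. The body of we_want_async here is a simplified stand-in (the real function also consults user settings), and post_process_before_fix is a hypothetical name used only for illustration.

```python
def we_want_async(*args, func=None, **kwargs):
    """Simplified stand-in: run in the background unless sync=True is passed."""
    return not kwargs.get("sync", False)


def post_process_before_fix(findings, **kwargs):
    """Buggy shape: the caller's `sync` kwarg never reaches we_want_async()."""
    # kwargs are not forwarded, so the helper always sees sync=False.
    if we_want_async(func=post_process_before_fix):
        # Work is handed to a worker; statistics are read before it finishes.
        return "queued"
    return "inline"


result = post_process_before_fix(["finding-1"], sync=True)
print(result)  # queued
```

Even though the caller passes sync=True, the sketch reports "queued", which mirrors how the serializer ends up reading statistics before deduplication has run.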

Fix (what changed)

dojo/importers/default_importer.py, dojo/importers/default_reimporter.py: cache sync as sync_requested, pass it to we_want_async(...), and reuse it when deciding how to return the result. When sync=True, post-processing now stays inline so statistics always reflect the final deduped state. No API schema or behavior changes beyond improved consistency.
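The shape of the change can be sketched as follows. This is an illustration under stated assumptions rather than the patched code: we_want_async is a simplified stand-in, process_findings and mark_duplicates are hypothetical helpers, and findings are reduced to plain hash-code strings.

```python
from collections import Counter


def we_want_async(*args, func=None, **kwargs):
    """Simplified stand-in: background execution unless sync=True is passed."""
    return not kwargs.get("sync", False)


def mark_duplicates(hash_codes):
    """Flag every repeat of an already-seen hash code as a duplicate."""
    seen = set()
    flags = []
    for code in hash_codes:
        flags.append(code in seen)
        seen.add(code)
    return flags


def process_findings(hash_codes, **kwargs):
    # Cache the caller's choice once and reuse it for every later decision.
    sync_requested = bool(kwargs.get("sync", False))
    if we_want_async(func=process_findings, sync=sync_requested):
        # Deduplication would be handed to a worker; nothing is flagged yet.
        flags = [False] * len(hash_codes)
    else:
        # Deduplication runs inline, before statistics are computed.
        flags = mark_duplicates(hash_codes)
    counts = Counter("duplicate" if flag else "active" for flag in flags)
    return {"active": counts["active"], "duplicate": counts["duplicate"]}


sync_stats = process_findings(["a", "b", "a"], sync=True)
async_stats = process_findings(["a", "b", "a"])
print(sync_stats)   # {'active': 2, 'duplicate': 1}
print(async_stats)  # {'active': 3, 'duplicate': 0}
```

With sync=True the statistics already include the duplicate; without it the caller sees the pre-deduplication counts, which is the existing async behavior.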

Impact / Risk

Touches import/reimport dedup flows. Main risk is around async scheduling, but async callers keep the existing behavior.

Test Results

Automated

Tests added/updated: no

Local run:

docker compose build && docker compose up -d
docker compose exec uwsgi bash -lc "python manage.py test unittests.test_importers_deduplication --keepdb"

Result: OK (50 tests)

Manual validation

Scenarios validated:

Import scan with duplicate findings via /api/v2/import-scan/ and sync=true: API now returns counts that match the test’s final deduped state.
Reimport scan with duplicate findings via /api/v2/reimport-scan/: statistics in the JSON response equal what UI shows immediately after the call.

API Response after fix

"after": {
  "info": {
    "active": 0,
    "verified": 0,
    "duplicate": 0,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 0
  },
  "low": {
    "active": 0,
    "verified": 0,
    "duplicate": 2,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 2
  },
  "medium": {
    "active": 0,
    "verified": 0,
    "duplicate": 9,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 9
  },
  "high": {
    "active": 0,
    "verified": 0,
    "duplicate": 4,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 4
  },
  "critical": {
    "active": 0,
    "verified": 0,
    "duplicate": 1,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 1
  },
  "total": {
    "active": 0,
    "verified": 0,
    "duplicate": 16,
    "false_p": 0,
    "out_of_scope": 0,
    "is_mitigated": 0,
    "risk_accepted": 0,
    "total": 16
  }
}
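To make the difference between the two responses concrete, the sketch below compares the per-severity active and duplicate counts copied from the two "after" blocks in this description. It assumes both responses describe the same imported scan; late_duplicates is a hypothetical helper, not part of DefectDojo.

```python
SEVERITIES = ("info", "low", "medium", "high", "critical")

# (active, duplicate) per severity, copied from the two "after" blocks above.
immediate = {"info": (0, 0), "low": (2, 0), "medium": (5, 4), "high": (1, 3), "critical": (0, 1)}
after_fix = {"info": (0, 0), "low": (0, 2), "medium": (0, 9), "high": (0, 4), "critical": (0, 1)}


def late_duplicates(early, final):
    """Return, per severity, how many findings became duplicates after the early response."""
    drift = {}
    for severity in SEVERITIES:
        _, duplicates_early = early[severity]
        _, duplicates_final = final[severity]
        moved = duplicates_final - duplicates_early
        if moved:
            drift[severity] = moved
    return drift


drift = late_duplicates(immediate, after_fix)
print(drift)                # {'low': 2, 'medium': 5, 'high': 1}
print(sum(drift.values()))  # 8
```

The 8 findings that drift equal the 8 findings the immediate response reported as active.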

Documentation

Docs updates needed? no
Release notes: fix(importers): API statistics now reflect deduplicated results when sync execution is requested

dryrunsecurity · 2025-11-14T12:55:06Z

🔴 Risk threshold exceeded.

This pull request modifies Dojo importer code and includes sensitive edits to dojo/importers/default_importer.py and dojo/importers/default_reimporter.py, and it also introduces a potential information disclosure where Finding objects may be fully serialized to JSON (exposing all concrete model fields) when sync=False. The sensitive paths and allowed authors can be configured in .dryrunsecurity.yaml to mitigate the codepath edits, and the serialization should explicitly limit fields to avoid leaking internal identifiers or other sensitive data.

🔴 Configured Codepaths Edit in dojo/importers/default_importer.py

Vulnerability: Configured Codepaths Edit
Description: Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in `.dryrunsecurity.yaml`.

🔴 Configured Codepaths Edit in dojo/importers/default_reimporter.py

Vulnerability: Configured Codepaths Edit
Description: Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in `.dryrunsecurity.yaml`.

Information Disclosure via JSON Serialization in dojo/importers/default_importer.py

Vulnerability: Information Disclosure via JSON Serialization
Description: When the `sync` parameter is set to `False` during the import process, `Finding` objects are serialized directly to JSON using Django's default `serialize` function without specifying which fields to include. This default behavior exposes all concrete fields of the `Finding` model, which are highly likely to contain sensitive information such as internal identifiers, detailed vulnerability descriptions, system paths, or internal notes, leading to unintended information disclosure.

django-DefectDojo/dojo/importers/default_importer.py

Lines 288 to 290 in 0c7adea

        return [serialize("json", [finding]) for finding in new_findings]

    return new_findings
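The mitigation suggested above (explicitly limiting the serialized fields) can be illustrated without Django. The sketch below uses a plain dataclass as a hypothetical stand-in for the Finding model and an assumed allowlist; in Django itself, the `serialize` function accepts a `fields` argument that serves the same purpose.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    """Hypothetical stand-in for a model with public and internal fields."""
    id: int
    title: str
    severity: str
    file_path: str
    internal_notes: str


# Assumed allowlist; which fields are safe to expose is a project decision.
PUBLIC_FIELDS = ("id", "title", "severity")


def serialize_finding(finding, fields=PUBLIC_FIELDS):
    """Serialize only the allowlisted fields of a finding to JSON."""
    data = asdict(finding)
    return json.dumps({name: data[name] for name in fields})


finding = Finding(1, "SQL injection", "High", "/srv/app/db.py", "triage pending")
payload = serialize_finding(finding)
print(payload)  # {"id": 1, "title": "SQL injection", "severity": "High"}
```

An allowlist fails closed: a field added to the model later stays out of the payload until someone opts it in.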

We've notified @mtesauro.

All finding details can be found in the DryRun Security Dashboard.

fix(importers): keep dedup stats in sync with API responses

0c7adea

ivhorodko requested review from Maffooch and mtesauro as code owners November 14, 2025 12:54

ivhorodko closed this Nov 14, 2025

ivhorodko deleted the bugfix/reimport-sync-stats branch November 14, 2025 18:38

ivhorodko wants to merge 1 commit into DefectDojo:bugfix from ivhorodko:bugfix/reimport-sync-stats
