Commit d61da2c
perf: replace per-object async delete with SQL cascade walker (#14566)
* Optimize prepare_duplicates_for_delete and add test coverage
Replace per-original O(n×m) loop with a single bulk UPDATE for
inside-scope duplicate reset. Outside-scope reconfiguration still
runs per-original but now uses .iterator() and .exists() to avoid
loading full querysets into memory.
Also adds WARN-level logging to fix_loop_duplicates for visibility
into how often duplicate loops occur in production, and a comment on
removeLoop explaining the optimization opportunity.
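The single bulk UPDATE idea can be sketched against a toy schema. This is a minimal sqlite3 stand-in, not the real Django model or query: the `finding` table and its columns here are illustrative, and the real code issues the equivalent statement through the ORM.

```python
import sqlite3

# Hypothetical minimal schema standing in for the Finding table;
# table and column names are illustrative, not the real model.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE finding (
        id INTEGER PRIMARY KEY,
        duplicate INTEGER NOT NULL DEFAULT 0,
        duplicate_finding_id INTEGER
    )
""")
# Findings 1-2 are originals about to be deleted; 3-5 duplicate them.
conn.executemany(
    "INSERT INTO finding (id, duplicate, duplicate_finding_id) VALUES (?, ?, ?)",
    [(1, 0, None), (2, 0, None), (3, 1, 1), (4, 1, 1), (5, 1, 2)],
)

def reset_inside_scope_duplicates(conn, original_ids):
    # One bulk UPDATE instead of one query per original (the old O(n*m)
    # loop): every duplicate pointing at an in-scope original is unlinked
    # in a single statement.
    placeholders = ",".join("?" * len(original_ids))
    cur = conn.execute(
        f"UPDATE finding SET duplicate = 0, duplicate_finding_id = NULL "
        f"WHERE duplicate_finding_id IN ({placeholders})",
        original_ids,
    )
    return cur.rowcount

print(reset_inside_scope_duplicates(conn, [1, 2]))  # → 3
```

The same reset that previously took one UPDATE per original now costs a single statement regardless of cluster size.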
* fix: remove unused import and fix docstring lint warning
* perf: eliminate per-original queries with prefetch and bulk reset
Remove redundant .exclude() and .exists() calls by leveraging the
bulk UPDATE that already unlinks inside-scope duplicates. Add
prefetch_related to fetch all reverse relations in a single query.
* add comment
* perf: replace per-object async delete with SQL cascade walker
Replace the per-object obj.delete() approach in async_delete_crawl_task
with a recursive SQL cascade walker that compiles QuerySets to raw SQL
and walks model._meta.related_objects bottom-up. This auto-discovers
all FK relations at runtime, including those added by plugins.
Key changes:
- New dojo/utils_cascade_delete.py: cascade_delete() utility
- New dojo/signals.py: pre_bulk_delete_findings signal for extensibility
- New bulk_clear_finding_m2m() in finding/helper.py for M2M cleanup
with FileUpload disk cleanup and orphaned Notes deletion
- Rewritten async_delete_crawl_task with chunked cascade deletion
- Removed async_delete_chunk_task (no longer needed)
- Product grading recalculated once at end instead of per-object
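The walker's shape can be sketched in plain sqlite3. This is an illustrative stand-in, assuming a three-level schema: the real utility discovers relations from Django's `model._meta.related_objects` and compiles QuerySets to SQL rather than reading sqlite PRAGMAs, and it also guards against relation cycles, which this sketch omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY);
    CREATE TABLE engagement (id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id));
    CREATE TABLE test (id INTEGER PRIMARY KEY,
        engagement_id INTEGER REFERENCES engagement(id));
    INSERT INTO product VALUES (1);
    INSERT INTO engagement VALUES (10, 1), (11, 1);
    INSERT INTO test VALUES (100, 10), (101, 11);
""")

def child_relations(conn, table):
    # Reverse-FK discovery: find every table holding a FK to `table`,
    # analogous to walking model._meta.related_objects at runtime.
    rels = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for t in tables:
        for fk in conn.execute(f"PRAGMA foreign_key_list({t})"):
            if fk[2] == table:           # fk[2] = referenced table
                rels.append((t, fk[3]))  # fk[3] = referencing column
    return rels

def cascade_delete_related(conn, table, where_sql):
    # Depth-first, bottom-up: grandchildren are deleted before children,
    # and the root rows matching `where_sql` are left for the caller
    # (mirroring the later rename to cascade_delete_related_objects).
    for child, col in child_relations(conn, table):
        child_where = f"{col} IN (SELECT id FROM {table} WHERE {where_sql})"
        cascade_delete_related(conn, child, child_where)
        conn.execute(f"DELETE FROM {child} WHERE {child_where}")

cascade_delete_related(conn, "product", "id = 1")
```

After the walk, every child row under product 1 is gone but the product row itself remains for the caller to delete through the ORM.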
* perf: replace mass_model_updater with single UPDATE in reconfigure_duplicate_cluster
Use QuerySet.update() instead of mass_model_updater to re-point
duplicates to the new original. Single SQL query instead of loading
all findings into Python and calling bulk_update.
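A sketch of the single-UPDATE reconfiguration, again with an illustrative sqlite3 stand-in (the real code uses `QuerySet.update()`): the surviving duplicate with the lowest id is promoted to original, and the rest of the cluster is re-pointed in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE finding (
    id INTEGER PRIMARY KEY, duplicate INTEGER, duplicate_finding_id INTEGER)""")
# Finding 1 is the original being deleted; 2-4 duplicate it.
conn.executemany("INSERT INTO finding VALUES (?, ?, ?)",
                 [(1, 0, None), (2, 1, 1), (3, 1, 1), (4, 1, 1)])

def reconfigure_cluster(conn, old_original_id):
    new_original = conn.execute(
        "SELECT MIN(id) FROM finding WHERE duplicate_finding_id = ?",
        (old_original_id,)).fetchone()[0]
    # Promote the new original...
    conn.execute(
        "UPDATE finding SET duplicate = 0, duplicate_finding_id = NULL WHERE id = ?",
        (new_original,))
    # ...and re-point the rest of the cluster in one statement, instead
    # of loading every finding into Python and calling bulk_update.
    conn.execute(
        "UPDATE finding SET duplicate_finding_id = ? WHERE duplicate_finding_id = ?",
        (new_original, old_original_id))
    return new_original

print(reconfigure_cluster(conn, 1))  # → 2
```

The old original (id 1) is then free to be deleted without dangling references from its former cluster.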
* cleanup: remove dead code from duplicate handling
Remove reset_duplicate_before_delete, reset_duplicates_before_delete,
and set_new_original — all replaced by bulk UPDATE in
prepare_duplicates_for_delete and .update() in
reconfigure_duplicate_cluster. Remove unused mass_model_updater import.
* fix: delete outside-scope duplicates before main scope to avoid FK violations
When bulk-deleting findings in chunks, an original in an earlier chunk
could fail to delete because its duplicate (higher ID) in a later chunk
still references it via duplicate_finding FK. Fix by deleting outside-scope
duplicates first, then the main scope.
Also moves pre_bulk_delete_findings signal into bulk_delete_findings so it
fires automatically.
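The FK-ordering problem can be reproduced with a self-referential FK in sqlite3 (schema illustrative): deleting an original while another row still points at it fails, so duplicates must go first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce FKs, as Postgres does
conn.execute("""CREATE TABLE finding (
    id INTEGER PRIMARY KEY,
    duplicate_finding_id INTEGER REFERENCES finding(id))""")
conn.executemany("INSERT INTO finding VALUES (?, ?)",
                 [(1, None), (2, 1)])  # finding 2 duplicates finding 1

try:
    # Wrong order: the original in an "earlier chunk" cannot be deleted
    # while its duplicate still references it.
    conn.execute("DELETE FROM finding WHERE id = 1")
except sqlite3.IntegrityError as e:
    print("FK violation:", e)

# Correct order: outside-scope duplicate first, then the original.
conn.execute("DELETE FROM finding WHERE id = 2")
conn.execute("DELETE FROM finding WHERE id = 1")
```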
* fix: use bool type for DD_DUPLICATE_CLUSTER_CASCADE_DELETE env var
* fix: replace save_no_options with .update() in reconfigure_duplicate_cluster
Avoids triggering Finding.save() signals (pre_save_changed,
execute_prioritization_calculations) when reconfiguring duplicate
clusters during deletion. Adds tests for cross-engagement duplicate
reconfiguration and product deletion with duplicates.
* refactor: scope prepare_duplicates_for_delete to full object, not per-engagement
Adds product= and product_type= parameters so the entire deletion scope
is handled in one call, avoiding unnecessary reconfiguration of findings
that are about to be deleted anyway. Uses subqueries instead of
materializing ID sets, and chunks the originals loop with prefetch to
bound memory. Reverts finding_delete to use ORM .delete() for single
finding cascade deletes.
* refactor: remove ASYNC_DELETE_MAPPING, use FINDING_SCOPE_FILTERS
Replace the model_list-based mapping with a simple scope filter dict.
prepare_duplicates_for_delete now accepts a single object and derives
the scope via FINDING_SCOPE_FILTERS. Removes the redundant non-Finding
model deletion loop — cascade_delete on the top-level object handles
all remaining children. Cleans up async_delete class.
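The scope-filter idea can be sketched as a plain dict mapping each deletable parent model to the Finding queryset path that selects its findings. The keys and lookup paths below mirror the commit's description but are assumptions, not the shipped dict.

```python
# Illustrative stand-in for FINDING_SCOPE_FILTERS: model name -> ORM
# lookup path from Finding up to that parent (paths are assumptions).
FINDING_SCOPE_FILTERS = {
    "Engagement": "test__engagement",
    "Product": "test__engagement__product",
    "Product_Type": "test__engagement__product__prod_type",
}

def finding_scope_filter(model_name, obj_pk):
    # Derive the filter kwargs for "all findings under this object",
    # e.g. Finding.objects.filter(**finding_scope_filter("Product", 42)).
    return {FINDING_SCOPE_FILTERS[model_name]: obj_pk}

print(finding_scope_filter("Product", 42))
# → {'test__engagement__product': 42}
```

A flat lookup like this replaces the per-model deletion mapping: the cascade walker handles child models generically, so only the Finding scope needs to be special-cased.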
* fix: resolve ruff lint violations in helper and tests
* remove obsolete test
* perf: add bulk_delete_findings, fix CASCADE_DELETE and scope expansion
- Add bulk_delete_findings() wrapper: M2M cleanup + chunked cascade_delete
- reconfigure_duplicate_cluster: return early when CASCADE_DELETE=True
instead of calling Django .delete() which fires signals per finding
- finding_delete: use bulk_delete_findings when CASCADE_DELETE=True
- async_delete_crawl_task: expand scope to include outside-scope duplicates,
use bulk_delete_findings instead of manual M2M + cascade_delete calls
- Fix test to use async_delete class instead of direct task import
* fix: handle M2M and tag cleanup in cascade_delete
Adds generic M2M through-table cleanup to cascade_delete so tags and
other M2M relations are cleared before row deletion. Introduces
bulk_remove_all_tags in tag_utils to properly decrement tagulous tag
counts during bulk deletion. Adds test for product deletion with tagged
objects.
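The tag-count bookkeeping can be sketched in pure Python, with tagulous and the through tables mocked out (all names illustrative): tagulous keeps a usage count per tag, so a bulk delete must decrement counts for every removed row and drop tags that reach zero.

```python
from collections import Counter

# Illustrative stand-in for tagulous tag usage counts.
tag_counts = Counter({"sql-injection": 3, "xss": 1})

# Tags attached to the rows being bulk-deleted, one list per row
# (in reality read from the M2M through table).
tags_per_deleted_row = [["sql-injection"], ["sql-injection", "xss"]]

def bulk_remove_all_tags(tag_counts, tags_per_row):
    # Aggregate the decrement once, instead of per-object signal handling.
    removed = Counter(tag for row in tags_per_row for tag in row)
    for tag, n in removed.items():
        tag_counts[tag] -= n
        if tag_counts[tag] <= 0:
            del tag_counts[tag]  # tag no longer used anywhere
    return tag_counts

print(bulk_remove_all_tags(tag_counts, tags_per_deleted_row))
# → Counter({'sql-injection': 1})
```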
* refactor: auto-discover TagFields in bulk_remove_all_tags
Instead of hardcoding field names, iterate over all fields on the model
and select those with tag_options. This avoids unexpected side effects
when callers pass a specific tag_field_name parameter.
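The discovery loop itself is simple; here is a sketch with the Django/tagulous field API mocked out (the fake classes and field names are stand-ins): select every field on the model that carries `tag_options` rather than hardcoding names.

```python
# Fakes standing in for Django model fields; tagulous TagFields expose
# a tag_options attribute that plain fields lack.
class FakeField:
    def __init__(self, name, tag_options=None):
        self.name = name
        self.tag_options = tag_options

class FakeMeta:
    fields = [
        FakeField("title"),                          # not a tag field
        FakeField("tags", tag_options=object()),
        FakeField("inherited_tags", tag_options=object()),
    ]

def discover_tag_fields(model_meta):
    # Auto-discover: any field with tag_options is a tag field.
    return [f.name for f in model_meta.fields
            if getattr(f, "tag_options", None) is not None]

print(discover_tag_fields(FakeMeta))  # → ['tags', 'inherited_tags']
```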
* perf: address PR review feedback for large-scale delete safety
- Stream finding IDs via iterator()+batched instead of materializing
the full ID list into memory. Prevents OOM on 4.5M+ finding deletes.
- Add SET LOCAL statement_timeout (300s) and deadlock error logging to
cascade_delete SQL execution. Prevents runaway queries from holding
locks indefinitely and surfaces deadlock errors in logs.
- Reuse scope_ids subquery variable and replace .exists()+.count()
with a single .count() call to avoid evaluating the subquery twice.
- Add comment explaining why FileUpload uses per-object ORM delete
(custom delete() removes files from disk; file attachments are rare).
- Scope fix_loop_duplicates to the deletion set instead of scanning
the full findings table. The double self-join is cheap when filtered
to only findings in the scope being deleted.
- Document that pre_bulk_delete_findings signal receivers must not
materialize the full queryset (use .filter()/.iterator() instead).
- Add skip_m2m_for parameter to cascade_delete so bulk_delete_findings
can tell it Finding M2M was already cleaned by bulk_clear_finding_m2m,
avoiding redundant tag count aggregation queries.
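The streaming pattern from the first bullet can be sketched in pure Python (in the real task the iterable would be a `.values_list("id", flat=True).iterator()` queryset): consume IDs lazily in fixed-size batches instead of materializing millions of IDs into one list.

```python
from itertools import islice

def batched(iterable, size):
    # Yield lists of up to `size` items without ever holding the full
    # sequence in memory (stdlib itertools.batched exists in 3.12+;
    # this backport keeps the sketch self-contained).
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

deleted_batches = []
for id_batch in batched(iter(range(10)), size=4):  # e.g. size=1000 in prod
    deleted_batches.append(id_batch)               # delete this chunk here

print(deleted_batches)  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk is deleted inside its own statement, so memory stays bounded and the `statement_timeout` applies per batch rather than to one giant query.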
* refactor: rename cascade_delete to cascade_delete_related_objects
The function now only deletes related objects, not the root record.
This allows async_delete_task to call obj.delete() on the top-level
object via ORM, which fires Django signals (post_delete notifications,
pghistory audit, Pro signals like product_post_delete).
bulk_delete_findings uses execute_delete_sql to delete the finding
rows themselves after cascade_delete_related_objects cleans children.
* Update dojo/settings/settings.dist.py
Co-authored-by: Cody Maffucci <46459665+Maffooch@users.noreply.github.com>
---------
Co-authored-by: Cody Maffucci <46459665+Maffooch@users.noreply.github.com>
9 files changed: 975 additions, 245 deletions