Commit e4193ae
perf(tags): wrap importer in tag_inheritance.batch() + bulk flush after exit
Stage 2 of Phase B. Wraps the import/reimport orchestration in `process_scan` (default_importer + default_reimporter) inside `tag_inheritance.batch()`. Inside the batch:

- `make_inherited_tags_sticky` (m2m_changed) early-returns
- `inherit_tags_on_instance` (post_save, gated on created=True) early-returns
- `inherit_tags_on_linked_instance` (post_save) early-returns

After the batch exits, `tag_inheritance.flush_for_product(product)` runs once and bulk-applies inheritance to every child via the Phase A bulk diff path.

The bulk diff in `_sync_inheritance_for_qs` (dojo/product/helpers.py) is extended with a re-merge step: it reads each child's current `tags` through-table and ensures every target inherited name is present. This is the bulk equivalent of `make_inherited_tags_sticky` and is needed because `tags.set([...])` inside the batch (e.g. `update_test_tags` during reimport) can wipe inherited names from `tags` while `inherited_tags` stays in sync. A current-tags read is added to gate the re-merge so it only writes rows that are actually missing.

`tag_inheritance.flush_for_product(product)` is added as the public flush helper. Internally it delegates to `propagate_tags_on_product_sync`.
Pinned perf-test impact (this branch vs Stage 1):

- ZAP scan import V2: 1385 -> 1006 (~27% drop)
- ZAP scan import V3: 1263 -> 984 (~22% drop)
- ZAP reimport V2: 69 -> 82 (+13: flush has fixed cost)
- ZAP reimport V3: 87 -> 140 (+53: flush has fixed cost)
- product_tag_add -> 100 findings V2/V3: 91 -> 94 (+3: tags read for re-merge)
- product_tag_remove -> 100 findings V2/V3: 53 -> 56 (+3)
- product_tag_add -> 100 endpoints V2: 91 -> 194 (eager Celery + explicit propagate both pay re-merge cost)
- product_tag_remove -> 100 endpoints V2: 53 -> 56
- product_tag_add -> 100 locations V3: 316 -> 320
- product_tag_remove -> 100 locations V3: 266 -> 270

The reimport and product-toggle increases are eliminated in Stages 3+4+5 (drop the duplicate `inherited_tags` TagField; the re-merge step becomes free once `tags` is the single source of truth).
1 parent ec02e2e commit e4193ae

6 files changed

Lines changed: 172 additions & 75 deletions

dojo/importers/default_importer.py

Lines changed: 34 additions & 23 deletions

@@ -5,6 +5,7 @@
 from django.db.models.query_utils import Q
 from django.urls import reverse
 
+from dojo import tag_inheritance
 from dojo.celery_dispatch import dojo_dispatch_task
 from dojo.finding import helper as finding_helper
 from dojo.importers.base_importer import BaseImporter, Parser
@@ -114,29 +115,39 @@ def process_scan(
         # Note: for fresh imports, parse_findings() calls create_test() internally,
         # so self.test is guaranteed to be set after this call.
         parsed_findings = self.parse_findings(scan, parser) or []
-        new_findings = self.process_findings(parsed_findings, **kwargs)
-        # Close any old findings in the processed list if the user specified for that
-        # to occur in the form that is then passed to the kwargs
-        closed_findings = self.close_old_findings(self.test.finding_set.all(), **kwargs)
-        # Update the timestamps of the test object by looking at the findings imported
-        self.update_timestamps()
-        # Update the test meta
-        self.update_test_meta()
-        # Save the test and engagement for changes to take effect
-        self.test.save()
-        self.test.engagement.save()
-        # Create a test import history object to record the flags sent to the importer
-        # This operation will return None if the user does not have the import history
-        # feature enabled
-        test_import_history = self.update_import_history(
-            new_findings=new_findings,
-            closed_findings=closed_findings,
-        )
-        # Apply tags to findings and endpoints/locations
-        self.apply_import_tags(
-            new_findings=new_findings,
-            closed_findings=closed_findings,
-        )
+        # Suppress per-row tag-inheritance signal work during the import hot
+        # loop. Inheritance is applied in bulk after the batch via
+        # `tag_inheritance.flush_for_product` (see below). This replaces the
+        # per-finding `_manage_inherited_tags` signal cascade with a single
+        # `propagate_tags_on_product_sync` pass.
+        with tag_inheritance.batch():
+            new_findings = self.process_findings(parsed_findings, **kwargs)
+            # Close any old findings in the processed list if the user specified for that
+            # to occur in the form that is then passed to the kwargs
+            closed_findings = self.close_old_findings(self.test.finding_set.all(), **kwargs)
+            # Update the timestamps of the test object by looking at the findings imported
+            self.update_timestamps()
+            # Update the test meta
+            self.update_test_meta()
+            # Save the test and engagement for changes to take effect
+            self.test.save()
+            self.test.engagement.save()
+            # Create a test import history object to record the flags sent to the importer
+            # This operation will return None if the user does not have the import history
+            # feature enabled
+            test_import_history = self.update_import_history(
+                new_findings=new_findings,
+                closed_findings=closed_findings,
+            )
+            # Apply tags to findings and endpoints/locations
+            self.apply_import_tags(
+                new_findings=new_findings,
+                closed_findings=closed_findings,
+            )
+        # Flush inherited tags in bulk for all children of the product touched
+        # by this import. Idempotent and cheap when tag inheritance is
+        # disabled (it short-circuits on the system + per-product flags).
+        tag_inheritance.flush_for_product(self.test.engagement.product)
         # Send out some notifications to the user
         logger.debug("IMPORT_SCAN: Generating notifications")
         dojo_dispatch_task(

dojo/importers/default_reimporter.py

Lines changed: 46 additions & 37 deletions

@@ -6,6 +6,7 @@
 from django.db.models.query_utils import Q
 
 import dojo.finding.helper as finding_helper
+from dojo import tag_inheritance
 from dojo.celery_dispatch import dojo_dispatch_task
 from dojo.finding.deduplication import (
     find_candidates_for_deduplication_hash,
@@ -107,43 +108,51 @@ def process_scan(
         # Get the findings from the parser based on what methods the parser supplies
         # This could either mean traditional file parsing, or API pull parsing
         parsed_findings = self.parse_findings(scan, parser) or []
-        (
-            new_findings,
-            reactivated_findings,
-            findings_to_mitigate,
-            untouched_findings,
-        ) = self.process_findings(parsed_findings, **kwargs)
-        # Close any old findings in the processed list if the user specified for that
-        # to occur in the form that is then passed to the kwargs
-        closed_findings = self.close_old_findings(findings_to_mitigate, **kwargs)
-        # Update the timestamps of the test object by looking at the findings imported
-        logger.debug("REIMPORT_SCAN: Updating test/engagement timestamps")
-        # Update the timestamps of the test object by looking at the findings imported
-        self.update_timestamps()
-        # Update the test meta
-        self.update_test_meta()
-        # Update the test tags
-        self.update_test_tags()
-        # Save the test and engagement for changes to take effect
-        self.test.save()
-        self.test.engagement.save()
-        logger.debug("REIMPORT_SCAN: Updating test tags")
-        # Create a test import history object to record the flags sent to the importer
-        # This operation will return None if the user does not have the import history
-        # feature enabled
-        test_import_history = self.update_import_history(
-            new_findings=new_findings,
-            closed_findings=closed_findings,
-            reactivated_findings=reactivated_findings,
-            untouched_findings=untouched_findings,
-        )
-        # Apply tags to findings and endpoints
-        self.apply_import_tags(
-            new_findings=new_findings,
-            closed_findings=closed_findings,
-            reactivated_findings=reactivated_findings,
-            untouched_findings=untouched_findings,
-        )
+        # Suppress per-row tag-inheritance signal work during the reimport
+        # hot loop. Inheritance is applied in bulk after the batch via
+        # `tag_inheritance.flush_for_product`. See default_importer for the
+        # rationale.
+        with tag_inheritance.batch():
+            (
+                new_findings,
+                reactivated_findings,
+                findings_to_mitigate,
+                untouched_findings,
+            ) = self.process_findings(parsed_findings, **kwargs)
+            # Close any old findings in the processed list if the user specified for that
+            # to occur in the form that is then passed to the kwargs
+            closed_findings = self.close_old_findings(findings_to_mitigate, **kwargs)
+            # Update the timestamps of the test object by looking at the findings imported
+            logger.debug("REIMPORT_SCAN: Updating test/engagement timestamps")
+            self.update_timestamps()
+            # Update the test meta
+            self.update_test_meta()
+            # Update the test tags
+            self.update_test_tags()
+            # Save the test and engagement for changes to take effect
+            self.test.save()
+            self.test.engagement.save()
+            logger.debug("REIMPORT_SCAN: Updating test tags")
+            # Create a test import history object to record the flags sent to the importer
+            # This operation will return None if the user does not have the import history
+            # feature enabled
+            test_import_history = self.update_import_history(
+                new_findings=new_findings,
+                closed_findings=closed_findings,
+                reactivated_findings=reactivated_findings,
+                untouched_findings=untouched_findings,
+            )
+            # Apply tags to findings and endpoints
+            self.apply_import_tags(
+                new_findings=new_findings,
+                closed_findings=closed_findings,
+                reactivated_findings=reactivated_findings,
+                untouched_findings=untouched_findings,
+            )
+        # Bulk-apply tag inheritance for all children of the product touched
+        # by this reimport.
+        tag_inheritance.flush_for_product(self.test.engagement.product)
         # Send out some notifications to the user
         logger.debug("REIMPORT_SCAN: Generating notifications")
         updated_count = (

dojo/product/helpers.py

Lines changed: 40 additions & 1 deletion

@@ -112,11 +112,20 @@ def _sync_inheritance_for_qs(queryset, *, target_names_per_child):
     for child_id, tag_name in existing_pairs:
         old_inherited_by_child[child_id].add(tag_name)
 
-    # Compute per-child diff and bucket by tag name.
+    # Compute per-child diff and bucket by tag name. Two diffs are computed:
+    # - inherited_tags add/remove: keeps the inherited_tags M2M in sync
+    #   with the target.
+    # - tags re-merge: ensures every target name is also present on `tags`,
+    #   even when inherited_tags already matched. This is the bulk
+    #   equivalent of `make_inherited_tags_sticky` enforcement, needed for
+    #   the importer hot path where `test.tags.set([...])` overwrites the
+    #   full tag list inside a `tag_inheritance.batch()` block.
     add_map: dict[str, list] = defaultdict(list)
     remove_map: dict[str, list] = defaultdict(list)
+    target_per_child: dict[int, set[str]] = {}
     for child in children:
         target = target_names_per_child(child)
+        target_per_child[child.pk] = target
         old = old_inherited_by_child.get(child.pk, set())
         for name in target - old:
             add_map[name].append(child)
@@ -133,3 +142,33 @@ def _sync_inheritance_for_qs(queryset, *, target_names_per_child):
     for name, instances in remove_map.items():
         bulk_remove_tags_from_instances(name, instances, tag_field_name="inherited_tags")
         bulk_remove_tags_from_instances(name, instances, tag_field_name="tags")
+
+    # Bulk re-merge: ensure every target name is present on `tags`. We need
+    # this for the importer hot path where `tags.set([...])` inside a
+    # `tag_inheritance.batch()` can wipe inherited names from `tags` while
+    # `inherited_tags` stays in sync (so the diff above is empty).
+    #
+    # Read the current `tags` per child so we only write rows that are
+    # actually missing; without this guard the re-merge becomes O(target *
+    # children) bulk_create attempts for every product-tag toggle.
+    tags_field = model_class._meta.get_field("tags")
+    tags_through = tags_field.remote_field.through
+    tags_tag_model = tags_field.related_model
+    existing_tags_pairs = tags_through.objects.filter(
+        **{f"{source_field_name}__in": child_ids},
+    ).values_list(source_field_name, f"{tags_tag_model._meta.model_name}__name")
+
+    current_tags_by_child: dict[int, set[str]] = defaultdict(set)
+    for child_id, tag_name in existing_tags_pairs:
+        current_tags_by_child[child_id].add(tag_name)
+
+    remerge_map: dict[str, list] = defaultdict(list)
+    for child in children:
+        target = target_per_child[child.pk]
+        current = current_tags_by_child.get(child.pk, set())
+        # Skip names already added by the diff above; only fix true drift.
+        already_added = {name for name, lst in add_map.items() if child in lst}
+        for name in target - current - already_added:
+            remerge_map[name].append(child)
+    if remerge_map:
+        bulk_add_tag_mapping(remerge_map, tag_field_name="tags")
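The three buckets this hunk computes can be illustrated with plain sets. This is a simplified stand-in, not the helper itself: `plan_sync` and its dict-of-sets inputs are hypothetical, whereas the real code derives the inherited and current tag sets from through-table queries and buckets model instances rather than ids.

```python
from collections import defaultdict


def plan_sync(target_per_child, inherited_per_child, tags_per_child):
    """Bucket per-child tag work by tag name, as in the diff above.

    All inputs map child_id -> set of tag names. Returns
    (add_map, remove_map, remerge_map), each tag_name -> [child_id].
    """
    add_map = defaultdict(list)
    remove_map = defaultdict(list)
    remerge_map = defaultdict(list)
    for child, target in target_per_child.items():
        old = inherited_per_child.get(child, set())
        current = tags_per_child.get(child, set())
        # Diff 1: keep inherited_tags in sync (adds also write `tags`).
        for name in target - old:
            add_map[name].append(child)
        for name in old - target:
            remove_map[name].append(child)
        # Diff 2 (re-merge): names inherited_tags already has but a
        # `tags.set([...])` inside the batch wiped from `tags`; skip
        # names the add bucket above will write anyway.
        for name in target - current - (target - old):
            remerge_map[name].append(child)
    return add_map, remove_map, remerge_map
```

For example, a child whose `inherited_tags` already matches the target but whose `tags` lost the name lands only in `remerge_map`, which is exactly the drift case the gating read exists to detect cheaply.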

dojo/tag_inheritance.py

Lines changed: 19 additions & 0 deletions

@@ -37,6 +37,9 @@ def batch():
         # Bulk operations that would otherwise fire `make_inherited_tags_sticky`
         # or `inherit_tags_on_instance` per row.
         ...
+    # After exit, callers should explicitly bulk-apply inheritance via
+    # `propagate_tags_on_product_sync(product)` (or equivalent) for any
+    # product whose children were created/modified inside the batch.
 
     The context is reentrant; nested ``with`` blocks share the suppression
     until the outermost block exits. State lives in ``threading.local()``,
@@ -52,3 +55,19 @@ def batch():
         # Clean up the attribute so leak-free thread reuse stays simple.
         with contextlib.suppress(AttributeError):
             del _state.depth
+
+
+def flush_for_product(product) -> None:
+    """
+    Bulk-apply tag inheritance to all children of `product`.
+
+    Intended to be called after a ``batch()`` block exits, when the calling
+    code has created/modified many children of one product and the
+    per-instance signal handlers were suppressed. Delegates to
+    ``propagate_tags_on_product_sync``, which uses the Phase A bulk-diff
+    helpers to sync `inherited_tags` and `tags` across all children in O(1)
+    queries per (model x tag) instead of O(N) rows.
+    """
+    # Local import to avoid circulars at module import time.
+    from dojo.product.helpers import propagate_tags_on_product_sync  # noqa: PLC0415
+    propagate_tags_on_product_sync(product)
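A self-contained sketch of the reentrant, thread-local context this file's docstring describes. The names `batch` and `is_in_batch` match the diff; the bodies are assumptions reconstructed from the docstring ("nested blocks share the suppression until the outermost block exits; state lives in `threading.local()`"), not the repo's actual implementation:

```python
import contextlib
import threading

_state = threading.local()


@contextlib.contextmanager
def batch():
    """Suppress per-row inheritance work; reentrant per thread."""
    # Depth counter: nested blocks increment, so suppression only ends
    # when the outermost block exits (even on exceptions).
    _state.depth = getattr(_state, "depth", 0) + 1
    try:
        yield
    finally:
        _state.depth -= 1
        if _state.depth == 0:
            # Drop the attribute so reused worker threads start clean.
            del _state.depth


def is_in_batch() -> bool:
    # Signal handlers call this to decide whether to early-return.
    return getattr(_state, "depth", 0) > 0
```

A depth counter rather than a boolean is what makes the context safe to nest: an inner `batch()` exiting must not re-enable signals while an outer one is still active.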

dojo/tags_signals.py

Lines changed: 7 additions & 0 deletions

@@ -72,12 +72,19 @@ def inherit_tags_on_instance(sender, instance, created, **kwargs):
     # tag edits is handled by `make_inherited_tags_sticky` (m2m_changed).
     if not created:
         return
+    # Inside a `tag_inheritance.batch()` block the caller takes responsibility
+    # for applying inheritance in bulk after exit (typically via
+    # `tag_inheritance.flush_for_product`).
+    if tag_inheritance.is_in_batch():
+        return
     inherit_instance_tags(instance)
 
 
 @receiver(signals.post_save, sender=LocationFindingReference)
 @receiver(signals.post_save, sender=LocationProductReference)
 def inherit_tags_on_linked_instance(sender, instance, created, **kwargs):
+    if tag_inheritance.is_in_batch():
+        return
     inherit_linked_instance_tags(instance)
 

unittests/test_tag_inheritance_perf.py

Lines changed: 26 additions & 14 deletions

@@ -359,10 +359,12 @@ def test_baseline_product_tag_remove_propagates_to_100_locations_v3(self):
     # Findings-only scenarios.
     # Pre-Phase-A V2: 4758 add, 4540 remove. V3: 4759/4541.
     # Phase A bulk-propagate drops these dramatically.
-    EXPECTED_PRODUCT_TAG_ADD_100_V2 = 91
-    EXPECTED_PRODUCT_TAG_ADD_100_V3 = 91
-    EXPECTED_PRODUCT_TAG_REMOVE_100_V2 = 53
-    EXPECTED_PRODUCT_TAG_REMOVE_100_V3 = 53
+    # Phase B Stage 2 adds ~3 queries to read current `tags` for the bulk
+    # re-merge step (compensates for tags wiped inside batch contexts).
+    EXPECTED_PRODUCT_TAG_ADD_100_V2 = 94
+    EXPECTED_PRODUCT_TAG_ADD_100_V3 = 94
+    EXPECTED_PRODUCT_TAG_REMOVE_100_V2 = 56
+    EXPECTED_PRODUCT_TAG_REMOVE_100_V3 = 56
 
     EXPECTED_CREATE_ONE_FINDING_V2 = 64
     EXPECTED_CREATE_ONE_FINDING_V3 = 64
@@ -375,12 +377,18 @@ def test_baseline_product_tag_remove_propagates_to_100_locations_v3(self):
     EXPECTED_FINDING_REMOVE_INHERITED_V3 = 44
 
     # V2 endpoint paths. Pre-Phase-A: 3958 add, 3740 remove.
-    EXPECTED_PRODUCT_TAG_ADD_100_ENDPOINTS = 91
-    EXPECTED_PRODUCT_TAG_REMOVE_100_ENDPOINTS = 53
+    # Phase B Stage 2 raises endpoint add 91 -> 194 because the eager Celery
+    # propagate dispatched by m2m_changed and the explicit
+    # propagate_tags_on_product_sync call both pay the new tags-read for the
+    # bulk re-merge. Acceptable: the same lever delivers a 27% drop on the
+    # ZAP import path. Will go further down in Stages 3+4+5 when the
+    # duplicate inherited_tags M2M is dropped.
+    EXPECTED_PRODUCT_TAG_ADD_100_ENDPOINTS = 194
+    EXPECTED_PRODUCT_TAG_REMOVE_100_ENDPOINTS = 56
 
     # V3 location paths. Pre-Phase-A: 4532 add, 4307 remove.
-    EXPECTED_PRODUCT_TAG_ADD_100_LOCATIONS = 316
-    EXPECTED_PRODUCT_TAG_REMOVE_100_LOCATIONS = 266
+    EXPECTED_PRODUCT_TAG_ADD_100_LOCATIONS = 320
+    EXPECTED_PRODUCT_TAG_REMOVE_100_LOCATIONS = 270
 
 
 @override_settings(
@@ -495,9 +503,13 @@ def test_baseline_zap_scan_reimport_no_change_v3(self):
     # Pre-Phase-A: 1461/1319 import, 77/95 reimport.
     # Phase B Stage 1 (thread-safe batch context) adds ~20 queries on the V3
    # import path because the previous process-global signal-disconnect was
-    # narrower in scope (Location.tags.through only). Net-positive trade for
-    # eliminating the threading bug; full Phase B reductions land in Stage 2.
-    EXPECTED_ZAP_IMPORT_V2 = 1385
-    EXPECTED_ZAP_IMPORT_V3 = 1263
-    EXPECTED_ZAP_REIMPORT_NO_CHANGE_V2 = 69
-    EXPECTED_ZAP_REIMPORT_NO_CHANGE_V3 = 87
+    # narrower in scope (Location.tags.through only).
+    # Phase B Stage 2 (importer-wide batch + flush_for_product) drops import
+    # query counts ~27%/22% (the signal cascade is replaced by a single bulk
+    # propagate). Reimport rises because the flush always runs; the bulk
+    # re-merge has a fixed cost even when there is no work. Stages 3+4+5
+    # (drop the duplicate inherited_tags M2M) will collapse the reimport cost.
+    EXPECTED_ZAP_IMPORT_V2 = 1006
+    EXPECTED_ZAP_IMPORT_V3 = 984
+    EXPECTED_ZAP_REIMPORT_NO_CHANGE_V2 = 82
+    EXPECTED_ZAP_REIMPORT_NO_CHANGE_V3 = 140
