Skip to content

Commit ecd29bb

Browse files
perf(tags): batch_mode + per-batch bulk inheritance during import (Phase B Stage 2)
Wraps the import / reimport hot loop in `tag_inheritance.batch_mode()` and bulk-applies inherited Product tags per batch *before* `post_process_findings_batch` dispatches, so rules / deduplication see inherited tags on `finding.tags`. Changes: - `process_findings` (DefaultImporter + DefaultReImporter) now runs its finding-creation loop inside `batch_mode()`. Per batch, right after `apply_import_tags_for_batch`, calls a new helper `apply_inherited_tags_for_findings(batch_findings)` that bulk-syncs inherited tags on the batch's Findings plus the Endpoints (V2) / Locations (V3) reachable from them via FK chain. Inheritance is therefore applied to the persisted children before the post-process task dispatches. - `inherit_instance_tags` in `dojo/tags_signals.py` now early-returns when `tag_inheritance.is_in_batch_mode()`, so the batch wrap transparently suppresses per-row inheritance work for any caller — including `bulk_create` cleanup paths that invoke it manually. `inherit_tags_on_instance` post_save delegates to that helper, so the gate also covers signal-driven fires. - `EndpointManager.get_or_create_endpoints` replaces its per-row `inherit_instance_tags(ep)` cleanup loop with a single `apply_inherited_tags_for_endpoints(created)` bulk call. Inside the importer the per-batch helper already covers these endpoints via `Endpoint.status_endpoint.finding`; the bulk call is kept as a defensive hook for any non-importer caller. - `propagate_tags_on_product_sync` (used by the product-tag-toggle Celery task) gains an early-exit when neither system-wide nor per-product inheritance is enabled, eliminating ~9 wasted reads per call on inheritance-off products. State transitions (toggling either flag) still trigger a full sweep through their existing signal handlers. - `Location` gains `iter_related_products()`: a related-manager (`self.products` + `self.findings`) implementation of `all_related_products()` that returns `list[Product]`. Callers that pre-issue `prefetch_related("products__product__tags", "findings__finding__test__engagement__product__tags")` get zero extra queries per Location. The existing JOIN'd `all_related_products()` is kept for the per-instance signal path where prefetching is not possible. - `_inherited_tag_names_for_location` (the per-Location callback used to compute the inherited target set) switches to `iter_related_products()`; both call sites (`propagate_tags_on_product_sync` V3 branch and `apply_inherited_tags_for_findings` V3 branch) now prefetch the chain. Query-count impact on `unittests/test_tag_inheritance_perf.py` (pins updated in this commit): | Hot path | Before | After | Δ | |-----------------------------------------|--------:|-------:|------:| | ZAP scan import V2 (19 findings) | 1385 | 477 | -908 | | ZAP scan import V3 | 1263 | 945 | -318 | | ZAP reimport no-change V2 | 69 | 75 | +6 | | ZAP reimport no-change V3 | 87 | 102 | +15 | | Product tag add → 100 locations (V3) | 316 | 125 | -191 | | Product tag remove → 100 locations (V3) | 266 | 75 | -191 | Small reimport-no-change regressions are the unavoidable per-batch helper read cost (2 reads × Finding + 2 reads × Endpoint/Location + 1 product tags read). Real-work imports drop significantly because per-row `_manage_inherited_tags` work no longer fires inside the loop.
1 parent db1932c commit ecd29bb

7 files changed

Lines changed: 197 additions & 20 deletions

File tree

dojo/importers/default_importer.py

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
from django.db.models.query_utils import Q
66
from django.urls import reverse
77

8+
from dojo import tag_inheritance
89
from dojo.celery_dispatch import dojo_dispatch_task
910
from dojo.finding import helper as finding_helper
1011
from dojo.importers.base_importer import BaseImporter, Parser
@@ -18,6 +19,7 @@
1819
Test_Import,
1920
)
2021
from dojo.notifications.helper import async_create_notification
22+
from dojo.product.helpers import apply_inherited_tags_for_findings
2123
from dojo.tag_utils import bulk_apply_parser_tags
2224
from dojo.utils import get_full_url, perform_product_grading
2325
from dojo.validators import clean_tags
@@ -161,6 +163,19 @@ def process_findings(
161163
self,
162164
parsed_findings: list[Finding],
163165
**kwargs: dict,
166+
) -> list[Finding]:
167+
# Whole hot loop runs under `batch_mode()`: per-row inheritance signals
168+
# for the findings/endpoints/locations created below are suppressed.
169+
# Inheritance is then applied in bulk per-batch (right before
170+
# `post_process_findings_batch` dispatch) so rules/dedup see inherited
171+
# tags on `finding.tags`.
172+
with tag_inheritance.batch_mode():
173+
return self._process_findings_inner(parsed_findings, **kwargs)
174+
175+
def _process_findings_inner(
176+
self,
177+
parsed_findings: list[Finding],
178+
**kwargs: dict,
164179
) -> list[Finding]:
165180
# Batched post-processing (no chord): dispatch a task per 1000 findings or on final finding
166181
batch_finding_ids: list[int] = []
@@ -266,6 +281,10 @@ def process_findings(
266281
findings_with_parser_tags.clear()
267282
# Apply import-time tags before post-processing so rules/deduplication see them.
268283
self.apply_import_tags_for_batch(batch_findings)
284+
# Apply inherited Product tags to this batch's findings (and
285+
# their endpoints/locations) BEFORE post_process_findings_batch
286+
# dispatches, so rules/dedup see inherited tags on .tags.
287+
apply_inherited_tags_for_findings(batch_findings)
269288
batch_findings.clear()
270289
finding_ids_batch = list(batch_finding_ids)
271290
batch_finding_ids.clear()

dojo/importers/default_reimporter.py

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
from django.db.models.query_utils import Q
77

88
import dojo.finding.helper as finding_helper
9+
from dojo import tag_inheritance
910
from dojo.celery_dispatch import dojo_dispatch_task
1011
from dojo.finding.deduplication import (
1112
find_candidates_for_deduplication_hash,
@@ -24,6 +25,7 @@
2425
Test,
2526
Test_Import,
2627
)
28+
from dojo.product.helpers import apply_inherited_tags_for_findings
2729
from dojo.tag_utils import bulk_apply_parser_tags
2830
from dojo.utils import perform_product_grading
2931
from dojo.validators import clean_tags
@@ -263,6 +265,19 @@ def process_findings(
263265
the finding may be appended to a new or existing group based upon user selection
264266
at import time
265267
"""
268+
# Whole hot loop runs under `batch_mode()`: per-row inheritance signals
269+
# for the findings/endpoints/locations created below are suppressed.
270+
# Inheritance is then applied in bulk per-batch (right before
271+
# `post_process_findings_batch` dispatch) so rules/dedup see inherited
272+
# tags on `finding.tags`.
273+
with tag_inheritance.batch_mode():
274+
return self._process_findings_inner(parsed_findings, **kwargs)
275+
276+
def _process_findings_inner(
277+
self,
278+
parsed_findings: list[Finding],
279+
**kwargs: dict,
280+
) -> tuple[list[Finding], list[Finding], list[Finding], list[Finding]]:
266281
self.deduplication_algorithm = self.determine_deduplication_algorithm()
267282
# Only process findings with the same service value (or None)
268283
# Even though the service values is used in the hash_code calculation,
@@ -422,6 +437,10 @@ def process_findings(
422437
findings_with_parser_tags.clear()
423438
# Apply import-time tags before post-processing so rules/deduplication see them.
424439
self.apply_import_tags_for_batch(batch_findings)
440+
# Apply inherited Product tags to this batch's findings (and
441+
# their endpoints/locations) BEFORE post_process_findings_batch
442+
# dispatches, so rules/dedup see inherited tags on .tags.
443+
apply_inherited_tags_for_findings(batch_findings)
425444
batch_findings.clear()
426445
finding_ids_batch = list(batch_finding_ids)
427446
batch_finding_ids.clear()

dojo/importers/endpoint_manager.py

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
Finding,
1515
Product,
1616
)
17-
from dojo.tags_signals import inherit_instance_tags
17+
from dojo.product.helpers import apply_inherited_tags_for_endpoints
1818

1919
logger = logging.getLogger(__name__)
2020

@@ -231,10 +231,16 @@ def get_or_create_endpoints(self) -> tuple[dict[EndpointUniqueKey, Endpoint], li
231231
if to_create:
232232
created = Endpoint.objects.bulk_create(to_create, batch_size=1000)
233233
endpoints_by_key.update(zip(to_create_keys, created, strict=True))
234-
# bulk_create bypasses post_save signals, so manually trigger tag inheritance
235-
# this is not ideal, but we need to take a separate look at the tag inheritance feature itself later
236-
for ep in created:
237-
inherit_instance_tags(ep)
234+
# bulk_create bypasses post_save so per-row inheritance signals never
235+
# fire here. The importer hot path already covers these endpoints via
236+
# the per-batch `apply_inherited_tags_for_findings` sweep (it picks
237+
# them up through `Endpoint.status_finding.finding`), so this call is
238+
# redundant for the importer. We keep a bulk call anyway as a defensive
239+
# measure: if anything outside the importer ever bulk-creates endpoints
240+
# through this manager, they still receive their inherited Product tags
241+
# instead of silently missing them. The bulk helper costs ~2 queries
242+
# when there's nothing to apply, vs N per-row signal fires.
243+
apply_inherited_tags_for_endpoints(created)
238244

239245
self._endpoints_to_create.clear()
240246
return endpoints_by_key, created

dojo/location/models.py

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -231,6 +231,40 @@ def all_related_products(self) -> QuerySet[Product]:
231231
| Q(engagement__test__finding__locations__location=self),
232232
).distinct()
233233

234+
def iter_related_products(self) -> list[Product]:
235+
"""
236+
Prefetch-friendly equivalent of `all_related_products()`.
237+
238+
Walks `self.products.all()` (LocationProductReference) and
239+
`self.findings.all()` (LocationFindingReference -> Finding -> Test ->
240+
Engagement -> Product) via Django related managers, so a caller that
241+
already issued
242+
243+
Location.objects.filter(...).prefetch_related(
244+
"products__product__tags",
245+
"findings__finding__test__engagement__product__tags",
246+
)
247+
248+
gets every Product (and its tags) in 0 extra queries per Location.
249+
250+
Use this method from bulk paths where many Locations are processed at
251+
once. The original `all_related_products()` still issues a single
252+
DISTINCT JOIN query and is kept for per-instance signal paths where
253+
prefetching is not possible.
254+
"""
255+
seen: set[int] = set()
256+
result: list[Product] = []
257+
for ref in self.products.all():
258+
if ref.product_id not in seen:
259+
seen.add(ref.product_id)
260+
result.append(ref.product)
261+
for ref in self.findings.all():
262+
product = ref.finding.test.engagement.product
263+
if product.id not in seen:
264+
seen.add(product.id)
265+
result.append(product)
266+
return result
267+
234268
def products_to_inherit_tags_from(self) -> list[Product]:
235269
from dojo.utils import get_system_setting # noqa: PLC0415
236270
system_wide_inherit = get_system_setting("enable_product_tag_inheritance")

dojo/product/helpers.py

Lines changed: 100 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99
from dojo.location.models import Location
1010
from dojo.models import Endpoint, Engagement, Finding, Product, Test
1111
from dojo.tag_utils import bulk_add_tag_mapping, bulk_remove_tags_from_instances
12+
from dojo.utils import get_system_setting
1213

1314
logger = logging.getLogger(__name__)
1415

@@ -31,52 +32,142 @@ def propagate_tags_on_product_sync(product):
3132
Product's current tags, and applies adds/removes via the bulk tag
3233
helpers. Both `tags` and `inherited_tags` fields are kept in sync.
3334
"""
34-
target_names = {tag.name for tag in product.tags.all()}
35+
# Skip the full child sweep when inheritance is disabled both system-wide
36+
# and on this product. Without this gate the importer hot path pays ~9
37+
# queries per scan (one product-tags read + one list/through-table read per
38+
# child kind) even when no inheritance work is possible. State transitions
39+
# (toggling the flag on/off) still trigger a full sweep via the m2m_changed
40+
# handler on `Product.tags.through` and the per-product flag save handler.
41+
if not (product.enable_product_tag_inheritance or get_system_setting("enable_product_tag_inheritance")):
42+
return
43+
inherited_tag_names = {tag.name for tag in product.tags.all()}
3544

3645
logger.debug("Propagating tags from %s to all engagements", product)
3746
_sync_inheritance_for_qs(
3847
Engagement.objects.filter(product=product),
39-
target_names_per_child=lambda _child: target_names,
48+
target_names_per_child=lambda _child: inherited_tag_names,
4049
)
4150
logger.debug("Propagating tags from %s to all tests", product)
4251
_sync_inheritance_for_qs(
4352
Test.objects.filter(engagement__product=product),
44-
target_names_per_child=lambda _child: target_names,
53+
target_names_per_child=lambda _child: inherited_tag_names,
4554
)
4655
logger.debug("Propagating tags from %s to all findings", product)
4756
_sync_inheritance_for_qs(
4857
Finding.objects.filter(test__engagement__product=product),
49-
target_names_per_child=lambda _child: target_names,
58+
target_names_per_child=lambda _child: inherited_tag_names,
5059
)
5160
if settings.V3_FEATURE_LOCATIONS:
5261
logger.debug("Propagating tags from %s to all locations", product)
5362
location_qs = Location.objects.filter(
5463
Q(products__product=product)
5564
| Q(findings__finding__test__engagement__product=product),
56-
).distinct()
65+
).distinct().prefetch_related(*_LOCATION_PREFETCH_FOR_INHERITANCE)
5766
# Locations can be linked to multiple products, so the inherited target
5867
# is the union of every related product's tags. Compute per-location.
5968
_sync_inheritance_for_qs(
6069
location_qs,
61-
target_names_per_child=_location_target_names,
70+
target_names_per_child=_inherited_tag_names_for_location,
6271
)
6372
else:
6473
logger.debug("Propagating tags from %s to all endpoints", product)
6574
_sync_inheritance_for_qs(
6675
Endpoint.objects.filter(product=product),
67-
target_names_per_child=lambda _child: target_names,
76+
target_names_per_child=lambda _child: inherited_tag_names,
6877
)
6978

7079

71-
def _location_target_names(location):
80+
def apply_inherited_tags_for_endpoints(endpoints):
81+
"""
82+
Bulk inheritance for a list of Endpoints, e.g. those just created via
83+
`Endpoint.objects.bulk_create` (which bypasses post_save signals).
84+
85+
All endpoints are assumed to share a single Product — true for the
86+
importer's `EndpointManager`, which is per-product. If callers ever
87+
mix products, split the list before calling.
88+
"""
89+
if not endpoints:
90+
return
91+
product = endpoints[0].product
92+
if not (product.enable_product_tag_inheritance or get_system_setting("enable_product_tag_inheritance")):
93+
return
94+
inherited_tag_names = {tag.name for tag in product.tags.all()}
95+
_sync_inheritance_for_qs(
96+
Endpoint.objects.filter(id__in=[e.pk for e in endpoints]),
97+
target_names_per_child=lambda _child: inherited_tag_names,
98+
)
99+
100+
101+
def apply_inherited_tags_for_findings(findings):
102+
"""
103+
Per-batch bulk inheritance for findings created during an import.
104+
105+
Apply the owning Product's inherited tags to the given findings plus the
106+
Endpoints (V2) / Locations (V3) reachable from them. Called from the
107+
importer hot path right before each batch dispatches to
108+
`post_process_findings_batch` so rules / deduplication see inherited tags
109+
on `finding.tags`.
110+
111+
Test and Engagement inheritance is handled by their own post_save handlers
112+
(those run outside the importer's `batch_mode()`, so per-instance signal
113+
work fires normally and applies inheritance on create).
114+
"""
115+
if not findings:
116+
return
117+
# Single-product invariant inside one importer call. Smart upload calls
118+
# this per-product so the assumption holds there too.
119+
product = findings[0].test.engagement.product
120+
if not (product.enable_product_tag_inheritance or get_system_setting("enable_product_tag_inheritance")):
121+
return
122+
inherited_tag_names = {tag.name for tag in product.tags.all()}
123+
finding_ids = [f.pk for f in findings]
124+
125+
_sync_inheritance_for_qs(
126+
Finding.objects.filter(id__in=finding_ids),
127+
target_names_per_child=lambda _child: inherited_tag_names,
128+
)
129+
if settings.V3_FEATURE_LOCATIONS:
130+
_sync_inheritance_for_qs(
131+
Location.objects.filter(findings__finding_id__in=finding_ids).distinct().prefetch_related(*_LOCATION_PREFETCH_FOR_INHERITANCE),
132+
target_names_per_child=_inherited_tag_names_for_location,
133+
)
134+
else:
135+
_sync_inheritance_for_qs(
136+
Endpoint.objects.filter(status_endpoint__finding_id__in=finding_ids).distinct(),
137+
target_names_per_child=lambda _child: inherited_tag_names,
138+
)
139+
140+
141+
def _inherited_tag_names_for_location(location):
142+
"""
143+
Compute the tag-name set this Location should have as `inherited_tags`.
144+
145+
Unlike Finding / Test / Engagement / Endpoint (each owned by exactly one
146+
Product), a Location can be attached to multiple Products — directly via
147+
`LocationProductReference` or indirectly via `LocationFindingReference`
148+
-> Finding -> Test -> Engagement -> Product. The target inherited set is
149+
therefore the UNION of every related Product's tags.
150+
151+
Used as the `target_names_per_child` callback for `_sync_inheritance_for_qs`
152+
on Location querysets; it must be called per Location because each Location
153+
has its own set of related Products. Uses `iter_related_products()` so
154+
that an upstream `prefetch_related(...)` reduces per-call cost to 0
155+
queries.
156+
"""
72157
names: set[str] = set()
73-
for related_product in location.all_related_products():
158+
for related_product in location.iter_related_products():
74159
if related_product is None:
75160
continue
76161
names.update(tag.name for tag in related_product.tags.all())
77162
return names
78163

79164

165+
_LOCATION_PREFETCH_FOR_INHERITANCE = (
166+
"products__product__tags",
167+
"findings__finding__test__engagement__product__tags",
168+
)
169+
170+
80171
def _sync_inheritance_for_qs(queryset, *, target_names_per_child):
81172
"""
82173
Sync inherited_tags + tags for every child in `queryset` to its target tag set.

dojo/tags_signals.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,12 @@ def make_inherited_tags_sticky(sender, instance, action, **kwargs):
4848

4949
def inherit_instance_tags(instance):
5050
"""Usually nothing to do when saving a model, except for new models?"""
51+
# Suppress per-instance inheritance work inside an active batch. The
52+
# caller (signal handler or bulk_create cleanup) need not know about
53+
# batch_mode; whoever opened the batch is responsible for the bulk
54+
# apply at exit.
55+
if tag_inheritance.is_in_batch_mode():
56+
return
5157
if inherit_product_tags(instance):
5258
# TODO: Is this change OK to make?
5359
# tag_list = instance._tags_tagulous.get_tag_list()
@@ -70,6 +76,7 @@ def inherit_tags_on_instance(sender, instance, created, **kwargs):
7076
# (create OR update), repeatedly re-applying inherited tags to children
7177
# whose tag state had not changed. Sticky enforcement on user-driven
7278
# tag edits is handled by `make_inherited_tags_sticky` (m2m_changed).
79+
# `inherit_instance_tags` itself early-returns when a batch is active.
7380
if not created:
7481
return
7582
inherit_instance_tags(instance)

unittests/test_tag_inheritance_perf.py

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -379,13 +379,14 @@ def test_baseline_product_tag_remove_propagates_to_100_locations_v3(self):
379379
EXPECTED_PRODUCT_TAG_REMOVE_100_ENDPOINTS = 53
380380

381381
# V3 location paths. Pre-Phase-A: 4532 add, 4307 remove.
382-
EXPECTED_PRODUCT_TAG_ADD_100_LOCATIONS = 316
383-
EXPECTED_PRODUCT_TAG_REMOVE_100_LOCATIONS = 266
382+
EXPECTED_PRODUCT_TAG_ADD_100_LOCATIONS = 125
383+
EXPECTED_PRODUCT_TAG_REMOVE_100_LOCATIONS = 75
384384

385385

386386
@override_settings(
387387
CELERY_TASK_ALWAYS_EAGER=True,
388388
CELERY_TASK_EAGER_PROPAGATES=True,
389+
SECURE_SSL_REDIRECT=False,
389390
)
390391
@versioned_fixtures
391392
class TagInheritanceImportPerfBaselines(DojoAPITestCase):
@@ -497,7 +498,7 @@ def test_baseline_zap_scan_reimport_no_change_v3(self):
497498
# import path because the previous process-global signal-disconnect was
498499
# narrower in scope (Location.tags.through only). Net-positive trade for
499500
# eliminating the threading bug; full Phase B reductions land in Stage 2.
500-
EXPECTED_ZAP_IMPORT_V2 = 1385
501-
EXPECTED_ZAP_IMPORT_V3 = 1263
502-
EXPECTED_ZAP_REIMPORT_NO_CHANGE_V2 = 69
503-
EXPECTED_ZAP_REIMPORT_NO_CHANGE_V3 = 87
501+
EXPECTED_ZAP_IMPORT_V2 = 477
502+
EXPECTED_ZAP_IMPORT_V3 = 945
503+
EXPECTED_ZAP_REIMPORT_NO_CHANGE_V2 = 75
504+
EXPECTED_ZAP_REIMPORT_NO_CHANGE_V3 = 102

0 commit comments

Comments
 (0)