
Commit 968cc5d

perf(tag inheritance): batch_mode + per-batch bulk during import + reorganize (#14877)
* perf(tags): batch_mode + per-batch bulk inheritance during import (Phase B Stage 2)

Wraps the import / reimport hot loop in `tag_inheritance.batch_mode()` and bulk-applies inherited Product tags per batch *before* `post_process_findings_batch` dispatches, so rules / deduplication see inherited tags on `finding.tags`.

Changes:

- `process_findings` (DefaultImporter + DefaultReImporter) now runs its finding-creation loop inside `batch_mode()`. Per batch, right after `apply_import_tags_for_batch`, it calls a new helper, `apply_inherited_tags_for_findings(batch_findings)`, that bulk-syncs inherited tags on the batch's Findings plus the Endpoints (V2) / Locations (V3) reachable from them via the FK chain. Inheritance is therefore applied to the persisted children before the post-process task dispatches.
- `inherit_instance_tags` in `dojo/tags_signals.py` now early-returns when `tag_inheritance.is_in_batch_mode()` is true, so the batch wrap transparently suppresses per-row inheritance work for any caller, including `bulk_create` cleanup paths that invoke it manually. The `inherit_tags_on_instance` post_save receiver delegates to that helper, so the gate also covers signal-driven fires.
- `EndpointManager.get_or_create_endpoints` replaces its per-row `inherit_instance_tags(ep)` cleanup loop with a single `apply_inherited_tags_for_endpoints(created)` bulk call. Inside the importer, the per-batch helper already covers these endpoints via `Endpoint.status_endpoint.finding`; the bulk call is kept as a defensive hook for any non-importer caller.
- `propagate_tags_on_product_sync` (used by the product-tag-toggle Celery task) gains an early exit when neither system-wide nor per-product inheritance is enabled, eliminating ~9 wasted reads per call on inheritance-off products. State transitions (toggling either flag) still trigger a full sweep through their existing signal handlers.
- `Location` gains `iter_related_products()`: a related-manager (`self.products` + `self.findings`) implementation of `all_related_products()` that returns `list[Product]`. Callers that pre-issue `prefetch_related("products__product__tags", "findings__finding__test__engagement__product__tags")` get zero extra queries per Location. The existing JOIN'd `all_related_products()` is kept for the per-instance signal path, where prefetching is not possible.
- `_inherited_tag_names_for_location` (the per-Location callback used to compute the inherited target set) switches to `iter_related_products()`; both call sites (`propagate_tags_on_product_sync` V3 branch and `apply_inherited_tags_for_findings` V3 branch) now prefetch the chain.

Query-count impact on `unittests/test_tag_inheritance_perf.py` (pins updated in this commit):

| Hot path | Before | After | Δ |
|-----------------------------------------|--------:|-------:|------:|
| ZAP scan import V2 (19 findings) | 1385 | 477 | -908 |
| ZAP scan import V3 | 1263 | 945 | -318 |
| ZAP reimport no-change V2 | 69 | 75 | +6 |
| ZAP reimport no-change V3 | 87 | 102 | +15 |
| Product tag add → 100 locations (V3) | 316 | 125 | -191 |
| Product tag remove → 100 locations (V3) | 266 | 75 | -191 |

Small reimport-no-change regressions are the unavoidable per-batch helper read cost (2 reads × Finding + 2 reads × Endpoint/Location + 1 product tags read). Real-work imports drop significantly because per-row `_manage_inherited_tags` work no longer fires inside the loop.

* refactor(tags): centralize inheritance helpers in dojo/tag_inheritance

Move scattered tag-inheritance logic into dojo/tag_inheritance.py without changing runtime behavior:

- _manage_inherited_tags relocated from dojo/models.py (replaced by a lazy-import shim to avoid an import cycle through dojo.utils).
- get_products, inherit_product_tags, get_products_to_inherit_tags_from, propagate_inheritance, inherit_instance_tags, inherit_linked_instance_tags relocated from dojo/tags_signals.py; receivers there now delegate.
- propagate_tags_on_product_sync, apply_inherited_tags_for_endpoints, apply_inherited_tags_for_findings, _sync_inheritance_for_qs and _inherited_tag_names_for_location relocated from dojo/product/helpers.py; the Celery wrapper propagate_tags_on_product stays in product/helpers.py.

Backward-compat re-exports are preserved at every original import site, so external callers (dojo/location, dojo/importers, unittests, pro/) keep working unchanged. Adjusts unittests.test_tag_inheritance mock targets to the new module path. Updates the EXPECTED_ZAP_IMPORT_V2/V3 baselines (477->470, 945->938) to the actual query counts; the prior pin was already drifting on this branch before the move.

* refactor(tags): group tag modules under dojo/tags/

Move the three tag-related top-level modules into a dedicated package:

    dojo/tag_inheritance.py -> dojo/tags/inheritance.py
    dojo/tag_utils.py       -> dojo/tags/utils.py
    dojo/tags_signals.py    -> dojo/tags/signals.py

Update all internal call sites to the new canonical paths (apps, models, importers, finding, product/helpers, utils_cascade_delete, settings, unittests). No backward-compat shims are left at the old paths. Also drops the unused dojo/tag/ stub directory, which only contained stale .pyc files. No logic change. All tag inheritance and tag bulk unit tests pass.

* rename batch_mode to suppressed

* perf(tags): check cached system_setting before per-product flag

Tag inheritance is usually enabled system-wide first, then optionally overridden per product. The system setting is cached, so check it first and short-circuit the related-product walk / per-product attribute read when it's already True.
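The check ordering described above (cached system-wide setting first, per-product flag only as a fallback) can be illustrated with a small pure-Python sketch. It is not DefectDojo's code: `Product` is a hypothetical dataclass standing in for the Django model, and `system_setting_enabled` is a hypothetical callable standing in for the cached system-setting lookup. The `None` filter and the `is_tag_inheritance_enabled` delegation mirror the follow-up refactor listed in this same commit.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Product:
    # Hypothetical stand-in for dojo.models.Product.
    name: str
    enable_product_tag_inheritance: bool = False


def get_products_to_inherit_tags_from(
    products: Iterable[Optional[Product]],
    system_setting_enabled: Callable[[], bool],
) -> list[Product]:
    # Drop missing parents before deciding anything.
    candidates = [p for p in products if p is not None]
    # Cheap, cached system-wide check first: when it is on, every related
    # product contributes and no per-product flag is ever read.
    if system_setting_enabled():
        return candidates
    # Otherwise only products that opted in individually contribute.
    return [p for p in candidates if p.enable_product_tag_inheritance]


def is_tag_inheritance_enabled(
    products: Iterable[Optional[Product]],
    system_setting_enabled: Callable[[], bool],
) -> bool:
    # "No eligible products" is treated as "inheritance disabled".
    return bool(get_products_to_inherit_tags_from(products, system_setting_enabled))
```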
Sites updated:

- dojo/tags/inheritance.py: inherit_product_tags, get_products_to_inherit_tags_from, and the three early-return guards inside propagate_tags_on_product_sync / apply_inherited_tags_for_*.
- dojo/location/models.py: Location.products_to_inherit_tags_from.
- dojo/importers/location_manager.py: bulk-inherit early-return.

Also picks up the in-flight rename of batch_mode() -> suppress() / is_in_batch_mode() -> is_suppressed() in the two importers. Behavior unchanged. All tag inheritance / perf / bulk unit tests pass (36 + 20 + 29).

* refactor(tags): simplify is_tag_inheritance_enabled, drop linked-instance wrapper

- is_tag_inheritance_enabled now delegates to get_products_to_inherit_tags_from (no eligible products -> not enabled). This centralizes the "which products contribute inherited tags" decision in one place; the None-product filter moves into get_products_to_inherit_tags_from so both call sites get it.
- Drop the inherit_linked_instance_tags wrapper: the single signal caller now inlines tag_inheritance.inherit_instance_tags(instance.location).
- Trim dead backward-compat re-exports from dojo/tags/signals.py; reroute the test imports to dojo.tags.inheritance directly.

Behavior unchanged. test_tag_inheritance (36) and test_tag_inheritance_perf (20) pass.

* refactor(tags): merge propagate_inheritance into inherit_instance_tags

Combine the two-step gate (propagate_inheritance returning bool, then the caller writing) into a single inherit_instance_tags() with an optional force=False kwarg that bypasses the suppress() check. The m2m make_inherited_tags_sticky receiver now also routes through this single entry point. Wrap the inner instance.inherit_tags(tag_list) call in suppress_tag_inheritance() to short-circuit m2m_changed re-entry.
The re-entrant signal previously did a redundant in-sync check using a stale get_tag_list() cached value; suppressing it both eliminates the recursion risk and removes redundant per-row queries:

- test_baseline_zap_scan_import_v2: 470 -> 463
- test_baseline_zap_scan_import_v3: 938 -> 931
- test_create_one_finding_v2/v3: 64 -> 61
- test_create_100_findings_v2/v3: 4024 -> 3724
- test_finding_add_user_tag_v2/v3: 17 -> 16
- test_finding_remove_inherited_v2/v3: 44 -> 40

Also picks up the rename of the context manager suppress() -> suppress_tag_inheritance(). Unit tests for the early-exit optimization were rewritten against the merged function (still 4 cases).

* refactor(tags): diff-based _sync_inherited_tags, drop per-model wrappers

Rewrite the per-row inheritance primitive as a pure diff:

    current = obj.inherited_tags.all()
    target = incoming_inherited_tags
    remove (current - target), add (target - current)
    + sticky re-add any target name missing from obj.tags

Drops the "rebuild full tag_list, set() everything" approach and removes the `existing_tags_hint` parameter entirely. Writes are wrapped in `suppress_tag_inheritance()` so the m2m_changed signal can't dispatch make_inherited_tags_sticky back into this function.

Other cleanup:

- Rename `_manage_inherited_tags` -> `_sync_inherited_tags`.
- Drop per-model `inherit_tags(self, existing_tags_hint)` methods on Endpoint / Engagement / Test / Finding / Location. The signal path (`inherit_instance_tags`) and `LocationManager._bulk_inherit_tags` now call `_sync_inherited_tags(instance, incoming)` directly.
- Drop the unused `existing_tags_by_location` dict in `LocationManager._bulk_inherit_tags` (one fewer through-table read).
- Unit tests rewritten against the diff primitive (4 cases) plus a no-products early-exit test for `inherit_instance_tags`.
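The diff primitive above can be sketched with plain Python sets. This assumes a hypothetical `TaggedObject` whose `tags` and `inherited_tags` are sets rather than tagulous m2m fields, and assumes the add/remove delta is applied to `tags` as well as `inherited_tags` (the importer changes rely on inherited tags being visible on `finding.tags`). The real `_sync_inherited_tags` issues ORM writes inside `suppress_tag_inheritance()`; none of that is modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedObject:
    # Hypothetical stand-in for a model with `tags` / `inherited_tags` fields.
    tags: set[str] = field(default_factory=set)
    inherited_tags: set[str] = field(default_factory=set)


def sync_inherited_tags(obj: TaggedObject, incoming: set[str]) -> tuple[set[str], set[str]]:
    """Apply only the delta between current and target inherited tags."""
    current = set(obj.inherited_tags)
    target = set(incoming)
    to_remove = current - target
    to_add = target - current
    # Assumption: inherited tags are mirrored in `tags`, so the same delta
    # is applied to both fields.
    obj.inherited_tags = (obj.inherited_tags - to_remove) | to_add
    obj.tags = (obj.tags - to_remove) | to_add
    # Sticky re-add: a target tag that was stripped from `tags` comes back.
    obj.tags |= target - obj.tags
    return to_add, to_remove
```

Calling it a second time with the same target returns two empty sets; that no-op result is what lets bulk callers skip rows that are already in sync.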
Perf baselines (test_tag_inheritance_perf):

    EXPECTED_ZAP_IMPORT_V2: 463 -> 422
    EXPECTED_ZAP_IMPORT_V3: 931 -> 809
    EXPECTED_ZAP_REIMPORT_NO_CHANGE_V3: 102 -> 101
    EXPECTED_CREATE_ONE_FINDING_V2/V3: 61 -> 55
    EXPECTED_CREATE_100_FINDINGS_V2/V3: 3724 -> 3124 (-600)
    EXPECTED_FINDING_REMOVE_INHERITED_V2/V3: 40 -> 18
    EXPECTED_FINDING_ADD_USER_TAG_V2/V3: 16 -> 17 (+1)

* add comments

* refactor(tags): consolidate location inheritance, drop dojo/product/helpers.py

- New `apply_inherited_tags_for_locations(locations, *, product)` mirrors the endpoint/finding helpers. Replaces `LocationManager._bulk_inherit_tags`; the call site delegates. Drops the redundant outer `suppress_tag_inheritance()` (writes inside `_sync_inherited_tags` are already wrapped) and the bulk pre-fetch of existing inherited tags (the through-table primitive does this read once already).
- `_inherited_tag_names_for_location` now filters contributing products by their own `enable_product_tag_inheritance` flag. Previously, tags from flag-off products linked to the same Location could leak in via `propagate_tags_on_product_sync` / `apply_inherited_tags_for_findings`.
- Move the `propagate_tags_on_product` Celery task into `dojo.tags.inheritance` alongside its `_sync` counterpart; delete `dojo/product/helpers.py`. Keep a `propagate_tags_on_product_deprecated` alias under the old task name so in-flight tasks complete after upgrade.
- Rename `_sync_inheritance_for_qs` kwarg `target_names_per_child` -> `target_tag_names_per_child` for clarity.
- Update import sites + perf baseline V3 import: 809 -> 445 queries.

* comments

* comments

* refactor(tags): rename inherit_instance_tags -> auto_inherit_product_tags, move to signals

Renames `inherit_instance_tags` to `auto_inherit_product_tags` and relocates it from `dojo.tags.inheritance` to `dojo.tags.signals`.
The function is the signal-driven entrypoint that applies a Product's tags to a child instance; moving it next to the signal receivers (and renaming it for clarity) makes the auto-inheritance flow self-contained. Of all the inheritance helpers, it is the only one that early-returns when `suppress_tag_inheritance()` is active; the other helpers use the context manager around their writes for reentrancy. That gate semantic is exclusive to the signal path, so colocation fits.

Also combines `inherit_tags_on_linked_instance` into `inherit_tags_on_instance` with sender-based target dispatch and a `created=True` gate. Previously the ref handler fired on every save, including `set_status` updates that don't change the Location's related-product set; those now correctly no-op. Drops the unused `force=` parameter on the function while renaming.

* make tag accumulator mandatory param

* resolve duplicate task name

* perf(tags): pk-based _sync_inheritance_for_ids; skip full-row fetch

Replaces `_sync_inheritance_for_qs(queryset, target_tag_names_per_child=callable)` with `_sync_inheritance_for_ids(model_class, child_ids, target_tag_names)`.

The previous implementation called `list(queryset)` to materialize every child as a full model instance just so the bulk helpers (`bulk_add_tag_mapping`, `bulk_remove_tags_from_instances`) had an `instance.pk` and `instance.__class__` to work with. For Finding (70+ columns) this dominated wall-clock time on big products: a real-world product with ~14000 findings took ~22s for a single `propagate_tags_on_product_sync`.

The new path:

- Accepts an iterable of pks; constant-target callers pass `values_list("pk", flat=True)` directly, skipping all ORM hydration.
- Builds bare `model_class(pk=pid)` stubs (cached per pk) only for the rows whose inherited set actually needs to change, not for every row scanned.
- Accepts `target_tag_names` as either a `set[str]` (constant target, hoisted out of the loop, so there is no per-row function call for the product → engagement/test/finding/endpoint propagation paths) or a `Callable[[int], set[str]]` for the Location case, where each row's target is the per-row union of its linked Products' tags.
- For Locations, callers materialize the prefetched instances into a `{pk: location}` dict and close over it in the callback; the prefetch chain still runs once upfront, but the primitive itself only sees pks.
- Adds an `add_map`/`remove_map` skip when `target == old` so rows already in sync don't even allocate a stub.

Also adds direct query-count baselines for `propagate_tags_on_product_sync` in `test_tag_inheritance_perf.py` so future regressions on the sweep path fail loudly (V2 = 9 queries, V3 = 18 queries on a product with 100 findings plus 100 endpoints or locations).

ZAP baselines drop slightly as a side effect of the early `target == old` skip (V2 import 422 → 420, V3 import 445 → 444, V2 reimport-no-change 75 → 74, V3 reimport-no-change 101 → 100). The major wall-clock win is invisible to query counters: it's the avoided Finding ORM hydration on large products.

* rename

* perf(tags): skip redundant inheritance on no-change reimport

Two gates eliminate ~14 wasted queries on a reimport that creates no new findings or location refs (the common "scheduled rescan, nothing changed" path):

1. `DefaultReImporter.process_findings` now tracks newly created findings in `new_findings_in_batch` (populated in the else branch of the matched/unmatched dispatch) and passes ONLY those to `apply_inherited_tags_for_findings`. Matched/existing findings already had inheritance applied at their original creation, so re-running the through-table read + Location prefetch chain on them is pure overhead.
2. `LocationManager._persist_locations` now skips `apply_inherited_tags_for_locations` when no new `LocationProductReference` rows were created.
   New `LocationFindingReference`s only add findings within `self._product`, so they can't change a Location's Product membership; only a new product ref can. When `all_product_refs` is empty, the Location's inherited target set is unchanged and the helper would only perform a costly no-op read.

Net effect on the pinned ZAP reimport-no-change baselines:

- V2: 75 → 69 (matches the pre-Phase-A baseline of 69)
- V3: 101 → 81 (beats the pre-Phase-A baseline of 87)

* test(tags): add reimport-with-new-findings perf baseline

Imports the 10-finding ZAP subset first, then reimports the 19-finding full report so 9 findings are created during reimport while 10 are matched. Pins V2 = 169 queries, V3 = 198 queries. Captures the realistic "scheduled rescan with drift" path that the no-change baseline doesn't exercise.
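The row-planning step of the pk-based `_sync_inheritance_for_ids` described in the commit message above (a constant target resolved once, a per-pk callable for the Location case, and a skip when `target == old`) can be sketched without the ORM. `plan_inheritance_sync` and its `current_by_id` argument are hypothetical: the dict stands in for the helper's through-table read of existing inherited tags, and a real implementation would then build `model_class(pk=pid)` stubs only for pks present in the returned maps.

```python
from typing import Callable, Iterable, Union

# Either one set shared by every child, or a per-pk function (Location case).
TargetSpec = Union[set[str], Callable[[int], set[str]]]


def plan_inheritance_sync(
    child_ids: Iterable[int],
    current_by_id: dict[int, set[str]],
    target_tag_names: TargetSpec,
) -> tuple[dict[int, set[str]], dict[int, set[str]]]:
    """Return (add_map, remove_map) keyed by pk; in-sync rows are skipped."""
    # Resolve a constant target once instead of calling a function per row.
    constant = None if callable(target_tag_names) else set(target_tag_names)
    add_map: dict[int, set[str]] = {}
    remove_map: dict[int, set[str]] = {}
    for pk in child_ids:
        target = constant if constant is not None else target_tag_names(pk)
        old = current_by_id.get(pk, set())
        if target == old:
            # Already in sync: no stub, no write.
            continue
        if target - old:
            add_map[pk] = target - old
        if old - target:
            remove_map[pk] = old - target
    return add_map, remove_map
```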
1 parent 80974b3 commit 968cc5d

22 files changed

Lines changed: 942 additions & 591 deletions

dojo/apps.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -94,7 +94,7 @@ def ready(self):
         import dojo.product_type.signals # noqa: PLC0415, F401 raised: AppRegistryNotReady
         import dojo.risk_acceptance.signals # noqa: PLC0415, F401 raised: AppRegistryNotReady
         import dojo.sla_config.helpers # noqa: PLC0415, F401 raised: AppRegistryNotReady
-        import dojo.tags_signals # noqa: PLC0415, F401 raised: AppRegistryNotReady
+        import dojo.tags.signals # noqa: PLC0415, F401 raised: AppRegistryNotReady
         import dojo.test.signals # noqa: PLC0415, F401 raised: AppRegistryNotReady
         import dojo.tool_product.signals # noqa: PLC0415, F401 raised: AppRegistryNotReady
         import dojo.url.signals # noqa: PLC0415, F401 raised: AppRegistryNotReady
```

dojo/finding/helper.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -742,7 +742,7 @@ def bulk_clear_finding_m2m(finding_qs):
     FileUpload.delete() fires and removes files from disk storage.
     Tags are handled via bulk_remove_all_tags to maintain tag counts.
     """
-    from dojo.tag_utils import bulk_remove_all_tags # noqa: PLC0415 circular import
+    from dojo.tags.utils import bulk_remove_all_tags # noqa: PLC0415 circular import

     finding_ids = finding_qs.values_list("id", flat=True)

```

dojo/finding/views.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -94,7 +94,7 @@
     User,
 )
 from dojo.notifications.helper import create_notification
-from dojo.tag_utils import bulk_add_tags_to_instances
+from dojo.tags.utils import bulk_add_tags_to_instances
 from dojo.test.queries import get_authorized_tests
 from dojo.tools import tool_issue_updater
 from dojo.utils import (
```

dojo/importers/base_importer.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -33,7 +33,7 @@
     Test_Type,
 )
 from dojo.notifications.helper import create_notification
-from dojo.tag_utils import bulk_add_tags_to_instances
+from dojo.tags.utils import bulk_add_tags_to_instances
 from dojo.tools.factory import get_parser
 from dojo.tools.parser_test import ParserTest
 from dojo.utils import max_safe
```

dojo/importers/default_importer.py

Lines changed: 20 additions & 1 deletion
```diff
@@ -18,7 +18,9 @@
     Test_Import,
 )
 from dojo.notifications.helper import async_create_notification
-from dojo.tag_utils import bulk_apply_parser_tags
+from dojo.tags import inheritance as tag_inheritance
+from dojo.tags.inheritance import apply_inherited_tags_for_findings
+from dojo.tags.utils import bulk_apply_parser_tags
 from dojo.utils import get_full_url, perform_product_grading
 from dojo.validators import clean_tags

@@ -161,6 +163,19 @@ def process_findings(
         self,
         parsed_findings: list[Finding],
         **kwargs: dict,
+    ) -> list[Finding]:
+        # Whole hot loop runs under `suppress_tag_inheritance()`: per-row
+        # inheritance signals for the findings/endpoints/locations created
+        # below are suppressed. Inheritance is then applied in bulk per-batch
+        # (right before `post_process_findings_batch` dispatch) so rules/dedup
+        # see inherited tags on `finding.tags`.
+        with tag_inheritance.suppress_tag_inheritance():
+            return self._process_findings_internal(parsed_findings, **kwargs)
+
+    def _process_findings_internal(
+        self,
+        parsed_findings: list[Finding],
+        **kwargs: dict,
     ) -> list[Finding]:
         # Batched post-processing (no chord): dispatch a task per 1000 findings or on final finding
         batch_finding_ids: list[int] = []
@@ -266,6 +281,10 @@ def process_findings(
                     findings_with_parser_tags.clear()
                 # Apply import-time tags before post-processing so rules/deduplication see them.
                 self.apply_import_tags_for_batch(batch_findings)
+                # Apply inherited Product tags to this batch's findings (and
+                # their endpoints/locations) BEFORE post_process_findings_batch
+                # dispatches, so rules/dedup see inherited tags on .tags.
+                apply_inherited_tags_for_findings(batch_findings)
                 batch_findings.clear()
                 finding_ids_batch = list(batch_finding_ids)
                 batch_finding_ids.clear()
```
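`process_findings` now runs the whole loop under `suppress_tag_inheritance()`, and the signal-side entrypoint early-returns while that context is active. A minimal sketch of such a gate, assuming the state lives on `threading.local()` (a process-global signal disconnect would also switch off sticky enforcement for every other thread in a threaded gunicorn or Celery worker) and assuming a depth counter so that nested suppression, such as the importer wrap plus the writes inside `_sync_inherited_tags`, unwinds correctly. `auto_inherit_product_tags` here is a simplified, hypothetical version that takes an `apply` callback instead of walking a model instance's Product chain.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Iterator

# Per-thread state: suppression in one worker thread never leaks into another.
_state = threading.local()


def is_suppressed() -> bool:
    return getattr(_state, "depth", 0) > 0


@contextmanager
def suppress_tag_inheritance() -> Iterator[None]:
    # A depth counter (not a bool) so nested blocks unwind correctly.
    _state.depth = getattr(_state, "depth", 0) + 1
    try:
        yield
    finally:
        _state.depth -= 1


def auto_inherit_product_tags(instance: Any, apply: Callable[[Any], None]) -> bool:
    # Signal-path gate: do nothing while a bulk caller has suppression active.
    if is_suppressed():
        return False
    apply(instance)
    return True
```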

dojo/importers/default_reimporter.py

Lines changed: 35 additions & 10 deletions
```diff
@@ -24,7 +24,9 @@
     Test,
     Test_Import,
 )
-from dojo.tag_utils import bulk_apply_parser_tags
+from dojo.tags import inheritance as tag_inheritance
+from dojo.tags.inheritance import apply_inherited_tags_for_findings
+from dojo.tags.utils import bulk_apply_parser_tags
 from dojo.utils import perform_product_grading
 from dojo.validators import clean_tags

@@ -263,6 +265,19 @@ def process_findings(
         the finding may be appended to a new or existing group based upon user selection
         at import time
         """
+        # Whole hot loop runs under `suppress_tag_inheritance()`: per-row
+        # inheritance signals for the findings/endpoints/locations created
+        # below are suppressed. Inheritance is then applied in bulk per-batch
+        # (right before `post_process_findings_batch` dispatch) so rules/dedup
+        # see inherited tags on `finding.tags`.
+        with tag_inheritance.suppress_tag_inheritance():
+            return self._process_findings_internal(parsed_findings, **kwargs)
+
+    def _process_findings_internal(
+        self,
+        parsed_findings: list[Finding],
+        **kwargs: dict,
+    ) -> tuple[list[Finding], list[Finding], list[Finding], list[Finding]]:
         self.deduplication_algorithm = self.determine_deduplication_algorithm()
         # Only process findings with the same service value (or None)
         # Even though the service values is used in the hash_code calculation,
@@ -302,6 +317,11 @@ def process_findings(

         batch_finding_ids: list[int] = []
         batch_findings: list[Finding] = []
+        # Findings that were newly created (else branch below): pass these to
+        # `apply_inherited_tags_for_findings` instead of `batch_findings` so
+        # matched/existing findings (which already have correct inherited tags)
+        # don't trigger a redundant through-table read on no-change reimports.
+        new_findings_in_batch: list[Finding] = []
         findings_with_parser_tags: list[tuple] = []
         # Batch size for deduplication/post-processing (only new findings)
         dedupe_batch_max_size = getattr(settings, "IMPORT_REIMPORT_DEDUPE_BATCH_SIZE", 1000)
@@ -384,6 +404,8 @@ def process_findings(
                     candidates_by_uid,
                     candidates_by_key,
                 )
+                if finding:
+                    new_findings_in_batch.append(finding)

             # This condition __appears__ to always be true, but am afraid to remove it
             if finding:
@@ -422,6 +444,14 @@ def process_findings(
                     findings_with_parser_tags.clear()
                 # Apply import-time tags before post-processing so rules/deduplication see them.
                 self.apply_import_tags_for_batch(batch_findings)
+                # Apply inherited Product tags to NEWLY CREATED findings only
+                # (and their endpoints/locations) BEFORE post_process_findings_batch
+                # dispatches, so rules/dedup see inherited tags on .tags.
+                # Matched/existing findings already have inheritance applied from
+                # their original creation; re-running it on no-change reimports
+                # would be ~8 wasted queries per batch.
+                apply_inherited_tags_for_findings(new_findings_in_batch)
+                new_findings_in_batch.clear()
                 batch_findings.clear()
                 finding_ids_batch = list(batch_finding_ids)
                 batch_finding_ids.clear()
@@ -949,7 +979,7 @@ def finding_post_processing(
         finding_from_report: Finding,
         *,
         is_matched_finding: bool = False,
-        tag_accumulator: list | None = None,
+        tag_accumulator: list,
     ) -> Finding:
         """
         Save all associated objects to the finding after it has been saved
@@ -971,15 +1001,10 @@ def finding_post_processing(
             finding_from_report.unsaved_tags = merged_tags
         if finding_from_report.unsaved_tags:
             cleaned_tags = clean_tags(finding_from_report.unsaved_tags)
-            if tag_accumulator is not None:
-                if isinstance(cleaned_tags, list):
-                    tag_accumulator.append((finding, cleaned_tags))
-                elif isinstance(cleaned_tags, str):
-                    tag_accumulator.append((finding, [cleaned_tags]))
-            elif isinstance(cleaned_tags, list):
-                finding.tags.add(*cleaned_tags)
+            if isinstance(cleaned_tags, list):
+                tag_accumulator.append((finding, cleaned_tags))
             elif isinstance(cleaned_tags, str):
-                finding.tags.add(cleaned_tags)
+                tag_accumulator.append((finding, [cleaned_tags]))
         # Process any files
         if finding_from_report.unsaved_files:
             finding.unsaved_files = finding_from_report.unsaved_files
```
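The reimporter change amounts to keeping a second, new-findings-only list next to the batch and flushing both together. The sketch below assumes findings are plain string keys, uses a hypothetical `existing_keys` set in place of the hash/unique-id matching, and a hypothetical `apply_inherited` callback in place of `apply_inherited_tags_for_findings`. Skipping the call when a batch has no new findings is a choice made in this sketch; the diff itself passes the (possibly empty) list to the helper.

```python
from typing import Callable, Sequence


def reimport_in_batches(
    parsed: Sequence[str],
    existing_keys: set[str],
    batch_size: int,
    apply_inherited: Callable[[list[str]], None],
) -> None:
    batch: list[str] = []
    new_in_batch: list[str] = []
    for idx, key in enumerate(parsed):
        if key not in existing_keys:
            # Only unmatched (newly created) findings need inheritance.
            new_in_batch.append(key)
        batch.append(key)
        is_final = idx == len(parsed) - 1
        if len(batch) >= batch_size or is_final:
            if new_in_batch:
                # Pass a copy: the accumulator is cleared right after.
                apply_inherited(list(new_in_batch))
            new_in_batch.clear()
            batch.clear()
```

On a no-change rescan every key matches, so `apply_inherited` is never called.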

dojo/importers/endpoint_manager.py

Lines changed: 11 additions & 5 deletions
```diff
@@ -14,7 +14,7 @@
     Finding,
     Product,
 )
-from dojo.tags_signals import inherit_instance_tags
+from dojo.tags.inheritance import apply_inherited_tags_for_endpoints

 logger = logging.getLogger(__name__)

@@ -231,10 +231,16 @@ def get_or_create_endpoints(self) -> tuple[dict[EndpointUniqueKey, Endpoint], li
         if to_create:
             created = Endpoint.objects.bulk_create(to_create, batch_size=1000)
             endpoints_by_key.update(zip(to_create_keys, created, strict=True))
-            # bulk_create bypasses post_save signals, so manually trigger tag inheritance
-            # this is not ideal, but we need to take a separate look at the tag inheritance feature itself later
-            for ep in created:
-                inherit_instance_tags(ep)
+            # bulk_create bypasses post_save so per-row inheritance signals never
+            # fire here. The importer hot path already covers these endpoints via
+            # the per-batch `apply_inherited_tags_for_findings` sweep (it picks
+            # them up through `Endpoint.status_endpoint.finding`), so this call is
+            # redundant for the importer. We keep a bulk call anyway as a defensive
+            # measure: if anything outside the importer ever bulk-creates endpoints
+            # through this manager, they still receive their inherited Product tags
+            # instead of silently missing them. The bulk helper costs ~2 queries
+            # when there's nothing to apply, vs N per-row signal fires.
+            apply_inherited_tags_for_endpoints(created)

         self._endpoints_to_create.clear()
         return endpoints_by_key, created
```
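Replacing the per-row `inherit_instance_tags(ep)` loop with one bulk call saves queries mainly because parent Product tags can be resolved once for the whole set instead of once per endpoint. A simplified sketch that counts lookups rather than SQL queries; `Endpoint`, `fetch_tags_for_products`, and `write` are hypothetical stand-ins, and the real `apply_inherited_tags_for_endpoints` query pattern is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical stand-in: only the FK needed to find the parent Product.
    pk: int
    product_id: int


def apply_inherited_tags_bulk(
    endpoints: Iterable[Endpoint],
    fetch_tags_for_products: Callable[[set[int]], dict[int, set[str]]],
    write: Callable[[Endpoint, set[str]], None],
) -> None:
    endpoints = list(endpoints)
    product_ids = {ep.product_id for ep in endpoints}
    if not product_ids:
        return  # nothing was created, so no lookup at all
    # One lookup for every distinct parent Product, instead of one per row.
    tags_by_product = fetch_tags_for_products(product_ids)
    for ep in endpoints:
        write(ep, tags_by_product.get(ep.product_id, set()))
```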

dojo/importers/location_manager.py

Lines changed: 14 additions & 110 deletions
```diff
@@ -9,19 +9,15 @@
 from django.db import transaction
 from django.utils import timezone

-from dojo import tag_inheritance
 from dojo.importers.base_location_manager import BaseLocationManager
 from dojo.location.models import AbstractLocation, Location, LocationFindingReference, LocationProductReference
 from dojo.location.status import FindingLocationStatus, ProductLocationStatus
-from dojo.models import Product, _manage_inherited_tags
+from dojo.tags import inheritance as tag_inheritance
 from dojo.tools.locations import LocationData
 from dojo.url.models import URL
-from dojo.utils import get_system_setting

 if TYPE_CHECKING:
-    from tagulous.models import TagField
-
-    from dojo.models import Dojo_User, Finding
+    from dojo.models import Dojo_User, Finding, Product

 logger = logging.getLogger(__name__)

@@ -214,8 +210,18 @@ def _persist_locations(self) -> None:
             all_product_refs, batch_size=1000, ignore_conflicts=True,
         )

-        # Trigger bulk tag inheritance
-        self._bulk_inherit_tags(loc.location for loc in saved)
+        # Trigger bulk tag inheritance only when the Location's product
+        # membership actually changed. New product refs are the only thing
+        # that can add a Product to a Location's inherited-tags target set
+        # (new finding refs are always to findings in `self._product`, so
+        # they don't introduce a new Product); skipping when `all_product_refs`
+        # is empty avoids the through-table read on no-change reimports.
+        if all_product_refs:
+            new_ref_location_ids = {ref.location_id for ref in all_product_refs}
+            tag_inheritance.apply_inherited_tags_for_locations(
+                [loc.location for loc in saved if loc.location_id in new_ref_location_ids],
+                product=self._product,
+            )

         # Clear accumulators
         self._locations_by_finding.clear()
@@ -477,105 +483,3 @@ def type_id(x: tuple[int, AbstractLocation]) -> int:
         # Restore the original input ordering
         saved.sort(key=itemgetter(0))
         return [loc for _, loc in saved]
-
-    # ------------------------------------------------------------------
-    # Tag inheritance
-    # ------------------------------------------------------------------
-
-    def _bulk_inherit_tags(self, locations):
-        """
-        Bulk equivalent of calling inherit_instance_tags(loc) for many Locations. Actually persisting updates is handled
-        by a per-location call to _manage_inherited_tags(), but at least determining what the tags are is more efficient
-        (plus we can skip locations that don't need an update at all).
-
-        When tag inheritance is enabled, computes the target inherited tags for each location from all related products
-        and updates only locations that are out of sync.
-        """
-        locations = list(locations)
-        if not locations:
-            return
-
-        # Check whether tag inheritance is enabled at either the product level or system-wide; quit early if neither
-        product_inherit = getattr(self._product, "enable_product_tag_inheritance", False)
-        system_wide_inherit = bool(get_system_setting("enable_product_tag_inheritance"))
-        if not system_wide_inherit and not product_inherit:
-            return
-
-        # A location can be shared across multiple products. Its inherited tags should be the union of
-        # tags from ALL contributing products, not just the one running this import.
-        location_ids = [loc.id for loc in locations]
-        product_ids_by_location: dict[int, set[int]] = {loc.id: set() for loc in locations}
-
-        # Find associations through LocationProductReference entries
-        for loc_id, prod_id in LocationProductReference.objects.filter(
-            location_id__in=location_ids,
-        ).values_list("location_id", "product_id"):
-            product_ids_by_location[loc_id].add(prod_id)
-
-        # Find associations through LocationFindingReference entries and the finding.test.engagement.product chain.
-        # This shouldn't add anything new, but just in case.
-        for loc_id, prod_id in (
-            LocationFindingReference.objects
-            .filter(location_id__in=location_ids)
-            .values_list("location_id", "finding__test__engagement__product_id")
-        ):
-            product_ids_by_location[loc_id].add(prod_id)
-
-        # Fetch all products that will contribute to tag inheritance, and their tags
-        all_product_ids = {pid for pids in product_ids_by_location.values() for pid in pids}
-        product_qs = Product.objects.filter(id__in=all_product_ids).prefetch_related("tags")
-        if not system_wide_inherit:
-            # Product-level inheritance only
-            product_qs = product_qs.filter(enable_product_tag_inheritance=True)
-        # Materialize into a dict for ease of use
-        products: dict[int, Product] = {p.id: p for p in product_qs}
-        # Get distinct tags, per-product
-        tags_by_product: dict[int, set[str]] = {
-            pid: {t.name for t in p.tags.all()}
-            for pid, p in products.items()
-        }
-
-        # Helper method for getting all tags from the given TagField
-        def _get_tags(tags_field: TagField) -> dict[int, set[str]]:
-            through_model = tags_field.through
-            fk_name = tags_field.field.m2m_reverse_field_name()
-            tags_by_location: dict[int, set[str]] = {loc.id: set() for loc in locations}
-            for l_id, t_name in through_model.objects.filter(
-                location_id__in=location_ids,
-            ).values_list("location_id", f"{fk_name}__name"):
-                tags_by_location[l_id].add(t_name)
-            return tags_by_location
-
-        # Gather inherited and 'regular' tags per location
-        existing_inherited_by_location: dict[int, set[str]] = _get_tags(Location.inherited_tags)
-        existing_tags_by_location: dict[int, set[str]] = _get_tags(Location.tags)
-
-        # Perform the bulk updates inside a `tag_inheritance.batch()` context.
-        # While the batch is active, signal handlers in `dojo/tags_signals.py`
-        # short-circuit per-row inheritance work that would otherwise fire on
-        # every `(inherited_)tags.set()` and defeat the bulk update.
-        #
-        # This replaces a previous `signals.m2m_changed.disconnect(...)` /
-        # `connect(...)` dance which was process-global and therefore unsafe
-        # under threaded gunicorn / Celery thread pools / ASGI threadpools:
-        # while disconnected, every thread in the process lost sticky
-        # enforcement. Thread-local batch state avoids that hazard.
-        with tag_inheritance.batch_mode():
-            for location in locations:
-                target_tag_names: set[str] = set()
-                for pid in product_ids_by_location[location.id]:
-                    # product_ids_by_location may contain products that shouldn't to contribute to tag inheritance (we
-                    # didn't filter either location ref lookups to check), so do a last-minute check here
-                    if pid in products:
-                        target_tag_names |= tags_by_product[pid]
-
-                if target_tag_names == existing_inherited_by_location[location.id]:
-                    # The existing set matches the expected set, so nothing more to do for this location
-                    continue
-
-                # Update tags for this location
-                _manage_inherited_tags(
-                    location,
-                    list(target_tag_names),
-                    potentially_existing_tags=existing_tags_by_location[location.id],
-                )
```
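The removed `_bulk_inherit_tags` computed each Location's target as the union of tags from every Product linked to it and skipped Locations whose inherited set already matched; per this commit, `_inherited_tag_names_for_location` also filters contributors by their own `enable_product_tag_inheritance` flag. A pure-Python sketch of that computation, with hypothetical dicts standing in for the reference-table and tag reads, and with the flag check mirroring the removed code's "system-wide, else per-product" rule:

```python
def location_targets(
    product_ids_by_location: dict[int, set[int]],
    tags_by_product: dict[int, set[str]],
    opted_in_products: set[int],
    system_wide: bool,
) -> dict[int, set[str]]:
    """Union of tags from every contributing Product, per Location."""
    targets: dict[int, set[str]] = {}
    for loc_id, product_ids in product_ids_by_location.items():
        target: set[str] = set()
        for pid in product_ids:
            # Flag-off products only contribute when inheritance is on system-wide.
            if system_wide or pid in opted_in_products:
                target |= tags_by_product.get(pid, set())
        targets[loc_id] = target
    return targets


def out_of_sync(
    targets: dict[int, set[str]],
    existing_inherited: dict[int, set[str]],
) -> dict[int, set[str]]:
    # Only these Locations need a write; the rest are already correct.
    return {
        loc_id: target
        for loc_id, target in targets.items()
        if target != existing_inherited.get(loc_id, set())
    }
```

In the test data, Location 10 is shared by an opted-in Product 1 and a flag-off Product 2, so only Product 1's tags reach it unless inheritance is enabled system-wide.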
