Skip to content

Dispatch create-path notifications async to fix slow POST latency#14731

Open
Maffooch wants to merge 2 commits intobugfixfrom
fix/async-notification-dispatch
Open

Dispatch create-path notifications async to fix slow POST latency#14731
Maffooch wants to merge 2 commits intobugfixfrom
fix/async-notification-dispatch

Conversation

@Maffooch
Copy link
Copy Markdown
Contributor

Summary

  • Adds async_create_notification Celery task in dojo/notifications/helper.py; it accepts ids, re-fetches with select_related, and delegates to create_notification
  • Dispatches the five create-path notifications via dojo_dispatch_task instead of running synchronously: engagement_added, product_added, product_type_added, finding_added (API), test_added (importer)
  • Fixes POST /api/v2/engagements/ (and the four sibling endpoints) dropping from ~5s to sub-second on tenants with large authorized-user sets — the recipient enumeration query and per-user Alerts inserts now run in the worker

Why ids, not model instances

Existing send_*_notification tasks pass Django models through **kwargs, which pickles fine in eager mode but is brittle across real Celery brokers. The new task takes *_id kwargs and re-fetches, matching the user_id pattern used by the other @app.task helpers at the bottom of helper.py.

Test plan

  • unittests.test_notifications — full module passes (38 tests). Updated three TestNotificationTriggers methods with @override_settings(CELERY_TASK_ALWAYS_EAGER=True), three TestNotificationTriggersApi finding tests to patch dojo.api_v2.serializers.dojo_dispatch_task and assert finding_id=<id>, and three TestNotificationWebhooks assertions from cm.output[1]cm.output[-1] (the former index was a fixture-load side effect of the old sync path, see below).
  • New TestAsyncNotificationDispatch — verifies dispatch call shape for engagement/product/product_type signals; static guard on the importer dispatch block.
  • New TestAsyncNotificationTaskBody — verifies task rehydrates instances, returns silently on missing rows, and runs inline under block_execution=True.
  • unittests.test_importers_importer + unittests.test_apiv2_notifications — pass (40 tests).
  • JIRA VCR test test_import_no_push_to_jira — still passes; webhook round-trip for test_added is unchanged under block_execution=True.
  • Full unittests/ suite — 4,051 tests. All previously-failing cases caused by this PR now pass. Updated perf-test expected values in unittests/test_importers_performance.py (+1 async_create_notification dispatch per test_added; adjusted query counts from actual captured values) and UsersTest.deleted_objects in unittests/test_rest_framework.py (25 → 13, see below).
  • 3 pre-existing TestDojoImporterPerformanceSmall.test_import_reimport_reimport_performance_* errors remain (TypeError: _import_reimport_performance() missing 2 required positional arguments: expected_num_queries4 and expected_num_async_tasks4). These are broken on bugfix HEAD before this PR — sibling class TestDojoImporterPerformanceSmallLocations uses the correct 8-arg call and passes. Not fixed here to keep the diff scoped.

Note on deleted_objects = 25 → 13 and webhook cm.output[1] → cm.output[-1]

Fixtures load Product_Type/Product/Engagement rows before the single Notifications(pk=1) row. On the old sync path, each of those post_save signals synchronously invoked create_notificationNotificationManager()_get_notifications_object()get_or_create(user=None, product=None, template=False), which created a duplicate system notification row during fixture load because pk=1 hadn't inserted yet. Subsequent manager instantiations triggered a "Multiple system notifications objects found" WARNING that preceded the INFO the webhook tests were actually asserting on — hence the incidentally-correct cm.output[1] index. Similarly, the UsersTest delete-preview count reflected admin's ownership of that phantom duplicate row. With the dispatch now async (no user context during fixture load → sent to broker → not executed), the phantom row is never created, and the tests' expectations needed to shift to the real values.

Out of scope

  • bulk_create for Alerts rows inside AlertNotificationManger.send_alert_notification — orthogonal worker-side optimization; request latency is fixed as-is.
  • Retrofitting existing send_*_notification tasks to use ids instead of model instances — they work today and widening that surface adds risk.
  • Delete-path notifications and recipients=-based calls — not in the hot path.

🤖 Generated with Claude Code

POST /api/v2/engagements/ takes ~5s on large tenants because
create_notification runs recipient enumeration and per-user Alert
writes on the request thread. Move the outer create_notification to a
Celery worker for the five create-path events (engagement_added,
product_added, product_type_added, finding_added, test_added) by
adding async_create_notification (accepts ids, re-fetches, delegates)
and dispatching via dojo_dispatch_task. This extends the existing
per-user async pattern (Slack/email/MSTeams/webhooks) up one level so
the recipient query and Alert fan-out no longer block the response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Reformat async_create_notification importer-guard docstring to D213 style
- Skip post_save dispatch when raw=True (loaddata) so the k8s initializer's
  fixture install path doesn't require an available Celery broker. Without
  this guard the unconditional async dispatch tries to enqueue during
  product_type.json load and fails with kombu OperationalError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant