Skip to content

fix: remove await from afterSaveMessage callbacks execution#37503

Draft
d-gubert wants to merge 2 commits intodevelopfrom
fix/send-message-after-save
Draft

fix: remove await from afterSaveMessage callbacks execution#37503
d-gubert wants to merge 2 commits intodevelopfrom
fix/send-message-after-save

Conversation

@d-gubert
Copy link
Copy Markdown
Member

Proposed changes (including videos or screenshots)

Issue(s)

Steps to test or reproduce

Further comments

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Nov 13, 2025

⚠️ No Changeset found

Latest commit: ef077ae

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Nov 13, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b355c8a3-7304-471b-be8a-468d12818fab

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dionisio-bot
Copy link
Copy Markdown
Contributor

dionisio-bot Bot commented Nov 13, 2025

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label
  • This PR is missing the required milestone or project

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

@d-gubert d-gubert force-pushed the fix/send-message-after-save branch from d9fe526 to fd84604 Compare November 13, 2025 20:03
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.80%. Comparing base (ceefff9) to head (ef077ae).

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop   #37503      +/-   ##
===========================================
- Coverage    69.81%   69.80%   -0.01%     
===========================================
  Files         3296     3296              
  Lines       119173   119173              
  Branches     21479    21503      +24     
===========================================
- Hits         83197    83193       -4     
- Misses       32668    32672       +4     
  Partials      3308     3308              
Flag Coverage Δ
unit 70.57% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@d-gubert
Copy link
Copy Markdown
Member Author

@copilot based on the errors in this workflow run, document the blast radius of the proposed change and create a plan to adapt the data flows to accept the afterSaveMessage call becoming fire-and-forget

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 27, 2026

@copilot based on the errors in this workflow run, document the blast radius of the proposed change and create a plan to ...

Blast Radius of void afterSaveMessage(...) in sendMessage

Making afterSaveMessage fire-and-forget means that sendMessage now returns before any of the following side effects have completed. Here is a complete inventory of every callback registered on the afterSaveMessage hook, grouped by criticality:


Callbacks currently registered on afterSaveMessage

Priority Callback ID File What it does Fire-and-forget safe?
HIGH native-federation-after-room-message-sent ee/server/hooks/federation/index.ts Sends message to Matrix via FederationMatrix.sendMessage ⚠️ Tests check Matrix side; race condition
HIGH federation-matrix-after-room-message-updated ee/server/hooks/federation/index.ts Sends message edits to Matrix ⚠️ Same as above
MEDIUM notifyUsersOnMessage app/lib/server/lib/notifyUsersOnMessage.ts Updates user subscriptions (unread counts, highlights, mention increments) via roomUpdater ❌ State mutation; many tests read subscription state immediately after send
MEDIUM engagementDashboard.afterSaveMessage ee/server/lib/engagementDashboard/startup.ts Increments engagement analytics counters ✅ Analytics; eventual consistency is fine
MEDIUM autotranslate app/autotranslate/server/autotranslate.ts Auto-translates messages via external provider ✅ Can be eventually consistent
MEDIUM message-read-receipt-afterSaveMessage ee/app/message-read-receipt/server/hooks/afterSaveMessage.ts Marks message as read by its sender ✅ Can be eventually consistent
MEDIUM after-omnichannel-save-message app/livechat/server/hooks/afterSaveOmnichannelMessage.ts Runs the full afterOmnichannelSaveMessage callback chain (livechat room updates) ❌ State mutation; livechat API tests check room state right after send
MEDIUM search-events app/search/server/events/index.ts Emits search index update event ✅ Can be eventually consistent
LOW sendNotificationsOnMessage app/lib/server/lib/sendNotificationsOnMessage.ts Sends push/email notifications ✅ Inherently async
LOW SlackBridge_Out app/slackbridge/server/RocketAdapter.ts Relays message to Slack ✅ External call; already effectively async
LOW integrations-sendMessage app/integrations/server/triggers.ts Triggers outgoing webhook integrations ✅ External call; can be eventually consistent
LOW threads-after-save-message app/threads/server/hooks/aftersavemessage.ts Updates thread metadata (tcount, replies, tlm), thread subscriptions, sends thread notifications ❌ State mutation; many API tests check tcount / parent message state immediately after posting a reply
LOW PropagateDiscussionMetadata app/discussion/server/hooks/propagateDiscussionMetadata.ts Updates parent room's linked system-message metadata ❌ State mutation; discussion API tests check the parent message right after posting
LOW mentionUserNotInChannel app/lib/server/startup/mentionUserNotInChannel.ts Validates @mentions against channel membership ✅ Post-send validation; can be async
EmailInbox (no ID) server/features/EmailInbox/EmailInbox_Outgoing.ts Sends email reply via inbox ✅ External; can be async

Additionally, after the callback chain runs, afterSaveMessage calls:

  • Rooms.updateFromUpdater(...) — applies any changes accumulated in roomUpdater (e.g. room lm, usersCount). Now happens asynchronously.
  • Message.afterSave({ message: data }) — notifies core services of message save. Now happens asynchronously.

Why the CI tests fail

Test API (CE & EE) / Test API Livechat (CE & EE)

These Mocha API tests follow the pattern:

  1. POST chat.sendMessage (or livechat equivalent)
  2. Immediately GET the parent message / room / subscription

The failing assertions are things like:

  • Thread parent message has tcount > 0 — set by processThreads() in threads-after-save-message
  • Discussion parent message has updated msgs / lm — set by PropagateDiscussionMetadata
  • Livechat room has updated metadata — set by after-omnichannel-save-message
  • User subscription unread count incremented — set by notifyUsersOnMessage

With fire-and-forget, these DB writes happen in a later event-loop turn (via setTimeout(0) in runAsync, or in a floating Promise for the void case), so the GET immediately after POST can race with the callback.

Test Federation Matrix

Federation tests send a message then poll Matrix's /_matrix/client/v3/sync to verify it arrived. The native-federation-after-room-message-sent callback now starts after sendMessage returns, so the Matrix side sees the message slightly later. This is a timing race that can flip flaky at different VM speeds.

Test UI (CE/EE) shards 2

Playwright tests that send a message in a thread or discussion and then assert on thread counts / parent message states in the UI face the same timing issue — the DB isn't updated by the time the UI re-renders.


Plan to adapt data flows

The goal is to keep afterSaveMessage fire-and-forget in sendMessage while ensuring correctness. There are two tracks:

Track 1 — Code: split "must complete before return" from "can be deferred"

The cleanest approach is to stop using a single afterSaveMessage chain for everything and instead split it into two named hooks:

  • afterSaveMessageSync — run with await inside sendMessage and updateMessage. Only state-mutating callbacks live here.
  • afterSaveMessage — run with void (or callbacks.runAsync). Only side-effect callbacks live here.

Alternatively, keep one hook but change each callback's registration point:

Callback Action
notifyUsersOnMessage Move to synchronous execution (run it directly in sendMessage before returning, or keep in a sync hook)
after-omnichannel-save-message Same — move to synchronous, or call afterOmnichannelSaveMessage directly in the livechat send path before returning
threads-after-save-message Move thread metadata updates (reply(), Messages.setThreadMessagesCount) to be awaited before sendMessage returns; only thread notifications can be fire-and-forget
PropagateDiscussionMetadata Move updateAndNotifyParentRoomWithParentMessage to be awaited before sendMessage returns
engagementDashboard.afterSaveMessage No change — keep async
autotranslate No change — keep async
message-read-receipt-afterSaveMessage No change — keep async
search-events No change — keep async
sendNotificationsOnMessage No change — keep async
SlackBridge_Out No change — keep async
integrations-sendMessage No change — keep async
native-federation-* Keep async; tests need to add explicit waits (already done in most Federation test utils)
EmailInbox No change — keep async

Also: updateMessage.ts still uses await afterSaveMessage(...) — it should be made consistent with whatever approach is chosen for sendMessage.

Track 2 — Tests: add retry/polling where assertions race with async callbacks

For any test that cannot wait for a synchronous callback (e.g., federation), replace direct-assert-after-send patterns with a poll helper. Example (pseudo-code):

// Before
await sendMessage(...);
const parent = await getParentMessage();
expect(parent.tcount).to.equal(1); // can race

// After
await sendMessage(...);
const parent = await poll(() => getParentMessage(), { until: (m) => m.tcount === 1 });
expect(parent.tcount).to.equal(1);

A poll utility (exponential backoff, configurable timeout) should be introduced in the test helpers if not already available.

Immediate next steps (prioritised)

  1. Decide scope of the sync hook — confirm which callbacks must be synchronous (proposal above). The minimum is notifyUsersOnMessage, thread metadata in processThreads, PropagateDiscussionMetadata, and after-omnichannel-save-message.
  2. Refactor those 4 callbacks out of the async chain into direct awaited calls in sendMessage / livechat send paths.
  3. Update updateMessage.ts to match (either both sync or both async).
  4. Add polling helpers to the test framework for assertions that are inherently eventually-consistent (federation, notification delivery).
  5. Re-run CI and verify the remaining failures are only in the inherently-async categories.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants