fix(matrix): @mention detection, sync reliability, and duplicate code cleanup#1057
Conversation
… cleanup ## Problem Three issues with the Matrix channel adapter: ### 1. @mention detection broken for Element clients (group_policy=mention_only unusable) Element sends @mentions as HTML pills in `formatted_body` and via `m.mentions.user_ids[]`, but NOT in the plain text `body`. The adapter only checked `body`, so all Element @mentions were missed. This made `group_policy=mention_only` (the default) effectively ignore all messages in group rooms when using Element. ### 2. HTTP client has no timeout (sync can hang forever) `reqwest::Client::new()` creates a client with no timeout. If the homeserver drops the TCP connection without sending RST/FIN (common in containerized deployments behind proxies), the sync request hangs forever, silently killing message reception. ### 3. Duplicate mention/DM detection code The FIX RightNow-AI#2 and FIX RightNow-AI#3 blocks (mention detection + DM detection) were duplicated in the sync loop, causing `metadata` to be overwritten by the second pass. ## Fix ### Mention detection (matrix.rs) Now checks three sources for @mentions: 1. `content.body` — plain text (CLI/API clients) 2. `content.formatted_body` — HTML pills (Element, FluffyChat, etc.) 3. `content.m.mentions.user_ids[]` — Matrix spec MSC3952 standard ### HTTP client timeout (matrix.rs) Added `timeout(90s)` and `connect_timeout(30s)` to the reqwest Client builder. The Matrix /sync uses `timeout=30000ms` (30s long-poll), so 90s gives 60s margin for network latency while ensuring hung connections are detected and retried. ### Duplicate code removal (matrix.rs) Removed the duplicated FIX RightNow-AI#2 + FIX RightNow-AI#3 blocks that overwrote `metadata`. ### Heartbeat (agents/assistant/agent.toml) Added `heartbeat_interval_secs = 300` to the assistant agent. The default (30s) creates a timeout of 60s, causing idle agents to enter a crash-recovery loop every 90 seconds. With 300s the timeout is 600s (10 min), eliminating false positives for idle agents. ### Observability (matrix.rs + bridge.rs) Added structured logging at every decision point: - Sync loop: iteration counter, shutdown channel state, error details - Bridge loop: message receipt, iteration counter, shutdown handling - Dispatch: policy check results, agent routing, RBAC, send result ### Tests (matrix.rs) Added 9 integration tests using wiremock to validate: - Auth success/failure - Message reception and dispatch - Own-message filtering - Error retry with backoff - Clean shutdown - Command parsing - Event deduplication - Room allowlist filtering ## Breaking Changes None. All changes are backward-compatible. ## Test Plan - [x] `cargo check -p openfang-channels` passes - [x] `cargo test -p openfang-channels` — 12/12 tests pass - [x] Tested on Railway with Synapse homeserver - [x] Verified Element @mentions detected via formatted_body - [x] Verified sync recovery after connection drops
|
Thanks for the matrix reliability work @eldelosdatos — the sync retry + Ship-blocker in the current head:
This is invalid TOML and will break every deployment that uses the bundled
Rebase on latest |
jaberjaber23
left a comment
There was a problem hiding this comment.
The three matrix.rs fixes themselves are real and correct. Bug verified, mention detection covers all three Element sources (body, formatted_body, m.mentions), HTTP client timeout is sound, and the duplicate FIX #2 / FIX #3 blocks really did shadow metadata. Tests are solid.
Blocking issues:
- agents/assistant/agent.toml is broken TOML. The diff inserts heartbeat_interval_secs on the same line as max_iterations with no newline:
heartbeat_interval_secs = 300max_iterations = 100
This will fail to deserialize at boot and break the assistant agent. Fix: put each key on its own line.
-
The agent.toml change is also out of scope for a PR titled fix(matrix). Drop it from this PR or move it to a separate PR with its own justification (heartbeat tuning is unrelated to matrix mention detection).
-
The bridge.rs logging additions are out of scope too. They are useful but they do not belong in a matrix-only fix. Either rename the PR to cover bridge observability or split them out.
Once the TOML is unbroken and the unrelated changes are split, the matrix fixes are good to merge.
Summary
formatted_bodyand viam.mentions.user_ids[], not in plain textbody. The adapter only checkedbody, makinggroup_policy=mention_only(default) silently drop all messages in group rooms.reqwest::Client::new()has no timeout. If the homeserver drops TCP silently,/synchangs forever, killing message reception permanently.metadataon the second pass.Problem Details
@mention detection (critical)
Element sends:
hola(no MXID)<a href="https://matrix.to/#/@bot:server">Bot</a> hola["@bot:server"]HTTP timeout
Duplicate code
The FIX #2 (mention detection) and FIX #3 (DM detection) blocks were copy-pasted twice in the sync loop. The second copy overwrites
metadatafrom the first, losing the DM detection results.Test Plan
cargo check -p openfang-channels— compiles cleanlycargo test -p openfang-channels— 12/12 tests pass (9 new + 3 existing)formatted_bodyandm.mentionsgroup_policy=mention_onlyworks correctly with Element pillsNew Tests
test_adapter_auth_and_starttest_adapter_auth_failuretest_sync_receives_messagetest_sync_skips_own_messagestest_sync_retries_on_errortest_shutdown_stops_synctest_command_parsing/command argsparsingtest_dedup_eventstest_allowed_rooms_filterBreaking Changes
None. All changes are backward-compatible.
🤖 Generated with Claude Code