apps/api/src/ai-agent/AI-README.md
Lines changed: 17 additions & 12 deletions
@@ -658,31 +658,36 @@ When adding/updating a tool:
 ### Response Timing
 
 Queue delay is disabled (0ms) so the AI responds as fast as possible.
-For visitor-trigger bursts, worker-side debounce (`AI_AGENT_VISITOR_DEBOUNCE_MS`, default 800ms) is applied before selecting an effective trigger.
+No visitor burst coalescing or debounce is applied.
 Natural typing delays between multi-part messages are still applied to keep the experience human.
 
 ### Queueing Model
 
 - Each conversation has a Redis sorted set queue ordered by `createdAt` (with `messageId` tiebreaker).
-- A BullMQ drain job processes messages sequentially and advances a DB cursor for recovery.
-- Visitor bursts are coalesced at queue head: contiguous visitor triggers are handled as one effective trigger (latest message in the burst).
+- Wake jobs are conversation-scoped (`ai-agent-{conversationId}`), with single-active semantics:
+  - `waiting`/`delayed`/`completed`/`failed` wake jobs are replaced
+  - `active` wake jobs are never replaced
+- A BullMQ drain job processes queued messages sequentially and advances a DB cursor for recovery.
 - BullMQ wake jobs remain signals only; Redis queue + DB cursor are authoritative state.
+- Conversations with queued items are tracked in Redis (`ai-agent:active-conversations`), and producer/worker recovery markers are tracked via `ai-agent:wake-needed:{conversationId}`.
+- A worker-side wake sweeper periodically repairs missing wakes for non-empty queues.
 
 ### Trigger-Level Reliability Rules
 
 1. **FIFO Trigger Processing**: Conversation triggers are processed in queue order using the Redis ZSET cursor model.
-2. **Burst Coalescing**: Contiguous visitor messages at queue head are coalesced and processed once using the latest coalesced trigger.
-3. **Continuation Gate**: If a queued visitor trigger already has a newer public AI reply, the pipeline runs `skip vs supplement` before generation.
-4. **Bias to Supplement on Uncertainty**: If continuation classification is uncertain (timeout/model error), fallback favors `supplement` (never silent miss).
-5. **No Full-Turn Retry After Visible Reply**: If a trigger already sent any public message, that trigger is marked `retryable=false` and dropped on subsequent pipeline error.
-6. **Retry Only Pre-Reply Failures**: If a trigger fails before any public send, it stays queued and is retried (with per-message failure threshold).
-7. **Typing Always Ends**: Typing is stopped before each visible send and force-stopped in final pipeline cleanup.
+2. **Strict Per-Conversation Serial Execution**: Redis lock (`ai-agent:lock:{conversationId}`) ensures only one worker processes a conversation at a time.
+3. **No Burst Coalescing**: Every queued message is processed in order; no contiguous visitor batching.
+4. **Reliable Producer Path**: Producer enqueues message (`ZADD NX`) then ensures wake with bounded retries; on exhaustion it marks `wake-needed` recovery.
+5. **Lock Miss/Loss Recovery**: Worker attempts continuation wake with jitter when lock cannot be acquired or is lost during processing.
+6. **End-of-Job Invariant**: If queue remains non-empty, worker must ensure a runnable wake exists or mark recovery.
+7. **Sweeper Reconciliation**: Periodic sweeper scans active + wake-needed conversations and recreates missing wakes.
+8. **Typing Always Ends**: Typing is stopped before each visible send and force-stopped in final pipeline cleanup.
 
 ### Failure Handling
 
-1. **`retryable=true` and below threshold**: Keep trigger message at queue head, schedule continuation drain.
-2. **`retryable=false`**: Advance cursor to effective trigger and remove processed/coalesced queue items immediately.
-3. **Threshold reached**: Drop trigger/coalesced batch, advance cursor, continue draining.
+1. **`retryable=true` and below threshold**: Keep trigger message at queue head for retry.
+2. **`retryable=false`**: Advance cursor and remove the failed message immediately.
+3. **Threshold reached**: Drop failed message, advance cursor, continue draining.
 4. **Stalled jobs**: BullMQ stalled-job recovery still applies at worker level.
 5. **Error events**: `aiAgentProcessingCompleted` with `status: "error"` is still emitted for dashboard observability.
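
The first three failure-handling rules on the added side of this hunk reduce to a small decision function. The sketch below is illustrative only, not the code under review: `FailureContext`, `FailureAction`, and `decideFailureAction` are hypothetical names, and since the README does not state the per-message failure threshold, it is passed in as a parameter.

```typescript
// Hypothetical types for illustration; none of these names come from the codebase.
type FailureAction =
  | { kind: "retry" } // keep the message at the queue head
  | { kind: "drop"; reason: "non-retryable" | "threshold-reached" };

interface FailureContext {
  // False once the pipeline has marked the trigger as not retryable.
  retryable: boolean;
  // Failures recorded for this message so far, including the current one (assumed).
  failureCount: number;
  // Per-message failure limit; the actual value is not given in the README.
  failureThreshold: number;
}

// Maps a failed message to the action described in the Failure Handling rules.
function decideFailureAction(ctx: FailureContext): FailureAction {
  if (!ctx.retryable) {
    // Rule 2: advance the cursor and remove the failed message immediately.
    return { kind: "drop", reason: "non-retryable" };
  }
  if (ctx.failureCount >= ctx.failureThreshold) {
    // Rule 3: drop the failed message, advance the cursor, continue draining.
    return { kind: "drop", reason: "threshold-reached" };
  }
  // Rule 1: retryable and below threshold, so the message stays at the queue head.
  return { kind: "retry" };
}
```

In this sketch both `drop` outcomes advance the cursor past exactly one message, which matches the removal of coalesced batches in this change.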
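
The Queueing Model bullets describe two rules that are easy to misread in prose: the queue order (`createdAt`, then `messageId`) and the single-active wake semantics. A minimal in-memory sketch of both follows. It assumes `createdAt` is a numeric timestamp and models only the five job states the README lists; `QueuedMessage`, `compareQueued`, `wakeJobId`, and `planWake` are hypothetical names, and how the tiebreaker is actually encoded in the Redis sorted set is not stated in the source.

```typescript
// Hypothetical shapes for illustration; not taken from the codebase.
interface QueuedMessage {
  messageId: string;
  createdAt: number; // assumed to be epoch milliseconds
}

// Orders messages by createdAt, falling back to messageId on ties.
function compareQueued(a: QueuedMessage, b: QueuedMessage): number {
  if (a.createdAt !== b.createdAt) {
    return a.createdAt - b.createdAt;
  }
  if (a.messageId < b.messageId) return -1;
  if (a.messageId > b.messageId) return 1;
  return 0;
}

// Only the states named in the README are modelled here.
type WakeJobState = "waiting" | "delayed" | "completed" | "failed" | "active";
type WakePlan = "create" | "replace" | "keep";

// Conversation-scoped job id, following the `ai-agent-{conversationId}` pattern.
function wakeJobId(conversationId: string): string {
  return `ai-agent-${conversationId}`;
}

// Decides what to do with the wake job for a conversation.
// `existing` is null when no job with that id is present.
function planWake(existing: WakeJobState | null): WakePlan {
  if (existing === null) {
    return "create";
  }
  if (existing === "active") {
    // An active wake job is never replaced.
    return "keep";
  }
  // waiting / delayed / completed / failed wake jobs are replaced.
  return "replace";
}
```

Because the wake job is only a signal, keeping an `active` job loses nothing in this model: the README's end-of-job invariant has the worker ensure a runnable wake (or mark recovery) if the queue is still non-empty when it finishes.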