fix(core): decouple auto-memory recall from main-agent request path #4172

LaZzyMan wants to merge 12 commits
Conversation
- …AbortController field
- …at UserQuery consume point
- … consume point misses
- …ryPrefetch in all cleanup paths
- …r — caller controls lifetime
- …p fake timers and deadline references
- …lResult when recall settles after UserQuery

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three findings fixed:

1. Abort previous prefetch before installing a new one (line 1059): A new UserQuery/Cron used to overwrite pendingMemoryPrefetch without aborting the old controller, leaking an unbounded background recall now that the 1s side-query timeout is gone.
2. Move the UserQuery consume poll AFTER the async reminder setup: ensureTool + listSubagents are awaited between the old poll location and the final assembly, so recalls that settled during those awaits used to be missed (and a tool-less turn never got a ToolResult retry). The poll now runs immediately before requestToSend assembly, and unshifts memory to the front of systemReminders to preserve ordering.
3. Append memory after functionResponse on ToolResult turns: The Qwen API requires the functionResponse part to immediately follow the model's functionCall (see lines 1209-1213). Prepending memory text risked breaking that pairing on the native Gemini path. Appending keeps the pair intact on Gemini and produces the same OpenAI output (text becomes a separate user message after the tool messages).

Tests:
- Updated ToolResult inject test to assert memory index > functionResponse
- Added abort-previous-prefetch test (mid-flight UserQuery aborts old handle)

224/224 tests pass; tsc clean on changed files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
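Finding 1 (abort-before-replace) can be sketched as follows. `MemoryPrefetchHandle` and `pendingMemoryPrefetch` are names from the PR; the surrounding class shape and the `recall` callback signature are assumptions for illustration, not the actual `client.ts` code:

```typescript
// Minimal sketch of the abort-before-replace guard (finding 1).
interface MemoryPrefetchHandle {
  controller: AbortController;
  promise: Promise<string | null>;
}

class Client {
  private pendingMemoryPrefetch: MemoryPrefetchHandle | null = null;

  fireMemoryPrefetch(
    recall: (signal: AbortSignal) => Promise<string | null>,
  ): void {
    // A new UserQuery/Cron must not leak the previous background recall:
    // abort the old controller before overwriting the handle.
    this.pendingMemoryPrefetch?.controller.abort();

    const controller = new AbortController();
    // Fire-and-forget: nothing awaits this here; consume points poll later.
    const promise = recall(controller.signal).catch(() => null);
    this.pendingMemoryPrefetch = { controller, promise };
  }
}
```

Without the `abort()` line, the overwritten handle's side-query would keep running unbounded now that the selector's internal 1s timeout is removed.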
📋 Review Summary

This PR refactors auto-memory recall from a blocking 2.5s deadline pattern to a fire-and-forget prefetch with opportunistic consumption. The change addresses cold-start latency issues (#3761) where qwen3.5-flash's ~908ms response time consistently tripped the previous 1s timeout. The implementation is well-structured with comprehensive test coverage and clear documentation.

🔍 General Feedback

Positive aspects:
Architectural decisions:
🎯 Specific Feedback

🟡 High
🟢 Medium
🔵 Low
✅ Highlights
Force-pushed from 0b4bcc1 to ce02bd0
Annotations only, no behavior change:

- MemoryPrefetchHandle: full JSDoc covering lifecycle (create → consume → discard)
- UserQuery consume site: explain why we unshift (front of systemReminders)
- ToolResult inject site: reference hasPendingToolCall pattern instead of brittle line numbers when citing the Qwen functionCall/Response constraint
- relevanceSelector.ts: explain why the side-query has no inline timeout (caller controls lifetime via MemoryPrefetchHandle.controller)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for the review. Outcomes (commit 2c41fa1):

Fixed (annotations only, no behavior change):
Declined:
Pull request overview
This PR refactors managed auto-memory recall so the main model request no longer waits on the recall selector, using a pending prefetch handle and opportunistic injection points.
Changes:
- Replaces blocking recall deadline handling with `MemoryPrefetchHandle` lifecycle management.
- Removes the selector's internal 1s timeout so cancellation is caller-controlled.
- Adds tests and a design doc for async recall behavior and ToolResult injection.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/core/src/core/client.ts | Introduces fire-and-forget memory prefetch and late injection handling. |
| packages/core/src/memory/relevanceSelector.ts | Removes the hardcoded selector timeout. |
| packages/core/src/core/client.test.ts | Updates/adds tests for non-blocking recall and late ToolResult injection. |
| docs/design/2026-05-15-async-memory-recall-design.md | Documents the async memory recall design. |
Comments suppressed due to low confidence (1)
docs/design/2026-05-15-async-memory-recall-design.md:130
- This code sample still shows prepending the memory prompt on ToolResult turns, but the implementation appends it after the functionResponse to preserve tool-call pairing. Keeping the stale sample makes the spec misleading for anyone following it later.
```typescript
if (result.prompt) {
  requestToSend = [result.prompt, ...requestToSend];
}
```
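For contrast, a hedged sketch of the append-side ordering the implementation actually uses (the string-array shape of `requestToSend` and the `result` literal are illustrations, not the real request types):

```typescript
// Sketch: append memory AFTER the functionResponse part instead of prepending,
// so the model's functionCall is still immediately followed by its
// functionResponse (the pairing the Qwen API / native Gemini path requires).
let requestToSend: string[] = ['<functionResponse part>'];
const result = { prompt: '## Relevant memory\n- project codename entry' };

if (result.prompt) {
  requestToSend = [...requestToSend, result.prompt];
}
```

On the OpenAI-compatible path this produces the same output either way (the text becomes a separate user message after the tool messages); the ordering only matters for the tool-call pairing.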
…racy fixes
Behavior fix (addresses copilot review on client.ts:1071):
- When the parent sendMessageStream signal aborts (user Ctrl-C / Esc),
the prefetch controller now aborts too. Previously the recall side-query
would keep running until a later cleanup (next UserQuery / /clear / etc),
wasting fast-model tokens on work whose result no one would consume.
- Listener uses { once: true } and is also removed in the promise's
finally() so a long-lived parent signal doesn't accumulate listeners
across many turns under normal completion.
- Edge case: if signal is already aborted when fire runs, abort the
controller synchronously instead of attaching a listener.
Test:
- New regression guard: "should abort the pending prefetch when the caller
signal aborts" — verifies the abort handler installed on the recall side
fires once the parent signal aborts.
Doc accuracy (addresses copilot review on the design spec):
- ToolResult inject: was documented as "prepend", actual implementation
appends to preserve functionCall/functionResponse pairing. Updated both
the prose summary and the code sample.
- Cleanup section: was documented as 6 abort-locations including the
"post-consume clear"; the consume sites don't actually abort (the promise
has already settled). Reorganized as 5 abort-and-clear sites + 2
clear-only sites with the distinction made explicit.
- Fire path snippet: added the abort-previous-prefetch line and the
caller-signal bridge so the spec matches the current implementation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
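The caller-signal bridge this commit message describes can be sketched as a small helper. The function name `bridgeAbort` and its signature are illustrative; the PR wires this logic inline at the fire path:

```typescript
// Sketch of the caller-signal bridge: the parent sendMessageStream signal
// (user Ctrl-C / Esc) aborts the prefetch controller too.
function bridgeAbort(
  parentSignal: AbortSignal,
  controller: AbortController,
  prefetchPromise: Promise<unknown>,
): void {
  if (parentSignal.aborted) {
    // Edge case: signal already aborted when the prefetch fires —
    // abort synchronously instead of attaching a listener.
    controller.abort();
    return;
  }
  const onAbort = () => controller.abort();
  // { once: true } keeps a long-lived parent signal from accumulating
  // listeners across many turns…
  parentSignal.addEventListener('abort', onAbort, { once: true });
  // …and finally() also removes it on normal completion.
  prefetchPromise.finally(() =>
    parentSignal.removeEventListener('abort', onAbort),
  );
}
```

Without the bridge, an aborted turn would leave the recall side-query running until a later cleanup path, wasting fast-model tokens on a result no one consumes.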
Code Coverage Summary

CLI Package - Full Text Report
Core Package - Full Text Report

For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
E2E Reproduction Report

Three test groups run via tmux against the local Memory fixture (identical across all groups). Filename filter:

Group A — UserQuery memory injection (P1)

Prompt:

Final reply:

Note: recall took >1 s (side-query response logged 1.07 s before req-1, but settlement landed after the UserQuery consume point). The new ToolResult inject path carried the load — exactly the regression scenario the old 1 s timeout silently dropped.

Group B — Non-blocking proof (P2) — CORE EVIDENCE

Prompt:

Definitive non-blocking signal:

Under the old

Warm-turn measurement (Turn 4 in the same session):

Bundle symbol check:

Group C — ToolResult inject path (P3)

Prompt:

req-2 message structure:

Final reply:

req-1 having zero memory + req-2 having memory at messages[7] is the direct proof of the new ToolResult inject path firing.

Sanity check after codex review (Phase 7 fixes applied)

Re-ran Group C against the rebuilt CLI with the three codex fixes (abort-previous-prefetch, poll-after-async-setup, append-not-prepend for ToolResult).

Append-not-prepend is observable end-to-end: tool messages stay grouped before the memory user message, preserving Qwen's functionCall/functionResponse pairing.

Summary
Summary
- A `settledAt` poll right before the UserQuery request is sent, and a fallback inject on the first ToolResult turn. The blocking `resolveAutoMemoryWithDeadline` (2.5 s outer deadline) and the inner 1 s `AbortSignal.timeout(1_000)` in `relevanceSelector.ts` are both removed.
- `packages/core/src/core/client.ts` — the new `MemoryPrefetchHandle` lifecycle, the fire-path abort-before-replace guard, the moved zero-wait poll (now after `ensureTool`/`listSubagents` awaits), and the ToolResult inject point (append, not prepend, to respect the Qwen functionCall/functionResponse pairing documented at lines 1209-1213).
- `packages/core/src/memory/relevanceSelector.ts` — the timeout is gone; caller controls cancellation.
- Design spec: docs/design/2026-05-15-async-memory-recall-design.md.
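The zero-wait `settledAt` poll named above could look like this sketch. The flag-based mechanism is an assumption — the PR only states that the consume points check settlement without awaiting the recall promise:

```typescript
// Sketch of a zero-wait settled poll: a consume point can check
// `settledAt !== null` synchronously, without blocking the main request.
interface Tracked<T> {
  promise: Promise<T>;
  settledAt: number | null;
}

function trackSettled<T>(promise: Promise<T>): Tracked<T> {
  const handle: Tracked<T> = { promise, settledAt: null };
  // Record settlement time on either outcome; the consume point never awaits,
  // it only inspects the flag and, if set, reads the already-settled result.
  promise.then(
    () => { handle.settledAt = Date.now(); },
    () => { handle.settledAt = Date.now(); },
  );
  return handle;
}
```

A recall that settles after the UserQuery consume point simply leaves `settledAt` null there; the first ToolResult turn polls again and injects late.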
Validation
Prompts: `What is the codename for this project?` (UserQuery-injection scenario), `Hello, can you say one word back?` (non-blocking timing scenario), `Read the file ./secret-clue.txt then tell me the project codename based on memory.` (ToolResult-inject scenario).

Memory fixture: a `MEMORY.md` index plus a `project_secret_codename.md` entry stating the codename is `Operation Nightingale`, loaded via `QWEN_CODE_MEMORY_LOCAL=1`.

Final reply (translated from Chinese): "The project's internal codename is **Operation Nightingale**." ✅. In every live run the recall actually took >1 s, so the new ToolResult inject path carried the load — exactly the regression scenario the old 1 s timeout silently dropped.

(Previously `resolveAutoMemoryWithDeadline` always sequenced side-query before main.) Strongest single piece of evidence the new design is non-blocking.

`req-1` (UserQuery turn): 4 messages, `## Relevant memory` absent. `req-2` (ToolResult turn): 8 messages, `role: tool` at indexes [5, 6], `## Relevant memory` block at index [7]. Direct proof of the new inject path AND of the append-not-prepend ordering fix.

Reproduction steps:
- `npm run build && npm run bundle`
- `cd /tmp/scratch && git init && mkdir -p .qwen/memory`
- `.qwen/settings.json` with `{"managedAutoMemory": true}`, a small `MEMORY.md` index, and at least one topic file
- `QWEN_CODE_MEMORY_LOCAL=1 node dist/cli.js --openai-logging --openai-logging-dir ./api-logs`
- `api-logs/*.json` — main-request files have no side-query suffix; the `## Relevant memory` block lands in `request.messages` of either turn 1 or turn 2.

Scope / Risk
- The `MEMORY.md` index is always in the system prompt, providing baseline coverage. The `relevanceSelector` side-query is now uncapped on the side, so a runaway recall could in theory burn fast-model tokens until the next UserQuery aborts it — but the controller is wired through every cleanup path, so any session-ending event (clear, abort, token-limit, arena cancel, MaxSessionTurns, new UserQuery) aborts it.
- The native Gemini path (`@google/genai`) was not exercised in E2E — only OpenAI-compatible (qwen3.6-plus). The append-not-prepend fix was applied specifically because the existing IDE-context skip (client.ts:1209-1213) documents that the native Gemini path is strict about functionCall/functionResponse pairing; with append we keep tool parts first and add the text after, so this should be safe, but a native-Gemini reviewer should sanity-check.
- Rename: `pendingRecallAbortController` → `pendingMemoryPrefetch` (type `MemoryPrefetchHandle`).
Testing matrix notes:
- `node dist/cli.js` against live qwen3.6-plus. Linux/Windows pathways are unchanged by this PR (no platform-specific code touched), but I have not directly run on those platforms.
Closes #3761
🤖 Generated with Claude Code