fix(core,cli): close tool_use↔tool_result invariant across all failure paths#4176
fix(core,cli): close tool_use↔tool_result invariant across all failure paths#4176wenshao wants to merge 4 commits into
Conversation
…_use
Weak-network failures during an Anthropic-compatible stream (DeepSeek,
api.anthropic.com, etc.) can drop the SSE between a tool_use
`content_block_stop` and the terminal `message_stop`. The functionCall
chunk is already yielded at content_block_stop, so:
- Turn.run records a ToolCallRequest event.
- useGeminiStream's for-await exits and schedules the tool.
- handleCompletedTools eventually fires submitQuery(..., ToolResult)
and pushes a user[functionResponse] into history.
- But processStreamResponse's history.push for the model turn never
ran (the for-await threw first), so the matching tool_use is gone.
The next request body has `user → user[tool_result]` with no tool_use
in between, and the server rejects with HTTP 400:
"tool_use_id ... must have a corresponding tool_use block in the
previous message". Ctrl+Y can't recover because
stripOrphanedUserEntriesFromHistory only strips trailing user
entries — the lost tool_use is unrecoverable, and the session is
wedged.
Wrap the for-await loop in processStreamResponse with try/catch.
When the stream throws AND any functionCall chunk was already
yielded (hasToolCall=true), persist the partial assistant turn to
history before re-throwing. The eventual tool_result submission
then has a matching tool_use and the session can continue.
Plain-text partial turns (no functionCall yielded) are intentionally
NOT persisted: the Retry path pops the trailing user prompt and
re-issues it, so a stale partial-text model turn between them would
either bias the retry or surface as duplicate output.
📋 Review SummaryThis PR fixes a critical unrecoverable wedge that occurs when streaming connections drop mid-tool_use on weak networks. The fix ensures partial assistant turns containing tool calls are persisted to history before re-throwing stream errors, maintaining the tool_use/tool_result pairing invariant required by Anthropic-compatible APIs. The implementation is well-reasoned with comprehensive tests and excellent documentation. 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
…ry fix Adds a third case to the partial-history persistence test suite for reasoning-mode providers (DeepSeek thinking, Claude 4.6+ adaptive): when the assistant turn streams a thinking block AND a tool_use before the SSE drops, the partial push must keep the thinking part before the functionCall so DeepSeek's `injectThinkingOnToolUseTurns` converter pass sees an existing block on the replayed turn and does not pre-pend a synthetic empty one (which would discard the model's original reasoning text).
There was a problem hiding this comment.
Pull request overview
Fixes a wedge state on weak networks where Anthropic-compatible streams (DeepSeek, Anthropic, etc.) drop after a tool_use content_block_stop but before message_stop. The yielded functionCall chunk causes downstream tool execution and a follow-up tool_result user turn, but no matching tool_use was ever pushed into history, leading to permanent 400s. The fix wraps the stream loop in processStreamResponse with try/catch and persists the partial assistant turn into history when a tool call has already been yielded before re-throwing.
Changes:
- Capture mid-stream errors in
processStreamResponseinstead of letting them abort post-loop processing. - When
hasToolCallis true on error, push the partial assistant turn (thinking + consolidated parts) into history before re-throwing, preserving tool_use/tool_result pairing. - Add three vitest cases covering tool-only, thinking+tool, and text-only mid-stream failures.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/core/src/core/geminiChat.ts | Adds try/catch around stream iteration and a recovery branch that conditionally persists the partial model turn before re-throwing. |
| packages/core/src/core/geminiChat.test.ts | Adds three new tests verifying partial-turn persistence with tool_use, with thinking+tool, and non-persistence for text-only failures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -1369,6 +1383,46 @@ export class GeminiChat { | |||
| }); | |||
There was a problem hiding this comment.
Non-blocking follow-up.
This persists the partial tool_use and relies on handleCompletedTools to eventually push a matching tool_result. Two paths re-create the same wedge one layer down:
- Ctrl+Y before the tool finishes — history is
[user, model(tool_use)], no trailing user entry forstripOrphanedUserEntriesFromHistoryto strip. The retry sends the orphantool_useand 400s with the same error this PR is trying to escape. - Scheduler aborts / process killed before
submitQuery(ToolResult)fires — the persistedtool_usebecomes a permanent orphan. Same wedge.
The stronger invariant is to close the pair at the failure point itself: for every already-yielded tool_use, synthesize an is_error: true tool_result before re-throwing, instead of betting on a downstream consumer. anthropics/claude-code does this in src/query.ts (yieldMissingToolResultBlocks, ~L123) and calls it from both the stream-error and the user-abort paths — the pair is never out of sync.
Current fix recovers the common case, so not blocking. Worth a follow-up to also cover the retry race and the abort path symmetrically.
There was a problem hiding this comment.
Confirmed both paths reproduce — handleCompletedTools is gated on isResponding (useGeminiStream.ts:1971) and useReactToolScheduler.allToolCallsCompleteHandler is single-shot (useReactToolScheduler.ts:127-132), so a Ctrl+Y that lands between the partial-push and tool completion does silently swallow the eventual tool_result. The abort-before-submit path is the same shape.
(Worth noting: this race exists pre-#4176 too — any Ctrl+Y while a tool is still running on a normally-completed stream hits the same wedge. This PR moves the partial-tool_use case into the same race class, but doesn't widen it.)
On yieldMissingToolResultBlocks shape — straight port has a corner: if we synthesize is_error: true at the failure point AND the live scheduler later completes, handleCompletedTools submits a second tool_result with the same tool_use_id, producing two consecutive user turns (orphan tool_result in the trailing one). Upstream sidesteps this because their StreamingToolExecutor is owned by the same loop that does the synthesis — it can discard() in-flight tools atomically (query.ts:733). Qwen-code's React scheduler runs out-of-band, so the synthesis path needs to coordinate cancellation/dedup with it.
Folded both races + the cancel-coordination requirement into #4178 — that's where the closing-at-failure-point work belongs. Will tackle it as the next PR; #4176 stays minimal to ship the common-case recovery now.
There was a problem hiding this comment.
Folded both races + the cancel-coordination requirement into this PR in 3de3241a2 rather than splitting to #4178. Three pieces:
repairOrphanedToolUseTurns(history)synthesizeserrorfunctionResponsefor every danglingfunctionCall.id— close to upstream'syieldMissingToolResultBlocksshape.- Wired into
startChat()(Race B/C —--resumeof a crashed session) and thesendMessageStreamRetry branch afterstripOrphanedUserEntriesFromHistory(Race A — Ctrl+Y mid-flight). handleCompletedToolsdedupes againstchat.historybefore submitting tool_results — that's the cross-layer coordination piece you flagged. Since our React scheduler runs out-of-band (vs. upstream's in-bandStreamingToolExecutorthat can.discard()atomically at synthesis), the synthesizer plants the syntheticfunctionResponsefirst and the scheduler drops its late real result when it sees a matchingcallIdalready in history. Same trade-off upstream's.discard()+yieldMissingToolResultBlocksmakes — the model sees the synthetic error and can retry the tool if it still wants the result.
Tests: 8 new repair-helper cases in geminiChat.test.ts (Race A merge-into-existing-user-turn, Race B trailing-tool_use-only, parallel tool_use with partial responses, no-op on already-paired history, multiple non-adjacent dangling rounds, caller-supplied reason text, no-op on tool-free history). 131/131 client.test.ts and 91/91 useGeminiStream.test.tsx still passing.
#4178 will be closed once this PR merges (its scope is now this PR). #4177 (idle watchdog) stays as the independent follow-up.
yiliang114
left a comment
There was a problem hiding this comment.
The fix recovers the common SSE-drop case correctly. Left one non-blocking follow-up inline about closing the tool_use ↔ tool_result invariant at the failure point rather than handing off to the scheduler.
|
For reviewers — this PR is the minimal, surgical fix. The broader weak-network resilience story has two follow-ups filed so they can be reviewed independently:
Together (#4176 + #4177 + #4178) qwen-code's behavior on weak networks would be comparable to upstream Claude Code, addressing the same class of bug as |
Extends the partial-history fix in fe35e37 to cover the residual race paths surfaced in PR QwenLM#4176 review: - Race A: Ctrl+Y while in-flight tool hasn't finished. History is [user, model(tool_use)] — `stripOrphanedUserEntriesFromHistory` only pops trailing user entries, so the retry payload lands as a fresh user turn after the orphan tool_use and API rejects. Meanwhile the scheduler's `onAllToolCallsComplete` is single-shot and gated on `isResponding`, so the eventual tool_result is silently swallowed. - Race B: process crash / OOM / SIGKILL between the partial-tool_use push and the React scheduler's tool_result submission. On `--resume` the dangling model(tool_use) wedges the first API call. - Race C: external tooling / manual JSONL edits leaving the same dangling shape. The fix has three pieces working together: 1. `repairOrphanedToolUseTurns(history)` in geminiChat.ts walks history left-to-right and synthesizes an `error`-typed functionResponse for every functionCall whose id is not echoed back in the next user turn. Appends to an existing user turn when present, otherwise inserts a new one. Returns the injected (callId, name) list. 2. `GeminiClient.repairOrphanedToolUseTurnsInHistory()` wraps the helper and is called from three points: - `startChat()` after loading the transcript (Race B/C, --resume). - `sendMessageStream` Retry branch after stripOrphans (Race A). - `sendMessageStream` UserQuery/Cron branch (defensive belt-and- suspenders for anything that slipped past 1 and 2). 3. `handleCompletedTools` in useGeminiStream.ts dedupes against chat.history before submitting tool_results — if a synthetic functionResponse for the same callId is already present (planted by the repair pass), the in-flight scheduler's late result is dropped and the call is `markToolsAsSubmitted` so the UI advances. Same trade-off upstream Claude Code's `StreamingToolExecutor.discard()` makes — late real results are dropped on the wire after synthesis, the model sees the synthetic error and can retry the tool if it still wants the result. Together with the partial-history push from fe35e37, every tool_use that ever streamed to the consumer is guaranteed to have a matching tool_result on the wire — regardless of whether the stream errored, the user retried mid-flight, the process crashed, or the session is later resumed. This is the qwen-code analogue of upstream Claude Code's `yieldMissingToolResultBlocks` (query.ts:123-149), but split across the core/cli boundary because the React tool scheduler runs out-of-band from the stream loop (so the synthesis path can't atomically discard in-flight tools the way upstream's StreamingToolExecutor can; the history-dedup at handleCompletedTools fills that gap instead). Tests: - 8 new repair-helper tests in geminiChat.test.ts cover Race A, Race B, partial coverage of parallel tool_use, idempotence on already-paired history, no-op on tool-free history, caller- supplied reason text, multiple non-adjacent dangling rounds, and routes through the GeminiChat instance-method wrapper. - client.test.ts mocks updated for the new GeminiChat method. - All 88 geminiChat tests pass; 131 client tests pass; 91 useGeminiStream tests pass.
| export function repairOrphanedToolUseTurns( | ||
| history: Content[], | ||
| reason: string = ORPHAN_TOOL_USE_REPAIR_REASON, | ||
| ): { injected: Array<{ callId: string; name: string }> } { | ||
| const injected: Array<{ callId: string; name: string }> = []; | ||
|
|
||
| // Forward walk: i mutates as we splice, so use index-based iteration | ||
| // and skip the freshly-inserted user turn to avoid re-scanning it. | ||
| for (let i = 0; i < history.length; i++) { | ||
| const turn = history[i]; | ||
| if (turn.role !== 'model') continue; | ||
|
|
||
| // Collect (id → name) for every functionCall in this model turn. | ||
| const expected = new Map<string, string>(); | ||
| for (const part of turn.parts ?? []) { | ||
| const fc = part.functionCall; | ||
| if (fc?.id) { | ||
| expected.set(fc.id, fc.name ?? 'unknown'); | ||
| } | ||
| } | ||
| if (expected.size === 0) continue; | ||
|
|
||
| const next = history[i + 1]; | ||
| const matched = new Set<string>(); | ||
| if (next?.role === 'user') { | ||
| for (const part of next.parts ?? []) { | ||
| const id = part.functionResponse?.id; | ||
| if (id) matched.add(id); | ||
| } | ||
| } | ||
|
|
||
| const missing = [...expected.entries()].filter(([id]) => !matched.has(id)); | ||
| if (missing.length === 0) continue; | ||
|
|
||
| const syntheticParts: Part[] = missing.map(([callId, name]) => ({ | ||
| functionResponse: { | ||
| id: callId, | ||
| name, | ||
| response: { error: reason }, | ||
| }, | ||
| })); | ||
|
|
||
| if (next?.role === 'user') { | ||
| next.parts = [...(next.parts ?? []), ...syntheticParts]; | ||
| } else { | ||
| history.splice(i + 1, 0, { role: 'user', parts: syntheticParts }); | ||
| // Skip the freshly-inserted user turn so the outer loop doesn't | ||
| // visit it as a model turn (it isn't) and stays linear-time. | ||
| i++; | ||
| } | ||
|
|
||
| for (const [callId, name] of missing) { | ||
| injected.push({ callId, name }); | ||
| } | ||
| } | ||
|
|
||
| return { injected }; | ||
| } |
| // Record assistant turn with raw Content and metadata | ||
| if (thoughtContentPart || contentText || hasToolCall || usageMetadata) { | ||
| const contextWindowSize = | ||
| this.config.getContentGeneratorConfig()?.contextWindowSize; | ||
| this.chatRecordingService?.recordAssistantTurn({ | ||
| model, | ||
| message: [ | ||
| ...(thoughtContentPart ? [thoughtContentPart] : []), | ||
| ...(contentText ? [{ text: contentText }] : []), | ||
| ...(hasToolCall | ||
| ? contentParts | ||
| .map(redactStructuredOutputArgsForRecording) | ||
| .filter( | ||
| ( | ||
| p, | ||
| ): p is { functionCall: NonNullable<Part['functionCall']> } => | ||
| p !== null, | ||
| ) | ||
| : []), | ||
| ], | ||
| tokens: usageMetadata, | ||
| contextWindowSize, | ||
| }); | ||
| } | ||
|
|
||
| // Mid-stream failure recovery: if the upstream stream threw (typical on | ||
| // weak networks — SSE cut between a tool_use `content_block_stop` and | ||
| // the terminal `message_stop`) AND any `functionCall` chunk was already | ||
| // yielded to consumers, we must persist the partial assistant turn here. | ||
| // | ||
| // The content generator (Anthropic / OpenAI) emits a `functionCall` part | ||
| // only at the end of a tool_use block. Once yielded, `Turn.run` registers | ||
| // a `ToolCallRequest` event, the React tool scheduler queues the call, | ||
| // and `handleCompletedTools` will fire `submitQuery(..., ToolResult)` — | ||
| // pushing a user message with `functionResponse` into history — even | ||
| // though the parent stream errored. Without preserving the matching | ||
| // tool_use on the model side, the next request body would have | ||
| // `user → user[tool_result]` with no tool_use in between, and the | ||
| // Anthropic-compatible API (DeepSeek, Anthropic, etc.) rejects with | ||
| // "tool_use_id ... must have a corresponding tool_use block in the | ||
| // previous message" | ||
| // — an unrecoverable state because Ctrl+Y's `stripOrphanedUserEntries` | ||
| // only strips trailing user entries; the lost tool_use can't be | ||
| // resurrected. | ||
| // | ||
| // Plain-text partial turns (no functionCall yielded) are deliberately | ||
| // NOT persisted — the Retry path pops the trailing user prompt and | ||
| // re-issues it; a stale partial-text model turn between them would | ||
| // either bias the retry or surface as a duplicate. | ||
| if (streamError !== null) { | ||
| if ( | ||
| hasToolCall && | ||
| (thoughtContentPart || consolidatedHistoryParts.length > 0) | ||
| ) { | ||
| this.history.push({ | ||
| role: 'model', | ||
| parts: [ | ||
| ...(thoughtContentPart ? [thoughtContentPart] : []), | ||
| ...consolidatedHistoryParts, | ||
| ], | ||
| }); | ||
| } | ||
| throw streamError; | ||
| } |
Post-review audit of 3de3241 surfaced two real races the earlier fix did not actually close: - Race A (the one yiliang114 originally flagged): handleCompletedTools is gated on `isResponding` (useGeminiStream:1971) BEFORE the dedup branch ran, so when the user Ctrl+Y'd mid-tool, the dedup never fired and the in-flight tool stayed permanently stuck in `completed-but-not-submitted` (the scheduler's `allToolCallsCompleteHandler` is single-shot). The dedup branch was structurally unreachable on the very race it was meant to defend against. - Race in the Retry path: the repair pass in `client.sendMessageStream` Retry branch ran BEFORE the chat-internal `push(userContent)`, so a Retry of a previous ToolResult submission (lastPrompt is a functionResponse part array) raced its own real `functionResponse` against the synthesized error one. The synthesis was winning and pre-empting the real result, producing two `functionResponse` entries with the same callId on the wire. This commit relocates both pieces to where the contract actually holds: 1. Move the dedup branch in `handleCompletedTools` to BEFORE the `isResponding` early-return so `markToolsAsSubmitted` always runs for callIds already paired in history, unblocking the UI/scheduler even when a new stream is in flight. The submission-side `geminiTools = completedAndReadyToSubmitTools.filter(... && !inHistory ...)` then covers the no-double-submit case in one expression. 2. Move the repair call from `client.sendMessageStream` Retry branch into `chat.sendMessageStream` immediately AFTER the user-supplied turn is pushed. The user's own tool_result (when the Retry payload carries one) gets the first chance to close the pair before the synthesizer sees it as dangling. `client.startChat()` keeps a belt-and-suspenders pass at session-load time so any pre-send code reading `chat.history` sees a well-formed shape. Adds three integration tests covering the new wiring: - geminiChat.test.ts: chat.sendMessageStream synthesizes a functionResponse when history carries a dangling tool_use AND the user-supplied content doesn't already close it (Race B/C plus Race A from the chat side). - geminiChat.test.ts: chat.sendMessageStream does NOT synthesize when the user-supplied content IS a matching functionResponse (the Retry-of-ToolResult race the previous wiring tripped on). - useGeminiStream.test.tsx: handleCompletedTools dedups a late real result whose callId already has a functionResponse in chat.history (Race A end-to-end: markToolsAsSubmitted fires but no submitQuery is dispatched). Test summary: 90/90 geminiChat, 131/131 client, 92/92 useGeminiStream; all 8186 core tests pass; tsc clean. Known limitation logged for follow-up: a Retry of ToolResult whose intervening stream itself partial-pushed a SECOND tool_use produces a trailing user turn with both the stale re-pushed `fr_A` and the synthesized `fr_B` for the second call — `fr_A` retry lands as an orphan because `fc_A` was already paired earlier in history. Triggered only when partial-push fires on the second stream AND user retries the tool_result rather than the user prompt; extremely low frequency in practice. Cleanest fix is in the retryLastPrompt layer (don't re-push an already-paired tool_result), which is out of scope for this PR.
| // History-based dedup MUST run before the `isResponding` early-return. | ||
| // If a synthetic `functionResponse` for this callId is already in | ||
| // chat.history (planted on session-load by | ||
| // `client.repairOrphanedToolUseTurnsInHistory` or on every | ||
| // `chat.sendMessageStream` push by the inline repair pass), the | ||
| // in-flight scheduler result must be marked submitted NOW — | ||
| // `useReactToolScheduler.allToolCallsCompleteHandler` is single-shot | ||
| // per batch, so a later isResponding=true early-return would leave | ||
| // the tool stuck in `completed-but-not-submitted` forever (Race A | ||
| // surfaced in PR #4176 review). The real result is dropped on the | ||
| // wire — same trade-off upstream Claude Code makes when its | ||
| // `StreamingToolExecutor.discard()` follows a | ||
| // `yieldMissingToolResultBlocks` synthesis (`query.ts:733` + `:984`). | ||
| const historyCallIdsWithResponse = new Set<string>(); | ||
| // Guard the call: some test harnesses build a partial GeminiClient | ||
| // mock without `getHistory`. Skipping dedup in that case is safe — | ||
| // it just means tests that never set up the repair pre-condition | ||
| // run with the original (pre-dedup) submission shape. | ||
| if (geminiClient && typeof geminiClient.getHistory === 'function') { | ||
| for (const entry of geminiClient.getHistory()) { | ||
| if (entry.role !== 'user') continue; | ||
| for (const part of entry.parts ?? []) { | ||
| const id = part.functionResponse?.id; | ||
| if (id) historyCallIdsWithResponse.add(id); | ||
| } | ||
| } | ||
| } | ||
| const dedupedCallIds = completedAndReadyToSubmitTools | ||
| .filter((tc) => historyCallIdsWithResponse.has(tc.request.callId)) | ||
| .map((tc) => tc.request.callId); | ||
| if (dedupedCallIds.length > 0) { | ||
| debugLogger.warn( | ||
| `[REPAIR] Dropping ${dedupedCallIds.length} late tool result(s) ` + | ||
| `whose callId already has a functionResponse in history: ` + | ||
| `${dedupedCallIds.join(', ')}`, | ||
| ); | ||
| markToolsAsSubmitted(dedupedCallIds); | ||
| } | ||
|
|
||
| if (isResponding) { | ||
| return; | ||
| } |
| if (thoughtContentPart || contentText || hasToolCall || usageMetadata) { | ||
| const contextWindowSize = | ||
| this.config.getContentGeneratorConfig()?.contextWindowSize; | ||
| this.chatRecordingService?.recordAssistantTurn({ | ||
| model, | ||
| message: [ | ||
| ...(thoughtContentPart ? [thoughtContentPart] : []), | ||
| ...(contentText ? [{ text: contentText }] : []), | ||
| ...(hasToolCall | ||
| ? contentParts | ||
| .map(redactStructuredOutputArgsForRecording) | ||
| .filter( | ||
| ( | ||
| p, | ||
| ): p is { functionCall: NonNullable<Part['functionCall']> } => | ||
| p !== null, | ||
| ) | ||
| : []), | ||
| ], | ||
| tokens: usageMetadata, | ||
| contextWindowSize, | ||
| }); | ||
| } | ||
|
|
||
| // Mid-stream failure recovery: if the upstream stream threw (typical on | ||
| // weak networks — SSE cut between a tool_use `content_block_stop` and | ||
| // the terminal `message_stop`) AND any `functionCall` chunk was already | ||
| // yielded to consumers, we must persist the partial assistant turn here. | ||
| // | ||
| // The content generator (Anthropic / OpenAI) emits a `functionCall` part | ||
| // only at the end of a tool_use block. Once yielded, `Turn.run` registers | ||
| // a `ToolCallRequest` event, the React tool scheduler queues the call, | ||
| // and `handleCompletedTools` will fire `submitQuery(..., ToolResult)` — | ||
| // pushing a user message with `functionResponse` into history — even | ||
| // though the parent stream errored. Without preserving the matching | ||
| // tool_use on the model side, the next request body would have | ||
| // `user → user[tool_result]` with no tool_use in between, and the | ||
| // Anthropic-compatible API (DeepSeek, Anthropic, etc.) rejects with | ||
| // "tool_use_id ... must have a corresponding tool_use block in the | ||
| // previous message" | ||
| // — an unrecoverable state because Ctrl+Y's `stripOrphanedUserEntries` | ||
| // only strips trailing user entries; the lost tool_use can't be | ||
| // resurrected. | ||
| // | ||
| // Plain-text partial turns (no functionCall yielded) are deliberately | ||
| // NOT persisted — the Retry path pops the trailing user prompt and | ||
| // re-issues it; a stale partial-text model turn between them would | ||
| // either bias the retry or surface as a duplicate. | ||
| if (streamError !== null) { | ||
| if ( | ||
| hasToolCall && | ||
| (thoughtContentPart || consolidatedHistoryParts.length > 0) | ||
| ) { | ||
| this.history.push({ | ||
| role: 'model', | ||
| parts: [ | ||
| ...(thoughtContentPart ? [thoughtContentPart] : []), | ||
| ...consolidatedHistoryParts, | ||
| ], | ||
| }); | ||
| } | ||
| throw streamError; | ||
| } |
wenshao
left a comment
There was a problem hiding this comment.
Review Summary: 2 files, +270/-36. Fixes unrecoverable wedge when streaming drops between tool_use content_block_stop and message_stop. Core logic is correct; 606 tests pass; tsc + eslint clean.
No high-confidence Critical issues found. 5 low-confidence suggestions identified (not posted as inline comments — needs human review):
- AbortSignal mid-tool_use persists partial turn — catch 块未区分 abort 与网络错误,取消后 history 含孤立 tool_use。建议加 守卫。
- Zero logging on partial turn persistence — 静默修改 history,调试困难。建议加 。
- recordAssistantTurn 在错误路径无条件调用 — 失败流的部分数据被记录为正常 turn。建议移入 块。
- Missing thinking-only test — 推理模型场景(thought:true 无 functionCall)无回归覆盖。
- Duplicate history.push 代码 — 错误/成功路径各一份相同实现,建议提取方法。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
| // The dedup MUST run regardless of `isResponding`, because the | ||
| // scheduler's `onAllToolCallsComplete` is single-shot and would | ||
| // otherwise leave the tool stuck in `completed-but-not-submitted`. | ||
| const lateRealResult: TrackedCompletedToolCall = { |
There was a problem hiding this comment.
[Critical] This fixture does not structurally satisfy TrackedCompletedToolCall, and the changed-file typecheck fails here with TS2352. As written, the PR cannot pass tsc on the changed files.
Please update the mock object to match the real TrackedCompletedToolCall shape (or choose a narrower test type that actually reflects what this test needs) instead of forcing the incompatible object through a direct assertion.
— gpt-5.5 via Qwen Code /review
| }, | ||
| })); | ||
|
|
||
| if (next?.role === 'user') { |
There was a problem hiding this comment.
[Critical] Appending the synthetic functionResponse after existing user parts means the Ctrl+Y retry shape becomes user: text, tool_result. The Anthropic converter preserves part order, so it serializes the retry prompt before the tool_result that is supposed to answer the immediately preceding tool_use. Anthropic-compatible backends require those tool_result blocks to come first in the following user message, so this can still reject the retry and leave the session wedged.
Insert the synthetic responses before the first non-functionResponse part when merging into an existing user turn, while preserving any real tool results that are already at the front.
| if (next?.role === 'user') { | |
| const existingParts = next.parts ?? []; | |
| const firstNonFunctionResponseIndex = existingParts.findIndex( | |
| (part) => !part.functionResponse, | |
| ); | |
| const insertAt = | |
| firstNonFunctionResponseIndex === -1 | |
| ? existingParts.length | |
| : firstNonFunctionResponseIndex; | |
| next.parts = [ | |
| ...existingParts.slice(0, insertAt), | |
| ...syntheticParts, | |
| ...existingParts.slice(insertAt), | |
| ]; |
— gpt-5.5 via Qwen Code /review
| // subsystem is trying to escape. (Belt-and-suspenders: the same | ||
| // helper runs again inside `chat.sendMessageStream` after the user | ||
| // content is pushed, so a dangling left here by setHistory / | ||
| // compaction reordering is also caught — but doing it here keeps |
There was a problem hiding this comment.
[Suggestion] This new startChat() wiring is the only resume-time integration point that repairs a dangling model[functionCall] before the first resumed send, but the tests only cover the helper directly and mock this method in client.test.ts. A future refactor could remove or reorder this call while the helper tests still pass, regressing the --resume recovery path this PR is meant to protect.
Please add a client-level test that starts a chat with extraHistory ending in a model turn containing a functionCall, then asserts the initialized chat history contains the synthetic user[functionResponse] before any send occurs.
— gpt-5.5 via Qwen Code /review
Summary
Fixes an unrecoverable wedge on weak-network connections (e.g. on a train) when calling DeepSeek-V4-Pro (or any Anthropic-compatible backend) through the Anthropic protocol. Same wedge reproduces under three additional failure modes surfaced in review; all closed in this PR.
Symptom — the model 400s with:
and pressing Ctrl+Y to retry yields the same error indefinitely. The chat session is permanently wedged.
Root cause — mid-stream
tool_usedropThe Anthropic-compatible SSE emits
tool_usecontent blocks ascontent_block_start→input_json_delta*→content_block_stop→message_delta→message_stop. InanthropicContentGenerator.processStream, thefunctionCallchunk is yielded atcontent_block_stop.When the SSE drops between
content_block_stopof a tool_use and the terminalmessage_stop(typical for flaky train/airplane WiFi):functionCallchunk has already been yielded →Turn.runregisters aToolCallRequestevent →useGeminiStreamschedules the tool after the for-await exits.processStreamResponse'sthis.history.push({role:'model'…})never runs — the for-await throws before reaching it.handleCompletedTools→submitQuery(…, ToolResult)→geminiChat.sendMessageStreampushesuser:[functionResponse]into history.[…, user:"query", user:[tool_result]]. The converter emits two consecutive user messages where the second carriestool_resultblocks but the preceding message has no matchingtool_use→ 400.stripOrphanedUserEntriesFromHistoryonly pops trailing user entries. The losttool_useis unrecoverable, and the chat is wedged for the rest of the session.Fixes
1. Persist partial assistant turn when stream errors mid
tool_useprocessStreamResponse(packages/core/src/core/geminiChat.ts) wraps its for-await in a try/catch. On stream error, if anyfunctionCallchunk was already yielded (hasToolCall === true) and we accumulated parts, persist the partial assistant turn into history before re-throwing — so the eventualtool_resultsubmission has its matchingtool_use.Plain-text partial turns (no
functionCall) are intentionally not persisted: the Retry path pops the trailing user prompt and re-issues it; a stale partial-text model turn between them would either bias the retry or surface as duplicate output.The shared
processStreamResponsecovers both Anthropic and OpenAI content generators, so the fix applies to all anthropic-compatible and openai-compatible providers — not just DeepSeek.2. Close
tool_use ↔ tool_resultinvariant at every failure pointSurfaced in @yiliang114's review: the partial-history push above relies on the React tool scheduler eventually producing a matching
tool_result. Three races break that contract:model[tool_use], not user, sostripOrphanedUserEntriesFromHistorydoesn't pop it. Retry pushes a fresh user turn after the orphan tool_use. Meanwhile the scheduler'sonAllToolCallsCompleteis single-shot AND gated onisResponding(useGeminiStream.ts:1971), so the eventual realtool_resultis silently swallowed.model[tool_use]wedges the first API call after--resume.Three pieces working together close every race:
repairOrphanedToolUseTurns(history)ingeminiChat.ts— walks history left-to-right and synthesizes anerror-typedfunctionResponsefor everyfunctionCallwhoseidisn't echoed back in the next user turn. Appends to an existing user turn when present, otherwise inserts a new one.client.startChat()after loading the transcript — covers Race B/C (--resume).chat.sendMessageStreamimmediately afterthis.history.push(userContent)— covers Race A in-session. The user-supplied turn gets the first chance to close the pair before the synthesizer sees it as dangling, so a Retry of a previous ToolResult submission (lastPrompt is afunctionResponse-parts array) lands the realfunctionResponseinstead of a synthetic error.handleCompletedTools(useGeminiStream.ts) — before submitting tool results, scanchat.historyfor anyfunctionResponse.idalready present (planted by the repair pass) and drop those toolCalls' responseParts from the wire payload while stillmarkToolsAsSubmittedso the UI/scheduler advances. The dedup runs before theisRespondingearly-return so the scheduler's single-shotonAllToolCallsCompletedoesn't leave the tool stuck incompleted-but-not-submittedwhen a Retry stream is in flight.This is the qwen-code analogue of Claude Code's
yieldMissingToolResultBlocks(query.ts:123-149), split across the core/cli boundary because our React scheduler runs out-of-band from the stream loop (upstream's in-bandStreamingToolExecutorcan atomically.discard()live tools at the synthesis point; we use history-dedup athandleCompletedToolsinstead to keep the in-flight scheduler's late real result from colliding with the synthesized error).Test plan
npx vitest run packages/core/src/core/geminiChat.test.ts— 90 tests pass (3 partial-history-push cases + 8 repair-helper unit cases + 2 chat.sendMessageStream integration cases)npx vitest run packages/core/src/core/client.test.ts— 131 tests pass (Retry path mocks updated for new method)npx vitest run packages/cli/src/ui/hooks/useGeminiStream.test.tsx— 92 tests pass (Race A end-to-end dedup integration test)npx vitest run --project '@qwen-code/qwen-code-core'— all 8186 core tests passtsc --noEmit -p packages/core/tsconfig.jsonclean--resumeof a session crashed mid-flight.Related
anthropics/claude-code#6836— upstream "tool_use/tool_result block mismatch" meta-issue with the same fix shape (~155 duplicates,oncalllabel, OPEN)Closes #4178