
fix(core,cli): close tool_use↔tool_result invariant across all failure paths#4176

Open
wenshao wants to merge 4 commits into QwenLM:main from wenshao:fix/persist-partial-tool-use-on-stream-error

Conversation

@wenshao
Collaborator

@wenshao wenshao commented May 15, 2026

Summary

Fixes an unrecoverable wedge on weak-network connections (e.g. on a train) when calling DeepSeek-V4-Pro (or any Anthropic-compatible backend) through the Anthropic protocol. The same wedge reproduces under three additional failure modes surfaced in review; all are closed in this PR.

Symptom — the model 400s with:

unexpected messages.N.content.0: tool_use_id found in tool_result blocks:
call_00_xxx. Each tool_result block must have a corresponding tool_use
block in the previous message.

and pressing Ctrl+Y to retry yields the same error indefinitely. The chat session is permanently wedged.

Root cause — mid-stream tool_use drop

The Anthropic-compatible SSE emits tool_use content blocks as content_block_start → input_json_delta* → content_block_stop → message_delta → message_stop. In anthropicContentGenerator.processStream, the functionCall chunk is yielded at content_block_stop.

When the SSE drops between content_block_stop of a tool_use and the terminal message_stop (typical for flaky train/airplane WiFi):

  1. The functionCall chunk has already been yielded → Turn.run registers a ToolCallRequest event → useGeminiStream schedules the tool after the for-await exits.
  2. But processStreamResponse's this.history.push({role:'model'…}) never runs — the for-await throws before reaching it.
  3. The tool finishes asynchronously → handleCompletedTools → submitQuery(…, ToolResult) → geminiChat.sendMessageStream pushes user:[functionResponse] into history.
  4. History is now […, user:"query", user:[tool_result]]. The converter emits two consecutive user messages where the second carries tool_result blocks but the preceding message has no matching tool_use → 400.
  5. stripOrphanedUserEntriesFromHistory only pops trailing user entries. The lost tool_use is unrecoverable, and the chat is wedged for the rest of the session.
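The broken shape from step 4 can be sketched with simplified stand-in types (the real `Content`/`Part` types come from the SDK; the checker below is a hypothetical mirror of the server-side invariant, not code from this repo):

```typescript
// Trimmed stand-ins for the SDK's Content/Part shapes (assumption:
// the real types carry more fields; only what matters here is kept).
type Part = {
  text?: string;
  functionCall?: { id: string; name: string };
  functionResponse?: { id: string; name: string };
};
type Content = { role: 'user' | 'model'; parts: Part[] };

// Mirrors the server-side rule: every functionResponse id in a user
// turn must appear as a functionCall id in the previous model turn.
function findOrphanToolResults(history: Content[]): string[] {
  const orphans: string[] = [];
  history.forEach((turn, i) => {
    if (turn.role !== 'user') return;
    const prev = history[i - 1];
    const priorIds = new Set(
      prev?.role === 'model'
        ? prev.parts.flatMap((p) =>
            p.functionCall?.id ? [p.functionCall.id] : [],
          )
        : [],
    );
    for (const p of turn.parts) {
      if (p.functionResponse && !priorIds.has(p.functionResponse.id)) {
        orphans.push(p.functionResponse.id);
      }
    }
  });
  return orphans;
}

// The wedged history: the model turn carrying the tool_use was never
// pushed, so the tool_result's id has no match → the API returns 400.
const wedged: Content[] = [
  { role: 'user', parts: [{ text: 'query' }] },
  {
    role: 'user',
    parts: [{ functionResponse: { id: 'call_00_xxx', name: 'read_file' } }],
  },
];
```

Running the checker on `wedged` flags `call_00_xxx`; inserting the lost model turn with the matching functionCall makes the history clean again.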

Fixes

1. Persist partial assistant turn when stream errors mid tool_use

processStreamResponse (packages/core/src/core/geminiChat.ts) wraps its for-await in a try/catch. On stream error, if any functionCall chunk was already yielded (hasToolCall === true) and we accumulated parts, persist the partial assistant turn into history before re-throwing — so the eventual tool_result submission has its matching tool_use.

Plain-text partial turns (no functionCall) are intentionally not persisted: the Retry path pops the trailing user prompt and re-issues it; a stale partial-text model turn between them would either bias the retry or surface as duplicate output.

The shared processStreamResponse covers both Anthropic and OpenAI content generators, so the fix applies to all anthropic-compatible and openai-compatible providers — not just DeepSeek.
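The persist-on-error pattern can be sketched as follows. This is a minimal simulation, not the real processStreamResponse — the chunk/turn shapes and the `consume`/`droppedStream` names are simplified assumptions for illustration:

```typescript
// Simplified shapes; the real code consolidates parts and thinking
// blocks before pushing.
type Part = { text?: string; functionCall?: { id: string; name: string } };
type Turn = { role: 'user' | 'model'; parts: Part[] };

async function consume(
  stream: AsyncGenerator<Part>,
  history: Turn[],
): Promise<void> {
  let hasToolCall = false;
  const parts: Part[] = [];
  let streamError: unknown = null;
  try {
    for await (const chunk of stream) {
      if (chunk.functionCall) hasToolCall = true;
      parts.push(chunk);
    }
  } catch (e) {
    streamError = e; // captured, not swallowed — re-thrown below
  }
  if (streamError !== null) {
    // Persist the partial model turn only when a tool_use already
    // escaped to consumers; text-only partials are dropped on purpose.
    if (hasToolCall && parts.length > 0) {
      history.push({ role: 'model', parts });
    }
    throw streamError;
  }
  history.push({ role: 'model', parts });
}

// SSE drop between content_block_stop and message_stop: the
// functionCall chunk is yielded, then the connection dies.
async function* droppedStream(): AsyncGenerator<Part> {
  yield { functionCall: { id: 'call_00_xxx', name: 'read_file' } };
  throw new Error('ECONNRESET');
}
```

After `consume(droppedStream(), history)` rejects, the history still ends with a model turn carrying the tool_use, so the scheduler's eventual tool_result submission has its pair.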

2. Close tool_use ↔ tool_result invariant at every failure point

Surfaced in @yiliang114's review: the partial-history push above relies on the React tool scheduler eventually producing a matching tool_result. Three races break that contract:

  • Race A — Ctrl+Y while the in-flight tool hasn't finished. Trailing entry is model[tool_use], not user, so stripOrphanedUserEntriesFromHistory doesn't pop it. Retry pushes a fresh user turn after the orphan tool_use. Meanwhile the scheduler's onAllToolCallsComplete is single-shot AND gated on isResponding (useGeminiStream.ts:1971), so the eventual real tool_result is silently swallowed.
  • Race B — process crash / OOM / SIGKILL between the partial-tool_use push and tool completion. The dangling model[tool_use] wedges the first API call after --resume.
  • Race C — manual JSONL edits / external tooling leaving the same dangling shape.

Three pieces working together close every race:

  1. repairOrphanedToolUseTurns(history) in geminiChat.ts — walks history left-to-right and synthesizes an error-typed functionResponse for every functionCall whose id isn't echoed back in the next user turn. Appends to an existing user turn when present, otherwise inserts a new one.
  2. Wiring at two points:
    • client.startChat() after loading the transcript — covers Race B/C (--resume).
    • chat.sendMessageStream immediately after this.history.push(userContent) — covers Race A in-session. The user-supplied turn gets the first chance to close the pair before the synthesizer sees it as dangling, so a Retry of a previous ToolResult submission (lastPrompt is a functionResponse-parts array) lands the real functionResponse instead of a synthetic error.
  3. History-based dedup in handleCompletedTools (useGeminiStream.ts) — before submitting tool results, scan chat.history for any functionResponse.id already present (planted by the repair pass) and drop those toolCalls' responseParts from the wire payload while still markToolsAsSubmitted so the UI/scheduler advances. The dedup runs before the isResponding early-return so the scheduler's single-shot onAllToolCallsComplete doesn't leave the tool stuck in completed-but-not-submitted when a Retry stream is in flight.
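The dedup in piece 3 is essentially a two-way split over completed tools. A minimal sketch (shapes simplified; `splitForSubmission` is a hypothetical name — the real logic lives inline in handleCompletedTools):

```typescript
// Simplified stand-in for the scheduler's completed-tool records.
type ToolCall = { request: { callId: string }; responseParts: unknown };

// callIds already paired in history (planted by the repair pass) are
// dropped from the wire payload but still reported so the caller can
// markToolsAsSubmitted and let the single-shot scheduler advance.
function splitForSubmission(
  completedTools: ToolCall[],
  historyResponseIds: Set<string>,
): { toSubmit: ToolCall[]; toMarkOnly: string[] } {
  const toMarkOnly = completedTools
    .filter((tc) => historyResponseIds.has(tc.request.callId))
    .map((tc) => tc.request.callId);
  const toSubmit = completedTools.filter(
    (tc) => !historyResponseIds.has(tc.request.callId),
  );
  return { toSubmit, toMarkOnly };
}
```

With a synthetic functionResponse for `call_a` already in history, only `call_b` goes on the wire while `call_a` is merely marked submitted.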

This is the qwen-code analogue of Claude Code's yieldMissingToolResultBlocks (query.ts:123-149), split across the core/cli boundary because our React scheduler runs out-of-band from the stream loop. Upstream's in-band StreamingToolExecutor can atomically .discard() live tools at the synthesis point; we use history-dedup at handleCompletedTools instead, so the in-flight scheduler's late real result cannot collide with the synthesized error.

Test plan

  • npx vitest run packages/core/src/core/geminiChat.test.ts — 90 tests pass (3 partial-history-push cases + 8 repair-helper unit cases + 2 chat.sendMessageStream integration cases)
  • npx vitest run packages/core/src/core/client.test.ts — 131 tests pass (Retry path mocks updated for new method)
  • npx vitest run packages/cli/src/ui/hooks/useGeminiStream.test.tsx — 92 tests pass (Race A end-to-end dedup integration test)
  • npx vitest run --project '@qwen-code/qwen-code-core' — all 8186 core tests pass
  • tsc --noEmit -p packages/core/tsconfig.json clean
  • CI green on all four commits (Lint + Test×3 platforms + CodeQL)
  • Manual: throttle network, trigger a tool_use response, kill the connection mid-stream, confirm the next user/tool_result submission succeeds instead of 400ing. Also verified Ctrl+Y mid-flight and --resume of a session that crashed mid-flight.

Related

Closes #4178

…_use

Weak-network failures during an Anthropic-compatible stream (DeepSeek,
api.anthropic.com, etc.) can drop the SSE between a tool_use
`content_block_stop` and the terminal `message_stop`. The functionCall
chunk is already yielded at content_block_stop, so:

  - Turn.run records a ToolCallRequest event.
  - useGeminiStream's for-await exits and schedules the tool.
  - handleCompletedTools eventually fires submitQuery(..., ToolResult)
    and pushes a user[functionResponse] into history.
  - But processStreamResponse's history.push for the model turn never
    ran (the for-await threw first), so the matching tool_use is gone.

The next request body has `user → user[tool_result]` with no tool_use
in between, and the server rejects with HTTP 400:
"tool_use_id ... must have a corresponding tool_use block in the
previous message". Ctrl+Y can't recover because
stripOrphanedUserEntriesFromHistory only strips trailing user
entries — the lost tool_use is unrecoverable, and the session is
wedged.

Wrap the for-await loop in processStreamResponse with try/catch.
When the stream throws AND any functionCall chunk was already
yielded (hasToolCall=true), persist the partial assistant turn to
history before re-throwing. The eventual tool_result submission
then has a matching tool_use and the session can continue.

Plain-text partial turns (no functionCall yielded) are intentionally
NOT persisted: the Retry path pops the trailing user prompt and
re-issues it, so a stale partial-text model turn between them would
either bias the retry or surface as duplicate output.
@github-actions
Contributor

📋 Review Summary

This PR fixes a critical unrecoverable wedge that occurs when streaming connections drop mid-tool_use on weak networks. The fix ensures partial assistant turns containing tool calls are persisted to history before re-throwing stream errors, maintaining the tool_use/tool_result pairing invariant required by Anthropic-compatible APIs. The implementation is well-reasoned with comprehensive tests and excellent documentation.

🔍 General Feedback

  • Exceptional documentation: The code comments and PR description provide outstanding context about the root cause, including the specific Anthropic SSE emission pattern (content_block_start → input_json_delta* → content_block_stop → message_delta → message_stop)
  • Targeted fix: The change is surgical—only persisting partial turns when hasToolCall === true, deliberately skipping text-only partials to avoid biasing retry behavior
  • Shared code path benefit: The fix in processStreamResponse automatically covers both Anthropic and OpenAI content generators
  • Test coverage: Two new test cases precisely validate both branches of the new error handling logic

🎯 Specific Feedback

🟡 High

  • File: packages/core/src/core/geminiChat.ts:1320 - The streamError capture pattern is sound, but consider whether the caught error should be logged before re-throwing for debugging purposes. Weak-network issues are notoriously hard to reproduce, and having structured logs when this recovery path triggers would aid future debugging.

    Suggested fix:

      } catch (e) {
        streamError = e;
        // Log for debugging weak-network recovery path
        if (hasToolCall) {
          debugLogger.log('Persisting partial assistant turn due to stream error');
        }
      }

🟢 Medium

  • File: packages/core/src/core/geminiChat.ts:1257-1268 - The comment explaining streamError is extremely detailed (which is good), but it's positioned before the variable is actually used. Consider moving this extensive explanation to the recovery branch at line 1383 where the actual decision logic lives, and keep a shorter note here.

    Suggested restructuring:

      let streamError: unknown = null;  // Captured mid-stream; see recovery logic below
    
      try {
        // ... loop unchanged ...
      } catch (e) {
        streamError = e;
      }
    
      // ... later, at the recovery branch ...
      /**
       * Mid-stream failure recovery: [full explanation currently at line 1257]
       */
      if (streamError !== null) { ... }
  • File: packages/core/src/core/geminiChat.test.ts:702-760 - The test "persists partial assistant turn when stream throws after a tool_use chunk" is thorough but could benefit from explicitly asserting that the history length is exactly 2 (user + model) after the error, making the test's intent even clearer to future readers.

    Suggested addition (after line 758):

        expect(history.length).toBe(2);  // user turn + model turn with tool_use

🔵 Low

  • File: packages/core/src/core/geminiChat.ts:1383 - The recovery comment block is ~20 lines. While the detail is valuable, consider extracting this into a private method like shouldPersistPartialTurn() with an inline JSDoc. This would make the main flow easier to scan and the recovery logic independently testable.

  • File: packages/core/src/core/geminiChat.test.ts - Consider adding a test case that simulates the full retry flow (Ctrl+Y scenario) to verify that the persisted tool_use allows the subsequent ToolResult submission to succeed rather than 400ing. This would be an integration-style test complementing the current unit tests.

✅ Highlights

  • Brilliant root cause analysis: The PR description's explanation of the Anthropic SSE emission pattern and how the drop between content_block_stop and message_stop causes the wedge is exemplary technical writing
  • Discriminating fix: The deliberate choice to persist tool_use partials but NOT text-only partials shows deep understanding of the retry path's behavior and avoids introducing duplicate output issues
  • Comprehensive test coverage: The two new test cases cover both branches of the new logic with realistic stream simulations
  • Cross-provider benefit: Fixing the shared processStreamResponse method means Anthropic, OpenAI, and DeepSeek all benefit without duplicate changes
  • Clear failure mode documentation: The comment explaining why stripOrphanedUserEntries can't recover from this scenario (it only strips trailing user entries, not resurrect lost tool_use blocks) is invaluable for future maintainers

…ry fix

Adds a third case to the partial-history persistence test suite for
reasoning-mode providers (DeepSeek thinking, Claude 4.6+ adaptive): when
the assistant turn streams a thinking block AND a tool_use before the
SSE drops, the partial push must keep the thinking part before the
functionCall so DeepSeek's `injectThinkingOnToolUseTurns` converter
pass sees an existing block on the replayed turn and does not prepend
a synthetic empty one (which would discard the model's original
reasoning text).
Contributor

Copilot AI left a comment


Pull request overview

Fixes a wedge state on weak networks where Anthropic-compatible streams (DeepSeek, Anthropic, etc.) drop after a tool_use content_block_stop but before message_stop. The yielded functionCall chunk causes downstream tool execution and a follow-up tool_result user turn, but no matching tool_use was ever pushed into history, leading to permanent 400s. The fix wraps the stream loop in processStreamResponse with try/catch and persists the partial assistant turn into history when a tool call has already been yielded before re-throwing.

Changes:

  • Capture mid-stream errors in processStreamResponse instead of letting them abort post-loop processing.
  • When hasToolCall is true on error, push the partial assistant turn (thinking + consolidated parts) into history before re-throwing, preserving tool_use/tool_result pairing.
  • Add three vitest cases covering tool-only, thinking+tool, and text-only mid-stream failures.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
packages/core/src/core/geminiChat.ts Adds try/catch around stream iteration and a recovery branch that conditionally persists the partial model turn before re-throwing.
packages/core/src/core/geminiChat.test.ts Adds three new tests verifying partial-turn persistence with tool_use, with thinking+tool, and non-persistence for text-only failures.


@@ -1369,6 +1383,46 @@ export class GeminiChat {
});
Collaborator


Non-blocking follow-up.

This persists the partial tool_use and relies on handleCompletedTools to eventually push a matching tool_result. Two paths re-create the same wedge one layer down:

  1. Ctrl+Y before the tool finishes — history is [user, model(tool_use)], no trailing user entry for stripOrphanedUserEntriesFromHistory to strip. The retry sends the orphan tool_use and 400s with the same error this PR is trying to escape.
  2. Scheduler aborts / process killed before submitQuery(ToolResult) fires — the persisted tool_use becomes a permanent orphan. Same wedge.

The stronger invariant is to close the pair at the failure point itself: for every already-yielded tool_use, synthesize an is_error: true tool_result before re-throwing, instead of betting on a downstream consumer. anthropics/claude-code does this in src/query.ts (yieldMissingToolResultBlocks, ~L123) and calls it from both the stream-error and the user-abort paths — the pair is never out of sync.

Current fix recovers the common case, so not blocking. Worth a follow-up to also cover the retry race and the abort path symmetrically.

Collaborator Author


Confirmed both paths reproduce — handleCompletedTools is gated on isResponding (useGeminiStream.ts:1971) and useReactToolScheduler.allToolCallsCompleteHandler is single-shot (useReactToolScheduler.ts:127-132), so a Ctrl+Y that lands between the partial-push and tool completion does silently swallow the eventual tool_result. The abort-before-submit path is the same shape.

(Worth noting: this race exists pre-#4176 too — any Ctrl+Y while a tool is still running on a normally-completed stream hits the same wedge. This PR moves the partial-tool_use case into the same race class, but doesn't widen it.)

On yieldMissingToolResultBlocks shape — straight port has a corner: if we synthesize is_error: true at the failure point AND the live scheduler later completes, handleCompletedTools submits a second tool_result with the same tool_use_id, producing two consecutive user turns (orphan tool_result in the trailing one). Upstream sidesteps this because their StreamingToolExecutor is owned by the same loop that does the synthesis — it can discard() in-flight tools atomically (query.ts:733). Qwen-code's React scheduler runs out-of-band, so the synthesis path needs to coordinate cancellation/dedup with it.

Folded both races + the cancel-coordination requirement into #4178 — that's where the closing-at-failure-point work belongs. Will tackle it as the next PR; #4176 stays minimal to ship the common-case recovery now.

Collaborator Author


Folded both races + the cancel-coordination requirement into this PR in 3de3241a2 rather than splitting to #4178. Three pieces:

  • repairOrphanedToolUseTurns(history) synthesizes error functionResponse for every dangling functionCall.id — close to upstream's yieldMissingToolResultBlocks shape.
  • Wired into startChat() (Race B/C — --resume of a crashed session) and the sendMessageStream Retry branch after stripOrphanedUserEntriesFromHistory (Race A — Ctrl+Y mid-flight).
  • handleCompletedTools dedupes against chat.history before submitting tool_results — that's the cross-layer coordination piece you flagged. Since our React scheduler runs out-of-band (vs. upstream's in-band StreamingToolExecutor that can .discard() atomically at synthesis), the synthesizer plants the synthetic functionResponse first and the scheduler drops its late real result when it sees a matching callId already in history. Same trade-off upstream's .discard() + yieldMissingToolResultBlocks makes — the model sees the synthetic error and can retry the tool if it still wants the result.

Tests: 8 new repair-helper cases in geminiChat.test.ts (Race A merge-into-existing-user-turn, Race B trailing-tool_use-only, parallel tool_use with partial responses, no-op on already-paired history, multiple non-adjacent dangling rounds, caller-supplied reason text, no-op on tool-free history). 131/131 client.test.ts and 91/91 useGeminiStream.test.tsx still passing.

#4178 will be closed once this PR merges (its scope is now this PR). #4177 (idle watchdog) stays as the independent follow-up.

Collaborator

@yiliang114 yiliang114 left a comment


The fix recovers the common SSE-drop case correctly. Left one non-blocking follow-up inline about closing the tool_use ↔ tool_result invariant at the failure point rather than handing off to the scheduler.

@wenshao
Collaborator Author

wenshao commented May 15, 2026

For reviewers — this PR is the minimal, surgical fix. The broader weak-network resilience story has two follow-ups filed so they can be reviewed independently:

Together (#4176 + #4177 + #4178) qwen-code's behavior on weak networks would be comparable to upstream Claude Code, addressing the same class of bug as anthropics/claude-code#6836 (open meta-issue with ~155 duplicate reports, oncall label).

Extends the partial-history fix in fe35e37 to cover the residual race
paths surfaced in PR QwenLM#4176 review:

  - Race A: Ctrl+Y while in-flight tool hasn't finished. History is
    [user, model(tool_use)] — `stripOrphanedUserEntriesFromHistory` only
    pops trailing user entries, so the retry payload lands as a fresh
    user turn after the orphan tool_use and API rejects. Meanwhile the
    scheduler's `onAllToolCallsComplete` is single-shot and gated on
    `isResponding`, so the eventual tool_result is silently swallowed.
  - Race B: process crash / OOM / SIGKILL between the partial-tool_use
    push and the React scheduler's tool_result submission. On `--resume`
    the dangling model(tool_use) wedges the first API call.
  - Race C: external tooling / manual JSONL edits leaving the same
    dangling shape.

The fix has three pieces working together:

1. `repairOrphanedToolUseTurns(history)` in geminiChat.ts walks history
   left-to-right and synthesizes an `error`-typed functionResponse for
   every functionCall whose id is not echoed back in the next user
   turn. Appends to an existing user turn when present, otherwise
   inserts a new one. Returns the injected (callId, name) list.

2. `GeminiClient.repairOrphanedToolUseTurnsInHistory()` wraps the helper
   and is called from three points:
     - `startChat()` after loading the transcript (Race B/C, --resume).
     - `sendMessageStream` Retry branch after stripOrphans (Race A).
     - `sendMessageStream` UserQuery/Cron branch (defensive belt-and-
       suspenders for anything that slipped past 1 and 2).

3. `handleCompletedTools` in useGeminiStream.ts dedupes against
   chat.history before submitting tool_results — if a synthetic
   functionResponse for the same callId is already present (planted by
   the repair pass), the in-flight scheduler's late result is dropped
   and the call is `markToolsAsSubmitted` so the UI advances. Same
   trade-off upstream Claude Code's `StreamingToolExecutor.discard()`
   makes — late real results are dropped on the wire after synthesis,
   the model sees the synthetic error and can retry the tool if it
   still wants the result.

Together with the partial-history push from fe35e37, every tool_use
that ever streamed to the consumer is guaranteed to have a matching
tool_result on the wire — regardless of whether the stream errored,
the user retried mid-flight, the process crashed, or the session is
later resumed. This is the qwen-code analogue of upstream Claude Code's
`yieldMissingToolResultBlocks` (query.ts:123-149), but split across
the core/cli boundary because the React tool scheduler runs out-of-band
from the stream loop (so the synthesis path can't atomically discard
in-flight tools the way upstream's StreamingToolExecutor can; the
history-dedup at handleCompletedTools fills that gap instead).

Tests:
  - 8 new repair-helper tests in geminiChat.test.ts cover Race A,
    Race B, partial coverage of parallel tool_use, idempotence on
    already-paired history, no-op on tool-free history, caller-
    supplied reason text, multiple non-adjacent dangling rounds, and
    routes through the GeminiChat instance-method wrapper.
  - client.test.ts mocks updated for the new GeminiChat method.
  - All 88 geminiChat tests pass; 131 client tests pass; 91
    useGeminiStream tests pass.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment on lines +417 to +474
export function repairOrphanedToolUseTurns(
  history: Content[],
  reason: string = ORPHAN_TOOL_USE_REPAIR_REASON,
): { injected: Array<{ callId: string; name: string }> } {
  const injected: Array<{ callId: string; name: string }> = [];

  // Forward walk: i mutates as we splice, so use index-based iteration
  // and skip the freshly-inserted user turn to avoid re-scanning it.
  for (let i = 0; i < history.length; i++) {
    const turn = history[i];
    if (turn.role !== 'model') continue;

    // Collect (id → name) for every functionCall in this model turn.
    const expected = new Map<string, string>();
    for (const part of turn.parts ?? []) {
      const fc = part.functionCall;
      if (fc?.id) {
        expected.set(fc.id, fc.name ?? 'unknown');
      }
    }
    if (expected.size === 0) continue;

    const next = history[i + 1];
    const matched = new Set<string>();
    if (next?.role === 'user') {
      for (const part of next.parts ?? []) {
        const id = part.functionResponse?.id;
        if (id) matched.add(id);
      }
    }

    const missing = [...expected.entries()].filter(([id]) => !matched.has(id));
    if (missing.length === 0) continue;

    const syntheticParts: Part[] = missing.map(([callId, name]) => ({
      functionResponse: {
        id: callId,
        name,
        response: { error: reason },
      },
    }));

    if (next?.role === 'user') {
      next.parts = [...(next.parts ?? []), ...syntheticParts];
    } else {
      history.splice(i + 1, 0, { role: 'user', parts: syntheticParts });
      // Skip the freshly-inserted user turn so the outer loop doesn't
      // visit it as a model turn (it isn't) and stays linear-time.
      i++;
    }

    for (const [callId, name] of missing) {
      injected.push({ callId, name });
    }
  }

  return { injected };
}
Comment on lines 1472 to +1535
// Record assistant turn with raw Content and metadata
if (thoughtContentPart || contentText || hasToolCall || usageMetadata) {
  const contextWindowSize =
    this.config.getContentGeneratorConfig()?.contextWindowSize;
  this.chatRecordingService?.recordAssistantTurn({
    model,
    message: [
      ...(thoughtContentPart ? [thoughtContentPart] : []),
      ...(contentText ? [{ text: contentText }] : []),
      ...(hasToolCall
        ? contentParts
            .map(redactStructuredOutputArgsForRecording)
            .filter(
              (p): p is { functionCall: NonNullable<Part['functionCall']> } =>
                p !== null,
            )
        : []),
    ],
    tokens: usageMetadata,
    contextWindowSize,
  });
}

// Mid-stream failure recovery: if the upstream stream threw (typical on
// weak networks — SSE cut between a tool_use `content_block_stop` and
// the terminal `message_stop`) AND any `functionCall` chunk was already
// yielded to consumers, we must persist the partial assistant turn here.
//
// The content generator (Anthropic / OpenAI) emits a `functionCall` part
// only at the end of a tool_use block. Once yielded, `Turn.run` registers
// a `ToolCallRequest` event, the React tool scheduler queues the call,
// and `handleCompletedTools` will fire `submitQuery(..., ToolResult)` —
// pushing a user message with `functionResponse` into history — even
// though the parent stream errored. Without preserving the matching
// tool_use on the model side, the next request body would have
// `user → user[tool_result]` with no tool_use in between, and the
// Anthropic-compatible API (DeepSeek, Anthropic, etc.) rejects with
// "tool_use_id ... must have a corresponding tool_use block in the
// previous message" — an unrecoverable state because Ctrl+Y's
// `stripOrphanedUserEntries` only strips trailing user entries; the
// lost tool_use can't be resurrected.
//
// Plain-text partial turns (no functionCall yielded) are deliberately
// NOT persisted — the Retry path pops the trailing user prompt and
// re-issues it; a stale partial-text model turn between them would
// either bias the retry or surface as a duplicate.
if (streamError !== null) {
  if (
    hasToolCall &&
    (thoughtContentPart || consolidatedHistoryParts.length > 0)
  ) {
    this.history.push({
      role: 'model',
      parts: [
        ...(thoughtContentPart ? [thoughtContentPart] : []),
        ...consolidatedHistoryParts,
      ],
    });
  }
  throw streamError;
}
Post-review audit of 3de3241 surfaced two real races the earlier fix
did not actually close:

  - Race A (the one yiliang114 originally flagged): handleCompletedTools
    is gated on `isResponding` (useGeminiStream:1971) BEFORE the dedup
    branch ran, so when the user Ctrl+Y'd mid-tool, the dedup never
    fired and the in-flight tool stayed permanently stuck in
    `completed-but-not-submitted` (the scheduler's
    `allToolCallsCompleteHandler` is single-shot). The dedup branch was
    structurally unreachable on the very race it was meant to defend
    against.

  - Race in the Retry path: the repair pass in
    `client.sendMessageStream` Retry branch ran BEFORE the chat-internal
    `push(userContent)`, so a Retry of a previous ToolResult submission
    (lastPrompt is a functionResponse part array) raced its own real
    `functionResponse` against the synthesized error one. The synthesis
    was winning and pre-empting the real result, producing two
    `functionResponse` entries with the same callId on the wire.

This commit relocates both pieces to where the contract actually holds:

1. Move the dedup branch in `handleCompletedTools` to BEFORE the
   `isResponding` early-return so `markToolsAsSubmitted` always runs
   for callIds already paired in history, unblocking the UI/scheduler
   even when a new stream is in flight. The submission-side `geminiTools
   = completedAndReadyToSubmitTools.filter(... && !inHistory ...)` then
   covers the no-double-submit case in one expression.

2. Move the repair call from `client.sendMessageStream` Retry branch
   into `chat.sendMessageStream` immediately AFTER the user-supplied
   turn is pushed. The user's own tool_result (when the Retry payload
   carries one) gets the first chance to close the pair before the
   synthesizer sees it as dangling. `client.startChat()` keeps a
   belt-and-suspenders pass at session-load time so any pre-send code
   reading `chat.history` sees a well-formed shape.

Adds three integration tests covering the new wiring:

  - geminiChat.test.ts: chat.sendMessageStream synthesizes a
    functionResponse when history carries a dangling tool_use AND the
    user-supplied content doesn't already close it (Race B/C plus
    Race A from the chat side).
  - geminiChat.test.ts: chat.sendMessageStream does NOT synthesize
    when the user-supplied content IS a matching functionResponse
    (the Retry-of-ToolResult race the previous wiring tripped on).
  - useGeminiStream.test.tsx: handleCompletedTools dedups a late real
    result whose callId already has a functionResponse in chat.history
    (Race A end-to-end: markToolsAsSubmitted fires but no
    submitQuery is dispatched).

Test summary: 90/90 geminiChat, 131/131 client, 92/92 useGeminiStream;
all 8186 core tests pass; tsc clean.

Known limitation logged for follow-up: a Retry of ToolResult whose
intervening stream itself partial-pushed a SECOND tool_use produces a
trailing user turn with both the stale re-pushed `fr_A` and the
synthesized `fr_B` for the second call — `fr_A` retry lands as an
orphan because `fc_A` was already paired earlier in history. Triggered
only when partial-push fires on the second stream AND user retries the
tool_result rather than the user prompt; extremely low frequency in
practice. Cleanest fix is in the retryLastPrompt layer (don't re-push
an already-paired tool_result), which is out of scope for this PR.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines +1993 to +2034
```typescript
// History-based dedup MUST run before the `isResponding` early-return.
// If a synthetic `functionResponse` for this callId is already in
// chat.history (planted on session-load by
// `client.repairOrphanedToolUseTurnsInHistory` or on every
// `chat.sendMessageStream` push by the inline repair pass), the
// in-flight scheduler result must be marked submitted NOW —
// `useReactToolScheduler.allToolCallsCompleteHandler` is single-shot
// per batch, so a later isResponding=true early-return would leave
// the tool stuck in `completed-but-not-submitted` forever (Race A
// surfaced in PR #4176 review). The real result is dropped on the
// wire — same trade-off upstream Claude Code makes when its
// `StreamingToolExecutor.discard()` follows a
// `yieldMissingToolResultBlocks` synthesis (`query.ts:733` + `:984`).
const historyCallIdsWithResponse = new Set<string>();
// Guard the call: some test harnesses build a partial GeminiClient
// mock without `getHistory`. Skipping dedup in that case is safe —
// it just means tests that never set up the repair pre-condition
// run with the original (pre-dedup) submission shape.
if (geminiClient && typeof geminiClient.getHistory === 'function') {
  for (const entry of geminiClient.getHistory()) {
    if (entry.role !== 'user') continue;
    for (const part of entry.parts ?? []) {
      const id = part.functionResponse?.id;
      if (id) historyCallIdsWithResponse.add(id);
    }
  }
}
const dedupedCallIds = completedAndReadyToSubmitTools
  .filter((tc) => historyCallIdsWithResponse.has(tc.request.callId))
  .map((tc) => tc.request.callId);
if (dedupedCallIds.length > 0) {
  debugLogger.warn(
    `[REPAIR] Dropping ${dedupedCallIds.length} late tool result(s) ` +
      `whose callId already has a functionResponse in history: ` +
      `${dedupedCallIds.join(', ')}`,
  );
  markToolsAsSubmitted(dedupedCallIds);
}

if (isResponding) {
  return;
}
```
Comment on lines 1493 to +1555
```typescript
if (thoughtContentPart || contentText || hasToolCall || usageMetadata) {
  const contextWindowSize =
    this.config.getContentGeneratorConfig()?.contextWindowSize;
  this.chatRecordingService?.recordAssistantTurn({
    model,
    message: [
      ...(thoughtContentPart ? [thoughtContentPart] : []),
      ...(contentText ? [{ text: contentText }] : []),
      ...(hasToolCall
        ? contentParts
            .map(redactStructuredOutputArgsForRecording)
            .filter(
              (
                p,
              ): p is { functionCall: NonNullable<Part['functionCall']> } =>
                p !== null,
            )
        : []),
    ],
    tokens: usageMetadata,
    contextWindowSize,
  });
}

// Mid-stream failure recovery: if the upstream stream threw (typical on
// weak networks — SSE cut between a tool_use `content_block_stop` and
// the terminal `message_stop`) AND any `functionCall` chunk was already
// yielded to consumers, we must persist the partial assistant turn here.
//
// The content generator (Anthropic / OpenAI) emits a `functionCall` part
// only at the end of a tool_use block. Once yielded, `Turn.run` registers
// a `ToolCallRequest` event, the React tool scheduler queues the call,
// and `handleCompletedTools` will fire `submitQuery(..., ToolResult)` —
// pushing a user message with `functionResponse` into history — even
// though the parent stream errored. Without preserving the matching
// tool_use on the model side, the next request body would have
// `user → user[tool_result]` with no tool_use in between, and the
// Anthropic-compatible API (DeepSeek, Anthropic, etc.) rejects with
// "tool_use_id ... must have a corresponding tool_use block in the
// previous message"
// — an unrecoverable state because Ctrl+Y's `stripOrphanedUserEntries`
// only strips trailing user entries; the lost tool_use can't be
// resurrected.
//
// Plain-text partial turns (no functionCall yielded) are deliberately
// NOT persisted — the Retry path pops the trailing user prompt and
// re-issues it; a stale partial-text model turn between them would
// either bias the retry or surface as a duplicate.
if (streamError !== null) {
  if (
    hasToolCall &&
    (thoughtContentPart || consolidatedHistoryParts.length > 0)
  ) {
    this.history.push({
      role: 'model',
      parts: [
        ...(thoughtContentPart ? [thoughtContentPart] : []),
        ...consolidatedHistoryParts,
      ],
    });
  }
  throw streamError;
}
```
@wenshao wenshao changed the title fix(core): persist partial assistant turn when stream errors mid tool_use fix(core,cli): close tool_use↔tool_result invariant across all failure paths May 15, 2026
@wenshao wenshao requested a review from yiliang114 May 15, 2026 15:19

@wenshao wenshao left a comment


⚠️ Downgraded from Approve to Comment: self-PR.

Review Summary: 2 files, +270/-36. Fixes unrecoverable wedge when streaming drops between tool_use content_block_stop and message_stop. Core logic is correct; 606 tests pass; tsc + eslint clean.

No high-confidence Critical issues found. 5 low-confidence suggestions identified (not posted as inline comments — needs human review):

  1. AbortSignal mid-tool_use persists partial turn — the catch block does not distinguish abort from network errors, so a user-cancelled stream also leaves an orphaned tool_use in history. Consider adding a guard.
  2. Zero logging on partial turn persistence — history is modified silently, which makes debugging hard. Consider adding a log line.
  3. recordAssistantTurn is called unconditionally on the error path — partial data from a failed stream is recorded as a normal turn. Consider moving the call into a guarded block.
  4. Missing thinking-only test — the reasoning-model scenario (thought:true with no functionCall) has no regression coverage.
  5. Duplicate history.push code — the error and success paths each carry an identical implementation; consider extracting a shared method.

— DeepSeek/deepseek-v4-pro via Qwen Code /review


@wenshao wenshao left a comment


⚠️ Downgraded from Request changes to Comment: self-PR.

```typescript
// The dedup MUST run regardless of `isResponding`, because the
// scheduler's `onAllToolCallsComplete` is single-shot and would
// otherwise leave the tool stuck in `completed-but-not-submitted`.
const lateRealResult: TrackedCompletedToolCall = {
```


[Critical] This fixture does not structurally satisfy TrackedCompletedToolCall, and the changed-file typecheck fails here with TS2352. As written, the PR cannot pass tsc on the changed files.

Please update the mock object to match the real TrackedCompletedToolCall shape (or choose a narrower test type that actually reflects what this test needs) instead of forcing the incompatible object through a direct assertion.

— gpt-5.5 via Qwen Code /review

```typescript
  },
}));

if (next?.role === 'user') {
```


[Critical] Appending the synthetic functionResponse after existing user parts means the Ctrl+Y retry shape becomes user: text, tool_result. The Anthropic converter preserves part order, so it serializes the retry prompt before the tool_result that is supposed to answer the immediately preceding tool_use. Anthropic-compatible backends require those tool_result blocks to come first in the following user message, so this can still reject the retry and leave the session wedged.

Insert the synthetic responses before the first non-functionResponse part when merging into an existing user turn, while preserving any real tool results that are already at the front.

Suggested change
```typescript
if (next?.role === 'user') {
  const existingParts = next.parts ?? [];
  const firstNonFunctionResponseIndex = existingParts.findIndex(
    (part) => !part.functionResponse,
  );
  const insertAt =
    firstNonFunctionResponseIndex === -1
      ? existingParts.length
      : firstNonFunctionResponseIndex;
  next.parts = [
    ...existingParts.slice(0, insertAt),
    ...syntheticParts,
    ...existingParts.slice(insertAt),
  ];
```

— gpt-5.5 via Qwen Code /review

```typescript
// subsystem is trying to escape. (Belt-and-suspenders: the same
// helper runs again inside `chat.sendMessageStream` after the user
// content is pushed, so a dangling left here by setHistory /
// compaction reordering is also caught — but doing it here keeps
```


[Suggestion] This new startChat() wiring is the only resume-time integration point that repairs a dangling model[functionCall] before the first resumed send, but the tests only cover the helper directly and mock this method in client.test.ts. A future refactor could remove or reorder this call while the helper tests still pass, regressing the --resume recovery path this PR is meant to protect.

Please add a client-level test that starts a chat with extraHistory ending in a model turn containing a functionCall, then asserts the initialized chat history contains the synthetic user[functionResponse] before any send occurs.

— gpt-5.5 via Qwen Code /review

Development

Successfully merging this pull request may close these issues.

Close tool_use↔tool_result invariant at failure point (defense in depth)
