@@ -397,6 +397,75 @@ const ORPHAN_TOOL_USE_REPAIR_REASON =
397397 'Tool execution result was not recorded — likely interrupted by network ' +
398398 'failure, abort, or process exit. Treat as failure and retry if needed.' ;
399399
400+ /*
401+ * ============================================================================
402+ * Partial-tool_use repair subsystem — canonical design note.
403+ * ============================================================================
404+ *
405+ * Every comment block elsewhere in this file that mentions one of the
406+ * concepts below points back here. Per-site comments should be one or two
407+ * lines stating WHAT the local code does; the WHY lives here.
408+ *
409+ * --- The wedge ----------------------------------------------------------
410+ *
411+ * Anthropic-compatible backends (Anthropic, DeepSeek, …) reject a request
412+ * whose `user[tool_result]` blocks are not at the HEAD of the user message
413+ * immediately following the `model[tool_use]` they answer:
414+ *
415+ * "tool_use_id ... must have a corresponding tool_use block in the
416+ * previous message"
417+ *
418+ * Without a matching pair the session is unrecoverable — `stripOrphanedUser
419+ * EntriesFromHistory` only strips trailing user entries, so a lost tool_use
420+ * cannot be resurrected and the next send 400s repeatedly.
421+ *
422+ * --- The race classes that produce dangling tool_uses --------------------
423+ *
424+ * Race A (Ctrl+Y mid-flight): user retries before the in-flight tool
425+ * finishes. The scheduler's `onAllToolCallsComplete` is single-shot
426+ * per batch and would otherwise leave the tool stuck in
427+ * `completed-but-not-submitted` forever.
428+ * Race B (process crash / OOM mid-flight): the JSONL transcript captures
429+ * the dangling `model[fc]` and `--resume` rehydrates it.
430+ * Race C (network drop between `content_block_stop` of a tool_use and
431+ * the terminal `message_stop`): `processStreamResponse` re-throws
432+ * after we have already yielded a `functionCall` chunk, so the React
433+ * scheduler is on its way to submit a real `functionResponse` while
434+ * in-memory history has no matching `model[fc]`.
435+ *
436+ * --- The two-layer fix ---------------------------------------------------
437+ *
438+ * (1) Persist the partial assistant turn at the failure point in
439+ * `processStreamResponse` (`this.history.push({role: 'model', parts:
440+ * [...]})` plus the `pendingPartialAssistantTurnIndex` /
441+ * `pendingPartialAssistantRecord` markers) so the matching
442+ * `model[fc]` is on disk and in memory when the late `user[fr]`
443+ * arrives.
444+ * (2) Repair any remaining dangling `model[fc]` whose
445+ * `user[fr]` never landed (`repairOrphanedToolUseTurns`):
446+ * - SYNTHESIZE an `error` fr for ids with no matching response;
447+ * - HOIST the real fr into the immediately-adjacent user turn
448+ * when it landed in a non-adjacent later turn;
449+ * - DROP duplicate fr copies for the same id.
450+ * Then `useGeminiStream.handleCompletedTools` dedupes the
451+ * scheduler's late real result against `chat.history` so the
452+ * synthetic and the real result never collide on the wire.
453+ *
454+ * --- Partial-push marker lifecycle ---------------------------------------
455+ *
456+ * Set together on (streamError + hasToolCall + hasContent) inside
457+ * `processStreamResponse`. Cleared together by `popPartialIfPushed` on a
458+ * retryable error rollback, or flushed together to JSONL by the outer
459+ * `finally` after the retry loop exits. Defense-in-depth: every
460+ * history-mutation method (clearHistory / addHistory / setHistory /
461+ * truncateHistory / stripThoughtsFromHistory /
462+ * stripOrphanedUserEntriesFromHistory) resets both markers in lockstep so
463+ * a stale index can't shift onto an unrelated model turn and cause
464+ * `popPartialIfPushed` to splice the wrong entry. Any single-field reset
465+ * is a bug.
466+ * ============================================================================
467+ */
468+
400469/**
401470 * Walk `history` left-to-right and close every dangling tool_use ↔ tool_result
402471 * pair so the wire format the next API call sees is always
@@ -410,15 +479,9 @@ const ORPHAN_TOOL_USE_REPAIR_REASON =
410479 * `yieldMissingToolResultBlocks` (`query.ts:123-149`).
411480 * - HOIST: for any `functionCall.id` whose real `functionResponse` lives in
412481 * a non-adjacent following user turn (typical shape:
413- * `model[fc], user[text], user[fr_real]` — produced when a user aborts a
414- * long-running tool, types a follow-up, and the React scheduler's late
415- * `submitQuery` appends the real `fr` as a SEPARATE user entry), MOVE the
416- * real `fr` part out of its original turn into the adjacent one. Without
417- * hoisting, the synthesis pass correctly skips the call (a real `fr`
418- * exists somewhere later) but the wire layout still serializes
419- * `model[tool_use] → user[text] → user[tool_result]`, which
420- * Anthropic-compatible backends reject with "tool_use_id ... must have a
421- * corresponding tool_use block in the previous message".
482+ * `model[fc], user[text], user[fr_real]`), MOVE the real `fr` part out
483+ * of its original turn into the adjacent one. Required by the wire
484+ * layout — see the canonical design note above for the wedge.
422485 *
423486 * Mutates `history` in place and returns the set of injected `(callId, name)`
424487 * tuples so callers (the React tool scheduler) can dedupe a real `tool_result`
@@ -492,16 +555,10 @@ interface RepairPlan {
492555 * follows. Pure read; no mutation.
493556 *
494557 * Storing ALL locations (not just the first) is load-bearing for the
495- * duplicate case: if the same callId is echoed back more than once
496- * across the consecutive user turns (e.g.
497- * `model[fc id=cid], user[text], user[fr cid], user[fr cid]` — possible
498- * when the React scheduler retries the late `submitQuery` and a
499- * duplicate fr lands), hoisting only the first would leave the
500- * duplicate behind. The wire payload then serializes
501- * `model[tool_use] -> user[tool_result] -> user[tool_result]`
502- * and the backend rejects the trailing block as an orphan
503- * ("tool_use_id ... must have a corresponding tool_use block in the
504- * previous message").
558+ * duplicate case: e.g. `model[fc id=cid], user[text], user[fr cid],
559+ * user[fr cid]` (the React scheduler retries the late `submitQuery`).
560+ * The downstream decision phase needs every copy so it can drop
561+ * duplicates — otherwise the wire payload re-triggers the wedge.
505562 */
506563function scanModelTurn ( history : Content [ ] , modelIdx : number ) : ScanResult {
507564 const expected = new Map < string , string > ( ) ;
@@ -604,16 +661,10 @@ function planRepair(scan: ScanResult): RepairPlan {
604661 * user turn (before any non-fr parts) OR insert a new user turn
605662 * between `modelIdx` and whatever follows.
606663 *
607- * Anthropic-compatible backends require the tool_result blocks at the
608- * head of the immediately following user message; appending instead
609- * (`[text, fr]`) re-triggers the 400 the synthesis pass exists to
610- * escape. Mirrors upstream Claude Code's `hoistToolResults`.
611- *
612- * CONSEQUENCE OF REMOVAL of the head-insert: dropping this hoist (e.g.
613- * naively `next.parts = [...existing, ...partsToInject]`) re-introduces
614- * the "tool_use_id ... must have a corresponding tool_use block in the
615- * previous message" 400 the synthesis pass exists to prevent. Do not
616- * "simplify" this branch.
664+ * Step 3 inserts at the HEAD (before non-fr parts), not the tail —
665+ * mirrors upstream Claude Code's `hoistToolResults`. See the canonical
666+ * design note above `ORPHAN_TOOL_USE_REPAIR_REASON` for why a tail
667+ * append re-triggers the 400. Do not "simplify" the head-insert away.
617668 */
618669function applyRepair (
619670 history : Content [ ] ,
@@ -1172,15 +1223,13 @@ export class GeminiChat {
11721223 lastError = error ;
11731224
11741225 // If `processStreamResponse` persisted a partial assistant turn
1175- // (mid-stream error after a `functionCall` was already yielded),
1176- // every retry-and-continue path below must drop that turn first.
1177- // Otherwise a successful retry's response lands AFTER the stale
1178- // failed-attempt model turn — two consecutive `model` entries
1179- // with an orphan tool_use in the first, re-triggering the
1180- // "tool_use_id ... corresponding tool_use" 400 this fix is
1181- // supposed to escape. Paths that `break` (unretryable) keep
1182- // the partial — the caller will see it as part of the error
1183- // surface.
1226+ // (mid-stream error after a `functionCall` was already
1227+ // yielded), every retry-and-continue path below must drop
1228+ // that turn first; otherwise the retry's response lands as
1229+ // a second consecutive model turn with an orphan tool_use
1230+ // (the wedge — see the canonical note above
1231+ // `ORPHAN_TOOL_USE_REPAIR_REASON`). Paths that `break`
1232+ // (unretryable) keep the partial.
11841233 const popPartialIfPushed = ( ) => {
11851234 const idx = self . pendingPartialAssistantTurnIndex ;
11861235 if ( idx === null ) return ;
@@ -1302,21 +1351,10 @@ export class GeminiChat {
13021351 reactiveInfo . compressionStatus ===
13031352 CompressionStatus . COMPRESSED
13041353 ) {
1305- // Defense-in-depth no-op: tryCompress() succeeded
1306- // means it has already replaced this.history via
1307- // setHistory(), which calls clearPendingPartialState()
1308- // — so by the time we reach this line, the marker is
1309- // null and popPartialIfPushed splices nothing. We
1310- // keep the call as a uniformity assertion against
1311- // future refactors that might switch tryCompress to
1312- // an in-place mutation: in that world, the marker
1313- // would NOT be reset by setHistory and this call
1314- // becomes the only thing that drops the stale
1315- // partial before requestContents is rebuilt below.
1316- // Removing it would couple correctness to the
1317- // implementation detail "setHistory always clears
1318- // the marker", which the other retry branches don't
1319- // share.
1354+ // No-op today: tryCompress's setHistory has already
1355+ // cleared the marker. Kept for uniformity with the
1356+ // other retry branches in case a future in-place
1357+ // tryCompress stops resetting it.
13201358 popPartialIfPushed ( ) ;
13211359 requestContents = self . getRequestHistory ( ) ;
13221360 debugLogger . info (
@@ -1552,47 +1590,13 @@ export class GeminiChat {
15521590 // coalesced back into the preceding model entry after the loop.
15531591 successfulRecoveries ++ ;
15541592 } catch ( recoveryError ) {
1555- // If a recovery attempt fails (e.g., empty response, network
1556- // error), stop recovering and let the partial output stand.
1557- // Pop the dangling recovery message to keep history valid.
1558- //
1559- // Order matters: when the recovery stream errors AFTER
1560- // yielding a `functionCall` chunk, `processStreamResponse`
1561- // pushes a partial `model` turn into history before
1562- // re-throwing. The naive "if last is user, pop" check
1563- // would then no-op (last is now the partial `model`),
1564- // leaving `user(OUTPUT_RECOVERY_MESSAGE)` stranded as a
1565- // real user turn the user never sent. Two consequences:
1566- // - the control-prompt text (which carries instructions
1567- // meant only for the model's own continuation context)
1568- // pollutes durable history and biases later turns,
1569- // - the inline repair on the next sendMessageStream
1570- // synthesizes an `error` `functionResponse` for the
1571- // dangling `functionCall`, which the
1572- // `handleCompletedTools` history-based dedup then drops
1573- // when the React scheduler's REAL tool result arrives,
1574- // so the model sees an "execution result was not
1575- // recorded" error for a tool that actually succeeded.
1576- // Pop the partial model turn FIRST, then the recovery
1577- // user turn. The partial-push markers are also cleared
1578- // in lockstep so the outer `finally` JSONL flush can't
1579- // resurrect a partial we just deleted from live history.
1580- //
1581- // Index-checked pop instead of a positional `pop()` so
1582- // we match the diagnostic standard set by
1583- // `popPartialIfPushed` above (splice at `idx` + warn on
1584- // bounds/role mismatch). The two rollback strategies
1585- // share an undocumented positional assumption: nothing
1586- // mutates `this.history` between
1587- // `processStreamResponse`'s push and the for-await
1588- // catch here. If a future change inserts a mutation in
1589- // that window (compression side-effect, abort-signal
1590- // handler, telemetry hook), a naked
1591- // `history.pop()` would silently remove the wrong
1592- // entry while `clearPendingPartialState()` clears
1593- // markers for the actual partial — leaving it
1594- // permanently stranded with no log trail. The warn
1595- // makes any future violation visible immediately.
1593+ // Pop the partial `model[fc]` FIRST (if processStreamResponse
1594+ // pushed one before re-throwing), THEN the recovery user
1595+ // turn. Reversed order would strand `OUTPUT_RECOVERY_MESSAGE`
1596+ // as a real user turn. Index-checked pop mirrors
1597+ // `popPartialIfPushed` above — see the design note above
1598+ // `ORPHAN_TOOL_USE_REPAIR_REASON` for the wedge mechanism
1599+ // and the partial-push marker lifecycle.
15961600 const expectedIdx = self . pendingPartialAssistantTurnIndex ;
15971601 const lastIdx = self . history . length - 1 ;
15981602 if (
@@ -1936,34 +1940,18 @@ export class GeminiChat {
19361940 */
19371941 addHistory ( content : Content ) : void {
19381942 this . history . push ( content ) ;
1939- // The marker is per-send-attempt. Today's callers (cancelled-tool
1940- // synthesis in useGeminiStream, ACP session injects,
1941- // shellCommandProcessor) only run between sends, so the originating
1942- // sendMessageStream has either already popped the partial via the
1943- // retry loop or hit an unrecoverable break — in both cases the
1944- // marker is no longer load-bearing.
1945- //
1946- // If a future code path ever calls addHistory BETWEEN the partial
1947- // push and the retry attempt, silently clearing the marker would
1948- // strand the partial: popPartialIfPushed would no-op, the failed
1949- // attempt's `model[functionCall]` would survive into the retry,
1950- // and a successful retry's response would land as a SECOND
1951- // consecutive model turn (the wedge this whole subsystem exists
1952- // to prevent). The log below makes that coupling observable —
1953- // anyone investigating a stale-partial bug will see this log line
1954- // pointing straight at the offending caller. Error-level (not
1955- // warn) because this is a true invariant violation: the existing
1956- // call graph cannot legitimately hit this branch, so any
1957- // occurrence is a real bug in a future caller, not noise.
1943+ // addHistory only runs between sends, so the partial-push marker
1944+ // should already be cleared. If it is not, a new caller is
1945+ // violating that invariant — surface it at error level so the
1946+ // offending stack is visible. See the design note above
1947+ // `ORPHAN_TOOL_USE_REPAIR_REASON` for the marker lifecycle.
19581948 if (
19591949 this . pendingPartialAssistantTurnIndex !== null ||
19601950 this . pendingPartialAssistantRecord !== null
19611951 ) {
19621952 debugLogger . error (
19631953 '[INVARIANT_VIOLATION] addHistory called while a partial-push ' +
1964- 'marker is active — clearing it. This is unexpected during an active sendMessageStream ' +
1965- 'and likely indicates a new caller violating the between-sends ' +
1966- 'invariant. See comment at GeminiChat.addHistory for context.' ,
1954+ 'marker is active — clearing it.' ,
19671955 ) ;
19681956 }
19691957 this . clearPendingPartialState ( ) ;
@@ -2251,30 +2239,20 @@ export class GeminiChat {
22512239 }
22522240 }
22532241
2254- // Mid-stream failure recovery: if the upstream stream threw (typical on
2255- // weak networks — SSE cut between a tool_use `content_block_stop` and
2256- // the terminal `message_stop`) AND any `functionCall` chunk was already
2257- // yielded to consumers, we must persist the partial assistant turn here.
2258- //
2259- // The content generator (Anthropic / OpenAI) emits a `functionCall` part
2260- // only at the end of a tool_use block. Once yielded, `Turn.run` registers
2261- // a `ToolCallRequest` event, the React tool scheduler queues the call,
2262- // and `handleCompletedTools` will fire `submitQuery(..., ToolResult)` —
2263- // pushing a user message with `functionResponse` into history — even
2264- // though the parent stream errored. Without preserving the matching
2265- // tool_use on the model side, the next request body would have
2266- // `user → user[tool_result]` with no tool_use in between, and the
2267- // Anthropic-compatible API (DeepSeek, Anthropic, etc.) rejects with
2268- // "tool_use_id ... must have a corresponding tool_use block in the
2269- // previous message"
2270- // — an unrecoverable state because Ctrl+Y's `stripOrphanedUserEntries`
2271- // only strips trailing user entries; the lost tool_use can't be
2272- // resurrected.
2242+ // Mid-stream failure recovery (Race C in the canonical note above
2243+ // `ORPHAN_TOOL_USE_REPAIR_REASON`): if the upstream stream threw
2244+ // AFTER a `functionCall` chunk was already yielded — typical on
2245+ // weak networks: SSE cut between a tool_use `content_block_stop`
2246+ // and the terminal `message_stop` — we persist the partial
2247+ // assistant turn so the React scheduler's incoming
2248+ // `user[functionResponse]` has a matching `model[tool_use]` to
2249+ // pair with.
22732250 //
2274- // Plain-text partial turns (no functionCall yielded) are deliberately
2275- // NOT persisted — the Retry path pops the trailing user prompt and
2276- // re-issues it; a stale partial-text model turn between them would
2277- // either bias the retry or surface as a duplicate.
2251+ // Plain-text partial turns (no functionCall yielded) are
2252+ // deliberately NOT persisted — the Retry path pops the trailing
2253+ // user prompt and re-issues it; a stale partial-text model turn
2254+ // between them would either bias the retry or surface as a
2255+ // duplicate.
22782256 if ( streamError !== null ) {
22792257 // Reuse the `willPersistToHistory` gate from the recordAssistantTurn
22802258 // block above instead of re-deriving it. When `streamError !== null`,
0 commit comments