fix(cli): unstick streamingState after Esc-cancel + close leaked turn span#145

Merged
mabry1985 merged 1 commit into dev from fix/cancel-clears-stuck-state
Apr 26, 2026

@mabry1985 mabry1985 commented Apr 26, 2026

Why

Reproduced on Langfuse: session 442ed5c7, turn ba924d250d7c. The trace shows a 739-second turn with zero LLM activity in the middle — only the turn root span and two LLM calls clustered at the very end (which turn out to be the recap + prompt-suggestion calls firing on streamingState=Idle). The 12 minutes weren't model work; the OTel turn span just leaked open and got force-closed when the next prompt finally went through.

User-side, the symptom was: press Esc to cancel a turn → anything typed afterwards is silently dropped → app feels hung.

Root cause (two bugs interacting)

  1. cancelOngoingRequest doesn't clear stuck toolCalls. It aborts the controller and flips isResponding=false, but if a tool ignores its AbortSignal (or finishes between the cancel firing and responseSubmittedToGemini being set), the toolCall stays in a non-terminal state or is never marked as submitted. streamingState (useGeminiStream.ts:424-437) then computes Responding, and submitQuery's guard (1305-1314) silently returns early on every subsequent submission. The in-code comment at lines 244-245 already documents this class of bug for a different root cause.
  2. cancelOngoingRequest doesn't call endTurnSpan, so the OTel turn span leaks. The recap + usePromptSuggestions LLM calls that fire on streamingState=Idle then nest under the dead span — Langfuse reports the turn as 12 minutes long when the actual model work was 1.7s.
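
To make the first interaction concrete, here is a minimal sketch of how a single stuck toolCall can pin the derived state at Responding and cause the guard to drop input. The types, the `deriveStreamingState` helper, and the status names are simplified assumptions for illustration; they are not copied from useGeminiStream.ts.

```typescript
// Hypothetical, simplified shapes. The real hook's types may differ.
type ToolStatus =
  | 'scheduled'
  | 'validating'
  | 'awaiting_approval'
  | 'executing'
  | 'success'
  | 'error'
  | 'cancelled';

interface ToolCall {
  callId: string;
  status: ToolStatus;
  responseSubmittedToGemini?: boolean;
}

type StreamingState = 'Idle' | 'Responding' | 'WaitingForConfirmation';

const TERMINAL = new Set<ToolStatus>(['success', 'error', 'cancelled']);

// Assumed derivation: any call that is non-terminal, or terminal but not yet
// marked as submitted, keeps the UI in Responding even if isResponding=false.
function deriveStreamingState(
  isResponding: boolean,
  toolCalls: readonly ToolCall[],
): StreamingState {
  if (toolCalls.some((tc) => tc.status === 'awaiting_approval')) {
    return 'WaitingForConfirmation';
  }
  const pending = toolCalls.some((tc) =>
    TERMINAL.has(tc.status) ? !tc.responseSubmittedToGemini : true,
  );
  return isResponding || pending ? 'Responding' : 'Idle';
}

// Pre-fix guard behaviour: the submission is dropped without any feedback.
function submitQuery(state: StreamingState, query: string, sent: string[]): void {
  if (state !== 'Idle') {
    return;
  }
  sent.push(query);
}
```

With `isResponding` already false after Esc, a tool that ignored its AbortSignal is enough to keep `deriveStreamingState` at Responding, so every later `submitQuery` call returns without sending.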

Inheritance check

  • Upstream QwenLM/qwen-code main: identical cancelOngoingRequest shape, identical silent-return guard. Same bug. Issue #914 closed without resolution.
  • Further upstream google-gemini/gemini-cli: PR #21960 (which closed #21096) fixed a different cancel-related issue — retry-loop loading indicator showing stale "still on it" text. Not the same fix. Open issue #18525 ("Agent Stuck between Responses") is essentially this same symptom, still unresolved.
  • protoCLI: the toolCall-stuck-state side is inherited. The leaked-turn-span side is fork-only because startTurnSpan / endTurnSpan and the activeTurnContext are part of our agent harness (see docs/architecture/divergence-from-upstream.md).

Fix

  1. useReactToolScheduler exposes forceCancelStaleToolCalls() — flips responseSubmittedToGemini=true on terminal calls and synthesizes a cancelled state for non-terminal calls (with a "User cancelled. Tool was force-cleared after abort signal did not stop it within the grace window" message in responseParts so downstream consumers don't choke).
  2. cancelOngoingRequest:
    • calls markToolsAsSubmitted on every current toolCall immediately (handles the common case),
    • schedules a 3s setTimeout that runs forceCancelStaleToolCalls and surfaces a WARNING if anything had to be force-cleared,
    • calls endTurnSpan('ok') so the recap/suggestion calls don't attach to the dead span.
  3. submitQuery: when dropping a submission because streamingState !== Idle, surfaces a clear WARNING explaining the state (Responding / WaitingForConfirmation / Backgrounded) and the next step. No more silent drops.
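
The force-clear step in item 1 can be sketched as a pure function over the toolCall list. This is an illustration under assumptions, not the hook's actual code: the `ToolCall` shape, the string-based `responseParts`, and the returned `forced` count are hypothetical, and setting `responseSubmittedToGemini` on the synthesized cancelled entries is assumed because the state would otherwise stay non-Idle.

```typescript
// Hypothetical, simplified shapes. The real scheduler's types may differ.
type ToolStatus = 'scheduled' | 'validating' | 'executing' | 'success' | 'error' | 'cancelled';

interface ToolCall {
  callId: string;
  status: ToolStatus;
  responseSubmittedToGemini?: boolean;
  responseParts?: string[]; // simplified: the real parts are likely structured objects
}

const TERMINAL = new Set<ToolStatus>(['success', 'error', 'cancelled']);

// Message text taken from the PR description.
const FORCE_CLEAR_MESSAGE =
  'User cancelled. Tool was force-cleared after abort signal did not stop it within the grace window';

function forceCancelStale(
  toolCalls: readonly ToolCall[],
): { next: ToolCall[]; forced: number } {
  let forced = 0;
  const next = toolCalls.map((tc): ToolCall => {
    if (TERMINAL.has(tc.status)) {
      // Already finished: only the submitted flag needs flipping.
      return { ...tc, responseSubmittedToGemini: true };
    }
    // Still running or queued: synthesize a cancelled entry with a response
    // so downstream consumers always see a well-formed result.
    forced += 1;
    return {
      ...tc,
      status: 'cancelled',
      responseSubmittedToGemini: true, // assumption, see lead-in
      responseParts: [FORCE_CLEAR_MESSAGE],
    };
  });
  return { next, forced };
}
```

Returning the count of force-cleared calls lets the caller decide whether a WARNING is needed, which matches the "if anything had to be force-cleared" condition in item 2.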

Tests

  • 3,767 cli tests + 5,337 core tests pass.
  • useGeminiStream.test.tsx's 49-test cancellation suite continues to pass with the new flow.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of user-initiated cancellations to properly clean up stuck or stale operations.
    • Fixed state synchronization issues that could leave incomplete operations unresolved.
  • New Features

    • Added informative warning messages when operations cannot be submitted during active requests, guiding users on next steps.
    • Enhanced automatic cleanup of abandoned operations with 3-second timeout mechanism.

… span

Symptom (also reported on the Langfuse trace, session 442ed5c7,
turn ba924d250d7c with 739s latency and zero LLM activity in the
middle): user presses Esc to cancel, then any subsequent input is
silently dropped — UI stays in the loading indicator forever.

Root cause:
  - cancelOngoingRequest aborts the AbortController and flips
    isResponding=false, but does not clear toolCalls. If a tool
    ignores its AbortSignal, it stays in 'executing' / 'scheduled' /
    etc., or finishes with responseSubmittedToGemini=false. Either
    way, streamingState computes 'Responding' (useGeminiStream.ts:
    424-437) and submitQuery's guard (1305-1314) silently drops the
    next user submission. The in-code comment at lines 244-245 even
    flags this class of bug for a different cause.
  - cancelOngoingRequest never calls endTurnSpan, so the OTel turn
    span leaks. The recap + prompt-suggestion LLM calls that fire on
    streamingState=Idle then attach to the dead span — Langfuse
    reports the turn as 12 minutes long when the actual model work
    was 1.7 seconds.

Inheritance: upstream qwen-code has the identical cancelOngoingRequest
shape and the identical silent-return guard. gemini-cli upstream's
PR #21960 (closing #21096) addressed a different cancel-related bug
(retry-loop loading indicator showing stale "still on it" text), not
the stuck-toolCalls case. Issue #18525 there ("Agent Stuck between
Responses") is essentially the same symptom and is still open. So
this is inherited, not introduced — but the leaked turn span is
fork-only, since startTurnSpan/endTurnSpan are part of our agent
harness.

Fix:
- useReactToolScheduler exposes a new forceCancelStaleToolCalls()
  that flips responseSubmittedToGemini=true on terminal calls and
  synthesizes a 'cancelled' state for any non-terminal call (with a
  clear "User cancelled. Tool was force-cleared after the abort
  signal did not stop it within the grace window" message in the
  responseParts so downstream consumers don't choke).
- cancelOngoingRequest in useGeminiStream:
  * marks every current toolCall as submitted immediately (handles
    the common case where the tool finished but the flag wasn't
    flipped),
  * schedules a 3s setTimeout that calls forceCancelStaleToolCalls
    and surfaces a WARNING if anything had to be force-cleared so
    the user knows the underlying process may still be running,
  * calls endTurnSpan('ok') so the recap/suggestion LLM calls don't
    keep nesting under a dead turn span in Langfuse.
- submitQuery no longer silently drops submissions when streamingState
  is non-Idle. It logs a clear WARNING explaining what state we're
  in (Responding / WaitingForConfirmation / Backgrounded) and what
  the user should do (approve the tool, wait, or press Esc).
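
A rough sketch of the cancelOngoingRequest ordering described above, with every
collaborator injected so the sequence can be exercised without React or real
timers. The CancelDeps interface, the number returned by
forceCancelStaleToolCalls, the schedule parameter, and the warning wording are
assumptions made for illustration only.

```typescript
// Hypothetical dependency bundle; in the real hook these would be closures
// over React state, and `schedule` would simply be setTimeout.
interface CancelDeps {
  abort: () => void;
  markToolsAsSubmitted: () => void;
  forceCancelStaleToolCalls: () => number; // assumed: number of calls force-cleared
  endTurnSpan: (status: 'ok' | 'error') => void;
  warn: (message: string) => void;
  schedule: (fn: () => void, ms: number) => void;
}

const GRACE_MS = 3000; // the 3s grace window from the fix description

function cancelOngoingRequest(deps: CancelDeps): void {
  deps.abort();
  // Common case: the tool already finished, only the submitted flag is missing.
  deps.markToolsAsSubmitted();
  // Slow path: give tools a grace window to honour the abort signal, then
  // force-clear whatever is still non-terminal and tell the user about it.
  deps.schedule(() => {
    const forced = deps.forceCancelStaleToolCalls();
    if (forced > 0) {
      deps.warn(
        `${forced} tool call(s) were force-cleared; the underlying process may still be running.`,
      );
    }
  }, GRACE_MS);
  // Close the turn span right away so later LLM calls do not nest under it.
  deps.endTurnSpan('ok');
}
```

Injecting the scheduler keeps the grace-window path testable: a test can capture
the callback, run it by hand, and check that the warning only appears when
something was actually force-cleared.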

Tests: 3,767 cli tests + 5,337 core tests pass. Existing cancellation
tests in useGeminiStream.test.tsx (49 tests in that file) continue to
pass with the new flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mabry1985 mabry1985 merged commit da4928f into dev Apr 26, 2026
1 of 2 checks passed
@mabry1985 mabry1985 deleted the fix/cancel-clears-stuck-state branch April 26, 2026 22:48

coderabbitai Bot commented Apr 26, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1e3be83c-9574-485b-a84e-8d13057f38a4

📥 Commits

Reviewing files that changed from the base of the PR and between d697ddf and a5005ca.

📒 Files selected for processing (2)
  • packages/cli/src/ui/hooks/useGeminiStream.ts
  • packages/cli/src/ui/hooks/useReactToolScheduler.ts

Walkthrough

The changes add a cancellation mechanism for stale tool calls. useReactToolScheduler exports a new forceCancelStaleToolCalls callback that synthesizes cancelled entries and updates tool-call state. useGeminiStream integrates this callback into its cancellation path, closing OTel spans, blocking new submissions, and emitting warning messages when tool calls are cleared.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Tool Call Cancellation Enhancement: packages/cli/src/ui/hooks/useReactToolScheduler.ts, packages/cli/src/ui/hooks/useGeminiStream.ts | Added forceCancelStaleToolCalls callback to useReactToolScheduler that marks non-terminal tool calls as cancelled and updates state. useGeminiStream now uses this callback on user cancellation, closes OTel turn spans, blocks new submissions, and emits warnings when stale tool calls are cleared. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

mabry1985 pushed a commit that referenced this pull request Apr 28, 2026
Two leaks in processGeminiStreamEvents that #145 / #152 didn't cover:

1. case ServerGeminiEventType.UserCancelled ended with a plain
   `break`, which only exits the switch. The for-await loop kept iterating,
   and any toolCallRequests already collected in this pass would still
   get scheduled at the post-loop scheduleToolCalls call.

2. The post-loop scheduleToolCalls fired unconditionally. If abort
   landed in the same tick as a chunk that carried finish_reason=tool_calls
   (model emitted tool calls while user was hitting Esc), the scheduler
   added the tools in 'validating' state with an already-aborted signal.
   The per-tool aborted check at coreToolScheduler.ts:861 eventually
   marks them 'cancelled', but the React state briefly flips
   streamingState back to Responding — sticking the spinner until the
   3s forceCancelStaleToolCalls grace window catches up.

Fix: UserCancelled returns early; scheduleToolCalls is gated on
!signal.aborted. Two new tests in the Cancellation describe block
cover both arms (UserCancelled-then-toolCallRequest, and
toolCallRequest-arriving-after-abort).
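
Both arms of that fix can be illustrated with a small sketch. It is a simplified
stand-in, not the real processGeminiStreamEvents: the event union, the string
event names, and the synchronous loop over an array (instead of for-await over
a stream) are assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical, simplified event shapes.
type StreamEvent =
  | { type: 'content'; text: string }
  | { type: 'tool_call_request'; name: string }
  | { type: 'user_cancelled' };

function processEvents(
  events: readonly StreamEvent[],
  signal: { readonly aborted: boolean }, // an AbortSignal also fits this shape
  scheduleToolCalls: (names: string[]) => void,
): 'completed' | 'cancelled' {
  const toolCallRequests: string[] = [];
  for (const event of events) {
    switch (event.type) {
      case 'content':
        break;
      case 'tool_call_request':
        toolCallRequests.push(event.name);
        break;
      case 'user_cancelled':
        // A bare `break` here would only leave the switch and the loop would
        // continue; returning skips the post-loop scheduling entirely.
        return 'cancelled';
    }
  }
  // Gate scheduling on the abort state so tools are never queued with an
  // already-aborted signal.
  if (toolCallRequests.length > 0 && !signal.aborted) {
    scheduleToolCalls(toolCallRequests);
  }
  return signal.aborted ? 'cancelled' : 'completed';
}
```

The early return covers the UserCancelled-then-toolCallRequest case, and the
`!signal.aborted` gate covers a tool call request that arrives in the same tick
as the abort.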

51/51 useGeminiStream tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Request gets stuck with “This is taking a bit longer, we're still on it” after canceling request
