fix(session): add idle timeout to LLM streaming response #29348
Draft
WonderJL wants to merge 1 commit into
If the model provider's SSE stream stalls without sending [DONE] or closing the socket (proxy drop, transport half-open, upstream hang), the session loop blocks on the next chunk forever and the session appears stuck with no error. Wrap the LLM event stream in Stream.timeoutOrElse so that an idle window with no events fails the stream with a tagged LLMStreamIdleTimeout error, allowing the session loop to surface the failure to the user. Applies to both the native runtime and ai-sdk runtime paths. Configurable via experimental.llm_stream_idle_timeout (ms); defaults to 120000ms. Set to 0 to disable.
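The configuration rule described above (default of 120000 ms, `0` disables) could be resolved roughly as follows. This is only a sketch: `ConfigSketch` and `resolveIdleTimeoutMs` are hypothetical names used for illustration, and only `experimental.llm_stream_idle_timeout` and `DEFAULT_LLM_STREAM_IDLE_TIMEOUT_MS` come from this PR.

```typescript
// Default idle window named in the PR description: 120000 ms (2 minutes).
const DEFAULT_LLM_STREAM_IDLE_TIMEOUT_MS = 120_000

// Hypothetical, trimmed-down config shape; the real config has many more fields.
interface ConfigSketch {
  experimental?: { llm_stream_idle_timeout?: number }
}

// Hypothetical helper: returns the idle window in ms,
// or undefined when the timeout is disabled (configured as 0).
function resolveIdleTimeoutMs(cfg: ConfigSketch): number | undefined {
  const configured = cfg.experimental?.llm_stream_idle_timeout
  const ms = configured ?? DEFAULT_LLM_STREAM_IDLE_TIMEOUT_MS
  return ms === 0 ? undefined : ms
}
```

Returning `undefined` for the disabled case lets a caller skip wrapping the stream entirely instead of special-casing a zero-length timer.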
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details.
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
Summary
Configurable via `experimental.llm_stream_idle_timeout` (ms); defaults to `120000` (2 min). Set to `0` to disable.

Why
The session loop consumes the LLM stream via `Stream.fromAsyncIterable(result.fullStream, …)` with no read/idle deadline on either runtime path. If the upstream stops sending bytes without delivering `[DONE]`, an error frame, or a TCP close (e.g. an intermittent provider hiccup or a half-open SSE connection), the loop blocks on the next chunk forever. Symptom in practice: the session never advances `step`, no `finishReason` is emitted, and the UI shows it as "not processing and not stopping." Because there is no idle deadline anywhere in the stream pipeline, any provider-side stall is unrecoverable without killing the process.

What changes
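As a rough illustration of the idle-deadline behavior this change is after, here is a plain-TypeScript stand-in that works on any `AsyncIterable`. It is not the PR's implementation, which is built on Effect's `Stream.timeoutOrElse`. The names `nextWithDeadline` and `withIdleTimeoutSketch`, and the shape of the error class beyond its `_tag`, are assumptions made for this sketch.

```typescript
// Tagged error mirroring the `_tag` named in the PR; the class body is an assumption.
class LLMStreamIdleTimeout extends Error {
  readonly _tag = "LLMStreamIdleTimeout"
  readonly idleMs: number
  constructor(idleMs: number) {
    super(`LLM stream produced no event for ${idleMs}ms`)
    this.idleMs = idleMs
  }
}

// Waits for the next element, but rejects if it does not arrive within idleMs.
function nextWithDeadline<T>(it: AsyncIterator<T>, idleMs: number): Promise<IteratorResult<T>> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new LLMStreamIdleTimeout(idleMs)), idleMs)
    it.next().then(
      (result) => {
        clearTimeout(timer)
        resolve(result)
      },
      (error) => {
        clearTimeout(timer)
        reject(error)
      },
    )
  })
}

// The deadline restarts for every element, so only a gap longer than idleMs
// fails the stream; a long but steadily progressing stream is unaffected.
async function* withIdleTimeoutSketch<T>(source: AsyncIterable<T>, idleMs: number): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]()
  while (true) {
    const result = await nextWithDeadline(it, idleMs)
    if (result.done) return
    yield result.value
  }
}
```

Note that this sketch only fails the consumer side. It does not cancel the upstream request, so real code would still need to abort the underlying transport when the timeout fires.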
- `packages/opencode/src/session/llm.ts`
  - Adds the `LLMStreamIdleTimeout` error (`_tag: "LLMStreamIdleTimeout"`) and a `withIdleTimeout(stream, idleMs)` helper built on `Stream.timeoutOrElse`.
  - Wraps the `LLMEvent` stream (both native + ai-sdk paths) after the runtime selection so the failure surface is symmetric.
  - Reads `cfg.experimental?.llm_stream_idle_timeout`, with the default constant `DEFAULT_LLM_STREAM_IDLE_TIMEOUT_MS = 120_000`. `0` disables.
- `packages/opencode/src/config/config.ts`
  - Adds `experimental.llm_stream_idle_timeout: PositiveInt` (optional), documented inline.
- `packages/opencode/test/session/llm-idle-timeout.test.ts` (new)
  - Tests for `LLMStreamIdleTimeout`.

Test plan
- `bun run typecheck` (packages/opencode): clean.
- `bun test test/session/llm-idle-timeout.test.ts`: 3/3 pass.
- `bun test test/session/{llm,llm-native,retry,session}.test.ts`: 78/78 pass; no regressions in the existing LLM/session suites.
- `openai/gpt-5.4` with a non-trivial prompt; confirm normal streaming + finish path is unaffected (default 120s window is well above expected gaps).

Notes
- `Stream.timeoutOrElse` from Effect 4 is per-element/idle, not total, which matches the symptom we want to catch.
- The `acquireRelease` finalizer still aborts the underlying request/transport.
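To make the idle-versus-total distinction concrete, the following simplified model computes when each kind of deadline would fire for a recorded sequence of chunk arrival times. It is a sketch of the concept only and says nothing about how Effect implements its timers. `StreamTrace`, `idleDeadlineFiresAt`, and `totalDeadlineFiresAt` are hypothetical names.

```typescript
// Times are in ms since the stream started. `endedAt` is undefined for a
// stream that stalls and never completes.
interface StreamTrace {
  arrivals: number[]
  endedAt?: number
}

// Time at which an idle (per-element) deadline fires, or undefined if it never does.
function idleDeadlineFiresAt(trace: StreamTrace, idleMs: number): number | undefined {
  const events = trace.endedAt === undefined ? trace.arrivals : [...trace.arrivals, trace.endedAt]
  let last = 0
  for (const t of events) {
    if (t - last > idleMs) return last + idleMs // gap too long: deadline fires mid-gap
    last = t
  }
  // No end event means the stream stalled after its last chunk.
  return trace.endedAt === undefined ? last + idleMs : undefined
}

// Time at which a total deadline fires, or undefined if the stream ends in time.
function totalDeadlineFiresAt(trace: StreamTrace, totalMs: number): number | undefined {
  return trace.endedAt !== undefined && trace.endedAt <= totalMs ? undefined : totalMs
}

// A slow but healthy stream: one chunk every 30 s for 5 minutes.
const steady: StreamTrace = {
  arrivals: Array.from({ length: 10 }, (_, i) => (i + 1) * 30_000),
  endedAt: 300_000,
}

// A stalled stream: two chunks, then silence.
const stalled: StreamTrace = { arrivals: [1_000, 2_000] }
```

In this model, with a 120000 ms window, the idle deadline never fires for `steady` while a total deadline would cut it off at 120000 ms. For `stalled`, the idle deadline fires at 122000 ms, 120000 ms after the last chunk.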