fix(core): bump stream stall watchdog default to 5 min #155
Long agentic turns (parallel subagent dispatch, deep mid-stream planning pauses) were tripping the 90 s per-chunk idle watchdog and surfacing as "Stream stalled: no data received for 90s" even on healthy connections. Bump the default to 300 s: this still catches genuinely frozen connections, but gives legitimately slow-but-steady streams the headroom they need. The PROTO_STREAM_STALL_TIMEOUT_MS env override is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
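For reviewers unfamiliar with the mechanism, a per-chunk idle watchdog can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in turn.ts: the `withStallWatchdog` helper and the shape of the `StreamStallError` class are hypothetical (only the error name and the message text appear in this PR). The point it shows is that the timer restarts on every chunk, so the timeout bounds the gap between chunks rather than the total length of the turn.

```typescript
// Illustrative only: names and structure are assumptions, not the code in turn.ts.
class StreamStallError extends Error {
  constructor(timeoutMs: number) {
    super(`Stream stalled: no data received for ${Math.round(timeoutMs / 1000)}s`);
    this.name = 'StreamStallError';
  }
}

// Wraps an async iterable so that every chunk must arrive within `timeoutMs`
// of the previous one. Total stream duration is unbounded as long as chunks
// keep arriving inside that window.
async function* withStallWatchdog<T>(
  source: AsyncIterable<T>,
  timeoutMs: number,
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      // Rejects if no chunk shows up before the idle window elapses.
      const stall = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new StreamStallError(timeoutMs)), timeoutMs);
      });
      // Whichever settles first wins; the timer is cleared either way,
      // before the chunk is handed to the consumer.
      const result = await Promise.race([iterator.next(), stall]).finally(() =>
        clearTimeout(timer),
      );
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Best-effort cleanup of the underlying stream; not awaited because a
    // hung source may never settle.
    iterator.return?.().catch(() => undefined);
  }
}
```

With a sketch like this, raising the timeout from 90 s to 300 s only changes how long a silent gap is tolerated; a stream that emits a chunk every couple of minutes would run to completion, while one that goes silent for more than five minutes would still raise the stall error.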
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/core/turn.ts`:
- Around line 45-47: STREAM_STALL_TIMEOUT_MS currently uses parseInt on an env
var without validation; after parsing the raw value for STREAM_STALL_TIMEOUT_MS,
validate it (reject NaN or values <= 0), apply a safe default of 300000 if
invalid, and clamp the parsed value to a reasonable min/max range (e.g., min
1000 ms, max something large) before assigning; locate and update the constant
definition for STREAM_STALL_TIMEOUT_MS in turn.ts so it uses the
validated/clamped numeric value instead of the raw parseInt result.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cd97024d-d04a-4072-adc8-7d0d75e9c6f3
📒 Files selected for processing (2)
- docs/reference/settings.md
- packages/core/src/core/turn.ts
 const STREAM_STALL_TIMEOUT_MS = parseInt(
-  process.env['PROTO_STREAM_STALL_TIMEOUT_MS'] ?? '90000',
+  process.env['PROTO_STREAM_STALL_TIMEOUT_MS'] ?? '300000',
   10,
Harden env timeout parsing to avoid invalid watchdog values.
parseInt(process.env[...] ?? '300000', 10) returns NaN for non-numeric input and passes zero or negative values through unchanged, either of which can cause immediate or undefined stall behavior. Add a bounded fallback after parsing.
Suggested patch
-const STREAM_STALL_TIMEOUT_MS = parseInt(
- process.env['PROTO_STREAM_STALL_TIMEOUT_MS'] ?? '300000',
- 10,
-);
+const parsedStreamStallTimeoutMs = Number.parseInt(
+ process.env['PROTO_STREAM_STALL_TIMEOUT_MS'] ?? '300000',
+ 10,
+);
+const STREAM_STALL_TIMEOUT_MS =
+ Number.isFinite(parsedStreamStallTimeoutMs) && parsedStreamStallTimeoutMs > 0
+ ? parsedStreamStallTimeoutMs
+ : 300_000;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/core/turn.ts` around lines 45 - 47, STREAM_STALL_TIMEOUT_MS
currently uses parseInt on an env var without validation; after parsing the raw
value for STREAM_STALL_TIMEOUT_MS, validate it (reject NaN or values <= 0),
apply a safe default of 300000 if invalid, and clamp the parsed value to a
reasonable min/max range (e.g., min 1000 ms, max something large) before
assigning; locate and update the constant definition for STREAM_STALL_TIMEOUT_MS
in turn.ts so it uses the validated/clamped numeric value instead of the raw
parseInt result.
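The suggested patch above validates the parsed value but does not clamp it, while the prompt also asks for a min/max range. A rough sketch of a version that does both might look like the following. The helper name `resolveStallTimeoutMs` and the one-hour upper bound are assumptions made for illustration; the review only specifies "min 1000 ms, max something large".

```typescript
// Hypothetical helper and bounds; only the 300000 default and the 1000 ms
// minimum come from the review text.
const DEFAULT_STALL_TIMEOUT_MS = 300_000;
const MIN_STALL_TIMEOUT_MS = 1_000;
const MAX_STALL_TIMEOUT_MS = 3_600_000; // assumption: 1 hour as "something large"

function resolveStallTimeoutMs(raw: string | undefined): number {
  // Unset variable: use the default.
  if (raw === undefined) return DEFAULT_STALL_TIMEOUT_MS;

  const parsed = Number.parseInt(raw, 10);

  // Reject NaN (non-numeric or empty input) and non-positive values.
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_STALL_TIMEOUT_MS;

  // Clamp valid values into the allowed range.
  return Math.min(Math.max(parsed, MIN_STALL_TIMEOUT_MS), MAX_STALL_TIMEOUT_MS);
}
```

In turn.ts this would presumably be called as `resolveStallTimeoutMs(process.env['PROTO_STREAM_STALL_TIMEOUT_MS'])`. Note that with these bounds the `PROTO_STREAM_STALL_TIMEOUT_MS=2000` value used in the test plan is still accepted as-is.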
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
Summary
- Long agentic turns were tripping the 90 s per-chunk idle watchdog and surfacing as `Stream stalled: no data received for 90s` even on healthy connections.
- Bump `STREAM_STALL_TIMEOUT_MS` from `90_000` → `300_000` (5 min). Still catches genuine frozen connections, but gives slow-but-steady streams the headroom they need.
- `PROTO_STREAM_STALL_TIMEOUT_MS` env override is unchanged; set it lower if you want tighter detection.

Test plan
- With `PROTO_STREAM_STALL_TIMEOUT_MS=2000`, kill the upstream connection mid-stream, and confirm `StreamStallError` still fires + retries.
- `streamStall.test.ts` suite passes (no test pinned to the old default).

🤖 Generated with Claude Code