fix(transport): isolate stderr callback failures, continue reading lines #932
Merged
ashwin-ant merged 1 commit on May 14, 2026
`SubprocessCLITransport._handle_stderr` wrapped the entire ``async for`` loop in a single ``except Exception: pass``, so a raise from the user-provided ``options.stderr`` callback was caught at the outer level: the loop terminated and no further stderr lines were delivered for the rest of the session. The failure was silent: no log, no traceback.

The regression test reproduces this: a callback that raises on the first line previously caused lines 2 and 3 to be dropped; with the fix, all three lines are delivered.

Move the ``try/except`` inside the loop and log at debug level, so a buggy callback fails per line but doesn't disable stderr piping. Also log (instead of silently swallowing) at the outer level, so a stream-read failure is at least visible at debug level.

Closes anthropics#929

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
3 tasks
ashwin-ant approved these changes on May 14, 2026
Codecov Report
❌ Patch coverage is

Additional details and impacted files

Coverage Diff

| Metric | main | #932 |
| --- | --- | --- |
| Coverage | ? | 89.27% |
| Files | ? | 23 |
| Lines | ? | 3982 |
| Branches | ? | 0 |
| Hits | ? | 3555 |
| Misses | ? | 427 |
| Partials | ? | 0 |
ashwin-ant pushed a commit that referenced this pull request on May 14, 2026
…ic wait (#933)

## Summary

Fixes #928. The two eager-flush tests assumed that 2 `await asyncio.sleep(0)` yields between consecutive `enqueue` calls were enough for each drain to complete and append. Under lock contention between drains, the path from `enqueue` to `store.append` needs ~4 turns (drain releases lock → next drain acquires it → `wait_for(store.append)` schedules its inner task → record). Both tests fail 5/5 locally on Python 3.11.14 / macOS arm64; CI got lucky on event-loop scheduling at merge time of #905. See #928 for the full probe and yield-count sweep.

## Changes

### Unit-level test (`test_eager_mode_flushes_per_frame`)

Replace the fixed `sleep(0)` count with a new `_wait_until(predicate, timeout=1.0)` helper that yields until `len(store.append_calls)` reaches the expected value, with a 1-second deadline. This is deterministic: it works regardless of Python / pytest-asyncio / OS scheduling differences.

### Integration-level test (`test_eager_flush_mode_appends_per_frame_before_result`)

Convert `_make_mock_transport`'s `yield_between: bool` to `yields_between: int` (default `0`) and pass `yields_between=10` for this test, so the mock yields the loop enough times between frames for each eager flush to drain before the next frame arrives. This leaves robust headroom: 4 was the observed minimum, and 10 leaves room for slower environments. The signature change touches only one caller (this same test); other callers omit the parameter and behave identically to before.

## Test plan

- [x] `for i in 1 2 3 4 5; do uv run pytest <both tests> -q; done` → 5/5 passed (was 5/5 failed before)
- [x] `uv run pytest tests/test_transcript_mirror.py` → 42/42 passed
- [x] `ruff check / ruff format` clean

## Related issues / PRs

- Filed alongside two other fixes from the same audit pass: #929 (stderr callback swallow → PR #932), #930 (cancellation log noise → PR #931). Independent of those.

Co-authored-by: Xian Zheng <xian.zheng@challenger.gauntletai.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
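
For readers unfamiliar with the pattern, the deadline-bounded wait described in the commit message above could look roughly like the sketch below. The body of the real `_wait_until` helper is not shown in this thread, so everything here is an assumption: the name `wait_until`, the use of `time.monotonic()`, the `TimeoutError`, and the `demo` coroutine are all hypothetical and exist only to illustrate yielding until a predicate holds instead of counting `sleep(0)` calls.

```python
import asyncio
import time
from collections.abc import Callable


async def wait_until(predicate: Callable[[], bool], timeout: float = 1.0) -> None:
    """Yield to the event loop until predicate() is true or the deadline passes.

    Hypothetical sketch; not the helper from the referenced commit.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        # One scheduling turn per iteration; other tasks get to run here.
        await asyncio.sleep(0)


async def demo() -> int:
    appended: list[int] = []

    async def slow_append() -> None:
        # Takes several loop turns before appending, like a contended drain.
        for _ in range(4):
            await asyncio.sleep(0)
        appended.append(1)

    task = asyncio.create_task(slow_append())
    # No assumption about how many turns the append needs.
    await wait_until(lambda: len(appended) == 1)
    await task
    return len(appended)
```

The point of the design is that the test asserts on the observable condition rather than on a scheduler-dependent number of yields; the timeout only bounds how long a genuinely broken run can hang.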
Summary
Fixes #929.
`SubprocessCLITransport._handle_stderr` wrapped the entire `async for` loop in a single `except Exception: pass`, so a raise from the user-provided `options.stderr` callback was caught at the outer level: the loop terminated and no further stderr lines were delivered for the rest of the session. The failure was silent: no log, no traceback.

The repro in #929 confirmed that a callback that raises on the first line dropped all subsequent lines (`callback_raised_count = 1` for a 2-line stream). The contract on `stderr: Callable[[str], None]` (types.py:1741) doesn't document any "must not raise" constraint, so this is a bug, not user error.

Changes
- `src/claude_agent_sdk/_internal/transport/subprocess_cli.py`: per-line `try/except` around `self._options.stderr(line_str)` so a buggy callback fails for that one line but the loop continues. The outer `except Exception: pass` becomes `logger.debug(..., exc_info=True)` so stream-read failures are at least visible at debug level. The `except anyio.ClosedResourceError` for legitimate end-of-stream is preserved.
- `tests/test_transport.py`: regression test `test_stderr_callback_raise_does_not_terminate_loop`, which uses a 3-line stream, has the callback raise on line 1, and asserts all 3 lines are delivered.

Test plan
- `uv run pytest tests/test_transport.py`: 90 passed
- `uv run mypy src/`: clean
- `ruff check / ruff format`: clean
- `count = 3` (was `count = 1` before fix)
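
To make the behavioral difference concrete, here is a minimal, self-contained sketch of the two loop shapes described above. It is not the SDK code: it uses plain `asyncio` rather than `anyio`, and `fake_stderr_lines`, `handle_stderr_outer_only`, `handle_stderr_per_line`, `make_callback`, and `run_demo` are hypothetical names invented for illustration. Only the placement of the `try/except` and the debug-level logging mirror the change in this PR.

```python
import asyncio
import logging
from collections.abc import AsyncIterator, Callable

logger = logging.getLogger(__name__)


async def fake_stderr_lines() -> AsyncIterator[str]:
    # Hypothetical stand-in for the subprocess stderr stream.
    for line in ("line 1", "line 2", "line 3"):
        yield line


async def handle_stderr_outer_only(callback: Callable[[str], None]) -> None:
    # Old shape: one try/except around the whole loop, so the first
    # callback error ends the loop and is swallowed without a log.
    try:
        async for line in fake_stderr_lines():
            callback(line)
    except Exception:
        pass


async def handle_stderr_per_line(callback: Callable[[str], None]) -> None:
    # New shape: a callback error is contained to the line that caused it.
    try:
        async for line in fake_stderr_lines():
            try:
                callback(line)
            except Exception:
                logger.debug("stderr callback raised", exc_info=True)
    except Exception:
        # Stream-read failures are logged instead of silently dropped.
        logger.debug("stderr stream read failed", exc_info=True)


def make_callback(seen: list[str]) -> Callable[[str], None]:
    def callback(line: str) -> None:
        seen.append(line)
        if len(seen) == 1:
            raise RuntimeError("callback failed on first line")

    return callback


def run_demo() -> tuple[list[str], list[str]]:
    old_seen: list[str] = []
    new_seen: list[str] = []
    asyncio.run(handle_stderr_outer_only(make_callback(old_seen)))
    asyncio.run(handle_stderr_per_line(make_callback(new_seen)))
    return old_seen, new_seen
```

With a callback that raises on its first invocation, the outer-only variant invokes the callback once and stops, while the per-line variant invokes it for all three lines, which matches the `count = 1` versus `count = 3` result reported in the test plan.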