Summary
For its stderr reader, SubprocessCLITransport in _internal/transport/subprocess_cli.py uses the exact anyio.create_task_group() + manual __aenter__/__aexit__ anti-pattern that PR #746 fixed in query.py. The fix was not applied to this sibling file. As a result, every normal completion of async for message in query(...) triggers a cross-task cancel scope exit during async-generator finalization, which cancels the caller's task from an unrelated context and pins anyio's _deliver_cancellation in a 100% CPU loop until the process restarts.
This is the same failure mode as #454 and #776, just in a different file.
Affected version
claude-agent-sdk==0.1.58 (current latest). Also present on main as of 2026-04-12 — I pulled src/claude_agent_sdk/_internal/transport/subprocess_cli.py directly from the main branch and the offending lines are still there.
The offending code
subprocess_cli.py lines 395–399 in connect():
```python
if should_pipe_stderr and self._process.stderr:
    self._stderr_stream = TextReceiveStream(self._process.stderr)
    self._stderr_task_group = anyio.create_task_group()
    await self._stderr_task_group.__aenter__()
    self._stderr_task_group.start_soon(self._handle_stderr)
```
And subprocess_cli.py lines 458–462 in close():
```python
if self._stderr_task_group:
    with suppress(Exception):
        self._stderr_task_group.cancel_scope.cancel()
        await self._stderr_task_group.__aexit__(None, None, None)
    self._stderr_task_group = None
```
Anyio cancel scopes have task affinity — they must be exited by the same async task that entered them. connect() enters the scope in whatever task is consuming the query() generator. close() is called from the generator's finally clause, which on normal completion runs in an asyncio-created finalizer task (via sys.set_asyncgen_hooks → async_generator_athrow), not the original consumer. The cross-task cancel_scope.cancel() + __aexit__(None, None, None) then cancels the scope's host task — the original consumer — from an unrelated task context.
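To make the affinity rule concrete, here is a minimal, self-contained illustration (hypothetical code, no SDK involved): exiting an anyio cancel scope from a task other than the one that entered it is rejected with a RuntimeError. In close(), the suppress(Exception) wrapper hides whatever error anyio raises for the illegal exit, but the cancel_scope.cancel() issued just before it has already landed on the consumer task.

```python
# Minimal illustration (not SDK code): an anyio cancel scope must be exited by
# the task that entered it; exiting it from any other task raises RuntimeError.
import asyncio
import anyio

async def main() -> None:
    scope = anyio.CancelScope()
    scope.__enter__()  # the scope is now bound to the current task

    async def exit_from_another_task() -> None:
        try:
            scope.__exit__(None, None, None)  # wrong task: anyio rejects this
        except RuntimeError as exc:
            print(f"cross-task exit rejected: {exc}")

    await asyncio.create_task(exit_from_another_task())
    scope.__exit__(None, None, None)  # legal: same task that entered the scope

asyncio.run(main())
```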
Observed failure mode
async for message in query(...) runs to completion. Immediately after the async for loop exits, asyncio's asyncgen finalizer schedules a task (Task-<N>) to run async_generator_athrow(GeneratorExit). That task drives process_query's finally, which calls transport.close(), which hits the broken cleanup path.
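The finalizer-task mechanism is easy to observe in isolation. This sketch (hypothetical, unrelated to the SDK) abandons a suspended async generator and shows its finally clause running in a task created by the event loop's asyncgen finalizer hook rather than in the consumer task:

```python
# Hypothetical illustration of asyncio's asyncgen finalization: the generator's
# finally clause runs in a loop-created task, not in the consuming task.
import asyncio

async def gen():
    try:
        while True:
            yield 1
    finally:
        # Driven by the finalizer, this prints a loop-created task name
        # (e.g. "Task-2"), not "consumer".
        print(f"finally runs in task: {asyncio.current_task().get_name()}")

async def main() -> None:
    asyncio.current_task().set_name("consumer")
    it = gen()
    async for _ in it:
        break          # leave the generator suspended, like an undrained query()
    del it             # drop the last reference; the loop's finalizer hook takes over
    await asyncio.sleep(0.1)  # give the finalizer task a chance to run

asyncio.run(main())
```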
With diagnostic instrumentation added at the except asyncio.CancelledError points in my application code, I captured this traceback on two independent completions (different scope IDs, different caller Task IDs, identical structure):
```
Cancelled via cancel scope 77a038de8cb0 by <Task pending
name='Task-12426' coro=<<async_generator_athrow without __name__>()>>
Cancelled via cancel scope 77a038034320 by <Task pending
name='Task-39833' coro=<<async_generator_athrow without __name__>()>>
```
Once the cancellation is delivered from the non-host task, the scope's state is inconsistent: it has a pending cancel but the host task never exited the scope. Anyio's _deliver_cancellation reschedules itself via call_soon on every event loop tick, pinning one CPU core at 100% for the remaining lifetime of the process. One stuck scope per completed agent — they accumulate.
This fires on every normal completion of the documented async for message in query(...) usage pattern. It is deterministic and reproducible.
Proposed fix
Apply the same fix PR #746 already applied to query.py: replace the anyio task group with a plain asyncio.create_task(), since asyncio tasks have no cancel-scope affinity and can be safely cancelled from any task context.
The diff is mechanical (a standalone sketch of the pattern follows below):

- Add self._stderr_task: asyncio.Task | None = None to __init__.
- In connect(), replace the task-group-create + __aenter__ + start_soon block with self._stderr_task = asyncio.create_task(self._handle_stderr(), name="claude-sdk-stderr-reader").
- In close(), replace the task-group cleanup block with:

  ```python
  if self._stderr_task is not None and not self._stderr_task.done():
      self._stderr_task.cancel()
      with suppress(asyncio.CancelledError, Exception):
          await self._stderr_task
  self._stderr_task = None
  ```

- Remove the _stderr_task_group attribute entirely.
_handle_stderr() itself needs no changes — it already handles ClosedResourceError and generic Exception cleanly.
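For completeness, a standalone sketch of why the replacement pattern is safe (hypothetical names, not SDK code): a plain asyncio.Task started in one task can be cancelled and awaited from any other task; there is no host-task affinity to violate.

```python
# Hypothetical demo of the proposed pattern: start a long-running reader as a
# plain asyncio.Task, then cancel and reap it from a *different* task. Unlike
# an anyio cancel scope, an asyncio.Task has no host-task affinity.
import asyncio
from contextlib import suppress

async def stderr_reader() -> None:
    # Stand-in for _handle_stderr(): run until cancelled.
    while True:
        await asyncio.sleep(1)

async def main() -> None:
    reader = asyncio.create_task(stderr_reader(), name="stderr-reader")  # "connect()" side

    async def close_from_other_task() -> None:
        # "close()" side, running in an unrelated task: this is legal and clean.
        reader.cancel()
        with suppress(asyncio.CancelledError):
            await reader
        print("reader reaped from another task, no cancel scope involved")

    await asyncio.create_task(close_from_other_task())

asyncio.run(main())
```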
Reference implementation
I have a working subclass that applies this exact fix as a downstream workaround: it subclasses SubprocessCLITransport, delegates to super().connect() for all the subprocess setup, then immediately tears down the anyio task group in the same task frame that entered it (so the __aexit__ is legal), and replaces it with asyncio.create_task(self._handle_stderr()). Happy to port this into a PR against main if it would help.
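For reference, a rough sketch of that workaround under a few stated assumptions: the import path is inferred from the file location cited above, it relies on the private _stderr_task_group / _handle_stderr members quoted earlier, and the close() side mirrors the proposed fix above so the replacement task gets reaped.

```python
# Sketch of the downstream workaround (assumptions: import path inferred from
# the file location cited above; relies on the private members quoted in this
# report: _stderr_task_group, _handle_stderr).
import asyncio
from contextlib import suppress

from claude_agent_sdk._internal.transport.subprocess_cli import SubprocessCLITransport


class PatchedSubprocessCLITransport(SubprocessCLITransport):
    async def connect(self) -> None:
        await super().connect()  # full subprocess setup, including the anyio task group
        self._stderr_task = None

        tg = getattr(self, "_stderr_task_group", None)
        if tg is not None:
            # We are still in the task that entered the scope, so exiting it
            # here is legal (same-task __aexit__).
            tg.cancel_scope.cancel()
            await tg.__aexit__(None, None, None)
            self._stderr_task_group = None
            # Restart the stderr reader as a plain task with no scope affinity.
            self._stderr_task = asyncio.create_task(
                self._handle_stderr(), name="claude-sdk-stderr-reader"
            )

    async def close(self) -> None:
        task = getattr(self, "_stderr_task", None)
        if task is not None and not task.done():
            task.cancel()
            with suppress(asyncio.CancelledError, Exception):
                await task
        self._stderr_task = None
        await super().close()  # base cleanup; its task-group branch is now a no-op
```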
After deploying the subclass, I verified with 4+ consecutive agent completions on the same process:
| Before fix | After fix |
| --- | --- |
| Every completion → cancel scope leak → CPU at 100%+ until process restart | Every completion → clean event pipeline, CPU stays at <15% |
The diagnostic traceback from above disappears entirely.
Why the incomplete fix likely slipped through
PR #746's description explains that query.py used a TaskGroup with manual __aenter__/__aexit__ and hit the cross-task affinity issue. The PR fixed that file but the same pattern exists in subprocess_cli.py as a separate, smaller task group for stderr reading — it was apparently not surfaced by the test case in #746 (which tested cross-task close of query, not of the subprocess transport). The stderr task group has the same affinity semantics and the same trigger.