Bug Description
After a ClaudeSDKClient session ends (via client.disconnect() with timeout + force-kill fallback), the Python process continues to burn ~24% CPU indefinitely, even when completely idle with no active sessions.
The root cause is a leaked CLOSE_WAIT TCP socket to the Anthropic API that remains registered in the kqueue event loop. Since CLOSE_WAIT sockets are permanently "readable" (EOF pending), kqueue returns them as ready on every poll cycle, causing the asyncio event loop to busy-spin.
How This Differs from #378
Issue #378 describes close() hanging during the call due to _deliver_cancellation spinning. Our issue is about what happens after — even when disconnect completes or times out and the subprocess is force-killed:
- A TCP socket to the Anthropic API remains in CLOSE_WAIT state
- The socket FD stays registered in kqueue
- The asyncio event loop spins polling this permanently-readable FD
- CPU stays at ~24% with the process doing absolutely nothing
Reproduction
We run a long-lived FastAPI daemon that uses ClaudeSDKClient for periodic tasks. Between tasks, the daemon should be near 0% CPU.
```python
# Simplified pattern
client = ClaudeSDKClient(ClaudeAgentOptions(max_turns=5))
# ... use client ...

# Disconnect with timeout (workaround from #378)
try:
    await asyncio.wait_for(client.disconnect(), timeout=5.0)
except asyncio.TimeoutError:
    # Force-kill the subprocess
    os.kill(subprocess_pid, signal.SIGKILL)
```
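To make the regression measurable from inside the daemon, a small self-check can estimate how much CPU the process burns while the loop is supposedly idle. This is a sketch using only the standard library (`resource` is Unix-only; the function name is ours, not an SDK API):

```python
import asyncio
import resource

async def idle_cpu_fraction(interval: float = 2.0) -> float:
    """Estimate the fraction of one core this process burns while the
    event loop should be idle. A healthy loop blocks in kevent/epoll and
    accumulates ~0 CPU seconds; a busy-spinning loop shows ~0.24 here."""
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    t0 = r0.ru_utime + r0.ru_stime
    await asyncio.sleep(interval)  # nothing scheduled: the loop should block
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    t1 = r1.ru_utime + r1.ru_stime
    return (t1 - t0) / interval
```

In the daemon, `await idle_cpu_fraction()` on the affected loop after disconnect and alert if the result is far above zero.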
After this sequence, lsof shows the leaked socket:
```
Python   71226 user   13u  IPv6 ... TCP [local]:59274->[2600:9000:2134:...]:https (CLOSE_WAIT)
```
And sample confirms kqueue spin:
```
789/889 samples in:
select_kqueue_control_impl → kevent (should be blocking, but returns immediately)
```
Evidence
- Process state: Daemon fully idle — no active sessions, no scheduled tasks running
- CPU: 23.7% sustained for over 1 hour
- `lsof` output: CLOSE_WAIT TCP socket (FD 13) to the Anthropic API endpoint, never closed
- `sample` output: 88.7% of samples in the `kevent` call, but CPU not idle — kqueue is returning immediately due to the permanently-readable CLOSE_WAIT socket
- No orphaned pipes: Subprocess pipes were properly closed (we implemented a workaround for that). The socket is from the SDK's internal HTTP transport, not the subprocess stdio.
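The `lsof` check above can be automated for monitoring. A sketch that shells out to `lsof` (assumes `lsof` is installed; the function name is ours):

```python
import subprocess

def close_wait_sockets(pid: int) -> list[str]:
    """Return lsof rows for CLOSE_WAIT TCP sockets held by `pid`.
    An idle daemon should return an empty list; any row is a
    candidate leaked connection."""
    proc = subprocess.run(
        ["lsof", "-nP", "-a", "-p", str(pid), "-iTCP", "-sTCP:CLOSE_WAIT"],
        capture_output=True,
        text=True,
    )
    # lsof exits non-zero when nothing matches; treat that as "no leaks".
    lines = proc.stdout.splitlines()
    return lines[1:] if lines else []  # drop the header row
```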
Root Cause Analysis
The SDK (or its HTTP transport layer) opens HTTPS connections to the Anthropic API. When the remote server closes the connection (sends FIN):
- The local TCP stack ACKs the FIN → the socket enters CLOSE_WAIT
- The SDK never calls `close()` on the socket
- The socket's FD remains registered in kqueue (via asyncio's event loop)
- kqueue reports it as readable on every poll cycle (EOF is pending)
- The asyncio event loop never blocks → CPU spin
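The spin mechanism can be reproduced in isolation with a socket whose peer has closed. A minimal sketch, with `socketpair` standing in for the real HTTPS connection:

```python
import asyncio
import socket

async def demo_spin() -> int:
    """Count how often the loop fires the read callback for an FD whose
    peer has closed. With EOF pending and nobody closing the socket,
    the callback fires on every loop iteration instead of the loop
    blocking in kevent/epoll."""
    a, b = socket.socketpair()
    b.close()  # peer's FIN: `a` now has EOF pending, like CLOSE_WAIT
    hits = 0

    def on_readable() -> None:
        nonlocal hits
        hits += 1  # nothing drains or closes `a`, so this keeps firing

    loop = asyncio.get_running_loop()
    loop.add_reader(a, on_readable)
    await asyncio.sleep(0.05)  # the loop cannot block during this sleep
    loop.remove_reader(a)      # the missing step: deregister...
    a.close()                  # ...then close
    return hits
```

Running `asyncio.run(demo_spin())` shows the callback firing many times within a 50 ms sleep, whereas a healthy loop would have blocked the whole time.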
The `asyncio.wait_for()` workaround from #378 doesn't help here because it only bounds how long `disconnect()` runs: the leaked HTTP socket is never tracked for cleanup, so it survives whether `disconnect()` completes, times out, or the subprocess is force-killed.
Suggested Fix
The SDK's transport layer should:
- Track all opened sockets (HTTP connections to the API, not just subprocess pipes)
- Close them in `transport.close()` — ensure `close()` is called on the TCP socket
- Deregister from the event loop — remove the FD from kqueue/epoll before closing
Alternatively, a defensive cleanup in Query.close():
```python
async def close(self) -> None:
    self._closed = True
    if self._tg:
        self._tg.cancel_scope.cancel()
        with suppress(anyio.get_cancelled_exc_class()):
            try:
                with anyio.fail_after(5.0):
                    await self._tg.__aexit__(None, None, None)
            except TimeoutError:
                pass
    await self.transport.close()
    # Defensive: close any remaining sockets to prevent kqueue spin
    self._close_leaked_fds()
```
Environment
- claude-agent-sdk: 0.1.45
- Python: 3.13.5
- Platform: macOS 15.6.1 (Darwin 24.6.0), ARM64 (Apple Silicon)
- Event loop: asyncio with kqueue selector
- Use case: Long-running FastAPI daemon with periodic `ClaudeSDKClient` sessions
Related
- `Query.close()` hangs indefinitely causing 100% CPU (same family of bugs, different manifestation)