Bug Description
After a ClaudeSDKClient session ends (via client.disconnect() with timeout + force-kill fallback), the Python process continues to burn ~24% CPU indefinitely, even when completely idle with no active sessions.
The root cause is a leaked CLOSE_WAIT TCP socket to the Anthropic API that remains registered in the kqueue event loop. Since CLOSE_WAIT sockets are permanently "readable" (EOF pending), kqueue returns them as ready on every poll cycle, causing the asyncio event loop to busy-spin.
How This Differs from #378
Issue #378 describes close() hanging during the call due to _deliver_cancellation spinning. Our issue is about what happens after — even when disconnect completes or times out and the subprocess is force-killed:
- A TCP socket to the Anthropic API remains in CLOSE_WAIT state
- The socket FD stays registered in kqueue
- The asyncio event loop spins polling this permanently-readable FD
- CPU stays at ~24% with the process doing absolutely nothing
Reproduction
We run a long-lived FastAPI daemon that uses ClaudeSDKClient for periodic tasks. Between tasks, the daemon should be near 0% CPU.
```python
# Simplified pattern
client = ClaudeSDKClient(ClaudeAgentOptions(max_turns=5))
# ... use client ...

# Disconnect with timeout (workaround from #378)
try:
    await asyncio.wait_for(client.disconnect(), timeout=5.0)
except asyncio.TimeoutError:
    # Force-kill the subprocess
    os.kill(subprocess_pid, signal.SIGKILL)
```
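To make the regression measurable from inside the daemon, a small self-check can estimate how much CPU the process burns while the loop is supposedly idle. This is a sketch using only the standard library (`resource` is Unix-only; the function name is ours, not an SDK API):

```python
import asyncio
import resource

async def idle_cpu_fraction(interval: float = 2.0) -> float:
    """Estimate the fraction of one core this process burns while the
    event loop should be idle. A healthy loop blocks in kevent/epoll and
    accumulates ~0 CPU seconds; a busy-spinning loop shows ~0.24 here."""
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    t0 = r0.ru_utime + r0.ru_stime
    await asyncio.sleep(interval)  # nothing scheduled: the loop should block
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    t1 = r1.ru_utime + r1.ru_stime
    return (t1 - t0) / interval
```

In the daemon, `await idle_cpu_fraction()` on the affected loop after disconnect and alert if the result is far above zero.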
After this sequence, lsof shows the leaked socket:
```
Python   71226 user   13u  IPv6 ... TCP [local]:59274->[2600:9000:2134:...]:https (CLOSE_WAIT)
```
And sample confirms kqueue spin:
```
789/889 samples in:
select_kqueue_control_impl → kevent (should be blocking, but returns immediately)
```
Evidence
- Process state: Daemon fully idle — no active sessions, no scheduled tasks running
- CPU: 23.7% sustained for over 1 hour
- `lsof` output: CLOSE_WAIT TCP socket (FD 13) to the Anthropic API endpoint, never closed
- `sample` output: 88.7% of samples in the `kevent` call, but CPU not idle — kqueue is returning immediately due to the permanently-readable CLOSE_WAIT socket
- No orphaned pipes: Subprocess pipes were properly closed (we implemented a workaround for that). The socket is from the SDK's internal HTTP transport, not the subprocess stdio.
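The `lsof` check above can be automated for monitoring. A sketch that shells out to `lsof` (assumes `lsof` is installed; the function name is ours):

```python
import subprocess

def close_wait_sockets(pid: int) -> list[str]:
    """Return lsof rows for CLOSE_WAIT TCP sockets held by `pid`.
    An idle daemon should return an empty list; any row is a
    candidate leaked connection."""
    proc = subprocess.run(
        ["lsof", "-nP", "-a", "-p", str(pid), "-iTCP", "-sTCP:CLOSE_WAIT"],
        capture_output=True,
        text=True,
    )
    # lsof exits non-zero when nothing matches; treat that as "no leaks".
    lines = proc.stdout.splitlines()
    return lines[1:] if lines else []  # drop the header row
```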
Root Cause Analysis
The SDK (or its HTTP transport layer) opens HTTPS connections to the Anthropic API. When the remote server closes the connection (sends FIN):
- The local TCP stack ACKs the FIN → the socket enters CLOSE_WAIT
- The SDK never calls `close()` on the socket
- The socket's FD remains registered in kqueue (via asyncio's event loop)
- kqueue reports it as readable on every poll cycle (EOF is pending)
- The asyncio event loop never blocks → CPU spin
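The spin mechanism can be reproduced in isolation with a socket whose peer has closed. A minimal sketch, with `socketpair` standing in for the real HTTPS connection:

```python
import asyncio
import socket

async def demo_spin() -> int:
    """Count how often the loop fires the read callback for an FD whose
    peer has closed. With EOF pending and nobody closing the socket,
    the callback fires on every loop iteration instead of the loop
    blocking in kevent/epoll."""
    a, b = socket.socketpair()
    b.close()  # peer's FIN: `a` now has EOF pending, like CLOSE_WAIT
    hits = 0

    def on_readable() -> None:
        nonlocal hits
        hits += 1  # nothing drains or closes `a`, so this keeps firing

    loop = asyncio.get_running_loop()
    loop.add_reader(a, on_readable)
    await asyncio.sleep(0.05)  # the loop cannot block during this sleep
    loop.remove_reader(a)      # the missing step: deregister...
    a.close()                  # ...then close
    return hits
```

Running `asyncio.run(demo_spin())` shows the callback firing many times within a 50 ms sleep, whereas a healthy loop would have blocked the whole time.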
The `asyncio.wait_for()` workaround from #378 doesn't help here because it only bounds how long `disconnect()` runs: the leaked HTTP socket is never tracked for cleanup, so it survives whether `disconnect()` completes, times out, or the subprocess is force-killed.
Suggested Fix
The SDK's transport layer should:
- Track all opened sockets (HTTP connections to the API, not just subprocess pipes)
- Close them in `transport.close()` — ensure `close()` is called on the TCP socket
- Deregister from the event loop — remove the FD from kqueue/epoll before closing
Alternatively, a defensive cleanup in Query.close():
```python
async def close(self) -> None:
    self._closed = True
    if self._tg:
        self._tg.cancel_scope.cancel()
        with suppress(anyio.get_cancelled_exc_class()):
            try:
                with anyio.fail_after(5.0):
                    await self._tg.__aexit__(None, None, None)
            except TimeoutError:
                pass
    await self.transport.close()
    # Defensive: close any remaining sockets to prevent kqueue spin
    self._close_leaked_fds()
```
Environment
- claude-agent-sdk: 0.1.45
- Python: 3.13.5
- Platform: macOS 15.6.1 (Darwin 24.6.0), ARM64 (Apple Silicon)
- Event loop: asyncio with kqueue selector
- Use case: Long-running FastAPI daemon with periodic `ClaudeSDKClient` sessions
Related
- `Query.close()` hangs indefinitely causing 100% CPU (same family of bugs, different manifestation)