CLI exit errors not propagated to pending control requests - initialize times out instead of failing fast #387

@grumpygordon

Bug Description

When the Claude CLI exits with an error (e.g., invalid session ID passed to --resume), the SDK's message reader task catches the error but doesn't signal pending control requests. This causes initialize() to wait for the full 60-second timeout instead of failing immediately.

Steps to Reproduce

  1. Create a ClaudeSDKClient with an invalid/expired session ID via ClaudeAgentOptions(resume="invalid-session-id")
  2. Enter the async context manager (async with client:)
  3. The CLI prints an error to stderr: No conversation found with session ID: xxx
  4. The CLI exits with code 1
  5. Expected: initialize() fails immediately with the error
  6. Actual: initialize() waits 60 seconds before raising Exception: Control request timeout: initialize
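
The steps above can be condensed into a short script. This is a sketch rather than a verified reproduction: it assumes claude-agent-sdk and the Claude CLI are installed, and the function name reproduce and the timing printout are illustrative additions.

```python
import asyncio
import time


async def reproduce() -> None:
    # Imported lazily so this module still loads where the SDK is not installed.
    from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient

    # Step 1: a session ID that the CLI will not be able to find.
    options = ClaudeAgentOptions(resume="invalid-session-id")
    client = ClaudeSDKClient(options=options)

    start = time.monotonic()
    try:
        # Step 2: entering the context manager connects and initializes.
        async with client:
            pass  # not expected to be reached with an invalid session ID
    except Exception as e:
        # Expected: fails almost immediately. Reported: fails after ~60 s.
        print(f"failed after {time.monotonic() - start:.1f}s: {e!r}")
```

Running asyncio.run(reproduce()) should print the elapsed time, which makes the difference between a fast failure and the reported 60-second timeout easy to see.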

Root Cause

In _internal/query.py, the _read_messages method runs in a separate task and catches CLI errors:

# query.py:201-208
except Exception as e:
    logger.error(f"Fatal error in message reader: {e}")
    await self._message_send.send({"type": "error", "error": str(e)})

The error is sent to the message stream, but _send_control_request is waiting on a control response event:

# query.py:355-356
with anyio.fail_after(timeout):
    await event.wait()  # Never signaled when CLI exits with error

These two mechanisms don't communicate: the error in the message reader never wakes the control request waiter.
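
The disconnect can be modelled without the SDK. The sketch below is a toy model, not the SDK's code: it uses asyncio instead of anyio, ToyQuery and demo_bug are hypothetical names, and the timeout is shortened to 0.2 seconds so the demo finishes quickly.

```python
import asyncio
import time


class ToyQuery:
    """Hypothetical stand-in for the SDK's query object (not the real class)."""

    def __init__(self):
        self.pending_control_responses = {}  # request_id -> asyncio.Event
        self.messages = asyncio.Queue()      # stands in for the message stream

    async def read_messages(self):
        try:
            # Simulate the CLI exiting with an error before any response.
            raise RuntimeError("No conversation found with session ID: xxx")
        except Exception as e:
            # The error reaches the message stream only; no event is set.
            await self.messages.put({"type": "error", "error": str(e)})

    async def send_control_request(self, request_id, timeout):
        event = asyncio.Event()
        self.pending_control_responses[request_id] = event
        try:
            # Nothing ever sets the event, so this always runs to the timeout.
            await asyncio.wait_for(event.wait(), timeout)
        except asyncio.TimeoutError:
            raise Exception(f"Control request timeout: {request_id}") from None


async def demo_bug(timeout=0.2):
    query = ToyQuery()
    reader = asyncio.create_task(query.read_messages())
    start = time.monotonic()
    try:
        await query.send_control_request("initialize", timeout)
    except Exception as e:
        return str(e), time.monotonic() - start
    finally:
        await reader
```

Calling asyncio.run(demo_bug()) returns the timeout message together with an elapsed time of roughly the full timeout, even though the reader failed right away.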

Proposed Fix

Signal all pending control requests when an error occurs in _read_messages:

except Exception as e:
    logger.error(f"Fatal error in message reader: {e}")

    # Signal all pending control requests
    for request_id, event in list(self.pending_control_responses.items()):
        if request_id not in self.pending_control_results:
            self.pending_control_results[request_id] = e
            event.set()

    await self._message_send.send({"type": "error", "error": str(e)})

The existing code at query.py lines 361-362 already re-raises a stored exception:

if isinstance(result, Exception):
    raise result

So the fix only needs to store the exception and signal the events; the error propagation infrastructure is already in place.
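
To illustrate the effect of the proposed change, here is a self-contained toy model of it. It is a sketch under assumptions, not the SDK's implementation: asyncio replaces anyio, ToyQueryFixed and demo_fix are hypothetical names, and the 0.01-second sleep merely simulates the CLI taking a moment to exit.

```python
import asyncio
import time


class ToyQueryFixed:
    """Hypothetical model of the proposed fix (not the real SDK class)."""

    def __init__(self):
        self.pending_control_responses = {}  # request_id -> asyncio.Event
        self.pending_control_results = {}    # request_id -> result or Exception
        self.messages = asyncio.Queue()      # stands in for the message stream

    async def read_messages(self):
        try:
            await asyncio.sleep(0.01)  # simulated delay before the CLI exits
            raise RuntimeError("No conversation found with session ID: xxx")
        except Exception as e:
            # Proposed fix: hand the error to every waiter and wake it up.
            for request_id, event in list(self.pending_control_responses.items()):
                if request_id not in self.pending_control_results:
                    self.pending_control_results[request_id] = e
                    event.set()
            await self.messages.put({"type": "error", "error": str(e)})

    async def send_control_request(self, request_id, timeout):
        event = asyncio.Event()
        self.pending_control_responses[request_id] = event
        try:
            await asyncio.wait_for(event.wait(), timeout)
        except asyncio.TimeoutError:
            raise Exception(f"Control request timeout: {request_id}") from None
        finally:
            self.pending_control_responses.pop(request_id, None)
        result = self.pending_control_results.pop(request_id)
        if isinstance(result, Exception):
            raise result  # same pattern as the existing check quoted above
        return result


async def demo_fix(timeout=5.0):
    query = ToyQueryFixed()
    reader = asyncio.create_task(query.read_messages())
    start = time.monotonic()
    try:
        await query.send_control_request("initialize", timeout)
    except Exception as e:
        return type(e).__name__, str(e), time.monotonic() - start
    finally:
        await reader
```

In this model, asyncio.run(demo_fix()) surfaces the original RuntimeError well before the 5-second timeout. One limitation of the model: a request registered after the reader has already failed would still wait for the timeout, since only requests pending at the moment of failure are signalled.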

Environment

  • claude-agent-sdk version: 0.1.6
  • Python: 3.12
  • OS: Linux

Labels

  • bug: Something isn't working
