This document describes the complete fix for claude-agent-sdk issue #462, which has been successfully implemented and tested.
The claude-agent-sdk has a critical bug (#462) where the subprocess transport fails in FastAPI/uvicorn contexts. Symptoms include:
- Only returns init message: The SDK only returns the initial
SystemMessageand terminates - Tools don't execute: Agent cannot use MCP tools (like Databricks commands)
- Subprocess hangs: The subprocess starts but Python never receives more stdout
This makes the SDK unusable in typical FastAPI deployments with middleware, logging, and other production patterns.
When claude-agent-sdk runs in a FastAPI/uvicorn context, the existing event loop interferes with the subprocess transport's anyio.TextReceiveStream, preventing it from receiving all subprocess output.
Python's contextvars (used for per-request Databricks authentication) do not automatically propagate to new threads. When spawning a thread for the fresh event loop, the Databricks auth context was lost, causing all Databricks tool calls to fail.
The Claude agent sometimes passes empty strings ("") instead of null/None for optional parameters like context_id. The Databricks API tries to parse empty strings as numbers, causing NumberFormatException. The MCP tool wrappers need to convert empty strings to None.
We implemented a three-part fix:
Run the Claude agent in a completely fresh event loop in a separate thread (in server/services/agent.py):
def _run_agent_in_fresh_loop(message, options, result_queue, context):
"""Run agent in a fresh event loop (workaround for issue #462)."""
def run_with_context():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
async def run_query():
async def prompt_generator():
yield {'type': 'user', 'message': {'role': 'user', 'content': message}}
try:
async for msg in query(prompt=prompt_generator(), options=options):
result_queue.put(('message', msg))
except Exception as e:
result_queue.put(('error', e))
finally:
result_queue.put(('done', None))
try:
loop.run_until_complete(run_query())
finally:
loop.close()
# Execute in the copied context
context.run(run_with_context)Copy the contextvars context before spawning the thread to preserve Databricks authentication (in server/services/agent.py):
from contextvars import copy_context
# In stream_agent_response():
# Set auth context for this request
set_databricks_auth(databricks_host, databricks_token)
try:
# ... setup options ...
# Copy context BEFORE spawning thread
ctx = copy_context()
result_queue = queue.Queue()
agent_thread = threading.Thread(
target=_run_agent_in_fresh_loop,
args=(message, options, result_queue, ctx), # Pass context
daemon=True
)
agent_thread.start()
# Process messages from queue
while True:
msg_type, msg = await asyncio.get_event_loop().run_in_executor(
None, result_queue.get
)
if msg_type == 'done':
break
elif msg_type == 'error':
raise msg
elif msg_type == 'message':
# Yield message to frontend...
### Part 3: Empty String to None Conversion
Convert empty strings to `None` in MCP tool wrappers (in `databricks-mcp-server/databricks_mcp_server/tools/compute.py`):
```python
@mcp.tool
def execute_databricks_command(
code: str,
cluster_id: Optional[str] = None,
context_id: Optional[str] = None,
# ... other params
) -> Dict[str, Any]:
# Convert empty strings to None (Claude agent sometimes passes "" instead of null)
if cluster_id == "":
cluster_id = None
if context_id == "":
context_id = None
# ... rest of functionThis prevents NumberFormatException when the Databricks API tries to parse empty strings as numbers.
## How It Works
1. **Main Thread (FastAPI)**: Runs the FastAPI/uvicorn event loop
2. **Agent Thread**: Creates a fresh, isolated event loop for the Claude agent
3. **Context Copy**: Copies `contextvars` context (including Databricks auth) to the new thread
4. **Queue Communication**: Uses thread-safe queue to pass messages back to main thread
5. **Async Bridge**: Main thread reads from queue asynchronously using `run_in_executor`
## Benefits
✅ **Fixes subprocess transport**: Fresh event loop isolates agent from FastAPI's event loop
✅ **Preserves authentication**: Context copy propagates Databricks credentials to new thread
✅ **Maintains streaming**: Queue-based communication streams all messages properly
✅ **Production-ready**: Works with middleware, logging, Sentry, and other FastAPI patterns
✅ **MCP tools work**: Databricks commands and other MCP tools execute successfully
## Testing
To verify the fix works:
1. Start the development server: `bash scripts/start_dev.sh`
2. Open the app at http://localhost:3000
3. Ask the agent to execute Databricks commands
4. Verify tools execute (check logs for successful Databricks API calls)
## References
- Original Issue: https://github.com/anthropics/claude-agent-sdk-python/issues/462
- Python contextvars: https://docs.python.org/3/library/contextvars.html
- Threading and contextvars: https://peps.python.org/pep-0567/#implementation-notes