Thread/async safety: unprotected global mutable state causes race conditions in multi-agent deployments

## Summary

Multiple critical concurrency bugs violate the "multi-agent + async safe by default" principle. These can cause `RuntimeError`, lost data, and crashes in production multi-agent deployments.

## Specific Issues

### 1. Unprotected global dicts in `agents/agents.py` (lines 33-35)

```python
# NO lock protection — contrast with agent.py which HAS _server_lock
_agents_server_started = {}
_agents_registered_endpoints = {}
_agents_shared_apps = {}
```

Multiple agents starting API servers concurrently can race on these dicts. `agent/agent.py` correctly uses `_server_lock = threading.Lock()` for the same pattern — `agents.py` does not.

### 2. Race condition in global `ToolRegistry` singleton (`tools/registry.py`, lines 256-261)

```python
_global_registry: Optional[ToolRegistry] = None

def get_registry() -> ToolRegistry:
    global _global_registry
    if _global_registry is None:        # Thread A reads None
        _global_registry = ToolRegistry()  # Thread B also reads None → two registries created
    return _global_registry
```

No lock around initialization. Two threads can create separate registries, causing tools registered in one to be invisible in the other.

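**Fix:** Add double-checked locking:
```python
import threading
from typing import Optional

_global_registry: Optional[ToolRegistry] = None
_registry_lock = threading.Lock()  # guards first-time creation only

def get_registry() -> ToolRegistry:
    global _global_registry
    if _global_registry is None:          # fast path: no lock once initialized
        with _registry_lock:
            if _global_registry is None:  # re-check: another thread may have won
                _global_registry = ToolRegistry()
    return _global_registry
```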
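One way to check the fix is a small stress test that releases many threads at once and counts the distinct instances they receive. This is only a sketch: `ToolRegistry` here is a stand-in class, and `collect_registry_ids` is a hypothetical helper, not part of the codebase.

```python
import threading
from typing import Optional


class ToolRegistry:
    """Stand-in for the real registry class, for illustration only."""

    def __init__(self) -> None:
        self.tools = {}


_global_registry: Optional[ToolRegistry] = None
_registry_lock = threading.Lock()


def get_registry() -> ToolRegistry:
    global _global_registry
    if _global_registry is None:
        with _registry_lock:
            if _global_registry is None:
                _global_registry = ToolRegistry()
    return _global_registry


def collect_registry_ids(n_threads: int = 32) -> set:
    """Call get_registry() from many threads; return the distinct instance ids."""
    barrier = threading.Barrier(n_threads)
    seen = []

    def worker() -> None:
        barrier.wait()  # release all threads together to maximise contention
        seen.append(id(get_registry()))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return set(seen)
```

With the lock in place the returned set should contain a single id. A test like this cannot prove the absence of a race, but it makes a regression much more likely to surface.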

### 3. `asyncio.run()` inside potentially-async context (`agent/agent.py`, line 5067)

```python
if hasattr(backend, 'request_approval_sync'):
    decision = backend.request_approval_sync(request)
else:
    decision = asyncio.run(backend.request_approval(request))  # 💥 RuntimeError if event loop running
```

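When `_check_tool_approval_sync()` is called during `achat()` or other async execution, an event loop is already running in the calling thread, so `asyncio.run()` raises `RuntimeError: asyncio.run() cannot be called from a running event loop`. Scheduling the coroutine with `asyncio.get_running_loop().create_task()` is not enough on its own, because a synchronous function cannot wait for the task's result without blocking the loop that has to run it. The code should detect a running loop and then either run the coroutine on a separate loop in a worker thread, or route async callers through an async approval path that awaits `backend.request_approval(request)` directly.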
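A hedged sketch of the worker-thread option. `run_coroutine_sync` is a hypothetical helper name; only standard-library calls are used.

```python
import asyncio
import concurrent.futures


def run_coroutine_sync(coro):
    """Run `coro` to completion from synchronous code and return its result."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coro)

    # A loop is already running in this thread. Run the coroutine on a fresh
    # loop in a worker thread and block here until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

In the snippet above, the `else` branch would then read `decision = run_coroutine_sync(backend.request_approval(request))`. Two caveats apply: the calling event loop is blocked while the worker thread runs, and the coroutine must not rely on objects bound to the original loop. Where the caller is already async, awaiting `backend.request_approval(request)` directly avoids both problems.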

### 4. Unprotected `_pending_approvals` dict (`agent/agent.py`, lines 1630, 8334-8384)

```python
self._pending_approvals = {}  # No lock

# Write (async method):
self._pending_approvals[tracking_id] = {...}

# Read+Delete (concurrent method):
for tid, info in self._pending_approvals.items():  # RuntimeError: dict changed size
    del self._pending_approvals[tid]
```

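Any task or thread that inserts or deletes an entry while another is iterating causes `RuntimeError: dictionary changed size during iteration`. This can happen when the loop body awaits, or when the agent is driven from more than one thread. Deleting inside the loop, as in the sketch above, raises the same error even without concurrency; iterate over a snapshot such as `list(self._pending_approvals.items())` instead.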
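One possible shape for a fix, sketched under assumptions: `PendingApprovals` is a hypothetical wrapper class, and the `status` field in the usage note below is invented for illustration. The actual structure of the stored approval info is not shown in this issue.

```python
import threading
from typing import Any, Callable, Dict


class PendingApprovals:
    """Hypothetical lock-protected container for pending approval records."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: Dict[str, Any] = {}

    def add(self, tracking_id: str, info: Any) -> None:
        with self._lock:
            self._items[tracking_id] = info

    def pop_matching(self, predicate: Callable[[Any], bool]) -> Dict[str, Any]:
        """Remove and return every entry whose info satisfies `predicate`."""
        with self._lock:
            # Build the result first, then delete, so the dict is never
            # mutated while it is being iterated.
            matched = {
                tid: info for tid, info in self._items.items() if predicate(info)
            }
            for tid in matched:
                del self._items[tid]
        return matched

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```

For example, `pending.pop_matching(lambda info: info["status"] == "resolved")` would drain resolved entries in one step. A `threading.Lock` is safe alongside asyncio here because no `await` occurs while the lock is held; if the critical section ever needs to await, an `asyncio.Lock` would be the better fit for single-loop use.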

## Impact

- **Production crashes** in multi-agent async deployments
- **Silent data loss** (tools registered to wrong registry instance)
- **Intermittent failures** that are hard to reproduce and debug

## Expected Behavior

Per the stated principle: "Multi-agent + async safe by default" — all shared mutable state should be lock-protected, and async/sync boundaries should be handled correctly.
