Summary
Multiple critical concurrency bugs violate the "multi-agent + async safe by default" principle. These can cause RuntimeError, lost data, and crashes in production multi-agent deployments.
Specific Issues
1. Unprotected global dicts in agents/agents.py (lines 33-35)
# NO lock protection — contrast with agent.py which HAS _server_lock
_agents_server_started = {}
_agents_registered_endpoints = {}
_agents_shared_apps = {}
Multiple agents starting API servers concurrently can race on these dicts. agent/agent.py correctly uses _server_lock = threading.Lock() for the same pattern — agents.py does not.
2. Race condition in global ToolRegistry singleton (tools/registry.py, lines 256-261)
_global_registry: Optional[ToolRegistry] = None
def get_registry() -> ToolRegistry:
    global _global_registry
    if _global_registry is None:  # Thread A reads None
        _global_registry = ToolRegistry()  # Thread B also reads None → two registries created
    return _global_registry
No lock around initialization. Two threads can create separate registries, causing tools registered in one to be invisible in the other.
Fix: Add double-checked locking:
import threading
_registry_lock = threading.Lock()
def get_registry() -> ToolRegistry:
    global _global_registry
    if _global_registry is None:  # Fast path: no lock once initialized
        with _registry_lock:
            if _global_registry is None:  # Re-check while holding the lock
                _global_registry = ToolRegistry()
    return _global_registry
3. asyncio.run() inside potentially-async context (agent/agent.py, line 5067)
if hasattr(backend, 'request_approval_sync'):
    decision = backend.request_approval_sync(request)
else:
    decision = asyncio.run(backend.request_approval(request))  # 💥 RuntimeError if event loop running
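When _check_tool_approval_sync() is called during achat() or any other async execution, an event loop is already running in the calling thread, so asyncio.run() raises RuntimeError: asyncio.run() cannot be called from a running event loop. The code should detect whether an event loop is running before choosing how to execute the coroutine. Note that asyncio.get_running_loop().create_task() alone is not enough for a synchronous caller: it only schedules the coroutine, and the caller cannot block on the result without stalling the loop. The approval either needs an awaitable code path or has to run on an event loop in a separate thread.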
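One possible shape for the sync/async boundary is sketched below. The helper run_coro_sync is hypothetical and not part of the codebase. It assumes the approval coroutine does not depend on objects bound to the caller's event loop, because when a loop is already running the coroutine executes on a fresh loop in a worker thread.

```python
import asyncio
import concurrent.futures


def run_coro_sync(coro_factory):
    """Run an async callable to completion from synchronous code.

    `coro_factory` is a zero-argument callable that returns a coroutine,
    e.g. `lambda: backend.request_approval(request)`.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coro_factory())

    # A loop is already running here: execute the coroutine on a new loop
    # in a worker thread and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(coro_factory())).result()
```

This avoids the RuntimeError, but the calling thread, and therefore its event loop, stays blocked until the approval returns. An awaitable approval path that achat() can await directly would avoid that stall.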
4. Unprotected _pending_approvals dict (agent/agent.py, lines 1630, 8334-8384)
self._pending_approvals = {}  # No lock
# Write (async method):
self._pending_approvals[tracking_id] = {...}
# Read+Delete (concurrent method):
for tid, info in self._pending_approvals.items():  # RuntimeError: dict changed size
    del self._pending_approvals[tid]
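Concurrent async tasks or threads that modify this dict while another one iterates over it can cause RuntimeError: dictionary changed size during iteration. The same error occurs if the loop itself deletes an entry and then continues iterating.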
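A minimal sketch of one way to guard the dict, assuming callers can go through a small wrapper. The class PendingApprovals and its method names are hypothetical. The lock is held only for short, non-awaiting sections, and matching entries are collected first and deleted afterwards, so the dict is never resized while it is being iterated.

```python
import threading


class PendingApprovals:
    """Hypothetical lock-protected replacement for a bare _pending_approvals dict."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def add(self, tracking_id, info):
        with self._lock:
            self._items[tracking_id] = info

    def pop_matching(self, predicate):
        """Remove and return every entry whose info satisfies `predicate`."""
        with self._lock:
            # Build the result first, then delete; never delete mid-iteration.
            matched = {
                tid: info for tid, info in self._items.items() if predicate(info)
            }
            for tid in matched:
                del self._items[tid]
        return matched

    def __len__(self):
        with self._lock:
            return len(self._items)
```

A threading.Lock is usable from async methods here because nothing is awaited while it is held. If the critical section ever needs to await, an asyncio.Lock would be the safer choice.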
Impact
- Production crashes in multi-agent async deployments
- Silent data loss (tools registered to the wrong registry instance)
- Intermittent failures that are hard to reproduce and debug
Expected Behavior
Per the stated principle: "Multi-agent + async safe by default" — all shared mutable state should be lock-protected, and async/sync boundaries should be handled correctly.