Global mutable state lacks thread/async safety in multi-agent scenarios #1129

@MervinPraison

Summary

The core SDK has 10+ global mutable variables that are not fully protected for concurrent multi-agent use, violating the "multi-agent + async safe by default" principle.

Unprotected Globals Identified

| Location | Variable | Risk |
| --- | --- | --- |
| bus/bus.py | _default_bus | Shared event bus across agents |
| trace/protocol.py | _default_emitter | Shared trace emitter |
| telemetry/integration.py | _queue_processor_running, _telemetry_executor, _telemetry_queue, _performance_mode_enabled | Race conditions in telemetry |
| paths.py | _data_dir_cache | Minor: cache contention |
| main.py | approval_callback, sync_display_callbacks, async_display_callbacks | Callback registration races |

What's Already Done Well

  • _lazy.py uses proper _cache_lock with double-checked locking
  • trace/context_events.py uses contextvars for async-safe state
  • Thread safety tests exist in tests/unit/test_thread_safety.py

What's Missing

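  1. Display callback registries (sync_display_callbacks, async_display_callbacks) in main.py are plain, unsynchronized containers. Registering a callback while another agent iterates over them can raise RuntimeError (if the registry is dict-backed) or silently skip entries (if it is list-backed)
  2. _default_bus and _default_emitter are set without locks, so two agents starting simultaneously could each create and install a different instance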
  3. Telemetry globals use module-level state that could race during shutdown
  4. No contextvars isolation for per-agent state in async scenarios
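
As a rough illustration of items 1 and 2, the sketch below shows one way a lazily created default and a callback registry could be guarded with `threading.Lock`. It is not the SDK's code: `EventBus`, `get_default_bus`, `_default_bus_lock`, and `CallbackRegistry` are hypothetical stand-ins, and the real bus and registry APIs may look different.

```python
import threading


class EventBus:
    """Hypothetical stand-in for the SDK's event bus class."""


_default_bus = None
_default_bus_lock = threading.Lock()  # hypothetical lock, not in the SDK today


def get_default_bus():
    """Create the default bus at most once, even under concurrent first calls."""
    global _default_bus
    bus = _default_bus
    if bus is None:  # fast path: no lock once the bus exists
        with _default_bus_lock:
            if _default_bus is None:  # re-check while holding the lock
                _default_bus = EventBus()
            bus = _default_bus
    return bus


class CallbackRegistry:
    """Hypothetical lock-protected registry keyed by display type."""

    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = {}

    def register(self, display_type, fn):
        with self._lock:
            self._callbacks[display_type] = fn

    def snapshot(self):
        # Return a copy so callers can iterate without holding the lock
        # and without seeing concurrent registrations mid-iteration.
        with self._lock:
            return dict(self._callbacks)
```

Dispatch code would iterate over `registry.snapshot()` rather than the live dict, so a registration from another agent cannot change the container during iteration.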

Suggested Fix

  • Use contextvars.ContextVar for per-agent/per-session state (bus, emitter, callbacks)
  • Add thread locks around global registration operations
  • Consider a threading.Lock or asyncio.Lock wrapper for callback lists
  • Expand test_thread_safety.py to cover these specific globals
