fix: concurrency safety gaps in Core SDK (fixes #1458) #1459

MervinPraison merged 3 commits into main from
Conversation
- Replace unsafe asyncio.Semaphore private attribute manipulation with a proper sync-to-async bridge in concurrency.py
- Fix ThreadPoolExecutor resource leak by using a reusable agent-level executor in tool_execution.py
- Add thread lock protection for unprotected plugin state in plugins/__init__.py

Fixes #1458

Co-authored-by: MervinPraison <MervinPraison@users.noreply.github.com>
@coderabbitai review

/review

✅ Actions performed: Review triggered.
📝 Walkthrough

Three concurrency and execution safety updates: (1) Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
@copilot Do a thorough review of this PR. Read ALL existing reviewer comments above from Qodo, Coderabbit, and Gemini first — incorporate their findings. Review areas:
Agent-Logs-Url: https://github.com/MervinPraison/PraisonAI/sessions/c9e7c913-a80d-474a-bd5d-731c129ad614 Co-authored-by: MervinPraison <454862+MervinPraison@users.noreply.github.com>
Completed a full PR review and applied a follow-up fix in

Findings by area:

Suggested next improvement (optional): add a small dedicated unit test around

Screenshot: N/A (no UI changes; backend-only changes).
@claude You are the FINAL architecture reviewer. If the branch is under MervinPraison/PraisonAI (not a fork), you are able to make modifications to this branch and push directly. SCOPE: Focus ONLY on Python packages (praisonaiagents, praisonai). Do NOT modify praisonai-rust or praisonai-ts. Read ALL comments above from Gemini, Qodo, CodeRabbit, and Copilot carefully before responding. Phase 1: Review per AGENTS.md
Phase 2: FIX Valid Issues

Phase 3: Final Verdict
Claude finished @MervinPraison's task — View job

Final Architecture Review

Phase 1: Architecture Validation per AGENTS.md
Phase 2: Issue Analysis & Fixes
Phase 3: Final Verdict
🏗️ Architecture Review Summary

✅ COMPLIANT - Protocol-Driven Design

All changes follow AGENTS.md protocols correctly:
✅ COMPLIANT - Backward Compatibility
✅ COMPLIANT - Performance
🔧 Gap Fixes Implemented

Gap 1: Unsafe
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/praisonai-agents/praisonaiagents/agent/concurrency.py`:
- Around line 92-114: The current acquire_sync implementation binds an
asyncio.Semaphore to different event loops (via new_event_loop or thread worker)
which causes RuntimeError and defeats the timeout; replace this by never
touching asyncio.Semaphore from sync code: update _get_semaphore to create and
return a pair (asyncio.Semaphore, threading.BoundedSemaphore) or store a
thread-safe BoundedSemaphore alongside the async one, remove the
ThreadPoolExecutor/new_event_loop code in acquire_sync and call the
threading.BoundedSemaphore.acquire(timeout=30) there instead, and update
release() (or add release_sync/release_async) to release the corresponding
semaphore; alternatively, make acquire_sync raise a clear RuntimeError directing
callers to await acquire() if you prefer to fail-fast.
In `@src/praisonai-agents/praisonaiagents/agent/tool_execution.py`:
- Around line 205-217: The per-agent executor introduces three issues: racey
lazy init, no shutdown, and ineffective future.cancel(); fix by (1) moving
_tool_executor creation into the Agent constructor (or guard it with a
threading.Lock / cached_property) so execute_tool (or the method containing the
shown code) never does a racy hasattr check, (2) add guarded shutdown logic in
Agent.close() (and the async close counterpart) to call
self._tool_executor.shutdown(wait=False, cancel_futures=True) and clear the
attribute, and (3) handle cancel() returning False after ctx.run has started by
logging the failure and tracking in-flight/orphaned futures (or refusing new
submissions / growing the pool) so hung tasks don’t permanently exhaust the
max_workers=2 pool.
In `@src/praisonai-agents/praisonaiagents/plugins/__init__.py`:
- Line 83: The enable/disable flow is not atomic: _plugins_lock only guards
_plugins_enabled/_enabled_plugin_names while PluginManager._enabled is modified
outside that lock, causing inconsistent is_enabled(None)/is_enabled(name) states
and leaving globals partially set if discovery fails; fix by making plugin
enablement a single synchronized transaction—acquire a single lock that covers
both the globals (_plugins_enabled, _enabled_plugin_names, _plugins_lock) and
the PluginManager._enabled mutations (either by moving the lock into
PluginManager as PluginManager._lock and using it inside
PluginManager.enable/disable/is_enabled, or by always acquiring both locks in a
defined order before making changes), perform discovery and all
PluginManager.enable/disable calls while holding the lock, and only update the
globals after successful enablement (or roll back on failure) so functions like
PluginManager.enable/disable and is_enabled are consistently protected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6ae6330d-311c-4ba3-ab1e-56f4fcf712fe
📒 Files selected for processing (3)
- src/praisonai-agents/praisonaiagents/agent/concurrency.py
- src/praisonai-agents/praisonaiagents/agent/tool_execution.py
- src/praisonai-agents/praisonaiagents/plugins/__init__.py
```diff
 sem = self._get_semaphore(agent_name)
 if sem is None:
     return
 try:
     asyncio.get_running_loop()
-    # If we're in an async context, we can't block
-    # Just try_acquire or no-op with warning
-    if not sem._value > 0:
-        logger.warning(
-            f"Sync acquire for '{agent_name}' while async loop running and semaphore full. "
-            f"Consider using async acquire() instead."
-        )
-    # Decrement manually for sync context
-    sem._value = max(0, sem._value - 1)
+    # We're inside an async loop — cannot block. Run acquire in a thread
+    # with its own loop to go through the semaphore's proper acquire path.
+    import concurrent.futures
+    def _acquire_in_new_loop():
+        loop = asyncio.new_event_loop()
+        try:
+            loop.run_until_complete(sem.acquire())
+        finally:
+            loop.close()
+    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
+        pool.submit(_acquire_in_new_loop).result(timeout=30)
 except RuntimeError:
-    # No running loop — safe to use asyncio.run
-    asyncio.get_event_loop().run_until_complete(sem.acquire())
+    # No running loop — safe to create one
+    loop = asyncio.new_event_loop()
+    try:
+        loop.run_until_complete(sem.acquire())
+    finally:
+        loop.close()
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm release() path and any existing sync/async mixing callers.
rg -nP -C3 '\bacquire_sync\b|registry\.release\b|ConcurrencyRegistry\b'

Repository: MervinPraison/PraisonAI
Length of output: 11770
🏁 Script executed:
cat -n src/praisonai-agents/praisonaiagents/agent/concurrency.py

Repository: MervinPraison/PraisonAI
Length of output: 6363
🏁 Script executed:
# Find actual usage of acquire_sync and release in test/agent code
rg -n "acquire_sync|\.release\(" src/praisonai-agents/praisonaiagents/ src/praisonai-agents/tests/ -A 2

Repository: MervinPraison/PraisonAI
Length of output: 4024
🏁 Script executed:
# Check the test more carefully - is it in async context?
sed -n '101,110p' src/praisonai-agents/tests/unit/agent/test_agent_concurrency.py
# Also check if acquire_sync is called from actual async code anywhere
rg -n "acquire_sync" --type py

Repository: MervinPraison/PraisonAI
Length of output: 760
🏁 Script executed:
# Check Python version and asyncio.Semaphore behavior
python3 --version
# Let's create a test script to see what actually happens with cross-loop semaphore usage
cat > /tmp/test_semaphore.py << 'EOF'
import asyncio
import threading
# Test 1: Semaphore created, then acquire in separate loop
sem = asyncio.Semaphore(1)
print(f"1. Created sem: {sem}")
print(f" sem._loop initially: {getattr(sem, '_loop', 'not set')}")
# Acquire in a new loop (the worker thread approach)
def acquire_in_new_loop():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
print(f"2. Worker thread loop: {loop}")
try:
loop.run_until_complete(sem.acquire())
print(f"3. After acquire in worker, sem._loop: {getattr(sem, '_loop', 'not set')}")
finally:
loop.close()
print(f"4. After loop.close() in worker")
t = threading.Thread(target=acquire_in_new_loop)
t.start()
t.join()
print(f"5. Main thread, after worker finishes")
print(f" sem._loop now: {getattr(sem, '_loop', 'not set')}")
# Try to release from main thread (no loop running)
try:
sem.release()
print(f"6. Release succeeded from main thread")
except Exception as e:
print(f"6. Release failed: {type(e).__name__}: {e}")
# Test 2: Create loop, acquire, close immediately
print("\n--- Test 2: Close loop immediately ---")
sem2 = asyncio.Semaphore(1)
loop2 = asyncio.new_event_loop()
asyncio.set_event_loop(loop2)
try:
loop2.run_until_complete(sem2.acquire())
print(f"Acquired in loop2, sem2._loop: {getattr(sem2, '_loop', 'not set')}")
finally:
loop2.close()
asyncio.set_event_loop(None)
print(f"After loop2.close(), sem2._loop: {getattr(sem2, '_loop', 'not set')}")
try:
sem2.release()
print(f"Release succeeded from main thread after loop close")
except Exception as e:
print(f"Release failed: {type(e).__name__}: {e}")
EOF
python3 /tmp/test_semaphore.py

Repository: MervinPraison/PraisonAI
Length of output: 624
🏁 Script executed:
# Let's test the timeout behavior specifically
cat > /tmp/test_timeout.py << 'EOF'
import asyncio
import concurrent.futures
import time
import threading
# Create a semaphore that will never acquire (locked)
sem = asyncio.Semaphore(0) # Start at 0, so acquire will block
def acquire_in_new_loop():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
print(f"[Worker] Starting acquire in new loop at {time.time():.2f}")
try:
# This will hang forever since sem is at 0
loop.run_until_complete(sem.acquire())
print(f"[Worker] Acquired (should not happen)")
except asyncio.CancelledError:
print(f"[Worker] Cancelled")
finally:
loop.close()
print(f"[Worker] Finished at {time.time():.2f}")
start = time.time()
print(f"Starting at {start:.2f}")
try:
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
print(f"Pool created at {time.time() - start:.2f}s")
future = pool.submit(acquire_in_new_loop)
try:
result = future.result(timeout=2)
print(f"Got result: {result}")
except concurrent.futures.TimeoutError:
print(f"[Main] result() raised TimeoutError at {time.time() - start:.2f}s")
print(f"[Main] Now exiting 'with' block (this will block waiting for worker)...")
except Exception as e:
print(f"[Main] Exception: {e}")
print(f"[Main] Exited 'with' block at {time.time() - start:.2f}s")
print(f"Total time elapsed: {time.time() - start:.2f}s")
EOF
timeout 10 python3 /tmp/test_timeout.py

Repository: MervinPraison/PraisonAI
Length of output: 115
🏁 Script executed:
# Test the timeout behavior without timeout command - use shorter timeout
cat > /tmp/test_timeout.py << 'EOF'
import asyncio
import concurrent.futures
import time
import threading
# Create a semaphore that will never acquire (locked)
sem = asyncio.Semaphore(0) # Start at 0, so acquire will block
def acquire_in_new_loop():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
print(f"[Worker] Starting acquire in new loop at {time.time():.2f}")
try:
# This will hang forever since sem is at 0
loop.run_until_complete(sem.acquire())
print(f"[Worker] Acquired (should not happen)")
except asyncio.CancelledError:
print(f"[Worker] Cancelled")
finally:
loop.close()
print(f"[Worker] Finished at {time.time():.2f}")
start = time.time()
print(f"Starting at {start:.2f}")
# Test without hanging forever - use very short timeout
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
print(f"Pool created at {time.time() - start:.2f}s")
future = pool.submit(acquire_in_new_loop)
try:
result = future.result(timeout=1)
print(f"Got result: {result}")
except concurrent.futures.TimeoutError:
print(f"[Main] result(timeout=1) raised TimeoutError at {time.time() - start:.2f}s")
print(f"[Main] Now about to exit 'with' block (ThreadPoolExecutor.__exit__ will do shutdown(wait=True))...")
print(f"[Main] Exited 'with' block at {time.time() - start:.2f}s")
print(f"Total time elapsed: {time.time() - start:.2f}s (should be >> 1 second due to shutdown wait)")
EOF
# Run with a shell timeout
( sleep 15; kill $$ 2>/dev/null ) & python3 /tmp/test_timeout.py ; pkill -P $$

Repository: MervinPraison/PraisonAI
Length of output: 153
🏁 Script executed:
# Check if Python 3.11 asyncio.Semaphore stores loop ref differently
python3 << 'EOF'
import asyncio
import sys
print(f"Python version: {sys.version}")
# Check Semaphore internals
sem = asyncio.Semaphore(1)
print(f"\nSemaphore attributes: {dir(sem)}")
print(f"\nSemaphore __dict__: {sem.__dict__}")
# After creating a semaphore, check what happens when we await it in different loops
import asyncio
loop1 = asyncio.new_event_loop()
asyncio.set_event_loop(loop1)
async def test1():
print(f"\nIn loop1, sem.__dict__ before acquire: {sem.__dict__}")
await sem.acquire()
print(f"In loop1, sem.__dict__ after acquire: {sem.__dict__}")
loop1.run_until_complete(test1())
# Now try from a different loop
loop2 = asyncio.new_event_loop()
asyncio.set_event_loop(loop2)
async def test2():
print(f"\nIn loop2, sem.__dict__ before acquire: {sem.__dict__}")
await sem.acquire()
print(f"In loop2, sem.__dict__ after acquire: {sem.__dict__}")
try:
loop2.run_until_complete(test2())
except RuntimeError as e:
print(f"In loop2, got error: {e}")
loop1.close()
loop2.close()
EOF

Repository: MervinPraison/PraisonAI
Length of output: 153
🏁 Script executed:
# Simpler check - just look at asyncio.Semaphore source behavior without executing problematic code
python3 -c "
import asyncio
import inspect
# Get the source of asyncio.Semaphore to understand how it handles loops
print('=== asyncio.Semaphore.acquire source ===')
print(inspect.getsource(asyncio.Semaphore.acquire))
"

Repository: MervinPraison/PraisonAI
Length of output: 1260
🏁 Script executed:
python3 -c "
import asyncio
import inspect
# Get the source of asyncio.Semaphore._get_loop to understand loop binding
print('=== asyncio.Semaphore._get_loop source ===')
try:
print(inspect.getsource(asyncio.Semaphore._get_loop))
except:
print('_get_loop not found in Semaphore, checking base Lock...')
print(inspect.getsource(asyncio.Lock._get_loop))
"

Repository: MervinPraison/PraisonAI
Length of output: 451
🏁 Script executed:
python3 -c "
import asyncio
import inspect
# Get the source of asyncio.Semaphore.release to understand how it handles the loop
print('=== asyncio.Semaphore.release source ===')
try:
print(inspect.getsource(asyncio.Semaphore.release))
except:
print('Checking base Lock.release...')
print(inspect.getsource(asyncio.Lock.release))
"

Repository: MervinPraison/PraisonAI
Length of output: 412
🏁 Script executed:
# Now test the actual critical issue: RuntimeError when semaphore is bound to wrong loop
python3 << 'EOF'
import asyncio
import concurrent.futures
sem = asyncio.Semaphore(1)
print("Initial state: sem._loop =", sem._loop)
# Simulate the problematic path: acquire in worker thread
def acquire_in_worker():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
print("Worker: before acquire, sem._loop =", sem._loop)
loop.run_until_complete(sem.acquire())
print("Worker: after acquire, sem._loop =", sem._loop)
loop.close()
print("Worker: loop closed")
# Run it
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
pool.submit(acquire_in_worker).result()
print("Main: after worker finished, sem._loop =", sem._loop)
# Now try to use semaphore from async context
async def use_semaphore():
print("Async: attempting acquire with sem._loop =", sem._loop)
try:
await sem.acquire()
print("Async: acquire succeeded")
except RuntimeError as e:
print(f"Async: RuntimeError: {e}")
# Create a new loop in the main thread
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
loop.run_until_complete(use_semaphore())
finally:
loop.close()
EOF

Repository: MervinPraison/PraisonAI
Length of output: 153
🏁 Script executed:
# Minimal test without actual event loop execution
python3 << 'EOF'
import asyncio
# Test what _get_loop does when called from non-running context
sem = asyncio.Semaphore(1)
print("Semaphore._loop initially:", sem._loop)
# Check if calling _get_loop outside running loop works
try:
result = sem._get_loop()
print("_get_loop() succeeded:", result)
except RuntimeError as e:
print("_get_loop() raises RuntimeError:", e)
EOF

Repository: MervinPraison/PraisonAI
Length of output: 126
🏁 Script executed:
python3 << 'EOF'
import asyncio
# Check _wake_up_next implementation
import inspect
print("=== Semaphore._wake_up_next source ===")
try:
print(inspect.getsource(asyncio.Semaphore._wake_up_next))
except AttributeError:
# Try base class
print(inspect.getsource(asyncio.Lock._wake_up_next))
EOF

Repository: MervinPraison/PraisonAI
Length of output: 399
🏁 Script executed:
# Check if the test actually runs the problematic code path
# The test_sync_acquire_release is NOT in an async test, so it won't hit the RuntimeError branch
sed -n '101,115p' src/praisonai-agents/tests/unit/agent/test_agent_concurrency.py
# Verify: is it decorated with `@pytest.mark.asyncio`?
sed -n '100,115p' src/praisonai-agents/tests/unit/agent/test_agent_concurrency.py

Repository: MervinPraison/PraisonAI
Length of output: 809
🏁 Script executed:
# Check if any test calls acquire_sync from within an `@pytest.mark.asyncio` test
rg -B 5 "acquire_sync" src/praisonai-agents/tests/unit/agent/test_agent_concurrency.py

Repository: MervinPraison/PraisonAI
Length of output: 362
🏁 Script executed:
# Search for actual usage of acquire_sync in agent code (not tests)
rg -n "acquire_sync" src/praisonai-agents/praisonaiagents/ --type py --not-tests
# Also check if there are any agent implementations that mix sync and async
rg -B5 -A5 "def.*acquire" src/praisonai-agents/praisonaiagents/agent/ --type py | head -50

Repository: MervinPraison/PraisonAI
Length of output: 3324
Cross-loop asyncio.Semaphore binding will cause RuntimeError and the timeout guarantee is defeated.
asyncio.Semaphore binds itself to the first event loop that calls _get_loop() (during acquire()). This rewrite introduces subtle issues:
- Lines 109–114 (no running loop branch): Creates a loop, acquires (which binds the semaphore to that loop), then immediately closes it. The semaphore retains a reference to the closed loop in `_loop`. Any subsequent `await acquire()` from a different event loop will raise `RuntimeError: Semaphore object is bound to a different event loop` because `_get_loop()` enforces loop identity.
- Lines 96–107 (inside running loop branch): Acquires in a worker thread's new loop, binding the semaphore to that short-lived loop. If `release()` is later called from the main async context (which has a different loop), the same `RuntimeError` occurs. Even if it doesn't, the separated loop contexts defeat the purpose of a semaphore.
- Timeout is illusory: `with ThreadPoolExecutor(...) as pool` calls `shutdown(wait=True)` on `__exit__`, blocking indefinitely until the worker finishes. If `result(timeout=30)` raises `TimeoutError`, the `with` block still waits for the hung `acquire()`, defeating the 30-second guarantee.
- Violates coding guidelines: "use asyncio primitives for coordination, not threading" — this bridges an asyncio primitive via threading, which breaks loop-binding semantics. The pattern "One event loop per thread; never nest event loops" is intended to prevent exactly this scenario.
Recommended fix: Don't use asyncio.Semaphore for sync contexts. Instead:
- Maintain a parallel `threading.BoundedSemaphore` for the sync path, created alongside the async one in `_get_semaphore()`
- In `acquire_sync()`, use the sync semaphore with a proper `acquire(timeout=...)` call
- Update `release()` to release whichever semaphore was acquired (or split into `release_sync()`)
Alternatively, make acquire_sync() raise a clear error when called from within a running loop, directing users to await acquire() instead (fail fast per guidelines).
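As an illustrative sketch of the recommended pairing (all names here are hypothetical, not the SDK's actual API): the async path awaits an `asyncio.Semaphore`, while the sync path blocks on a parallel `threading.BoundedSemaphore` with a real timeout, so sync code never touches the asyncio primitive and no event-loop binding can occur.

```python
import asyncio
import threading


class DualSemaphore:
    """Illustrative pairing of an async semaphore and a sync one."""

    def __init__(self, limit: int):
        self._async_sem = asyncio.Semaphore(limit)
        self._sync_sem = threading.BoundedSemaphore(limit)

    async def acquire_async(self) -> None:
        await self._async_sem.acquire()

    def release_async(self) -> None:
        self._async_sem.release()

    def acquire_sync(self, timeout: float = 30.0) -> bool:
        # Blocks the calling thread with a genuine timeout; returns
        # False if the slot could not be obtained in time.
        return self._sync_sem.acquire(timeout=timeout)

    def release_sync(self) -> None:
        self._sync_sem.release()
```

Note that the two counters are independent: async and sync callers only contend within their own path unless the design unifies them, which is one reason the fail-fast alternative may be the simpler choice.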
```diff
-future = executor.submit(ctx.run, execute_with_context)
-try:
-    result = future.result(timeout=tool_timeout)
-except concurrent.futures.TimeoutError:
-    # Cancel and shutdown immediately to avoid blocking
-    future.cancel()
-    executor.shutdown(wait=False, cancel_futures=True)
-    logging.warning(f"Tool {function_name} timed out after {tool_timeout}s")
-    result = {"error": f"Tool timed out after {tool_timeout}s", "timeout": True}
-else:
-    # Normal completion - shutdown gracefully
-    executor.shutdown(wait=False)
-finally:
-    # Ensure executor is always cleaned up
-    if not executor._shutdown:
-        executor.shutdown(wait=False)
+# Use reusable executor to prevent resource leaks
+if not hasattr(self, '_tool_executor'):
+    self._tool_executor = concurrent.futures.ThreadPoolExecutor(
+        max_workers=2, thread_name_prefix=f"tool-{self.name}"
+    )
+
+future = self._tool_executor.submit(ctx.run, execute_with_context)
+try:
+    result = future.result(timeout=tool_timeout)
+except concurrent.futures.TimeoutError:
+    future.cancel()
+    logging.warning(f"Tool {function_name} timed out after {tool_timeout}s")
+    result = {"error": f"Tool timed out after {tool_timeout}s", "timeout": True}
```
New per-agent executor has three concurrency / lifecycle gaps worth addressing before this "concurrency fix" lands.
- Race in lazy initialization (Line 206-209). `hasattr(self, '_tool_executor')` followed by assignment is not atomic. Two threads invoking `execute_tool` concurrently on the same agent can both observe the attribute as missing and each construct a `ThreadPoolExecutor`; the loser is orphaned (its worker thread lingers until GC). For a PR whose stated goal is concurrency safety this is the exact failure mode being patched elsewhere — initialize the executor eagerly in `__init__` (or guard creation with a `threading.Lock` / `functools.cached_property`-style idiom) so first-use contention cannot double-spawn pools.
- Executor is never shut down → thread leak on agent close. The `Agent.close()` implementation in `src/praisonai-agents/praisonaiagents/agent/agent.py` (lines 4570-4640, provided as context) cleans up memory, LLM clients, MCP clients, server registrations and background tasks but has no branch for `_tool_executor`. Each `Agent` instance that executes a timed-out-or-not tool will retain up to 2 worker threads for the life of the process. This converts the prior "per-call executor leak" into a "per-agent executor leak" rather than fixing it. Add a `self._tool_executor.shutdown(wait=False, cancel_futures=True)` step (guarded by `hasattr`) to `close()` (and the async counterpart if any).
- `future.cancel()` does not stop an already-running tool (Line 215). `concurrent.futures.Future.cancel()` only succeeds while the task is still in the queue; once `ctx.run(execute_with_context)` has started, `cancel()` returns `False` and the worker thread keeps running the tool body to completion. With `max_workers=2`, two consecutive timeouts from hung tools will wedge the whole pool, and every subsequent tool call on this agent will block inside `future.result(timeout=tool_timeout)` waiting for a free worker before its own timeout even starts counting. At minimum, log when `cancel()` returns `False` so the leak is observable; better, track the number of orphaned in-flight futures and either refuse further submissions or grow the pool.
🔒 Suggested direction (illustrative, not a drop-in)
```diff
-# Use reusable executor to prevent resource leaks
-if not hasattr(self, '_tool_executor'):
-    self._tool_executor = concurrent.futures.ThreadPoolExecutor(
-        max_workers=2, thread_name_prefix=f"tool-{self.name}"
-    )
-
-future = self._tool_executor.submit(ctx.run, execute_with_context)
+# Reusable executor; created lazily under a lock to avoid
+# double-initialization races on concurrent first use.
+executor = getattr(self, '_tool_executor', None)
+if executor is None:
+    import threading
+    lock = self.__dict__.setdefault('_tool_executor_lock', threading.Lock())
+    with lock:
+        executor = getattr(self, '_tool_executor', None)
+        if executor is None:
+            executor = concurrent.futures.ThreadPoolExecutor(
+                max_workers=2,
+                thread_name_prefix=f"tool-{self.name}",
+            )
+            self._tool_executor = executor
+
+future = executor.submit(ctx.run, execute_with_context)
 try:
     result = future.result(timeout=tool_timeout)
 except concurrent.futures.TimeoutError:
-    future.cancel()
+    if not future.cancel():
+        logging.warning(
+            f"Tool {function_name} timed out after {tool_timeout}s "
+            "and is still running in the executor thread; worker "
+            "slot will remain occupied until the tool returns."
+        )
     logging.warning(f"Tool {function_name} timed out after {tool_timeout}s")
     result = {"error": f"Tool timed out after {tool_timeout}s", "timeout": True}
```

And in `Agent.close()` (agent.py):
```python
# Tool executor cleanup
try:
    executor = getattr(self, '_tool_executor', None)
    if executor is not None:
        executor.shutdown(wait=False, cancel_futures=True)
        self._tool_executor = None
except Exception as e:
    logger.warning(f"Tool executor cleanup failed: {e}")
```
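The cancellation caveat above is easy to reproduce in isolation: `concurrent.futures.Future.cancel()` succeeds only while a task is still queued, never once a worker thread has started it. A self-contained demonstration:

```python
import concurrent.futures
import time

pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

running = pool.submit(time.sleep, 0.5)  # occupies the single worker immediately
queued = pool.submit(time.sleep, 0.5)   # waits behind it in the queue
time.sleep(0.1)                         # give the worker time to start the first task

print(queued.cancel())   # True: still queued, so it is removed before running
print(running.cancel())  # False: already executing; the thread keeps running it

pool.shutdown(wait=True)
```

This is why a hung tool permanently occupies a worker slot: the pool has no way to interrupt the thread, only to drop tasks that have not yet started.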
```python
# Global state for plugin system (lazy initialized)
import threading

_plugins_lock = threading.Lock()
```
Make plugin enablement updates one synchronized transaction.
`_plugins_lock` protects only `_plugins_enabled` / `_enabled_plugin_names`; the actual `PluginManager._enabled` reads and writes still happen outside that lock, and manager.py:104-120 accesses that dict without using `PluginManager._lock`. Concurrent `enable()` / `disable()` can leave `is_enabled(None)` and `is_enabled(name)` disagreeing. Also, if discovery fails after Line 115, the globals remain enabled even though no plugins were enabled.
🔒 Proposed synchronization tightening
```diff
-_plugins_lock = threading.Lock()
+_plugins_lock = threading.RLock()
 _plugins_enabled: bool = False
 _enabled_plugin_names: list = None  # None = all, list = specific
@@
 def enable(plugins: list = None) -> None:
@@
     global _plugins_enabled, _enabled_plugin_names
-
-    with _plugins_lock:
-        _plugins_enabled = True
-        _enabled_plugin_names = plugins  # None = all, list = specific
+    target_plugins = list(plugins) if plugins is not None else None
@@
-    # Snapshot the names under lock to avoid TOCTOU
-    with _plugins_lock:
-        target_plugins = list(_enabled_plugin_names) if _enabled_plugin_names is not None else None
-
-    # Enable specific plugins or all
-    if target_plugins is not None:
-        # Enable only specified plugins
-        for name in target_plugins:
-            manager.enable(name)
-    else:
-        # Enable all discovered plugins
-        for plugin_info in manager.list_plugins():
-            manager.enable(plugin_info.get("name", ""))
+    with _plugins_lock:
+        # Enable specific plugins or all
+        if target_plugins is not None:
+            # Enable only specified plugins
+            for name in target_plugins:
+                manager.enable(name)
+        else:
+            # Enable all discovered plugins
+            for plugin_info in manager.list_plugins():
+                manager.enable(plugin_info.get("name", ""))
+
+        _plugins_enabled = True
+        _enabled_plugin_names = target_plugins  # None = all, list = specific
@@
     if plugins is not None:
         # Disable specific plugins
-        for name in plugins:
-            manager.disable(name)
+        with _plugins_lock:
+            for name in plugins:
+                manager.disable(name)
     else:
         # Disable all plugins
         with _plugins_lock:
+            for plugin_info in manager.list_plugins():
+                manager.disable(plugin_info.get("name", ""))
             _plugins_enabled = False
             _enabled_plugin_names = None
-            for plugin_info in manager.list_plugins():
-                manager.disable(plugin_info.get("name", ""))
@@
 def is_enabled(name: str = None) -> bool:
@@
     with _plugins_lock:
         if name is None:
             return _plugins_enabled

     from .manager import get_plugin_manager
     manager = get_plugin_manager()
-    return manager.is_enabled(name)
+    with _plugins_lock:
+        return manager.is_enabled(name)
```

If PluginManager remains publicly usable, also move the lock into src/praisonai-agents/praisonaiagents/plugins/manager.py so direct manager calls are protected too:
```diff
 def enable(self, name: str) -> bool:
     """Enable a plugin."""
-    if name in self._enabled:
-        self._enabled[name] = True
-        return True
-    return False
+    with self._lock:
+        if name in self._enabled:
+            self._enabled[name] = True
+            return True
+        return False

 def disable(self, name: str) -> bool:
     """Disable a plugin."""
-    if name in self._enabled:
-        self._enabled[name] = False
-        return True
-    return False
+    with self._lock:
+        if name in self._enabled:
+            self._enabled[name] = False
+            return True
+        return False

 def is_enabled(self, name: str) -> bool:
     """Check if a plugin is enabled."""
-    return self._enabled.get(name, False)
+    with self._lock:
+        return self._enabled.get(name, False)
```

Also applies to: 114-138, 163-173, 237-243
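A toy, self-contained version of the synchronized-transaction idea (a hypothetical `ToyPluginManager`, not the SDK's actual `PluginManager`) shows how a single lock keeps the global flag and per-plugin state consistent:

```python
import threading


class ToyPluginManager:
    """Toy illustration: one lock guards both the global flag and per-plugin state."""

    def __init__(self):
        self._lock = threading.RLock()
        self._enabled_flag = False  # analogue of the module-level _plugins_enabled
        self._enabled = {}          # analogue of PluginManager._enabled

    def register(self, name: str) -> None:
        with self._lock:
            self._enabled.setdefault(name, False)

    def enable(self, names=None) -> None:
        # One transaction: per-plugin state and the global flag change
        # under the same lock, so readers never observe a mixed state.
        with self._lock:
            targets = list(self._enabled) if names is None else names
            for name in targets:
                if name in self._enabled:
                    self._enabled[name] = True
            self._enabled_flag = True

    def disable_all(self) -> None:
        with self._lock:
            for name in self._enabled:
                self._enabled[name] = False
            self._enabled_flag = False

    def is_enabled(self, name=None) -> bool:
        with self._lock:
            if name is None:
                return self._enabled_flag
            return self._enabled.get(name, False)
```

An `RLock` is used so that a method holding the lock can call another locked method without deadlocking, matching the direction sketched in the diff above.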
PR #1459 — ✅ Approve

Local validation on pr-1459 branch:

Review against AGENTS.md

§4.5 Async-Safe / Multi-Agent Safe — fix is correct
§4.6 Invariants
Minor observation (non-blocking)
Verdict

Scope-focused, tests green, root-cause fixes (not workarounds). Ready to merge.
Summary: Fixes three critical concurrency gaps in PraisonAI Core SDK. Gap 1: Replace unsafe asyncio.Semaphore private attribute manipulation with proper sync-to-async bridge. Gap 2: Fix ThreadPoolExecutor resource leak by using reusable agent-level executor. Gap 3: Add thread lock protection for plugin state races. All changes preserve API contracts and follow protocol-driven architecture.
Summary by CodeRabbit
Release Notes