In-depth architectural analysis of the core SDK (praisonaiagents) identified 3 key gaps that directly violate the project's stated principles (DRY, performance-first, protocol-driven, minimal API). These are distinct from the production robustness issues tracked in #1370 and the handoff unification in #1290 — they concern the execution core (LLM dispatch, tool calling, streaming) rather than session/concurrency/error semantics.
Each gap below includes verified file:line references, user-facing impact, and a concrete protocol-driven fix aligned with PraisonAI's philosophy.
Gap 1: Sync/Async Duplication in Core Execution Paths
Severity: High | Affects: Maintainability, feature parity, bug surface area
Core execution logic is duplicated across parallel sync and async implementations instead of sharing a single async-first core with a thin sync bridge.
Evidence
praisonaiagents/agent/chat_mixin.py — chat() / _chat_impl() and achat() / _achat_impl() re-implement the same business logic (retrieval, context assembly, hook dispatch, guardrail application, tool invocation) in two places.
praisonaiagents/agent/execution_mixin.py — execute_tool() vs execute_tool_async() maintain parallel code paths for the same concern.
praisonaiagents/llm/llm.py — get_response() vs get_response_async() carry overlapping provider dispatch, streaming decisions, retry/error logic, and message building.
praisonaiagents/agent/unified_chat_mixin.py — introduces UnifiedLLMDispatcher for LLM dispatch (#1304), but the broader chat / tool-execution surface still has twin sync + async paths.
Impact
Every bug fix must be applied twice. Features silently drift (streaming, guardrails, context truncation, cost tracking) between sync and async surfaces.
Complicates contribution: new providers and features must remember to mirror changes in both paths.
Hot-path code is larger than necessary → higher cognitive load, more places for regressions.
Proposed Fix (protocol-driven, minimal API)
Finish the unification started in unified_chat_mixin.py across the remaining execution surface:
Single async-first core: one async def _execute(...) method per concern (chat, tool execution, LLM dispatch).
Thin sync bridge: sync entry points become asyncio.run(self._execute(...)) or use run_coroutine_threadsafe when already inside an event loop.
Deprecate duplicated sync-only impls once parity is verified.
Keep public API unchanged: agent.chat(...) / agent.achat(...), agent.start(...) / agent.astart(...) remain as-is.
Aligns with: DRY, protocol-driven core, "3-way feature surface" (CLI/YAML/Python all benefit from a single implementation).
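As a sketch of the thin-sync-bridge pattern above — ChatMixin and _execute_chat are illustrative names, and the core body is a stand-in, not PraisonAI's actual implementation:

```python
import asyncio
import concurrent.futures

class ChatMixin:
    async def _execute_chat(self, prompt: str) -> str:
        # single async-first core: retrieval, context assembly, hooks,
        # guardrails, and tool invocation would all live here, once
        return f"response to {prompt!r}"

    async def achat(self, prompt: str) -> str:
        return await self._execute_chat(prompt)

    def chat(self, prompt: str) -> str:
        # thin sync bridge over the same async core
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # no loop running in this thread: safe to own one
            return asyncio.run(self._execute_chat(prompt))
        # called from inside a running loop: run the coroutine on a
        # separate thread so we neither block nor re-enter the caller's loop
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, self._execute_chat(prompt)).result()
```

With this shape, a fix to `_execute_chat` lands on both `chat()` and `achat()` at once, which is the whole point of the unification.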
Gap 2: LLM Tool Calls Execute Sequentially — No Parallelism for Batched Tool Calls
Severity: High | Affects: Latency, LLM efficiency, user experience
When the LLM returns multiple tool calls in a single assistant message (OpenAI, Anthropic, Gemini all support this), PraisonAI executes them one after another instead of in parallel.
Evidence
praisonaiagents/llm/llm.py:3305-3323 — verified: batched tool calls are executed in a sequential loop.
Per-tool timeout infrastructure already exists in praisonaiagents/agent/tool_execution.py (execute_tool with timeout handling around lines 196-223), and concurrent.futures is imported at agent/agent.py:9 and agent/execution_mixin.py:336 — but only used for autonomous runs, not for batched tool calls.
Note: #1290 addressed parallel handoffs to other agents, which is a separate code path from batched LLM tool calls on a single agent — this gap is not covered there.
Impact
If an LLM requests, say, 3 independent tool calls (e.g. fetch_user, fetch_analytics, fetch_config) at 300 ms each, total latency is ~900 ms instead of ~300 ms.
Directly hurts agent workflows that batch I/O (web scraping, multi-source aggregation, parallel API queries) — the exact use-case where agent frameworks should shine.
Users perceive the agent as "stuck" during the tool phase because no tokens are streamed while tools run serially.
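The latency math above can be demonstrated directly with the stdlib; `slow_tool` is a stand-in for any ~300 ms I/O-bound tool call:

```python
import concurrent.futures
import time

def slow_tool(name: str) -> str:
    time.sleep(0.3)  # stand-in for a ~300 ms I/O-bound tool call
    return name

names = ["fetch_user", "fetch_analytics", "fetch_config"]

t0 = time.perf_counter()
sequential = [slow_tool(n) for n in names]        # runs one after another, ~0.9 s
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(slow_tool, names))   # overlapped waits, ~0.3 s
t_par = time.perf_counter() - t0

assert sequential == parallel  # same results, same order
assert t_par < t_seq
```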
Proposed Fix (opt-in, protocol-driven, backward compatible)
Introduce a ToolCallExecutor protocol with two implementations, selected by a single Agent(parallel_tool_calls=True) flag (default off to preserve current behavior):
```python
# praisonaiagents/tools/call_executor.py (new, lightweight protocol)
class ToolCallExecutor(Protocol):
    def execute_batch(self, calls: list[ToolCall]) -> list[ToolResult]: ...

class SequentialToolCallExecutor:
    # current behavior — safe default fallback
    ...

class ParallelToolCallExecutor:
    # thread pool for sync tools, asyncio.gather for async tools
    # respects existing per-tool timeout; bounded max_workers
    ...
```
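A minimal, runnable sketch of the parallel executor — with the proposal's ToolCall/ToolResult types simplified to plain zero-argument callables and return values for illustration:

```python
import concurrent.futures
from typing import Callable, Optional

class ParallelToolCallExecutor:
    def __init__(self, max_workers: int = 8, timeout: float = 30.0):
        self.max_workers = max_workers  # bounded pool, no unbounded fan-out
        self.timeout = timeout          # per-tool timeout, mirroring existing semantics

    def execute_batch(self, calls: list[Callable[[], object]]) -> list[Optional[object]]:
        # submit everything up front, then collect in submission order so
        # callers see the same ordering the sequential executor produces
        with concurrent.futures.ThreadPoolExecutor(self.max_workers) as pool:
            futures = [pool.submit(call) for call in calls]
            results: list[Optional[object]] = []
            for future in futures:
                try:
                    results.append(future.result(timeout=self.timeout))
                except concurrent.futures.TimeoutError:
                    # placeholder: the real executor would emit an error ToolResult
                    results.append(None)
            return results
```

Collecting results in submission order (rather than completion order) is what preserves the "identical result ordering semantics" required by the acceptance criteria below.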
Aligns with: performance-first, minimal API (one flag), protocol-driven core, no hot-path regressions (off by default).
Gap 3: Streaming Logic Is Fragmented Across Providers — Not Unified Behind a Protocol
Severity: Medium | Affects: Reliability, provider parity, future maintainability
Streaming capability decisions, error handling, and fallback behavior are scattered as provider-specific conditionals inside the main LLM loop rather than isolated behind a streaming-provider protocol.
Evidence
praisonaiagents/llm/llm.py:~2080-2300 — tangled conditionals (Gemini-specific, Ollama-specific, generic fallback) deciding whether streaming can proceed.
praisonaiagents/llm/adapters/__init__.py:176-182 — verified: Anthropic streaming is hard-disabled with a comment explaining a litellm/async-generator bug:
```python
def supports_streaming(self) -> bool:
    # litellm.acompletion with stream=True returns a ModelResponse (not async generator)
    # for Anthropic in the async path, causing 'async for requires __aiter__' error
    return False

def supports_streaming_with_tools(self) -> bool:
    return False
```
Users who request streaming with Anthropic silently fall back to non-streaming with no signal.
praisonaiagents/streaming/events.py defines StreamEvent types, but the actual streaming deltas emitted from llm.py don't consistently use this protocol — it's used mostly for trace/observability events.
Each provider's streaming error handling lives inline in llm.py rather than on the adapter, so adding a new provider requires editing the core loop instead of dropping in an adapter.
Impact
Inconsistent user expectations: some providers stream, some don't, tools-plus-streaming combos vary by provider — behavior is hard to predict without reading source.
Silent fallback from streaming to non-streaming is invisible to callers who rely on stream=True for perceived latency.
Adding a new LLM provider touches core llm.py instead of just an adapter — violates protocol-driven principle.
The existing StreamEvent protocol is underused, leaving streaming semantics informal.
Proposed Fix (complete the existing adapter pattern)
Extend adapters in praisonaiagents/llm/adapters/ with a full streaming interface:
Move the Gemini/Ollama/Anthropic-specific branches out of llm.py:2080-2300 and onto each adapter.
Standardize streaming deltas on streaming/events.py::StreamEvent so downstream consumers (UI, telemetry, hooks) have a stable contract.
When an adapter cannot stream in the requested configuration, emit an observable StreamUnavailableEvent instead of silent fallback — users can log, warn, or choose a different provider.
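One way the adapter-owned capability check and observable fallback could fit together — StreamingAdapter, stream_or_signal, and the StreamEvent shape here are illustrative assumptions, not the existing streaming/events.py types:

```python
from dataclasses import dataclass
from typing import Iterator, List, Protocol

@dataclass
class StreamEvent:
    # assumed shape; the real type lives in streaming/events.py
    type: str
    data: str = ""

class StreamingAdapter(Protocol):
    def supports_streaming(self, with_tools: bool) -> bool: ...
    def stream(self, messages: List[dict]) -> Iterator[StreamEvent]: ...

def stream_or_signal(adapter: StreamingAdapter, messages: List[dict],
                     with_tools: bool) -> Iterator[StreamEvent]:
    # the core loop never silently downgrades: the fallback is an event
    if not adapter.supports_streaming(with_tools):
        yield StreamEvent(type="stream_unavailable",
                          data="adapter cannot stream in this configuration")
        return
    yield from adapter.stream(messages)
```

Callers that care (UI, telemetry, hooks) can match on the `stream_unavailable` event and log, warn, or switch providers; callers that don't simply ignore it.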
Aligns with: protocol-driven core, DRY, production-ready (transparent fallback), extensibility (new providers = new adapter only).
Why these three, and not others
Filtered out by scope:
Not documentation, not tests/coverage, not file size / LOC.
Not generic "add more features" — the philosophy is minimal API.
The remaining highest-leverage gaps are: the core execution paths are still duplicated (Gap 1), the hottest performance path loses 2–3x on batched tool calls (Gap 2), and the provider abstraction leaks into the core loop for streaming (Gap 3). Each of these directly violates a stated pillar and each has a concrete protocol-driven fix.
Suggested Priority
Gap 2 — Parallel tool execution. Smallest change, largest user-visible performance win, opt-in (zero regression risk).
Gap 3 — Streaming protocol. Medium-effort cleanup; unlocks reliable multi-provider streaming and new-provider onboarding.
Gap 1 — Async-first unification. Largest refactor; builds on the existing UnifiedLLMDispatcher / unified_chat_mixin.py work (#1304).
Acceptance Criteria (high level)
Gap 1: chat/tool-execution sync paths delegate to a shared async core; no duplicated business logic between _chat_impl / _achat_impl and execute_tool / execute_tool_async.
Gap 2: Agent(parallel_tool_calls=True) runs batched LLM tool calls concurrently with bounded workers, per-tool timeout, and identical result ordering semantics; default remains sequential.
Gap 3: Each LLM adapter owns its streaming capability decisions and error recovery; llm.py contains no provider-name conditionals for streaming; Anthropic streaming either works end-to-end via the adapter or surfaces an explicit StreamUnavailableEvent.
Real agentic test per feature: a live agent.start("…") that exercises each fix against an actual LLM, not just object construction.