Problem
Deep Agent's task tool serializes multi-tool-call turns. When a Project Manager LLM emits N task(...) calls in a single assistant turn (idiomatic LangChain/LangGraph fan-out), the engine processes them one-by-one rather than concurrently.
Tracer evidence from a 28-minute coding-agent run: three engineer subagents marked for parallel execution by the PM ran strictly sequentially. op:enter/op:leave windows for agent_deepagent_subagent_eng1/2/3 did not overlap; wall time was eng1 + eng2 + eng3 instead of max(eng1, eng2, eng3). Estimated savings on that run: ~350 s (~20% of total).
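The non-overlap check and the sum-versus-max gap described above can be reproduced from trace windows with a few lines of Python. This is a minimal sketch: the helper names (any_overlap, potential_saving) are hypothetical, and the timestamps are illustrative placeholders, not values from the actual run.

```python
from typing import Dict, Tuple

# (op:enter, op:leave) timestamps in seconds for one subagent.
Window = Tuple[float, float]


def any_overlap(windows: Dict[str, Window]) -> bool:
    """Return True if at least two windows overlap in time."""
    ordered = sorted(windows.values())
    if len(ordered) < 2:
        return False
    # After sorting by start time, an overlap exists only if some window
    # starts before the latest end seen so far.
    latest_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start < latest_end:
            return True
        latest_end = max(latest_end, end)
    return False


def potential_saving(windows: Dict[str, Window]) -> float:
    """Sum of branch durations minus the longest branch."""
    durations = [leave - enter for enter, leave in windows.values()]
    return sum(durations) - max(durations)


# Illustrative timestamps only; NOT taken from the real trace.
sequential = {
    "eng1": (0.0, 100.0),
    "eng2": (100.0, 250.0),
    "eng3": (250.0, 400.0),
}
concurrent = {
    "eng1": (0.0, 100.0),
    "eng2": (0.0, 150.0),
    "eng3": (0.0, 150.0),
}

if __name__ == "__main__":
    print(any_overlap(sequential), potential_saving(sequential))
    print(any_overlap(concurrent))
```

Running the same check against the real op:enter/op:leave timestamps would confirm whether a given run was serialized and give an upper bound on what fan-out could save.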
Root cause
# nodes/agent_deepagent/deepagent.py:357-360
state = agent.invoke(
    {'messages': [HumanMessage(content=_safe_str(question.getPrompt() or ''))]},
    config={'callbacks': [_SSECallbackHandler(_send_sse)]},
)
agent.invoke() uses LangGraph's default tool-execution node, which iterates tool calls sequentially. The PM system prompt can tell the LLM to "issue all in a single message", but the engine still serializes them.
Proposed fix
Replace the default tool-execution node with one that fans out via asyncio.gather across tool_calls, or use LangGraph's Send API to spawn parallel branches per tool call. The existing _SSECallbackHandler is already async-safe; the SSE stream just needs per-task tagging so the UI doesn't interleave incoherently.
Acceptance
- Multi-task turns from a Deep Agent show overlapping op:enter windows in apaevt_flow traces.
- Single-task turns and other tools (shell/fs/git) are unaffected (they remain sequential by design).
- A regression test covering "PM emits 3 task calls" asserts wall time ≈ max-branch instead of sum-of-branches (with a tolerance).
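The regression test in the last criterion might take a shape like the following. This is a sketch under stated assumptions: dispatch_task_calls is a hypothetical stand-in for the engine's handling of a multi-task turn, the subagents are faked with asyncio.sleep, and the delays and tolerance are arbitrary values chosen so that max-branch and sum-of-branches are clearly distinguishable.

```python
import asyncio
import time
from typing import List


async def fake_subagent(delay: float) -> float:
    """Stand-in for an engineer subagent that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return delay


async def dispatch_task_calls(delays: List[float]) -> List[float]:
    # Hypothetical stand-in for the engine processing N task(...) calls
    # emitted in one assistant turn; the real test would call the agent.
    return await asyncio.gather(*(fake_subagent(d) for d in delays))


def test_three_task_calls_run_concurrently() -> None:
    delays = [0.1, 0.2, 0.3]  # sum = 0.6 s, max = 0.3 s
    tolerance = 0.2  # generous headroom for scheduler and CI jitter

    start = time.perf_counter()
    results = asyncio.run(dispatch_task_calls(delays))
    elapsed = time.perf_counter() - start

    assert results == delays
    # Wall time should track the slowest branch, not the sum of branches.
    assert elapsed >= max(delays) * 0.9
    assert elapsed < max(delays) + tolerance
    assert elapsed < sum(delays)


if __name__ == "__main__":
    test_three_task_calls_run_concurrently()
```

Keeping the upper bound (max + tolerance) below the sum of the delays is what makes the test fail if the calls are ever serialized again.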
Suggested labels
enhancement, performance, nodes/agent_deepagent