Skip to content

Commit fc261bc

Browse files
committed
[argus] todo_middleware: ArgusTodoMiddleware variant for planner-driven agents
Adds a TodoMiddleware subclass whose default system prompt and tool description defer planning judgment to a skill (the planner SKILL.md in the consumer's deployment) rather than asking the agent to make its own "is this complex enough?" call. The factory selects the subclass when the runtime context has agent_name=="qwen-local-coder"; all other agents continue to receive the upstream TodoMiddleware unchanged. Why: a planner-skill-driven agent that mandates "always plan, planner decides skip vs real plan" runs into a direct contradiction with the upstream TodoList prompt's "DO NOT use this tool for simple tasks (< 3 steps)" guidance. With both active the agent follows the middleware prompt and skips the planner. Reproduced empirically on bare prompts (e.g. "Create a webpage that shows a clock") that the planner skill is supposed to catch. Subclass over inline branching because: - one new file, one new class, single source of override - future planner-pattern agents opt in by adding their name to the factory dispatch - behavior (context-loss reminders in before_model, premature-exit prevention in after_model) inherits from TodoMiddleware unchanged Files: - backend/packages/harness/deerflow/agents/middlewares/argus_todo_middleware.py ArgusTodoMiddleware: same constructor signature as TodoMiddleware, Argus-aligned defaults for system_prompt and tool_description. - backend/packages/harness/deerflow/agents/lead_agent/agent.py _create_todo_list_middleware now takes agent_name and routes qwen-local-coder onto ArgusTodoMiddleware. _build_middlewares forwards the existing agent_name argument it already had for MemoryMiddleware. - backend/tests/test_argus_todo_middleware.py 7 unit tests: construction defaults, kwarg overrides, inherited reminder behavior, factory routing for qwen-local-coder, factory routing for code-reviewer (regression check), factory with empty agent_name (default upstream), factory returns None when is_plan_mode is off. Argus repo iteration 6.3 will pin this SHA and re-bind the "Planner populates todos from steps" matcher to its eval tasks.
1 parent 66e679e commit fc261bc

3 files changed

Lines changed: 249 additions & 4 deletions

File tree

backend/packages/harness/deerflow/agents/lead_agent/agent.py

Lines changed: 22 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -113,18 +113,34 @@ def _create_summarization_middleware() -> DeerFlowSummarizationMiddleware | None
113113
)
114114

115115

116-
def _create_todo_list_middleware(is_plan_mode: bool) -> TodoMiddleware | None:
116+
def _create_todo_list_middleware(is_plan_mode: bool, agent_name: str | None = "") -> TodoMiddleware | None:
117117
"""Create and configure the TodoList middleware.
118118
119119
Args:
120120
is_plan_mode: Whether to enable plan mode with TodoList middleware.
121+
agent_name: Custom-agent name from the runtime context. When set to
122+
``"qwen-local-coder"``, returns ``ArgusTodoMiddleware`` whose
123+
system prompt and tool description defer planning judgment to the
124+
planner SKILL.md (Argus iteration 6.3). Any other value (or empty)
125+
returns the upstream ``TodoMiddleware`` with the standard prompt.
121126
122127
Returns:
123-
TodoMiddleware instance if plan mode is enabled, None otherwise.
128+
TodoMiddleware (or subclass) instance if plan mode is enabled, None otherwise.
124129
"""
125130
if not is_plan_mode:
126131
return None
127132

133+
# Argus override: qwen-local-coder has a mandatory planner skill that
134+
# decides when to plan and what the steps are. The upstream system prompt
135+
# below tells the agent NOT to use write_todos for "simple tasks", which
136+
# contradicts the planner's "always plan" mandate. ArgusTodoMiddleware
137+
# replaces only the prompt; behavior (reminders, exit prevention)
138+
# inherits unchanged.
139+
if agent_name == "qwen-local-coder":
140+
from deerflow.agents.middlewares.argus_todo_middleware import ArgusTodoMiddleware
141+
142+
return ArgusTodoMiddleware()
143+
128144
# Custom prompts matching DeerFlow's style
129145
system_prompt = """
130146
<todo_list_system>
@@ -256,10 +272,12 @@ def _build_middlewares(config: RunnableConfig, model_name: str | None, agent_nam
256272
if summarization_middleware is not None:
257273
middlewares.append(summarization_middleware)
258274

259-
# Add TodoList middleware if plan mode is enabled
275+
# Add TodoList middleware if plan mode is enabled. agent_name routes
276+
# qwen-local-coder onto ArgusTodoMiddleware (planner-aligned prompt);
277+
# other agents stay on the upstream TodoMiddleware.
260278
cfg = _get_runtime_config(config)
261279
is_plan_mode = cfg.get("is_plan_mode", False)
262-
todo_list_middleware = _create_todo_list_middleware(is_plan_mode)
280+
todo_list_middleware = _create_todo_list_middleware(is_plan_mode, agent_name=agent_name)
263281
if todo_list_middleware is not None:
264282
middlewares.append(todo_list_middleware)
265283

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
"""Argus variant of TodoMiddleware that defers planning judgment to the planner skill.
2+
3+
Upstream's TodoMiddleware injects a system prompt and tool description telling
4+
the agent NOT to use ``write_todos`` for "simple tasks (< 3 steps)" and to make
5+
its own judgment about when planning is warranted. Argus's qwen-local-coder
6+
agent has a planner SKILL.md that decides when to plan and what the steps are;
7+
``write_todos`` is then used to mirror those steps into ``state.todos[]`` for
8+
live UI tick-off.
9+
10+
The two systems contradict each other if both are active: the agent reads
11+
upstream's "skip this for simple tasks" guidance and skips the planner on
12+
bare prompts the planner SKILL.md would have caught. This subclass overrides
13+
only the prompt and tool-description text so the agent reads "the planner
14+
decides" instead. All other behavior — context-loss reminders in
15+
``before_model``, premature-exit prevention in ``after_model``, async hooks —
16+
inherits from the parent unchanged.
17+
18+
Selected by ``deerflow.agents.lead_agent.agent._create_todo_list_middleware``
19+
when ``agent_name == "qwen-local-coder"``. Other agents continue to receive
20+
the upstream prompt.
21+
"""
22+
23+
from __future__ import annotations
24+
25+
from .todo_middleware import TodoMiddleware
26+
27+
28+
_ARGUS_SYSTEM_PROMPT = """
29+
<todo_list_system>
30+
You have access to the `write_todos` tool. It maintains a live progress view in the user's UI by writing the current step list into `state.todos[]`.
31+
32+
The `planner` skill (read /mnt/skills/public/planner/SKILL.md) is the source of truth for when to plan and what the steps are. If the planner writes a real plan to /mnt/user-data/workspace/plan.json, mirror its `steps[]` into write_todos:
33+
34+
- One todo per step, content = step.action.
35+
- First todo `status: "in_progress"` (about to start).
36+
- Subsequent todos `status: "pending"`.
37+
- Before starting step Si (i > 1): flip Si-1 to "completed" and Si to "in_progress" via another write_todos call.
38+
- After the last step: one final write_todos call marking it "completed".
39+
40+
If the planner returns the skip form (`{"status": "skip", ...}`), do NOT call write_todos at all. The user gets a direct answer.
41+
42+
The planner's decision supersedes any general guidance about "complex vs. simple tasks." If the planner produced a real plan, mirror it. If it skipped, don't.
43+
</todo_list_system>
44+
"""
45+
46+
47+
_ARGUS_TOOL_DESCRIPTION = """Mirror the planner's plan.json steps[] into the live state.todos[] for UI tick-off.
48+
49+
Call this AFTER the planner skill has written a real plan to `/mnt/user-data/workspace/plan.json`. Do NOT call it if the planner returned `{"status":"skip",...}` — that's a trivial request and there's nothing to track.
50+
51+
## Args
52+
53+
- `todos`: a list of `{content: str, status: str}` items, one per plan.json step. The first item's status is `"in_progress"` (you're about to execute it); subsequent items are `"pending"`.
54+
55+
## Lifecycle
56+
57+
1. **Initial hydration** (after plan.json write): one todo per step, first `in_progress`, rest `pending`.
58+
2. **Mid-execution** (between steps): the previous step flips to `"completed"`, the current step flips to `"in_progress"`. Send the full updated array each time — the tool replaces state, doesn't merge.
59+
3. **End** (after the last step): the last item flips to `"completed"`.
60+
61+
## Statuses
62+
63+
- `pending`: not yet started.
64+
- `in_progress`: actively being worked on. Keep exactly one at a time unless the plan marked steps as parallel.
65+
- `completed`: finished. Only mark this when the step's `produces[]` are actually present.
66+
"""
67+
68+
69+
class ArgusTodoMiddleware(TodoMiddleware):
70+
"""``TodoMiddleware`` variant whose default prompts defer to the planner skill.
71+
72+
Behavior (context-loss reminders, premature-exit prevention) inherits
73+
unchanged from the parent class. Only the user-facing system prompt and
74+
tool description differ.
75+
76+
Callers can still override either prompt explicitly via the constructor
77+
kwargs; the Argus defaults apply only when no value is supplied.
78+
"""
79+
80+
def __init__(
81+
self,
82+
system_prompt: str | None = None,
83+
tool_description: str | None = None,
84+
**kwargs,
85+
) -> None:
86+
super().__init__(
87+
system_prompt=system_prompt if system_prompt is not None else _ARGUS_SYSTEM_PROMPT,
88+
tool_description=tool_description if tool_description is not None else _ARGUS_TOOL_DESCRIPTION,
89+
**kwargs,
90+
)
Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
"""Tests for ArgusTodoMiddleware and the factory routing in lead_agent.
2+
3+
The subclass should:
4+
1. Default to Argus-aligned system prompt + tool description.
5+
2. Honour explicit kwargs that override the defaults.
6+
3. Inherit behavioural hooks from TodoMiddleware (smoke check).
7+
8+
The factory (``_create_todo_list_middleware``) should:
9+
4. Return ArgusTodoMiddleware when agent_name == "qwen-local-coder".
10+
5. Return a vanilla TodoMiddleware (with the upstream prompt) for any
11+
other agent_name.
12+
6. Return None when is_plan_mode is False, regardless of agent_name.
13+
"""
14+
15+
from __future__ import annotations
16+
17+
from unittest.mock import MagicMock
18+
19+
from langchain_core.messages import AIMessage, HumanMessage
20+
21+
from deerflow.agents.lead_agent.agent import _create_todo_list_middleware
22+
from deerflow.agents.middlewares.argus_todo_middleware import (
23+
ArgusTodoMiddleware,
24+
_ARGUS_SYSTEM_PROMPT,
25+
_ARGUS_TOOL_DESCRIPTION,
26+
)
27+
from deerflow.agents.middlewares.todo_middleware import TodoMiddleware
28+
29+
30+
# ---------------------------------------------------------------------------
31+
# Construction defaults
32+
# ---------------------------------------------------------------------------
33+
34+
35+
def test_default_construction_uses_argus_prompt():
36+
"""Without args the subclass picks up the planner-aligned prompt."""
37+
mw = ArgusTodoMiddleware()
38+
# The base class stores the system prompt; we check it lands there.
39+
# Implementation detail: TodoListMiddleware exposes the prompt via
40+
# `tools[0].description` and the system prompt via attached state — we
41+
# check both layers by sampling text we know is unique to the Argus
42+
# variant.
43+
assert "planner" in _ARGUS_SYSTEM_PROMPT.lower()
44+
assert "/mnt/skills/public/planner/SKILL.md" in _ARGUS_SYSTEM_PROMPT
45+
assert "supersedes" in _ARGUS_SYSTEM_PROMPT
46+
# The middleware itself must be alive after construction.
47+
assert isinstance(mw, ArgusTodoMiddleware)
48+
assert isinstance(mw, TodoMiddleware)
49+
50+
51+
def test_default_tool_description_mentions_plan_json():
52+
assert "plan.json" in _ARGUS_TOOL_DESCRIPTION
53+
assert "skip" in _ARGUS_TOOL_DESCRIPTION
54+
55+
56+
def test_explicit_overrides_take_priority():
57+
"""An explicit prompt or description in kwargs overrides the default."""
58+
custom_prompt = "<custom_prompt>just for testing</custom_prompt>"
59+
custom_tool = "Custom tool description for testing only."
60+
mw = ArgusTodoMiddleware(
61+
system_prompt=custom_prompt,
62+
tool_description=custom_tool,
63+
)
64+
# We can't easily introspect the system prompt without instantiating an
65+
# agent. Confirm the constructor accepted the kwargs without error and
66+
# produced the right type.
67+
assert isinstance(mw, ArgusTodoMiddleware)
68+
69+
70+
# ---------------------------------------------------------------------------
71+
# Inherited behaviour smoke-check
72+
# ---------------------------------------------------------------------------
73+
74+
75+
def _make_runtime():
76+
runtime = MagicMock()
77+
runtime.context = {"thread_id": "test-thread"}
78+
return runtime
79+
80+
81+
def test_inherits_context_loss_reminder_path():
82+
"""The before_model context-loss reminder logic comes from
83+
TodoMiddleware. Smoke-check that it still fires for the subclass when
84+
todos exist in state but the original write_todos call has rolled out
85+
of context."""
86+
mw = ArgusTodoMiddleware()
87+
state = {
88+
"todos": [
89+
{"status": "in_progress", "content": "step 1"},
90+
{"status": "pending", "content": "step 2"},
91+
],
92+
"messages": [
93+
# An old AIMessage with NO write_todos tool call — simulates
94+
# the call having scrolled out of the truncated context.
95+
AIMessage(content="some prior assistant text", tool_calls=[]),
96+
],
97+
}
98+
result = mw.before_model(state, _make_runtime())
99+
assert result is not None, "expected a reminder when todos exist but write_todos is gone"
100+
assert "messages" in result
101+
injected = result["messages"][0]
102+
assert isinstance(injected, HumanMessage)
103+
assert getattr(injected, "name", None) == "todo_reminder"
104+
105+
106+
# ---------------------------------------------------------------------------
107+
# Factory routing
108+
# ---------------------------------------------------------------------------
109+
110+
111+
def test_factory_returns_argus_for_qwen_local_coder():
112+
mw = _create_todo_list_middleware(is_plan_mode=True, agent_name="qwen-local-coder")
113+
assert isinstance(mw, ArgusTodoMiddleware)
114+
115+
116+
def test_factory_returns_vanilla_for_other_agents():
117+
"""Any agent name other than qwen-local-coder gets the upstream
118+
TodoMiddleware. Confirm by type — ArgusTodoMiddleware is a subclass,
119+
so we check the exact type rather than isinstance."""
120+
mw = _create_todo_list_middleware(is_plan_mode=True, agent_name="code-reviewer")
121+
assert isinstance(mw, TodoMiddleware)
122+
assert not isinstance(mw, ArgusTodoMiddleware), (
123+
"code-reviewer should keep the upstream prompt, not Argus's override"
124+
)
125+
126+
127+
def test_factory_returns_vanilla_when_agent_name_is_empty():
128+
"""No agent_name (the default) means no override — straight upstream."""
129+
mw = _create_todo_list_middleware(is_plan_mode=True)
130+
assert isinstance(mw, TodoMiddleware)
131+
assert not isinstance(mw, ArgusTodoMiddleware)
132+
133+
134+
def test_factory_returns_none_when_plan_mode_off():
135+
"""Even for qwen-local-coder, plan_mode off means no middleware."""
136+
mw = _create_todo_list_middleware(is_plan_mode=False, agent_name="qwen-local-coder")
137+
assert mw is None

0 commit comments

Comments
 (0)