
Commit ef623b5

[argus] lead_agent/prompt: add <debugging_when_stuck> block
Smaller models (e.g. Qwen3-27B) tend to apply progressively weaker fixes to the same suspect area when their initial mental model of a bug is wrong, eventually rewriting from scratch and introducing new bugs. Frontier models tend to step back and instrument when the observable result doesn't change between fixes; smaller models often don't have that instinct.

The new <debugging_when_stuck> block sits next to <file_editing> inside <working_directory> and gives an explicit decision rule:

1. Two failed fixes that didn't change the observable result is a signal that the model of the bug is wrong, not a reason to keep guessing.
2. Instrument first, fix second: add logging, inspect compile/link status, intermediate values, draw counts.
3. If instrumentation doesn't pinpoint it, reduce the test surface to the simplest version that should still fail and add complexity back one piece at a time.

Do not rewrite from scratch as a debugging strategy.

Adds a test asserting block presence, contents, and placement after <file_editing> within <working_directory>.

PR-candidate: maybe
Upstream-issue: none
Reason: Like the <file_editing> block, this is opinionated phrasing targeting a specific observed failure mode. Worth proposing upstream after we have data on whether it actually changes agent behaviour on a similar reproduction.
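For readers who prefer code to prose, the decision rule above can be read as a small state machine. The following is only an illustrative sketch and is not part of this commit: `next_debugging_step`, its parameters, and the returned labels are hypothetical names, and the "observable result" is modelled as any comparable value (for example, the failing output as a string).

```python
from typing import Sequence


def next_debugging_step(observations: Sequence[str], instrumented: bool = False) -> str:
    """Suggest the next action under the rule described in the commit message.

    observations[0] is the observable result before any fix; each later entry
    is the result recorded after one more attempted fix (hypothetical model).
    instrumented says whether instrumentation was already tried without
    pinpointing the bug.
    """
    obs = list(observations)
    # Count the most recent consecutive fixes that left the result unchanged.
    stalled = 0
    for prev, cur in zip(reversed(obs[:-1]), reversed(obs[1:])):
        if cur != prev:
            break
        stalled += 1

    if stalled < 2:
        return "fix"          # fewer than two stalled fixes: keep fixing
    if not instrumented:
        return "instrument"   # two stalled fixes: the model of the bug is wrong
    return "reduce"           # instrumentation did not pinpoint it: shrink the test surface


if __name__ == "__main__":
    print(next_debugging_step(["black canvas", "black canvas", "black canvas"]))
```

Note that "rewrite from scratch" is deliberately not one of the possible return values.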
1 parent 78467e5 commit ef623b5

2 files changed

Lines changed: 47 additions & 0 deletions


backend/packages/harness/deerflow/agents/lead_agent/prompt.py

Lines changed: 12 additions & 0 deletions
@@ -463,6 +463,18 @@ def _build_subagent_section(max_concurrent: int) -> str:
 
 **For deliverables in `/mnt/user-data/outputs`:** write the file once with `write_file`. If you find a bug after writing, use `str_replace` to fix in place. Do not re-run a HEREDOC `cat > ...` to rewrite the whole file.
 </file_editing>
+<debugging_when_stuck>
+**Two failed fixes in a row that don't change the observable result is a signal — your model of the bug is wrong.**
+A third blind fix is the most expensive thing you can do: it costs tokens, takes time, and probably won't work either. Stop fixing and start instrumenting.
+
+**Instrument first, fix second.** Add `console.log` / `print` for the values you're assuming.
+Inspect program/shader compile status, return codes, intermediate variables, draw counts. Read the new output before the next code change.
+The bug is almost always somewhere your assumptions don't reach — you find it by widening the lens, not by tweaking the same area harder.
+
+**If instrumentation doesn't pinpoint it, reduce the test surface.** Replace the complex artifact with the simplest version that should still fail.
+If the simple version works, add complexity back one piece at a time until it breaks. The first piece that breaks it is your bug.
+**Do not "rewrite from scratch" as a debugging strategy** — rewriting hides the bug rather than finding it, and usually introduces new ones.
+</debugging_when_stuck>
 </working_directory>
 
 <response_style>
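The "reduce the test surface" step in the prompt block above describes an add-back loop: start from a minimal version that works, then reintroduce pieces one at a time until something breaks. A minimal sketch of that loop follows. It is not code from this commit, and `first_breaking_piece`, `minimal`, `pieces`, and `works` are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_breaking_piece(
    minimal: Sequence[T],
    pieces: Sequence[T],
    works: Callable[[Sequence[T]], bool],
) -> Optional[T]:
    """Add pieces back one at a time and return the first one that breaks.

    minimal: the simplest version of the artifact (assumed to work).
    pieces:  the complexity removed during reduction, in the order to re-add it.
    works:   caller-supplied check that runs the artifact and reports success.
    """
    current: List[T] = list(minimal)
    if not works(current):
        # The reduced version still fails, so the bug is already isolated there.
        raise ValueError("minimal version fails; the bug is in the minimal version")

    for piece in pieces:
        current.append(piece)
        if not works(current):
            return piece  # first piece whose addition makes the check fail
    return None  # everything works again; the failure did not reproduce


if __name__ == "__main__":
    # Hypothetical example: a scene that renders until textures are enabled.
    culprit = first_breaking_piece(
        ["triangle"],
        ["lighting", "textures", "shadows"],
        works=lambda parts: "textures" not in parts,
    )
    print(culprit)
```

The sketch adds pieces linearly, matching the prompt's "one piece at a time" wording. Bisection would need fewer checks but assumes the pieces are independent, which the prompt does not claim.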

backend/tests/test_lead_agent_prompt.py

Lines changed: 35 additions & 0 deletions
@@ -100,6 +100,41 @@ def test_apply_prompt_template_includes_file_editing_block(monkeypatch):
     assert wd_open < fe_open < wd_close
 
 
+def test_apply_prompt_template_includes_debugging_when_stuck_block(monkeypatch):
+    """The <debugging_when_stuck> block must be present in the rendered system
+    prompt so smaller models that lack the instinct to instrument-when-stuck
+    have an explicit decision rule to fall back on. Without it, models tend
+    to apply progressively weaker fixes to the same suspect area instead of
+    widening the diagnostic lens, eventually rewriting from scratch and
+    introducing new bugs.
+    """
+    config = SimpleNamespace(
+        sandbox=SimpleNamespace(mounts=[]),
+        skills=SimpleNamespace(container_path="/mnt/skills"),
+    )
+    monkeypatch.setattr("deerflow.config.get_app_config", lambda: config)
+    monkeypatch.setattr(prompt_module, "_get_enabled_skills", lambda: [])
+    monkeypatch.setattr(prompt_module, "get_deferred_tools_prompt_section", lambda: "")
+    monkeypatch.setattr(prompt_module, "_build_acp_section", lambda: "")
+    monkeypatch.setattr(prompt_module, "_get_memory_context", lambda agent_name=None: "")
+    monkeypatch.setattr(prompt_module, "get_agent_soul", lambda agent_name=None: "")
+
+    prompt = prompt_module.apply_prompt_template()
+
+    assert "<debugging_when_stuck>" in prompt
+    assert "</debugging_when_stuck>" in prompt
+    # Three core principles must all be present
+    assert "Two failed fixes in a row" in prompt
+    assert "Instrument first, fix second" in prompt
+    assert "reduce the test surface" in prompt
+    # The block must sit inside <working_directory> and after <file_editing>
+    wd_open = prompt.index("<working_directory")
+    wd_close = prompt.index("</working_directory>")
+    fe_close = prompt.index("</file_editing>")
+    dws_open = prompt.index("<debugging_when_stuck>")
+    assert wd_open < fe_close < dws_open < wd_close
+
+
 def test_refresh_skills_system_prompt_cache_async_reloads_immediately(monkeypatch, tmp_path):
     def make_skill(name: str) -> Skill:
         skill_dir = tmp_path / name
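The placement assertion in the new test checks where `<debugging_when_stuck>` opens but not where it closes. If a stricter check is ever wanted, it could be factored into a helper along the following lines. This is a sketch only: `block_is_nested_after` is a hypothetical name that does not exist in the repository, and the sample prompt (including the `path` attribute) is made up for the demonstration.

```python
def block_is_nested_after(prompt: str, block: str, parent: str, sibling: str) -> bool:
    """Return True if <block>...</block> lies wholly inside <parent> and after </sibling>.

    The parent's opening tag is matched by prefix ("<parent") so that an
    opening tag carrying attributes is still found, mirroring the
    prompt.index("<working_directory") call in the test above.
    """
    positions = [
        prompt.find(f"<{parent}"),
        prompt.find(f"</{sibling}>"),
        prompt.find(f"<{block}>"),
        prompt.find(f"</{block}>"),
        prompt.find(f"</{parent}>"),
    ]
    if any(pos == -1 for pos in positions):
        return False  # at least one tag is missing entirely
    # Every position must be strictly greater than the one before it.
    return all(a < b for a, b in zip(positions, positions[1:]))


if __name__ == "__main__":
    sample = (
        '<working_directory path="/mnt/user-data">\n'
        "<file_editing>\n</file_editing>\n"
        "<debugging_when_stuck>\n</debugging_when_stuck>\n"
        "</working_directory>\n"
    )
    print(block_is_nested_after(sample, "debugging_when_stuck", "working_directory", "file_editing"))
```

Using `str.find` rather than `str.index` lets the helper return `False` for a missing tag instead of raising, which keeps the failure message in the caller's `assert`.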
