Skip to content

Commit 089fdc5

Browse files
committed
[argus] lead_agent/prompt: add <debugging_when_stuck> block
Smaller models (e.g. Qwen3-27B) tend to apply progressively weaker fixes to the same suspect area when their initial mental model of a bug is wrong, eventually rewriting from scratch and introducing new bugs. Frontier models tend to step back and instrument when the observable result doesn't change between fixes; smaller models often don't have that instinct. The new <debugging_when_stuck> block sits next to <file_editing> inside <working_directory> and gives an explicit decision rule: 1. Two failed fixes that didn't change the observable result is a signal that the model of the bug is wrong, not a reason to keep guessing. 2. Instrument first, fix second — add logging, inspect compile/link status, intermediate values, draw counts. 3. If instrumentation doesn't pinpoint it, reduce the test surface to the simplest version that should still fail and add complexity back one piece at a time. Do not rewrite from scratch as a debugging strategy. Adds a test asserting block presence, contents, and placement after <file_editing> within <working_directory>. PR-candidate: maybe Upstream-issue: none Reason: Like the <file_editing> block, this is opinionated phrasing targeting a specific observed failure mode. Worth proposing upstream after we have data on whether it actually changes agent behaviour on a similar reproduction.
1 parent 184d3c9 commit 089fdc5

2 files changed

Lines changed: 47 additions & 0 deletions

File tree

backend/packages/harness/deerflow/agents/lead_agent/prompt.py

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -478,6 +478,18 @@ def _build_subagent_section(max_concurrent: int, *, app_config: AppConfig | None
478478
479479
**For deliverables in `/mnt/user-data/outputs`:** write the file once with `write_file`. If you find a bug after writing, use `str_replace` to fix in place. Do not re-run a HEREDOC `cat > ...` to rewrite the whole file.
480480
</file_editing>
481+
<debugging_when_stuck>
482+
**Two failed fixes in a row that don't change the observable result is a signal — your model of the bug is wrong.**
483+
A third blind fix is the most expensive thing you can do: it costs tokens, takes time, and probably won't work either. Stop fixing and start instrumenting.
484+
485+
**Instrument first, fix second.** Add `console.log` / `print` for the values you're assuming.
486+
Inspect program/shader compile status, return codes, intermediate variables, draw counts. Read the new output before the next code change.
487+
The bug is almost always somewhere your assumptions don't reach — you find it by widening the lens, not by tweaking the same area harder.
488+
489+
**If instrumentation doesn't pinpoint it, reduce the test surface.** Replace the complex artifact with the simplest version that should still fail.
490+
If the simple version works, add complexity back one piece at a time until it breaks. The first piece that breaks it is your bug.
491+
**Do not "rewrite from scratch" as a debugging strategy** — rewriting hides the bug rather than finding it, and usually introduces new ones.
492+
</debugging_when_stuck>
481493
</working_directory>
482494
483495
<response_style>

backend/tests/test_lead_agent_prompt.py

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -255,6 +255,41 @@ def test_apply_prompt_template_includes_file_editing_block(monkeypatch):
255255
assert wd_open < fe_open < wd_close
256256

257257

258+
def test_apply_prompt_template_includes_debugging_when_stuck_block(monkeypatch):
259+
"""The <debugging_when_stuck> block must be present in the rendered system
260+
prompt so smaller models that lack the instinct to instrument-when-stuck
261+
have an explicit decision rule to fall back on. Without it, models tend
262+
to apply progressively weaker fixes to the same suspect area instead of
263+
widening the diagnostic lens, eventually rewriting from scratch and
264+
introducing new bugs.
265+
"""
266+
config = SimpleNamespace(
267+
sandbox=SimpleNamespace(mounts=[]),
268+
skills=SimpleNamespace(container_path="/mnt/skills"),
269+
)
270+
monkeypatch.setattr("deerflow.config.get_app_config", lambda: config)
271+
monkeypatch.setattr(prompt_module, "_get_enabled_skills", lambda: [])
272+
monkeypatch.setattr(prompt_module, "get_deferred_tools_prompt_section", lambda: "")
273+
monkeypatch.setattr(prompt_module, "_build_acp_section", lambda: "")
274+
monkeypatch.setattr(prompt_module, "_get_memory_context", lambda agent_name=None: "")
275+
monkeypatch.setattr(prompt_module, "get_agent_soul", lambda agent_name=None: "")
276+
277+
prompt = prompt_module.apply_prompt_template()
278+
279+
assert "<debugging_when_stuck>" in prompt
280+
assert "</debugging_when_stuck>" in prompt
281+
# Three core principles must all be present
282+
assert "Two failed fixes in a row" in prompt
283+
assert "Instrument first, fix second" in prompt
284+
assert "reduce the test surface" in prompt
285+
# The block must sit inside <working_directory> and after <file_editing>
286+
wd_open = prompt.index("<working_directory")
287+
wd_close = prompt.index("</working_directory>")
288+
fe_close = prompt.index("</file_editing>")
289+
dws_open = prompt.index("<debugging_when_stuck>")
290+
assert wd_open < fe_close < dws_open < wd_close
291+
292+
258293
def test_refresh_skills_system_prompt_cache_async_reloads_immediately(monkeypatch, tmp_path):
259294
def make_skill(name: str) -> Skill:
260295
skill_dir = tmp_path / name

0 commit comments

Comments
 (0)