88 * hard limits / access denials retried instead of routed around (#175)
99 * an unreachable MCP server looped on health checks (#176)
1010 * spirals that eventually hit the max-iteration abort (#143)
11+ * mono-tool spirals where the agent fixates on ONE tool category (#432)
1112
1213Mechanism (deliberately conservative — advisory, never blocking):
1314inspect the most recent CONSECUTIVE assistant tool-call turns. If the SAME tool
1718to stop, re-check the goal, and change strategy. A real loop is broken; a rare
1819false positive costs one advisory message.
1920
20- Pure functions over the `messages` list → fully unit-testable, no agent state
21+ Tools are split into two categories for thresholding:
22+ - Mutating tools (terminal, write_file, patch, execute_code, etc.) get LOWER
23+ thresholds because a fixation on these is more costly and the model should
24+ be stopped sooner (#432).
25+ - Idempotent tools (read_file, search_files, web_search, etc.) use the default
26+ higher thresholds since re-reading data is less harmful and sometimes needed.
27+
28+ At higher call counts, the nudge escalates from advisory to a DIRECTIVE that
29+ requires the model to explain progress before continuing (#432).
30+
31+ Pure functions over the ``messages`` list -> fully unit-testable, no agent state
2132required (the caller tracks "already nudged this run" to avoid spamming).
2233"""
2334
5869_NON_RETRYABLE = frozenset ({"timeout" , "permission" , "missing_command" , "limit" })
5970_NONRETRY_THRESHOLD = 2
6071
72+ # Mutating tools get LOWER thresholds than idempotent tools because a fixation
73+ # on mutating operations (writing files, running commands) is more costly and
74+ # indicates a deeper strategy problem (#432).
75+ _IDEMPOTENT_TOOLS = frozenset (
76+ {
77+ "read_file" ,
78+ "search_files" ,
79+ "web_search" ,
80+ "web_extract" ,
81+ "session_search" ,
82+ "browser_snapshot" ,
83+ "browser_console" ,
84+ "browser_get_images" ,
85+ "mcp_filesystem_read_file" ,
86+ "mcp_filesystem_read_text_file" ,
87+ "mcp_filesystem_read_multiple_files" ,
88+ "mcp_filesystem_list_directory" ,
89+ "mcp_filesystem_list_directory_with_sizes" ,
90+ "mcp_filesystem_directory_tree" ,
91+ "mcp_filesystem_get_file_info" ,
92+ "mcp_filesystem_search_files" ,
93+ }
94+ )
95+ _MUTATING_TOOLS = frozenset (
96+ {
97+ "terminal" ,
98+ "execute_code" ,
99+ "write_file" ,
100+ "patch" ,
101+ "todo" ,
102+ "memory" ,
103+ "skill_manage" ,
104+ "browser_click" ,
105+ "browser_type" ,
106+ "browser_press" ,
107+ "browser_scroll" ,
108+ "browser_navigate" ,
109+ "send_message" ,
110+ "cronjob" ,
111+ "delegate_task" ,
112+ "process" ,
113+ }
114+ )
115+ # Default thresholds: lower for mutating tools, higher for idempotent (#432).
116+ # Mutating: repeat at 4, fail at 2, escalate at 8
117+ # Idempotent: repeat at 8, fail at 4, escalate at 15
118+ _MUTATING_REPEAT_THRESHOLD = 4
119+ _IDEMPOTENT_REPEAT_THRESHOLD = 8
120+ _MUTATING_FAIL_THRESHOLD = 2
121+ _IDEMPOTENT_FAIL_THRESHOLD = 4
122+ _MUTATING_ESCALATE_THRESHOLD = 8
123+ _IDEMPOTENT_ESCALATE_THRESHOLD = 15
124+
61125
62126def _failure_category (content : Any ) -> Optional [str ]:
63127 """The tool_diagnostics failure class of a result, or None if not a failure.
@@ -126,22 +190,72 @@ def _recent_tool_runs(messages: List[Dict[str, Any]]) -> List[Tuple[str, bool, O
126190 return runs
127191
128192
193+ def _tool_category (tool_name : str ) -> str :
194+ """Return 'mutating', 'idempotent', or 'unknown' for a tool name."""
195+ if tool_name in _MUTATING_TOOLS :
196+ return "mutating"
197+ if tool_name in _IDEMPOTENT_TOOLS :
198+ return "idempotent"
199+ return "unknown"
200+
201+
202+ def _tool_spiral_score (tool_name : str , count : int , base : int ) -> Optional [str ]:
203+ """Build an optional spiral-intensity annotation for the nudge message.
204+
205+ Returns a one-line annotation like 'spiral-intensity: 2 of 5' once the run
206+ exceeds the base threshold by at least 4 consecutive calls, or None for
207+ shorter runs. ``tool_name`` is currently unused.
208+ """
209+ if count <= base :
210+ return None
211+ excess = count - base
212+ intensity = min (excess // 2 , 5 ) # cap at 5 for readability
213+ if intensity >= 2 :
214+ return f"spiral-intensity: { intensity } of 5"
215+ return None
216+
217+
129218def maybe_nudge (
130219 messages : List [Dict [str , Any ]],
131220 * ,
132- repeat_threshold : int = 6 ,
133- fail_threshold : int = 3 ,
221+ repeat_threshold : Optional [ int ] = None ,
222+ fail_threshold : Optional [ int ] = None ,
134223) -> Optional [str ]:
135224 """Return a nudge string if the trailing single-tool run is stuck, else None.
136225
137- Two triggers (failure takes precedence — it's the higher-signal one):
138- * the same tool's last `fail_threshold` results all look like failures
139- * the same tool was called `repeat_threshold`+ times in a row
226+ Four trigger levels (fail/repeat/escalate thresholds are lower for mutating tools):
227+ 1. Non-retryable failure class repeated twice (highest priority, #231)
228+ 2. Generic failures >= fail_threshold
229+ 3. Same tool called >= repeat_threshold times in a row
230+ 4. Escalated interrupt once the run reaches the escalate threshold (#432)
231+
232+ Returns None when the agent is making varied progress (not stuck).
140233 """
141234 runs = _recent_tool_runs (messages )
142235 if not runs :
143236 return None
144237 tool = runs [0 ][0 ]
238+
239+ # Pick thresholds based on tool category (#432).
240+ # Unknown tools get mutating thresholds as the safer default.
241+ cat = _tool_category (tool )
242+ is_mutating = cat == "mutating"
243+ is_unknown = cat == "unknown"
244+ if repeat_threshold is None :
245+ repeat_threshold = (
246+ _MUTATING_REPEAT_THRESHOLD if (is_mutating or is_unknown )
247+ else _IDEMPOTENT_REPEAT_THRESHOLD
248+ )
249+ if fail_threshold is None :
250+ fail_threshold = (
251+ _MUTATING_FAIL_THRESHOLD if (is_mutating or is_unknown )
252+ else _IDEMPOTENT_FAIL_THRESHOLD
253+ )
254+ escalate_threshold = (
255+ _MUTATING_ESCALATE_THRESHOLD if (is_mutating or is_unknown )
256+ else _IDEMPOTENT_ESCALATE_THRESHOLD
257+ )
258+
145259 # All entries in `runs` share the same tool (run breaks on tool change),
146260 # but guard anyway:
147261 same = [r for r in runs if r [0 ] == tool ]
@@ -165,6 +279,14 @@ def maybe_nudge(
165279 else :
166280 counting_nonretry = False
167281
282+ # Category label for nudge messages.
283+ if is_mutating :
284+ cat_label = "mutating"
285+ elif is_unknown :
286+ cat_label = "unknown"
287+ else :
288+ cat_label = "idempotent"
289+
168290 # Highest-priority: a DETERMINISTIC failure repeated even once (#231). These
169291 # reproduce on a near-identical retry, so the generic 3-strike threshold is
170292 # too lenient — two in a row is already a spiral (terminal timeouts, denied
@@ -181,21 +303,41 @@ def maybe_nudge(
181303
182304 if consec_fail >= fail_threshold :
183305 return (
184- f"[loop-guard] The `{ tool } ` tool has failed { consec_fail } times in a "
185- f"row with the same approach. STOP repeating it. Diagnose the actual "
186- f"blocker first (check prerequisites / environment / the exact error "
187- f"class), then either switch to a different tool or strategy, or — if "
188- f"the blocker can't be resolved — report it concisely instead of "
189- f"retrying. Do not call `{ tool } ` again the same way."
306+ f"[loop-guard] The `{ tool } ` tool ({ cat_label } ) has failed "
307+ f"{ consec_fail } times in a row with the same approach. STOP repeating "
308+ f"it. Diagnose the actual blocker first (check prerequisites / "
309+ f"environment / the exact error class), then either switch to a "
310+ f"different tool or strategy, or — if the blocker can't be resolved "
311+ f"— report it concisely instead of retrying. Do not call `{ tool } ` "
312+ f"again the same way."
190313 )
314+
191315 if count >= repeat_threshold :
316+ # Build the optional spiral-intensity annotation for the nudge.
317+ score = _tool_spiral_score (tool , count , repeat_threshold )
318+ score_line = f"\n { score } " if score else ""
319+
320+ if count >= escalate_threshold :
321+ return (
322+ f"[loop-guard] You have called `{ tool } ` ({ cat_label } ) { count } "
323+ f"times in a row without resolving the task.{ score_line } \n "
324+ f"⚠️ ESCALATED INTERRUPT: This is a deep mono-tool spiral. "
325+ f"PAUSE and summarize in one paragraph the concrete progress "
326+ f"these { count } calls have made toward the goal. If no measurable "
327+ f"progress exists, state the actual blocker explicitly and "
328+ f"propose a fundamentally different strategy — do NOT call "
329+ f"`{ tool } ` again until you have provided this summary."
330+ )
331+
192332 return (
193- f"[loop-guard] You have called `{ tool } ` { count } times in a row without "
194- f"resolving the task. Pause and re-read the goal: what concrete "
195- f"progress have these calls made? Check your plan/success criterion, "
196- f"then either change strategy, move to the next step, or report the "
197- f"blocker. Avoid another near-identical `{ tool } ` call."
333+ f"[loop-guard] You have called `{ tool } ` ({ cat_label } ) { count } times "
334+ f"in a row without resolving the task.{ score_line } Pause and re-read "
335+ f"the goal: what concrete progress have these calls made? Check your "
336+ f"plan/success criterion, then either change strategy, move to the "
337+ f"next step, or report the blocker. Avoid another near-identical "
338+ f"`{ tool } ` call."
198339 )
340+
199341 return None
200342
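To make the category-based thresholding above easier to follow, here is a minimal, self-contained sketch. It is not the module's code: the tool sets are trimmed-down copies for illustration, and `thresholds_for` and `nudge_level` are hypothetical helper names. The threshold values are the defaults shown in the diff.

```python
from typing import Dict, Tuple

# Trimmed-down, illustrative copies of the tool sets in the diff.
MUTATING = frozenset({"terminal", "write_file", "patch"})
IDEMPOTENT = frozenset({"read_file", "search_files", "web_search"})

# (repeat, fail, escalate) per category, using the diff's default constants.
THRESHOLDS: Dict[str, Tuple[int, int, int]] = {
    "mutating": (4, 2, 8),
    "idempotent": (8, 4, 15),
}


def category(tool_name: str) -> str:
    """Classify a tool as 'mutating', 'idempotent', or 'unknown'."""
    if tool_name in MUTATING:
        return "mutating"
    if tool_name in IDEMPOTENT:
        return "idempotent"
    return "unknown"


def thresholds_for(tool_name: str) -> Tuple[int, int, int]:
    """Unknown tools fall back to the stricter mutating thresholds."""
    cat = category(tool_name)
    return THRESHOLDS["idempotent" if cat == "idempotent" else "mutating"]


def nudge_level(tool_name: str, count: int) -> str:
    """Hypothetical helper: which repeat-based nudge a run of `count` calls gets."""
    repeat, _fail, escalate = thresholds_for(tool_name)
    if count >= escalate:
        return "escalated"
    if count >= repeat:
        return "advisory"
    return "none"
```

Under these assumptions, seven consecutive `read_file` calls produce no nudge, while four consecutive `terminal` calls already produce an advisory one. A tool missing from both sets is treated like a mutating tool.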
201343
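The arithmetic in `_tool_spiral_score` is easy to misread, so the sketch below restates it in isolation. `spiral_annotation` is a hypothetical name, and the unused `tool_name` parameter is dropped. The logic mirrors the diff: integer-halve the excess over the base threshold, cap the result at 5, and suppress values below 2.

```python
from typing import Optional


def spiral_annotation(count: int, base: int) -> Optional[str]:
    """Return 'spiral-intensity: N of 5' for long runs, else None."""
    if count <= base:
        return None
    # Every two calls past the base add one intensity point, capped at 5.
    intensity = min((count - base) // 2, 5)
    if intensity >= 2:
        return f"spiral-intensity: {intensity} of 5"
    return None


if __name__ == "__main__":
    # Idempotent default base is 8 in the diff; mutating default base is 4.
    for count, base in [(8, 8), (11, 8), (12, 8), (14, 8), (7, 4), (8, 4)]:
        print(count, base, spiral_annotation(count, base))
```

One consequence, assuming the default thresholds and no caller overrides: for mutating tools the annotation first appears at 8 calls, which is also the escalation threshold, so the plain advisory message (4 to 7 calls) never carries it. For idempotent tools it appears from 12 calls, within the advisory range of 8 to 14.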