feat: add opt-in loop detection for repetitive tool calls#219
Closed
sheeki03 wants to merge 9 commits intoGitHubSecurityLab:mainfrom
Closed
feat: add opt-in loop detection for repetitive tool calls#219sheeki03 wants to merge 9 commits intoGitHubSecurityLab:mainfrom
sheeki03 wants to merge 9 commits intoGitHubSecurityLab:mainfrom
Conversation
Adds _tool_call_counter, _auto_save_interval, _auto_save_dir, _write_auto_save, and _read_tool_log to run_main. When AUTO_SAVE_DIR and AUTO_SAVE_INTERVAL are set, tool results are periodically appended to an NDJSON log file. Disabled by default (interval=0).
Moves write_auto_save() and read_tool_log() from closures inside run_main() to module-level functions with explicit parameters. Tests now exercise the real implementation instead of duplicating the logic.
- Add encoding="utf-8" to open() in write_auto_save and read_tool_log - Catch ValueError on non-numeric AUTO_SAVE_INTERVAL with fallback to 0 - Soften docstring from "crash-safe" to "append-only"
Tracks consecutive same-tool-name calls in on_tool_end_hook. When the count reaches the threshold, raises LoopDetectedError to abort the task. Configured per-task via max_consecutive_same_tool or globally via LOOP_MAX_CONSECUTIVE env var. Disabled by default. Skipped for async tasks. Counter resets between prompts in repeat_prompt mode.
…_loop Moves the consecutive-tool tracking logic from the on_tool_end_hook closure into a module-level function with explicit state parameters. Tests now exercise the real implementation. Adds tests for negative threshold, counter reset on tool change, and state mutation visibility.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
During a 45-minute security audit, the agent entered a loop of 60+ consecutive
search_codecalls with no progress. There was no mechanism to detect or abort this — the task ran untilmax_turnswas exhausted, wasting API credits and time.Depends on: #216 (auto-save scaffolding)
Changes
Loop detection engine (
runner.py)check_consecutive_tool_loop()— Module-level function that tracks consecutive same-tool-name calls. When the count reaches the threshold, raisesLoopDetectedError. Extracted as a testable function with explicit state parameters;on_tool_end_hookdelegates to it.LoopDetectedError— New exception withtool_nameandcountattributes. Caught in the retry loop as non-retryable (immediately breaks, saves session, re-raises).Behavior:
repeat_promptmode (prevents false positives when prompt B's first tool matches prompt A's last)async_task=True(concurrent tool calls make consecutive counting meaningless)Configuration
Per-task via
max_consecutive_same_toolfield, or globally viaLOOP_MAX_CONSECUTIVEenv var:nullLOOP_MAX_CONSECUTIVEenv; if unset or0, disabled0Invalid (non-numeric)
LOOP_MAX_CONSECUTIVEvalues are caught and fall back to 0 with a warning, matchingAUTO_SAVE_INTERVALbehavior.Model (
models.py)Adds
max_consecutive_same_tool: int | None = NonetoTaskDefinition. Extra fields are already allowed (model_config = ConfigDict(extra="allow")), so existing YAML files without this field parse without changes.Documentation
doc/GRAMMAR.md: Full section with semantics table and YAML exampleREADME.md: Brief mention in Taskflows section with link to GRAMMAR.mdTests
Unit tests (
TestLoopDetection, 10 tests): threshold triggering, alternating tools reset counter, disabled via 0 and negative, counter reset on tool change, state mutation visibility, TaskDefinition field acceptance/defaultsIntegration tests (
TestLoopDetectionIntegration, 4 tests): Driverun_main()→on_tool_end_hook→check_consecutive_tool_loop()through the real callback path by mockingdeploy_task_agents:LOOP_MAX_CONSECUTIVEenv var fallback0disables even with env setasync=Truebypasses detection entirely