feat: add opt-in loop detection for repetitive tool calls by sheeki03 · Pull Request #219 · GitHubSecurityLab/seclab-taskflow-agent

sheeki03 · 2026-04-10T12:05:22Z

Problem

During a 45-minute security audit, the agent entered a loop of 60+ consecutive search_code calls with no progress. There was no mechanism to detect or abort this, so the task ran until max_turns was exhausted, wasting API credits and time.

Depends on: #216 (auto-save scaffolding)

Changes

Loop detection engine (`runner.py`)

check_consecutive_tool_loop() — Module-level function that tracks consecutive same-tool-name calls. When the count reaches the threshold, raises LoopDetectedError. Extracted as a testable function with explicit state parameters; on_tool_end_hook delegates to it.

LoopDetectedError — New exception with tool_name and count attributes. Caught in the retry loop as non-retryable (immediately breaks, saves session, re-raises).
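As a rough illustration of the non-retryable handling described above, here is a minimal sketch. The exception carries the tool_name and count attributes named in this PR; run_with_retries, run_once, save_session, and max_retries are hypothetical names used only for this example and are not taken from runner.py.

```python
class LoopDetectedError(Exception):
    """Raised when one tool is called too many times in a row."""

    def __init__(self, tool_name: str, count: int) -> None:
        super().__init__(f"{tool_name!r} called {count} times consecutively")
        self.tool_name = tool_name
        self.count = count


def run_with_retries(run_once, save_session, max_retries: int = 3):
    """Hypothetical retry loop: other errors are retried, loop detection is not."""
    for attempt in range(1, max_retries + 1):
        try:
            return run_once()
        except LoopDetectedError:
            # Non-retryable: save the session, then propagate immediately.
            save_session()
            raise
        except Exception:
            # Assumed policy for this sketch: retry until attempts run out.
            if attempt == max_retries:
                raise
```

Listing the LoopDetectedError clause before the generic one matters here, since a broad `except Exception` placed first would swallow it and retry the looping task.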

Behavior:

- Counter increments when the same tool fires consecutively
- Counter resets to 1 when a different tool fires
- Counter resets between prompts in repeat_prompt mode (prevents false positives when prompt B's first tool matches prompt A's last)
- Skipped entirely for async_task=True (concurrent tool calls make consecutive counting meaningless)
- On detection: saves triggering call to auto-save log (if enabled), then raises
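The counting rules above can be sketched roughly as follows. This is not the code from runner.py: the PR only says the function takes "explicit state parameters", so the dict-based state and the exact signature below are assumptions made for illustration.

```python
class LoopDetectedError(Exception):
    """Carries the offending tool name and the consecutive call count."""

    def __init__(self, tool_name: str, count: int) -> None:
        super().__init__(f"{tool_name!r} called {count} times consecutively")
        self.tool_name = tool_name
        self.count = count


def check_consecutive_tool_loop(state: dict, tool_name: str, threshold: int) -> None:
    """Update `state` in place; raise once `tool_name` repeats `threshold` times.

    `state` is a hypothetical mutable dict holding "last_tool" and "count".
    """
    if threshold <= 0:
        return  # zero or negative threshold: detection disabled

    if state.get("last_tool") == tool_name:
        state["count"] = state.get("count", 0) + 1
    else:
        # A different tool fired: restart the run at 1.
        state["last_tool"] = tool_name
        state["count"] = 1

    if state["count"] >= threshold:
        raise LoopDetectedError(tool_name, state["count"])
```

Under this sketch, resetting between prompts in repeat_prompt mode would amount to calling `state.clear()` before each prompt, and the async bypass would simply mean never calling the function.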

Configuration

Per-task via max_consecutive_same_tool field, or globally via LOOP_MAX_CONSECUTIVE env var:

| Setting | Behavior |
| --- | --- |
| Omitted / `null` | Inherits from `LOOP_MAX_CONSECUTIVE` env; if unset or `0`, disabled |
| `0` | Explicitly disabled, even when env var is set |
| Positive integer | Enabled at that threshold |

Invalid (non-numeric) LOOP_MAX_CONSECUTIVE values are caught and fall back to 0 with a warning, matching AUTO_SAVE_INTERVAL behavior.
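The precedence described above could be resolved along these lines. This is a sketch only: resolve_loop_threshold and its env parameter are hypothetical names, and the warning text is invented for the example.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


def resolve_loop_threshold(
    task_value: Optional[int],
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Return the effective threshold; 0 (or less) means detection is disabled."""
    if task_value is not None:
        # An explicit task setting wins, including an explicit 0.
        return task_value

    env = os.environ if env is None else env
    raw = env.get("LOOP_MAX_CONSECUTIVE", "0")
    try:
        return int(raw)
    except ValueError:
        # Non-numeric value: warn and fall back to disabled.
        logger.warning("Invalid LOOP_MAX_CONSECUTIVE=%r; loop detection disabled", raw)
        return 0
```

Checking `task_value is not None` rather than truthiness is what lets a task-level `0` override a non-zero environment value.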

Model (`models.py`)

Adds max_consecutive_same_tool: int | None = None to TaskDefinition. Extra fields are already allowed (model_config = ConfigDict(extra="allow")), so existing YAML files without this field parse without changes.

Documentation

- doc/GRAMMAR.md: Full section with semantics table and YAML example
- README.md: Brief mention in Taskflows section with link to GRAMMAR.md

Tests

Unit tests (TestLoopDetection, 10 tests): threshold triggering, alternating tools reset counter, disabled via 0 and negative, counter reset on tool change, state mutation visibility, TaskDefinition field acceptance/defaults

Integration tests (TestLoopDetectionIntegration, 4 tests): Drive run_main() → on_tool_end_hook → check_consecutive_tool_loop() through the real callback path by mocking deploy_task_agents:

- Task field triggers detection
- LOOP_MAX_CONSECUTIVE env var fallback
- Explicit 0 disables even with env set
- async=True bypasses detection entirely

Adds _tool_call_counter, _auto_save_interval, _auto_save_dir, _write_auto_save, and _read_tool_log to run_main. When AUTO_SAVE_DIR and AUTO_SAVE_INTERVAL are set, tool results are periodically appended to an NDJSON log file. Disabled by default (interval=0).

Moves write_auto_save() and read_tool_log() from closures inside run_main() to module-level functions with explicit parameters. Tests now exercise the real implementation instead of duplicating the logic.

- Add encoding="utf-8" to open() in write_auto_save and read_tool_log
- Catch ValueError on non-numeric AUTO_SAVE_INTERVAL with fallback to 0
- Soften docstring from "crash-safe" to "append-only"
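For readers unfamiliar with the append-only NDJSON pattern these commits describe, a minimal sketch follows. The function names match the PR, but the parameters (log_path, record) and the record shape are assumptions; the real signatures are not shown on this page.

```python
import json
from pathlib import Path


def write_auto_save(log_path: Path, record: dict) -> None:
    """Append one JSON object per line (NDJSON). Append-only, not crash-safe."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_tool_log(log_path: Path) -> list:
    """Read every record back; a missing file yields an empty list."""
    if not log_path.exists():
        return []
    with open(log_path, encoding="utf-8") as fh:
        # Skip blank lines so a trailing newline does not break parsing.
        return [json.loads(line) for line in fh if line.strip()]
```

Passing the path explicitly, rather than closing over variables in run_main(), is what makes helpers like these straightforward to test against a temporary directory.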

Tracks consecutive same-tool-name calls in on_tool_end_hook. When the count reaches the threshold, raises LoopDetectedError to abort the task. Configured per-task via max_consecutive_same_tool or globally via LOOP_MAX_CONSECUTIVE env var. Disabled by default. Skipped for async tasks. Counter resets between prompts in repeat_prompt mode.

…_loop Moves the consecutive-tool tracking logic from the on_tool_end_hook closure into a module-level function with explicit state parameters. Tests now exercise the real implementation. Adds tests for negative threshold, counter reset on tool change, and state mutation visibility.

sheeki03 added 9 commits April 10, 2026 14:25

docs: document AUTO_SAVE_DIR and AUTO_SAVE_INTERVAL in README

c562128

refactor: extract autosave helpers to module-level testable functions

09ca05b


fix: add explicit UTF-8 encoding and validate AUTO_SAVE_INTERVAL

72d3da5


fix: use mock-based write failure test for cross-platform CI

5172eb5

docs: mention max_consecutive_same_tool in README Taskflows section

95908d9

fix: guard LOOP_MAX_CONSECUTIVE against non-numeric env values

13a6f34

sheeki03 requested review from JarLob, Kwstubbs, anticomputer, kevinbackhouse, m-y-mo, p- and sylwia-budzynska as code owners April 10, 2026 12:05

kevinbackhouse closed this Apr 13, 2026

sheeki03 wants to merge 9 commits into GitHubSecurityLab:main from sheeki03:feat/loop-detection
