Skip to content

Allow agent loop to recover from tool call errors#592

Merged
nickmisasi merged 3 commits into
masterfrom
cursor/agent-tool-error-retries-e7d3
Apr 6, 2026
Merged

Allow agent loop to recover from tool call errors#592
nickmisasi merged 3 commits into
masterfrom
cursor/agent-tool-error-retries-e7d3

Conversation

@nickmisasi
Copy link
Copy Markdown
Collaborator

@nickmisasi nickmisasi commented Apr 3, 2026

Summary

This change feeds tool execution errors back into the agent loop instead of terminating immediately when every tool call in a batch fails. It also tracks consecutive failed tool executions and, after the third failure in a row, disables further tool use for that follow-up turn while instructing the agent to explain the latest error and ask the user for guidance.

This follow-up also ensures private tool-result KV entries are always cleaned up, even if the continuation turn itself fails.

Ticket Link

NONE

Screenshots

NONE

Release Note

Improved agent tool-calling recovery by returning tool execution errors to the model for up to three consecutive retries before asking the user for guidance.
Open in Web Open in Cursor 

Summary by CodeRabbit

  • New Features

    • Added a tool retry limit: after 3 consecutive tool failures, tools are disabled and the system asks for guidance.
  • Bug Fixes

    • Treat tool errors as actionable so continuations occur when only errors exist.
    • Ensure temporary data is cleaned up even if follow-up processing fails.
    • Prevent runaway tool loops by inserting a retry-limit system instruction and disabling tools.
  • Tests

    • Expanded tests for retry-limit behavior, continuation on errors, and cleanup on failure.

Co-authored-by: Nick Misasi <nick13misasi@gmail.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 3, 2026

🤖 LLM Evaluation Results

OpenAI

⚠️ Overall: 18/19 tests passed (94.7%)

Provider Total Passed Failed Pass Rate
⚠️ OPENAI 19 18 1 94.7%

❌ Failed Evaluations

Show 1 failures

OPENAI

1. TestReactEval/[openai]_react_cat_message

  • Score: 0.00
  • Rubric: The word/emoji is a cat emoji or a heart/love emoji
  • Reason: The output is the text "heart_eyes_cat", which is not an actual cat emoji or a heart/love emoji character.

Anthropic

⚠️ Overall: 18/19 tests passed (94.7%)

Provider Total Passed Failed Pass Rate
⚠️ ANTHROPIC 19 18 1 94.7%

❌ Failed Evaluations

Show 1 failures

ANTHROPIC

1. TestReactEval/[anthropic]_react_cat_message

  • Score: 0.00
  • Rubric: The word/emoji is a cat emoji or a heart/love emoji
  • Reason: The output is the text string "heart_eyes_cat", not an actual cat emoji (e.g., 😺) or a heart/love emoji (e.g., ❤️).

This comment was automatically generated by the eval CI pipeline.

@nickmisasi nickmisasi marked this pull request as ready for review April 6, 2026 15:32
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Security Review: No Issues Found

Reviewed the changes for tool error retry/recovery logic. The PR is well-bounded and does not introduce security vulnerabilities:

  • Resource exhaustion: Tool retries are capped at MaxConsecutiveToolCallFailures = 3, after which tools are disabled at the API level via WithToolsDisabled() (not just a prompt instruction). The existing MaxToolResolutionDepth = 10 also applies.
  • Permission model: The continuation path reuses the same already-validated user, channel, and llmContext. No new data access paths or endpoints are introduced.
  • Channel isolation: No changes to channel membership checks or data scoping. Tool error results were already stored on post props; this PR only allows the LLM to respond to them.
  • Prompt injection surface: The injected system message (ToolRetryLimitSystemMessage) is a hardcoded constant — no user-controlled data flows into it.
  • Auto-run wrapper hardening: The new cfg.ToolsDisabled early-return in AutoRunToolsWrapper.ChatCompletion is a defensive improvement that prevents entering the auto-run loop when tools are already disabled.
Open in Web View Automation 

Sent by Cursor Automation: Find vulnerabilities

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

Change introduces tooling to detect repeated failed tool-call executions, inject a retry-limit system message, and disable tools for subsequent LLM invocations. Adjustments span conversation tool handling, the auto-run loop, new retry utilities, and tests to validate continuation and KV cleanup.

Changes

Cohort / File(s) Summary
Tool Retry Module
llm/tool_retry.go
New file adding MaxConsecutiveToolCallFailures, ToolRetryLimitSystemMessage, CountTrailingFailedToolCalls(posts []Post) int, and EnsureToolRetryLimitSystemMessage(posts []Post) []Post to detect trailing tool-call errors and manage the retry-limit system message.
Auto-run / Run Loop
llm/auto_run_tools.go, llm/auto_run_tools_test.go
ChatCompletion now also delegates when cfg.ToolsDisabled is set. runToolLoop uses per-iteration mutable copies of opts and auto-run-tools; when trailing failures reach the max, it injects the retry-limit system message, appends WithToolsDisabled() to options, and stops further auto-run resolution. New test verifies behavior after three consecutive failures.
Conversation Tool Handling
conversations/tool_handling.go, conversations/tool_handling_test.go
Continuation gating broadened to treat ToolCallStatusError like ToolCallStatusSuccess for deciding to continue. HandleToolResult defers KV cleanup to ensure deletes on early returns. completeAndStreamToolResponse rewrites posts with the retry-limit system message and sets toolsDisabled = true when trailing failures hit the limit. Tests updated to assert continuation on errored tool calls and verify KV cleanup when continuation fails; test doubles now capture full LLM completion requests.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant LLM as LLM
    participant Handler as Tool Handler
    participant Tools as Tool Resolver
    participant KV as KV Store

    rect rgba(100,150,200,0.5)
    User->>LLM: user prompt
    LLM->>Handler: emits tool call events
    Handler->>Tools: resolve/execute tool
    Tools-->>Handler: result (success/error)
    Handler->>LLM: append tool-result post and re-invoke LLM
    end

    rect rgba(200,100,100,0.5)
    Handler->>Handler: CountTrailingFailedToolCalls(posts)
    Handler->>Handler: EnsureToolRetryLimitSystemMessage(posts)
    Handler->>Handler: set toolsDisabled = true
    end

    rect rgba(100,200,100,0.5)
    Handler->>LLM: re-invoke with tools disabled
    LLM-->>User: assistant explains error / requests guidance
    KV->>Handler: deferred KV deletions occur regardless of continuation outcome
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Setup Cloud Test Server

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Allow agent loop to recover from tool call errors' accurately captures the main objective: enabling the agent to handle and recover from tool execution failures rather than terminating immediately.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cursor/agent-tool-error-retries-e7d3

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
llm/auto_run_tools_test.go (1)

436-496: Assert the fourth invocation actually had tools disabled.

This proves the retry-limit prompt was injected, but it doesn't prove WithToolsDisabled() reached the wrapped model. If that option were dropped and the stubbed fourth response stayed text, this test would still pass. Capturing the per-call LanguageModelConfig and asserting ToolsDisabled == true on the last call would make the regression this test is targeting explicit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@llm/auto_run_tools_test.go` around lines 436 - 496, In
TestAutoRunToolsWrapper_DisablesToolsAfterThreeConsecutiveFailures capture the
per-call LanguageModelConfig passed into the wrapped model (extend or use
capturingLLM to record each call's config), then after calling
wrapper.ChatCompletion assert that the last recorded config has ToolsDisabled ==
true (verifying WithToolsDisabled() reached the inner model); reference the
capturingLLM used to wrap inner and the WithToolsDisabled option so you validate
the fourth invocation (inner.callCount / final recorded call) had ToolsDisabled
set.
conversations/tool_handling_test.go (1)

1063-1066: Verify the continuation request actually contains the tool error.

This test only checks that a follow-up post was streamed. Because capturingLanguageModel.ChatCompletion ignores CompletionRequest, it won't catch a regression where the follow-up turn runs with redacted or empty tool data instead of "tool execution failed". Capturing the request posts and asserting the last bot post carries the error status/result would cover the core behavior directly.

Also applies to: 1129-1140

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@conversations/tool_handling_test.go` around lines 1063 - 1066, The test
currently only verifies a follow-up post was streamed but doesn't assert that
the continuation request actually includes the tool error because
capturingLanguageModel.ChatCompletion ignores the CompletionRequest; update the
test to capture the outgoing CompletionRequest (or the streamed posts from
fakeStreamingService) and assert the final bot continuation contains the tool
error text "tool execution failed" (or the expected error status/result).
Specifically, modify or extend capturingLanguageModel.ChatCompletion to record
the passed CompletionRequest (or inspect fakeStreamingService's captured posts)
and add an assertion that the last bot post / continuation request includes the
tool error, ensuring the check covers bots.NewBot-created bot behavior during
continuation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@conversations/tool_handling.go`:
- Around line 437-443: completeAndStreamToolResponse currently treats
ToolCallStatusError as actionable, which can cause an early return before
cleaning up private KV entries; ensure c.deleteToolCallKVEntries(post.Id,
resultKVKey, toolCallKVKey) always runs even on error-only batches by invoking
the cleanup unconditionally (move the call before any early return or use defer)
after computing hasActionableResult; reference symbols:
completeAndStreamToolResponse, hasActionableResult, llm.ToolCallStatusError, and
c.deleteToolCallKVEntries to locate and update the code so tool_result_private /
tool_call_private are always removed.

---

Nitpick comments:
In `@conversations/tool_handling_test.go`:
- Around line 1063-1066: The test currently only verifies a follow-up post was
streamed but doesn't assert that the continuation request actually includes the
tool error because capturingLanguageModel.ChatCompletion ignores the
CompletionRequest; update the test to capture the outgoing CompletionRequest (or
the streamed posts from fakeStreamingService) and assert the final bot
continuation contains the tool error text "tool execution failed" (or the
expected error status/result). Specifically, modify or extend
capturingLanguageModel.ChatCompletion to record the passed CompletionRequest (or
inspect fakeStreamingService's captured posts) and add an assertion that the
last bot post / continuation request includes the tool error, ensuring the check
covers bots.NewBot-created bot behavior during continuation.

In `@llm/auto_run_tools_test.go`:
- Around line 436-496: In
TestAutoRunToolsWrapper_DisablesToolsAfterThreeConsecutiveFailures capture the
per-call LanguageModelConfig passed into the wrapped model (extend or use
capturingLLM to record each call's config), then after calling
wrapper.ChatCompletion assert that the last recorded config has ToolsDisabled ==
true (verifying WithToolsDisabled() reached the inner model); reference the
capturingLLM used to wrap inner and the WithToolsDisabled option so you validate
the fourth invocation (inner.callCount / final recorded call) had ToolsDisabled
set.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: eecef550-0b71-454f-aa44-26b2bf9a5b0b

📥 Commits

Reviewing files that changed from the base of the PR and between 605e0db and 8f237be.

📒 Files selected for processing (5)
  • conversations/tool_handling.go
  • conversations/tool_handling_test.go
  • llm/auto_run_tools.go
  • llm/auto_run_tools_test.go
  • llm/tool_retry.go

Comment thread conversations/tool_handling.go
Co-authored-by: Nick Misasi <nick13misasi@gmail.com>
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Security Review: No Issues Found

Reviewed the latest revision (commits 8f237be and 3e77f92) for tool error retry/recovery logic. The PR is well-bounded and does not introduce security vulnerabilities:

  • Resource exhaustion: Tool retries are capped at MaxConsecutiveToolCallFailures = 3, after which tools are disabled at the API level via WithToolsDisabled() (not just a prompt instruction). The existing MaxToolResolutionDepth = 10 loop bound also applies in the auto-run path, limiting total tool executions.
  • Permission model: The continuation path reuses the same already-validated user, channel, and llmContext. No new data access paths or endpoints are introduced.
  • Channel isolation: No changes to channel membership checks or data scoping. Tool error results were already stored on post props; this PR only allows the LLM to respond to them.
  • Prompt injection surface: The injected system message (ToolRetryLimitSystemMessage) is a hardcoded constant — no user-controlled data flows into it. Tool error text flowing back to the LLM is the same trust level as tool success results, which is pre-existing behavior.
  • KV cleanup (new commit 3e77f92): Moving KV cleanup to defer ensures tool_result_private / tool_call_private entries are always removed even when the follow-up LLM call fails, preventing stale KV entry accumulation. This is a correctness improvement.
  • Auto-run wrapper hardening: The cfg.ToolsDisabled early-return in AutoRunToolsWrapper.ChatCompletion prevents entering the auto-run tool loop when tools are already disabled — a defensive improvement.
Open in Web View Automation 

Sent by Cursor Automation: Find vulnerabilities

@nickmisasi nickmisasi requested a review from crspeller April 6, 2026 17:34
Copy link
Copy Markdown
Member

@crspeller crspeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One non-blocking observation.

Comment thread llm/tool_retry.go
// CountTrailingFailedToolCalls counts consecutive trailing tool executions that
// failed. A successful tool execution resets the streak. Posts without executed
// tool results stop the scan because they represent a new agent turn.
func CountTrailingFailedToolCalls(posts []Post) int {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like they could be easily unit tested, but have not unit tests.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cursor add unit test coverage for this file. Only write valuable tests, don't overly rely on mocks or go crazy with edge case coverage

Copy link
Copy Markdown
Contributor

@cursor cursor Bot Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

  • Added focused table-driven unit tests for CountTrailingFailedToolCalls() covering consecutive trailing failures, reset-on-success behavior, and stopping on non-executed tool batches. tool_retry_test.go:12-69
  • Added focused table-driven unit tests for EnsureToolRetryLimitSystemMessage() covering prepend, append-to-existing-system-prompt, and no-op behavior when the retry message already exists. tool_retry_test.go:71-125

Testing

  • go test -v ./llm/... -run '^(TestCountTrailingFailedToolCalls|TestEnsureToolRetryLimitSystemMessage)$' — targeted just the new helper coverage so the tests stay valuable and directly exercise the behavior requested in review.
Open in Web Open in Cursor 

Co-authored-by: Nick Misasi <nick13misasi@gmail.com>
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Review: No Issues Found

Reviewed the full PR (through 03e2753) for the tool error retry/recovery logic. No security vulnerabilities identified:

  • Resource exhaustion: Tool retries are hard-capped at MaxConsecutiveToolCallFailures = 3, enforced server-side via WithToolsDisabled() (not just a prompt instruction). The existing MaxToolResolutionDepth = 10 loop bound also applies in the auto-run path.
  • Permission model: The continuation path reuses the same already-validated user, channel, and llmContext. No new data access paths or endpoints are introduced.
  • Channel isolation: No changes to channel membership checks or data scoping. Tool error results were already stored on post props; this PR only allows the LLM to respond to them.
  • Prompt injection surface: The injected system message (ToolRetryLimitSystemMessage) is a hardcoded constant — no user-controlled data flows into it. Tool error text flowing back to the LLM has the same trust level as tool success results (pre-existing behavior).
  • System prompt growth: EnsureToolRetryLimitSystemMessage has an idempotency check, and both call sites disable tools after injection, preventing recursive prompt growth across iterations.
  • KV cleanup: Moving KV cleanup to defer ensures tool_result_private / tool_call_private entries are always removed even when the follow-up LLM call fails — a correctness improvement.
  • Auto-run wrapper hardening: The cfg.ToolsDisabled early-return in AutoRunToolsWrapper.ChatCompletion prevents entering the auto-run tool loop when tools are already disabled.
Open in Web View Automation 

Sent by Cursor Automation: Find vulnerabilities

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
llm/tool_retry_test.go (1)

103-113: Add an idempotency case for prompts that already include the retry message.

Current coverage only checks the exact-equality path (Message == ToolRetryLimitSystemMessage). Add a case where the system prompt already contains the retry message alongside other prompt text to prevent duplicate appends on repeated calls.

✅ Proposed test case addition
@@
 		{
 			name: "returns posts unchanged when retry message already exists",
 			posts: []Post{
 				{Role: PostRoleSystem, Message: ToolRetryLimitSystemMessage},
 				{Role: PostRoleUser, Message: "hello"},
 			},
 			expected: []Post{
 				{Role: PostRoleSystem, Message: ToolRetryLimitSystemMessage},
 				{Role: PostRoleUser, Message: "hello"},
 			},
 		},
+		{
+			name: "returns posts unchanged when retry message is already appended to system prompt",
+			posts: []Post{
+				{Role: PostRoleSystem, Message: "base prompt\n\n" + ToolRetryLimitSystemMessage},
+				{Role: PostRoleUser, Message: "hello"},
+			},
+			expected: []Post{
+				{Role: PostRoleSystem, Message: "base prompt\n\n" + ToolRetryLimitSystemMessage},
+				{Role: PostRoleUser, Message: "hello"},
+			},
+		},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@llm/tool_retry_test.go` around lines 103 - 113, Add an idempotency test case
in llm/tool_retry_test.go that covers a system prompt which already contains
ToolRetryLimitSystemMessage as part of a larger string (not exact-equal); create
a new test entry in the existing cases array using the Post struct (fields Role
and Message) where a system Post.Message contains extra text plus
ToolRetryLimitSystemMessage and assert the expected output is unchanged (no
duplicate append), referencing ToolRetryLimitSystemMessage and the
Post/PostRoleSystem names so the test targets the same code path that appends
the retry message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@llm/tool_retry_test.go`:
- Around line 103-113: Add an idempotency test case in llm/tool_retry_test.go
that covers a system prompt which already contains ToolRetryLimitSystemMessage
as part of a larger string (not exact-equal); create a new test entry in the
existing cases array using the Post struct (fields Role and Message) where a
system Post.Message contains extra text plus ToolRetryLimitSystemMessage and
assert the expected output is unchanged (no duplicate append), referencing
ToolRetryLimitSystemMessage and the Post/PostRoleSystem names so the test
targets the same code path that appends the retry message.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 19be01c7-cff3-4285-bd79-5daa61719827

📥 Commits

Reviewing files that changed from the base of the PR and between 3e77f92 and 03e2753.

📒 Files selected for processing (1)
  • llm/tool_retry_test.go

@nickmisasi nickmisasi merged commit 6ddcfde into master Apr 6, 2026
35 checks passed
@nickmisasi nickmisasi deleted the cursor/agent-tool-error-retries-e7d3 branch April 6, 2026 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants