Skip to content

fix(hook): PendingToolRecoveryHook false positive on historical ToolResultBlocks in multi-turn (#1555)#1566

Open
Sparkle6979 wants to merge 3 commits into
agentscope-ai:mainfrom
Sparkle6979:fix/pending-tool-recovery-multi-turn
Open

fix(hook): PendingToolRecoveryHook false positive on historical ToolResultBlocks in multi-turn (#1555)#1566
Sparkle6979 wants to merge 3 commits into
agentscope-ai:mainfrom
Sparkle6979:fix/pending-tool-recovery-multi-turn

Conversation

@Sparkle6979

Copy link
Copy Markdown
Contributor

Summary

PendingToolRecoveryHook is silently non-functional in multi-turn conversations.

Root Cause

AgentBase.notifyPreCall() (line 672-678) merges the full memory snapshot with new callArgs into PreCallEvent.inputMessages. The hook's guard condition at line 110 checks this merged list for any ToolResultBlock:

boolean userProvidedResults =
    inputMessages.stream().anyMatch(m -> m.hasContentBlocks(ToolResultBlock.class));
if (userProvidedResults) return;  // skip recovery

In multi-turn scenarios, memory always contains ToolResultBlocks from earlier successful tool calls, so userProvidedResults is always true — the hook always skips recovery, and ReActAgent.doCall() throws IllegalStateException.

Fix

Narrow the check to only match ToolResultBlock IDs present in the pendingIds set:

boolean userProvidedResults =
    inputMessages.stream()
        .flatMap(m -> m.getContentBlocks(ToolResultBlock.class).stream())
        .anyMatch(tr -> pendingIds.contains(tr.getId()));

Historical ToolResultBlocks (e.g. call_1) are for already-resolved tool calls, so their IDs are never in pendingIds — no false positive. HITL results (user-provided ToolResultBlock for a pending call) do match pendingIds, so the hook correctly defers to doCall().

Why existing tests pass

HookStopAgentTest.testNewMsgWithPendingToolUseContinuesActing only covers single-turn: memory has no historical ToolResultBlock, so the false positive never triggers.

Closes #1555

Test plan

  • Core bug: multi-turn plain text recovery with historical ToolResultBlocks
  • Accumulated history: 3+ successful tool turns → interrupted → recovery
  • HITL single-turn and multi-turn: user provides matching result
  • Unrelated ToolResultBlock ID does not block auto-recovery
  • Single-turn regression: classic recovery still works
  • Normal multi-turn without pending calls unaffected
  • Multiple pending calls: all auto-recovered (single-turn and multi-turn)
  • Existing HookStopAgentTest passes (no regression)

@codecov

codecov Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@Sparkle6979

Copy link
Copy Markdown
Contributor Author

Design note: why this approach vs suggested alternatives

The issue description proposed two options:

Option A (filter by role): filter out ASSISTANT/TOOL messages, then check for ToolResultBlock. This is fragile — in HITL scenarios, user-provided ToolResultBlocks also carry TOOL roles, so the filter does not reliably distinguish historical results from user-provided results.

Option B (expose snapshotSize on PreCallEvent): requires an API change to PreCallEvent, which is more invasive.

This PR approach (match pendingIds): directly checks whether any ToolResultBlock ID matches a pending tool call. Historical results are never in pendingIds, so no false positive. HITL results for a pending call DO match, so they correctly defer to doCall(). No API change, 3 lines modified.

@Sparkle6979 Sparkle6979 changed the title fix: PendingToolRecoveryHook false positive on historical ToolResultBlocks in multi-turn (#1555) fix(hook): PendingToolRecoveryHook false positive on historical ToolResultBlocks in multi-turn (#1555) Jun 1, 2026
@AgentScopeJavaBot AgentScopeJavaBot added bug Something isn't working area/core/agent Agent runtime, pipeline, hooks, plan labels Jun 2, 2026

@AgentScopeJavaBot AgentScopeJavaBot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Review

The fix is precise and correctly addresses the multi-turn false positive described in #1555. By narrowing the HITL guard from "any ToolResultBlock present" to "ToolResultBlock id ∈ pendingIds", the hook now correctly distinguishes user-supplied results for the currently pending tool call from historical results already resolved in earlier turns. Since pendingIds is built from ToolUseBlock ids on the last assistant message that have no matching result, historical (already-resolved) ids cannot collide with it, so the new condition is logically tight. The HITL deferral path remains intact: when the user sends a matching id the hook still returns the event unmodified for ReActAgent.doCall() to consume. No new null-safety risk of practical concern — pendingIds is sourced from non-null ToolUseBlock::getId, and getContentBlocks always returns a non-null list. Test coverage is thorough and well-partitioned (core bug, HITL, unrelated id, single-turn regression, multi-pending). No issues found that would block merge.

this.executed = executed;
}

@Tool(name = "test_tool", description = "A test tool")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] createConditionalStopHook(counter, stopAt) couples the test to internal PostReasoningEvent firing order (the stopAt=3/stopAt=7 magic numbers). It works today but will be brittle if reasoning-event semantics evolve. Consider stopping based on a structural signal (e.g. once a specific ToolUseBlock id appears) in a follow-up to make these tests more refactor-resilient. Non-blocking.

inputMessages.stream().anyMatch(m -> m.hasContentBlocks(ToolResultBlock.class));
inputMessages.stream()
.flatMap(m -> m.getContentBlocks(ToolResultBlock.class).stream())
.anyMatch(tr -> pendingIds.contains(tr.getId()));

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Defensive consideration only (not a real bug today): ToolResultBlock.getId() is documented as nullable, and Set#contains(null) is a legal lookup. In normal flows pendingIds is sourced from non-null ToolUseBlock ids so a null result-id cannot match, but if a malformed user message ever ships a null-id ToolResultBlock the hook will silently ignore it (treat as not-HITL and auto-patch). That is arguably the safer behavior, so no change required — flagging only so the assumption is explicit.

@LearningGp

Copy link
Copy Markdown
Member

FYI #1409

@Sparkle6979

Copy link
Copy Markdown
Contributor Author

Noticed that PendingToolRecoveryHook.java has been removed from main and replaced by LegacyHookDispatcher. Closing this PR as the target file no longer exists.

@Sparkle6979 Sparkle6979 closed this Jun 4, 2026
@Sparkle6979 Sparkle6979 reopened this Jun 4, 2026
@Sparkle6979

Copy link
Copy Markdown
Contributor Author

Checked the main branch — PendingToolRecoveryHook.java no longer exists under agentscope-core/src/main/java/io/agentscope/core/hook/. The hook directory now contains LegacyHookDispatcher.java instead.

If this file is still used somewhere (e.g. on a release branch), happy to rebase. Otherwise feel free to close.

@LearningGp

Copy link
Copy Markdown
Member

PTAL @chickenlj

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/core/agent Agent runtime, pipeline, hooks, plan bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: PendingToolRecoveryHook always skips recovery for multi-turn ReActAgent (Supervisor) due to false positive userProvidedResults check

3 participants