Fix toolTokens over-counting when Anthropic Tool Search is enabled #4834
Merged
When TST (Tool Search Tool) is enabled, most tools are sent with defer_loading: true and don't count against the context window until the model loads them via tool_search. However, the toolTokens calculation in agentIntent counted ALL available tools, over-estimating by ~25-30K tokens and causing premature compaction. Fix: filter availableTools to only non-deferred tools (via IToolDeferralService) before calling countToolTokens() when TST is enabled. This gives an accurate budget that reflects what the API actually charges against the context window.
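The effect on the message budget can be sketched with the approximate figures quoted in this PR (~40K tokens for all tools, ~10-15K for non-deferred tools). This is a minimal illustration that assumes the budget is simply the model's prompt limit minus `toolTokens`; the `messageBudget` helper and the 200,000-token limit are hypothetical and not taken from the PR.

```typescript
// Hypothetical prompt limit, chosen only to make the arithmetic concrete.
const MODEL_MAX_PROMPT_TOKENS = 200_000;

/** Tokens left for conversation messages after reserving space for tool schemas. */
function messageBudget(modelMaxPromptTokens: number, toolTokens: number): number {
    return Math.max(0, modelMaxPromptTokens - toolTokens);
}

// Approximate figures from the PR description.
const allToolTokens = 40_000;         // every available tool
const nonDeferredToolTokens = 12_000; // picked from the quoted 10-15K range

const budgetCountingAll = messageBudget(MODEL_MAX_PROMPT_TOKENS, allToolTokens);
const budgetCountingNonDeferred = messageBudget(MODEL_MAX_PROMPT_TOKENS, nonDeferredToolTokens);

// The gap between the two budgets equals the gap between the two tool-token estimates.
const difference = budgetCountingNonDeferred - budgetCountingAll;
console.log({ budgetCountingAll, budgetCountingNonDeferred, difference });
```

Whichever estimate is wrong, the message budget is off by the same amount in the opposite direction, which is why the choice of which tools to count matters for when compaction triggers.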
3f487b2 to bd00295
Pull request overview
Adjusts agent prompt budgeting to avoid over-counting tool schema tokens when Anthropic Tool Search defers most tools, preventing premature compaction and overly constrained message budgets.
Changes:
- Inject `IToolDeferralService` into `AgentIntentInvocation` and filter out deferred tools before calling `countToolTokens()` when Anthropic Tool Search is enabled.
- Thread `IToolDeferralService` through intent invocation subclass constructors (`AskAgent`, `EditCode2`, `NotebookEditor`) to satisfy the new base-class dependency.
- Expand debug logging to include total vs non-deferred tool counts when Tool Search is enabled.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/extension/intents/node/agentIntent.ts | Filters tools used for token counting under Anthropic Tool Search and logs total vs non-deferred tool counts. |
| src/extension/intents/node/askAgentIntent.ts | Threads IToolDeferralService through to the base invocation. |
| src/extension/intents/node/editCodeIntent2.ts | Threads IToolDeferralService through to the base invocation. |
| src/extension/intents/node/notebookEditorIntent.ts | Threads IToolDeferralService through to the base invocation. |
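Threading a new base-class dependency through subclasses, as the three smaller file changes do, follows a common constructor pattern. The sketch below is a simplified illustration only: the real invocation classes take many more constructor parameters, the subclass name `AskAgentInvocationSketch` and the `countsTowardBudget` method are hypothetical, and the signature of `isNonDeferredTool` is assumed.

```typescript
// Assumed shape; the PR names the service and method but not the signature.
interface IToolDeferralService {
    isNonDeferredTool(toolName: string): boolean;
}

// Base class gains the new dependency.
class AgentIntentInvocation {
    constructor(protected readonly toolDeferralService: IToolDeferralService) {}

    // Hypothetical helper showing the base class using the injected service.
    countsTowardBudget(toolName: string): boolean {
        return this.toolDeferralService.isNonDeferredTool(toolName);
    }
}

// Each subclass must accept the service and forward it via super().
class AskAgentInvocationSketch extends AgentIntentInvocation {
    constructor(toolDeferralService: IToolDeferralService) {
        super(toolDeferralService);
    }
}
```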
roblourens approved these changes Mar 31, 2026
bhavyaus added a commit that referenced this pull request Apr 6, 2026

…ools

Deferred tools (defer_loading: true) still count against the API context window. The 3/30 change (#4834) excluded them from toolTokens, causing the message budget to be ~31K tokens too generous and leading to context_length_exceeded errors followed by summarization failures ("No messages provided").

- Count all tools in agentIntent budget calculation
- Reserve tool token budget in summarization prompt rendering
- Add modelMaxPromptTokens to summarization telemetry
- Add priority to summarization UserMessage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-merge-queue Bot pushed a commit that referenced this pull request Apr 6, 2026

…ools (#4992)

Deferred tools (defer_loading: true) still count against the API context window. The 3/30 change (#4834) excluded them from toolTokens, causing the message budget to be ~31K tokens too generous and leading to context_length_exceeded errors followed by summarization failures ("No messages provided").
Problem
When TST (Tool Search Tool) is enabled, most tools are sent to the Anthropic API with `defer_loading: true` and don't count against the context window until the model loads them via `tool_search`. However, the `toolTokens` calculation in `agentIntent.ts` counted ALL available tools (~40+ tools, ~40K tokens), over-estimating by ~25-30K tokens. This artificially reduced the message budget and inflated the context ratio, causing premature conversation compaction.

Fix
Filter `availableTools` to only non-deferred tools (via `IToolDeferralService.isNonDeferredTool()`) before calling `countToolTokens()` when TST is enabled. Non-deferred tools (~18 tools, ~10-15K tokens) are the ones that are always immediately available and actually charged against the context window by the API.

Changes
- `agentIntent.ts`: Import `IToolDeferralService` and `isAnthropicToolSearchEnabled`, inject the service, filter tools before token counting, enhance debug log to show total vs non-deferred tool counts
- `askAgentIntent.ts`, `editCodeIntent2.ts`, `notebookEditorIntent.ts`: Thread `IToolDeferralService` through subclass constructors and `super()` calls
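
The filtering step described under Fix might look roughly like the sketch below. It is not the code from `agentIntent.ts`: the `ToolInfo` shape, the `selectToolsForTokenCounting` helper, the sample tool names, and the character-based `countToolTokens` stand-in are all assumptions made for illustration, and the real signatures of `isNonDeferredTool` and `countToolTokens` may differ. Note that the follow-up commit referenced in this thread (#4992) went back to counting all tools.

```typescript
// Assumed shapes; only the service, method, and function names appear in the PR.
interface ToolInfo {
    name: string;
    description: string;
    inputSchema: Record<string, unknown>;
}

interface IToolDeferralService {
    isNonDeferredTool(toolName: string): boolean;
}

// Stand-in counter (about 4 characters per token), used only so the sketch runs.
function countToolTokens(tools: readonly ToolInfo[]): number {
    return tools.reduce((sum, tool) => sum + Math.ceil(JSON.stringify(tool).length / 4), 0);
}

// When tool search is enabled, keep only tools that are sent without deferral.
function selectToolsForTokenCounting(
    availableTools: readonly ToolInfo[],
    toolSearchEnabled: boolean,
    deferral: IToolDeferralService
): readonly ToolInfo[] {
    if (!toolSearchEnabled) {
        return availableTools;
    }
    return availableTools.filter(tool => deferral.isNonDeferredTool(tool.name));
}

// Usage with a hypothetical in-memory deferral service and made-up tool names.
const nonDeferredNames = new Set<string>(['read_file', 'tool_search']);
const deferralService: IToolDeferralService = {
    isNonDeferredTool: (name: string) => nonDeferredNames.has(name),
};
const sampleTools: ToolInfo[] = [
    { name: 'read_file', description: 'Read a file', inputSchema: { type: 'object' } },
    { name: 'tool_search', description: 'Search for tools', inputSchema: { type: 'object' } },
    { name: 'rarely_used_tool', description: 'Loaded on demand', inputSchema: { type: 'object' } },
];

const counted = selectToolsForTokenCounting(sampleTools, true, deferralService);
const toolTokens = countToolTokens(counted);
console.log(`tools: ${sampleTools.length} total, ${counted.length} non-deferred, toolTokens=${toolTokens}`);
```

Keeping the selection in a separate function makes it easy to log total vs non-deferred counts, as the PR's debug logging does, and to switch back to counting every tool by passing `false` for `toolSearchEnabled`.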