Fix: count all tool tokens in budget including deferred tools#4990
…ools

Deferred tools (defer_loading: true) still count against the API context window. The 3/30 change (#4834) excluded them from toolTokens, causing the message budget to be ~31K tokens too generous and leading to context_length_exceeded errors followed by summarization failures ("No messages provided").

- Count all tools in agentIntent budget calculation
- Reserve tool token budget in summarization prompt rendering
- Add modelMaxPromptTokens to summarization telemetry
- Add priority to summarization UserMessage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
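To make the budgeting error described above concrete, here is a minimal sketch of how excluding deferred tools inflates the message budget. The `ToolInfo` shape, its `tokenCount` field, and both helper functions are hypothetical names for illustration only, not the extension's actual API, and the token counts are invented to echo the ~31K discrepancy.

```typescript
// Hypothetical tool descriptor; the real extension counts tokens via a tokenizer.
interface ToolInfo {
	name: string;
	tokenCount: number;
	deferLoading?: boolean;
}

// Sum tool schema tokens, optionally skipping deferred tools (the buggy behavior).
function sumToolTokens(tools: readonly ToolInfo[], includeDeferred: boolean): number {
	return tools
		.filter(t => includeDeferred || !t.deferLoading)
		.reduce((sum, t) => sum + t.tokenCount, 0);
}

// Budget left for messages once tool schemas are accounted for.
function computeMessageBudget(modelMaxPromptTokens: number, toolTokens: number): number {
	return Math.max(1, modelMaxPromptTokens - toolTokens);
}

// Illustrative numbers only.
const tools: ToolInfo[] = [
	{ name: 'readFile', tokenCount: 4000 },
	{ name: 'search', tokenCount: 5000 },
	{ name: 'deferredA', tokenCount: 16000, deferLoading: true },
	{ name: 'deferredB', tokenCount: 15000, deferLoading: true },
];

const modelMax = 128000;
const budgetExcludingDeferred = computeMessageBudget(modelMax, sumToolTokens(tools, false));
const budgetCountingAll = computeMessageBudget(modelMax, sumToolTokens(tools, true));

// How many tokens too generous the budget is when deferred tools are skipped.
const overshoot = budgetExcludingDeferred - budgetCountingAll;
```

With these assumed numbers, skipping the deferred tools yields a budget 31,000 tokens larger than what the context window can actually hold once every tool schema is sent.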
Pull request overview
This PR updates agent prompt budgeting to count all tool schema tokens (including deferred tools) against the model context window, and adjusts conversation summarization so its prompt rendering reserves token budget for tools. It also extends summarization telemetry and adds a unit test capturing a zero-messages rendering edge case.
Changes:
- Revert tool token counting in `AgentIntentInvocation` to include deferred tools when computing the message budget.
- Reserve tool token budget when rendering the summarization prompt in Full mode, and add `modelMaxPromptTokens` to summarization telemetry.
- Add a unit test reproducing the “No messages provided” failure mode via an empty rendered prompt.
| File | Description |
|---|---|
| src/extension/prompts/node/agent/test/summarization.spec.tsx | Adds a repro test where summarization prompt rendering produces zero messages under an extremely small token budget. |
| src/extension/prompts/node/agent/summarizedConversationHistory.tsx | Reserves message budget for tools in Full summarization mode; tweaks message priority; adds modelMaxPromptTokens to telemetry. |
| src/extension/intents/node/agentIntent.ts | Counts tool tokens across all available tools (no deferral filtering) and removes tool-deferral plumbing from the invocation. |
Copilot's findings
Comments suppressed due to low confidence (1)
src/extension/prompts/node/agent/summarizedConversationHistory.tsx:689
- After rendering the summarization prompt, `summarizationPrompt` can legitimately be empty (0 messages) when the token budget is too small (see the new repro test). The current code proceeds to `makeChatRequest2` with `messages=[]`, which will fail validation (“No messages provided”) and may prevent a clean fallback path while also producing noisy telemetry. Add an explicit guard after render (e.g. if `summarizationPrompt.length === 0`, throw a `BudgetExceededError` or a dedicated error) to skip the request and force the intended fallback/handling.
let summarizationPrompt: ChatMessage[];
const associatedRequestId = this.props.promptContext.conversation?.getLatestTurn().id;
try {
    summarizationPrompt = (await renderPromptElement(this.instantiationService, endpoint, ConversationHistorySummarizationPrompt, { ...propsInfo.props, simpleMode: mode === SummaryMode.Simple }, undefined, this.token)).messages;
    this.logInfo(`summarization prompt rendered in ${stopwatch.elapsed()}ms.`, mode);
} catch (e) {
    const budgetExceeded = e instanceof BudgetExceededError;
    const outcome = budgetExceeded ? 'budget_exceeded' : 'renderError';
    this.logInfo(`Error rendering summarization prompt in mode: ${mode}. ${e.stack}`, mode);
    this.sendSummarizationTelemetry(outcome, '', this.props.endpoint.model, mode, stopwatch.elapsed(), undefined);
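The guard proposed in the suppressed comment could look roughly like the sketch below. `ChatMessage` and `BudgetExceededError` here are simplified stand-ins defined locally (the real types and the real error's constructor signature may differ), and `assertNonEmptyPrompt` is a hypothetical helper name.

```typescript
// Simplified stand-ins for the real types; their actual shapes may differ.
interface ChatMessage {
	role: 'system' | 'user' | 'assistant';
	content: string;
}

class BudgetExceededError extends Error {
	constructor(message: string) {
		super(message);
		this.name = 'BudgetExceededError';
		// Keep instanceof working even when compiled to older targets.
		Object.setPrototypeOf(this, BudgetExceededError.prototype);
	}
}

// Hypothetical guard: reject an empty render before any request is made.
function assertNonEmptyPrompt(messages: readonly ChatMessage[]): void {
	if (messages.length === 0) {
		throw new BudgetExceededError('Summarization prompt rendered zero messages');
	}
}

// Usage: an empty render is routed into the same handling as a budget error.
function classifyRender(messages: readonly ChatMessage[]): 'ok' | 'budget_exceeded' {
	try {
		assertNonEmptyPrompt(messages);
		return 'ok';
	} catch (e) {
		if (e instanceof BudgetExceededError) {
			return 'budget_exceeded';
		}
		throw e;
	}
}
```

Throwing the existing error type, if its constructor allows it, would let the empty-prompt case reuse the `budget_exceeded` telemetry outcome instead of surfacing as a validation failure from the request layer.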
- Files reviewed: 3/3 changed files
- Comments generated: 2
@@ -379,7 +378,6 @@ export class AgentIntentInvocation extends EditCodeIntentInvocation implements I
@IExperimentationService private readonly expService: IExperimentationService,
@IAutomodeService private readonly automodeService: IAutomodeService,
@IOTelService override readonly otelService: IOTelService,
AgentIntentInvocation’s constructor no longer accepts IToolDeferralService, but several subclasses (e.g. AskAgentIntentInvocation/EditCode2IntentInvocation/NotebookEditorIntentInvocation) still inject toolDeferralService and pass it to super(...). This will fail TypeScript compilation due to an argument count mismatch. Update those subclasses to remove the extra DI parameter + super argument (and remove now-unused imports) to keep constructors consistent.
Suggested change:
- @IOTelService override readonly otelService: IOTelService,
+ @IOTelService override readonly otelService: IOTelService,
+ _toolDeferralService?: unknown,
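As a rough illustration of why the optional trailing parameter avoids the compile error, consider the reduced sketch below. The class names and the single `otelService` string are hypothetical simplifications; the real constructors take many injected services.

```typescript
// Simplified stand-in for the base invocation class.
class BaseInvocation {
	constructor(
		readonly otelService: string,
		// Accepted and ignored so that existing subclasses keep compiling.
		_toolDeferralService?: unknown,
	) { }
}

// A subclass that has not been updated yet still passes the extra argument.
class LegacySubclass extends BaseInvocation {
	constructor(otelService: string, toolDeferralService: object) {
		super(otelService, toolDeferralService);
	}
}

// A subclass that has been cleaned up omits it.
class UpdatedSubclass extends BaseInvocation {
	constructor(otelService: string) {
		super(otelService);
	}
}

const legacy = new LegacySubclass('otel', {});
const updated = new UpdatedSubclass('otel');
```

Without the optional parameter, the `super(otelService, toolDeferralService)` call in `LegacySubclass` would be rejected for passing too many arguments. The placeholder is a stopgap; removing the argument from each subclass, as the comment recommends, keeps the constructors consistent.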
// In Full mode, tools are sent alongside the summarization prompt with
// tool_choice: 'none'. Reserve budget for them so the rendered messages
// plus tools don't exceed the model's context window.
const tools = this.props.tools;
const toolTokens = mode === SummaryMode.Full && tools?.length
    ? await this.props.endpoint.acquireTokenizer().countToolTokens(tools)
    : 0;
const endpoint = toolTokens > 0
    ? this.props.endpoint.cloneWithTokenOverride(
        Math.max(1, Math.floor((this.props.endpoint.modelMaxPromptTokens - toolTokens) * 0.9)))
    : this.props.endpoint;
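The reservation arithmetic in the snippet above can be isolated into a small pure function to check a few cases by hand. `effectiveSummarizationBudget` is a hypothetical name, and the input numbers are illustrative; the 0.9 factor is copied from the diff, where it appears to act as a safety margin.

```typescript
// Mirrors the expression in the diff: subtract tool tokens, keep 90%, floor,
// and never go below 1. With no tool tokens the original budget is unchanged.
function effectiveSummarizationBudget(modelMaxPromptTokens: number, toolTokens: number): number {
	if (toolTokens <= 0) {
		return modelMaxPromptTokens;
	}
	return Math.max(1, Math.floor((modelMaxPromptTokens - toolTokens) * 0.9));
}

// Illustrative inputs only.
const withTools = effectiveSummarizationBudget(128000, 31000); // (128000 - 31000) * 0.9 = 87300
const withoutTools = effectiveSummarizationBudget(128000, 0);  // unchanged: 128000
const toolsExceedWindow = effectiveSummarizationBudget(1000, 5000); // clamped to 1
```

The last case matters for the failure mode discussed in this PR: a budget clamped to 1 token cannot hold any message, so rendering can produce an empty prompt.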
modelMaxPromptTokens telemetry is documented as “the … budget used for the summarization prompt rendering”, but the value sent is this.props.endpoint.modelMaxPromptTokens (the pre-reservation budget). Since getSummary may clone the endpoint with a reduced token budget after reserving tool tokens, telemetry will be misleading. Consider reporting the effective budget actually used for rendering (e.g. the cloned endpoint’s modelMaxPromptTokens / computed message budget), or report both original and effective budgets.
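One way to report both values, as the comment suggests, is sketched below. The measurement field names other than `modelMaxPromptTokens` and the `buildBudgetMeasurements` helper are hypothetical and are not part of the existing telemetry schema.

```typescript
// Hypothetical measurement shape carrying both the original and effective budgets.
interface SummarizationBudgetMeasurements {
	modelMaxPromptTokens: number;      // endpoint budget before reservation
	effectivePromptTokens: number;     // budget actually used for rendering
	reservedToolTokens: number;        // tokens set aside for tool schemas
}

function buildBudgetMeasurements(
	originalBudget: number,
	effectiveBudget: number,
	toolTokens: number,
): SummarizationBudgetMeasurements {
	return {
		modelMaxPromptTokens: originalBudget,
		effectivePromptTokens: effectiveBudget,
		reservedToolTokens: toolTokens,
	};
}

// Illustrative values: a 128000-token endpoint with 31000 tool tokens reserved.
const measurements = buildBudgetMeasurements(128000, 87300, 31000);
```

Keeping the original field unchanged and adding the effective value alongside it would avoid altering the meaning of data already collected under `modelMaxPromptTokens`.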