Commit d07799d

Authored by timvisher-dd, claude, and SteffenDE
fix: use current model's context window for usage_update size (#412)
The `usage_update` notification reports `size: 200000` even when the active model has a 1M context window (e.g. `opus[1m]`), causing clients to display incorrect context utilization (e.g. `689k/200k (344.3%)` instead of `689k/1000k (68.9%)`).

Four bugs fixed:

- **Min across all models**: The original code used `Math.min` across all `modelUsage` entries, so subagent models (Sonnet/Haiku with 200k windows) dragged down the reported size for the main Opus 1M model. Now tracks the top-level assistant model and looks up its context window specifically.
- **Model name mismatch**: The SDK's streaming path keys `modelUsage` by the requested model alias (e.g. `claude-opus-4-6`), while `BetaMessage.model` on assistant messages has the resolved API response model (e.g. `claude-opus-4-6-20250514`). The exact-match lookup always missed, falling back to the hardcoded 200k default. Now falls back to prefix matching, preferring the longest, most specific match.
- **Synthetic messages corrupt model tracking**: `/compact` and similar commands emit assistant messages with `model: "<synthetic>"`. These were updating `lastAssistantModel`, causing the next `usage_update` to miss the `modelUsage` lookup and fall back to the 200k default. Now filters out `<synthetic>` models.
- **Stale usage after compaction**: No `usage_update` was sent on `compact_boundary`, so clients kept showing the pre-compaction context size (e.g. `944k/1m`) right after "Compacting completed" until the next full turn. Now sends `used: 0` immediately on compaction. This is a deliberate approximation: the exact post-compaction size isn't known until the SDK's next API call, which replaces it within seconds. The alternative (no update) is worse UX: showing a full context bar right after compaction.

Eight new tests cover: token sum correctness, current-model context window lookup, model switching, subagent isolation, prefix matching in both directions, and synthetic message filtering.
Note: `bin/test` (local CI validation script) is cherry-picked from #353.

Would fix `agent-shell`'s usage indicator, which currently has to defend against this broken math: xenodium/agent-shell#364

## Testing

- [x] Manual: rode with it for a bit.

Co-authored-by: Tim Visher <194828183+timvisher-dd@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Steffen Deusch <steffen@deusch.me>
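The first bug is easy to reproduce in isolation: taking `Math.min` across all `modelUsage` entries lets a subagent's 200k window mask the main model's 1M window. A minimal sketch, with illustrative model names and token counts:

```typescript
// Hypothetical modelUsage snapshot: main Opus model with a 1M window,
// plus a Haiku subagent with a 200k window (names are illustrative).
const modelUsage: Record<string, { contextWindow: number }> = {
  "claude-opus-4-6": { contextWindow: 1_000_000 },
  "claude-haiku-4-5": { contextWindow: 200_000 },
};

const used = 689_000; // tokens currently in context (illustrative)

// Buggy: minimum across all entries, so the subagent window wins.
const contextWindows = Object.values(modelUsage).map((m) => m.contextWindow);
const buggySize = Math.min(...contextWindows);
console.log(`${(used / buggySize) * 100}%`); // well over 100%, clearly wrong

// Fixed idea: use the window of the model that actually produced the turn.
const fixedSize = modelUsage["claude-opus-4-6"].contextWindow;
console.log(`${(used / fixedSize) * 100}%`); // sensible utilization
```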
1 parent: 7506223

2 files changed: 510 additions & 6 deletions

File: src/acp-agent.ts (67 additions, 6 deletions)
```diff
@@ -48,6 +48,7 @@ import {
   listSessions,
   McpServerConfig,
   ModelInfo,
+  ModelUsage,
   Options,
   PermissionMode,
   Query,
```
```diff
@@ -468,6 +469,8 @@ export class ClaudeAcpAgent implements Agent {
     };

     let lastAssistantTotalUsage: number | null = null;
+    let lastAssistantModel: string | null = null;
+    let lastContextWindowSize: number = 200000;

     const userMessage = promptToClaude(params);
```
```diff
@@ -527,9 +530,26 @@
           break;
         }
         case "compact_boundary": {
-          // We don't know the exact size, but since we compacted,
-          // we set it to zero. The client gets the exact size on the next message.
+          // Send used:0 immediately so the client doesn't keep showing
+          // the stale pre-compaction context size until the next turn.
+          //
+          // This is a deliberate approximation: we don't know the exact
+          // post-compaction token count (only the SDK's next API call
+          // reveals that). But used:0 is directionally correct — context
+          // just dropped dramatically — and the real value replaces it
+          // within seconds when the next result message arrives.
+          // The alternative (no update) leaves the client showing e.g.
+          // "944k/1m" right after the user sees "Compacting completed",
+          // which is confusing and wrong.
           lastAssistantTotalUsage = 0;
+          await this.client.sessionUpdate({
+            sessionId: message.session_id,
+            update: {
+              sessionUpdate: "usage_update",
+              used: 0,
+              size: lastContextWindowSize,
+            },
+          });
           await this.client.sessionUpdate({
             sessionId: message.session_id,
             update: {
```
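The `compact_boundary` handling above can be exercised against a stub client. `StubClient`, `UsageUpdate`, and `onCompactBoundary` below are simplified stand-ins, not the real ACP types, and the real `sessionUpdate` is async:

```typescript
// Simplified stand-ins for the ACP client and usage_update payload.
type UsageUpdate = { sessionUpdate: "usage_update"; used: number; size: number };

class StubClient {
  updates: UsageUpdate[] = [];
  // The real client's sessionUpdate is async; a sync stub keeps this runnable.
  sessionUpdate(params: { sessionId: string; update: UsageUpdate }): void {
    this.updates.push(params.update);
  }
}

// Sketch of the compact_boundary fix: report used: 0 right away so the
// client doesn't keep showing the stale pre-compaction size.
function onCompactBoundary(
  client: StubClient,
  sessionId: string,
  lastContextWindowSize: number,
): void {
  client.sessionUpdate({
    sessionId,
    update: { sessionUpdate: "usage_update", used: 0, size: lastContextWindowSize },
  });
}
```

The `used: 0` approximation is then replaced by the real count when the next result message arrives.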
```diff
@@ -578,10 +598,11 @@
         session.accumulatedUsage.cachedReadTokens += message.usage.cache_read_input_tokens;
         session.accumulatedUsage.cachedWriteTokens += message.usage.cache_creation_input_tokens;

-        // Calculate context window size from modelUsage (minimum across all models used)
-        const contextWindows = Object.values(message.modelUsage).map((m) => m.contextWindow);
-        const contextWindowSize =
-          contextWindows.length > 0 ? Math.min(...contextWindows) : 200000;
+        const matchingModelUsage = lastAssistantModel
+          ? getMatchingModelUsage(message.modelUsage, lastAssistantModel)
+          : null;
+        const contextWindowSize = matchingModelUsage?.contextWindow ?? 200000;
+        lastContextWindowSize = contextWindowSize;

         // Send usage_update notification
         if (lastAssistantTotalUsage !== null) {
```
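The fallback chain in this hunk (matched model's window, else the hardcoded 200k default) can be sketched in isolation. `resolveContextWindow` is a hypothetical helper, and an exact-match lookup stands in here for the real prefix-matching `getMatchingModelUsage`:

```typescript
type ModelUsage = { contextWindow: number };

// Hypothetical helper mirroring the fixed lookup; exact match stands in
// for the real prefix-matching lookup.
function resolveContextWindow(
  modelUsage: Record<string, ModelUsage>,
  lastAssistantModel: string | null,
): number {
  const match = lastAssistantModel !== null ? modelUsage[lastAssistantModel] : undefined;
  return match?.contextWindow ?? 200000; // hardcoded default from the original code
}

const usage: Record<string, ModelUsage> = {
  "claude-opus-4-6": { contextWindow: 1_000_000 },
};
console.log(resolveContextWindow(usage, "claude-opus-4-6")); // the model's real window
console.log(resolveContextWindow(usage, null)); // default: no model tracked yet
```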
```diff
@@ -707,6 +728,11 @@
       }

       // Store latest assistant usage (excluding subagents)
+      // Sum all token types as a proxy for post-turn context occupancy:
+      // current turn's output will become next turn's input.
+      // Note: per the Anthropic API, input_tokens excludes cache tokens —
+      // cache_read and cache_creation are reported separately, so summing
+      // all four fields is not double-counting.
       if ((message.message as any).usage && message.parent_tool_use_id === null) {
         const messageWithUsage = message.message as unknown as SDKResultMessage;
         lastAssistantTotalUsage =
@@ -715,6 +741,16 @@
           messageWithUsage.usage.cache_read_input_tokens +
           messageWithUsage.usage.cache_creation_input_tokens;
       }
+      // Track the current top-level model for context window size lookup
+      // (exclude subagent messages to stay in sync with lastAssistantTotalUsage)
+      if (
+        message.type === "assistant" &&
+        message.parent_tool_use_id === null &&
+        message.message.model &&
+        message.message.model !== "<synthetic>"
+      ) {
+        lastAssistantModel = message.message.model;
+      }

       // Slash commands like /compact can generate invalid output... doesn't match
       // their own docs: https://docs.anthropic.com/en/docs/claude-code/sdk/sdk-slash-commands#%2Fcompact-compact-conversation-history
```
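The guard in this hunk can be factored into a predicate for illustration. `shouldTrackModel` is a hypothetical name and `Msg` is a simplified stand-in for the SDK's assistant message shape:

```typescript
// Simplified stand-in for the SDK assistant message shape.
type Msg = {
  type: string;
  parent_tool_use_id: string | null;
  message: { model?: string };
};

// Hypothetical predicate capturing the guard: only top-level assistant
// messages with a real model name should update lastAssistantModel.
function shouldTrackModel(message: Msg): boolean {
  return (
    message.type === "assistant" &&
    message.parent_tool_use_id === null && // exclude subagent messages
    !!message.message.model &&
    message.message.model !== "<synthetic>" // exclude /compact and similar
  );
}
```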
```diff
@@ -2109,3 +2145,28 @@ export function runAcp() {
   const stream = ndJsonStream(input, output);
   new AgentSideConnection((client) => new ClaudeAcpAgent(client), stream);
 }
+
+function commonPrefixLength(a: string, b: string) {
+  let i = 0;
+  while (i < a.length && i < b.length && a[i] === b[i]) {
+    i++;
+  }
+  return i;
+}
+
+function getMatchingModelUsage(modelUsage: Record<string, ModelUsage>, currentModel: string) {
+  let bestKey: string | null = null;
+  let bestLen = 0;
+
+  for (const key of Object.keys(modelUsage)) {
+    const len = commonPrefixLength(key, currentModel);
+    if (len > bestLen) {
+      bestLen = len;
+      bestKey = key;
+    }
+  }
+
+  if (bestKey) {
+    return modelUsage[bestKey];
+  }
+}
```
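The two helpers above are self-contained, so the bidirectional prefix matching can be demonstrated directly (model names and window sizes are illustrative):

```typescript
type ModelUsage = { contextWindow: number };

function commonPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Longest-common-prefix lookup: an alias key matches a longer resolved
// model name and vice versa; ties on zero overlap return undefined.
function getMatchingModelUsage(
  modelUsage: Record<string, ModelUsage>,
  currentModel: string,
): ModelUsage | undefined {
  let bestKey: string | null = null;
  let bestLen = 0;
  for (const key of Object.keys(modelUsage)) {
    const len = commonPrefixLength(key, currentModel);
    if (len > bestLen) {
      bestLen = len;
      bestKey = key;
    }
  }
  return bestKey ? modelUsage[bestKey] : undefined;
}

// Alias key vs. resolved model name: the longest common prefix wins,
// so the Opus entry beats the Haiku entry here.
const usage: Record<string, ModelUsage> = {
  "claude-opus-4-6": { contextWindow: 1_000_000 },
  "claude-haiku-4-5": { contextWindow: 200_000 },
};
console.log(getMatchingModelUsage(usage, "claude-opus-4-6-20250514")?.contextWindow);
```

The reverse direction (resolved name as the key, alias as the current model) works the same way, since `commonPrefixLength` is symmetric.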
