Commit b9c7f0d

feat: support richer token cost tracking for ask (#1353)
* feat(web): track estimated per-tool output token usage in chat

  Estimate the input-token footprint of each tool call's output (the cost the result imposes when fed back to the model on subsequent steps) using a local length-based estimator, persist it per tool call in the chat message metadata, and surface it inline in each tool call row next to the Details toggle. Estimates are ~-prefixed to keep them distinct from the authoritative billed token totals.

* feat(web): track per-step token usage and fix tool output estimation

  Record the provider-reported input/output token usage of each agent step in the chat message metadata and display it per step group in the thinking steps view (joined to UI step groups via the step index now tagged on each tool token usage entry).

  Also fix the tool output estimator to measure the model-visible payload: tools with a toModelOutput mapping (all builtins) send only their output text to the model, so estimating the raw ToolResult object was counting UI-only metadata the model never sees. The bytes-per-token ratio is now a uniform ~2 chars/token, calibrated against provider-reported per-step usage of code-heavy tool results.

* improve detailsCard display for the step token usage

* refactor(web): derive token usage post-stream and join steps by position

  Collect usage from researchStream.steps and response.messages after the stream completes (covers approval-gated and failed tool calls, off the hot path), nest tool estimates under their step in a single stepTokenUsage array, and join UI steps to entries by stepIndex.

* docs: update changelog for token cost tracking

* fix(web): prevent caller metadata from overwriting derived token fields

  Spread caller-supplied metadata before the derived token fields so stepTokenUsage and the totals can't be clobbered, which would desync the UI's index-based step join.

* refactor(web): move token estimation into EE chat

---------

Co-authored-by: Jack Minnetian <270441393+BlueBottleLatte@users.noreply.github.com>
1 parent 26435a4 commit b9c7f0d
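
The commit message describes a local length-based estimator with a uniform ratio of roughly 2 characters per token. The body of `estimateModelToolOutputTokens` is not part of the files shown on this page, so the following is only a minimal sketch of the idea. The function names, the rounding rule, and the empty-string handling are assumptions made for illustration.

```typescript
// Hypothetical sketch of a length-based token estimator.
// Assumption: the ~2 chars/token ratio from the commit message is applied
// uniformly and rounded up. The real tokenEstimation module may differ.
const CHARS_PER_TOKEN = 2;

function estimateTokensFromText(text: string): number {
    if (text.length === 0) {
        return 0;
    }
    return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Estimates carry a "~" prefix so they are not mistaken for the
// provider-reported (billed) totals.
function formatEstimate(tokens: number): string {
    return `~${tokens}`;
}
```

A length-based estimate avoids loading a tokenizer, at the cost of accuracy. That trade-off is presumably why the UI marks these values as approximate.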

13 files changed

Lines changed: 486 additions & 116 deletions

File tree

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- Added per-step token cost tracking and estimated tool call token usage to Ask Sourcebot chat history. [#1353](https://github.com/sourcebot-dev/sourcebot/pull/1353)
+
 ## [5.0.4] - 2026-06-18
 
 ### Changed

packages/web/src/ee/features/chat/agent.test.ts

Lines changed: 2 additions & 1 deletion
@@ -137,7 +137,8 @@ const createAssistantMessage = (parts: SBChatMessagePart[]): SBChatMessage => ({
 });
 
 const createFakeStreamResult = () => ({
-    response: Promise.resolve(new Response()),
+    response: Promise.resolve({ messages: [] }),
+    steps: Promise.resolve([]),
     totalUsage: Promise.resolve({
         inputTokens: 1,
         outputTokens: 1,

packages/web/src/ee/features/chat/agent.ts

Lines changed: 67 additions & 2 deletions
@@ -1,4 +1,5 @@
-import { SBChatMessage, SBChatMessageMetadata } from "@/features/chat/types";
+import { SBChatMessage, SBChatMessageMetadata, StepTokenUsageEntry, ToolTokenUsageEntry } from "@/features/chat/types";
+import { estimateModelToolOutputTokens } from "@/ee/features/chat/tokenEstimation";
 import { getFileSource } from '@/features/git';
 import { isServiceError } from "@/lib/utils";
 import { LanguageModelV3 as AISDKLanguageModelV3 } from "@ai-sdk/provider";
@@ -190,19 +191,76 @@ export const createMessageStream = async ({
             });
 
             const totalUsage = await researchStream.totalUsage;
+            const steps = await researchStream.steps;
+            const response = await researchStream.response;
+
+            // Tool output estimates are derived from `response.messages` rather
+            // than per-step `toolResults` because the response messages cover
+            // tool calls that never run inside a step — approval-gated tools
+            // execute before the step loop, and thrown tool errors are recorded
+            // as `tool-error` parts that `toolResults` excludes. Their
+            // `tool-result` parts also carry the output in model-visible form
+            // (`toModelOutput` already applied), which is exactly the payload
+            // whose token footprint we want to estimate.
+            const toolUsageByToolCallId = new Map<string, ToolTokenUsageEntry>(
+                response.messages.flatMap((message) =>
+                    message.role !== 'tool' ? [] : message.content.flatMap((part) =>
+                        part.type !== 'tool-result' ? [] : [[part.toolCallId, {
+                            toolCallId: part.toolCallId,
+                            toolName: part.toolName,
+                            estimatedOutputTokens: estimateModelToolOutputTokens(part.output),
+                        }] as const]
+                    )
+                )
+            );
+
+            // One entry per step, in step order. The UI joins its step groups
+            // to these entries by array position, so the order and count must
+            // mirror the stream's steps exactly. Tool calls nest under the
+            // step they ran in; `content` is matched rather than `toolResults`
+            // so that thrown tool errors (`tool-error` parts, which
+            // `toolResults` excludes) are still attributed to their step.
+            const stepTokenUsage: StepTokenUsageEntry[] = steps.map(({ usage, content }) => ({
+                inputTokens: usage.inputTokens,
+                outputTokens: usage.outputTokens,
+                cacheReadTokens: usage.inputTokenDetails?.cacheReadTokens,
+                tools: content.flatMap((part) => {
+                    if (part.type !== 'tool-result' && part.type !== 'tool-error') {
+                        return [];
+                    }
+                    const entry = toolUsageByToolCallId.get(part.toolCallId);
+                    if (!entry) {
+                        return [];
+                    }
+                    toolUsageByToolCallId.delete(part.toolCallId);
+                    return [entry];
+                }),
+            }));
+
+            // Any estimates left unclaimed belong to tool calls that executed
+            // before the step loop (approval continuations). Their output
+            // enters the context as input to this phase's first step, so nest
+            // them under it.
+            if (toolUsageByToolCallId.size > 0 && stepTokenUsage.length > 0) {
+                stepTokenUsage[0].tools.unshift(...toolUsageByToolCallId.values());
+            }
 
             writer.write({
                 type: 'message-metadata',
                 messageMetadata: {
+                    // Spread first so the derived fields below can't be overwritten by caller metadata.
+                    ...metadata,
                     totalTokens: (priorMetadata?.totalTokens ?? 0) + (totalUsage.totalTokens ?? 0),
                     totalInputTokens: (priorMetadata?.totalInputTokens ?? 0) + (totalUsage.inputTokens ?? 0),
                     totalOutputTokens: (priorMetadata?.totalOutputTokens ?? 0) + (totalUsage.outputTokens ?? 0),
                     totalCacheReadTokens: (priorMetadata?.totalCacheReadTokens ?? 0) + (totalUsage.inputTokenDetails?.cacheReadTokens ?? 0),
                     totalCacheWriteTokens: (priorMetadata?.totalCacheWriteTokens ?? 0) + (totalUsage.inputTokenDetails?.cacheWriteTokens ?? 0),
                     totalResponseTimeMs: (priorMetadata?.totalResponseTimeMs ?? 0) + (new Date().getTime() - startTime.getTime()),
+                    // Concatenated (not summed) across approval-continuation
+                    // phases so earlier phases' steps are preserved in order.
+                    stepTokenUsage: [...(priorMetadata?.stepTokenUsage ?? []), ...stepTokenUsage],
                     modelName,
                     traceId,
-                    ...metadata,
                 }
             });
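
The hunk above nests each tool estimate under the step whose content references it, then attaches any unclaimed estimates to the first step. The sketch below isolates that join. It uses simplified, hypothetical stand-in types rather than the AI SDK's own, and the helper name `nestToolEstimates` is an assumption for illustration.

```typescript
// Simplified stand-ins for the shapes in the diff (assumptions, not SDK types).
type ToolEstimate = { toolCallId: string; toolName: string; estimatedOutputTokens: number };
type StepPart = { type: string; toolCallId?: string };
type Step = { inputTokens: number; outputTokens: number; content: StepPart[] };
type StepUsage = { inputTokens: number; outputTokens: number; tools: ToolEstimate[] };

function nestToolEstimates(steps: Step[], estimates: ToolEstimate[]): StepUsage[] {
    // Index the estimates by tool call id; entries are removed as steps claim them.
    const unclaimed: { [id: string]: ToolEstimate } = {};
    const order: string[] = [];
    for (const estimate of estimates) {
        unclaimed[estimate.toolCallId] = estimate;
        order.push(estimate.toolCallId);
    }

    // One usage entry per step, in step order, so positions line up with the UI.
    const usage: StepUsage[] = steps.map((step) => {
        const tools: ToolEstimate[] = [];
        for (const part of step.content) {
            const isToolOutcome = part.type === 'tool-result' || part.type === 'tool-error';
            if (!isToolOutcome || part.toolCallId === undefined) {
                continue;
            }
            const entry = unclaimed[part.toolCallId];
            if (entry) {
                tools.push(entry);
                delete unclaimed[part.toolCallId];
            }
        }
        return { inputTokens: step.inputTokens, outputTokens: step.outputTokens, tools };
    });

    // Leftovers ran before the step loop; attribute them to the first step.
    const leftovers = order.filter((id) => unclaimed[id] !== undefined).map((id) => unclaimed[id]);
    const first = usage[0];
    if (first && leftovers.length > 0) {
        first.tools.unshift(...leftovers);
    }
    return usage;
}
```

Removing each entry once it is claimed is what allows the final pass to treat whatever remains as belonging to approval continuations.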

@@ -430,6 +488,13 @@ const createAgentStream = async ({
             logger.warn(`Tool call repair failed for "${toolCall.toolName}": ${error.message}`);
             return null;
         },
+        // Token usage collection deliberately does NOT happen here: the SDK
+        // awaits this callback before starting the next step, so it must
+        // stay cheap, and `toolResults` misses tool calls that never run
+        // inside a step (approval-gated tools execute before the step loop)
+        // as well as thrown tool errors (recorded as `tool-error` parts).
+        // Both are instead derived post-stream in `createMessageStream`
+        // from `steps` and `response.messages`.
         onStepFinish: ({ toolResults }) => {
             toolResults.forEach(({ output, dynamic }) => {
                 if (dynamic || isServiceError(output)) {
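
The `messageMetadata` change earlier in this file relies on object spread order: in an object literal, later keys overwrite earlier ones, so spreading caller metadata first keeps the derived fields authoritative. The following minimal sketch shows that ordering together with the concatenation of `stepTokenUsage` across phases. The `mergeMetadata` helper and its reduced `Metadata` type are hypothetical and exist only for this illustration.

```typescript
// Reduced, hypothetical metadata shape for illustration only.
type StepEntry = { inputTokens: number };
type Metadata = {
    totalTokens?: number;
    stepTokenUsage?: StepEntry[];
    [key: string]: unknown;
};

function mergeMetadata(
    prior: Metadata | undefined,
    caller: Metadata,
    phase: { totalTokens: number; stepTokenUsage: StepEntry[] },
): Metadata {
    return {
        // Spread first: any derived key written below wins over caller values.
        ...caller,
        // Totals are summed across phases.
        totalTokens: (prior?.totalTokens ?? 0) + phase.totalTokens,
        // Step entries are concatenated, keeping earlier phases' steps in order.
        stepTokenUsage: [...(prior?.stepTokenUsage ?? []), ...phase.stepTokenUsage],
    };
}
```

With the old ordering (spread last), a caller that happened to pass `stepTokenUsage` would have replaced the derived array and broken the index-based join in the UI.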

packages/web/src/ee/features/chat/components/chatThread/chatThreadListItem.tsx

Lines changed: 51 additions & 26 deletions
@@ -91,33 +91,57 @@ const ChatThreadListItemComponent = forwardRef<HTMLDivElement, ChatThreadListIte
     // should be visible to the user. By "steps", we mean parts that originated
     // from the same LLM invocation. By "visibile", we mean parts that have some
     // visual representation in the UI (e.g., text, reasoning, tool calls, etc.).
-    const uiVisibleThinkingSteps = useMemo(() => {
-        const steps = groupMessageIntoSteps(assistantMessage?.parts ?? []);
-
-        // Filter out the answerPart and empty steps
-        return steps
-            .map(
-                (step) => step
-                    // First, filter out any parts that are not text
-                    .filter((part) => {
-                        if (part.type === 'text') {
-                            return !part.text.includes(ANSWER_TAG);
-                        }
-
-                        return true;
-                    })
-                    .filter((part) => {
-                        // Only include text, reasoning, and tool parts
-                        return (
-                            part.type === 'text' ||
-                            part.type === 'reasoning' ||
-                            part.type.startsWith('tool-') ||
-                            part.type === 'dynamic-tool'
-                        )
-                    })
-            )
+    //
+    // Each step is tagged with its stepIndex — the invocation's position in
+    // the turn, which indexes into `metadata.stepTokenUsage`. Indices are
+    // assigned by counting 'step-start' markers (one per invocation) BEFORE
+    // any filtering, so dropping empty or answer-only steps below cannot
+    // shift the indices of the steps that remain.
+    const { uiVisibleThinkingSteps, answerStepIndex } = useMemo(() => {
+        const groupedParts = groupMessageIntoSteps(assistantMessage?.parts ?? []);
+
+        // Parts written before the first step-start (e.g. data parts) don't
+        // belong to any step; they get stepIndex -1 and never survive the
+        // visibility filters below.
+        let stepIndex = -1;
+        let answerStepIndex: number | undefined = undefined;
+
+        const steps = groupedParts
+            .map((stepParts) => {
+                if (stepParts[0]?.type === 'step-start') {
+                    stepIndex++;
+                }
+
+                if (stepParts.some((part) => part.type === 'text' && part.text.includes(ANSWER_TAG))) {
+                    answerStepIndex = stepIndex;
+                }
+
+                return {
+                    stepIndex,
+                    parts: stepParts
+                        // First, filter out the answer text
+                        .filter((part) => {
+                            if (part.type === 'text') {
+                                return !part.text.includes(ANSWER_TAG);
+                            }
+
+                            return true;
+                        })
+                        .filter((part) => {
+                            // Only include text, reasoning, and tool parts
+                            return (
+                                part.type === 'text' ||
+                                part.type === 'reasoning' ||
+                                part.type.startsWith('tool-') ||
+                                part.type === 'dynamic-tool'
+                            )
+                        }),
+                };
+            })
             // Then, filter out any steps that are empty
-            .filter(step => step.length > 0);
+            .filter((step) => step.parts.length > 0);
+
+        return { uiVisibleThinkingSteps: steps, answerStepIndex };
     }, [assistantMessage?.parts]);
 
     // "thinking" is when the agent is generating output that is not the answer.
@@ -379,6 +403,7 @@ const ChatThreadListItemComponent = forwardRef<HTMLDivElement, ChatThreadListIte
                     isNetworkActive={isNetworkActive}
                     isAwaitingToolApproval={isAwaitingToolApproval}
                     thinkingSteps={uiVisibleThinkingSteps}
+                    answerStepIndex={answerStepIndex}
                     metadata={assistantMessage?.metadata}
                 />
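
The comments in this file stress that step indices are assigned before any filtering, so that dropping empty steps does not shift the indices of the remaining ones. A minimal sketch of that ordering follows, using a reduced `Part` type and a hypothetical `indexVisibleSteps` helper. Neither is the component's actual code, and the visibility rule is simplified.

```typescript
// Reduced, hypothetical shapes; the real parts carry more fields.
type Part = { type: string; text?: string };
type IndexedStep = { stepIndex: number; parts: Part[] };

// Simplified visibility rule (assumption): text, reasoning, and tool parts.
function isVisible(part: Part): boolean {
    return (
        part.type === 'text' ||
        part.type === 'reasoning' ||
        part.type.indexOf('tool-') === 0 ||
        part.type === 'dynamic-tool'
    );
}

function indexVisibleSteps(groups: Part[][]): IndexedStep[] {
    let stepIndex = -1; // groups before the first step-start keep -1
    const indexed: IndexedStep[] = [];
    for (const group of groups) {
        // Count the invocation first, whether or not anything survives filtering.
        if (group[0]?.type === 'step-start') {
            stepIndex++;
        }
        const parts = group.filter(isVisible);
        if (parts.length > 0) {
            indexed.push({ stepIndex, parts });
        }
    }
    return indexed;
}
```

Because the counter advances on every `step-start`, a step that is filtered out still consumes its index, and the surviving steps keep positions that match `metadata.stepTokenUsage`.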

packages/web/src/ee/features/chat/components/chatThread/detailsCard.test.tsx

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ describe('DetailsCard', () => {
                     isTurnInProgress={true}
                     isNetworkActive={false}
                     isAwaitingToolApproval={false}
-                    thinkingSteps={[[failedActivationPart]]}
+                    thinkingSteps={[{ stepIndex: 0, parts: [failedActivationPart] }]}
                 />
             </TooltipProvider>
         );
