Commit 3317d14

feat: recognize provider cost data from usage.cost_details block (#439)
## Summary

Some providers include detailed cost breakdowns directly in the response `usage` object rather than via SSE `: cost` comment lines. This PR adds support for recognizing and using that cost data when available.

### The new format

Providers may return a `usage` block like:

```json
"usage": {
  "prompt_tokens": 23,
  "total_tokens": 66,
  "completion_tokens": 43,
  "cost": 0.00017465,
  "cost_details": {
    "total_cost": 0.00017465,
    "input_cost": 0.00002415,
    "output_cost": 0.0001505,
    "cached_input_cost": 0,
    "cache_write_input_cost": 0,
    "upstream_inference_cost": 0.00017465,
    "request_cost": 0,
    "web_search_cost": 0,
    "data_storage_cost": 0.00000106
  }
}
```

When `cost_details` is present, we use the provider's actual per-bucket breakdown (`input_cost`, `output_cost`, `cached_input_cost`, `cache_write_input_cost`) directly instead of proportionally distributing a total.

## Changes

- **`utils/usage-normalizer.ts`**: Add `extractUsageCostDetails()`, which safely extracts `cost_details` from provider usage blocks and returns `null` when absent (providers that don't use this format are unaffected). Also updated `normalizeOpenAIChatUsage()` to extract `cache_write_tokens` from `prompt_tokens_details`.
- **`utils/provider-cost.ts`**: Add `applyUsageCostDetails()`, which applies the per-bucket breakdown when available and falls back to proportional distribution otherwise.
- **`services/inspectors/usage-logging.ts`**: Wire `cost_details` extraction into the streaming cost path (only applies if no SSE-reported cost was found).
- **`services/response-handler.ts`**: Same for the non-streaming (unary) path.
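For readers who want a feel for the defensive extraction described above, here is a minimal sketch. It is not the code from `utils/usage-normalizer.ts`, which this page does not show: the function name `extractCostDetailsSketch`, the `UsageCostDetailsSketch` shape, the `toCost` helper, and the fallback order (`cost_details.total_cost`, then `usage.cost`, then `usage.estimated_cost`) are all assumptions based on the description and test plan.

```typescript
// Hypothetical result shape; the snake_case source fields mirror the example usage block.
interface UsageCostDetailsSketch {
  totalCost: number;
  inputCost: number | null;
  outputCost: number | null;
  cachedInputCost: number | null;
  cacheWriteInputCost: number | null;
}

// Accept only finite, non-negative numbers; anything else becomes null.
function toCost(value: unknown): number | null {
  return typeof value === 'number' && Number.isFinite(value) && value >= 0 ? value : null;
}

// Assumed behaviour: return null unless some usable total cost can be found.
function extractCostDetailsSketch(usage: unknown): UsageCostDetailsSketch | null {
  if (typeof usage !== 'object' || usage === null) return null;
  const u = usage as Record<string, unknown>;

  const details =
    typeof u.cost_details === 'object' && u.cost_details !== null
      ? (u.cost_details as Record<string, unknown>)
      : null;

  // Assumed fallback order for the total.
  const totalCost = toCost(details?.total_cost) ?? toCost(u.cost) ?? toCost(u.estimated_cost);
  if (totalCost === null) return null;

  return {
    totalCost,
    inputCost: toCost(details?.input_cost),
    outputCost: toCost(details?.output_cost),
    cachedInputCost: toCost(details?.cached_input_cost),
    cacheWriteInputCost: toCost(details?.cache_write_input_cost),
  };
}
```

Returning `null` for anything that does not look like a cost keeps the call sites simple: a provider that never sends these fields takes the same path as before.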
## Key design decisions

- **Fully optional/defensive**: `extractUsageCostDetails()` returns `null` when `usage.cost_details` doesn't exist, so providers that don't use this format are completely unaffected.
- **SSE `: cost` comments take precedence**: The `!providerReportedCost` guard ensures `cost_details` only applies when no SSE-reported cost was found.
- **Per-bucket breakdown preferred**: When the provider gives explicit `input_cost`/`output_cost`/etc., we use those directly instead of proportional splitting.

## Test plan

- [x] `extractUsageCostDetails`: extracts from the new format, falls back to `usage.cost`/`usage.estimated_cost`, returns null for missing/invalid data
- [x] `applyUsageCostDetails`: uses per-bucket breakdown, falls back to proportional, handles zero/null costs
- [x] `normalizeOpenAIChatUsage`: extracts `cache_write_tokens` from `prompt_tokens_details`
- [x] Precedence: SSE `: cost` comments > `cost_details` > calculated costs
- [x] All 1367 existing tests pass
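The "per-bucket preferred, proportional otherwise" rule can be illustrated with a small sketch. This is not the code from `utils/provider-cost.ts`: the names `splitReportedCostSketch`, `ReportedCostSketch`, `TokenCountsSketch`, and `CostBucketsSketch` are hypothetical, and weighting the proportional fallback by token count is an assumption, since the page does not show how the real function weights it.

```typescript
// Hypothetical input: a reported total plus optional per-bucket costs.
interface ReportedCostSketch {
  totalCost: number;
  inputCost: number | null;
  outputCost: number | null;
  cachedInputCost: number | null;
  cacheWriteInputCost: number | null;
}

// Hypothetical token counts used only for the proportional fallback.
interface TokenCountsSketch {
  input: number;
  output: number;
  cached: number;
  cacheWrite: number;
}

interface CostBucketsSketch {
  input: number;
  output: number;
  cached: number;
  cacheWrite: number;
  total: number;
}

function splitReportedCostSketch(
  reported: ReportedCostSketch,
  tokens: TokenCountsSketch,
): CostBucketsSketch {
  // Prefer the provider's own breakdown when it reports input or output cost.
  if (reported.inputCost !== null || reported.outputCost !== null) {
    return {
      input: reported.inputCost ?? 0,
      output: reported.outputCost ?? 0,
      cached: reported.cachedInputCost ?? 0,
      cacheWrite: reported.cacheWriteInputCost ?? 0,
      total: reported.totalCost,
    };
  }

  // Fallback (assumption): distribute the total in proportion to token counts.
  const totalTokens = tokens.input + tokens.output + tokens.cached + tokens.cacheWrite;
  if (totalTokens === 0) {
    // Nothing to weight by: keep the total and leave every bucket at zero.
    return { input: 0, output: 0, cached: 0, cacheWrite: 0, total: reported.totalCost };
  }
  const share = (n: number): number => (reported.totalCost * n) / totalTokens;
  return {
    input: share(tokens.input),
    output: share(tokens.output),
    cached: share(tokens.cached),
    cacheWrite: share(tokens.cacheWrite),
    total: reported.totalCost,
  };
}
```

With the example block from the summary, the first branch is taken and the buckets are the provider's figures unchanged. A provider that reports only a total falls through to the proportional split.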
1 parent e7eb887 commit 3317d14

6 files changed

Lines changed: 629 additions & 5 deletions

packages/backend/src/services/inspectors/usage-logging.ts

Lines changed: 11 additions & 1 deletion
    @@ -10,9 +10,10 @@ import {
       normalizeGeminiUsage,
       normalizeOpenAIChatUsage,
       normalizeOpenAIResponsesUsage,
    +  extractUsageCostDetails,
     } from '../../utils/usage-normalizer';
     import { estimateKwhUsed } from '../inference-energy';
    -import { applyProviderReportedCost } from '../../utils/provider-cost';
    +import { applyProviderReportedCost, applyUsageCostDetails } from '../../utils/provider-cost';
     import { DEFAULT_MODEL, DEFAULT_GPU_PARAMS } from '@plexus/shared';
     import { recordQuotaUsage } from '../quota/quota-middleware';
     
    @@ -149,6 +150,15 @@ export class UsageInspector extends PassThrough {
               applyProviderReportedCost(this.usageRecord, reconstructed.providerReportedCost);
             }
     
    +        // Override with provider-reported cost from usage.cost_details if available
    +        // Some providers include detailed cost breakdowns in the usage block
    +        if (!this.usageRecord.providerReportedCost && reconstructed?.usage) {
    +          const usageCostDetails = extractUsageCostDetails(reconstructed.usage);
    +          if (usageCostDetails) {
    +            applyUsageCostDetails(this.usageRecord, usageCostDetails);
    +          }
    +        }
    +
             // Use provider-reported energy if available, otherwise estimate
             // Some providers emit `: energy {"energy_kwh": ...}` as SSE comments
             if (reconstructed?.providerReportedEnergy?.energy_kwh != null) {

packages/backend/src/services/response-handler.ts

Lines changed: 10 additions & 1 deletion
    @@ -10,7 +10,8 @@ import { DebugLoggingInspector, UsageInspector } from './inspectors';
     import { Readable } from 'stream';
     import { DebugManager } from './debug-manager';
     import { estimateKwhUsed } from './inference-energy';
    -import { applyProviderReportedCost } from '../utils/provider-cost';
    +import { applyProviderReportedCost, applyUsageCostDetails } from '../utils/provider-cost';
    +import { extractUsageCostDetails } from '../utils/usage-normalizer';
     import { StallInspector, type StallConfig } from './inspectors/stall-inspector';
     import { DEFAULT_GPU_PARAMS, DEFAULT_MODEL } from '@plexus/shared';
     import type { GpuParams } from '@plexus/shared';
    @@ -502,6 +503,14 @@ async function finalizeUsage(
         if (reconstructed?.providerReportedCost) {
           applyProviderReportedCost(usageRecord, reconstructed.providerReportedCost);
         }
    +
    +    // Also check for cost_details in the usage block (some providers embed costs there)
    +    if (!usageRecord.providerReportedCost && reconstructed?.usage) {
    +      const usageCostDetails = extractUsageCostDetails(reconstructed.usage);
    +      if (usageCostDetails) {
    +        applyUsageCostDetails(usageRecord, usageCostDetails);
    +      }
    +    }
         usageRecord.responseStatus = 'success';
         usageRecord.durationMs = Date.now() - startTime;
     
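Both hunks add the same guard, so that the stated precedence (SSE `: cost` comments, then `cost_details`, then calculated costs) holds on the streaming and unary paths alike. A compact way to picture that ordering is the sketch below. The names `resolveCostSketch`, `CostSourceSketch`, and `ResolvedCostSketch` are hypothetical and do not appear in the commit; the real code mutates a usage record in place rather than returning a value.

```typescript
// Hypothetical labels for where the final cost came from.
type CostSourceSketch = 'sse_comment' | 'usage_cost_details' | 'calculated';

interface ResolvedCostSketch {
  source: CostSourceSketch;
  total: number;
}

// Pick the first available cost in precedence order:
// SSE-reported cost, then usage.cost_details, then the locally calculated cost.
function resolveCostSketch(
  sseReportedCost: number | null,
  usageDetailsCost: number | null,
  calculatedCost: number,
): ResolvedCostSketch {
  if (sseReportedCost !== null) {
    return { source: 'sse_comment', total: sseReportedCost };
  }
  if (usageDetailsCost !== null) {
    return { source: 'usage_cost_details', total: usageDetailsCost };
  }
  return { source: 'calculated', total: calculatedCost };
}
```

For example, `resolveCostSketch(null, 0.00017465, 0.0002)` selects the `cost_details` figure, while any non-null SSE value wins regardless of the other two arguments.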
