Add price mode: emit computed dollar cost instead of token counts
New optional pricing_mode ("tokens" default | "price"). In price mode the SDK
emits one llm_cost event per call whose value is Σ(unit_price × tokens) × markup,
with a full per-field breakdown (tokens, unit_price, cost, base_cost, markup,
price_source) in properties.
Pricing sources (public, no API keys):
- OpenRouter /api/v1/models for native anthropic/openai/mistral/gemini (USD/token)
- AWS Bedrock Price List Bulk API (per-region offer files) for Bedrock
Design:
- pricing.py: PricingProvider with a TTL cache + injectable fetcher; lookup()
  is pure in-memory and never blocks the call; maybe_refresh() does the HTTP on
  the queue's background thread. Fork-safe via a PID self-heal (no
  register_at_fork, since that tripped macOS's objc fork-safety abort).
- Conservative, vendor-gated model matching; Bedrock parser keys on
  inferenceType and rejects priority/flex/batch tiers, scales per-1K units.
- Money in Decimal floored to 12 dp; identical output to the JS BigInt impl
  (locked by a cross-repo golden fixture).
- Fallback: unavailable price -> emit token events + on_error (never under-bill).
- mode + markup overridable per-call via extra_lago; global via LagoConfig.
Default mode is "tokens" -> zero behavior change. New config: pricing_mode,
markup, cost_metric_code, pricing_ttl_seconds, bedrock_default_region,
pricing_provider. 28 new pricing unit tests + env-gated live test.
Gate: ruff + format + mypy clean; 346 unit tests; coverage 88.27%.
CHANGELOG.md (+3 lines changed: 3 additions, 0 deletions)
@@ -4,6 +4,9 @@ All notable changes to this project will be documented here. Format follows [Kee
## [Unreleased]
### Added
- **Price mode — emit computed dollar cost instead of token counts.** New `pricing_mode` config (`"tokens"` default | `"price"`), plus `markup`, `cost_metric_code` (default `llm_cost`), `pricing_ttl_seconds`, and `bedrock_default_region`. In price mode the SDK emits one `llm_cost` event per call carrying a top-level `precise_total_amount_cents` (cost in cents, after markup) for Lago's **dynamic charge model**, with a full per-field breakdown in `properties` (value in USD, base, markup, source, per-field tokens/unit_price/cost). Live unit prices come from public, no-auth sources: OpenRouter (`/api/v1/models`) for native anthropic/openai/mistral/gemini, and the AWS Bedrock Price List **Bulk** API for Bedrock. Prices are fetched + cached on the background queue thread (never blocking the customer's call); a missing price falls back to token events and calls `on_error` (never silently under-bills). Mode and markup are overridable per-call via `extra_lago={"mode": "price", "markup": 1.5}`. Money is computed with `Decimal` floored to 12 dp, identical to the JS implementation (cross-repo golden fixture). New `pricing.py` module + `PricingProvider`; default `pricing_mode="tokens"` keeps existing behavior unchanged.
### Fixed
- **Anthropic `messages.create(stream=True)` under-billed input tokens.** The stream wrapper read only top-level `usage`, which on a basic stream appears only on `message_delta` as `{output_tokens: N}`. The authoritative `input_tokens` / `cache_*` counts arrive nested under `message.usage` on the `message_start` event and were ignored, so input was billed as 0. The wrapper now merges usage from `message_start` (input/cache) and `message_delta` (cumulative output). Both the sync and async paths are fixed; regression tests use the realistic wire shape (the delta carries no input echo).
- **Legacy `google-generativeai` SDK silently emitted no events.** The detector matched both the new `google-genai` and the deprecated `google-generativeai` SDKs, but the wrapper only instruments the unified `Client.models` / `.aio` surface, so a legacy `GenerativeModel` passed detection and nothing was wrapped. `wrap()` now rejects legacy clients with a clear pointer to migrate to `google-genai`.
README.md (+51 lines changed: 51 additions, 0 deletions)
@@ -189,6 +189,57 @@ For both OpenAI and Gemini, `cache_read`, `audio_input`, and `image_input` are *
OpenAI's Predicted Outputs tokens (`accepted_prediction_tokens`, `rejected_prediction_tokens`) are not surfaced — see the OpenAI adapter docstring for details on this intentional gap.
## Pricing mode — send dollar cost instead of tokens
By default the SDK emits **token counts** (`pricing_mode="tokens"`). You can instead have it
compute and emit the **dollar cost** of each call: `Σ(unit_price_per_token × tokens) × markup`.
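
As a rough illustration of the formula above, the sketch below computes `Σ(unit_price_per_token × tokens) × markup` with `Decimal` and floors the result to 12 decimal places. The function name `compute_cost`, the field names, and the unit prices are hypothetical, and flooring only the final total (rather than each field) is an assumption, not a statement about the SDK's internals.

```python
from decimal import ROUND_FLOOR, Decimal
from typing import Dict

# 12 decimal places, matching the precision stated for money values.
TWELVE_DP = Decimal("1e-12")


def compute_cost(
    tokens: Dict[str, int],
    unit_prices: Dict[str, Decimal],
    markup: Decimal,
) -> Decimal:
    """Return sum(unit_price * tokens) * markup, floored to 12 dp (hypothetical helper)."""
    base = sum(
        (unit_prices[field] * Decimal(count) for field, count in tokens.items()),
        Decimal(0),
    )
    # Assumption: flooring is applied once, to the marked-up total.
    return (base * markup).quantize(TWELVE_DP, rounding=ROUND_FLOOR)


# Illustrative (made-up) prices in USD per token.
prices = {"input": Decimal("0.000003"), "output": Decimal("0.000015")}
cost = compute_cost({"input": 1000, "output": 200}, prices, Decimal("1.5"))
print(cost)
```

Using `Decimal` rather than `float` keeps the arithmetic exact, which is what makes a bit-for-bit match with another implementation feasible.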
- **AWS Bedrock Price List Bulk API** (public) for Bedrock, parsed per region.
Prices are fetched and cached in the background (TTL `pricing_ttl_seconds`, default 1h); the
refresh runs on the SDK's background thread, so **your LLM call is never blocked on pricing**.
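
The split between a non-blocking lookup and a background refresh can be sketched roughly as follows. This is a minimal illustration, not the SDK's `PricingProvider`: the class name `TtlPriceCache`, the constructor parameters, and the injectable `clock` are assumptions made for the example. Only the method names `lookup()` and `maybe_refresh()` mirror the commit description.

```python
import time
from decimal import Decimal
from typing import Callable, Dict, Optional

PriceTable = Dict[str, Decimal]


class TtlPriceCache:
    """Hypothetical TTL cache: lookup() never does I/O, maybe_refresh() may."""

    def __init__(
        self,
        fetcher: Callable[[], PriceTable],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetcher = fetcher  # injectable, so tests can avoid real HTTP
        self._ttl = ttl_seconds
        self._clock = clock
        self._prices: PriceTable = {}
        self._fetched_at: Optional[float] = None

    def lookup(self, model: str) -> Optional[Decimal]:
        # Pure in-memory read; returns None while the table is cold.
        return self._prices.get(model)

    def maybe_refresh(self) -> bool:
        # Intended to be called from a background thread, not the request path.
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return False
        # Replace the whole table in one assignment so readers see old or new.
        self._prices = self._fetcher()
        self._fetched_at = now
        return True
```

Because `lookup()` returns `None` until the first refresh completes, the first call after startup may find no price, which is the cold-table case the fallback below covers.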
**Fallback (never under-bill):** if a price is unavailable (table not warm on the first call,
or the model isn't found in the source), the SDK **falls back to emitting token-count events**
and calls `on_error` so it's visible — it never silently drops the usage.
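
The fallback rule might look something like the sketch below. It is an illustration under stated assumptions: `build_events`, the event dictionary shape, and the `tokens_<field>` metric codes are hypothetical, and only the `llm_cost` code and the `on_error` callback come from the description above.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, List, Optional


def build_events(
    model: str,
    tokens: Dict[str, int],
    unit_prices: Optional[Dict[str, Decimal]],
    markup: Decimal,
    on_error: Callable[[Exception], None],
) -> List[Dict[str, Any]]:
    """Emit one cost event when every field is priced, else token events."""
    priced = unit_prices is not None and all(f in unit_prices for f in tokens)
    if not priced:
        # Surface the problem, then bill by tokens instead of dropping usage.
        on_error(LookupError(f"no price for {model!r}; falling back to tokens"))
        return [
            {"code": f"tokens_{field}", "value": count}  # hypothetical codes
            for field, count in tokens.items()
        ]
    assert unit_prices is not None  # narrowed for type checkers
    base = sum(
        (unit_prices[f] * Decimal(n) for f, n in tokens.items()), Decimal(0)
    )
    return [{"code": "llm_cost", "value": base * markup}]
```

Treating a partially priced call as unpriced is a deliberately conservative choice in this sketch: pricing only some fields would understate the cost.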
**Bedrock note:** AWS's public bulk data lists many models (Titan, Llama, Mistral, Cohere, and
older Claude) but, at time of writing, **not the current Claude 3.5/3.7/4 models**. Bedrock
calls for models absent from AWS's data fall back to token events. Native Anthropic clients are
priced via OpenRouter and unaffected.
## Error policy
The SDK never breaks your LLM call. If anything in instrumentation fails (adapter bug, Lago down, network error), the SDK swallows it, logs a warning, and your call returns normally.