Commit 39775f0

Merge pull request #8 from getlago/feature/pricing-mode
Add price mode: emit computed dollar cost instead of token counts
2 parents a9ff2fb + 8934ca8 commit 39775f0

16 files changed

Lines changed: 1526 additions & 92 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ All notable changes to this project will be documented here. Format follows [Kee

 ## [Unreleased]

+### Added
+- **Price mode — emit computed dollar cost instead of token counts.** New `pricing_mode` config (`"tokens"` default | `"price"`), plus `markup`, `cost_metric_code` (default `llm_cost`), `pricing_ttl_seconds`, and `bedrock_default_region`. In price mode the SDK emits one `llm_cost` event per call carrying a top-level `precise_total_amount_cents` (cost in cents, after markup) for Lago's **dynamic charge model**, with a full per-field breakdown in `properties` (value in USD, base, markup, source, per-field tokens/unit_price/cost). Live unit prices come from public, no-auth sources: OpenRouter (`/api/v1/models`) for native anthropic/openai/mistral/gemini, and the AWS Bedrock Price List **Bulk** API for Bedrock. Prices are fetched + cached on the background queue thread (never blocking the customer's call); a missing price falls back to token events and calls `on_error` (never silently under-bills). Mode and markup are overridable per-call via `extra_lago={"mode": "price", "markup": 1.5}`. Money is computed with `Decimal` floored to 12 dp, identical to the JS implementation (cross-repo golden fixture). New `pricing.py` module + `PricingProvider`; default `pricing_mode="tokens"` keeps existing behavior unchanged.
+
 ### Fixed
 - **Anthropic `messages.create(stream=True)` under-billed input tokens.** The stream wrapper read only top-level `usage`, which on a basic stream appears only on `message_delta` as `{output_tokens: N}` — the authoritative `input_tokens` / `cache_*` counts arrive nested under `message.usage` on the `message_start` event and were ignored, so input billed 0. The wrapper now merges usage from `message_start` (input/cache) and `message_delta` (cumulative output). Sync + async paths; regression tests use the realistic wire shape (delta carries no input echo).
 - **Legacy `google-generativeai` SDK silently emitted no events.** The detector matched both the new `google-genai` and the deprecated `google-generativeai` SDKs, but the wrapper only instruments the unified `Client.models` / `.aio` surface — a legacy `GenerativeModel` routed through and wrapped nothing. `wrap()` now rejects legacy clients with a clear pointer to migrate to `google-genai`.
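
Note on the cost arithmetic in the Added entry: the sketch below shows one way to compute `Σ(unit_price × tokens) × markup` with `Decimal`, floor the result to 12 decimal places, and express it in cents. It is an illustration under assumptions, not the SDK's code: `sketch_cost`, the field names, and the unit prices are hypothetical, and the signature of the real `compute_cost` (and the exact point at which it floors) is not visible in this diff.

```python
from decimal import Decimal, ROUND_FLOOR

# Quantum for "floored to 12 dp".
TWELVE_DP = Decimal("1e-12")


def sketch_cost(tokens, unit_prices, markup="1.0"):
    """Return (base_usd, total_usd, total_cents), each floored to 12 dp.

    tokens:      {field: token count}
    unit_prices: {field: USD per token, given as a string to avoid float error}
    A field missing from unit_prices raises KeyError in this sketch.
    """
    base = sum(
        (Decimal(str(unit_prices[field])) * Decimal(count) for field, count in tokens.items()),
        Decimal(0),
    )
    total = (base * Decimal(str(markup))).quantize(TWELVE_DP, rounding=ROUND_FLOOR)
    base = base.quantize(TWELVE_DP, rounding=ROUND_FLOOR)
    cents = (total * 100).quantize(TWELVE_DP, rounding=ROUND_FLOOR)
    return base, total, cents


# Hypothetical prices: $3 and $15 per million tokens, with a 20% markup.
base, total, cents = sketch_cost(
    tokens={"input": 1000, "output": 500},
    unit_prices={"input": "0.000003", "output": "0.000015"},
    markup="1.2",
)
```

Passing prices and markup as strings keeps binary floating-point error out of the result, which matters if two implementations are expected to agree digit for digit.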

README.md

Lines changed: 51 additions & 0 deletions
@@ -189,6 +189,57 @@ For both OpenAI and Gemini, `cache_read`, `audio_input`, and `image_input` are *

 OpenAI's Predicted Outputs tokens (`accepted_prediction_tokens`, `rejected_prediction_tokens`) are not surfaced — see the OpenAI adapter docstring for details on this intentional gap.

+## Pricing mode — send dollar cost instead of tokens
+
+By default the SDK emits **token counts** (`pricing_mode="tokens"`). You can instead have it
+compute and emit the **dollar cost** of each call: `Σ(unit_price_per_token × tokens) × markup`.
+
+```python
+from lago_agent_sdk import LagoSDK, LagoConfig
+
+sdk = LagoSDK(api_key="...", config=LagoConfig(
+    api_key="...",
+    default_subscription_id="sub_123",
+    pricing_mode="price",  # "tokens" (default) | "price"
+    markup=1.2,  # optional cost multiplier (1.2 = +20%)
+))
+client = sdk.wrap(anthropic_client)
+# ... use the client normally ...
+```
+
+In **price mode** the SDK emits **one event per call** with code `llm_cost`. The event carries a
+top-level `precise_total_amount_cents` (the total cost in cents, after markup) for Lago's
+**dynamic charge model**, plus a breakdown in `properties`: `unit` (total tokens), `value` (USD
+total), `base_cost` (pre-markup), `markup`, `price_source`, and per-field `*_tokens` /
+`*_unit_price` / `*_cost`. Set up in Lago a `sum`-aggregation billable metric `llm_cost` on
+`field_name: "unit"` and a **dynamic** charge on it — Lago sums each event's
+`precise_total_amount_cents` into a single fee (`unit` is the displayed usage quantity). See
+`testing/lago_setup_pricing_plan.py` for a script that creates this.
+
+Per-call override via `extra_lago` (mode and markup, in addition to subscription/dimensions):
+
+```python
+client.messages.create(model="claude-...", messages=[...],
+                       extra_lago={"mode": "price", "markup": 1.5})
+```
+
+**Live, public pricing sources (no API keys):**
+- **OpenRouter** (`/api/v1/models`) for native `anthropic` / `openai` / `mistral` / `gemini`
+  clients — USD per token.
+- **AWS Bedrock Price List Bulk API** (public) for Bedrock — parsed per region.
+
+Prices are fetched and cached in the background (TTL `pricing_ttl_seconds`, default 1h); the
+refresh runs on the SDK's background thread, so **your LLM call is never blocked on pricing**.
+
+**Fallback (never under-bill):** if a price is unavailable (table not warm on the first call,
+or the model isn't found in the source), the SDK **falls back to emitting token-count events**
+and calls `on_error` so it's visible — it never silently drops the usage.
+
+**Bedrock note:** AWS's public bulk data lists many models (Titan, Llama, Mistral, Cohere, and
+older Claude) but, at time of writing, **not the current Claude 3.5/3.7/4 models**. Bedrock
+calls for models absent from AWS's data fall back to token events. Native Anthropic clients are
+priced via OpenRouter and unaffected.
+
 ## Error policy

 The SDK never breaks your LLM call. If anything in instrumentation fails (adapter bug, Lago down, network error), the SDK swallows it, logs a warning, and your call returns normally.
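
Note on the caching and fallback behavior the README section describes: the sketch below models a price table whose lookup never fetches, so a cold table or an unknown model leads to token-count billing plus an `on_error` call. Everything here is hypothetical (`SketchPriceTable`, `price_or_fallback`, the `"pricing"` context string, and `LookupError` standing in for the SDK's `PricingUnavailableError`). Serving a stale table while a refresh is pending is also an assumption; the SDK's real `PricingProvider` is not shown in this part of the diff.

```python
import time
from decimal import Decimal


class SketchPriceTable:
    """Hypothetical in-memory price table with a freshness TTL."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._prices = {}       # model -> {field: USD per token}
        self._loaded_at = None  # None means the table is still cold

    def load(self, prices):
        """Store a freshly fetched table; meant to run on a background worker."""
        self._prices = dict(prices)
        self._loaded_at = self._clock()

    def needs_refresh(self):
        """True when the table is cold or at least as old as the TTL."""
        return self._loaded_at is None or self._clock() - self._loaded_at >= self._ttl

    def lookup(self, model):
        """Never fetches: returns None if the table is cold or the model is unknown."""
        return self._prices.get(model)


def price_or_fallback(table, model, tokens, on_error):
    """Return ("price", usd_cost), or ("tokens", tokens) when no price is known."""
    prices = table.lookup(model)
    if prices is None or any(field not in prices for field in tokens):
        # Report the gap, then bill by token counts rather than dropping usage.
        on_error(LookupError(f"no price for model={model!r}"), "pricing")
        return "tokens", dict(tokens)
    cost = sum((prices[f] * Decimal(n) for f, n in tokens.items()), Decimal(0))
    return "price", cost
```

Because `lookup` only reads what a background worker has already loaded, the caller's request path does no network I/O; the cost of that choice is that the first calls after startup take the token-count path.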

src/lago_agent_sdk/__init__.py

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,15 @@
11
"""Lago Agent SDK — Python."""
22

33
from .canonical import CanonicalUsage
4-
from .config import DEFAULT_METRIC_CODES, LagoConfig
4+
from .config import DEFAULT_COST_METRIC_CODE, DEFAULT_METRIC_CODES, LagoConfig
55
from .exceptions import (
66
LagoApiError,
77
LagoConfigError,
88
LagoSDKError,
9+
PricingUnavailableError,
910
UnknownClientError,
1011
)
12+
from .pricing import HttpPricingFetcher, ModelPrice, PricingProvider, compute_cost
1113
from .sdk import LagoSDK
1214

1315
__all__ = [
@@ -17,7 +19,13 @@
1719
"LagoApiError",
1820
"LagoConfigError",
1921
"LagoSDKError",
22+
"PricingUnavailableError",
2023
"UnknownClientError",
2124
"DEFAULT_METRIC_CODES",
25+
"DEFAULT_COST_METRIC_CODE",
26+
"PricingProvider",
27+
"HttpPricingFetcher",
28+
"ModelPrice",
29+
"compute_cost",
2230
]
2331
__version__ = "0.1.0"

src/lago_agent_sdk/config.py

Lines changed: 29 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44

55
from collections.abc import Callable
66
from dataclasses import dataclass, field
7+
from typing import Any, Literal
78

89
DEFAULT_METRIC_CODES: dict[str, str] = {
910
"input": "llm_input_tokens",
@@ -19,6 +20,13 @@
1920
"audio_output": "llm_audio_output_tokens",
2021
}
2122

23+
# Metric code for the single per-call dollar-cost event emitted in price mode.
24+
DEFAULT_COST_METRIC_CODE = "llm_cost"
25+
26+
# Pricing mode: emit raw token counts (default, backward-compatible) or a single
27+
# computed dollar-cost event per call.
28+
PricingMode = Literal["tokens", "price"]
29+
2230

2331
def _mask_api_key(api_key: str) -> str:
2432
"""Render an api key safe for logs/repr: keeps a 4-char tail for debuggability."""
@@ -42,6 +50,21 @@ class LagoConfig:
4250
max_retry_seconds: float = 60.0
4351
on_error: Callable[[Exception, str], None] | None = None
4452

53+
# --- pricing (price mode) ---
54+
# Global default mode. "tokens" preserves the existing behavior exactly.
55+
pricing_mode: PricingMode = "tokens"
56+
# Multiplier applied to the computed cost (1.0 = no markup, 1.2 = +20%).
57+
markup: float = 1.0
58+
# Metric code for the single dollar-cost event emitted in price mode.
59+
cost_metric_code: str = DEFAULT_COST_METRIC_CODE
60+
# How long a fetched pricing table stays fresh before a background refresh.
61+
pricing_ttl_seconds: float = 3600.0
62+
# Region used for Bedrock pricing when the model id carries no region prefix.
63+
bedrock_default_region: str = "us-east-1"
64+
# Optional injected PricingProvider (or a stub) — primarily for tests/overrides.
65+
# Typed Any to avoid a config→pricing import cycle.
66+
pricing_provider: Any | None = field(default=None, repr=False)
67+
4568
def __repr__(self) -> str:
4669
return (
4770
f"LagoConfig(api_key={_mask_api_key(self.api_key)!r}, "
@@ -51,5 +74,10 @@ def __repr__(self) -> str:
5174
f"max_batch_size={self.max_batch_size}, "
5275
f"max_buffer_size={self.max_buffer_size}, "
5376
f"request_timeout_seconds={self.request_timeout_seconds}, "
54-
f"max_retry_seconds={self.max_retry_seconds})"
77+
f"max_retry_seconds={self.max_retry_seconds}, "
78+
f"pricing_mode={self.pricing_mode!r}, "
79+
f"markup={self.markup}, "
80+
f"cost_metric_code={self.cost_metric_code!r}, "
81+
f"pricing_ttl_seconds={self.pricing_ttl_seconds}, "
82+
f"bedrock_default_region={self.bedrock_default_region!r})"
5583
)

src/lago_agent_sdk/exceptions.py

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,3 +22,14 @@ def __init__(self, status: int, body: str) -> None:
2222

2323
class UnknownClientError(LagoConfigError):
2424
"""`wrap()` received a client kind the SDK does not recognize."""
25+
26+
27+
class PricingUnavailableError(LagoSDKError):
28+
"""Price mode could not resolve a price (table not warm yet, or model not
29+
matched). Surfaced via on_error; the SDK falls back to emitting token events."""
30+
31+
def __init__(self, provider: str, model: str, api: str) -> None:
32+
super().__init__(f"no price for provider={provider!r} model={model!r} api={api!r}")
33+
self.provider = provider
34+
self.model = model
35+
self.api = api
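
Note on consuming this error: since `PricingUnavailableError` is surfaced through `on_error`, a caller could use the callback to track which models are falling back to token events. The sketch below is self-contained, so it redefines stand-ins for `LagoSDKError` (assumed here to subclass `Exception`) and `PricingUnavailableError` (copied from the hunk above) instead of importing the SDK. The handler name, the `"pricing"` context string, and the example provider/model/api values are hypothetical.

```python
import logging

logger = logging.getLogger("billing")


class LagoSDKError(Exception):
    """Stand-in for the SDK's base error class."""


class PricingUnavailableError(LagoSDKError):
    """Stand-in mirroring the class added in this commit."""

    def __init__(self, provider: str, model: str, api: str) -> None:
        super().__init__(f"no price for provider={provider!r} model={model!r} api={api!r}")
        self.provider = provider
        self.model = model
        self.api = api


# Models seen without a price; these calls were billed as token events instead.
missing_prices: list[tuple[str, str, str]] = []


def on_error(exc: Exception, context: str) -> None:
    """Matches the Callable[[Exception, str], None] shape of LagoConfig.on_error."""
    if isinstance(exc, PricingUnavailableError):
        missing_prices.append((exc.provider, exc.model, exc.api))
    else:
        logger.warning("SDK error in %s: %s", context, exc)


# Simulate the SDK reporting a missing price for a hypothetical model.
example = PricingUnavailableError("bedrock", "model-x", "example-api")
on_error(example, "pricing")
```

In real use the handler would be passed as `LagoConfig(on_error=on_error, ...)` rather than called directly.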
