Commit d12b1ec

amabito authored and committed
feat(budget): R5 -- limit_unit, budget_id, ModelPricing, unknown_model_behavior

Phase C:
- C1: replace limit/limit_tokens dual-ceiling with limit + limit_unit (usd_cents|tokens)
- C2: add budget_id:str='default' to BudgetEvaluatorConfig; same id shares bucket state, different id is fully isolated
- C3: drop _config_key hash-based store key; registry now keyed by f'budget:{budget_id}'
- C4: introduce ModelPricing(EvaluatorConfig) with input_per_1k/output_per_1k; pricing field is now dict[str,ModelPricing]; require pricing when any rule uses limit_unit='usd_cents'
- C5: store contract redesign -- InMemoryBudgetStore owns bucket state only; rules are passed per call so same budget_id pools share buckets while each evaluator uses its own rules

Phase D:
- D1: add unknown_model_behavior:Literal['block','warn']='block' to BudgetEvaluatorConfig
- D2: block/warn triggers only for cost-based rules with pricing configured and model absent; token-only rules are unaffected
- D3: README rewrite with complete config example, scope/group_by, budget pools, pricing, dual-ceiling pattern, single-process-only caveat

Tests: 100 passing (was 91 in R4)
1 parent cd473e8 commit d12b1ec
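
Item C1 removes the `limit` / `currency` / `limit_tokens` fields in favor of `limit` + `limit_unit`. The following is a rough sketch of how an existing rule might be converted, assuming rules are held as plain dicts. `migrate_rule` is a hypothetical helper and is not part of this commit; it only handles the `usd` currency because the new schema has no unit for anything else.

```python
from __future__ import annotations

from typing import Any


def migrate_rule(old: dict[str, Any]) -> list[dict[str, Any]]:
    """Split one pre-R5 rule dict into one R5 rule dict per ceiling (hypothetical helper)."""
    # Fields that keep the same name and meaning in both schemas.
    shared = {k: old[k] for k in ("scope", "group_by", "window_seconds") if k in old}
    new_rules: list[dict[str, Any]] = []

    if old.get("limit") is not None:
        currency = old.get("currency", "usd")  # the old field defaulted to USD
        if currency != "usd":
            # Assumption: there is no automatic mapping for other currencies.
            raise ValueError(f"no limit_unit available for currency {currency!r}")
        new_rules.append({**shared, "limit": old["limit"], "limit_unit": "usd_cents"})

    if old.get("limit_tokens") is not None:
        new_rules.append({**shared, "limit": old["limit_tokens"], "limit_unit": "tokens"})

    return new_rules


old_rule = {
    "scope": {"agent": "support"},
    "group_by": "user_id",
    "window_seconds": 86_400,
    "limit": 500,
    "limit_tokens": 50_000,
}
migrated = migrate_rule(old_rule)
```

A rule that previously carried both ceilings becomes two rules, which matches the two-rule `limits` list shown in the README quickstart.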

7 files changed

Lines changed: 664 additions & 243 deletions

Lines changed: 135 additions & 2 deletions
@@ -1,3 +1,136 @@
-# Budget Evaluator
+# agent-control-evaluator-budget
 
-Cumulative LLM cost and token budget tracking for agent-control.
+Budget evaluator for agent-control that tracks cumulative LLM token and cost usage per scope and time window.
+
+## Install
+
+```bash
+pip install agent-control-evaluator-budget
+```
+
+## Quickstart
+
+```python
+from agent_control_evaluator_budget.budget import (
+    BudgetEvaluatorConfig,
+    BudgetLimitRule,
+    ModelPricing,
+)
+
+config = BudgetEvaluatorConfig(
+    budget_id="support-daily",
+    limits=[
+        BudgetLimitRule(
+            scope={"agent": "support"},
+            group_by="user_id",
+            window_seconds=86_400,
+            limit=500,
+            limit_unit="usd_cents",
+        ),
+        BudgetLimitRule(
+            scope={"agent": "support"},
+            group_by="user_id",
+            window_seconds=86_400,
+            limit=50_000,
+            limit_unit="tokens",
+        ),
+    ],
+    pricing={
+        "gpt-4.1-mini": ModelPricing(input_per_1k=0.04, output_per_1k=0.16),
+    },
+    model_path="model",
+    metadata_paths={
+        "agent": "metadata.agent",
+        "user_id": "metadata.user_id",
+    },
+    unknown_model_behavior="block",
+)
+```
+
+The evaluator reads token usage from standard fields such as `usage.input_tokens` and `usage.output_tokens`. Configure `token_path` only when your event shape uses a custom location.
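
To make the dot-notation paths (`token_path`, `model_path`, `metadata_paths`) concrete, here is a minimal sketch of how such a path could be resolved against an event. `resolve_path` and the event shape are hypothetical illustrations, not the package's implementation.

```python
from __future__ import annotations

from typing import Any


def resolve_path(data: dict[str, Any], path: str) -> Any | None:
    """Walk a nested dict along a dot-separated path; return None if any key is missing."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical event shape, for illustration only.
event = {
    "model": "gpt-4.1-mini",
    "usage": {"input_tokens": 1200, "output_tokens": 300},
    "metadata": {"agent": "support", "user_id": "u-42"},
}

input_tokens = resolve_path(event, "usage.input_tokens")
user_id = resolve_path(event, "metadata.user_id")
missing = resolve_path(event, "usage.total_tokens")  # key absent, so None
```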
+
+## Scope and group_by
+
+Each `BudgetLimitRule` has a static `scope` and an optional `group_by` field.
+
+`scope` filters which events a rule applies to. A rule with `scope={"agent": "support"}` only applies when extracted metadata contains `agent="support"`. An empty scope is global.
+
+`group_by` creates independent buckets per extracted metadata value. The common per-user pattern is:
+
+```python
+BudgetLimitRule(
+    scope={"agent": "support"},
+    group_by="user_id",
+    window_seconds=86_400,
+    limit=500,
+    limit_unit="usd_cents",
+)
+```
+
+With `metadata_paths={"user_id": "metadata.user_id"}`, each user gets a separate daily budget inside the support scope.
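
The scope filter and per-value buckets described above can be sketched roughly as follows. `rule_applies` and `bucket_key` are hypothetical names, the key format is invented for illustration, and skipping a rule when the `group_by` field is missing is an assumption; the commit does not show how the evaluator handles that case.

```python
from __future__ import annotations


def rule_applies(scope: dict[str, str], metadata: dict[str, str]) -> bool:
    """True when every scope key matches the extracted metadata; an empty scope is global."""
    return all(metadata.get(key) == value for key, value in scope.items())


def bucket_key(
    scope: dict[str, str], group_by: str | None, metadata: dict[str, str]
) -> str | None:
    """Hypothetical bucket key: the scope, plus the group_by value when one is configured."""
    if not rule_applies(scope, metadata):
        return None  # rule does not apply to this event
    parts = [f"{key}={value}" for key, value in sorted(scope.items())]
    if group_by is not None:
        value = metadata.get(group_by)
        if value is None:
            return None  # assumption: skip the rule when the group_by field is missing
        parts.append(f"{group_by}={value}")
    return "|".join(parts) or "global"


rule_scope = {"agent": "support"}
alice = bucket_key(rule_scope, "user_id", {"agent": "support", "user_id": "alice"})
bob = bucket_key(rule_scope, "user_id", {"agent": "support", "user_id": "bob"})
other = bucket_key(rule_scope, "user_id", {"agent": "billing", "user_id": "alice"})
```

Here `alice` and `bob` land in different buckets under the same scope, while the `billing` event yields no bucket because the rule's scope does not match.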
+
+## Budget pools
+
+`budget_id` identifies the accumulated budget pool.
+
+Evaluators with the same `budget_id` share accumulated spend and token totals across all evaluator instances. Each evaluator still evaluates using its own configured rules -- the shared state is the bucket (the rolling sum), not the rule set. Evaluators with different `budget_id` values are fully isolated.
+
+Use stable names such as `support-daily`, `billing-global`, or `tenant-acme-monthly`. Avoid generating a new `budget_id` per request unless each request should have an isolated budget.
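
As a rough illustration of pool sharing, the sketch below keeps one bucket table per `budget_id`, keyed by `f"budget:{budget_id}"` as the commit message describes for the registry. Everything else is assumed: `get_pool` and `record_and_check` are hypothetical, time windows are omitted, and "exceeded" is taken to mean strictly greater than the limit.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical registry: one table of running totals per budget pool.
_POOLS: dict[str, defaultdict[str, int]] = {}


def get_pool(budget_id: str) -> defaultdict[str, int]:
    """Return the shared bucket table for a budget_id, creating it on first use."""
    return _POOLS.setdefault(f"budget:{budget_id}", defaultdict(int))


def record_and_check(budget_id: str, bucket: str, tokens: int, limit: int) -> bool:
    """Add usage to the shared bucket, then compare against the caller's own limit.

    Returns True when the bucket total exceeds the limit (assumed comparison).
    """
    pool = get_pool(budget_id)
    pool[bucket] += tokens
    return pool[bucket] > limit


# Two evaluators share "support-daily" but carry different limits.
first = record_and_check("support-daily", "user_id=alice", 30_000, limit=50_000)
second = record_and_check("support-daily", "user_id=alice", 30_000, limit=100_000)
strict = record_and_check("support-daily", "user_id=alice", 0, limit=50_000)
# A different budget_id starts from zero.
isolated = record_and_check("billing-global", "user_id=alice", 30_000, limit=50_000)
```

The running total is shared (60,000 tokens after the second call), but each caller judges it against its own limit, so only the evaluator with the 50,000 ceiling reports an overrun.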
+
+## Pricing
+
+`ModelPricing` stores cost rates in cents per 1K tokens:
+
+```python
+ModelPricing(input_per_1k=0.04, output_per_1k=0.16)
+```
+
+`input_per_1k` is applied to input tokens. `output_per_1k` is applied to output tokens.
+
+Pricing is required when any rule uses `limit_unit="usd_cents"`. Token-only rules can omit pricing. If an event uses a model that is not in the pricing table and a cost rule exists, `unknown_model_behavior="block"` fails closed. Use `"warn"` to log a warning and treat the cost as 0.
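
A worked example of the cost arithmetic may help. The sketch below assumes cost is `input_tokens / 1000 * input_per_1k + output_tokens / 1000 * output_per_1k`, which follows from the "cents per 1K tokens" definition above. `Pricing`, `cost_cents`, and `UnknownModelError` are hypothetical stand-ins; in particular, raising an exception is only one way to model "fail closed" and may not match how the evaluator reports a block.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class Pricing:
    """Stand-in for ModelPricing: rates in cents per 1K tokens."""

    input_per_1k: float = 0.0
    output_per_1k: float = 0.0


class UnknownModelError(Exception):
    """Hypothetical signal for the fail-closed ("block") path."""


def cost_cents(
    model: str,
    input_tokens: int,
    output_tokens: int,
    pricing: dict[str, Pricing],
    unknown_model_behavior: str = "block",
) -> float:
    """Derive the cost of one call in cents from token counts and per-1K rates."""
    rates = pricing.get(model)
    if rates is None:
        if unknown_model_behavior == "block":
            raise UnknownModelError(f"no pricing configured for model {model!r}")
        logger.warning("no pricing for model %r; treating cost as 0", model)
        return 0.0
    return (
        input_tokens / 1000 * rates.input_per_1k
        + output_tokens / 1000 * rates.output_per_1k
    )


pricing = {"gpt-4.1-mini": Pricing(input_per_1k=0.04, output_per_1k=0.16)}

# 1.2 * 0.04 + 0.3 * 0.16 = 0.048 + 0.048, so roughly 0.096 cents.
known = cost_cents("gpt-4.1-mini", 1200, 300, pricing)
warned = cost_cents("other-model", 1200, 300, pricing, unknown_model_behavior="warn")
```

At these rates a `limit=500` rule with `limit_unit="usd_cents"` corresponds to 5 USD of accumulated spend per window.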
+
+## Dual Ceiling Pattern
+
+Use two evaluators when cost and token ceilings need independent control records or different `budget_id` pools:
+
+```python
+cost_config = BudgetEvaluatorConfig(
+    budget_id="support-cost-daily",
+    limits=[
+        BudgetLimitRule(
+            scope={"agent": "support"},
+            group_by="user_id",
+            window_seconds=86_400,
+            limit=500,
+            limit_unit="usd_cents",
+        )
+    ],
+    pricing={
+        "gpt-4.1-mini": ModelPricing(input_per_1k=0.04, output_per_1k=0.16),
+    },
+    model_path="model",
+    metadata_paths={"agent": "metadata.agent", "user_id": "metadata.user_id"},
+)
+
+token_config = BudgetEvaluatorConfig(
+    budget_id="support-token-daily",
+    limits=[
+        BudgetLimitRule(
+            scope={"agent": "support"},
+            group_by="user_id",
+            window_seconds=86_400,
+            limit=50_000,
+            limit_unit="tokens",
+        )
+    ],
+    metadata_paths={"agent": "metadata.agent", "user_id": "metadata.user_id"},
+)
+```
+
+This pattern lets cost and token budgets reset, alert, and roll out independently. A single evaluator can also contain both rules when one shared pool and one control result are sufficient.
+
+## Limitations
+
+`InMemoryBudgetStore` is single-process only. State is lost on restart and is not shared across workers or pods.
+
+Use a distributed store for production deployments that run multiple processes, multiple workers, or multiple pods.

evaluators/contrib/budget/src/agent_control_evaluator_budget/budget/__init__.py

Lines changed: 7 additions & 1 deletion
@@ -1,6 +1,10 @@
 """Budget evaluator for per-agent LLM cost and token tracking."""
 
-from agent_control_evaluator_budget.budget.config import BudgetEvaluatorConfig
+from agent_control_evaluator_budget.budget.config import (
+    BudgetEvaluatorConfig,
+    BudgetLimitRule,
+    ModelPricing,
+)
 from agent_control_evaluator_budget.budget.evaluator import BudgetEvaluator
 from agent_control_evaluator_budget.budget.memory_store import InMemoryBudgetStore
 from agent_control_evaluator_budget.budget.store import BudgetSnapshot, BudgetStore
@@ -12,7 +16,9 @@
 __all__ = [
     "BudgetEvaluator",
     "BudgetEvaluatorConfig",
+    "BudgetLimitRule",
     "BudgetSnapshot",
     "BudgetStore",
     "InMemoryBudgetStore",
+    "ModelPricing",
 ]

evaluators/contrib/budget/src/agent_control_evaluator_budget/budget/config.py

Lines changed: 40 additions & 32 deletions
@@ -2,7 +2,7 @@
 
 from __future__ import annotations
 
-from enum import Enum
+from typing import Literal
 
 from agent_control_evaluators._base import EvaluatorConfig
 from pydantic import Field, field_validator, model_validator
@@ -17,12 +17,11 @@
 WINDOW_MONTHLY = 2592000  # 30 days
 
 
-class Currency(str, Enum):
-    """Supported budget currencies."""
+class ModelPricing(EvaluatorConfig):
+    """Per-model token pricing in cents per 1K tokens."""
 
-    USD = "usd"
-    EUR = "eur"
-    TOKENS = "tokens"
+    input_per_1k: float = 0.0
+    output_per_1k: float = 0.0
 
 
 class BudgetLimitRule(EvaluatorConfig):
@@ -43,39 +42,24 @@ class BudgetLimitRule(EvaluatorConfig):
             each user gets their own budget. None = shared/global limit.
         window_seconds: Time window for accumulation in seconds.
             None = cumulative (no reset). See WINDOW_* constants.
-        limit: Maximum spend in the window, in minor units (e.g. cents
-            for USD). None = uncapped on this dimension.
-        currency: Currency for the limit. Defaults to USD.
-        limit_tokens: Maximum tokens in the window. None = uncapped.
+        limit: Maximum usage in the window. Interpreted by limit_unit.
+        limit_unit: Unit for limit. usd_cents checks spend; tokens checks
+            input + output tokens.
     """
 
     scope: dict[str, str] = Field(default_factory=dict)
     group_by: str | None = None
     window_seconds: int | None = None
-    limit: int | None = None
-    currency: Currency = Currency.USD
-    limit_tokens: int | None = None
-
-    @model_validator(mode="after")
-    def at_least_one_limit(self) -> "BudgetLimitRule":
-        if self.limit is None and self.limit_tokens is None:
-            raise ValueError("At least one of limit or limit_tokens must be set")
-        return self
+    limit: int
+    limit_unit: Literal["usd_cents", "tokens"] = "usd_cents"
 
     @field_validator("limit")
     @classmethod
-    def validate_limit(cls, v: int | None) -> int | None:
-        if v is not None and v <= 0:
+    def validate_limit(cls, v: int) -> int:
+        if v <= 0:
             raise ValueError("limit must be a positive integer")
         return v
 
-    @field_validator("limit_tokens")
-    @classmethod
-    def validate_limit_tokens(cls, v: int | None) -> int | None:
-        if v is not None and v <= 0:
-            raise ValueError("limit_tokens must be positive")
-        return v
-
     @field_validator("window_seconds")
     @classmethod
     def validate_window_seconds(cls, v: int | None) -> int | None:
@@ -89,9 +73,13 @@ class BudgetEvaluatorConfig(EvaluatorConfig):
 
     Attributes:
         limits: List of budget limit rules. Each is checked independently.
-        pricing: Optional model pricing table. Maps model name to per-1K
-            token rates. Used to derive cost in USD from token counts and
-            model name.
+        budget_id: Unique budget pool identifier. Same budget_id shares
+            accumulated spend. Different budget_id is fully isolated.
+        unknown_model_behavior: What to do when a model is not found in the
+            pricing table and a cost-based rule exists. block=fail closed,
+            warn=log warning and treat cost as 0.
+        pricing: Optional model pricing table. Maps model name to ModelPricing.
+            Used to derive cost in USD from token counts and model name.
         token_path: Dot-notation path to extract token usage from step
             data (e.g. "usage.total_tokens"). If None, looks for standard
             fields (input_tokens, output_tokens, total_tokens, usage).
@@ -101,7 +89,27 @@ class BudgetEvaluatorConfig(EvaluatorConfig):
     """
 
     limits: list[BudgetLimitRule] = Field(min_length=1)
-    pricing: dict[str, dict[str, float]] | None = None
+    budget_id: str = Field(
+        default="default",
+        description=(
+            "Unique budget pool identifier. Same budget_id shares accumulated spend. "
+            "Different budget_id is fully isolated."
+        ),
+    )
+    unknown_model_behavior: Literal["block", "warn"] = Field(
+        default="block",
+        description=(
+            "What to do when a model is not found in the pricing table and a cost-based "
+            "rule exists. block=fail closed, warn=log warning and treat cost as 0."
+        ),
+    )
+    pricing: dict[str, ModelPricing] | None = None
     token_path: str | None = None
     model_path: str | None = None
     metadata_paths: dict[str, str] = Field(default_factory=dict)
+
+    @model_validator(mode="after")
+    def require_pricing_for_cost_rules(self) -> "BudgetEvaluatorConfig":
+        if self.pricing is None and any(rule.limit_unit == "usd_cents" for rule in self.limits):
+            raise ValueError('pricing is required when any rule uses limit_unit="usd_cents"')
+        return self
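
The effect of `require_pricing_for_cost_rules` can be illustrated without pydantic. The sketch below mirrors the two checks from the diff using stdlib dataclasses; `Rule` and `Config` are simplified stand-ins, not the real classes. Two consequences follow from the code as written: a rule that omits `limit_unit` defaults to `"usd_cents"` and therefore needs pricing, and an empty pricing dict passes the check because only `None` is rejected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rule:
    """Simplified stand-in for BudgetLimitRule."""

    limit: int
    limit_unit: str = "usd_cents"  # same default as in the diff

    def __post_init__(self) -> None:
        if self.limit <= 0:
            raise ValueError("limit must be a positive integer")


@dataclass
class Config:
    """Simplified stand-in for BudgetEvaluatorConfig."""

    limits: list[Rule]
    pricing: dict[str, object] | None = None

    def __post_init__(self) -> None:
        # Same condition as require_pricing_for_cost_rules in the diff.
        if self.pricing is None and any(r.limit_unit == "usd_cents" for r in self.limits):
            raise ValueError('pricing is required when any rule uses limit_unit="usd_cents"')


def is_valid(limits: list[Rule], pricing: dict[str, object] | None = None) -> bool:
    """Return True when the stand-in config can be constructed."""
    try:
        Config(limits=limits, pricing=pricing)
    except ValueError:
        return False
    return True


token_only_ok = is_valid([Rule(limit=50_000, limit_unit="tokens")])
default_unit_ok = is_valid([Rule(limit=500)])  # defaults to usd_cents, no pricing
empty_pricing_ok = is_valid([Rule(limit=500)], pricing={})  # only None is rejected
```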
