The SDK surfaces usage_metadata on every ChatResponse — prompt_token_count, candidates_token_count, thinking_token_count, total_token_count. The hook system gives us AfterModelHook to inspect and BeforeModelHook to gate. SessionContext.state persists across turns.
All the building blocks for budget enforcement are there. But there's no built-in way to say "stop this agent after 500K tokens" or "kill the session if it loops."
Every team running /goal tasks or subagent trees ends up building this themselves. The patterns are always the same:
- Token ceiling: hard limit, HookResult(allow=False) when breached
- Cost ceiling: dollar-based, same mechanism
- Velocity guard: sliding-window anomaly detection (spike = probable loop)
- Loop detector: identical tool+args called N times in a row, via PostToolCallHook
- Reasoning guard: thinking_token_count / total_token_count ratio sustained above 80% = model going in circles
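The three detectors in the list above can be sketched without any SDK types. This is a minimal, framework-agnostic illustration, not the SDK's API: the class names, the reading of threshold as a multiple of the sliding-window mean, and the reading of "sustained" as a number of consecutive turns are all assumptions made for the example.

```python
import json
from collections import deque
from statistics import mean


class VelocityGuard:
    """Flags a turn whose token usage spikes relative to a sliding window."""

    def __init__(self, window: int = 5, threshold: float = 2.5):
        self.recent = deque(maxlen=window)  # token counts of the last `window` turns
        self.threshold = threshold          # assumed: multiple of the window mean

    def spike(self, tokens: int) -> bool:
        # Only judge once the window is full, so early turns cannot trip the guard.
        full = len(self.recent) == self.recent.maxlen
        is_spike = full and tokens > self.threshold * mean(self.recent)
        self.recent.append(tokens)
        return is_spike


class LoopDetector:
    """Flags the same tool + args called max_repeats times in a row."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.last_call = None
        self.streak = 0

    def looping(self, tool: str, args: dict) -> bool:
        # Serialize args with sorted keys so key order does not hide a repeat.
        call = (tool, json.dumps(args, sort_keys=True, default=str))
        self.streak = self.streak + 1 if call == self.last_call else 1
        self.last_call = call
        return self.streak >= self.max_repeats


class ReasoningGuard:
    """Flags a thinking/total token ratio above max_ratio for `sustain` turns."""

    def __init__(self, max_ratio: float = 0.8, sustain: int = 3):
        self.max_ratio = max_ratio
        self.sustain = sustain  # assumed meaning of "sustained": consecutive turns
        self.streak = 0

    def circling(self, thinking_tokens: int, total_tokens: int) -> bool:
        ratio = thinking_tokens / total_tokens if total_tokens else 0.0
        self.streak = self.streak + 1 if ratio > self.max_ratio else 0
        return self.streak >= self.sustain
```

In a real integration, spike() and circling() would be fed from usage_metadata inside an AfterModelHook, and looping() from a PostToolCallHook; how those hooks expose the response and the tool call is not shown here.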
Proposed API
Following the existing deny() / allow() / ask_user() pattern — functional primitives that compose:
from google.antigravity.hooks.budget import token_budget, cost_budget, velocity_guard

config = LocalAgentConfig(
    hooks=[
        token_budget(500_000),
        cost_budget(5.00),
        velocity_guard(window=5, threshold=2.5),
    ]
)
Each function returns a pair of hooks (one AfterModelHook to accumulate, one BeforeModelHook to gate). They share state via SessionContext.state under namespaced keys.
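To make the accumulate/gate split concrete, here is a rough sketch of how token_budget could be built from two closures that share one namespaced key. It is not SDK code: the HookResult dataclass and the plain dict are stand-ins for the SDK's HookResult and SessionContext.state, and the function signatures and the key name budget.tokens.used are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HookResult:
    """Stand-in for the SDK's HookResult; only the fields this sketch needs."""
    allow: bool
    reason: str = ""


def token_budget(limit: int, key: str = "budget.tokens.used"):
    """Return (accumulate, gate) closures that share state[key]."""

    def accumulate(state: dict, total_token_count: int) -> None:
        # AfterModelHook role: add this response's tokens to the running total.
        state[key] = state.get(key, 0) + total_token_count

    def gate(state: dict) -> HookResult:
        # BeforeModelHook role: refuse the next model call once the ceiling is hit.
        used = state.get(key, 0)
        if used >= limit:
            return HookResult(allow=False, reason=f"token budget exhausted: {used}/{limit}")
        return HookResult(allow=True)

    return accumulate, gate


# Simulated session: a dict plays the part of SessionContext.state.
state = {}
accumulate, gate = token_budget(1_000)
for turn_tokens in (400, 400, 400):
    if not gate(state).allow:
        break
    accumulate(state, turn_tokens)
```

Because the gate runs before a call and the count is only known after it, the total can overshoot the ceiling by up to one response (here the third turn is allowed at 800 and the session ends at 1,200). A stricter variant would need to reserve headroom for the largest expected response.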
Reference implementation
I built a framework-agnostic version of these patterns as a zero-dependency Node.js library: token-budgets. It uses the same concepts and validates the approach. Happy to port it to Python and open a PR if external contributions are accepted.
Why this matters
deny() controls what an agent can do. Budget hooks would control how much. Together they're the governance stack that makes autonomous agents safe for production.