dgenio
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 20 additions & 0 deletions
diff --git a/docs/context_firewall.md b/docs/context_firewall.md
Lines changed: 56 additions & 0 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 1 addition & 0 deletions
diff --git a/src/agent_kernel/__init__.py b/src/agent_kernel/__init__.py
Lines changed: 12 additions & 2 deletions
diff --git a/src/agent_kernel/errors.py b/src/agent_kernel/errors.py
Lines changed: 21 additions & 0 deletions
diff --git a/src/agent_kernel/firewall/__init__.py b/src/agent_kernel/firewall/__init__.py
Lines changed: 11 additions & 1 deletion
@@ -8,6 +8,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- Cross-invocation context budget manager (`BudgetManager`) tracks cumulative token usage across
+  multiple `Kernel.invoke()` calls within a session. When attached to a `Kernel` via the new
+  `budget_manager` keyword argument, the kernel reserves a budget slice before each invocation
+  and reconciles actual frame-payload usage afterwards. As the remaining budget shrinks the
+  requested `response_mode` is auto-escalated to a more aggressive tier (> 50% remaining keeps
+  the caller's mode; 20–50% downgrades `raw` to `table`; 5–20% floors at `summary`; < 5% forces
+  `handle_only`). `Kernel.invoke(..., dry_run=True)` now also reports `budget_remaining` and the
+  escalated `response_mode` when a manager is configured. The `BudgetManager` is optional and
+  off by default — existing kernels are unchanged. (#44)
+- `TokenCounter` protocol and `default_token_counter` (character-based `len(json.dumps(...))//4`
+  approximation) provide pluggable token counting without runtime dependencies. A new optional
+  `[tiktoken]` extra is reserved for callers that want to plug in `tiktoken`-based counting.
+- `BudgetExhausted(AgentKernelError)` raised by `BudgetManager.allocate()` (and by
+  `Kernel.invoke()` before driver execution) when the cumulative session budget is fully spent.
+- `BudgetConfigError(AgentKernelError)` raised by `BudgetManager` for invalid configuration or
+  validation failures (non-positive budgets, negative allocate/record/release amounts), replacing
+  bare `ValueError` so callers can catch budget mistakes via the `AgentKernelError` hierarchy
+  per `AGENTS.md` ("never raise bare ValueError to callers").
+- New public exports: `BudgetManager`, `BudgetExhausted`, `BudgetConfigError`, `TokenCounter`,
+  and `default_token_counter`, plus a new `Kernel.budget` accessor property.
 - LLM tool-format adapters and middleware (`agent_kernel.adapters`): `OpenAIMiddleware` (OpenAI
   Responses API + Chat Completions, auto-detected on input) and `AnthropicMiddleware` (Anthropic
   Messages with `cache_control` support). Both translate `Capability` objects to vendor tool
 
@@ -62,3 +62,59 @@ Summaries are produced deterministically:
 - **dict** → key list + per-value type/value
 - **string** → truncated to 500 chars
 - **other** → repr() truncated to 200 chars
+
+## Cross-invocation budgets
+
+The per-invocation `Budgets` above cap a single Frame. A separate
+`BudgetManager` tracks cumulative token usage *across* invocations within a
+session. It is optional — if you don't attach one, kernel behavior is
+unchanged.
+
+```python
+from agent_kernel import BudgetManager, Kernel
+
+manager = BudgetManager(total_budget=100_000)
+kernel = Kernel(registry, budget_manager=manager)
+```
+
+Per `invoke()` the kernel:
+
+1. Reserves a slice of the remaining budget (default 4,000 tokens). If the
+   budget is empty, `BudgetExhausted` is raised before the driver runs.
+2. Consults `manager.suggested_mode(requested)` to escalate the requested
+   `response_mode` to a more aggressive tier as the remaining budget shrinks.
+3. After the firewall produces a Frame, counts the actual tokens in the
+   LLM-facing payload and reconciles them against the reservation.
+
+Escalation table:
+
+| Budget remaining | Suggested mode (effective `response_mode`)     |
+|-----------------:|------------------------------------------------|
+| > 50%            | Caller's requested mode (no change)            |
+| > 20% and ≤ 50%  | `table` (when caller requested `raw`)          |
+| ≥ 5% and ≤ 20%   | `summary` (floor; never *relaxes* to `table`)  |
+| < 5%             | `handle_only`                                  |
+
+The 50% and 20% boundaries land in the more conservative tier: exactly 50%
+remaining downgrades `raw` to `table`, and exactly 20% floors at `summary`.
+Exactly 5% still floors at `summary`; `handle_only` takes over only *below* 5%.
+
+`Kernel.invoke(..., dry_run=True)` mirrors the escalation and reports
+`budget_remaining` in the returned `DryRunResult`, so callers can preview
+what their next live invocation would actually return.
+
+Plug in a different token counter (for example, a `tiktoken`-based one) via
+the `TokenCounter` protocol:
+
+```python
+import tiktoken                         # pip install weaver-kernel[tiktoken]
+enc = tiktoken.encoding_for_model("gpt-4o")
+
+def tiktoken_counter(value):
+    return len(enc.encode(str(value)))
+
+manager = BudgetManager(total_budget=128_000, token_counter=tiktoken_counter)
+```
+
+The default counter (`default_token_counter`) is a character-based
+`len(json.dumps(value)) // 4` approximation with no extra dependencies.
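
Note on the escalation table in `docs/context_firewall.md`: the tiers can be read as a pure function of the fraction of budget remaining. The sketch below is only an illustration under stated assumptions, not the code in `budget_manager.py`. The function name `suggest_mode`, the `remaining_fraction` argument, and the ordering list `MODES` are hypothetical; only the four mode names and the thresholds come from the docs.

```python
# Hypothetical ordering from least to most aggressive. The mode names come
# from the docs; representing the ordering as a list is an assumption.
MODES = ["raw", "table", "summary", "handle_only"]


def suggest_mode(requested: str, remaining_fraction: float) -> str:
    """Return the effective mode for a given fraction of budget remaining."""
    if requested not in MODES:
        raise ValueError(f"unknown response mode: {requested!r}")
    if remaining_fraction > 0.50:
        floor = "raw"            # no escalation above 50%
    elif remaining_fraction > 0.20:
        floor = "table"          # exactly 50% lands here
    elif remaining_fraction >= 0.05:
        floor = "summary"        # exactly 20% and exactly 5% land here
    else:
        floor = "handle_only"    # strictly below 5%
    # Never relax: keep whichever of (requested, floor) is more aggressive.
    return max(requested, floor, key=MODES.index)


for fraction in (0.75, 0.50, 0.20, 0.05, 0.04):
    print(fraction, suggest_mode("raw", fraction))
```

Taking the maximum over the ordering is one way to express the "floor" wording in the table: a caller who already asked for `summary` at 30% remaining keeps `summary` rather than being relaxed to `table`.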
@@ -52,6 +52,7 @@ policy = [
     "pyyaml>=6.0",
     "tomli>=2.0; python_version<'3.11'",
 ]
+tiktoken = ["tiktoken>=0.6"]
 
 [tool.hatch.build.targets.wheel]
 packages = ["src/agent_kernel"]
 
@@ -19,7 +19,7 @@
 
 Firewall::
 
-    from agent_kernel import Firewall, Budgets
+    from agent_kernel import Firewall, Budgets, BudgetManager
 
 Handles & traces::
 
@@ -35,6 +35,7 @@
         AgentKernelError,
         TokenExpired, TokenInvalid, TokenScopeError,
         PolicyDenied, PolicyConfigError, DriverError, FirewallError,
+        BudgetExhausted, BudgetConfigError,
         CapabilityNotFound, HandleNotFound, HandleExpired,
     )
 """
@@ -48,6 +49,8 @@
 from .errors import (
     AdapterParseError,
     AgentKernelError,
+    BudgetConfigError,
+    BudgetExhausted,
     CapabilityAlreadyRegistered,
     CapabilityNotFound,
     DriverError,
@@ -61,7 +64,9 @@
     TokenRevoked,
     TokenScopeError,
 )
+from .firewall.budget_manager import BudgetManager
 from .firewall.budgets import Budgets
+from .firewall.token_counting import TokenCounter, default_token_counter
 from .firewall.transform import Firewall
 from .handles import HandleStore
 from .kernel import Kernel
@@ -125,6 +130,8 @@
     # errors
     "AdapterParseError",
     "AgentKernelError",
+    "BudgetConfigError",
+    "BudgetExhausted",
     "CapabilityAlreadyRegistered",
     "CapabilityNotFound",
     "DriverError",
@@ -156,8 +163,11 @@
     "MCPDriver",
     "make_billing_driver",
     # firewall
-    "Firewall",
+    "BudgetManager",
     "Budgets",
+    "Firewall",
+    "TokenCounter",
+    "default_token_counter",
     # stores
     "HandleStore",
     "TraceStore",
 
@@ -49,6 +49,27 @@ class FirewallError(AgentKernelError):
     """Raised when the context firewall cannot transform a raw result."""
 
 
+class BudgetExhausted(AgentKernelError):
+    """Raised when a :class:`~agent_kernel.firewall.budget_manager.BudgetManager` has
+    no remaining cross-invocation context budget.
+
+    Distinct from :class:`FirewallError`: this error fires *before* the
+    firewall transforms data, signalling that the caller has consumed the
+    entire session-level context budget. The current invocation never runs
+    the driver.
+    """
+
+
+class BudgetConfigError(AgentKernelError):
+    """Raised when a :class:`~agent_kernel.firewall.budget_manager.BudgetManager` is
+    constructed with invalid parameters, or asked to allocate/record/release
+    a negative amount.
+
+    Used in place of bare :class:`ValueError` so callers can catch budget
+    configuration mistakes without swallowing unrelated stdlib errors.
+    """
+
+
 # ── Adapter errors ────────────────────────────────────────────────────────────
 
 
 
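Note on the two error classes above: the docstrings describe when each one fires, and a toy model of reserve-then-reconcile accounting may make the split easier to review. Everything here is a stand-in written for illustration. `SketchBudgetManager`, its constructor arguments, the `remaining` property, and the `allocate`/`record` signatures are assumptions; the real `BudgetManager` may differ in all of them.

```python
from typing import Optional


class AgentKernelError(Exception):
    """Stand-in for the library's base error."""


class BudgetExhausted(AgentKernelError):
    """Stand-in: the session budget is fully spent."""


class BudgetConfigError(AgentKernelError):
    """Stand-in: invalid configuration or negative amounts."""


class SketchBudgetManager:
    """Toy reserve-then-reconcile accounting (not the real class)."""

    def __init__(self, total_budget: int, default_slice: int = 4_000) -> None:
        if total_budget <= 0 or default_slice <= 0:
            raise BudgetConfigError("budgets must be positive")
        self.total = total_budget
        self.default_slice = default_slice
        self.used = 0       # tokens actually recorded
        self.reserved = 0   # tokens reserved but not yet reconciled

    @property
    def remaining(self) -> int:
        return self.total - self.used - self.reserved

    def allocate(self, amount: Optional[int] = None) -> int:
        """Reserve up to `amount` tokens; raise if nothing is left."""
        amount = self.default_slice if amount is None else amount
        if amount < 0:
            raise BudgetConfigError("allocate amount must be non-negative")
        if self.remaining <= 0:
            raise BudgetExhausted("session context budget is fully spent")
        granted = min(amount, self.remaining)
        self.reserved += granted
        return granted

    def record(self, granted: int, actual: int) -> None:
        """Swap a reservation for the tokens the frame actually used."""
        if granted < 0 or actual < 0:
            raise BudgetConfigError("record amounts must be non-negative")
        self.reserved -= granted
        self.used += actual


manager = SketchBudgetManager(total_budget=10_000)
reservation = manager.allocate()           # reserves the default slice
manager.record(reservation, actual=1_250)  # reconcile against real usage
print(manager.remaining)                   # 8750
```

Because both errors derive from the same base, a caller can catch `AgentKernelError` once and still tell "out of budget" apart from "misconfigured" by type.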
@@ -1,8 +1,18 @@
 """Firewall sub-package exports."""
 
+from .budget_manager import BudgetManager
 from .budgets import Budgets
 from .redaction import redact
 from .summarize import summarize
+from .token_counting import TokenCounter, default_token_counter
 from .transform import Firewall
 
-__all__ = ["Budgets", "Firewall", "redact", "summarize"]
+__all__ = [
+    "BudgetManager",
+    "Budgets",
+    "Firewall",
+    "TokenCounter",
+    "default_token_counter",
+    "redact",
+    "summarize",
+]
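
Note on the `TokenCounter` and `default_token_counter` exports: the docs quote the default as `len(json.dumps(value)) // 4`. The sketch below shows what a callable protocol of that shape could look like. It is an assumption about the interface, not the contents of `token_counting.py`; `TokenCounterSketch`, `approx_token_counter`, and `whitespace_counter` are hypothetical names.

```python
import json
from typing import Any, Protocol


class TokenCounterSketch(Protocol):
    """Assumed shape: any callable mapping a payload to a token count."""

    def __call__(self, value: Any) -> int: ...


def approx_token_counter(value: Any) -> int:
    """Roughly four characters per token, per the formula quoted in the docs."""
    return len(json.dumps(value)) // 4


def whitespace_counter(value: Any) -> int:
    """Hypothetical alternative counter: whitespace-separated words."""
    return len(str(value).split())


# Any plain function with a matching signature satisfies the protocol.
counter: TokenCounterSketch = approx_token_counter
print(counter({"id": 1, "name": "Ada"}))  # 24 characters of JSON -> 6
```

A structural protocol keeps the core package free of tokenizer dependencies: callers who want exact counts pass their own callable, and everyone else gets the cheap approximation.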