@@ -62,3 +62,59 @@ Summaries are produced deterministically:
6262- ** dict** → key list + per-value type/value
6363- ** string** → truncated to 500 chars
6464- ** other** → repr() truncated to 200 chars
65+
66+ ## Cross-invocation budgets
67+
68+ The per-invocation ` Budgets ` above cap a single Frame. A separate
69+ ` BudgetManager ` tracks cumulative token usage * across* invocations within a
70+ session. It is optional — if you don't attach one, kernel behavior is
71+ unchanged.
72+
73+ ``` python
74+ from agent_kernel import BudgetManager, Kernel
75+
76+ manager = BudgetManager(total_budget = 100_000 )
77+ kernel = Kernel(registry, budget_manager = manager)
78+ ```
79+
80+ Per ` invoke() ` the kernel:
81+
82+ 1 . Reserves a slice of the remaining budget (default 4,000 tokens). If the
83+ budget is empty, ` BudgetExhausted ` is raised before the driver runs.
84+ 2 . Consults ` manager.suggested_mode(requested) ` to escalate the requested
85+ ` response_mode ` to a more aggressive tier as the remaining budget shrinks.
86+ 3 . After the firewall produces a Frame, counts the actual tokens in the
87+ LLM-facing payload and reconciles them against the reservation.
88+
89+ Escalation table:
90+
91+ | Budget remaining | Suggested mode (effective ` response_mode ` ) |
92+ | -----------------:| ------------------------------------------------|
93+ | > 50% | Caller's requested mode (no change) |
94+ | 20% – 50% | ` table ` (when caller requested ` raw ` ) |
95+ | 5% – 20% (≥ 5%) | ` summary ` (floor — never * relaxes* to ` table ` ) |
96+ | < 5% | ` handle_only ` |
97+
98+ Boundaries land in the more-conservative tier — exactly 50% remaining
99+ downgrades ` raw ` to ` table ` , exactly 20% floors at ` summary ` , and only when
100+ remaining drops * below* 5% does ` handle_only ` take over.
101+
102+ ` Kernel.invoke(..., dry_run=True) ` mirrors the escalation and reports
103+ ` budget_remaining ` in the returned ` DryRunResult ` , so callers can preview
104+ what their next live invocation would actually return.
105+
106+ Plug a different token counter (for example, a ` tiktoken ` -based one) via the
107+ ` TokenCounter ` protocol:
108+
109+ ``` python
110+ import tiktoken # pip install weaver-kernel[tiktoken]
111+ enc = tiktoken.encoding_for_model(" gpt-4o" )
112+
113+ def tiktoken_counter (value ):
114+ return len (enc.encode(str (value)))
115+
116+ manager = BudgetManager(total_budget = 128_000 , token_counter = tiktoken_counter)
117+ ```
118+
119+ The default counter (` default_token_counter ` ) is a character-based
120+ ` len(json.dumps(value)) // 4 ` approximation with no extra dependencies.
0 commit comments