Commit 8ad5bdd

agent status/handoff note
1 parent c4e40a9 commit 8ad5bdd

1 file changed

Lines changed: 371 additions & 0 deletions

@@ -0,0 +1,371 @@
1+
# Batched Private Kernels Handoff
2+
3+
## Current Branch
4+
5+
Branch: `lde/n1-apps`
6+
7+
Base observed locally: `merge-train/barretenberg` at `f2fd2bfbcc`
8+
9+
Commits on top of the base:
10+
11+
- `f172996cf5 additional test`
12+
- `ab8c0cfdc9 PoC N=3 test suite`
13+
- `cf50be9ea1 init_3 prototype`
14+
- `46529d485a inner_3`
15+
- `fd2b57bed1 share logic`
16+
- `4771dd8ee2 tests showing equivalence of *_3 kernels with old equivalents`
17+
- `ed18d04d03 more failure tests`
18+
- `c4e40a9c35 inner_6 PoC - no surprises`
19+
20+
The shared next-call execution cleanup, `_3` equivalence tests, extra `_3` negative tests, and explicit `inner_6`
21+
compile/profile spike are all committed on the branch. This handoff doc is currently untracked.
22+
23+
## How We Got Here
24+
25+
The initial design direction was to make the first app call in the init kernel mirror the later app calls more closely:
26+
construct an output unconstrained, then validate that constructed output with constrained logic. The goal was to reduce
27+
the conceptual difference between init slot 0 and subsequent inner-call slots.
28+
29+
That path was tried and then discarded. It increased gate counts and did not materially simplify the implementation.
30+
The asymmetry in init is real: slot 0 establishes transaction-wide state, handles protocol nullifier injection, and
31+
sets init-only fields. Treating it as just another app-slot transition obscured more than it helped.
32+
33+
After dropping that path, the work moved to direct fixed-width prototypes:
34+
35+
1. Build tests around init-kernel behavior with multiple apps.
36+
2. Extend the tests from two apps to three apps.
37+
3. Add a concrete `private_kernel_init_3` prototype.
38+
4. Add a concrete `private_kernel_inner_3` prototype.
39+
5. Notice that the post-slot-0 transition logic is identical for `init_3` slots 1 and 2 and all `inner_3` slots.
40+
6. Extract that common transition into a shared helper.
41+
42+
## What The Branch Adds
43+
44+
### Test Coverage
45+
46+
`f172996cf5` adds an inner output-composition test:
47+
48+
- `expiration_timestamp_pick_contract_update_horizon`
49+
50+
This pins the rule that expiration timestamp reduction includes the contract update horizon derived from the anchor
51+
block timestamp plus `DEFAULT_UPDATE_DELAY - 1`.
52+
53+
`ab8c0cfdc9` adds `private_kernel_batch_spike` tests. These exercise fixed three-call behavior without changing PXE
54+
scheduling:
55+
56+
- `batch_3_accumulates_side_effects_across_slots`
57+
- `batch_3_linear_chain_consumes_all_calls`
58+
- `batch_3_linear_chain_matches_sequential_kernels`
59+
- `batch_3_depth_first_child_keeps_sibling_on_stack`
60+
- `batch_3_depth_first_with_sibling_matches_sequential_kernels`
61+
- `batch_3_second_call_must_match_first_call_stack_fails`
62+
- `batch_3_third_call_must_match_second_call_stack_fails`
63+
- `batch_3_fee_payer_conflict_fails`
64+
- `batch_3_public_teardown_conflict_fails`
65+
- `batch_3_min_revertible_side_effect_counter_conflict_fails`
66+
- `batch_3_static_call_requires_static_nested_private_call_fails`
67+
- `batch_3_static_call_restrictions_apply_to_next_slot_fails`
68+
- `inner_3_accumulates_side_effects_after_previous_kernel`
69+
- `inner_3_with_previous_side_effects_matches_sequential_kernels`
70+
- `inner_3_linear_chain_consumes_all_calls`
71+
- `inner_3_linear_chain_matches_sequential_kernels`
72+
73+
The tests cover the main properties that first make batching interesting: accumulated side effects across slots,
74+
slot-to-slot private-call-stack chaining, depth-first ordering with a sibling left on the stack, and intra-batch
75+
set-once aggregate conflicts.
76+
77+
The negative tests now pin the main cross-slot constraints for the fixed `N = 3` prototype:
78+
79+
- slot 1 must consume the request produced or exposed by slot 0;
80+
- slot 2 must consume the request produced or exposed by slot 1;
81+
- fee payer cannot be set twice in one batch;
82+
- public teardown request cannot be set twice in one batch;
83+
- non-zero `min_revertible_side_effect_counter` cannot be set twice in one batch;
84+
- static calls can only create static nested private calls;
85+
- static-call side-effect restrictions apply to a later slot reached through a static request.
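
The set-once constraints in this list can be illustrated with a minimal TypeScript sketch. `mergeSetOnce` is a hypothetical name and the string-valued fee payer is a stand-in; the real checks live in the Noir kernel library and are not reproduced here.

```typescript
// Hypothetical model of a set-once aggregate across batch slots.
// `undefined` stands for "unset"; a caller would map a zero
// `min_revertible_side_effect_counter` to `undefined` before merging.
function mergeSetOnce<T>(
  field: string,
  previous: T | undefined,
  incoming: T | undefined,
): T | undefined {
  // Two slots in the same batch both setting the field is a conflict.
  if (previous !== undefined && incoming !== undefined) {
    throw new Error(`${field} is already set in this batch`);
  }
  return incoming !== undefined ? incoming : previous;
}

// Slot 0 sets the fee payer, slot 1 leaves it alone, slot 2 conflicts.
const afterSlot0 = mergeSetOnce<string>("fee_payer", undefined, "0xaaa");
const afterSlot1 = mergeSetOnce<string>("fee_payer", afterSlot0, undefined);
let conflictMessage = "";
try {
  mergeSetOnce<string>("fee_payer", afterSlot1, "0xbbb");
} catch (e) {
  conflictMessage = (e as Error).message;
}
```

The same shape would apply to the public teardown request and the non-zero revertible counter, under the assumption that each is modelled as an optional value.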
86+
87+
The relation-equivalence tests compare full `PrivateKernelCircuitPublicInputs` field-by-field:
88+
89+
- `init_3(private_call_0, private_call_1, private_call_2)` equals existing `init(private_call_0)` followed by two
90+
existing `inner` executions;
91+
- `inner_3(previous_kernel, private_call_0, private_call_1, private_call_2)` equals three existing `inner` executions;
92+
- coverage includes a linear chain, a depth-first shape with a sibling left on the stack, and an inner case with
93+
previous accumulated side effects.
94+
95+
Validation performed:
96+
97+
- `/mnt/user-data/luke/aztec-packages/noir/noir-repo/target/release/nargo test --package private_kernel_lib --silence-warnings --skip-brillig-constraints-check`
98+
- result: 833 tests passed.
99+
100+
### Init 3 Prototype
101+
102+
`cf50be9ea1` adds:
103+
104+
- `crates/private-kernel-init-3/Nargo.toml`
105+
- `crates/private-kernel-init-3/src/main.nr`
106+
- `private_kernel_init_3.nr`
107+
- workspace wiring in `Nargo.template.toml`
108+
109+
The `private-kernel-init-3` entrypoint accepts:
110+
111+
- init scalars: `tx_request`, `vk_tree_root`, `protocol_contracts`, `is_private_only`,
112+
`first_nullifier_hint`, and `revertible_counter_hint`;
113+
- three `PrivateCallDataWithoutPublicInputs` values;
114+
- three app public input databus columns: `call_data(1)`, `call_data(2)`, and `call_data(3)`.
115+
116+
The library implementation runs the existing one-app init kernel for `private_call_0`, then applies two inner-call
117+
transitions for `private_call_1` and `private_call_2`.
118+
119+
### Inner 3 Prototype
120+
121+
`46529d485a` adds:
122+
123+
- `crates/private-kernel-inner-3/Nargo.toml`
124+
- `crates/private-kernel-inner-3/src/main.nr`
125+
- `private_kernel_inner_3.nr`
126+
- workspace wiring in `Nargo.template.toml`
127+
128+
The `private-kernel-inner-3` entrypoint accepts:
129+
130+
- one previous kernel, with public inputs on `call_data(0)`;
131+
- three `PrivateCallDataWithoutPublicInputs` values;
132+
- three app public input databus columns: `call_data(1)`, `call_data(2)`, and `call_data(3)`.
133+
134+
The library implementation verifies the previous kernel, validates its VK against the allowed previous-circuit set, then
135+
applies three inner-call transitions in sequence.
136+
137+
### Inner 6 Compile/Profile Spike
138+
139+
`c4e40a9c35` adds an explicit `private_kernel_inner_6` spike without introducing a Noir-level array loop:
140+
141+
- `crates/private-kernel-inner-6/Nargo.toml`
142+
- `crates/private-kernel-inner-6/src/main.nr`
143+
- `private_kernel_inner_6.nr`
144+
- workspace wiring in `Nargo.template.toml`
145+
146+
The implementation follows the same explicit pattern as `inner_3`: verify the external previous kernel once, validate
147+
its VK, then call `execute_next_private_call` six times.
148+
149+
Validation performed:
150+
151+
- `/mnt/user-data/luke/aztec-packages/noir/noir-repo/target/release/nargo compile --package private_kernel_inner_6 --force --silence-warnings --skip-brillig-constraints-check`
152+
- `/mnt/user-data/luke/aztec-packages/noir/noir-repo/target/release/noir-profiler opcodes --artifact-path target/private_kernel_inner_6.json --output /tmp/private-kernel-inner-6-opcodes`
153+
- `/mnt/user-data/luke/aztec-packages/noir/noir-repo/target/release/nargo test --package private_kernel_lib --silence-warnings --skip-brillig-constraints-check`
154+
155+
Results:
156+
157+
- `target/private_kernel_inner_6.json`: about 2.4 MiB, bytecode length `1413780`
158+
- `private_kernel_inner_6`: `main` has `89566` ACIR opcodes
159+
- `private_kernel_lib`: 833 tests passed
160+
161+
The `inner_6` ACIR count matches the linear projection from `inner_1` and `inner_3`:
162+
163+
- `inner_1`: `18256` main ACIR opcodes
164+
- `inner_3`: `46780` main ACIR opcodes
165+
- projected `inner_6`: `18256 + 5 * ((46780 - 18256) / 2) = 89566`
166+
- measured `inner_6`: `89566`
167+
168+
This confirms that the current explicit repeated-transition design scales linearly, at `14262` main opcodes per additional app slot, at the ACIR
169+
level.
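
The projection arithmetic can be reproduced with a short sketch. `projectMainOpcodes` is a hypothetical helper, not something on the branch; the inputs are the measured `inner_1` and `inner_3` counts listed above.

```typescript
// Hypothetical helper: linear extrapolation of main ACIR opcode counts
// from two measured widths to a target width.
function projectMainOpcodes(
  widthA: number,
  opcodesA: number,
  widthB: number,
  opcodesB: number,
  targetWidth: number,
): number {
  // Marginal opcodes per additional app slot.
  const perSlot = (opcodesB - opcodesA) / (widthB - widthA);
  return opcodesA + (targetWidth - widthA) * perSlot;
}

// (46780 - 18256) / 2 = 14262 per slot; 18256 + 5 * 14262 = 89566.
const projectedInner6 = projectMainOpcodes(1, 18256, 3, 46780, 6);
console.log(projectedInner6); // 89566
```

The same function gives a rough expectation for the widths that are not yet built, on the assumption that the per-slot cost stays constant.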
170+
171+
### Shared Transition Helper
172+
173+
`fd2b57bed1` adds `private_kernel_batch.nr` and wires it as `pub(crate)` from `private-kernel-lib`.
174+
175+
The helper:
176+
177+
1. validates the next app as an inner call against the previous kernel public inputs;
178+
2. unconstrained-composes the next output by cloning the previous output, popping the top private call request, and
179+
appending the current private call effects;
180+
3. optionally validates the composed output with `PrivateKernelCircuitOutputValidator::validate_as_inner_call`.
181+
182+
Both `private_kernel_init_3` and `private_kernel_inner_3` now call this helper for every post-init app transition.
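
As a rough illustration of the helper's steps, here is a TypeScript model of one transition. The `KernelOutput` and `PrivateCall` shapes are hypothetical and heavily simplified, and the push order of nested requests is an assumption; the Noir helper is the source of truth.

```typescript
// Simplified, hypothetical shapes; the real Noir types carry far more fields.
interface KernelOutput {
  privateCallStack: string[]; // last element is the top of the stack
  noteHashes: string[];
}

interface PrivateCall {
  requestId: string; // the request this call claims to fulfil
  noteHashes: string[]; // side effects emitted by the call
  nestedRequests: string[]; // assumed to be already in push order
}

function executeNextPrivateCall(previous: KernelOutput, call: PrivateCall): KernelOutput {
  // Step 1: validate the call against the previous kernel output.
  const top = previous.privateCallStack[previous.privateCallStack.length - 1];
  if (top !== call.requestId) {
    throw new Error("call does not match the top of the private call stack");
  }
  // Step 2: compose by cloning, popping the consumed request, and appending effects.
  // Step 3 (validating the composed output) is omitted: in this model the
  // composition is ordinary code rather than an unconstrained hint.
  return {
    privateCallStack: [...previous.privateCallStack.slice(0, -1), ...call.nestedRequests],
    noteHashes: [...previous.noteHashes, ...call.noteHashes],
  };
}

// Depth-first shape: the child is consumed first, the sibling stays on the stack.
const output0: KernelOutput = { privateCallStack: ["sibling", "child"], noteHashes: ["n0"] };
const output1 = executeNextPrivateCall(output0, { requestId: "child", noteHashes: ["n1"], nestedRequests: [] });
const output2 = executeNextPrivateCall(output1, { requestId: "sibling", noteHashes: [], nestedRequests: [] });
```

Passing a call whose `requestId` is not on top of the stack throws, which corresponds to the stack-chaining negative tests listed earlier.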
183+
184+
### Entrypoint Compile / ACIR Integration Proof
185+
186+
The actual `_3` circuit packages compile through their `main.nr` entrypoints with databus public inputs, not just
187+
through library tests:
188+
189+
- `/mnt/user-data/luke/aztec-packages/noir/noir-repo/target/release/nargo compile --package private_kernel_init_3 --force --silence-warnings --skip-brillig-constraints-check`
190+
- `/mnt/user-data/luke/aztec-packages/noir/noir-repo/target/release/nargo compile --package private_kernel_inner_3 --force --silence-warnings --skip-brillig-constraints-check`
191+
192+
Artifacts:
193+
194+
- `target/private_kernel_init_3.json`: about 1.4 MiB, bytecode length `574152`
195+
- `target/private_kernel_inner_3.json`: about 1.6 MiB, bytecode length `756008`
196+
197+
`noir-profiler opcodes` can inspect both artifacts:
198+
199+
- `private_kernel_init_3`: `main` has `37381` ACIR opcodes
200+
- `private_kernel_inner_3`: `main` has `46780` ACIR opcodes
201+
202+
BB gate counts are not currently available for these Chonk artifacts. `bb gates --scheme chonk` fails on the current
203+
artifacts, so ACIR opcode counts are the useful local inspection tool until the required barretenberg support exists.
204+
205+
### Related BB Work: PR #22640
206+
207+
PR `#22640` (`29a4f46c95 Multi app scaffolding`) is relevant but not sufficient by itself. It starts generalizing
208+
barretenberg's Chonk databus shape from one secondary app calldata column to indexed app calldata slots:
209+
210+
- introduces `NUM_APP_PER_KERNEL`;
211+
- renames the databus layout to kernel calldata, app calldata, and return data;
212+
- changes kernel public inputs to carry an array of app return-data commitments;
213+
- allows ACIR `call_data(id)` with app ids in `[1, NUM_APP_PER_KERNEL]`;
214+
- threads an app return-data index through Chonk recursive verification.
215+
216+
The current PR still has `NUM_APP_PER_KERNEL = 1` and explicitly asserts that multiple app calldata witness columns are
217+
not wired yet. So it does not make `private_kernel_inner_3` or `private_kernel_inner_6` work under Chonk today. It does
218+
identify the next BB integration seam: raise `NUM_APP_PER_KERNEL` and finish wiring multiple app calldata witness
219+
commitments through Mega/Chonk, then retry `bb gates --scheme chonk` on the `_3` and `_6` artifacts.
220+
221+
## Current Interpretation
222+
223+
The branch is a fixed-width circuit prototype, not a final batching implementation.
224+
225+
It now includes a committed `N = 3` relation-equivalence test suite and an explicit `inner_6` compile/profile spike. It
226+
intentionally does not yet include:
227+
228+
- dynamic `num_apps`;
229+
- inactive-slot padding;
230+
- reset-aware batch selection;
231+
- PXE scheduling changes;
232+
- TypeScript input classes or witness conversion for the new circuits;
233+
- artifact naming or VK integration decisions;
234+
- the full fixed-width `N = 1..6` family beyond the committed `init_3`, `inner_3`, and `inner_6` artifacts.
235+
236+
The useful result so far is narrower and clearer: after the init-only first slot, the app transition logic is the same
237+
for init-derived and inner-derived batches. That shared logic can be factored without pretending that init slot 0 is
238+
symmetrical with later slots.
239+
240+
## Reset-Aware Scheduling Notes
241+
242+
The current PXE loop already uses lookahead before processing a non-first app. It builds a reset input builder from the
243+
latest kernel output and the still-unpopped execution stack. If the top pending app would overflow one of the
244+
resettable dimensions, PXE runs one or more reset kernels first, then processes that app with an inner kernel.
245+
246+
From Chonk's accumulated circuit-chain perspective, a mid-flow reset still always comes after a kernel has processed
247+
some prior app:
248+
249+
- `app_{i-1}`
250+
- `inner_{i-1}(previous_kernel, app_{i-1})`
251+
- `reset(inner_{i-1})`
252+
- `app_i`
253+
- `inner_i(reset, app_i)`
254+
255+
So the reset is "before app_i's inner" only from the PXE planner's perspective. It is not inserted between `app_i` and
256+
the kernel that consumes `app_i`; Chonk should see each app circuit immediately before the kernel that recursively
257+
verifies/consumes it.
258+
259+
For fixed-width kernels, the intended rule is:
260+
261+
- choose the largest contiguous prefix up to width 6 that can be processed without needing a reset before any app in
262+
that prefix;
263+
- emit the corresponding `init_N` or `inner_N`;
264+
- if the next app would overflow, emit one or more reset kernels after the batch;
265+
- continue with the next app after reset.
266+
267+
Equivalently, a batch may end before a reset, but it must not cross a reset boundary.
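
A minimal sketch of that rule, assuming the reset lookahead is available as a predicate on the app index. `planBatches`, `formatPlan`, and `needsResetBefore` are hypothetical names, and a single `reset` entry stands for one or more reset kernels.

```typescript
// Hypothetical planner sketch; names and shapes are illustrative only.
type Step = { kind: "init" | "inner"; width: number } | { kind: "reset" };

// `needsResetBefore(i)` stands in for the reset lookahead: it reports whether
// app `i` would overflow a resettable dimension if processed without a reset.
function planBatches(
  appCount: number,
  needsResetBefore: (appIndex: number) => boolean,
  maxWidth: number = 6,
): Step[] {
  const steps: Step[] = [];
  let next = 0;
  while (next < appCount) {
    // A reset is only ever emitted between batches, never inside one.
    if (next > 0 && needsResetBefore(next)) {
      steps.push({ kind: "reset" });
    }
    // Grow the batch while the following app fits without a reset.
    let width = 1;
    while (width < maxWidth && next + width < appCount && !needsResetBefore(next + width)) {
      width++;
    }
    steps.push({ kind: next === 0 ? "init" : "inner", width });
    next += width;
  }
  return steps;
}

function formatPlan(steps: Step[]): string {
  return steps.map((s) => (s.kind === "reset" ? "reset" : `${s.kind}_${s.width}`)).join(" -> ");
}

const planA = formatPlan(planBatches(6, (i) => i === 3)); // init_3 -> reset -> inner_3
const planB = formatPlan(planBatches(6, (i) => i === 2)); // init_2 -> reset -> inner_4
const planC = formatPlan(planBatches(8, () => false)); // init_6 -> inner_2
```

Treating the lookahead as a function of the app index is a simplification. A real planner would have to evaluate it against the accumulated kernel state, as discussed in the rest of this section.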
268+
269+
The extra artifact names and VKs for widths 1 through 6 are plumbing, not a conceptual blocker. The real scheduling
270+
risk is lookahead correctness. The existing `PrivateKernelResetPrivateInputsBuilder.needsReset()` can already answer
271+
"would this next app require a reset?" without oracle work, but it only accepts one `nextIteration` from the current
272+
top of the execution stack. A width planner needs to repeatedly ask that question against a hypothetical accumulated
273+
kernel state while tentatively appending apps to the candidate batch.
274+
275+
There is not currently a production TypeScript equivalent of Noir's `PrivateKernelCircuitOutputComposer`. A planner
276+
therefore needs a small dry-run accumulator that mirrors enough of init/inner output composition to support reset
277+
lookahead:
278+
279+
- append note hash read requests, nullifier read requests, key validation requests, note hashes, nullifiers, logs,
280+
public calls, and private call requests;
281+
- pop the private call request consumed by each tentative inner slot and push nested private calls in the same
282+
depth-first order as the current PXE loop;
283+
- track fee payer, public teardown request, min revertible counter, and expiration timestamp consistently with the
284+
Noir composer;
285+
- for `init_N`, account for init-only setup and possible protocol-nullifier injection before applying later slots.
286+
287+
This should be efficient because the maximum lookahead width is 6 and the expensive reset-builder work happens in
288+
`build()`, not in `needsReset()`. The main implementation risk is divergence between the TypeScript dry-run accumulator
289+
and the Noir composer, not asymptotic cost.
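
The counting part of such a dry-run accumulator might look like the sketch below. It is a deliberately narrow slice: it tracks only per-dimension counts, ignores the call stack and set-once fields, and does not call the real reset builder. The dimension names, limits, and function names are all hypothetical.

```typescript
// Hypothetical per-dimension side-effect counts, e.g. { noteHashes: 2 }.
type Counts = Record<string, number>;

// True if appending `next` to `accumulated` would exceed any limit.
function wouldOverflow(accumulated: Counts, next: Counts, limits: Counts): boolean {
  return Object.keys(limits).some(
    (dim) => (accumulated[dim] ?? 0) + (next[dim] ?? 0) > (limits[dim] ?? 0),
  );
}

// Returns a new accumulated state; the input is left untouched so the
// planner can discard a tentative batch.
function appendCounts(accumulated: Counts, next: Counts): Counts {
  const result: Counts = { ...accumulated };
  for (const [dim, n] of Object.entries(next)) {
    result[dim] = (result[dim] ?? 0) + n;
  }
  return result;
}

// Number of leading apps (at most `maxWidth`) that fit without a reset.
function safePrefixWidth(start: Counts, apps: Counts[], limits: Counts, maxWidth: number = 6): number {
  let state = start;
  let width = 0;
  for (const app of apps.slice(0, maxWidth)) {
    if (wouldOverflow(state, app, limits)) break;
    state = appendCounts(state, app);
    width++;
  }
  return width;
}

const illustrativeLimits: Counts = { noteHashes: 4, nullifiers: 4 };
const threeApps: Counts[] = [{ noteHashes: 2 }, { noteHashes: 2 }, { noteHashes: 2 }];
const fitsBeforeReset = safePrefixWidth({}, threeApps, illustrativeLimits);
```

With these illustrative limits the first two apps fit and the third would overflow, so the tentative batch has width 2 and a reset would precede the third app.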
290+
291+
## Recommended Next Phase
292+
293+
The Noir-level `_3` prototype is now hardened enough to move from "prove the relation" to "complete the fixed-width
294+
family and unblock real Chonk measurements." The next phase should keep the explicit fixed-width design and avoid a
295+
fused/dynamic circuit redesign.
296+
297+
### 1. Scale Remaining Fixed Widths Mechanically
298+
299+
Do not introduce a Noir-level generic fixed-array loop unless there is a clear measured reason. It may change the
300+
compiled circuit shape through array/indexing/loop lowering. Prefer explicit source in each circuit, either written
301+
manually or produced by a generator that emits explicit calls:
302+
303+
- `let output_1 = execute_next_private_call(output_0, inputs.private_call_1);`
304+
- `let output_2 = execute_next_private_call(output_1, inputs.private_call_2);`
305+
- and so on.
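
If the generator route is chosen, the emitted lines could come from something as small as the following TypeScript sketch. `emitTransitionCalls` is a hypothetical name; the emitted Noir text simply follows the pattern of the two example lines above and assumes `output_0` is produced separately.

```typescript
// Hypothetical generator: emits one explicit Noir call per post-slot-0
// transition, as plain strings, with no loop in the generated source.
function emitTransitionCalls(width: number): string[] {
  const lines: string[] = [];
  for (let slot = 1; slot < width; slot++) {
    lines.push(
      `let output_${slot} = execute_next_private_call(output_${slot - 1}, inputs.private_call_${slot});`,
    );
  }
  return lines;
}

// Width 3 yields the two lines shown in the list above.
const width3Lines = emitTransitionCalls(3);
console.log(width3Lines.join("\n"));
```

Because the loop runs in the generator and not in Noir, the compiled circuit shape stays the same as for hand-written explicit calls.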
306+
307+
The branch already has `init_3`, `inner_3`, and `inner_6`. Add the remaining fixed-width wrappers:
308+
309+
- `private_kernel_init_1` if product integration wants a width-dispatched family rather than treating existing init as
310+
width 1;
311+
- `private_kernel_init_2`, `private_kernel_init_4`, `private_kernel_init_5`, `private_kernel_init_6`;
312+
- `private_kernel_inner_2`, `private_kernel_inner_4`, `private_kernel_inner_5`;
313+
- keep existing `private_kernel_inner` as width 1 or add an alias/package if the TypeScript dispatch layer wants a
314+
uniform `inner_1..6` naming scheme;
315+
- workspace wiring and minimal smoke/equivalence coverage for each.
316+
317+
The implementation should be mostly mechanical: explicit entrypoints and structs per circuit, shared single-step
318+
transition execution in the library.
319+
320+
### 2. Continue BB Multi-App Databus Work
321+
322+
PR `#22640` gives a concrete BB starting point but leaves `NUM_APP_PER_KERNEL = 1`. The next useful BB spike is:
323+
324+
1. apply or rebase onto the PR's multi-app scaffolding;
325+
2. raise `NUM_APP_PER_KERNEL` locally, preferably to `6`;
326+
3. fix the resulting witness-commitment/databus failures;
327+
4. run `bb gates --scheme chonk` against `private_kernel_init_3`, `private_kernel_inner_3`, and
328+
`private_kernel_inner_6`.
329+
330+
This is the path to real Chonk gate counts. ACIR opcode counts are already enough to show linear Noir-level scaling,
331+
but not enough to decide product economics.
332+
333+
### 3. Measure Before PXE Product Integration
334+
335+
Before adding PXE or TypeScript integration, the fixed-width family should eventually be measured against the current
336+
one-app path:
337+
338+
- bytecode size and compiled artifact size;
339+
- gate counts;
340+
- relevant ACIR opcode deltas;
341+
- proving-key or VK-size impact if available;
342+
- whether `init_N` is cheaper than `init + (N - 1) * inner`;
343+
- whether `inner_N` is cheaper than `N * inner`.
344+
345+
Those measurements are not yet feasible for these prototypes because realistic Chonk gate/proving measurements
346+
still require the BB multi-app databus work above. Do not block the remaining mechanical Noir wrappers on that, but do
347+
treat Chonk measurements as the gate before PXE/product integration.
348+
349+
### 4. Prototype Reset-Aware TS Planning
350+
351+
Once the fixed-width family and BB support are in place, the next product-facing task is a planner that chooses the
352+
largest safe prefix up to width 6 without crossing reset boundaries. The main new code should be a TypeScript dry-run
353+
accumulator that mirrors enough of Noir's private-kernel output composer to ask the existing reset builder whether the
354+
next candidate app would require a reset.
355+
356+
This should be tested with synthetic app public inputs that force boundaries such as:
357+
358+
- `init_3 -> reset -> inner_3`;
359+
- `init_2 -> reset -> inner_4`;
360+
- consecutive resets before the next batch;
361+
- depth-first nested-call ordering where processing one app exposes new candidate apps.
362+
363+
## Open Questions
364+
365+
- Should the shared helper keep the `private_kernel_batch` name, or use a more literal transition name until dynamic
366+
batching exists?
367+
- Should the remaining fixed-width Noir sources be maintained manually, or generated by a small source generator that
368+
emits explicit calls?
369+
- Should batched init/inner VKs get distinct named indexes, or use a reset-style range abstraction for allowed previous
370+
kernels?
371+
- What Chonk gate/prover threshold justifies moving from fixed-width experiments into PXE scheduling work?