AztecProtocol · ledwards2225 · May 11, 2026 · May 7, 2026 · May 7, 2026 · May 7, 2026
diff --git a/INIT_3_HANDOFF.md b/INIT_3_HANDOFF.md
@@ -0,0 +1,167 @@
+# Handoff: `private_kernel_init_3` is wired end-to-end
+
+`private_kernel_init_3` — the batched first-iteration kernel that verifies three app
+calls in a single circuit — is now plumbed end-to-end behind a `PXE_USE_INIT_3`
+flag. The new path captures, proves, and verifies on real client flows, and the
+measured savings match the design: 2 fewer prev-kernel HN verifications and
+2 fewer Oink proves per tx.
+
+This branch is `lde/integrate-init-3`, based on `merge-train/barretenberg`.
+
+## What's done — 7 commits
+
+```
+69362fbe test(end-to-end): reproducer for the PXE_USE_INIT_3 canary
+397224be feat(pxe): orchestrator init_3 path behind PXE_USE_INIT_3 flag
+43f203c1 feat(bb-prover): implement generateInit3Output / simulateInit3
+3bcad728 feat(noir-protocol-circuits-types): wire private_kernel_init_3 artifact and VK
+7de74078 feat(stdlib): add PrivateKernelInit3CircuitPrivateInputs and prover interface methods
+0dc31e27 docs(barretenberg/cpp): note that find-bb defaults to bb-avm
+94b52e31 feat(protocol-circuits): allocate PRIVATE_KERNEL_INIT_3_VK_INDEX and accept it as a previous kernel
+```
+
+Each commit corresponds to one layer of the stack; the table lists them bottom-up:
+
+| Commit | Layer | What changed |
+|---|---|---|
+| `94b52e31` | Noir protocol circuits | `PRIVATE_KERNEL_INIT_3_VK_INDEX = 65`, extends `ALLOWED_PREVIOUS_CIRCUITS` for inner / reset / tail / tail_to_public |
+| `7de74078` | TS stdlib | New `PrivateKernelInit3CircuitPrivateInputs`, new `generateInit3Output` / `simulateInit3` on `PrivateKernelProver` |
+| `3bcad728` | TS noir-protocol-circuits-types | New `PrivateKernelInit3Artifact` in the union, bundle, vks/client, vks/server. Simulated lookup reuses the constrained artifact (no `private-kernel-init-3-simulated` crate exists) |
+| `43f203c1` | TS bb-prover | New witness-map converters + `BBPrivateKernelProver` methods |
+| `397224be` | TS PXE orchestrator | `useInit3 = process.env.PXE_USE_INIT_3 === '1' && totalAppCalls >= 3`. Falls back to standard init when fewer than 3 apps. Refactored top-of-loop into `consumeNextApp` helper |
+| `0dc31e27` | Docs | `find-bb` defaults to `bb-avm` — flagged because we lost a session debugging the wrong binary |
+| `69362fbe` | Reproducer | `yarn-project/end-to-end/scripts/reproduce_init3_canary.sh`, plus a `SKIP_STEP_COUNT_CHECK=1` opt-out in `captureProfile` |
+
+## Verify it works
+
+After `./bootstrap.sh` from the repo root, run the reproducer:
+
+```bash
+yarn-project/end-to-end/scripts/reproduce_init3_canary.sh
+```
+
+This runs the `deploy_ecdsar1+sponsored_fpc` client flow with `PXE_USE_INIT_3=1`,
+captures the IVC inputs, proves with `bb prove --scheme chonk`, and verifies the
+proof. It also asserts that the capture references `PrivateKernelInit3Artifact`, so a
+regression in flag plumbing is caught at capture time. Total runtime is ~2–3 min.
+
+Final output on success:
+
+```
+init_3 canary verified end-to-end.
+  Captured inputs: /tmp/init3-canary/captures/deploy_ecdsar1+sponsored_fpc/ivc-inputs.msgpack
+  Proof artifacts: /tmp/init3-canary/proof/
+```
+
+To verify a different flow, set the right test name + flow path:
+
+```bash
+# transfer_0 (3 apps total — init_3 absorbs all of them)
+PXE_USE_INIT_3=1 \
+CAPTURE_IVC_FOLDER=/tmp/init3-out \
+SKIP_STEP_COUNT_CHECK=1 \
+BENCHMARK_CONFIG=key_flows \
+LOG_LEVEL=warn \
+node --experimental-vm-modules yarn-project/node_modules/.bin/jest \
+  --testTimeout=300000 --no-cache --runInBand \
+  --testNamePattern "ecdsar1.*0 recursions.*sponsored_fpc" \
+  --rootDir yarn-project/end-to-end/src \
+  client_flows/transfers
+# then bb prove + bb verify against /tmp/init3-out/<flow>/ivc-inputs.msgpack
+```
+
+## Measured benefit (single run, remote bench machine, HARDWARE_CONCURRENCY=16)
+
+| Flow | Apps | Baseline | init_3 | Δ |
+|---|---:|---:|---:|---:|
+| `deploy_ecdsar1+sponsored_fpc` | 6 | 6.29 s | 5.99 s | **-0.30 s (-4.8%)** |
+| `ecdsar1+transfer_0_recursions+sponsored_fpc` | 3 | 5.09 s | 4.65 s | **-0.44 s (-8.6%)** |
+
+Savings come exactly where the design predicted: -2 `HypernovaFoldingProver::fold`
+calls, -2 `OinkProver::prove`, -2 `Chonk::complete_kernel_circuit_logic`. ECCVM
+shrinks ~23% in trace rows but only ~7% in time (fixed sumcheck overhead).
+Translator stays constant at 8192 rows by design (proof is constant-size).
+
+## What was deferred and why
+
+**Phase 6 (audit one-app-per-kernel assumptions) — folded into Phase 7.** The
+canary proving + verifying real client flows on real bb is a stronger integration
+gate than a code audit. If a one-app assumption had been violated, prove would
+have crashed.
+
+**Phase 8 (VK pin update) — not needed yet.** The pin script
+(`barretenberg/cpp/scripts/test_chonk_standalone_vks_havent_changed.sh`) compares
+VKs derived from *pinned bytecode* (frozen msgpack from S3) against pinned VKs in
+the same msgpack. It catches C++-side bb regressions, not Noir-side circuit
+changes. Our changes are Noir-side, so the pin test still passes on this branch
+without intervention. The pin will need updating when init_3 stops being
+flag-gated and becomes the default — at that point newly captured pins will
+include init_3 steps, and the reference msgpack on S3 should be rotated.
+
+## Follow-up opportunities
+
+In rough order of payoff:
+
+1. **`inner_3`** — the next obvious win. The deployment flow has 6 apps; init_3
+   collapses the first 3 but the remaining 3 still go through 3 inner kernels.
+   An inner that batches 3 apps would collapse those, saving another ~2 prev-kernel
+   HN verifications. Same shape as init_3 but for the middle of the chain.
+
+2. **Skip inner kernels entirely when init_3 absorbs everything.** Marked with
+   `WORKTODO(luke)` at the bottom of the init_3 branch in
+   `private_kernel_execution_prover.ts`. If `executionStack.length === 0` after
+   init_3, the chain can go directly to reset+tail. The protocol-circuit
+   ALLOWED_PREVIOUS lists already accept init_3 as a predecessor of
+   reset/tail/tail_to_public, so this is purely an orchestrator change.
+   Relevant for short flows like transfer_0.
+
+3. **`init_2` / `inner_2` / `init_1` / `inner_1`.** Round out the variant set so
+   the orchestrator can pick the largest batch that fits. Today the dispatch is
+   binary (init_3 if ≥3 apps, else init); a graceful fallback through init_2
+   would help any flow with exactly 2 apps. Probably low priority — most real
+   flows have 3+.
+
+4. **Generic `PrivateKernelInitNCircuitPrivateInputs<N>`.** The TS class is
+   currently concrete (three explicit `privateCallN` fields). When a second
+   variant lands, generalize.
+
+5. **Remove the `PXE_USE_INIT_3` flag.** Once a few weeks of green CI confirm
+   stability, the flag and its `totalAppCalls >= 3` gate can collapse into
+   unconditional use. Coordinate with Phase 8 (pin update) at that point.
+
+## Known papercuts
+
+- **`find-bb` defaults to `bb-avm`** (`AVM=1`). Rebuilding only the non-AVM `bb`
+  target leaves a stale `bb-avm` and downstream bootstraps keep failing with the
+  same error after the supposed fix. This is now documented in
+  `barretenberg/cpp/CLAUDE.md`.
+- **`SKIP_STEP_COUNT_CHECK=1`** is required when running client_flow tests with
+  `PXE_USE_INIT_3=1` because the `captureProfile` `expectedSteps` assertion is
+  hard-coded for the standard init+inner chain. The reproducer script sets it;
+  if you write new test paths that exercise init_3, set it explicitly.
+- **Pre-existing stale gitignored derivatives** in
+  `aztec.js/src/contract/protocol_contracts/` and `noir-contracts.js/src/` may
+  exist on a tree that hasn't been bootstrapped recently. Symptom: `aztecVersion
+  is missing in type ContractArtifact` build errors. Fix: run `./bootstrap.sh`
+  from the repo root.
+
+## Out of scope of this work
+
+- Any change to the C++ Chonk recursive-verifier internals. The existing
+  `bool is_init_kernel = front().type == OINK` (`chonk.cpp:314`) and the rest
+  of `complete_kernel_circuit_logic` already correctly handle a 3-entry queue
+  `[OINK, HN, HN]` — confirmed by the canary running real proofs. No bb code
+  changes were needed.
+- Changes to reset / tail config dimensions. init_3 produces the same
+  `PrivateKernelCircuitPublicInputs` shape as init, so downstream kernels see
+  no new shape.
+
+## Files of interest
+
+- `noir-projects/noir-protocol-circuits/crates/private-kernel-lib/src/private_kernel_init_n.nr`
+  — the lib that init_3's binary instantiates with N=3.
+- `noir-projects/noir-protocol-circuits/crates/private-kernel-init-3/src/main.nr`
+  — the concrete crate.
+- `yarn-project/pxe/src/private_kernel/private_kernel_execution_prover.ts`
+  — the orchestrator. The init_3 dispatch is in the `firstIteration` branch.
+- `yarn-project/end-to-end/scripts/reproduce_init3_canary.sh` — the reproducer.
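
Reviewer sketch (not part of the patch): the dispatch rule quoted in the `397224be` row of the handoff table can be modelled as a small pure function. Only the `PXE_USE_INIT_3` flag name and the `totalAppCalls >= 3` condition come from the handoff note; `chooseFirstKernel`, `FirstKernelChoice`, and passing the environment in as a parameter are hypothetical, chosen so the rule can be exercised without touching `process.env`. The real orchestrator in `private_kernel_execution_prover.ts` may be structured differently.

```typescript
// Hypothetical model of the first-iteration dispatch described in the handoff note.
// Only the flag name and the ">= 3 apps" rule are taken from the source.
type Env = Record<string, string | undefined>;

interface FirstKernelChoice {
  kernel: 'init' | 'init_3'; // which first-iteration kernel runs
  appsConsumed: number; // app calls verified inside that kernel
  remainingApps: number; // app calls left for later kernels
}

function chooseFirstKernel(totalAppCalls: number, env: Env): FirstKernelChoice {
  if (!Number.isInteger(totalAppCalls) || totalAppCalls < 1) {
    throw new Error('totalAppCalls must be a positive integer');
  }
  // Mirrors: useInit3 = process.env.PXE_USE_INIT_3 === '1' && totalAppCalls >= 3
  const useInit3 = env['PXE_USE_INIT_3'] === '1' && totalAppCalls >= 3;
  const appsConsumed = useInit3 ? 3 : 1;
  return {
    kernel: useInit3 ? 'init_3' : 'init',
    appsConsumed,
    remainingApps: totalAppCalls - appsConsumed,
  };
}
```

Under this model the 6-app deployment flow picks `init_3` with 3 apps remaining, a 2-app flow falls back to `init` even with the flag set, and any flow with the flag unset takes the standard path.
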
diff --git a/barretenberg/cpp/CLAUDE.md b/barretenberg/cpp/CLAUDE.md
@@ -6,6 +6,20 @@ Bootstrap modes:
 - `./bootstrap.sh build` => standard build
 - `AVM=0 ./bootstrap.sh build_native` => quick build without slow bb-avm target. Good for verifying compilation works. Needed to build ts/
 
+## `bb` vs `bb-avm`: which binary do downstream scripts pick?
+
+`barretenberg/cpp/scripts/find-bb` returns `bb-avm` by default (when `AVM` is unset or `AVM=1`) and `bb` only when `AVM=0`. `noir-projects/noir-protocol-circuits/bootstrap.sh` and most other downstream tooling go through `find-bb`, so when those scripts run "the bb binary", they are running `bb-avm`.
+
+Consequence: when changing C++ that affects VK derivation, proving, or anything else exercised by downstream bootstrap scripts, `cmake --build build --target bb` is **not enough** — `bb` is non-AVM and will not be picked up. You must rebuild the AVM-enabled binary:
+
+```bash
+cd barretenberg/cpp
+cmake --preset default -DAVM=ON
+cmake --build build --target bb-avm
+```
+
+Or just run `./bootstrap.sh` (full build), which produces both. Symptom of forgetting: downstream scripts keep failing with the *same* error after your "fix" because they are still running the stale `bb-avm`.
+
 Development commands (from barretenberg/cpp):
 ```bash
 cmake --preset default    # Configure (AVM disabled by default)

diff --git a/barretenberg/cpp/scripts/test_chonk_standalone_vks_havent_changed.sh b/barretenberg/cpp/scripts/test_chonk_standalone_vks_havent_changed.sh
@@ -21,7 +21,7 @@ script_path="$root/barretenberg/cpp/scripts/test_chonk_standalone_vks_havent_cha
 # - Generate a hash for versioning: sha256sum bb-chonk-inputs.tar.gz
 # - Upload the compressed results: aws s3 cp bb-chonk-inputs.tar.gz s3://aztec-ci-artifacts/protocol/bb-chonk-inputs-[hash(0:8)].tar.gz
 # Note: In case of the "Test suite failed to run ... Unexpected token 'with' " error, need to run: docker pull aztecprotocol/build:3.0
-pinned_short_hash="20c388cc"
+pinned_short_hash="aafbeabe"
 pinned_chonk_inputs_url="https://aztec-ci-artifacts.s3.us-east-2.amazonaws.com/protocol/bb-chonk-inputs-${pinned_short_hash}.tar.gz"
 
 function update_pinned_hash_in_script {

diff --git a/barretenberg/cpp/src/barretenberg/chonk/chonk.test.cpp b/barretenberg/cpp/src/barretenberg/chonk/chonk.test.cpp
@@ -113,7 +113,7 @@ class ChonkTests : public ::testing::Test {
 
     /**
      * @brief Helper function to test tampering with AppIO pairing inputs
-     * @details Accumulates circuits, doubles the app pairing points (creating valid but different points),
+     * @details Accumulates circuits, changes the app pairing points (creating valid but different points),
      * and verifies that the final Chonk proof fails verification.
      */
     static void test_app_io_tampering()
@@ -122,9 +122,7 @@ class ChonkTests : public ::testing::Test {
 
         TestSettings settings{ .log2_num_gates = SMALL_LOG_2_NUM_GATES };
         auto [proof, vk] = run_ivc(/*num_app_circuits=*/2, settings, [](Chonk& ivc, size_t idx) {
-            if (idx == 2) {
-                EXPECT_EQ(ivc.verification_queue.size(), 2);
-
+            if (idx == 1) {
                 auto& app_entry = ivc.verification_queue[1];
                 ASSERT_FALSE(app_entry.is_kernel) << "Expected second queue entry to be an app";
 
@@ -154,10 +152,8 @@ class ChonkTests : public ::testing::Test {
         BB_DISABLE_ASSERTS();
 
         TestSettings settings{ .log2_num_gates = SMALL_LOG_2_NUM_GATES };
-        auto [proof, vk] = run_ivc(/*num_app_circuits=*/2, settings, [field_to_tamper](Chonk& ivc, size_t idx) {
-            if (idx == 2) {
-                EXPECT_EQ(ivc.verification_queue.size(), 2);
-
+        auto [proof, vk] = run_ivc(/*num_app_circuits=*/4, settings, [field_to_tamper](Chonk& ivc, size_t idx) {
+            if (idx == 3) {
                 auto& kernel_entry = ivc.verification_queue[0];
                 ASSERT_TRUE(kernel_entry.is_kernel) << "Expected first queue entry to be a kernel";
 
@@ -212,13 +208,15 @@ class ChonkTests : public ::testing::Test {
         using KernelIOSerde = bb::stdlib::recursion::honk::KernelIOSerde;
 
         const size_t NUM_APP_CIRCUITS = 2;
-        const size_t NUM_TOTAL_CIRCUITS = NUM_APP_CIRCUITS * 2 + /*num_trailing_kernels*/ 3;
+        const size_t NUM_TOTAL_CIRCUITS =
+            NUM_APP_CIRCUITS + static_cast<size_t>(ceil(static_cast<double>(NUM_APP_CIRCUITS) / MAX_APPS_PER_KERNEL)) +
+            /*num_trailing_kernels*/ 3;
         TestSettings settings{ .log2_num_gates = SMALL_LOG_2_NUM_GATES };
 
         // Extract tail kernel IO before the hiding kernel consumes the verification queue.
         KernelIOSerde tail_io;
-        auto [proof, vk_and_hash] =
-            run_ivc(/*num_app_circuits=*/NUM_APP_CIRCUITS, settings, [&tail_io](Chonk& ivc, size_t idx) {
+        auto [proof, vk_and_hash] = run_ivc(
+            /*num_app_circuits=*/NUM_APP_CIRCUITS, settings, [&tail_io, &NUM_TOTAL_CIRCUITS](Chonk& ivc, size_t idx) {
                 // With 2 apps the layout is [app, kernel, app, kernel, reset, tail, hiding].
                 if (idx == NUM_TOTAL_CIRCUITS - 2) {
                     for (auto& it : std::ranges::reverse_view(ivc.verification_queue)) {
@@ -351,7 +349,6 @@ TEST_F(ChonkTests, BadProofFailure)
             }
 
             if (idx == 2) {
-                EXPECT_EQ(ivc.verification_queue.size(), 2); // two proofs after 3 calls to accumulation
                 tamper_with_proof(ivc.verification_queue[0].proof,
                                   num_public_inputs); // tamper with first proof
             }
@@ -372,8 +369,7 @@ TEST_F(ChonkTests, BadProofFailure)
                 circuit_producer.create_next_circuit_and_vk(ivc, { .log2_num_gates = SMALL_LOG_2_NUM_GATES });
             ivc.accumulate(circuit, vk);
 
-            if (idx == 2) {
-                EXPECT_EQ(ivc.verification_queue.size(), 2); // two proofs after 3 calls to accumulation
+            if (idx == 1) {
                 tamper_with_proof(ivc.verification_queue[1].proof,
                                   circuit.num_public_inputs()); // tamper with second proof
             }
@@ -428,8 +424,7 @@ HEAVY_TEST(ChonkKernelCapacity, MaxCapacityPassing)
 {
     bb::srs::init_file_crs_factory(bb::srs::bb_crs_path());
 
-    const size_t NUM_APP_CIRCUITS = (CHONK_MAX_NUM_CIRCUITS - /*trailing kernels*/ 3) / 2;
-    auto [proof, vk] = ChonkTests::accumulate_and_prove_ivc(NUM_APP_CIRCUITS);
+    auto [proof, vk] = ChonkTests::accumulate_and_prove_ivc(CHONK_MAX_NUM_APPS);
 
     bool verified = ChonkTests::verify_chonk(proof, vk);
     EXPECT_TRUE(verified);
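
Reviewer sketch (not part of the patch): the new `NUM_TOTAL_CIRCUITS` expression in `chonk.test.cpp` counts apps, plus one kernel per batch of apps, plus three trailing kernels. The TypeScript model below reproduces that arithmetic and the resulting circuit order. The value `3` for `MAX_APPS_PER_KERNEL` is an assumption inferred from the 4-app layout described in the transcript-invariants test; the function names are hypothetical, and the real layout is produced by the C++ mock circuit producer.

```typescript
// Model of: NUM_APP_CIRCUITS + ceil(NUM_APP_CIRCUITS / MAX_APPS_PER_KERNEL) + 3
const ASSUMED_MAX_APPS_PER_KERNEL = 3; // assumption: not stated in the diff itself
const NUM_TRAILING_KERNELS = 3; // reset, tail, hiding

function totalCircuits(numApps: number, maxAppsPerKernel: number = ASSUMED_MAX_APPS_PER_KERNEL): number {
  return numApps + Math.ceil(numApps / maxAppsPerKernel) + NUM_TRAILING_KERNELS;
}

// Emits the accumulation order: a kernel follows each full batch of apps
// and also the final, possibly partial, batch.
function circuitLayout(numApps: number, maxAppsPerKernel: number = ASSUMED_MAX_APPS_PER_KERNEL): string[] {
  const layout: string[] = [];
  for (let i = 0; i < numApps; i++) {
    layout.push(`app${i}`);
    const batchFull = (i + 1) % maxAppsPerKernel === 0;
    const lastApp = i === numApps - 1;
    if (batchFull || lastApp) {
      layout.push('kernel');
    }
  }
  layout.push('reset', 'tail', 'hiding');
  return layout;
}
```

With these assumptions, `totalCircuits(4)` is 9 and `circuitLayout(4)` is `app0, app1, app2, kernel, app3, kernel, reset, tail, hiding`, which agrees with `EXPECTED_NUM_CIRCUITS = 9` in the transcript-invariants test. For 2 apps the model gives 6 circuits, where the previous `NUM_APP_CIRCUITS * 2 + 3` formula gave 7.
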

diff --git a/barretenberg/cpp/src/barretenberg/chonk/chonk_transcript_invariants.test.cpp b/barretenberg/cpp/src/barretenberg/chonk/chonk_transcript_invariants.test.cpp
@@ -58,33 +58,34 @@ class ChonkTranscriptInvariantTests : public ::testing::Test {
  * Any change to this count indicates a structural change in how transcripts are managed, which
  * could have security implications (e.g., unexpected transcript isolation or sharing).
  *
- * The 2-app IVC flow creates 7 circuits: app0 -> kernel0 -> app1 -> kernel1 -> reset -> tail -> hiding
+ * The 4-app IVC flow creates 9 circuits: app0 -> app1 -> app2 -> kernel1 -> app3 -> kernel2 -> reset -> tail -> hiding
  *
  * Per-circuit transcript breakdown (from complete_kernel_circuit_logic):
- * - App circuits (0, 2): 0 transcripts - use native HN folding prover
- * - Init kernel (1): 2 transcripts:
+ * - App circuits (0, 1, 2, 4): 0 transcripts - use native HN folding prover
+ * - Init kernel (3): 3 transcripts:
  *     1. accumulation_recursive_transcript
- *     2. hash_transcript - for computing accumulator hash to propagate in public inputs
- * - Intermediate kernel (3): 3 transcripts:
+ *     2. PairingPoints::aggregate_multiple - for batching pairing points with Fiat-Shamir separator
+ *     3. hash_transcript - for computing accumulator hash to propagate in public inputs
+ * - Intermediate kernel (5): 3 transcripts:
  *     1. accumulation_recursive_transcript - shared across recursive verification
  *     2. PairingPoints::aggregate_multiple - for batching pairing points with Fiat-Shamir separator
  *     3. hash_transcript - for computing accumulator hash to propagate in public inputs
- * - Reset and tail kernels (4, 5): 2 transcripts each:
+ * - Reset and tail kernels (6, 7): 2 transcripts each:
  *     1. accumulation_recursive_transcript
  *     2. hash_transcript - for computing accumulator hash to propagate in public inputs
- * - Hiding kernel (6): 3 transcripts:
+ * - Hiding kernel (8): 3 transcripts:
  *     1. accumulation_recursive_transcript
  *     2. batch_merge_transcript - for final batch merge verification
  *     3. PairingPoints::aggregate_multiple
  *
- * Total: 0 + 2 + 0 + 3 + 2 + 2 + 3 = 12 transcripts
+ * Total: 0 + 0 + 0 + 3 + 0 + 3 + 2 + 2 + 3 = 13 transcripts
  */
 TEST_F(ChonkTranscriptInvariantTests, AccumulationTranscriptCount)
 {
-    // Pinned expected transcript count for 2 app circuits
-    constexpr size_t EXPECTED_TOTAL_TRANSCRIPTS = 12;
-    constexpr size_t EXPECTED_NUM_CIRCUITS = 7;
-    constexpr std::array<size_t, EXPECTED_NUM_CIRCUITS> EXPECTED_CIRCUIT_TRANSCRIPTS = { 0, 2, 0, 3, 2, 2, 3 };
+    // Pinned expected transcript count for 4 app circuits
+    constexpr size_t EXPECTED_TOTAL_TRANSCRIPTS = 13;
+    constexpr size_t EXPECTED_NUM_CIRCUITS = 9;
+    constexpr std::array<size_t, EXPECTED_NUM_CIRCUITS> EXPECTED_CIRCUIT_TRANSCRIPTS = { 0, 0, 0, 3, 0, 3, 2, 2, 3 };
 
     // Record transcript index before IVC
     size_t index_before_ivc = bb::unique_transcript_index.load();
@@ -93,8 +94,8 @@ TEST_F(ChonkTranscriptInvariantTests, AccumulationTranscriptCount)
     std::vector<size_t> indices_before_accumulation;
     std::vector<size_t> indices_after_accumulation;
 
-    // Create IVC with 2 app circuits
-    constexpr size_t NUM_APP_CIRCUITS = 2;
+    // Create IVC with 4 app circuits
+    constexpr size_t NUM_APP_CIRCUITS = 4;
     PrivateFunctionExecutionMockCircuitProducer circuit_producer(NUM_APP_CIRCUITS);
     const size_t num_circuits = circuit_producer.total_num_circuits;
     ASSERT_EQ(num_circuits, EXPECTED_NUM_CIRCUITS) << "Circuit count mismatch - test assumptions invalid";
@@ -112,7 +113,7 @@ TEST_F(ChonkTranscriptInvariantTests, AccumulationTranscriptCount)
 
     // Pin the total number of transcripts created during accumulation
     EXPECT_EQ(total_transcripts, EXPECTED_TOTAL_TRANSCRIPTS)
-        << "Total transcript count during 2-app IVC accumulation changed. "
+        << "Total transcript count during 4-app IVC accumulation changed. "
         << "If intentional, update EXPECTED_TOTAL_TRANSCRIPTS. "
         << "Unexpected changes may indicate security-relevant transcript isolation issues.";