fix(bb-prover): pool long-lived bb verifier processes instead of spawning per-call (#23093)
## Why
Follow-up to #21564 (bb-prover bb.js migration) addressing the IVC
verification perf regression that surfaced in `tx_stats_bench`.
The migration kept the legacy spawn-per-verification model: every
chonk/ultra-honk verification through `BBCircuitVerifier` spawned a
fresh `bb` process and SIGTERMed it after one proof.
`BB_NUM_IVC_VERIFIERS=8` only capped concurrency at the queue layer
(`QueuedIVCVerifier`), not the number of bb processes.
That made the bench spawn ~600 bb processes over its 60s 10 TPS phase
inside an 8-CPU isolate. Two compounding problems:
1. ~50–100 ms of `bb` startup tax on every verification's hot path.
2. The bind→listen race in `NativeUnixSocket`: bb's socket file appears
after `bind()` but before `listen()`, so a TS `connect()` landing in that
window gets `ECONNREFUSED`. This is vanishingly rare under low load but a
reliable flake under contention. Diagnosis at
http://ci.aztec-labs.com/735256f13a268733.
## What
### Make `BB_NUM_IVC_VERIFIERS` mean what its name says (commits aa99817, 0f4cb77)
Verification now draws from a pool of long-lived bb verifier processes
instead of spawning a fresh one per call. The factory class is renamed
`BBJsProverFactory` → `BBJsFactory` (it's used for both proving and
verifying) and given a single
`getInstance(): Promise<BBJsApi & AsyncDisposable>` method:
- `new BBJsFactory(path)` → no pool. Every `getInstance()` spawns a
fresh bb that is destroyed on dispose. Same as the previous
`withFreshInstance` behaviour — used by `BBNativeRollupProver`, the AVM
proving tester, and ivc-integration helpers, so their semantics are
unchanged.
- `new BBJsFactory(path, { poolSize: N })` → pool of N long-lived bb
processes, lazily spawned on first acquire. Used by `BBCircuitVerifier`
with `poolSize: numConcurrentIVCVerifiers`.
Callers use `await using inst = await factory.getInstance()` for
RAII-style release, matching the codebase's preference for
`AsyncDisposable`. `BBCircuitVerifier.stop` (already wired through to
aztec-node shutdown) tears the pool down.
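
The two construction modes can be sketched roughly as below. This is a minimal illustration, not the actual `BBJsFactory`: `SketchFactory`, `Lease`, `release()`, `stop()` and the `spawn` callback are hypothetical names, and the sketch uses an explicit `release()` where the real class returns an `AsyncDisposable` for `await using`. Error handling is deliberately omitted.

```typescript
// Hypothetical stand-in for a handle to one bb process.
interface Destroyable {
  destroy(): Promise<void>;
}

// Hypothetical lease; the real API exposes disposal via AsyncDisposable instead.
interface Lease<T> {
  instance: T;
  release(): Promise<void>;
}

class SketchFactory<T extends Destroyable> {
  private spawn: () => Promise<T>;
  private poolSize?: number;
  private pool?: Promise<T[]>; // memoised so the pool is spawned at most once
  private idle: T[] = [];
  private waiters: Array<(inst: T) => void> = [];

  constructor(spawn: () => Promise<T>, opts: { poolSize?: number } = {}) {
    this.spawn = spawn;
    this.poolSize = opts.poolSize;
  }

  async getInstance(): Promise<Lease<T>> {
    const size = this.poolSize;
    if (size === undefined) {
      // No pool: a fresh instance per call, destroyed on release.
      const fresh = await this.spawn();
      return { instance: fresh, release: () => fresh.destroy() };
    }
    if (!this.pool) {
      // Lazy start: spawn all N instances in parallel on first acquire.
      this.pool = Promise.all(Array.from({ length: size }, () => this.spawn())).then(all => {
        this.idle.push(...all);
        return all;
      });
    }
    await this.pool;
    // Take an idle instance, or wait until another caller releases one.
    const instance = this.idle.pop() ?? (await new Promise<T>(resolve => this.waiters.push(resolve)));
    let released = false;
    return {
      instance,
      release: async () => {
        if (released) return; // releasing twice is a no-op
        released = true;
        const next = this.waiters.shift();
        if (next) next(instance); // hand straight to a waiting caller
        else this.idle.push(instance);
      },
    };
  }

  // Tear down every pooled instance (no-op if the pool never started).
  async stop(): Promise<void> {
    if (!this.pool) return;
    const all = await this.pool;
    await Promise.all(all.map(inst => inst.destroy()));
  }
}
```

With `poolSize: N`, at most N instances ever exist and a further `getInstance()` call waits for a release. Without `poolSize`, every call pays the spawn cost, which matches the unchanged proving paths.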
### Close the bind→listen race in bb.js (commit 8e519b0)
`barretenberg/ts/src/bb_backends/node/native_socket.ts`: retry
`connect()` on `ECONNREFUSED` with exponential backoff (capped at 50 ms)
up to the existing 5 s budget. Other socket errors fail fast as before.
Pool startup still spawns N bb processes in parallel, so the race
surface is reduced from ~600 to N — the retry handles the residual.
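
A rough sketch of the retry policy described above, written against a generic `attempt` callback so it stays self-contained. The helper name `retryOnConnRefused`, the `RetryOptions` shape and the 1 ms initial delay are assumptions for illustration; only the 50 ms cap, the 5 s budget and the fail-fast behaviour for other errors come from the description.

```typescript
interface RetryOptions {
  budgetMs: number; // total time allowed across all attempts
  initialDelayMs: number; // first backoff delay (assumed value, not from the PR)
  maxDelayMs: number; // cap on any single backoff delay
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function retryOnConnRefused<T>(
  attempt: () => Promise<T>,
  opts: RetryOptions = { budgetMs: 5000, initialDelayMs: 1, maxDelayMs: 50 },
): Promise<T> {
  const deadline = Date.now() + opts.budgetMs;
  let delay = opts.initialDelayMs;
  for (;;) {
    try {
      return await attempt();
    } catch (err) {
      const code = (err as { code?: string } | undefined)?.code;
      // Anything other than ECONNREFUSED fails fast; so does running out of budget.
      if (code !== 'ECONNREFUSED' || Date.now() + delay > deadline) {
        throw err;
      }
      await sleep(delay);
      // Exponential backoff, capped so retries stay frequent within the budget.
      delay = Math.min(delay * 2, opts.maxDelayMs);
    }
  }
}
```

Capping the delay keeps the worst-case added latency per retry small, since the window between `bind()` and `listen()` is expected to be short.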
### Server-side Chonk proof split (commit 97577cf)
`splitChonkProofToStructured` in TS had three hand-maintained constants
(`MERGE_PROOF_SIZE`, `ECCVM_PROOF_LENGTH`, `JOINT_PROOF_LENGTH`)
duplicating C++ values. When C++ shifted Chonk layout (e.g. databus
relation changes shrinking the oink portion in the previous round of
regressions), these went stale and verification failed deep in the
verifier with an opaque "OinkVerifier: num_public_inputs mismatch with
VK".
Add a new `ChonkVerifyFromFields` bbapi command that takes a flat
`Vec<bb::fr>` and calls `ChonkProof::from_field_elements` server-side,
then runs the verifier. The TS layer now passes flat fields straight
through — no layout knowledge, no hand-maintained constants.
- `bbapi_chonk.{hpp,cpp}`: new struct + `execute()`.
- `bbapi_execute.hpp`: register the variant.
- `bb_js_backend.ts`: `verifyChonkProof` calls the new API;
`splitChonkProofToStructured` and the 3 constants are deleted.
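
To illustrate why client-side splitting is fragile, here is a toy sketch with made-up section lengths. `splitBySizes`, `STALE_TS_SIZES` and `CURRENT_CPP_SIZES` are hypothetical and do not reflect the real Chonk layout or the deleted `splitChonkProofToStructured`. The point is only that slicing with stale offsets raises no error at the split site; the sections are silently misaligned and the failure shows up later.

```typescript
// Hypothetical section lengths; NOT the real Chonk layout.
const STALE_TS_SIZES = [4, 3, 2]; // what a hand-maintained TS copy still believes
const CURRENT_CPP_SIZES = [3, 3, 2]; // what the prover emits after a layout change

// Slice a flat field array into consecutive sections of the given sizes.
function splitBySizes(fields: string[], sizes: number[]): string[][] {
  const sections: string[][] = [];
  let offset = 0;
  for (const size of sizes) {
    sections.push(fields.slice(offset, offset + size));
    offset += size;
  }
  return sections;
}

// A flat proof whose elements are labelled with the section they truly belong to.
const flatProof = CURRENT_CPP_SIZES.flatMap((size, section) =>
  Array.from({ length: size }, (_, i) => `s${section}[${i}]`),
);

// Splitting with the stale sizes succeeds, but section 0 swallows an element of section 1.
const misSplit = splitBySizes(flatProof, STALE_TS_SIZES);
// Splitting with the sizes the producer actually used recovers the true sections.
const correct = splitBySizes(flatProof, CURRENT_CPP_SIZES);
```

Moving the split next to the code that defines the layout removes the second copy of the sizes, so there is nothing left to drift.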
### Disposal robustness (commit 5cde220)
The first cut of `BBJsFactory` had three `.catch(() => {})` clauses that
silently swallowed bb `destroy()` errors, and an `initPool()` that
dropped already-spawned bb children if a sibling creation failed
(`Promise.all` short-circuit). Both would manifest as the Jest "worker
failed to exit gracefully" warning we hit on one test run.
Now: destroy errors propagate (`AggregateError` for the pool path);
`initPool` uses `allSettled` and tears down anything it spawned if any
sibling rejects.
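
The shape of that fix can be sketched as follows. This is an illustrative reconstruction under assumptions, not the code from the commit: the `Child` interface and the standalone `initPool` and `destroyAll` helpers (including their signatures) are hypothetical.

```typescript
// Hypothetical stand-in for a spawned bb child.
interface Child {
  destroy(): Promise<void>;
}

// Destroy every child; report all failures together instead of swallowing them.
async function destroyAll(children: Child[]): Promise<void> {
  const results = await Promise.allSettled(children.map(c => c.destroy()));
  const errors = results
    .filter((r): r is PromiseRejectedResult => r.status === 'rejected')
    .map(r => r.reason);
  if (errors.length > 0) {
    throw new AggregateError(errors, `${errors.length} destroy() call(s) failed`);
  }
}

// Spawn n children in parallel. allSettled waits for every spawn to finish,
// so children that did start can be torn down if any sibling failed.
async function initPool<T extends Child>(spawn: (index: number) => Promise<T>, n: number): Promise<T[]> {
  const results = await Promise.allSettled(Array.from({ length: n }, (_, i) => spawn(i)));
  const spawned: T[] = [];
  const failures: unknown[] = [];
  for (const r of results) {
    if (r.status === 'fulfilled') spawned.push(r.value);
    else failures.push(r.reason);
  }
  if (failures.length === 0) return spawned;
  try {
    await destroyAll(spawned);
  } catch (e) {
    failures.push(e); // keep teardown errors visible alongside the spawn errors
  }
  throw new AggregateError(failures, 'pool initialisation failed');
}
```

With `Promise.all`, the first rejection would return control while sibling spawns were still in flight, leaving no reference through which to destroy them.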
### Playground bundle size (commit 1681d33)
The new `ChonkVerifyFromFields` bbapi variant tipped the playground main
entrypoint over the 1750 KB hard limit. The limit is bumped to 1800 KB,
with a bump-log entry.
## Effect
- `tx_stats_bench`: ~600 bb spawns → 8 bb spawns at boot, then those 8
long-lived processes serve every verification. The bind→listen race
surface drops 75× (from ~600 startup connects to 8), *and* the residual
is handled by the connect retry. The per-call ~50–100 ms `bb` startup
cost disappears from the verifier hot path.
- Brittle TS Chonk constants are gone — Chonk layout changes in C++ can
no longer manifest as opaque verifier errors in TS.
- Disposal failures surface instead of leaking bb children.
- Behaviour for proving paths (`BBNativeRollupProver`, AVM tests,
ivc-integration) is unchanged — they still spawn fresh per call.
ClaudeBox log: https://claudebox.work/s/2d65052b0deaeab2?run=3
---------
Co-authored-by: Charlie <5764343+charlielye@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>