Commit 6338c3f

AztecBot, charlielye, and claude authored
fix(bb-prover): pool long-lived bb verifier processes instead of spawning per-call (#23093)
## Why

Follow-up to #21564 (bb-prover bb.js migration) addressing the IVC verification perf regression that surfaced in `tx_stats_bench`.

The migration kept the legacy spawn-per-verification model: every chonk/ultra-honk verification through `BBCircuitVerifier` spawned a fresh `bb` process and SIGTERMed it after one proof. `BB_NUM_IVC_VERIFIERS=8` only capped concurrency at the queue layer (`QueuedIVCVerifier`), not the number of bb processes. That made the bench spawn ~600 bb processes over its 60s 10 TPS phase inside an 8-CPU isolate.

Two compounding problems:

1. ~50–100 ms of `bb` startup tax on every verification's hot path.
2. The bind→listen race in `NativeUnixSocket`: bb's socket file appears after `bind()` but before `listen()`. A TS `connect()` landing in that window gets `ECONNREFUSED`. Vanishingly rare under low load; reliable flake under contention.

Diagnosis at http://ci.aztec-labs.com/735256f13a268733.

## What

### Make `BB_NUM_IVC_VERIFIERS` mean what its name says (commits aa99817, 0f4cb77)

Pool of long-lived bb verifier processes instead of fresh-per-call. The factory class is renamed `BBJsProverFactory` → `BBJsFactory` (it's used for both proving and verifying) and given a single `getInstance(): Promise<BBJsApi & AsyncDisposable>` method:

- `new BBJsFactory(path)` → no pool. Every `getInstance()` spawns a fresh bb that is destroyed on dispose. Same as the previous `withFreshInstance` behaviour; used by `BBNativeRollupProver`, the AVM proving tester, and ivc-integration helpers, so their semantics are unchanged.
- `new BBJsFactory(path, { poolSize: N })` → pool of N long-lived bb processes, lazily spawned on first acquire. Used by `BBCircuitVerifier` with `poolSize: numConcurrentIVCVerifiers`.

Callers use `await using inst = await factory.getInstance()` for RAII-style release, matching the codebase's preference for `AsyncDisposable`. `BBCircuitVerifier.stop` (already wired through to aztec-node shutdown) tears the pool down.

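The pooled mode described above can be illustrated with a minimal sketch. This is not the `BBJsFactory` implementation: `LazyPool`, `Lease`, and the `create` callback are hypothetical names, the generic `T` stands in for a long-lived bb process handle, and release is explicit rather than via `AsyncDisposable` to keep the example runtime-agnostic.

```typescript
// A lease hands out one pooled resource; release() returns it to the pool.
interface Lease<T> {
  value: T;
  release(): void;
}

// Hypothetical bounded pool: at most `size` resources are ever created,
// and each one is created lazily the first time it is needed.
class LazyPool<T> {
  private idle: T[] = [];
  private waiters: Array<(value: T) => void> = [];
  private created = 0;

  constructor(
    private readonly size: number,
    private readonly create: () => Promise<T>,
  ) {}

  // Number of resources created so far (never exceeds `size`).
  get spawned(): number {
    return this.created;
  }

  async acquire(): Promise<Lease<T>> {
    const value = await this.take();
    let released = false;
    return {
      value,
      release: () => {
        if (released) return; // releasing twice is a no-op
        released = true;
        this.give(value);
      },
    };
  }

  private take(): Promise<T> {
    const free = this.idle.pop();
    if (free !== undefined) return Promise.resolve(free);
    if (this.created < this.size) {
      // Reserve the slot before awaiting so concurrent callers cannot overshoot.
      this.created++;
      return this.create().catch(err => {
        this.created--; // give the slot back if creation failed
        throw err;
      });
    }
    // Pool exhausted: wait until some lease is released.
    return new Promise<T>(resolve => this.waiters.push(resolve));
  }

  private give(value: T): void {
    const next = this.waiters.shift();
    if (next) next(value); // hand straight to the oldest waiter
    else this.idle.push(value);
  }
}
```

With `size` set to the verifier concurrency, the number of spawned resources is bounded by the pool size rather than by the number of verifications, which is the property the commit message attributes to `poolSize`.
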
### Close the bind→listen race in bb.js (commit 8e519b0)

`barretenberg/ts/src/bb_backends/node/native_socket.ts`: retry `connect()` on `ECONNREFUSED` with exponential backoff (capped at 50 ms) up to the existing 5 s budget. Other socket errors fail fast as before. Pool startup still spawns N bb processes in parallel, so the race surface is reduced from ~600 to N; the retry handles the residual.

### Server-side Chonk proof split (commit 97577cf)

`splitChonkProofToStructured` in TS had three hand-maintained constants (`MERGE_PROOF_SIZE`, `ECCVM_PROOF_LENGTH`, `JOINT_PROOF_LENGTH`) duplicating C++ values. When C++ shifted Chonk layout (e.g. databus relation changes shrinking the oink portion in the previous round of regressions), these went stale and verification failed deep in the verifier with an opaque "OinkVerifier: num_public_inputs mismatch with VK".

Add a new `ChonkVerifyFromFields` bbapi command that takes a flat `Vec<bb::fr>` and calls `ChonkProof::from_field_elements` server-side, then runs the verifier. The TS layer now passes flat fields straight through: no layout knowledge, no hand-maintained constants.

- `bbapi_chonk.{hpp,cpp}`: new struct + `execute()`.
- `bbapi_execute.hpp`: register the variant.
- `bb_js_backend.ts`: `verifyChonkProof` calls the new API; `splitChonkProofToStructured` and the 3 constants are deleted.

### Disposal robustness (commit 5cde220)

The first cut of `BBJsFactory` had three `.catch(() => {})` clauses that silently swallowed bb `destroy()` errors, and an `initPool()` that dropped already-spawned bb children if a sibling creation failed (`Promise.all` short-circuit). Both would manifest as the Jest "worker failed to exit gracefully" warning we hit on one test run.

Now: destroy errors propagate (`AggregateError` for the pool path); `initPool` uses `allSettled` and tears down anything it spawned if any sibling rejects.

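The `allSettled`-plus-teardown pattern from the disposal paragraph can be sketched as follows. This is an illustration under assumptions, not the actual `initPool`: `spawnAllOrNothing`, `create`, and `destroy` are hypothetical names, and the resource type is left generic.

```typescript
// Spawn `count` resources in parallel. If any creation fails, destroy the
// ones that did succeed before throwing, so no sibling is leaked.
async function spawnAllOrNothing<T>(
  count: number,
  create: (index: number) => Promise<T>,
  destroy: (resource: T) => Promise<void>,
): Promise<T[]> {
  // Unlike Promise.all, allSettled waits for every creation to finish,
  // so we always know exactly which resources exist.
  const results = await Promise.allSettled(Array.from({ length: count }, (_, i) => create(i)));

  const created: T[] = [];
  const failures: unknown[] = [];
  for (const result of results) {
    if (result.status === 'fulfilled') created.push(result.value);
    else failures.push(result.reason);
  }
  if (failures.length === 0) return created;

  // Tear down the survivors; teardown errors are collected, not swallowed.
  const teardown = await Promise.allSettled(created.map(resource => destroy(resource)));
  for (const result of teardown) {
    if (result.status === 'rejected') failures.push(result.reason);
  }
  throw new AggregateError(failures, `spawn failed; ${failures.length} error(s) collected`);
}
```

With `Promise.all`, the first rejection would surface immediately while the other creations were still in flight, leaving no handle on the resources they eventually produce; waiting for all of them to settle is what makes the teardown complete.
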
### Playground bundle size (commit 1681d33)

The new `ChonkVerifyFromFields` bbapi variant tipped the playground main entrypoint over the 1750 KB hard limit. Bumped to 1800 with a bump-log entry.

## Effect

- `tx_stats_bench`: 600 bb spawns → 8 bb spawns at boot, then 8 long-lived processes serve every verification. The bind→listen race surface drops 75×, *and* the residual is handled by the connect retry. Per-call ~50–100 ms `bb` startup cost disappears from the verifier hot path.
- Brittle TS Chonk constants are gone; Chonk layout changes in C++ can no longer manifest as opaque verifier errors in TS.
- Disposal failures surface instead of leaking bb children.
- Behaviour for proving paths (`BBNativeRollupProver`, AVM tests, ivc-integration) is unchanged; they still spawn fresh per call.

ClaudeBox log: https://claudebox.work/s/2d65052b0deaeab2?run=3

---------

Co-authored-by: Charlie <5764343+charlielye@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6716d66 commit 6338c3f

11 files changed

Lines changed: 421 additions & 196 deletions

barretenberg/cpp/src/barretenberg/bbapi/bbapi_chonk.cpp

Lines changed: 27 additions & 0 deletions
@@ -189,6 +189,33 @@ ChonkVerify::Response ChonkVerify::execute(const BBApiRequest& /*request*/) &&
     return { .valid = verified };
 }
 
+ChonkVerifyFromFields::Response ChonkVerifyFromFields::execute(const BBApiRequest& /*request*/) &&
+{
+    BB_BENCH_NAME(MSGPACK_SCHEMA_NAME);
+
+    using VerificationKey = Chonk::MegaVerificationKey;
+    validate_vk_size<VerificationKey>(vk);
+
+    auto hiding_kernel_vk = std::make_shared<VerificationKey>(from_buffer<VerificationKey>(vk));
+
+    // Validate total field count: must match num_public_inputs + fixed overhead.
+    const size_t expected_field_count =
+        static_cast<size_t>(hiding_kernel_vk->num_public_inputs) + ChonkProof::PROOF_LENGTH_WITHOUT_PUB_INPUTS;
+    if (proof.size() != expected_field_count) {
+        throw_or_abort("ChonkVerifyFromFields: proof has wrong field count: expected " +
+                       std::to_string(expected_field_count) + ", got " + std::to_string(proof.size()));
+    }
+
+    // Split the flat field array into the structured ChonkProof. Layout knowledge stays here.
+    auto structured = ChonkProof::from_field_elements(proof);
+
+    auto vk_and_hash = std::make_shared<ChonkNativeVerifier::VKAndHash>(hiding_kernel_vk);
+    ChonkNativeVerifier verifier(vk_and_hash);
+    const bool verified = verifier.verify(structured);
+
+    return { .valid = verified };
+}
+
 ChonkBatchVerify::Response ChonkBatchVerify::execute(const BBApiRequest& /*request*/) &&
 {
     BB_BENCH_NAME(MSGPACK_SCHEMA_NAME);

barretenberg/cpp/src/barretenberg/bbapi/bbapi_chonk.hpp

Lines changed: 30 additions & 0 deletions
@@ -157,6 +157,36 @@ struct ChonkVerify {
     bool operator==(const ChonkVerify&) const = default;
 };
 
+/**
+ * @struct ChonkVerifyFromFields
+ * @brief Verify a Chonk proof passed as a flat field-element array (with public inputs prepended).
+ *
+ * The split into structured ChonkProof sub-proofs is done server-side via
+ * ChonkProof::from_field_elements, so callers do not need to know the per-component sub-proof
+ * sizes. This is the recommended entry point for TypeScript callers that hold the proof as a
+ * flat Fr[] (e.g. from tx.chonkProof.attachPublicInputs).
+ */
+struct ChonkVerifyFromFields {
+    static constexpr const char MSGPACK_SCHEMA_NAME[] = "ChonkVerifyFromFields";
+
+    struct Response {
+        static constexpr const char MSGPACK_SCHEMA_NAME[] = "ChonkVerifyFromFieldsResponse";
+
+        /** @brief True if the proof is valid */
+        bool valid;
+        SERIALIZATION_FIELDS(valid);
+        bool operator==(const Response&) const = default;
+    };
+
+    /** @brief Flat proof field elements with public inputs prepended */
+    std::vector<bb::fr> proof;
+    /** @brief The verification key */
+    std::vector<uint8_t> vk;
+    Response execute(const BBApiRequest& request = {}) &&;
+    SERIALIZATION_FIELDS(proof, vk);
+    bool operator==(const ChonkVerifyFromFields&) const = default;
+};
+
 /**
  * @struct ChonkComputeVk
  * @brief Compute MegaHonk verification key for a circuit to be accumulated in Chonk

barretenberg/cpp/src/barretenberg/bbapi/bbapi_execute.hpp

Lines changed: 2 additions & 0 deletions
@@ -27,6 +27,7 @@ using Command = NamedUnion<AvmProve,
                            ChonkAccumulate,
                            ChonkProve,
                            ChonkVerify,
+                           ChonkVerifyFromFields,
                            ChonkBatchVerify,
                            VkAsFields,
                            MegaVkAsFields,
@@ -90,6 +91,7 @@ using CommandResponse = NamedUnion<ErrorResponse,
                                    ChonkAccumulate::Response,
                                    ChonkProve::Response,
                                    ChonkVerify::Response,
+                                   ChonkVerifyFromFields::Response,
                                    ChonkBatchVerify::Response,
                                    VkAsFields::Response,
                                    MegaVkAsFields::Response,

barretenberg/ts/src/bb_backends/node/native_socket.ts

Lines changed: 62 additions & 39 deletions
@@ -173,52 +173,75 @@ export class BarretenbergNativeSocketAsyncBackend implements IMsgpackBackendAsyn
       throw new Error(`Path exists but is not a socket: ${this.socketPath}`);
     }
 
-    // Connect to bb's socket server as a client
-    return new Promise<void>((resolve, reject) => {
-      this.socket = net.connect(this.socketPath);
+    // Connect with retry on ECONNREFUSED. The socket file appears after bb's bind() but
+    // before its listen(); a connect() landing in that window gets ECONNREFUSED. Retry
+    // briefly until bb is listening or we hit the 5s budget.
+    const socket = await this.connectWithRetry(startTime);
+    this.socket = socket;
 
-      // Disable Nagle's algorithm for lower latency
-      this.socket.setNoDelay(true);
-
-      // Set up event handlers
-      this.socket.once('connect', () => {
-        // Socket starts referenced - will be unreferenced when no callbacks pending
+    // Clear connection timeout on successful connection
+    if (this.connectionTimeout) {
+      clearTimeout(this.connectionTimeout);
+      this.connectionTimeout = null;
+    }
 
-        // Clear connection timeout on successful connection
-        if (this.connectionTimeout) {
-          clearTimeout(this.connectionTimeout);
-          this.connectionTimeout = null;
-        }
-        resolve();
-      });
+    // Set up persistent handlers now that we're connected.
+    socket.on('data', (chunk: Buffer) => {
+      this.handleData(chunk);
+    });
 
-      this.socket.once('error', err => {
-        reject(new Error(`Failed to connect to bb socket: ${err.message}`));
-      });
+    socket.on('error', err => {
+      const error = new Error(`Socket error: ${err.message}`);
+      for (const callback of this.pendingCallbacks) {
+        callback.reject(error);
+      }
+      this.pendingCallbacks = [];
+    });
 
-      // Set up data handler after connection is established
-      this.socket.on('data', (chunk: Buffer) => {
-        this.handleData(chunk);
-      });
+    socket.on('end', () => {
+      const error = new Error('Socket connection ended unexpectedly');
+      for (const callback of this.pendingCallbacks) {
+        callback.reject(error);
+      }
+      this.pendingCallbacks = [];
+    });
+  }
 
-      // Handle ongoing errors after initial connection
-      this.socket.on('error', err => {
-        // Reject all pending callbacks
-        const error = new Error(`Socket error: ${err.message}`);
-        for (const callback of this.pendingCallbacks) {
-          callback.reject(error);
+  private async connectWithRetry(startTime: number): Promise<net.Socket> {
+    let attempt = 0;
+    let lastErr: Error | undefined;
+    while (Date.now() - startTime < 5000) {
+      try {
+        return await this.attemptConnect();
+      } catch (err) {
+        lastErr = err as Error;
+        const code = (err as NodeJS.ErrnoException).code;
+        if (code !== 'ECONNREFUSED') {
+          throw new Error(`Failed to connect to bb socket: ${lastErr.message}`);
         }
-        this.pendingCallbacks = [];
-      });
+        // bb has bound the path but not yet called listen(); back off and retry.
+        const delay = Math.min(50, 5 * 2 ** attempt++);
+        await new Promise(resolve => setTimeout(resolve, delay));
+      }
+    }
+    throw new Error(`Timeout connecting to bb socket: ${lastErr?.message ?? 'unknown'}`);
+  }
 
-      this.socket.on('end', () => {
-        // Reject all pending callbacks
-        const error = new Error('Socket connection ended unexpectedly');
-        for (const callback of this.pendingCallbacks) {
-          callback.reject(error);
-        }
-        this.pendingCallbacks = [];
-      });
+  private attemptConnect(): Promise<net.Socket> {
+    return new Promise<net.Socket>((resolve, reject) => {
+      const socket = net.connect(this.socketPath);
+      socket.setNoDelay(true);
+      const onConnect = () => {
+        socket.removeListener('error', onError);
+        resolve(socket);
+      };
+      const onError = (err: Error) => {
+        socket.removeListener('connect', onConnect);
+        socket.destroy();
+        reject(err);
+      };
+      socket.once('connect', onConnect);
+      socket.once('error', onError);
     });
   }
 
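The backoff expression `Math.min(50, 5 * 2 ** attempt++)` in `connectWithRetry` yields a short, capped schedule. The sketch below reproduces that arithmetic in isolation; `backoffDelayMs` and `maxAttemptsWithin` are hypothetical helper names, and the attempt count is an idealised upper bound that assumes every `connect()` fails instantly.

```typescript
// Delay in milliseconds before the retry that follows failed attempt number
// `attempt` (0-based), mirroring Math.min(50, 5 * 2 ** attempt) from the diff.
function backoffDelayMs(attempt: number, baseMs = 5, capMs = 50): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Idealised count of connect attempts that fit in `budgetMs`, assuming each
// attempt fails instantly so that only the sleeps consume time.
function maxAttemptsWithin(budgetMs: number): number {
  let elapsedMs = 0;
  let attempts = 0;
  while (elapsedMs < budgetMs) {
    attempts++;
    elapsedMs += backoffDelayMs(attempts - 1);
  }
  return attempts;
}

// First six delays: the doubling stops once the 50 ms cap is reached.
const schedule = Array.from({ length: 6 }, (_, i) => backoffDelayMs(i));
console.log(schedule); // [ 5, 10, 20, 40, 50, 50 ]
```

The cap is reached on the fifth retry, so a connect that loses the bind→listen race typically waits only a few milliseconds, while a bb that never starts listening still hits the 5 s budget.
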

playground/vite.config.ts

Lines changed: 2 additions & 1 deletion
@@ -136,9 +136,10 @@ export default defineConfig(({ mode }) => {
           // Bump log:
           // - AD: bumped from 1600 => 1680 as we now have a 20kb msgpack lib in bb.js and other logic got us 50kb higher, adding some wiggle room.
           // - MW: bumped from 1700 => 1750 after adding the noble curves pkg to foundation required for blob batching calculations.
+          // - CL: bumped from 1750 => 1800 after adding the ChonkVerifyFromFields bbapi variant (PR #23093).
           {
             pattern: /assets\/index-.*\.js$/,
-            maxSizeKB: 1750,
+            maxSizeKB: 1800,
             description: 'Main entrypoint, hard limit',
           },
           // Bump log:
// Bump log:

yarn-project/bb-prover/src/avm_proving_tests/avm_proving_tester.ts

Lines changed: 8 additions & 5 deletions
@@ -17,7 +17,7 @@ import { NativeWorldStateService } from '@aztec/world-state';
 
 import path from 'path';
 
-import { BBJsProverFactory } from '../bb/bb_js_backend.js';
+import { BBJsFactory } from '../bb/bb_js_backend.js';
 
 const BB_PATH = path.resolve('../../barretenberg/cpp/build/bin/bb-avm');
 
@@ -32,7 +32,7 @@ const provingConfig: PublicSimulatorConfig = PublicSimulatorConfig.from({
 });
 
 export class AvmProvingTester extends PublicTxSimulationTester {
-  private readonly bbJsFactory = new BBJsProverFactory(BB_PATH);
+  private readonly bbJsFactory = new BBJsFactory(BB_PATH);
 
   constructor(
     private checkCircuitOnly: boolean,
@@ -64,13 +64,15 @@ export class AvmProvingTester extends PublicTxSimulationTester {
     const inputsBuffer = avmCircuitInputs.serializeWithMessagePack();
 
     if (this.checkCircuitOnly) {
-      const { passed, stats } = await this.bbJsFactory.withFreshInstance(i => i.checkAvmCircuit(inputsBuffer));
+      await using instance = await this.bbJsFactory.getInstance();
+      const { passed, stats } = await instance.checkAvmCircuit(inputsBuffer);
       this.recordProverMetrics(stats, txLabel);
       expect(passed).toBe(true);
       return [];
     }
 
-    const { proof, stats } = await this.bbJsFactory.withFreshInstance(i => i.generateAvmProof(inputsBuffer));
+    await using instance = await this.bbJsFactory.getInstance();
+    const { proof, stats } = await instance.generateAvmProof(inputsBuffer);
     this.recordProverMetrics(stats, txLabel);
     return proof;
   }
@@ -81,7 +83,8 @@ export class AvmProvingTester extends PublicTxSimulationTester {
       return;
     }
     const piBuffer = publicInputs.serializeWithMessagePack();
-    const { verified } = await this.bbJsFactory.withFreshInstance(i => i.verifyAvmProof(proof, piBuffer));
+    await using instance = await this.bbJsFactory.getInstance();
+    const { verified } = await instance.verifyAvmProof(proof, piBuffer);
     expect(verified).toBe(true);
   }
 
