
Commit 093fdde

feat(prover-node): checkpoint-driven optimistic proving
Drives the prover-node onto the split `CheckpointSubTreeOrchestrator` + `TopTreeOrchestrator` pair, with checkpoint-driven proving that pipelines sub-trees against tx-gathering and the top-tree against the in-flight sub-trees.

## What's new

### `prover-node`: `EpochProvingJob` job-model rewrite

`EpochProvingJob` becomes an orchestrator over a `Map<string, CheckpointJob>` keyed by `${number}:${slot}`. Each `CheckpointJob` owns a single `CheckpointSubTreeOrchestrator` with its own per-checkpoint context (txs, attestations, previous-block header, l1ToL2 messages, archive sibling path). A `TopTreeJob` drives the epoch root rollup once all checkpoint sub-trees have started block-level proving.

Public API:

- `registerCheckpoint`: synchronous; sets up the sub-tree, kicks off the chonk-verifier cache fill, and attaches the `blockProofs` promise to the eventual top-tree job.
- `provideTxs`: supplies simulated txs and transitions the checkpoint job from registered → block-proving.
- `removeCheckpoint` (synchronous, idempotent): drops a single checkpoint by `(number, slot)` and cancels its sub-tree in fire-and-forget fashion. Tolerates re-adding the same checkpoint number under a different slot.
- `removeCheckpointsAfter`, `getCheckpointCount`, `getCheckpointNumbers`, `cancelPendingCheckpoints`.

### `prover-node`: `L2BlockStream`-driven checkpoint pipeline

The prover-node consumes `chain-checkpointed` / `chain-pruned` events from an `L2BlockStream` rooted at the first block of the first unproven epoch.

On each `chain-checkpointed`:

1. Resolve the epoch via `getEpochAtSlot`.
2. Get or create the per-epoch `EpochProvingJob`.
3. In a detached task, gather txs and register the checkpoint with the job.

On `chain-pruned`: call `removeCheckpointsAfter(threshold)` on every job whose first checkpoint sits at or above the threshold. Pending gather tasks are cancelled via `AbortSignal`.

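
The checkpoint bookkeeping described above (a map keyed by `${number}:${slot}`, idempotent removal, re-adding a number under a different slot, and `AbortSignal`-based cancellation on prune) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `CheckpointRegistry` and `CheckpointEntry` are hypothetical names, the real `CheckpointJob` carries far more state, and treating `removeCheckpointsAfter` as "strictly greater than the threshold" is a guess rather than something the commit message specifies.

```typescript
// Illustrative model only. CheckpointRegistry and CheckpointEntry are
// hypothetical names and are not the prover-node's actual classes.
interface CheckpointEntry {
  number: number;
  slot: number;
  // Lets us cancel any pending tx-gathering task for this checkpoint.
  abort: AbortController;
}

class CheckpointRegistry {
  private readonly jobs = new Map<string, CheckpointEntry>();

  private static key(number: number, slot: number): string {
    return `${number}:${slot}`;
  }

  /** Registers a checkpoint and returns the signal its gather task should observe. */
  registerCheckpoint(number: number, slot: number): AbortSignal {
    const key = CheckpointRegistry.key(number, slot);
    const existing = this.jobs.get(key);
    if (existing) {
      return existing.abort.signal;
    }
    const entry: CheckpointEntry = { number, slot, abort: new AbortController() };
    this.jobs.set(key, entry);
    return entry.abort.signal;
  }

  /** Idempotent: removing an unknown (number, slot) pair is a no-op. */
  removeCheckpoint(number: number, slot: number): void {
    const key = CheckpointRegistry.key(number, slot);
    const entry = this.jobs.get(key);
    if (!entry) {
      return;
    }
    entry.abort.abort();
    this.jobs.delete(key);
  }

  /** Assumption: "after" means a checkpoint number strictly greater than the threshold. */
  removeCheckpointsAfter(threshold: number): void {
    for (const entry of Array.from(this.jobs.values())) {
      if (entry.number > threshold) {
        this.removeCheckpoint(entry.number, entry.slot);
      }
    }
  }

  getCheckpointNumbers(): number[] {
    return Array.from(this.jobs.values())
      .map(entry => entry.number)
      .sort((a, b) => a - b);
  }
}

// Usage: remove a checkpoint twice, then re-add its number under a new slot.
const registry = new CheckpointRegistry();
const gatherSignal = registry.registerCheckpoint(5, 40);
registry.removeCheckpoint(5, 40);
registry.removeCheckpoint(5, 40); // second call is a no-op
registry.registerCheckpoint(5, 41); // same number, different slot
console.log(gatherSignal.aborted, registry.getCheckpointNumbers());
```

Keying on both number and slot is what lets a replacement checkpoint coexist with the cancellation of the one it replaces: the old entry's signal is aborted without touching the new entry.
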
Finalization is driven by the union of three signals: epoch-monitor sees the epoch close on L1, a checkpoint for a strictly later epoch arrives, or all expected checkpoints (per archiver) are registered while the epoch is already complete on L1.

### `prover-node`: reorg-after-finalization restart

When the `L2BlockStream` emits a prune that retroactively invalidates an epoch already in finalize, the prover-node aborts the in-flight publish, clears the job, and restarts proving from the new tip.

### e2e

- New `epochs_optimistic_proving.parallel.test.ts`: full e2e covering the pipelining, replacement-checkpoint reuse, and reorg-during-proving paths.
- `epochs_proof_fails`, `epochs_upload_failed_proof`, `epochs_long_proving_time`, `epochs_multi_proof` updated to assert the new in-flight epoch behaviour.

## What's removed

`EpochProver` interface and `ServerEpochProver` are removed: the prover-node no longer drives a single-class epoch prover, so the legacy API has no production callers. `ProvingOrchestrator` survives only as a base class for `CheckpointSubTreeOrchestrator` and as the single-class driver used by `prover-client`'s integration tests; it no longer implements `EpochProver`.

## Test plan

- `yarn workspace @aztec/prover-client test`: 261 tests pass.
- `yarn workspace @aztec/prover-node test`: 89 tests pass.
- e2e tests covering optimistic proving, reorgs during proving, failed proof publish, and multi-checkpoint flows are included in this PR.
1 parent 7342efc commit 093fdde

25 files changed

Lines changed: 4038 additions & 856 deletions

yarn-project/aztec-node/src/aztec-node/server.ts

Lines changed: 0 additions & 1 deletion
@@ -419,7 +419,6 @@ export class AztecNodeService implements AztecNode, AztecNodeAdmin, AztecNodeDeb
   async #getCheckpointContextsForBlocks(
     blocks: { checkpointNumber: CheckpointNumber }[],
     // TODO(palla): CheckpointNumber should be accepted by this lint rule
-    // eslint-disable-next-line aztec-custom/no-non-primitive-in-collections
   ): Promise<Map<CheckpointNumber, { l1?: L1PublishedData; attestations?: CommitteeAttestation[] } | undefined>> {
     const unique = Array.from(new Set(blocks.map(b => b.checkpointNumber)));
     const entries = await Promise.all(unique.map(async n => [n, await this.#getCheckpointContext(n)] as const));

yarn-project/end-to-end/bootstrap.sh

Lines changed: 3 additions & 1 deletion
@@ -33,10 +33,12 @@ function test_cmds {
     echo "$prefix:NAME=e2e_prover_full_fake FAKE_PROOFS=1 $run_test_script simple e2e_prover/full"
   fi
   echo "$prefix:TIMEOUT=15m:NAME=e2e_block_building $(set_dump_avm e2e_block_building) $run_test_script simple e2e_block_building"
+  echo "$prefix:TIMEOUT=15m:NAME=e2e_epochs/epochs_long_proving_time $run_test_script simple src/e2e_epochs/epochs_long_proving_time.test.ts"

   local tests=(
     # List all standalone and nested tests, except for the ones listed above.
-    src/e2e_!(prover)/*.test.ts
+    src/e2e_!(prover|epochs)/*.test.ts
+    src/e2e_epochs/!(epochs_long_proving_time).test.ts
     src/e2e_p2p/reqresp/*.test.ts
     src/e2e_!(block_building).test.ts
   )

yarn-project/end-to-end/src/e2e_epochs/epochs_long_proving_time.test.ts

Lines changed: 9 additions & 7 deletions
@@ -6,7 +6,9 @@ import { jest } from '@jest/globals';

 import { EpochsTestContext } from './epochs_test.js';

-jest.setTimeout(1000 * 60 * 10);
+jest.setTimeout(1000 * 60 * 15);
+
+const MAX_JOB_COUNT = 20;

 describe('e2e_epochs/epochs_long_proving_time', () => {
   let logger: Logger;
@@ -24,11 +26,14 @@ describe('e2e_epochs/epochs_long_proving_time', () => {
     const { aztecSlotDuration } = EpochsTestContext.getSlotDurations({ aztecEpochDuration });
     const epochDurationInSeconds = aztecSlotDuration * aztecEpochDuration;
     const proverTestDelayMs = (epochDurationInSeconds * 1000 * 3) / 4;
+    // Each epoch takes ~3 epochs to prove, so the broker needs to keep results for
+    // at least that many epochs to avoid rejecting jobs as stale.
     test = await EpochsTestContext.setup({
       aztecEpochDuration,
       aztecProofSubmissionEpochs: 1000, // Effectively don't re-org
       proverTestDelayMs,
-      proverNodeMaxPendingJobs: 1, // We test for only a single job at once
+      proverNodeMaxPendingJobs: MAX_JOB_COUNT, // Prove multiple epochs concurrently
+      proverBrokerMaxEpochsToKeepResultsFor: 10,
       enableProposerPipelining: true,
     });
     ({ logger, monitor, L1_BLOCK_TIME_IN_S } = test);
@@ -59,10 +64,7 @@ describe('e2e_epochs/epochs_long_proving_time', () => {
     // At least 3 epochs should have passed after the proven one (though we add a -1 just in case)
     expect(monitor.checkpointNumber).toBeGreaterThanOrEqual(targetProvenEpochs * test.epochDuration * 3 - 1);

-    // We expect maxJobCount to equal 1, since the prover node epoch monitor defines an epoch as ready to be proven
-    // only if the previous one has already been proven. We can relax this check if we want to support multiple epochs
-    // to be proven in parallel, in which case we should update the assertion below.
-    expect(maxJobCount).toEqual(1);
-    logger.info(`Test succeeded`);
+    expect(maxJobCount).toBeLessThanOrEqual(MAX_JOB_COUNT);
+    logger.info(`Test succeeded, max prover jobs ${maxJobCount}`);
   });
 });
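
The relaxed `maxJobCount` assertion presumes the test records the highest number of prover jobs seen in flight at once. How the test actually collects that value is outside the hunks shown here, so the following is only a sketch of one way such a peak could be tracked. `PeakCounter` is a hypothetical helper and is not part of the codebase.

```typescript
// Hypothetical helper, not part of the Aztec codebase: counts jobs in flight
// and remembers the highest value ever observed.
class PeakCounter {
  private current = 0;
  private peak = 0;

  /** Call when a job starts. */
  jobStarted(): void {
    this.current += 1;
    this.peak = Math.max(this.peak, this.current);
  }

  /** Call when a job settles, whether it succeeded or failed. */
  jobFinished(): void {
    this.current = Math.max(0, this.current - 1);
  }

  getPeak(): number {
    return this.peak;
  }
}

// Usage: two overlapping jobs followed by a third, non-overlapping one.
const counter = new PeakCounter();
counter.jobStarted();
counter.jobStarted();
counter.jobFinished();
counter.jobFinished();
counter.jobStarted();
counter.jobFinished();
console.log(counter.getPeak()); // 2
```

An upper-bound check such as `toBeLessThanOrEqual(MAX_JOB_COUNT)` then holds however many epochs happen to overlap in a given run, which suits a test whose concurrency depends on timing.
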

yarn-project/end-to-end/src/e2e_epochs/epochs_multi_proof.test.ts

Lines changed: 8 additions & 8 deletions
@@ -46,18 +46,18 @@ describe('e2e_epochs/epochs_multi_proof', () => {
     // This prevents the race condition where multiple provers submit to L1 at the same time
     test.proverNodes.forEach((proverAztecNode, index) => {
       const proverManager = proverAztecNode.getProverNode()!.getProver();
-      const origCreateEpochProver = proverManager.createEpochProver.bind(proverManager);
-      proverManager.createEpochProver = () => {
-        const epochProver = origCreateEpochProver();
-        const origFinalizeEpoch = epochProver.finalizeEpoch.bind(epochProver);
-        epochProver.finalizeEpoch = async () => {
-          const result = await origFinalizeEpoch();
+      const origCreateTopTree = proverManager.createTopTreeOrchestrator.bind(proverManager);
+      proverManager.createTopTreeOrchestrator = () => {
+        const topTree = origCreateTopTree();
+        const origProve = topTree.prove.bind(topTree);
+        topTree.prove = async (...args: Parameters<typeof origProve>) => {
+          const result = await origProve(...args);
           const sleepTime = index * 1000 * test.constants.ethereumSlotDuration;
-          logger.warn(`Delaying finalizeEpoch for prover node ${index} by ${sleepTime}ms`);
+          logger.warn(`Delaying top-tree prove for prover node ${index} by ${sleepTime}ms`);
           await sleep(sleepTime);
           return result;
         };
-        return epochProver;
+        return topTree;
       };
     });
