Commit 3694a9b

feat(prover-node): checkpoint-driven optimistic proving
Moves the prover-node onto the split `CheckpointSubTreeOrchestrator` + `TopTreeOrchestrator` pair and introduces checkpoint-driven proving, which pipelines sub-tree proving against tx gathering and top-tree proving against the in-flight sub-trees.

## What's new

### `prover-node`: `EpochProvingJob` job-model rewrite

`EpochProvingJob` becomes an orchestrator over a `Map<string, CheckpointJob>` keyed by `${number}:${slot}`. Each `CheckpointJob` owns a single `CheckpointSubTreeOrchestrator` with its own per-checkpoint context (txs, attestations, previous-block header, L1-to-L2 messages, archive sibling path). A `TopTreeJob` drives the epoch root rollup once all checkpoint sub-trees have started block-level proving.

Public API:

- `registerCheckpoint`: synchronous; sets up the sub-tree, kicks off the chonk-verifier cache fill, and attaches the `blockProofs` promise to the eventual top-tree job.
- `provideTxs`: supplies the simulated txs and transitions the checkpoint job from registered to block-proving.
- `removeCheckpoint`: synchronous and idempotent; drops a single checkpoint by `(number, slot)` and cancels its sub-tree fire-and-forget. Re-adding the same checkpoint number under a different slot is supported.
- `removeCheckpointsAfter`, `getCheckpointCount`, `getCheckpointNumbers`, `cancelPendingCheckpoints`.

### `prover-node`: `L2BlockStream`-driven checkpoint pipeline

The prover-node consumes `chain-checkpointed` and `chain-pruned` events from an `L2BlockStream` rooted at the first block of the first unproven epoch. On each `chain-checkpointed` event:

1. Resolve the epoch via `getEpochAtSlot`.
2. Get or create the per-epoch `EpochProvingJob`.
3. In a detached task, gather the txs and register the checkpoint with the job.

On `chain-pruned`, call `removeCheckpointsAfter(threshold)` on every job whose first checkpoint sits at or above the threshold. Pending gather tasks are cancelled via `AbortSignal`.
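The keyed-map job model of `EpochProvingJob` can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the prover-node implementation: `CheckpointRegistrySketch`, `CheckpointJobSketch`, and the `cancel` hook are hypothetical names, the sub-tree orchestrator is reduced to a state flag, and `removeAfter(threshold)` is assumed to drop checkpoints numbered strictly above the threshold. It shows how keying on `${number}:${slot}` makes removal idempotent and lets a replacement checkpoint with the same number but a different slot be registered as a fresh entry.

```typescript
// Hypothetical, simplified stand-in for a per-checkpoint job. In the commit, each
// CheckpointJob owns a CheckpointSubTreeOrchestrator; here it is just a state flag.
type CheckpointState = 'registered' | 'block-proving' | 'cancelled';

interface CheckpointJobSketch {
  readonly number: number;
  readonly slot: number;
  state: CheckpointState;
  cancel(): Promise<void>;
}

// Key format from the commit description: `${number}:${slot}`.
function checkpointKey(number: number, slot: number): string {
  return `${number}:${slot}`;
}

class CheckpointRegistrySketch {
  private readonly jobs = new Map<string, CheckpointJobSketch>();

  // Synchronous: creates the job, or returns the existing one for the same key.
  register(number: number, slot: number): CheckpointJobSketch {
    const key = checkpointKey(number, slot);
    const existing = this.jobs.get(key);
    if (existing) {
      return existing;
    }
    const job: CheckpointJobSketch = {
      number,
      slot,
      state: 'registered',
      // Placeholder for cancelling the checkpoint's sub-tree orchestrator.
      cancel: async () => {
        job.state = 'cancelled';
      },
    };
    this.jobs.set(key, job);
    return job;
  }

  // Moves a registered checkpoint into block-level proving once its txs are supplied.
  provideTxs(number: number, slot: number): void {
    const job = this.jobs.get(checkpointKey(number, slot));
    if (job && job.state === 'registered') {
      job.state = 'block-proving';
    }
  }

  // Synchronous and idempotent: removing an unknown key is a no-op. Cancellation is
  // fire-and-forget, so the caller never awaits the sub-tree shutdown.
  remove(number: number, slot: number): void {
    const key = checkpointKey(number, slot);
    const job = this.jobs.get(key);
    if (!job) {
      return;
    }
    this.jobs.delete(key);
    void job.cancel().catch(() => {});
  }

  // Assumption: "after" means checkpoint numbers strictly greater than the threshold.
  removeAfter(threshold: number): void {
    for (const job of Array.from(this.jobs.values())) {
      if (job.number > threshold) {
        this.remove(job.number, job.slot);
      }
    }
  }

  getCheckpointNumbers(): number[] {
    return Array.from(this.jobs.values())
      .map(job => job.number)
      .sort((a, b) => a - b);
  }
}
```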
Finalization triggers on whichever of three signals comes first:

- the epoch monitor sees the epoch close on L1;
- a checkpoint for a strictly later epoch arrives;
- all checkpoints the archiver expects for the epoch are registered while the epoch is already complete on L1.

### `prover-node`: reorg-after-finalization restart

When the `L2BlockStream` emits a prune that retroactively invalidates an epoch that is already being finalized, the prover-node aborts the in-flight publish, clears the job, and restarts proving from the new tip.

### e2e

- New `epochs_optimistic_proving.parallel.test.ts`: full e2e coverage of the pipelining, replacement-checkpoint reuse, and reorg-during-proving paths.
- `epochs_proof_fails`, `epochs_upload_failed_proof`, `epochs_long_proving_time`, and `epochs_multi_proof` are updated to assert the new in-flight epoch behaviour.

## What's removed

The `EpochProver` interface and `ServerEpochProver` are removed: the prover-node no longer drives a single-class epoch prover, so the legacy API has no production callers. `ProvingOrchestrator` survives only as a base class for `CheckpointSubTreeOrchestrator` and as the single-class driver used by `prover-client`'s integration tests; it no longer implements `EpochProver`.

## Test plan

- `yarn workspace @aztec/prover-client test`: 261 tests pass.
- `yarn workspace @aztec/prover-node test`: 89 tests pass.
- e2e tests covering optimistic proving, reorgs during proving, failed proof publishing, and multi-checkpoint flows are included in this PR.
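The finalization rule of the `L2BlockStream`-driven pipeline is a disjunction: any one of the three signals is enough. A minimal sketch as a pure predicate, assuming hypothetical field names (`FinalizationSignals`, `epochClosedOnL1`, and so on) that do not come from the prover-node source:

```typescript
// Hypothetical snapshot of what a per-epoch job knows; field names are illustrative.
interface FinalizationSignals {
  // The epoch monitor has observed the epoch closing on L1.
  epochClosedOnL1: boolean;
  // A checkpoint belonging to a strictly later epoch has arrived on the block stream.
  laterEpochCheckpointSeen: boolean;
  // Number of checkpoints registered with the job so far.
  registeredCheckpoints: number;
  // Number of checkpoints the archiver reports for the epoch, if known.
  expectedCheckpoints: number | undefined;
  // Whether the epoch is already complete on L1 (a checked state, not an event).
  epochCompleteOnL1: boolean;
}

// Finalize as soon as any one of the three signals holds.
function shouldFinalize(s: FinalizationSignals): boolean {
  const allExpectedRegistered =
    s.expectedCheckpoints !== undefined &&
    s.registeredCheckpoints >= s.expectedCheckpoints &&
    s.epochCompleteOnL1;
  return s.epochClosedOnL1 || s.laterEpochCheckpointSeen || allExpectedRegistered;
}
```

Treating the third signal as "all expected checkpoints registered *and* epoch complete" avoids finalizing early when the archiver's expected count is still growing during the epoch.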
1 parent da67473 commit 3694a9b

22 files changed

Lines changed: 3336 additions & 856 deletions

yarn-project/end-to-end/bootstrap.sh

Lines changed: 3 additions & 1 deletion
@@ -33,10 +33,12 @@ function test_cmds {
     echo "$prefix:NAME=e2e_prover_full_fake FAKE_PROOFS=1 $run_test_script simple e2e_prover/full"
   fi
   echo "$prefix:TIMEOUT=15m:NAME=e2e_block_building $(set_dump_avm e2e_block_building) $run_test_script simple e2e_block_building"
+  echo "$prefix:TIMEOUT=15m:NAME=e2e_epochs/epochs_long_proving_time $run_test_script simple src/e2e_epochs/epochs_long_proving_time.test.ts"
 
   local tests=(
     # List all standalone and nested tests, except for the ones listed above.
-    src/e2e_!(prover)/*.test.ts
+    src/e2e_!(prover|epochs)/*.test.ts
+    src/e2e_epochs/!(epochs_long_proving_time).test.ts
     src/e2e_p2p/reqresp/*.test.ts
     src/e2e_!(block_building).test.ts
   )

yarn-project/end-to-end/src/e2e_epochs/epochs_long_proving_time.test.ts

Lines changed: 6 additions & 6 deletions
@@ -6,7 +6,7 @@ import { jest } from '@jest/globals';
 
 import { EpochsTestContext } from './epochs_test.js';
 
-jest.setTimeout(1000 * 60 * 10);
+jest.setTimeout(1000 * 60 * 15);
 
 describe('e2e_epochs/epochs_long_proving_time', () => {
   let logger: Logger;
@@ -24,11 +24,14 @@ describe('e2e_epochs/epochs_long_proving_time', () => {
     const { aztecSlotDuration } = EpochsTestContext.getSlotDurations({ aztecEpochDuration });
     const epochDurationInSeconds = aztecSlotDuration * aztecEpochDuration;
     const proverTestDelayMs = (epochDurationInSeconds * 1000 * 3) / 4;
+    // Each epoch takes ~3 epochs to prove, so the broker needs to keep results for
+    // at least that many epochs to avoid rejecting jobs as stale.
     test = await EpochsTestContext.setup({
       aztecEpochDuration,
       aztecProofSubmissionEpochs: 1000, // Effectively don't re-org
       proverTestDelayMs,
-      proverNodeMaxPendingJobs: 1, // We test for only a single job at once
+      proverNodeMaxPendingJobs: 2,
+      proverBrokerMaxEpochsToKeepResultsFor: 10,
       enableProposerPipelining: true,
     });
     ({ logger, monitor, L1_BLOCK_TIME_IN_S } = test);
@@ -59,10 +62,7 @@ describe('e2e_epochs/epochs_long_proving_time', () => {
     // At least 3 epochs should have passed after the proven one (though we add a -1 just in case)
     expect(monitor.checkpointNumber).toBeGreaterThanOrEqual(targetProvenEpochs * test.epochDuration * 3 - 1);
 
-    // We expect maxJobCount to equal 1, since the prover node epoch monitor defines an epoch as ready to be proven
-    // only if the previous one has already been proven. We can relax this check if we want to support multiple epochs
-    // to be proven in parallel, in which case we should update the assertion below.
-    expect(maxJobCount).toEqual(1);
+    expect(maxJobCount).toBeLessThanOrEqual(2);
     logger.info(`Test succeeded`);
   });
 });
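The relaxed assertion pairs with `proverNodeMaxPendingJobs: 2`: with at most two epoch jobs in flight, the observed peak (`maxJobCount`) can be 1 or 2. The hunk does not show how the test measures `maxJobCount`, so the following is only a hypothetical sketch (`BoundedJobRunner` is not a prover-node class) of a limiter that caps concurrent jobs and records the peak it reached:

```typescript
// Hypothetical limiter: runs at most `maxPending` jobs at once and records the peak.
class BoundedJobRunner {
  private running = 0;
  private readonly waiters: Array<() => void> = [];
  public peak = 0;

  constructor(private readonly maxPending: number) {}

  async run<T>(job: () => Promise<T>): Promise<T> {
    // Re-check after every wake-up: a newcomer may have taken the freed slot first.
    while (this.running >= this.maxPending) {
      await new Promise<void>(resolve => this.waiters.push(resolve));
    }
    this.running++;
    this.peak = Math.max(this.peak, this.running);
    try {
      return await job();
    } finally {
      this.running--;
      // Wake one waiter, if any, now that a slot is free.
      this.waiters.shift()?.();
    }
  }
}
```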

yarn-project/end-to-end/src/e2e_epochs/epochs_multi_proof.test.ts

Lines changed: 8 additions & 8 deletions
@@ -46,18 +46,18 @@ describe('e2e_epochs/epochs_multi_proof', () => {
     // This prevents the race condition where multiple provers submit to L1 at the same time
     test.proverNodes.forEach((proverAztecNode, index) => {
       const proverManager = proverAztecNode.getProverNode()!.getProver();
-      const origCreateEpochProver = proverManager.createEpochProver.bind(proverManager);
-      proverManager.createEpochProver = () => {
-        const epochProver = origCreateEpochProver();
-        const origFinalizeEpoch = epochProver.finalizeEpoch.bind(epochProver);
-        epochProver.finalizeEpoch = async () => {
-          const result = await origFinalizeEpoch();
+      const origCreateTopTree = proverManager.createTopTreeOrchestrator.bind(proverManager);
+      proverManager.createTopTreeOrchestrator = () => {
+        const topTree = origCreateTopTree();
+        const origProve = topTree.prove.bind(topTree);
+        topTree.prove = async (...args: Parameters<typeof origProve>) => {
+          const result = await origProve(...args);
           const sleepTime = index * 1000 * test.constants.ethereumSlotDuration;
-          logger.warn(`Delaying finalizeEpoch for prover node ${index} by ${sleepTime}ms`);
+          logger.warn(`Delaying top-tree prove for prover node ${index} by ${sleepTime}ms`);
           await sleep(sleepTime);
           return result;
         };
-        return epochProver;
+        return topTree;
       };
     });
 
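The hunk wraps a factory method so that every orchestrator it creates gets a delayed async `prove`, staggering each prover node's L1 submission by `index * ethereumSlotDuration` seconds. A self-contained sketch of the same wrapping pattern, written against hypothetical `ProverManagerLike` and `TopTreeLike` types that stand in for the real prover and `TopTreeOrchestrator` (whose `prove` signature the hunk does not show):

```typescript
// Hypothetical minimal types standing in for the prover manager and top-tree orchestrator.
interface TopTreeLike {
  prove(epochNumber: number): Promise<string>;
}

interface ProverManagerLike {
  createTopTreeOrchestrator(): TopTreeLike;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Replace the factory so each orchestrator it returns delays after `prove` resolves.
function delayTopTreeProve(manager: ProverManagerLike, delayMs: number): void {
  const origCreate = manager.createTopTreeOrchestrator.bind(manager);
  manager.createTopTreeOrchestrator = () => {
    const topTree = origCreate();
    const origProve = topTree.prove.bind(topTree);
    // Forward whatever arguments the original method takes, then hold the result back.
    topTree.prove = async (...args: Parameters<typeof origProve>) => {
      const result = await origProve(...args);
      await sleep(delayMs);
      return result;
    };
    return topTree;
  };
}
```

Patching the factory rather than a single instance matters here because the prover-node now creates a fresh top-tree orchestrator per epoch job, so an instance-level patch would only affect one epoch.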