fix(archiver): prune blocks without proposed checkpoint by end of build slot by spalladino · Pull Request #23606 · AztecProtocol/aztec-packages

spalladino · 2026-05-27T20:41:04Z

When the previous proposer sent some block proposals but failed to send the corresponding checkpoint proposal, the current proposer would assume there was no proposed checkpoint to build on top of, yet would still use the proposed blocks as its chain tip. As a result, its canPropose check against the Rollup contract failed as soon as its slot started, because the previous proposer's blocks had left it with the wrong chain tip.

To fix this, the sequencer now accounts for proposed blocks that have no corresponding checkpoint, and it does not start building until that is resolved. In addition, the archiver now prunes proposed blocks without a checkpoint once the corresponding build slot is over.

Motivation

Under proposer pipelining a node can receive and re-execute the block-only proposals for a checkpoint before (or without ever) receiving the enclosing proposed checkpoint. This leaves the local tip one checkpoint ahead of the checkpointed tip with no proposed checkpoint backing it. A sequencer that then builds the next checkpoint on top of that orphan tip forks the chain off a parent no other node can follow, which was the root cause behind the sentinel CI flake.

Approach

Two complementary defenses. The sequencer's checkSync refuses to proceed when the synced block's checkpoint is ahead of the checkpointed tip and no matching proposed checkpoint exists, holding the line during the window before cleanup. The archiver adds a wall-clock orphan prune that, shortly after a block's build slot ends, removes a block-only tip whose checkpoint was never proposed, restoring liveness even while L1 is quiet.
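The sequencer-side guard can be pictured as a pure decision function over the sync state. This is a minimal sketch, not the real `checkSync`: the `SyncSnapshot` shape, its field names, and `canBuildOnTip` are hypothetical, and checkpoints are modeled as plain numbers.

```typescript
// Hypothetical snapshot of the state the sequencer inspects before building.
// Field names are illustrative; they are not the real sequencer-client types.
interface SyncSnapshot {
  // Checkpoint that the locally synced (possibly only proposed) block belongs to.
  syncedBlockCheckpoint: number;
  // Highest checkpoint that has landed on L1.
  checkpointedTip: number;
  // Latest proposed checkpoint seen over p2p, if any.
  proposedCheckpoint?: number;
}

type GuardResult = { ok: true } | { ok: false; reason: string };

// Refuse to build when the synced block sits in a checkpoint ahead of the
// checkpointed tip and no matching proposed checkpoint backs it (an orphan tip).
function canBuildOnTip(s: SyncSnapshot): GuardResult {
  const aheadOfCheckpointed = s.syncedBlockCheckpoint > s.checkpointedTip;
  const backedByProposal = s.proposedCheckpoint === s.syncedBlockCheckpoint;
  if (aheadOfCheckpointed && !backedByProposal) {
    return {
      ok: false,
      reason:
        `synced block is in checkpoint ${s.syncedBlockCheckpoint}, ahead of checkpointed tip ` +
        `${s.checkpointedTip}, with no matching proposed checkpoint`,
    };
  }
  return { ok: true };
}
```

An orphan tip such as `{ syncedBlockCheckpoint: 5, checkpointedTip: 4 }` is rejected, while the same tip with `proposedCheckpoint: 5` is allowed through, which matches the two sequencer test cases listed under Changes.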

Changes

- sequencer-client: `checkSync` rejects syncing onto a proposed block with no matching proposed-checkpoint tip/data, logging a descriptive warning.
- archiver: new `pruneOrphanProposedBlocks` on the L1 synchronizer, run from `Archiver.sync()` after the inbound queue drains and before L1 sync; it prunes after `start(blockSlot) + grace` using the epoch-cache pipelining offset and emits `L2PruneUncheckpointed`. The existing L1-sync prune is preserved (shared prune/emit helper).
- archiver/stdlib/foundation config: new `orphanProposedBlockPruneGraceSeconds` in `ArchiverSpecificConfig`, archiver config mappings (`ARCHIVER_ORPHAN_PROPOSED_BLOCK_PRUNE_GRACE_SECONDS`), `mapArchiverConfig`, the synchronizer/archiver config types, and a new `EnvVar`.
- aztec-node: defaults the grace window to `blockDurationMs / 1000` when unset, falling back to `MIN_EXECUTION_TIME`; the archiver factory also defaults to `MIN_EXECUTION_TIME`.
- sequencer-client (tests): orphan tip returns `undefined` and warns; matching proposed checkpoint proceeds.
- archiver (tests): no prune before grace; prune + event after grace; no prune when a matching proposed checkpoint exists; queued proposed checkpoint is processed before the prune.
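The prune deadline and the grace default described above come down to simple slot arithmetic. The sketch below assumes fixed-length slots numbered from an L1 genesis timestamp, assumes a block targeting `blockSlot` is built `pipeliningOffset` slots earlier, and treats `MIN_EXECUTION_TIME` as a number of seconds passed in as `minExecutionTimeSeconds` (its actual value is not stated here). All function names are hypothetical.

```typescript
// Hypothetical slot clock: fixed-length slots numbered from an L1 genesis timestamp.
interface SlotClock {
  genesisTimeSeconds: number;
  slotDurationSeconds: number;
}

function slotStartSeconds(clock: SlotClock, slot: number): number {
  return clock.genesisTimeSeconds + slot * clock.slotDurationSeconds;
}

// Grace default: an explicit setting wins, otherwise derive it from the block
// duration, otherwise fall back to the minimum execution time.
function resolveGraceSeconds(
  configuredSeconds: number | undefined,
  blockDurationMs: number | undefined,
  minExecutionTimeSeconds: number,
): number {
  if (configuredSeconds !== undefined) return configuredSeconds;
  if (blockDurationMs !== undefined) return blockDurationMs / 1000;
  return minExecutionTimeSeconds;
}

// Assumes a block targeting `blockSlot` is built during slot `blockSlot - pipeliningOffset`,
// so its build slot ends when the following slot starts. With an offset of 1 this
// equals start(blockSlot) + grace.
function orphanPruneDeadlineSeconds(
  clock: SlotClock,
  blockSlot: number,
  pipeliningOffset: number,
  graceSeconds: number,
): number {
  const buildSlot = blockSlot - pipeliningOffset;
  return slotStartSeconds(clock, buildSlot + 1) + graceSeconds;
}

// Prune only after the deadline, and never when a matching proposed checkpoint exists.
function shouldPruneOrphan(
  nowSeconds: number,
  deadlineSeconds: number,
  hasMatchingProposedCheckpoint: boolean,
): boolean {
  return !hasMatchingProposedCheckpoint && nowSeconds > deadlineSeconds;
}
```

With illustrative numbers (genesis at 1000, 72-second slots, an offset of 1, and `blockDurationMs = 8000`), a block for slot 10 becomes prunable after 1000 + 10 * 72 + 8 = 1728.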

AztecBot · 2026-05-27T21:06:48Z

Flakey Tests

🤖 says: This CI run detected 2 tests that failed, but were tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/b83902e17e7bb944): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_invalidate_block.parallel.test.ts "archiver skips a descendant of an invalid-attestations checkpoint" (226s) (code: 0) group:e2e-p2p-epoch-flakes
FLAKED (http://ci.aztec-labs.com/18798bcaff695f1b): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_invalidate_block.parallel.test.ts "proposer invalidates multiple checkpoints" (490s) (code: 0) group:e2e-p2p-epoch-flakes

Adds a multi-node e2e (`epochs_orphan_block_prune.test.ts`) that exercises both defenses from #23606 end-to-end: it picks consecutive distinct proposers P1/P2, makes P1 publish its block but withhold the matching CheckpointProposal, then asserts that every archiver (a) ingests the orphan block at slot S1, (b) prunes it via the wall-clock orphan prune, and (c) lets P2 rebuild block 1 at slot S2 with a checkpoint that lands on L1. To enable that scenario in single-block-per-checkpoint mode, adds a new test-only `skipBroadcastCheckpointProposal` sequencer config. It is a narrower variant of the existing `skipBroadcastProposals`: when set, the sequencer skips the CheckpointProposal gossip broadcast but still broadcasts the held last block standalone, so peers receive every block yet never see a proposed checkpoint.
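The difference between the two test-only flags can be sketched as a function that decides which gossip messages a proposer sends for one checkpoint. Only the flag names come from this PR; the message shapes and `planBroadcast` are hypothetical, and the sketch assumes that `skipBroadcastProposals` suppresses all proposal gossip and that the last block is held back to travel inside the CheckpointProposal.

```typescript
type GossipMessage =
  | { kind: "BlockProposal"; blockNumber: number }
  | { kind: "CheckpointProposal"; lastBlockNumber: number };

interface BroadcastFlags {
  skipBroadcastProposals?: boolean;
  skipBroadcastCheckpointProposal?: boolean;
}

// Earlier blocks are gossiped as standalone BlockProposals; the held last block
// normally goes out inside the CheckpointProposal.
function planBroadcast(blockNumbers: number[], flags: BroadcastFlags): GossipMessage[] {
  if (flags.skipBroadcastProposals || blockNumbers.length === 0) return [];
  const earlier = blockNumbers.slice(0, -1);
  const last = blockNumbers[blockNumbers.length - 1];
  const messages = earlier.map(
    (n): GossipMessage => ({ kind: "BlockProposal", blockNumber: n }),
  );
  if (flags.skipBroadcastCheckpointProposal) {
    // Peers still receive every block, but never see a proposed checkpoint.
    messages.push({ kind: "BlockProposal", blockNumber: last });
  } else {
    messages.push({ kind: "CheckpointProposal", lastBlockNumber: last });
  }
  return messages;
}
```

In single-block-per-checkpoint mode `earlier` is empty, so the narrower flag turns the single CheckpointProposal into a single standalone BlockProposal, which is exactly the orphan-producing behavior the e2e test needs.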

`pruneOrphanProposedBlocks` is wall-clock based and does not touch L1, so it belongs on the `Archiver` rather than the `ArchiverL1Synchronizer`. Moves the method (and its `epochCache` / `dateProvider` dependencies) onto the `Archiver`, called directly from `sync()` between `processInboundQueue()` and `syncFromL1()`. The synchronizer keeps the L1-block-driven `pruneUncheckpointedBlocks` (used to clear late stale blocks once L1 advances past their slot); its inline emit is now duplicated in both prune paths to keep them self-contained. No behavior change — verified by the existing orphan-prune unit tests in `archiver-sync.test.ts` and the full archiver suite.
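The resulting call order inside `Archiver.sync()` can be sketched as follows. `ArchiverSketch`, its synchronous step bodies, and the `steps` log are hypothetical stand-ins that only record ordering; the real methods are not reproduced here.

```typescript
// Minimal stand-in for the archiver that records the order of its sync steps.
class ArchiverSketch {
  readonly steps: string[] = [];

  // Drain p2p proposals first, so a queued proposed checkpoint is applied
  // before the orphan check runs.
  private processInboundQueue(): void {
    this.steps.push("processInboundQueue");
  }

  // Wall-clock based and does not read L1, which is why it sits on the archiver.
  private pruneOrphanProposedBlocks(): void {
    this.steps.push("pruneOrphanProposedBlocks");
  }

  // L1-driven sync, which keeps its own pruneUncheckpointedBlocks path.
  private syncFromL1(): void {
    this.steps.push("syncFromL1");
  }

  sync(): void {
    this.processInboundQueue();
    this.pruneOrphanProposedBlocks();
    this.syncFromL1();
  }
}
```

Running the orphan prune after the inbound queue is what makes the "queued proposed checkpoint is processed before the prune" test case hold.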

PhilWindle · 2026-05-29T08:41:15Z

+    // The L1 rollup contract only exposes proposers for epochs whose randao seed is "stable" (i.e. queryable on L1
+    // right now). When we look too far into the future the contract reverts with `ValidatorSelection__EpochNotStable`.
+    // We handle this by warping L1 forward one epoch at a time and retrying.
+    let S1: SlotNumber | undefined;


Not needed for this PR, but I feel like we have variations of this same code in many places.

Agreed. In between pipelining and inbox work, I want to set aside some time for e2e refactoring.
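A shared helper for the retry pattern quoted in the review could look roughly like this. It is a hypothetical sketch: `getProposer` and `warpOneEpoch` are injected callbacks standing in for the real rollup query and L1 cheat code, the revert is detected by matching `ValidatorSelection__EpochNotStable` in the error message, and the `maxWarps` cap is an illustrative safeguard not present in the quoted code.

```typescript
// Retry a proposer lookup, warping L1 forward one epoch whenever the rollup
// reports that the target epoch's randao seed is not yet stable.
async function withStableEpoch<T>(
  getProposer: () => Promise<T>,
  warpOneEpoch: () => Promise<void>,
  maxWarps = 5, // illustrative cap so a misconfigured test cannot loop forever
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await getProposer();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      const notStable = message.includes("ValidatorSelection__EpochNotStable");
      // Any other error, or running out of warps, is surfaced to the caller.
      if (!notStable || attempt >= maxWarps) throw err;
      await warpOneEpoch();
    }
  }
}
```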

…due (#23807)

## Motivation

The orphan-block guard in `checkSync` (added in #23606) was logging at `warn` on every non-proposer validator, ~once per second for a full slot, every slot. Under pipelining a node receives and re-executes a block proposal for the next checkpoint up to one slot before the matching checkpoint proposal arrives, so the world-state tip legitimately sits in an as-yet-unproposed checkpoint for that whole window. That is the happy path, not the abnormal "proposer published blocks but never the checkpoint" case the guard is meant to flag. Observed on `next-net`: 118 warnings in ~59s on a healthy validator for a single slot.

## Approach

The condition that distinguishes "checkpoint hasn't arrived yet" from "checkpoint will never arrive" is purely temporal, which is exactly what the archiver already computes in `pruneOrphanProposedBlocks` to decide when to prune an orphan block. The guard now reuses that same deadline: it still refuses to build (`return undefined`) whenever the orphan-shaped state holds, but only escalates to `warn` once the enclosing checkpoint is overdue by that deadline; within the normal pipelining window it logs at `debug`. The warn therefore fires at the same instant the archiver would prune the orphan.

## Changes

- **sequencer-client**: Add `isProposedCheckpointOverdue`, mirroring the archiver's orphan-prune deadline (`start of slot after the block's build slot + grace`, grace derived from `blockDurationMs` as the node wiring does). Gate the existing guard's log level on it: `warn` when overdue, `debug` otherwise. Control flow is unchanged.
- **sequencer-client (tests)**: Thread a real `blockSlot` through the orphan-guard test setup and split the warning test into an overdue case (expects `warn`) and a within-window case (expects no `warn`).
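The log-level gating from #23807 reduces to comparing the current time against the archiver's prune deadline. This is a minimal sketch assuming timestamps in seconds and fixed-length slots from a genesis timestamp; the `Logger` interface and `reportOrphanTip` are hypothetical, and `isProposedCheckpointOverdue` only borrows the name from the PR, not its real signature.

```typescript
interface Logger {
  warn(msg: string): void;
  debug(msg: string): void;
}

// Mirrors the archiver's orphan-prune deadline: start of the slot after the
// block's build slot, plus the grace window.
function isProposedCheckpointOverdue(
  nowSeconds: number,
  buildSlot: number,
  genesisTimeSeconds: number,
  slotDurationSeconds: number,
  graceSeconds: number,
): boolean {
  const deadline = genesisTimeSeconds + (buildSlot + 1) * slotDurationSeconds + graceSeconds;
  return nowSeconds > deadline;
}

// The guard refuses to build either way; only the log level depends on the deadline.
function reportOrphanTip(log: Logger, overdue: boolean, msg: string): undefined {
  if (overdue) {
    log.warn(msg);
  } else {
    log.debug(msg);
  }
  return undefined;
}
```

Because both sides use the same deadline, a healthy validator inside the pipelining window only emits `debug` lines, and the first `warn` coincides with the moment the archiver would prune the orphan.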

## Problem

CI on `merge-train/fairies` failed on the boxes `react chromium` test ([log](http://ci.aztec-labs.com/1780510430908759), [failing test](http://ci.aztec-labs.com/243e7294cb8ba269)) with a timeout (code 124). The actual error was during `aztec start` / `createLocalNetwork`:

```
Error: Transaction 0x0826… was dropped. Reason: Tx dropped by P2P node
    at NodeEmbeddedWallet.sendTx
    at DeployAccountMethod.send
    at deployFundedSchnorrAccounts
    at createLocalNetwork
    at aztecStart
```

The local network never came up, so the browser test timed out.

## Root cause

PR #23819 ("embedded wallet defaults to proposed") fixed the embedded wallet so its default wait status is *actually* `PROPOSED`; previously the default was a no-op that fell through to `waitForTx`'s `CHECKPOINTED` default. `PROPOSED` returns as soon as a tx lands in a proposed L2 block. In the serial sandbox setup that races against block pruning: a proposed-but-not-checkpointed block can be pruned by end of build slot (see #23606), and a tx in it is then neither in the archiver nor the pool, so `getTxReceipt` returns `DroppedTxReceipt("Tx dropped by P2P node")`. With the old broken default this path waited for `CHECKPOINTED` and was reliable. The real source of flakiness is the local network setup, not the boxes.

## Fix

Thread an explicit `{ waitForStatus: TxStatus.CHECKPOINTED }` wait through the sandbox-setup sends:

- `createLocalNetwork`: `deployFundedSchnorrAccounts`, `publishStandardAuthRegistry`, `setupBananaFPC`
- `setup-l2-contracts` CLI wait options

The intended product default of `PROPOSED` for normal wallet usage is unchanged; only the CI/sandbox bring-up that needs durable inclusion before the next serial tx is pinned to `CHECKPOINTED`. e2e fixtures use `TestWallet` (BaseWallet's `CHECKPOINTED` default) and are unaffected.

Also reverts the per-box `CHECKPOINTED` waits that #23819 added to the react/vite/vanilla boxes: they didn't fix the flakiness (the local-network setup did), so the box sends go back to using the embedded wallet `PROPOSED` default.

## Verification

TypeScript-only change in `yarn-project` plus box reverts; the box files now match their pre-#23819 state exactly. A full `./bootstrap.sh ci` could not be run in this container (clang 18 vs required 20, zig missing, no remote build cache; the suite is multi-hour). Confirmed by the merge-train CI re-run of the boxes tests.

BEGIN_COMMIT_OVERRIDE
test(e2e): unskip pipelining related e2e tests (AztecProtocol#23642)
fix(archiver): prune blocks without proposed checkpoint by end of build slot (AztecProtocol#23606)
test: migrate benchmarks to pipelining setup (AztecProtocol#23647)
fix(p2p): fall back to archiver in BLOCK_TXS response validation (AztecProtocol#23624)
docs(slashing): align operator and slasher docs with AZIP-7 (AztecProtocol#23494)
fix(p2p): do not penalize peers that signal a missing block with Fr.ZERO (AztecProtocol#23672)
chore: adjust metrics deployment (AztecProtocol#23676)
fix(cheat-codes): warpL2TimeAtLeastBy advances relative to leading clock (AztecProtocol#23675)
chore: tighten node pool sizes (AztecProtocol#23678)
chore: remove archival nodes (AztecProtocol#23630)
chore: merge blob sink duties into RPC node (AztecProtocol#23631)
fix: sync avm-transpiler Cargo.lock with noir submodule (AztecProtocol#23683)
fix(spartan): set validator lag env vars in tps-scenario (AztecProtocol#23684)
fix: make world-state hash queries reorg-aware to close getWorldState race (AztecProtocol#23677)
fix: pin noir submodule to next's version on merge-train/spartan (AztecProtocol#23690)
fix: ensure image ref is used by bench runner (AztecProtocol#23682)
fix(ci): retry aztec-nr nargo dependency clone on transient network flake (AztecProtocol#23653)
chore: run one-off jobs on network nodes (AztecProtocol#23701)
fix: simulate proposals inside target slot (AztecProtocol#23692)
chore: smaller eth-devnet (AztecProtocol#23704)
chore: enable testnet autoscaling (AztecProtocol#23705)
feat(api)!: redesign node log retrieval API around tag-based queries (AztecProtocol#23625)
fix(sequencer): set own proposed checkpoint locally instead of via p2p loopback (AztecProtocol#23659)
END_COMMIT_OVERRIDE

fix: prevent building on orphan proposed blocks

2f77221

spalladino added the `ci-no-fail-fast` label (sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) May 27, 2026

spalladino added 4 commits May 28, 2026 10:37

docs(archiver): document wall-clock orphan-block prune

516de43

fix: archiver prune condition

0c5e020

spalladino force-pushed the spl/fix-missing-checkpoint-proposal branch from d6d80f9 to 0c5e020 Compare May 28, 2026 15:27

test: tighten orphan block prune e2e assertions

4c8655a

spalladino changed the title ~~fix: prevent building on orphan proposed blocks~~ fix(archiver): prune blocks without proposed checkpoint by end of build slot May 28, 2026

PhilWindle reviewed May 29, 2026


PhilWindle approved these changes May 29, 2026


PhilWindle merged commit a612452 into merge-train/spartan May 29, 2026
17 checks passed

PhilWindle deleted the spl/fix-missing-checkpoint-proposal branch May 29, 2026 08:49

AztecBot mentioned this pull request May 29, 2026

feat: merge-train/spartan #23671

Merged

spalladino mentioned this pull request Jun 2, 2026

fix(sequencer): only warn about missing proposed checkpoint once overdue #23807

Merged

AztecBot mentioned this pull request Jun 3, 2026

fix: wait for checkpoint during sandbox setup #23834

Merged

PhilWindle merged 6 commits into merge-train/spartan from spl/fix-missing-checkpoint-proposal
