
fix: prevent building on orphan proposed blocks #23606

Open
spalladino wants to merge 1 commit into merge-train/spartan from spl/fix-missing-checkpoint-proposal


@spalladino
Contributor

Motivation

Under proposer pipelining a node can receive and reexecute the block-only proposals for a checkpoint before (or without ever) receiving the enclosing proposed checkpoint. This leaves the local tip one checkpoint ahead of the checkpointed tip with no proposed checkpoint backing it. A sequencer that then builds the next checkpoint on top of that orphan tip forks the chain off a parent no other node can follow, which was the root cause behind the sentinel CI flake.

Approach

Two complementary defenses. The sequencer's checkSync refuses to proceed when the synced block's checkpoint is ahead of the checkpointed tip and no matching proposed checkpoint exists, holding the line during the window before cleanup. The archiver adds a wall-clock orphan prune that, shortly after a block's build slot ends, removes a block-only tip whose checkpoint was never proposed, restoring liveness even while L1 is quiet.
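
The first defense can be pictured with a minimal sketch. This is not the PR's code: the `SyncedTip` and `ProposedCheckpoint` shapes, the `isOrphanProposedTip` helper, and the `checkSyncSketch` signature are all hypothetical, chosen only to illustrate the condition described above (synced checkpoint ahead of the checkpointed tip, and no matching proposed checkpoint).

```typescript
// Hypothetical, simplified shapes; the real types in the PR are not shown here.
interface SyncedTip {
  blockNumber: number;
  checkpointNumber: number; // checkpoint the synced block belongs to
}

interface ProposedCheckpoint {
  checkpointNumber: number;
}

// True when the synced block sits in a checkpoint ahead of the checkpointed
// tip and no proposed checkpoint with that number is known locally.
function isOrphanProposedTip(
  synced: SyncedTip,
  checkpointedTipNumber: number,
  proposed: ProposedCheckpoint | undefined,
): boolean {
  const ahead = synced.checkpointNumber > checkpointedTipNumber;
  const backed =
    proposed !== undefined &&
    proposed.checkpointNumber === synced.checkpointNumber;
  return ahead && !backed;
}

// Sketch of the guard: return undefined (do not build) on an orphan tip,
// otherwise hand the synced tip back to the caller.
function checkSyncSketch(
  synced: SyncedTip,
  checkpointedTipNumber: number,
  proposed: ProposedCheckpoint | undefined,
  warn: (msg: string) => void = console.warn,
): SyncedTip | undefined {
  if (isOrphanProposedTip(synced, checkpointedTipNumber, proposed)) {
    warn(
      `Synced block ${synced.blockNumber} belongs to checkpoint ` +
        `${synced.checkpointNumber}, which has no matching proposed checkpoint`,
    );
    return undefined;
  }
  return synced;
}
```

Under these assumptions, a tip in checkpoint 5 with the checkpointed tip at 4 is rejected unless a proposed checkpoint 5 is present, which mirrors the two sequencer-client test cases listed under Changes.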

Changes

  • sequencer-client: checkSync rejects syncing onto a proposed block with no matching proposed-checkpoint tip/data, logging a descriptive warning.
  • archiver: new pruneOrphanProposedBlocks on the L1 synchronizer, run from Archiver.sync() after the inbound queue drains and before L1 sync; prunes after start(blockSlot) + grace using the epoch-cache pipelining offset and emits L2PruneUncheckpointed. The existing L1-sync prune is preserved (shared prune/emit helper).
  • archiver/stdlib/foundation config: new orphanProposedBlockPruneGraceSeconds in ArchiverSpecificConfig, archiver config mappings (ARCHIVER_ORPHAN_PROPOSED_BLOCK_PRUNE_GRACE_SECONDS), mapArchiverConfig, the synchronizer/archiver config types, and a new EnvVar.
  • aztec-node: defaults the grace window from blockDurationMs / 1000 when unset, falling back to MIN_EXECUTION_TIME; the archiver factory also defaults to MIN_EXECUTION_TIME.
  • sequencer-client (tests): orphan tip returns undefined and warns; matching proposed checkpoint proceeds.
  • archiver (tests): no prune before grace; prune + event after grace; no prune when a matching proposed checkpoint exists; queued proposed checkpoint is processed before the prune.
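
The timing rule and the grace-window defaulting from the list above can be sketched roughly as follows. Everything here is illustrative: `resolveGraceSeconds`, `slotStartSeconds`, and `shouldPruneOrphan` are hypothetical names, the fixed-length slot arithmetic is an assumption, the epoch-cache pipelining offset is left out, and `minExecutionTimeSeconds` stands in for `MIN_EXECUTION_TIME`, whose value is not stated in this description.

```typescript
// Resolve the grace window in seconds: explicit config first, then the block
// duration, then a stand-in for MIN_EXECUTION_TIME (value is an assumption
// supplied by the caller).
function resolveGraceSeconds(
  configured: number | undefined,
  blockDurationMs: number | undefined,
  minExecutionTimeSeconds: number,
): number {
  if (configured !== undefined) return configured;
  if (blockDurationMs !== undefined) return blockDurationMs / 1000;
  return minExecutionTimeSeconds;
}

// Hypothetical slot timing: fixed-length slots counted from a genesis time.
function slotStartSeconds(
  slot: number,
  genesisSeconds: number,
  slotDurationSeconds: number,
): number {
  return genesisSeconds + slot * slotDurationSeconds;
}

// Decide whether a block-only tip should be pruned at wall-clock time `now`.
// A tip backed by a matching proposed checkpoint is never pruned here.
function shouldPruneOrphan(args: {
  nowSeconds: number;
  blockSlot: number;
  genesisSeconds: number;
  slotDurationSeconds: number;
  graceSeconds: number;
  hasMatchingProposedCheckpoint: boolean;
}): boolean {
  if (args.hasMatchingProposedCheckpoint) return false;
  const deadline =
    slotStartSeconds(args.blockSlot, args.genesisSeconds, args.slotDurationSeconds) +
    args.graceSeconds;
  // Strictly after the deadline; the exact boundary behaviour is an assumption.
  return args.nowSeconds > deadline;
}
```

Because the decision depends only on the wall clock and local state, a check of this shape can fire while L1 is quiet, which is the liveness property the Approach section describes.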

@spalladino added the ci-no-fail-fast label (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) on May 27, 2026
@AztecBot
Collaborator

Flakey Tests

🤖 says: This CI run detected 1 test that failed, but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/bf10bc4debb0b908): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_p2p/sentinel_status_slash.parallel.test.ts "slashes the proposer with INACTIVITY when checkpoint validation records unvalidated" (215s) (code: 0) group:e2e-p2p-epoch-flakes
