test(e2e): pick bad slots upfront and warp to them in proposer invalidates multiple checkpoints (#24017)
Fixes a flake in `proposer invalidates multiple checkpoints`
(`e2e_epochs/epochs_invalidate_block.parallel.test.ts`) reported on
`v5-next`: [failed run](http://ci.aztec-labs.com/e4076dd86c434c6f).
Replaces #24016 (was based on `merge-train/spartan`; this one targets
the v5 line where the flake fired and restructures the test instead of
just resizing the timeout).
## Root cause of the flake
The failure was `TimeoutError: Operation timed out after 256000ms`, thrown by
the bare 8-slot `timeoutPromise` that waits for the two bad checkpoints. The bad-slot
search from #23608 rejects any candidate pair whose proposer also owns
an earlier un-snapshotted pipelined slot, and the rejection window grows
with each attempt. In the failed run the current slot was 21 and the
search rejected (24,25)…(29,30) before accepting slots **30/31** — 9–10
slots out. The fixed 256s wait expired at 22:48:55, before slot 30 even
began (~22:49:00), while the chain healthily mined checkpoints at slots
22–28 underneath; the run was unwinnable at selection time. The race's
`.then(() => [CheckpointNumber(0), …])` fallback was also dead code:
`timeoutPromise` rejects rather than resolves, so the `.then` callback never runs.
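The timing argument can be checked with a little arithmetic. The sketch below is illustrative only: `SLOT_MS` is derived from the numbers above (256000 ms over 8 slots), `waitCanReachSlot` is a hypothetical helper that does not exist in the test, and it assumes the wait started somewhere inside slot 21. It checks only a necessary condition: whether the wait is still running when the target slot begins.

```typescript
// Slot length implied by the timeout above: 256000 ms spread over 8 slots.
const SLOT_MS = 256_000 / 8;

/**
 * Hypothetical helper (not part of the test): returns true if a wait of
 * `timeoutSlots` slots that starts somewhere inside `currentSlot` can still be
 * running when `targetSlot` begins. Necessary, not sufficient, for observing a
 * checkpoint mined in `targetSlot`.
 */
function waitCanReachSlot(currentSlot: number, targetSlot: number, timeoutSlots: number): boolean {
  // Whole slots that must elapse after `currentSlot` ends and before `targetSlot` begins.
  const fullSlotsBetween = targetSlot - currentSlot - 1;
  // The remainder of `currentSlot` must also elapse, so the timeout has to be
  // strictly longer than the whole slots in between.
  return timeoutSlots > fullSlotsBetween;
}

// Failed run: wait assumed to start in slot 21, bad slots 30/31, 8-slot timeout.
const reachesFirstBadSlot = waitCanReachSlot(21, 30, 8);
console.log(`slot = ${SLOT_MS} ms, reaches slot 30: ${reachesFirstBadSlot}`);
```

Under these assumptions the 8-slot wait cannot reach even the start of slot 30, which matches the observed expiry a few seconds before that slot began.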
## Fix: search first, then warp
Instead of starting the sequencers and waiting in real time for whatever
slots the search lands on:
- With sequencers stopped, search for a `warpSlot` such that the
proposers of the three lead-in slots `warpSlot+1..warpSlot+3` are not
the proposers of the bad slots `warpSlot+4`/`warpSlot+5`. A far-away
candidate now costs a warp instead of a real-time wait, and
`EpochNotStable` during the search is handled by warping forward one
epoch (same pattern as the `archiver skips a descendant` test in this
file).
- Warp to one L1 block before `warpSlot`, so sequencers get a full L2
slot to boot before the first pipelined build window we rely on (end of
`warpSlot`, targeting `warpSlot+1`).
- Start the sequencers and wait for the first good checkpoint (lands at
`warpSlot`, or up to `warpSlot+2` on a slow start).
- Apply the malicious config to the bad-slot proposers. The three good
lead-in slots guarantee no pipelined job before `badSlot1` can snapshot
it, since jobs snapshot config during the last L1 slot of the previous
L2 slot.
- Fail fast with a clear assertion if config application was somehow
late enough to reach `badSlot1`'s build window, rather than timing out
opaquely.
- The 8-slot wait for the bad checkpoints is now correctly sized by
construction (`badSlot2` is at most ~6 slots from the wait start), and
gets a descriptive timeout message.
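The selection rule from the first step can be sketched as a pure function. This is a minimal illustration rather than the test's actual code: `findWarpSlot`, `BadSlotPlan`, the `proposerAt` callback, and `maxCandidates` are all hypothetical names, and the real search additionally handles `EpochNotStable` by warping forward one epoch, which is omitted here.

```typescript
type Slot = number;
type Address = string;

/** Hypothetical result shape for this sketch; not the test's real types. */
interface BadSlotPlan {
  warpSlot: Slot;
  leadInSlots: Slot[];
  badSlots: [Slot, Slot];
}

/**
 * Scans candidates from `startSlot` and returns the first `warpSlot` whose
 * three lead-in proposers are all different from both bad-slot proposers.
 * `proposerAt` stands in for whatever the test uses to look up a slot's proposer.
 */
function findWarpSlot(
  startSlot: Slot,
  proposerAt: (slot: Slot) => Address,
  maxCandidates = 64,
): BadSlotPlan {
  for (let warpSlot = startSlot; warpSlot < startSlot + maxCandidates; warpSlot++) {
    const leadInSlots = [warpSlot + 1, warpSlot + 2, warpSlot + 3];
    const badSlots: [Slot, Slot] = [warpSlot + 4, warpSlot + 5];
    const badProposers = new Set(badSlots.map(proposerAt));
    // Reject the candidate if any lead-in proposer also proposes a bad slot.
    const overlaps = leadInSlots.some(slot => badProposers.has(proposerAt(slot)));
    if (!overlaps) {
      return { warpSlot, leadInSlots, badSlots };
    }
  }
  throw new Error(`No suitable warp slot within ${maxCandidates} candidates of slot ${startSlot}`);
}

// Toy proposer schedule: one letter per slot, repeating.
const schedule = 'abcabcaded';
const plan = findWarpSlot(0, slot => schedule.charAt(slot % schedule.length));
console.log(plan.warpSlot, plan.badSlots);
```

Because the sequencers are stopped while this runs, a rejected candidate only moves the warp target further out; it does not lengthen the real-time wait that follows.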
In the worst case, the wait phase is bounded at ~6 slots regardless of how many
candidates the search rejects. Previously, each rejected candidate
pushed the bad checkpoints one slot further past the fixed timeout.
---
*Created by
[claudebox](https://claudebox.work/v2/sessions/d509a218614bf4ac) ·
group: `slackbot`*