Commit 227a74e

test(e2e): pick bad slots upfront and warp to them in proposer invalidates multiple checkpoints (#24017)
Fixes a flake in `proposer invalidates multiple checkpoints` (`e2e_epochs/epochs_invalidate_block.parallel.test.ts`) reported on `v5-next`: [failed run](http://ci.aztec-labs.com/e4076dd86c434c6f).

Replaces #24016 (was based on `merge-train/spartan`; this one targets the v5 line where the flake fired and restructures the test instead of just resizing the timeout).

## Root cause of the flake

`TimeoutError: Operation timed out after 256000ms`: the bare 8-slot `timeoutPromise` waiting for the two bad checkpoints timed out.

The bad-slot search from #23608 rejects any candidate pair whose proposer also owns an earlier un-snapshotted pipelined slot, and the rejection window grows with each attempt. In the failed run the current slot was 21 and the search rejected (24,25)…(29,30) before accepting slots **30/31**, 9–10 slots out. The fixed 256s wait expired at 22:48:55, before slot 30 even began (~22:49:00), while the chain healthily mined checkpoints at slots 22–28 underneath; the run was unwinnable at selection time.

The race's `.then(() => [CheckpointNumber(0), …])` fallback was also dead code, since `timeoutPromise` rejects.

## Fix: search first, then warp

Instead of starting the sequencers and waiting in real time for whatever slots the search lands on:

- With sequencers stopped, search for a `warpSlot` such that the proposers of the three lead-in slots `warpSlot+1..warpSlot+3` are not the proposers of the bad slots `warpSlot+4`/`warpSlot+5`. A far-away candidate now costs a warp instead of a real-time wait, and `EpochNotStable` during the search is handled by warping forward one epoch (same pattern as the `archiver skips a descendant` test in this file).
- Warp to one L1 block before `warpSlot`, so sequencers get a full L2 slot to boot before the first pipelined build window we rely on (end of `warpSlot`, targeting `warpSlot+1`).
- Start the sequencers and wait for the first good checkpoint (lands at `warpSlot`, or up to `warpSlot+2` on a slow start).
- Apply the malicious config to the bad-slot proposers. The three good lead-in slots guarantee no pipelined job before `badSlot1` can snapshot it, since jobs snapshot config during the last L1 slot of the previous L2 slot.
- Fail fast with a clear assertion if config application was somehow late enough to reach `badSlot1`'s build window, rather than timing out opaquely.
- The 8-slot wait for the bad checkpoints is now correctly sized by construction (`badSlot2` is at most ~6 slots from the wait start), and gets a descriptive timeout message.

Worst case the wait phase is bounded at ~6 slots regardless of how many candidates the search rejects, where previously each rejected candidate pushed the bad checkpoints one slot further past the fixed timeout.

---

*Created by [claudebox](https://claudebox.work/v2/sessions/d509a218614bf4ac) · group: `slackbot`*
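
The slot search described in the fix can be sketched in isolation as follows. This is a minimal illustration, not the test's code: `findWarpSlot`, `BadSlotPlan`, and the synchronous `proposerOf` lookup are hypothetical names introduced here (the real test queries `test.epochCache.getProposerAttesterAddressInSlot` asynchronously), and the `EpochNotStable` warp-and-retry handling is left out.

```typescript
type Proposer = string;

interface BadSlotPlan {
  warpSlot: number;
  badSlot1: number;
  badSlot2: number;
  badProposers: [Proposer, Proposer];
}

// Hypothetical helper: scan candidate warp slots until the proposers of the
// lead-in slots (warpSlot+1 .. warpSlot+leadInSlots) are disjoint from the
// proposers of the two bad slots that follow them.
function findWarpSlot(
  firstCandidate: number,
  proposerOf: (slot: number) => Proposer | undefined, // assumed lookup, sync for simplicity
  leadInSlots = 3,
  maxAttempts = 100,
): BadSlotPlan | undefined {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const warpSlot = firstCandidate + attempt;
    const badSlot1 = warpSlot + leadInSlots + 1;
    const badSlot2 = warpSlot + leadInSlots + 2;
    const p1 = proposerOf(badSlot1);
    const p2 = proposerOf(badSlot2);
    if (p1 === undefined || p2 === undefined) {
      continue; // no known proposer for a bad slot: try the next candidate
    }
    const leadInProposers = Array.from({ length: leadInSlots }, (_, i) => proposerOf(warpSlot + i + 1));
    const clash = leadInProposers.some(p => p !== undefined && (p === p1 || p === p2));
    if (!clash) {
      return { warpSlot, badSlot1, badSlot2, badProposers: [p1, p2] };
    }
  }
  return undefined;
}

// Usage: slot 4 is proposed by 'A', which also proposes lead-in slot 1, so
// candidate 0 is rejected and candidate 1 is accepted.
const schedule: Proposer[] = ['X', 'A', 'B', 'C', 'A', 'D', 'E', 'F'];
const plan = findWarpSlot(0, slot => schedule[slot]);
console.log(plan);
```

Because the search runs before any sequencer starts, a rejected candidate only moves the warp target; it does not lengthen the real-time wait that follows.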
1 parent b2bcdba commit 227a74e

1 file changed

Lines changed: 80 additions & 55 deletions

yarn-project/end-to-end/src/e2e_epochs/epochs_invalidate_block.parallel.test.ts

@@ -365,71 +365,87 @@ describe('e2e_epochs/epochs_invalidate_block', () => {
   // second invalid checkpoint will also have invalid attestations, we are *not* testing the scenario where the
   // committee is malicious (or incompetent) and attests for the descendent of an invalid checkpoint.
   it('proposer invalidates multiple checkpoints', async () => {
-    // Start all sequencers with default (good) config, wait for the first checkpoint to land,
-    // then apply the bad config to the proposers of the next two slots. This avoids the race
-    // where a bad proposer is also the proposer of slot+1 and gets the bad config too early.
+    // Pick the bad slots before starting any sequencer, then warp to just before them, so a far-away
+    // candidate costs a warp instead of a real-time wait. We need a lead-in of good slots: the first
+    // good checkpoint lands at warpSlot or warpSlot+1 (warpSlot+2 on a slow start), and the malicious
+    // config is applied only after it is mined, so the proposers of warpSlot+1..warpSlot+3 must not be
+    // the bad proposers — otherwise a pipelined job created before the bad slots could snapshot the
+    // malicious config (jobs snapshot the sequencer config during the last L1 slot of the previous L2
+    // slot, when getEpochAndSlotInNextL1Slot first returns the proposer's target slot).
     const sequencers = nodes.map(node => node.getSequencer()!);
     sequencers.forEach(s => s.updateConfig({ minTxsPerBlock: 0 }));
-    await Promise.all(sequencers.map(s => s.start()));
-    logger.warn(`Started all sequencers, waiting for first checkpoint before applying malicious config`);
-
-    // Wait for at least one checkpoint to be mined so that any in-progress slot has completed
-    const initialCheckpointNumber = (await nodes[0].getChainTips()).checkpointed.checkpoint.number;
-    await test.waitUntilCheckpointNumber(CheckpointNumber(initialCheckpointNumber + 1), test.L2_SLOT_DURATION_IN_S * 4);

-    // Align to the start of an L2 slot before computing the bad slots, so we have a generous
-    // buffer to push the malicious config to badSlot1's proposer before it snapshots its config
-    // into a new CheckpointProposalJob. Under proposer pipelining, that job is built during the
-    // last L1 slot of the previous L2 slot (when getEpochAndSlotInNextL1Slot first returns the
-    // proposer's target slot), so the practical window is somewhat less than a full L2 slot.
-    await test.monitor.waitUntilNextL2Slot();
-    const { l2SlotNumber: currentSlot } = await test.monitor.run();
-    logger.warn(`First checkpoint mined, current slot is ${currentSlot}`);
-
-    // The bad config is applied while sequencers are already running; skip pairs where a pipelined
-    // pre-bad target slot could snapshot that config before the intended bad slots.
-    let badSlot1: SlotNumber | undefined;
-    let badSlot2: SlotNumber | undefined;
+    const preBadSlotCount = 3;
+    let warpSlot: SlotNumber | undefined;
     let badProposers: EthAddress[] = [];
-    const firstCandidateSlot = Number(currentSlot) + 3;
-    const firstUnsnapshottedTargetSlot = SlotNumber.add(currentSlot, 2);
-    const maxBadSlotSearchAttempts = 20;
-    for (let attempt = 0; attempt < maxBadSlotSearchAttempts && badSlot1 === undefined; attempt++) {
-      const candidateSlot1 = SlotNumber(firstCandidateSlot + attempt);
-      const candidateSlot2 = SlotNumber.add(candidateSlot1, 1);
-      const preBadTargetSlots = range(
-        Math.max(0, Number(candidateSlot1) - Number(firstUnsnapshottedTargetSlot)),
-        Number(firstUnsnapshottedTargetSlot),
-      ).map(SlotNumber);
-      const [preBadProposers, p1, p2] = await Promise.all([
-        Promise.all(preBadTargetSlots.map(slot => test.epochCache.getProposerAttesterAddressInSlot(slot))),
-        test.epochCache.getProposerAttesterAddressInSlot(candidateSlot1),
-        test.epochCache.getProposerAttesterAddressInSlot(candidateSlot2),
-      ]);
-
-      logger.warn(`Checking bad checkpoint slots ${candidateSlot1} and ${candidateSlot2}`, {
-        preBadTargetSlots,
-        preBadProposers: preBadProposers.map(proposer => proposer?.toString()),
-        p1: p1?.toString(),
-        p2: p2?.toString(),
-      });
+    let candidate = Number(test.epochCache.getEpochAndSlotNow().slot) + 2;
+    const maxBadSlotSearchAttempts = 100;
+    for (let attempt = 0; attempt < maxBadSlotSearchAttempts && warpSlot === undefined; attempt++) {
+      try {
+        const candidateWarpSlot = SlotNumber(candidate);
+        const preBadTargetSlots = times(preBadSlotCount, i => SlotNumber.add(candidateWarpSlot, i + 1));
+        const candidateSlot1 = SlotNumber.add(candidateWarpSlot, preBadSlotCount + 1);
+        const candidateSlot2 = SlotNumber.add(candidateWarpSlot, preBadSlotCount + 2);
+        const [preBadProposers, p1, p2] = await Promise.all([
+          Promise.all(preBadTargetSlots.map(slot => test.epochCache.getProposerAttesterAddressInSlot(slot))),
+          test.epochCache.getProposerAttesterAddressInSlot(candidateSlot1),
+          test.epochCache.getProposerAttesterAddressInSlot(candidateSlot2),
+        ]);

-      const badProposerHasUnsnapshottedPreBadSlot =
-        p1 !== undefined &&
-        p2 !== undefined &&
-        preBadProposers.some(proposer => proposer !== undefined && (proposer.equals(p1) || proposer.equals(p2)));
+        logger.warn(`Checking bad checkpoint slots ${candidateSlot1} and ${candidateSlot2}`, {
+          candidateWarpSlot,
+          preBadTargetSlots,
+          preBadProposers: preBadProposers.map(proposer => proposer?.toString()),
+          p1: p1?.toString(),
+          p2: p2?.toString(),
+        });

-      if (p1 && p2 && !badProposerHasUnsnapshottedPreBadSlot) {
-        badSlot1 = candidateSlot1;
-        badSlot2 = candidateSlot2;
-        badProposers = [p1, p2];
+        const badProposerHasUnsnapshottedPreBadSlot =
+          p1 !== undefined &&
+          p2 !== undefined &&
+          preBadProposers.some(proposer => proposer !== undefined && (proposer.equals(p1) || proposer.equals(p2)));
+
+        if (p1 && p2 && !badProposerHasUnsnapshottedPreBadSlot) {
+          warpSlot = candidateWarpSlot;
+          badProposers = [p1, p2];
+        }
+        candidate++;
+      } catch (err) {
+        const msg = err instanceof Error ? err.message : String(err);
+        if (!msg.includes('EpochNotStable')) {
+          throw err;
+        }
+        const block = await test.l1Client.getBlock({ includeTransactions: false });
+        const warpBy = test.epochDuration * test.L2_SLOT_DURATION_IN_S;
+        const newTs = Number(block.timestamp) + warpBy;
+        logger.warn(`Hit EpochNotStable at candidate ${candidate}, warping L1 forward by ${warpBy}s to ${newTs}`);
+        await test.context.cheatCodes.eth.warp(newTs, { resetBlockInterval: true });
+        const newCurrentSlot = Number(test.epochCache.getEpochAndSlotNow().slot);
+        if (candidate < newCurrentSlot + 2) {
+          candidate = newCurrentSlot + 2;
+        }
       }
     }
-    if (badSlot1 === undefined || badSlot2 === undefined) {
+    if (warpSlot === undefined) {
       throw new Error(`Could not find bad checkpoint slots after ${maxBadSlotSearchAttempts} attempts`);
     }
+    const badSlot1 = SlotNumber.add(warpSlot, preBadSlotCount + 1);
+    const badSlot2 = SlotNumber.add(warpSlot, preBadSlotCount + 2);
     const badSlots = [badSlot1, badSlot2];

+    // Warp to one L1 block before warpSlot, so the sequencers have a full L2 slot to boot and settle
+    // pipelining before the build window for warpSlot+1 opens at the end of warpSlot.
+    const warpTo = getTimestampForSlot(warpSlot, test.constants) - BigInt(test.L1_BLOCK_TIME_IN_S);
+    logger.warn(`Warping L1 to ${warpTo}, one L1 block before slot ${warpSlot}`, { warpSlot, badSlot1, badSlot2 });
+    await test.context.cheatCodes.eth.warp(Number(warpTo), { resetBlockInterval: true });
+
+    // Start all sequencers with default (good) config and wait for the first checkpoint to land,
+    // so the chain is moving before we apply the bad config to the proposers of the bad slots.
+    const initialCheckpointNumber = (await nodes[0].getChainTips()).checkpointed.checkpoint.number;
+    await Promise.all(sequencers.map(s => s.start()));
+    logger.warn(`Started all sequencers, waiting for first checkpoint before applying malicious config`);
+    await test.waitUntilCheckpointNumber(CheckpointNumber(initialCheckpointNumber + 1), test.L2_SLOT_DURATION_IN_S * 4);
+
     const badNodes = [];
     for (let badProposerIndex = 0; badProposerIndex < badProposers.length; badProposerIndex++) {
       const badProposer = badProposers[badProposerIndex];
@@ -451,6 +467,11 @@ describe('e2e_epochs/epochs_invalidate_block', () => {
       logger.warn(`Applied malicious config to node ${nodeIndex} with proposer ${badProposer} for slot ${badSlot}`);
     }

+    // Fail fast with a clear error if applying the configs was so slow that badSlot1's proposal job
+    // may have already snapshotted the good config.
+    const slotAfterBadConfig = Number(test.epochCache.getEpochAndSlotNow().slot);
+    expect(slotAfterBadConfig).toBeLessThan(Number(badSlot1));
+
     // We should see two invalid blocks being proposed by the bad proposers in those two slots
     const firstCheckpointPromise = promiseWithResolvers<CheckpointNumber>();
     const secondCheckpointPromise = promiseWithResolvers<CheckpointNumber>();
@@ -466,11 +487,15 @@ describe('e2e_epochs/epochs_invalidate_block', () => {
       }
     });

-    // Wait for both checkpoints to be mined
+    // Wait for both checkpoints to be mined. Note that timeoutPromise rejects on timeout, so there
+    // is no point in racing against a fallback value.
     logger.warn(`Waiting for two checkpoints to be mined on slots ${expectedFirstSlot} and ${expectedSecondSlot}`);
     const [firstCheckpoint, secondCheckpoint] = await Promise.race([
       Promise.all([firstCheckpointPromise.promise, secondCheckpointPromise.promise]),
-      timeoutPromise(test.L2_SLOT_DURATION_IN_S * 8 * 1000).then(() => [CheckpointNumber(0), CheckpointNumber(0)]),
+      timeoutPromise(
+        test.L2_SLOT_DURATION_IN_S * 8 * 1000,
+        `Waiting for bad checkpoints at slots ${expectedFirstSlot} and ${expectedSecondSlot}`,
+      ),
     ]);

     // Sanity check: verify that both bad checkpoints landed on L1 with insufficient attestations.
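
The last hunk drops the `.then(...)` fallback because a timeout that rejects never reaches a fulfilment callback. The sketch below shows that behaviour with plain promises. `rejectAfter`, `raceWithDeadFallback`, and `raceWithMessage` are hypothetical stand-ins written for this illustration; the only thing assumed about the real `timeoutPromise` is what the commit message states, namely that it rejects on timeout.

```typescript
// Hypothetical stand-in for a timeout helper that REJECTS once `ms` elapses.
function rejectAfter(ms: number, message: string): Promise<never> {
  return new Promise<never>((_resolve, reject) => {
    setTimeout(() => reject(new Error(`Timeout: ${message}`)), ms);
  });
}

// Old shape: the fallback passed to .then() only runs on fulfilment, and
// rejectAfter never fulfils, so the [0, 0] value can never be produced.
function raceWithDeadFallback(work: Promise<number[]>, ms: number): Promise<number[]> {
  return Promise.race([work, rejectAfter(ms, 'old wait').then(() => [0, 0])]);
}

// New shape: no fallback; the rejection carries a descriptive message.
function raceWithMessage(work: Promise<number[]>, ms: number, message: string): Promise<number[]> {
  return Promise.race([work, rejectAfter(ms, message)]);
}

// Usage: with work that never settles, both shapes reject; only the error
// message differs.
const neverSettles = new Promise<number[]>(() => {});
raceWithDeadFallback(neverSettles, 10).catch(err => console.log('old:', (err as Error).message));
raceWithMessage(neverSettles, 10, 'bad checkpoints at slots 30 and 31').catch(err =>
  console.log('new:', (err as Error).message),
);
```

Both calls reject, which is why removing the fallback changes no behaviour and only makes the failure message clearer.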
