
Commit 30640df

test(e2e): fix data_withholding_slash flake by freezing L1 across restart (#23162)
## Motivation

`e2e_p2p_data_withholding_slash` was flaky because L1 raced past the epoch-8 prune deadline (`aztecProofSubmissionEpochs=0` makes the deadline ~32s after slot 17) while we stopped, wiped, and recreated the 4 validators (~28s). The recreated archivers detected the prune during their initial L1 sync and emitted `L2PruneUnproven` for epoch 8 with the original tx-carrying block, but `EpochPruneWatcher.start()` is only invoked inside `void archiver.waitForInitialSync().then(...)` in `aztec-node/server.ts`, so the listener wasn't attached yet and the event was dropped silently. The recreated validators then built an empty epoch 10 on top of genesis, which pruned cleanly later, producing 4 `VALID_EPOCH_PRUNED` offenses instead of the expected 4 `DATA_WITHHOLDING`.

## Approach

Pause anvil block production between `removeInitialNode` and `stopNodes` so L1 stays inside epoch 8 across the recreate gap. The recreated archivers then ingest checkpoint 1 cleanly during initial sync (no prune fires, so there is nothing to miss), `EpochPruneWatcher.start()` attaches its listener, and we resume L1 with an explicit warp + mine + interval restart so the deadline crossing is deterministic: the prune now fires while the watcher is live, producing `DATA_WITHHOLDING` for epoch 8 as the test expects. A `getCurrentEpoch() < 9` assertion right after pausing fails fast if the timing window ever tightens further.

## Changes

- **end-to-end (tests)**: in `data_withholding_slash.test.ts`, pause L1 mining after `removeInitialNode` and before `stopNodes`; resume after `waitForP2PMeshConnectivity` by warping to current wall-clock time, mining one L1 block, and restoring interval mining. Add a fail-fast assertion that we are still in epoch 8 when we pause.
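The race described in Motivation comes down to event ordering: Node's `EventEmitter` delivers an event only to listeners attached at emit time. Below is a minimal, generic illustration of that ordering bug, using a plain `EventEmitter` and a hypothetical payload; this is not the actual archiver API, only a sketch of why the epoch-8 event was lost.

```ts
import { EventEmitter } from 'node:events';

// Hypothetical stand-in for the archiver's event source; the real
// L2PruneUnproven plumbing in aztec-packages differs in detail.
const archiverEvents = new EventEmitter();

// During initial L1 sync the archiver detects the prune and emits immediately...
archiverEvents.emit('L2PruneUnproven', { epoch: 8 }); // no listener yet: silently dropped

// ...but the watcher only subscribes after waitForInitialSync() resolves,
// so this handler never sees the epoch-8 event and no offense is recorded.
archiverEvents.on('L2PruneUnproven', ev => console.log('prune offense:', ev));
```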
1 parent 4d8791a · commit 30640df

1 file changed: 39 additions & 1 deletion

yarn-project/end-to-end/src/e2e_p2p/data_withholding_slash.test.ts

```diff
@@ -157,8 +157,36 @@ describe('e2e_p2p_data_withholding_slash', () => {
     t.logger.warn('L2 txs mined');

     t.logger.warn('Stopping nodes');
+    // removeInitialNode sends a dummy L1 tx and awaits its receipt to sync the
+    // dateProvider, so it must run while L1 mining is still active.
     await t.removeInitialNode();
-    // Now stop the nodes,
+
+    // Pause L1 block production while we tear down and recreate validators. With
+    // `aztecProofSubmissionEpochs=0`, epoch 8 becomes prunable as soon as epoch 9 begins
+    // (~32s after slot 17). The stop/wipe/recreate cycle takes longer than that, so L1
+    // would otherwise race past the prune deadline before the recreated nodes come up.
+    // When that happens, the recreated archivers detect the prune during their initial
+    // sync (`handleEpochPrune` emits `L2PruneUnproven`), but the `EpochPruneWatcher`
+    // listener is only attached after `archiver.waitForInitialSync()` resolves
+    // (see `aztec-node/server.ts`), so the event is dropped and `DATA_WITHHOLDING` is
+    // never emitted. By freezing L1 here, the recreated archivers ingest checkpoint 1
+    // cleanly during initial sync, the watcher starts and attaches its listener, and
+    // then we resume L1 below so the prune fires while the listener is live.
+    const ethCheatCodes = t.ctx.cheatCodes.eth;
+    await ethCheatCodes.setAutomine(false);
+    await ethCheatCodes.setIntervalMining(0);
+
+    // Fail fast if we paused too late — i.e. if L1 already crossed into epoch 9 before
+    // we got here. In that case the recreated nodes would still see the prune during
+    // initial sync and the test would flake exactly the same way.
+    const epochAtPause = await rollup.getCurrentEpoch();
+    expect(Number(epochAtPause)).toBeLessThan(9);
+
+    // Now stop the validator nodes. With L1 paused, any in-flight L1 submissions from
+    // the validator sequencers would hang `sequencer.stop()` (it awaits pending L1
+    // submissions). Since `minTxsPerBlock=1` and no txs are queued for slot 18+, the
+    // sequencers don't submit further L1 transactions after the slot-17 checkpoint
+    // (already published before `waitForTx` returned), so this is safe.
     await t.stopNodes(nodes);
     // And remove the data directories (which forms the crux of the "attack")
     for (let i = 0; i < NUM_VALIDATORS; i++) {
```
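For readers without the test harness, the pause above maps onto anvil's `evm_setAutomine` and `evm_setIntervalMining` cheat-code RPCs (`ethCheatCodes` is the harness's own wrapper, not shown in this diff). A rough equivalent sketch using viem's test client, assuming a local anvil node at the default RPC URL:

```ts
import { createTestClient, http } from 'viem';
import { foundry } from 'viem/chains';

// Assumed setup: a local anvil instance reachable over HTTP.
const testClient = createTestClient({ mode: 'anvil', chain: foundry, transport: http() });

async function pauseL1(): Promise<void> {
  // Freeze L1 entirely: no mining on incoming txs, and an interval of 0
  // disables periodic mining, so neither blocks nor timestamps advance
  // while the validators are torn down and recreated.
  await testClient.setAutomine(false);
  await testClient.setIntervalMining({ interval: 0 });
}
```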
```diff
@@ -186,6 +214,16 @@ describe('e2e_p2p_data_withholding_slash', () => {
     // Wait for P2P mesh to be fully formed before proceeding
     await t.waitForP2PMeshConnectivity(nodes, NUM_VALIDATORS);

+    // Resume L1 block production. Warp L1 forward to current wall-clock time so the
+    // epoch-8 deadline is crossed immediately on the next L1 block, then re-enable
+    // interval mining. By now each recreated archiver has block 1 stored locally and
+    // its `EpochPruneWatcher` listener is attached, so the next sync iteration emits
+    // `L2PruneUnproven` for epoch 8 to a live listener → `DATA_WITHHOLDING`.
+    const resumeTimestamp = Math.floor(t.ctx.dateProvider.now() / 1000);
+    await ethCheatCodes.setNextBlockTimestamp(resumeTimestamp);
+    await ethCheatCodes.mine();
+    await ethCheatCodes.setIntervalMining(t.ctx.aztecNodeConfig.ethereumSlotDuration);
+
     const offenses = await awaitOffenseDetected({
       epochDuration: t.ctx.aztecNodeConfig.aztecEpochDuration,
       logger: t.logger,
```
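Continuing the viem sketch above, the deterministic resume (warp the next timestamp to now, mine one block to cross the deadline, then restore interval mining) could look like the following; `slotDurationSec` stands in for the test's `ethereumSlotDuration` and is an assumption of this sketch:

```ts
async function resumeL1(slotDurationSec: number): Promise<void> {
  // Warp the next block's timestamp to wall-clock "now" so the epoch-8
  // prune deadline is crossed on the very next block, not gradually.
  const now = BigInt(Math.floor(Date.now() / 1000));
  await testClient.setNextBlockTimestamp({ timestamp: now });

  // Produce the single deadline-crossing block while the watcher is live.
  await testClient.mine({ blocks: 1 });

  // Restore steady block production at the L1 slot cadence.
  await testClient.setIntervalMining({ interval: slotDurationSec });
}
```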
