test(e2e): fix proposer invalidates multiple checkpoints timeout #23608

Merged
PhilWindle merged 1 commit into merge-train/spartan from spl/fix-invalidate-block-again
May 27, 2026
@spalladino (Contributor) commented May 27, 2026

Fixes a flake in the "proposer invalidates multiple checkpoints" test in e2e_epochs/epochs_invalidate_block.parallel.test.ts that caused a timeout (see this run). See below for the Codex analysis and fix.


Test Summary
proposer invalidates multiple checkpoints verifies that two intended bad checkpoints land with insufficient attestations, a later good proposer invalidates the first bad checkpoint, and the chain then progresses.

Failed Run Error
CI run 8b1c0f4ec6031f2b timed out at Jest’s 600s limit. The failure was not the shutdown L1 send error; that happened after the timeout while teardown was interrupting pending work.

Failed vs Successful Divergence
First meaningful divergence: checkpoint 4 at slot 23.

Failed log: slot 23 published checkpoint 4 with only 1 attestation, then archivers reported Insufficient attestations ... actualAttestations:1.
Successful log: slot 23 collected all 5 attestations before publishing checkpoint 4, so the first intentionally bad checkpoints landed at later slots.

Timeline
Failed:

  • 15:59:11 selected intended bad slots 25/26, applied bad config to proposer 0x15...
  • 15:59:35 slot 23 job prepared by that same proposer
  • 16:00:15 checkpoint 4 at slot 23 landed with 1 attestation
  • repeated rollback/retry consumed enough time to hit Jest timeout

Successful:

  • slot 23 checkpoint landed cleanly with 5 attestations
  • intended bad checkpoints at slots 24/25 landed with 1 attestation
  • checkpoint 5 was invalidated
  • test completed successfully

Hypothesis
High confidence: the test’s bad-slot selection only excluded candidateSlot1 - 1 as a pre-bad pipelined target. In the failed run, candidateSlot1 - 2 was still unsnapshotted and owned by a bad proposer, so applying malicious config leaked into slot 23.

Evidence

  • Logs: failed run selected slots 25/26 but slot 23 later published with 1 attestation from the newly bad proposer.
  • Source: pipelined checkpoint jobs snapshot sequencer config when the target-slot job is created, so applying config while sequencers are running can affect any not-yet-created pre-bad job.
  • Skeptic check: no contradiction found; it also caught a broken local timeout race.
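
The snapshot behaviour described in the Source bullet can be illustrated with a toy model. This is a sketch under stated assumptions, not the sequencer's actual code: `ToySequencer`, `createJob`, `updateConfig`, and the `collectAllAttestations` flag are hypothetical names chosen only for illustration.

```typescript
// Toy model only: every name here is hypothetical, not the real sequencer API.
interface ToyConfig {
  collectAllAttestations: boolean;
}

interface ToyJob {
  targetSlot: number;
  config: ToyConfig; // copy of the config taken when the job was created
}

class ToySequencer {
  private config: ToyConfig = { collectAllAttestations: true };

  updateConfig(partial: Partial<ToyConfig>): void {
    this.config = { ...this.config, ...partial };
  }

  // The job copies the config as it is right now; later updates do not reach it.
  createJob(targetSlot: number): ToyJob {
    return { targetSlot, config: { ...this.config } };
  }
}

const sequencer = new ToySequencer();
const jobSlot22 = sequencer.createJob(22); // created before the bad config is applied
sequencer.updateConfig({ collectAllAttestations: false }); // meant for slots 25/26 only
const jobSlot23 = sequencer.createJob(23); // created afterwards, so it inherits the bad config
```

In this model the slot 22 job keeps the good config, while the slot 23 job picks up the bad one even though slot 23 was never meant to be a bad slot.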

Proposed Fix
Implemented in epochs_invalidate_block.parallel.test.ts: the selector now excludes bad proposers from every pre-bad target slot from currentSlot + 2 through candidateSlot1 - 1, not just the immediately prior slot.
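
A minimal sketch of the widened exclusion rule, for readers who want to see its shape. The real selector is not shown in this description, so `pickBadSlots`, `proposerAt`, `lastSlot`, and the choice to start searching at `currentSlot + 3` are assumptions made for illustration.

```typescript
// Hypothetical helper: the actual test code may be structured differently.
type Slot = number;
type Address = string;

function pickBadSlots(
  currentSlot: Slot,
  lastSlot: Slot,
  proposerAt: (slot: Slot) => Address,
): { candidateSlot1: Slot; candidateSlot2: Slot } | undefined {
  for (let candidateSlot1 = currentSlot + 3; candidateSlot1 + 1 <= lastSlot; candidateSlot1++) {
    const candidateSlot2 = candidateSlot1 + 1;
    const badProposers = new Set([proposerAt(candidateSlot1), proposerAt(candidateSlot2)]);

    // Reject the candidate if a bad proposer owns ANY pre-bad target slot,
    // from currentSlot + 2 through candidateSlot1 - 1.
    let leaks = false;
    for (let slot = currentSlot + 2; slot <= candidateSlot1 - 1; slot++) {
      if (badProposers.has(proposerAt(slot))) {
        leaks = true;
        break;
      }
    }
    if (!leaks) {
      return { candidateSlot1, candidateSlot2 };
    }
  }
  return undefined;
}

// Made-up schedule mirroring the failed run: one proposer owns both slot 23 and slot 25.
const schedule: Record<number, Address> = {
  22: "0xA", 23: "0x15", 24: "0xB", 25: "0x15", 26: "0xC", 27: "0xD", 28: "0xE",
};
const picked = pickBadSlots(21, 28, slot => schedule[slot] ?? "unknown");
```

With this schedule, a rule that checks only `candidateSlot1 - 1` would accept slots 25/26 even though the same proposer owns slot 23. The wider check skips that pair and picks 26/27 instead.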

Also fixed the broken timeout race at line 475 by removing the accidental inner await.
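
The code at line 475 is not reproduced here, so the following is only one common shape of this bug, using hypothetical helpers (`slowWork`, `timeoutAfter`). An `await` inside the array passed to `Promise.race` waits for the work to finish before the race even starts, so the timeout can never win.

```typescript
// Hypothetical helpers for illustration; not taken from the test file.
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function slowWork(): Promise<string> {
  await sleep(100);
  return "done";
}

function timeoutAfter(ms: number): Promise<never> {
  return new Promise((_, reject) => setTimeout(() => reject(new Error("timed out")), ms));
}

// Broken: the inner await blocks until slowWork() resolves, and only then is the
// timeout promise created. The race sees an already-available value and returns it.
async function brokenRace(): Promise<string> {
  return await Promise.race([await slowWork(), timeoutAfter(20)]);
}

// Fixed: both promises are started first, then raced, so the 20ms timeout wins.
async function fixedRace(): Promise<string> {
  return await Promise.race([slowWork(), timeoutAfter(20)]);
}
```

Here `brokenRace()` resolves with the work's result regardless of how long it takes, while `fixedRace()` rejects with the timeout error once the deadline passes.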

@spalladino changed the title from "test(e2e): fix 'proposer invalidates multiple checkpoints' timeout" to "test(e2e): fix proposer invalidates multiple checkpoints timeout" May 27, 2026
@spalladino added the ci-no-fail-fast label (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) May 27, 2026

@AztecBot (Collaborator)

Flakey Tests

🤖 says: This CI run detected 2 tests that failed, but were tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/05e2f35ea87960af): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_invalidate_block.parallel.test.ts "proposer invalidates multiple checkpoints" (434s) (code: 0) group:e2e-p2p-epoch-flakes
FLAKED (http://ci.aztec-labs.com/8f3d2a7009259b3c): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_p2p/multiple_validators_sentinel.parallel.test.ts "collects attestations for validators in proposer node when block is not published" (400s) (code: 0) group:e2e-p2p-epoch-flakes

@PhilWindle merged commit 206eb0f into merge-train/spartan May 27, 2026
31 of 38 checks passed
@PhilWindle deleted the spl/fix-invalidate-block-again branch May 27, 2026 21:53