
test(e2e): deflake epochs_mbps_redistribution#24182

Merged
PhilWindle merged 1 commit into merge-train/spartan-v5 from spl/deflake-mbps-redistribution
Jun 18, 2026


@spalladino spalladino commented Jun 18, 2026

Contributor

Motivation

e2e_epochs/epochs_mbps_redistribution asserted that a burst of late txs all land in a single (last) block of a checkpoint. But the proposer snapshots the mempool once per block, as soon as minTxsPerBlock txs are available, and the burst (sent via Promise.all(send())) reaches the proposer over gossip one tx at a time. The last block's snapshot therefore routinely captured only a subset of the burst, and the rest spilled into the next checkpoint, flaking the assertion. Empirically, the one-before-last block grabs its full redistributed allotment, leaving the last block with fewer than its static share, so "all in the last block" was never a reliable invariant in the first place.

Approach

Check the redistribution claim against a race-robust invariant rather than an exact block placement:

  • Use a 4-block-per-checkpoint profile (blockDurationMs 8000 → 6000, slot unchanged at 36s) with a tight maxTxsPerCheckpoint = 9, giving a static per-block cap S = ceil(9 * 1.2 / 4) = 3 under the 1.2 multiplier floor.
  • Two early blocks take one tx each; the 7-tx burst is dumped immediately after the second early block is in, then splits arbitrarily across the last two blocks as it propagates (x txs in the one-before-last block, 7 - x in the last).
  • Assert that, within the checkpoint holding the early txs, the last two block indices jointly hold all 7 late txs (> 2 * S = 6). Without redistribution each block is capped at S, so the two could hold at most 6 and the 7th spills into the next checkpoint. The count is by checkpoint-relative block index (not global block number) so a spilled tx can't masquerade as redistributed, and the checkpoint block count is asserted to be 4 so a build collapse fails diagnostically rather than misleadingly.
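The cap arithmetic and the joint assertion described in the list above can be sketched as follows. This is a minimal illustration, not the test's actual code: the `CheckpointBlock` shape and the helper names `staticPerBlockCap` and `lastTwoHoldWholeBurst` are hypothetical, and only the numbers (9 txs per checkpoint, 4 blocks, 1.2 multiplier, 7-tx burst) come from the description.

```typescript
// Hypothetical shape: late-tx count per block, keyed by checkpoint-relative index.
interface CheckpointBlock {
  indexInCheckpoint: number;
  lateTxCount: number;
}

// Static per-block cap S = ceil(maxTxsPerCheckpoint * multiplier / blocksPerCheckpoint).
function staticPerBlockCap(maxTxsPerCheckpoint: number, multiplier: number, blocksPerCheckpoint: number): number {
  return Math.ceil((maxTxsPerCheckpoint * multiplier) / blocksPerCheckpoint);
}

// Returns true when the last two block indices of the checkpoint jointly hold
// the whole burst, and that joint count exceeds what two statically capped
// blocks could hold (2 * S).
function lastTwoHoldWholeBurst(
  blocks: CheckpointBlock[],
  expectedBlockCount: number,
  burstSize: number,
  cap: number,
): boolean {
  // Fail loudly if the checkpoint did not build the expected number of blocks.
  if (blocks.length !== expectedBlockCount) {
    throw new Error(`expected ${expectedBlockCount} blocks in checkpoint, got ${blocks.length}`);
  }
  const ordered = [...blocks].sort((a, b) => a.indexInCheckpoint - b.indexInCheckpoint);
  const joint = ordered.slice(-2).reduce((sum, block) => sum + block.lateTxCount, 0);
  return joint === burstSize && joint > 2 * cap;
}

const S = staticPerBlockCap(9, 1.2, 4); // ceil(10.8 / 4) = 3

// Example: the burst splits 5 / 2 across the last two blocks.
const example: CheckpointBlock[] = [
  { indexInCheckpoint: 0, lateTxCount: 0 },
  { indexInCheckpoint: 1, lateTxCount: 0 },
  { indexInCheckpoint: 2, lateTxCount: 5 },
  { indexInCheckpoint: 3, lateTxCount: 2 },
];
const exampleHolds = lastTwoHoldWholeBurst(example, 4, 7, S);
```

Counting by `indexInCheckpoint` rather than by global block number is what keeps a tx that spilled into the next checkpoint from being counted as redistributed.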

This removes the dependence on which of the last two blocks wins the race. Verified robust by temporarily forcing the one-before-last block to both extremes (1 and 5 txs) — the joint count stays 7 in both cases.
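To see why the joint count separates the two behaviours regardless of the split, here is a small arithmetic sketch. It models only the no-redistribution case, where each of the last two blocks is capped at S = 3; the function names are hypothetical and nothing here reflects the sequencer's actual code.

```typescript
// Joint tx count of the last two blocks if each is capped at `cap` and the
// burst arrives as x txs for the one-before-last block and the rest for the last.
function jointUnderStaticCap(x: number, burstSize: number, cap: number): number {
  return Math.min(x, cap) + Math.min(burstSize - x, cap);
}

// Best joint count achievable over every possible split of the burst.
function maxJointUnderStaticCap(burstSize: number, cap: number): number {
  let best = 0;
  for (let x = 0; x <= burstSize; x++) {
    best = Math.max(best, jointUnderStaticCap(x, burstSize, cap));
  }
  return best;
}

const bestWithoutRedistribution = maxJointUnderStaticCap(7, 3); // 6, never 7
const atLowExtreme = jointUnderStaticCap(1, 7, 3); // 1 + 3 = 4
const atHighExtreme = jointUnderStaticCap(5, 7, 3); // 3 + 2 = 5
```

No split of a 7-tx burst reaches 7 under the static cap, so observing a joint count of 7 implies that at least one of the two blocks exceeded S.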

Changes

  • end-to-end (tests): rework the first epochs_mbps_redistribution test (C=4, immediate burst dump, joint last-two-blocks assertion) and refresh its comments; update the second test's comments to reflect the shared C=4 timing (its logic and assertions are unchanged).

Related: this investigation surfaced a separate production bug (A-1251) where waitForMinTxs gates on a non-age-filtered pending count while the block-building iterator filters by minTxPoolAgeMs; that is tracked separately and not addressed here.
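The mismatch described for A-1251 can be illustrated with a toy model. Everything below is an assumption made for illustration: the `PendingTx` shape, the function names, and the timing values are hypothetical and are not taken from the sequencer code. The sketch only shows how a gate on the raw pending count can pass while the age-filtered view is still empty.

```typescript
// Hypothetical pending-tx record; only the arrival time matters here.
interface PendingTx {
  hash: string;
  receivedAtMs: number;
}

// What a non-age-filtered gate sees: every pending tx counts.
function pendingCount(pool: PendingTx[]): number {
  return pool.length;
}

// What an age-filtered iterator would yield: only txs at least minTxPoolAgeMs old.
function eligibleForBlock(pool: PendingTx[], nowMs: number, minTxPoolAgeMs: number): PendingTx[] {
  return pool.filter((tx) => nowMs - tx.receivedAtMs >= minTxPoolAgeMs);
}

const pool: PendingTx[] = [
  { hash: "0x01", receivedAtMs: 900 },
  { hash: "0x02", receivedAtMs: 950 },
];
const nowMs = 1000;
const minTxPoolAgeMs = 200; // illustrative value
const minTxsPerBlock = 2; // illustrative value

const gatePasses = pendingCount(pool) >= minTxsPerBlock; // true: 2 pending txs
const eligible = eligibleForBlock(pool, nowMs, minTxPoolAgeMs); // empty: both txs are too young
```

In this toy setup the gate opens with two pending txs, yet neither is old enough to be picked up, so the two views disagree about whether a block can be filled.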

This PR is an alternative to #24101

…design

Assert a race-robust invariant — the checkpoint's last two block indices jointly hold the whole late-tx burst (> 2*S) — instead of requiring all late txs in a single block, which raced the proposer's per-block mempool snapshot.
@spalladino spalladino changed the title test(e2e): deflake epochs_mbps_redistribution via C=4 redistribution design test(e2e): deflake epochs_mbps_redistribution Jun 18, 2026
@PhilWindle PhilWindle merged commit d39d115 into merge-train/spartan-v5 Jun 18, 2026
18 checks passed
@PhilWindle PhilWindle deleted the spl/deflake-mbps-redistribution branch June 18, 2026 17:58