Skip to content

Commit 1fd63ef

Browse files
authored
test: fix flaky sentinel_status_slash by asserting the fault on the checkpoint slot (#23483)
## Summary Root-cause fix for the `sentinel_status_slash.parallel.test.ts` flake (added in #23286) instead of skipping it. The two malicious-proposer tests now discover the slot at which the fault is actually recorded rather than asserting it on the pre-warped block-proposer slot. (This PR previously carried a `.test_patterns.yml` skip on `next`; per maintainer request the skip is dropped and the PR now carries the fix on `merge-train/spartan`, where the test lives.) ## Root cause The failing merge-train run (test log http://ci.aztec-labs.com/83589a3b1ce4a8b7) shows every honest observer recording the target proposer as `checkpoint-missed` at the warped slot (15), with zero `checkpoint-invalid` / `checkpoint-unvalidated` / `checkpoint-valid` for it anywhere — the malicious node didn't even self-record `checkpoint-valid` for that slot. `checkpoint-missed` means *block proposals were seen for the slot, but no checkpoint-proposal re-execution outcome was recorded for it* (`sentinel.ts` `getSlotActivity`). The re-execution outcome that drives `checkpoint-invalid` / `checkpoint-unvalidated` is keyed by `proposal.slotNumber` — the slot at which a node *closes a checkpoint* (`proposal_handler.ts:887`). A proposer self-records `checkpoint-valid` only when it actually builds a checkpoint proposal. `warpToSlotBeforeTargetProposer` warps to the target's *block-proposer* slot and the test then asserts the fault at exactly that slot. But a proposer emits a checkpoint proposal only when its slot closes a checkpoint, and that does not reliably coincide with the warped block-proposer slot. When the warped slot is mid-checkpoint, no outcome is ever keyed to it → `checkpoint-missed`. Whether the proposer slot lines up with a checkpoint close shifts with pipelining/wall-clock timing, so it flakes (it fails on every retry, so the `e2e-p2p-epoch-flakes` group does not absorb it). ## Fix Stop pinning the pre-warped slot. New helper `findObservedStatusSlot` polls an observer for the slot at which it records the expected fault status (`checkpoint-unvalidated` / `checkpoint-invalid`); the test then requires every honest observer to agree at that slot and the malicious node to self-record `checkpoint-valid` for the same slot. The warp is kept purely to cut wall-clock by advancing near the proposer region. This stays faithful to what the test verifies: it still fails (times out) if the malicious proposal is never detected, so it does not mask a genuine propagation/detection bug — it only removes the brittle assumption that the fault lands on one specific pre-chosen slot. ## Verification - Type/usage review only: `findObservedStatusSlot` reuses the same `getValidatorsStats()` → `history` shape and `retryUntil` pattern as the existing helpers; `SlotNumber` / `ValidatorStatusInSlot` / `retryUntil` are already imported. - **Not run locally**: this is an intermittent e2e p2p flake; a single local run would not establish flake-freedom and a full build + repeated e2e grind is impractical to do reliably in-session. CI on this draft exercises the test; a rerun grind is the real confirmation bar. ## Related - Replaces the skip approach (the prior content of this PR and the now-closed #23443). - Adjacent to `palla/fix-sentinel-inactivity-want-to-slash` (changes the sentinel inactivity computation), which is independent of this test-timing fix.
1 parent 2a2f49e commit 1fd63ef

1 file changed

Lines changed: 43 additions & 11 deletions

File tree

yarn-project/end-to-end/src/e2e_p2p/sentinel_status_slash.parallel.test.ts

Lines changed: 43 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -140,14 +140,20 @@ describe('e2e_p2p_sentinel_status_slash', () => {
140140
// re-execution state_mismatch and therefore never push to their archivers, so the malicious
141141
// node's checkpoint proposals can't find their last block and observers record `unvalidated`.
142142
const targetAddress = await spawnMaliciousAndHonestNodes({ broadcastInvalidBlockProposal: true });
143-
const targetSlot = await warpToSlotBeforeTargetProposer(targetAddress);
143+
// Warp near the malicious node's proposer slot to keep wall-clock down. We discover the slot at
144+
// which the fault is actually recorded rather than assuming it is the warped block-proposer
145+
// slot: the re-execution outcome is keyed by the checkpoint proposal's slot, and a proposer
146+
// only emits a checkpoint proposal when its slot closes a checkpoint, which does not always
147+
// coincide with the block-proposer slot we warp to.
148+
await warpToSlotBeforeTargetProposer(targetAddress);
144149
// nodes[0] is the malicious node; honest observers are nodes[1..].
145150
const honestObservers = nodes.slice(1);
146-
await assertAllObserversSentinelStatus(honestObservers, targetAddress, targetSlot, 'checkpoint-unvalidated');
147-
// The malicious node self-records `checkpoint-valid` for its own slot using the locally
148-
// computed archive (broadcastInvalidBlockProposal only corrupts the broadcast archive,
149-
// not the proposer's local state).
150-
await assertAllObserversSentinelStatus([nodes[0]], targetAddress, targetSlot, 'checkpoint-valid');
151+
const faultSlot = await findObservedStatusSlot(honestObservers[0], targetAddress, 'checkpoint-unvalidated');
152+
await assertAllObserversSentinelStatus(honestObservers, targetAddress, faultSlot, 'checkpoint-unvalidated');
153+
// The malicious node self-records `checkpoint-valid` for that slot using the locally computed
154+
// archive (broadcastInvalidBlockProposal only corrupts the broadcast archive, not the
155+
// proposer's local state).
156+
await assertAllObserversSentinelStatus([nodes[0]], targetAddress, faultSlot, 'checkpoint-valid');
151157
await assertInactivityOffenseFor(targetAddress, nodes[1]);
152158
});
153159

@@ -156,12 +162,13 @@ describe('e2e_p2p_sentinel_status_slash', () => {
156162
// block proposals valid; observers accept the blocks (so they land in the archiver) but
157163
// reject the checkpoint via header_mismatch, recording `invalid`.
158164
const targetAddress = await spawnMaliciousAndHonestNodes({ broadcastInvalidCheckpointProposalOnly: true });
159-
const targetSlot = await warpToSlotBeforeTargetProposer(targetAddress);
165+
await warpToSlotBeforeTargetProposer(targetAddress);
160166
const honestObservers = nodes.slice(1);
161-
await assertAllObserversSentinelStatus(honestObservers, targetAddress, targetSlot, 'checkpoint-invalid');
162-
// Malicious self-records `checkpoint-valid` for its own slot — proposers always consider
163-
// their own freshly-built proposal valid from their local-state perspective.
164-
await assertAllObserversSentinelStatus([nodes[0]], targetAddress, targetSlot, 'checkpoint-valid');
167+
const faultSlot = await findObservedStatusSlot(honestObservers[0], targetAddress, 'checkpoint-invalid');
168+
await assertAllObserversSentinelStatus(honestObservers, targetAddress, faultSlot, 'checkpoint-invalid');
169+
// Malicious self-records `checkpoint-valid` for that slot — proposers always consider their
170+
// own freshly-built proposal valid from their local-state perspective.
171+
await assertAllObserversSentinelStatus([nodes[0]], targetAddress, faultSlot, 'checkpoint-valid');
165172
await assertInactivityOffenseFor(targetAddress, nodes[1]);
166173
});
167174

@@ -297,6 +304,31 @@ describe('e2e_p2p_sentinel_status_slash', () => {
297304
);
298305
}
299306

307+
/**
308+
* Polls `observerNode` until it records `expectedStatus` for `targetAddress` at some slot, and
309+
* returns that slot. The slot at which the malicious node closes its checkpoint (and so the fault
310+
* is recorded) is not necessarily the block-proposer slot we warp to, so we discover it rather
311+
* than assuming it. Times out — and therefore fails the test — if the fault is never recorded,
312+
* so a genuine failure to detect the malicious proposal is still caught.
313+
*/
314+
async function findObservedStatusSlot(
315+
observerNode: AztecNodeService,
316+
targetAddress: EthAddress,
317+
expectedStatus: ValidatorStatusInSlot,
318+
): Promise<SlotNumber> {
319+
const slot = await retryUntil(
320+
async () => {
321+
const stats = await observerNode.getValidatorsStats();
322+
const validator = stats.stats[targetAddress.toString()];
323+
const entry = validator?.history.find(h => h.status === expectedStatus);
324+
return entry?.slot !== undefined ? SlotNumber(Number(entry.slot)) : undefined;
325+
},
326+
`observed ${expectedStatus} for ${targetAddress}`,
327+
AZTEC_SLOT_DURATION * 15,
328+
);
329+
return slot;
330+
}
331+
300332
/**
301333
* Polls every honest observer node until each has its sentinel record `expectedStatus` for the
302334
* target at the given slot. Asserts the recorded status matches exactly on every observer.

0 commit comments

Comments
 (0)