Commit c6488b7
fix(test): wait for full gossip mesh before committee produces (A-1219) (#24149)
## Problem
`e2e_p2p_network › should rollup txs from all peers (and add the
validators without cheating)` (in `gossip_network_no_cheat.test.ts`)
intermittently fails with `TimeoutError: Timeout awaiting first
checkpoint published` — the chain never gets a first checkpoint onto L1
within 120s.
## Log analysis
From CI run
[`6d6e74a70fce8826`](http://ci.aztec-labs.com/6d6e74a70fce8826):
- The test did a blind `sleep(8000)` for peer discovery, then waited for
the first checkpoint. On the 2-CPU runner the gossipsub
**proposal/checkpoint meshes were not fully formed** 8s in.
- The first checkpoint attempt (slot 97, proposer validator-3) reached
only **2 of 3** attestations — validators 1 and 2 never received the
slot-97 proposal at all (only validator-4 had a live gossip path). No L1
publish was attempted; the proposer aborted locally on the
attestation-collection timeout.
- Because that checkpoint never landed, the L1-confirmed chain stayed at
genesis, so every later slot rebuilt a *competing* un-checkpointed block
1 (new archive). The blocks **are** pruned (`archiver:l1-sync: Pruning
blocks after block 0 ...`), but the prune lands ~1.5 slots after the
block is built — later than the next proposal arrives. So peers still
holding a not-yet-pruned block 1 rejected the new proposal with
`block_number_already_exists`, never re-executed, and never attested,
capping every round at 2 of 3 attestations indefinitely.

Root cause: the gossip mesh wasn't formed when the committee started
producing, so the first proposal reached only a subset of the committee.
That both starved the first checkpoint of quorum and split the
validators onto competing block-1 forks that never re-converge.
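
The race between prune latency and the next proposal can be sketched as a toy timeline. This is an illustrative model only, not code from the repository: `receiveCompetingProposal` and both constants are hypothetical names, the ~1.5-slot prune latency is taken from the log analysis above, and the one-slot gap between proposals is an assumption.

```typescript
// Toy model of the race described above; all times are in units of slots.
// PRUNE_LATENCY_SLOTS comes from the log analysis (~1.5 slots). The one-slot
// proposal interval is an assumption made for illustration.
const PRUNE_LATENCY_SLOTS = 1.5;
const PROPOSAL_INTERVAL_SLOTS = 1;

type ProposalOutcome = "accepted" | "block_number_already_exists";

// A peer still holding an un-checkpointed block 1 built at `builtAt`
// receives a competing block-1 proposal at `proposalAt`.
function receiveCompetingProposal(builtAt: number, proposalAt: number): ProposalOutcome {
  const prunedAt = builtAt + PRUNE_LATENCY_SLOTS;
  // The stale block is only out of the way once the prune has landed.
  return proposalAt >= prunedAt ? "accepted" : "block_number_already_exists";
}

// Block 1 built in slot 97; the next proposer's block 1 arrives one slot later,
// before the prune lands, so the peer rejects it.
const builtAt = 97;
const outcome = receiveCompetingProposal(builtAt, builtAt + PROPOSAL_INTERVAL_SLOTS);
console.log(outcome);
```

Under these assumptions the rejection repeats every slot, because each new block 1 is again followed by a proposal that arrives before its prune.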
## Fix
Replace the blind `sleep(8000)` with `waitForP2PMeshConnectivity` on the
`block_proposal`, `checkpoint_proposal`, and `checkpoint_attestation`
topics, requiring a **full mesh (N-1 peers per node)** so the first
proposal reaches the whole committee. The first checkpoint then reaches
quorum and lands — after which the chain advances to block 2 and no
competing block 1 is ever built.

Also adds a `minMeshPeerCount` parameter to `waitForP2PMeshConnectivity`
(default `1`, preserving existing callers — the helper otherwise only
requires a single mesh peer per node, which can leave some committee
members unreached at first). Quorum-from-genesis tests pass `N-1` for a
full mesh.
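
A minimal sketch of the readiness condition, assuming each node can report its mesh peer count per topic. The `MeshView` interface, `meshIsReady`, and `waitForMesh` are hypothetical stand-ins written for illustration; the real `waitForP2PMeshConnectivity` signature and internals are not shown in this description and may differ.

```typescript
// Hypothetical stand-in for a node's view of its gossipsub mesh.
interface MeshView {
  getMeshPeerCount(topic: string): number;
}

// Topic names as listed in the fix description.
const QUORUM_TOPICS = ["block_proposal", "checkpoint_proposal", "checkpoint_attestation"];

// True once every node has at least `minMeshPeerCount` mesh peers on every topic.
// The default of 1 mirrors the "single mesh peer per node" behaviour described above.
function meshIsReady(nodes: MeshView[], topics: string[], minMeshPeerCount = 1): boolean {
  return nodes.every(node =>
    topics.every(topic => node.getMeshPeerCount(topic) >= minMeshPeerCount),
  );
}

// Polls the condition instead of sleeping for a fixed time.
// The timeout and poll interval are illustrative values, not the test's real ones.
async function waitForMesh(
  nodes: MeshView[],
  topics: string[],
  minMeshPeerCount = 1,
  timeoutMs = 30000,
  pollMs = 200,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!meshIsReady(nodes, topics, minMeshPeerCount)) {
    if (Date.now() > deadline) {
      throw new Error(`mesh not ready: need ${minMeshPeerCount} peers per node on each topic`);
    }
    await new Promise<void>(resolve => setTimeout(resolve, pollMs));
  }
}
```

A quorum-from-genesis test would call something like `await waitForMesh(nodes, QUORUM_TOPICS, nodes.length - 1)`, while callers that omit the third argument keep the looser single-peer condition.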

This is the test-side fix that addresses the trigger. There is a
separate, more fundamental product-robustness gap — a single missed
checkpoint at the chain tip is unrecoverable because of the
`block_number_already_exists` guard vs. the prune latency — which is
consensus-sensitive and tracked separately (related to A-1218); it is
intentionally **not** addressed here.
## Testing
- Build, format, lint clean; only the test and its helper changed.
- **Not yet run:** the full e2e (`gossip_network_no_cheat.test.ts`,
real-time-dependent, ideally under a 2-CPU constraint). The real
validation is running it repeatedly and confirming the committee reaches
3/3 and the first checkpoint publishes within the gate.

Closes A-1219.

1 parent 269e6d0, commit c6488b7
2 files changed
Lines changed: 26 additions & 10 deletions
Lines changed: 16 additions & 3 deletions