Commit d2f0ca8

mempool: per-entry TTL + bucket-sharded snapshot + silent-blackhole recovery (#722)
## Summary

Adds silent black-hole recovery for the mempool broadcast path, plus an e2e regression that **deterministically** exercises the recovery code path.

Three commits:

1. **`feat(mempool):`** Replace global-wipe `TxnCache` with per-entry TTL + bucket-sharded snapshot + four-state filter (`Dispatch / WaitForPrimary / SuppressInTtl / SuppressSameSlot`). A tx whose first dispatch landed on a stuck Primary slot is now auto-routed via the Failover slot one TTL (default 5s) later, with no gaptos changes, no global re-broadcast, and no extra tokio tasks.
2. **`test(e2e):`** Add Phase 3 to `pfn_chain` e2e. Restarts pfn1 with `GRAVITY_BLACKHOLE_BROADCAST=1` (Mempool-side debug env knob, ~10 LOC in `bin/gravity_node/src/mempool.rs`) so pfn1 stays a healthy member of `sync_states` but silently drops mempool broadcasts; then drives 30s of multi-account load via pfn3 and asserts impl-d's slot-flip catches every in-flight tx within ~1 TTL + commit.
3. **`test(e2e):`** Refactor Phase 3 into **two back-to-back halves**: Half A blackholes pfn1, Half B blackholes pfn2. pfn3 is never restarted between halves, so the `RandomState`-seeded Primary in `priority.rs` stays fixed. Exactly one half therefore has `Primary == blackhole target` and exercises slot-flip, while the other half runs at direct delivery latency. The cross-half p95 split is a **deterministic** assertion, with no more `--force-init` coverage lottery.
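To make the four-state filter easier to review, here is a minimal sketch of how a per-entry TTL decision could work. Only the four state names and the 5s default TTL come from the summary above. The `Slot`, `Entry`, and `TtlFilter` types, the `u64` tx key, and the transition rules themselves are assumptions for illustration and are not the code in this commit.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical broadcast slot; the commit only names Primary and Failover.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Slot {
    Primary,
    Failover,
}

/// The four filter states named in the summary.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Decision {
    Dispatch,
    WaitForPrimary,
    SuppressInTtl,
    SuppressSameSlot,
}

/// Per-entry record: when the tx was last dispatched and on which slot.
struct Entry {
    sent_at: Instant,
    slot: Slot,
}

/// Hypothetical filter with a per-entry TTL instead of a global wipe.
pub struct TtlFilter {
    ttl: Duration,
    entries: HashMap<u64, Entry>,
}

impl TtlFilter {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Assumed rules:
    /// - never sent: Primary dispatches, Failover waits for Primary;
    /// - sent less than one TTL ago: suppress on every slot;
    /// - TTL expired: suppress the slot already tried, dispatch on the other.
    pub fn decide(&mut self, tx: u64, slot: Slot, now: Instant) -> Decision {
        let decision = match self.entries.get(&tx) {
            None if slot == Slot::Primary => Decision::Dispatch,
            None => Decision::WaitForPrimary,
            Some(e) if now.duration_since(e.sent_at) < self.ttl => Decision::SuppressInTtl,
            Some(e) if e.slot == slot => Decision::SuppressSameSlot,
            Some(_) => Decision::Dispatch,
        };
        if decision == Decision::Dispatch {
            // Record the dispatch so the TTL window restarts on the new slot.
            self.entries.insert(tx, Entry { sent_at: now, slot });
        }
        decision
    }
}
```

Under these assumed rules, a tx first dispatched on a stuck Primary is suppressed for one TTL and then dispatched on Failover, which matches the "one TTL later" behaviour described in commit 1.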
## Phase 3 assertions

Per-half:
- `sent ≥ 150`, `timeout == 0`, `failed == 0`, `p99 ≤ 24s` (sanity ceiling)

Cross-half (deterministic bimodal):
- `min(p95) ≤ 3.0s`: one half must run at direct delivery latency
- `max(p95) ≥ 6.0s`: other half must hit slot-flip (TTL=5s + commit ~1s)
- `gap ≥ 4.0s`: separation between the two modes

## Test plan

- [x] First clean run on `mempool-impl-d` worktree (no `--force-init`):
  ```
  Half A (pfn1 blackhole): 210/0/0 p50=6.25 p95=8.61 p99=8.93
  Half B (pfn2 blackhole): 233/0/0 p50=1.18 p95=2.05 p99=2.48
  bimodal: fast=2.05 slow=8.61 gap=6.57 PASS
  Total wall time: 6:16
  ```
  Half A p50 = 6.25s ≈ TTL(5s) + commit(~1s), so slot-flip lands at the predicted floor.
- [ ] Reviewer: run `SKIP_CONTRACTS_FETCH=1 python3 gravity_e2e/runner.py pfn_chain` on a fresh worktree; expect either Half A or Half B to land in the slow mode (Primary assignment is RandomState-dependent across machines).
- [ ] Phase 1 / Phase 2 still pass unmodified.
- [ ] Verify `GRAVITY_BLACKHOLE_BROADCAST` env knob has no effect when unset (default node behaviour unchanged).
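The per-half and cross-half thresholds above can be expressed as two small predicates. This is a sketch only: the `HalfStats` struct and the function names are hypothetical and do not mirror the e2e runner. The thresholds are the ones listed under "Phase 3 assertions".

```rust
/// Hypothetical summary of one Phase 3 half (latencies in seconds).
#[derive(Debug, Clone, Copy)]
pub struct HalfStats {
    pub sent: u32,
    pub timeout: u32,
    pub failed: u32,
    pub p95: f64,
    pub p99: f64,
}

/// Per-half check: enough load, no losses, p99 under the sanity ceiling.
pub fn half_ok(h: &HalfStats) -> bool {
    h.sent >= 150 && h.timeout == 0 && h.failed == 0 && h.p99 <= 24.0
}

/// Cross-half bimodal check: one fast half, one slow half, clearly separated.
pub fn bimodal_ok(a: &HalfStats, b: &HalfStats) -> bool {
    let fast = a.p95.min(b.p95);
    let slow = a.p95.max(b.p95);
    fast <= 3.0 && slow >= 6.0 && slow - fast >= 4.0
}
```

Because the check uses `min` and `max` over the two halves, it does not care which half lands in the slow mode, which is what lets the assertion hold across machines with different `RandomState` seeds.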
1 parent 2ccbee2 commit d2f0ca8

5 files changed

Lines changed: 841 additions & 88 deletions
