fix(mempool): honor sender_bucket / timeline_id / count / before in read_timeline by keanji-x · Pull Request #719 · Galxe/gravity-sdk

keanji-x · 2026-05-17T13:59:15Z

Summary

CoreMempool::read_timeline was ignoring every routing parameter (sender_bucket, timeline_id, count, before, priority_of_receiver) and using a single global TxnCache to dedup across consecutive ticks. The result: the first call of any tick drained every broadcastable txn from the reth pool into one batch, and every subsequent call (other buckets, Failover priority, even other peers) returned empty until the cache's TTL/capacity expired ~60s later.

Observable bugs

sender_bucket fan-out across PFNs collapsed. Whichever peer the broadcast loop iterated first this tick swallowed the entire pool; the other PFN got nothing for ~60s.
Failover delay (shared_mempool_failover_delay_ms) bypassed. before was ignored, so Failover peers received txns with the same timing as Primary.
shared_mempool_batch_size cap bypassed. A single broadcast batch could contain the entire pool.
Peer-side resume state broken. The returned MultiBucketTimelineIndexIds.id_per_bucket had length equal to the txn count rather than the number of fee buckets, so PeerSyncState::timelines never advanced meaningfully.

Fix

Replace TxnCache with a BroadcastIndex that lazily mirrors the reth pool's broadcastable set, assigning each txn a monotonic timeline_id within its sender_bucket on first observation. read_timeline now:

Filters by sender_bucket (last-byte-mod-N, matching transaction_store::sender_bucket in upstream Aptos).
Resumes after the input timeline_id (max across positions; this PR does not subdivide by fee bucket internally — see follow-ups).
Caps at count.
Skips txns first observed after before and breaks the scan on the first too-fresh entry (timeline_id is monotonic with insertion time, so the tail is also too fresh).
Returns a MultiBucketTimelineIndexIds whose length matches the caller's input so PeerSyncState stays well-formed.

GC: entries are dropped when their txn no longer appears in the snapshot from the upstream pool (committed, expired, or evicted).

Follow-ups (intentionally out of scope)

Per-(sender_bucket, fee_bucket) subdivision. Requires exposing gas price from TxPool::get_broadcast_txns. Current PR collapses all fee buckets into one timeline, which loses high-fee-first ordering within a broadcast batch but matches the reth pool's natural ordering.
Accurate ready_time emission. Needs SystemTime tracking in addition to Instant. Left at 0 to match prior behavior.
timeline_range retransmit path. Still returns empty (unchanged from before); orthogonal to this fix.

Test plan

cargo check -p aptos-mempool — passes
cargo test -p aptos-mempool --lib — 4 new unit tests pass:
- sender_bucket_takes_last_byte_mod_n (sender_bucket arithmetic)
- broadcast_index_partitions_by_sender_bucket (bucket partitioning)
- range_query_respects_resume_id (BTreeMap::range resume semantics)
- failover_before_filter_breaks_scan (before-driven scan break)
Validator/PFN broadcast smoke: confirm both PFNs receive a balanced share of new txns under load
Confirm shared_mempool_failover_delay_ms is honored on Failover peers (Failover broadcasts strictly lag Primary by configured delay)
Watch for regressions in the per-peer distribution of the broadcast metric

🤖 Generated with Claude Code

…ead_timeline The previous CoreMempool::read_timeline ignored every routing parameter and used a single global TxnCache to dedup across consecutive ticks. That meant the first read_timeline call of a tick drained every broadcastable txn from the reth pool, populated the cache, and every subsequent call (other buckets, Failover priority, even other peers) saw an empty result until TTL/capacity cleared the cache. Effects: - The sender_bucket-based fan-out across PFNs collapsed: whichever peer iterated first this tick received the entire pool, the others got nothing for ~60s. - The Failover delay (`shared_mempool_failover_delay_ms`) was bypassed, since `before` was ignored. - The `shared_mempool_batch_size` cap was bypassed: a tick could push the whole pool into a single broadcast batch. - The returned MultiBucketTimelineIndexIds used `id_per_bucket` length equal to the txn count rather than the number of fee buckets, so the peer-side resume state never advanced meaningfully. Replace TxnCache with a BroadcastIndex that lazily mirrors the reth pool's broadcastable set and assigns each txn a monotonic timeline_id within its sender_bucket on first observation. read_timeline now: - Returns only txns whose sender hashes into the requested sender_bucket (last-byte-mod-N, matching upstream Aptos's `transaction_store::sender_bucket`). - Resumes after the input timeline_id (max position; we don't subdivide by fee bucket). - Caps the batch at `count`. - Skips txns first observed after `before` (Failover delay gate), and breaks on the first too-fresh entry since timeline_id is monotonic with insertion time. - Returns a MultiBucketTimelineIndexIds whose length matches the input so PeerSyncState stays well-formed. GC: entries are dropped when their txn no longer appears in the snapshot from the upstream pool (committed, expired, or evicted). Per-(sender_bucket, fee_bucket) subdivision and accurate ready_time emission are intentional follow-ups: both require richer txn metadata from the TxPool trait (gas price for fee buckets, SystemTime for ready_time), which is out of scope for this fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nekomoto911 · 2026-05-18T08:43:22Z

Tracing through the implementation with the broadcast design notes — the cursor restoration itself (sender_bucket sharding + per-(peer, bucket) cursor + before filter) is a faithful upstream restore. A few gaps worth surfacing:

1. No retransmit path → fault-tolerance regression vs `main`

With the strict cursor honored, a single un-committing txn X is broadcast at most twice (Primary at t≈0, Failover at t≈500ms) and never again in steady state, because:

timeline_range_of_message still returns vec![] (mempool.rs:94-102) → sent_messages is wiped at the top of determine_broadcast_batch every tick → L1 ack-timeout retransmit never fires.
gc() / gc_by_expiration_time() still empty (mempool.rs:133-135, 168-170) → L4 system_ttl=600s doesn't apply; reth's max_tx_lifetime only scans queued, not pending (maintain.rs:284-296).
The previous TxnCache 60s wipe — which at least gave silent-black-hole a periodic retry window — is removed.

Net effect: silent-black-hole (Primary's TCP/ACK/ping healthy but downstream broken) becomes non-recoverable from the mempool side; X stays in reth pending until capacity eviction (hours–days). Worse than main on this axis.

Suggested follow-up: implement timeline_range_of_message so L1 works; ideally a finite gc() for the L4 backstop.

2. No snapshot amortization across same-tick calls

pool.get_broadcast_txns(None).collect() plus full HashMap rebuild runs on every read_timeline call (mempool.rs:165-174). With num_sender_buckets=4 × default_failovers=1 × 2-peer, that's ~8 full pool scans per 10ms tick. Sub-ms at ≤10k; bottleneck at 100k. Easy follow-up: cache the snapshot on Mempool for ~tick interval and share across same-tick calls.

3. `insertion_time` is "first-observed", not pool insertion

mempool.rs:215 — Failover's before filter uses this. On mempool restart, every in-pool txn gets insertion_time = now, so Failover loses its catch-up role for the first 500ms after restart. Minor in production; worth adding to the follow-ups list.

4. Nit: `saturating_add` should be `checked_add`

next_ids[sb_idx].saturating_add(1) (mempool.rs:213) saturates at u64::MAX, after which every new txn shares the same id and the BTreeMap overwrites silently. Practically unreachable, but checked_add(1).expect("timeline_id overflow") makes the invariant explicit.

github-actions · 2026-07-03T02:17:02Z

This PR is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.

github-actions Bot added the Stale label Jul 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(mempool): honor sender_bucket / timeline_id / count / before in read_timeline#719

fix(mempool): honor sender_bucket / timeline_id / count / before in read_timeline#719
keanji-x wants to merge 1 commit into
Galxe:mainfrom
keanji-x:kj/fix-mempool-read-timeline

keanji-x commented May 17, 2026 •

edited

Loading

Uh oh!

nekomoto911 commented May 18, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

keanji-x commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Observable bugs

Fix

Follow-ups (intentionally out of scope)

Test plan

Uh oh!

nekomoto911 commented May 18, 2026

1. No retransmit path → fault-tolerance regression vs main

2. No snapshot amortization across same-tick calls

3. insertion_time is "first-observed", not pool insertion

4. Nit: saturating_add should be checked_add

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

keanji-x commented May 17, 2026 •

edited

Loading

1. No retransmit path → fault-tolerance regression vs `main`

3. `insertion_time` is "first-observed", not pool insertion

4. Nit: `saturating_add` should be `checked_add`