fix(kad-dht): reprovide CIDs in Kademlia key order to reduce new dials#3426

Open
paschal533 wants to merge 3 commits into libp2p:main from paschal533:fix/kad-dht-reprovide-kademlia-key-order
Conversation

paschal533 (Contributor) commented Mar 31, 2026

Summary

During a reprovide run, the reprovider iterates over all stored CIDs and calls provide() for each one. Each provide() call opens connections to the K closest peers for that CID. Because CIDs are processed in datastore iteration order (essentially random), consecutive CIDs are likely to have different K closest peers, so nearly every CID requires a fresh set of dials.

This ports the SweepingProvider optimisation from go-libp2p: collect all CIDs that need reproviding, sort them by their Kademlia key, then queue them in that order.

CIDs whose Kademlia keys are close in XOR distance tend to share most of their K closest peers. By processing them consecutively, the connections opened for one CID are likely still live when the next CID is queued, so they get reused instead of requiring new dials. Over a full reprovide run with many CIDs, this can substantially reduce the total number of new connections opened.
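As a rough illustration of the ordering idea (not the code in this PR), the sketch below sorts items by a precomputed key using a byte-wise ascending comparison. The `KeyedItem` type, `compareKeys`, and `sortByKadKey` are hypothetical names, and the two-byte keys are made-up stand-ins for real Kademlia keys. Keys that share a long common prefix end up next to each other, which is what makes neighbouring entries close in XOR distance.

```typescript
// Hypothetical record pairing an identifier with its precomputed Kademlia key.
interface KeyedItem {
  id: string
  kadKey: Uint8Array
}

// Compare two keys byte by byte, most significant byte first (ascending).
function compareKeys (a: Uint8Array, b: Uint8Array): number {
  for (let i = 0; i < a.length && i < b.length; i++) {
    if (a[i] !== b[i]) {
      return a[i] - b[i]
    }
  }
  // Equal up to the shorter length: the shorter key sorts first.
  return a.length - b.length
}

// Return a sorted copy so the caller's array is left untouched.
function sortByKadKey (items: KeyedItem[]): KeyedItem[] {
  return items.slice().sort((x, y) => compareKeys(x.kadKey, y.kadKey))
}

// Made-up two-byte keys, deliberately out of order.
const items: KeyedItem[] = [
  { id: 'c', kadKey: new Uint8Array([0xf0, 0x01]) },
  { id: 'a', kadKey: new Uint8Array([0x10, 0x02]) },
  { id: 'd', kadKey: new Uint8Array([0xf0, 0x03]) },
  { id: 'b', kadKey: new Uint8Array([0x10, 0x07]) }
]

// Items sharing the 0x10 prefix come first, then those sharing 0xf0.
const sortedIds = sortByKadKey(items).map(item => item.id)
console.log(sortedIds)
```

In this toy input, 'a' and 'b' share a first byte, as do 'c' and 'd', so each pair ends up adjacent after sorting.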

Changes

  • src/reprovider.ts: collect CIDs into an array before queueing, sort by Kademlia key, then queue in sorted order
  • test/reprovider.spec.ts: inserts 5 CIDs in reverse Kademlia key order and verifies contentRouting.provide is called in ascending Kademlia key order

Test plan

  • "should reprovide in Kademlia key order": inserts CIDs in reverse order and verifies that provide calls arrive in ascending Kademlia key order
  • All 150 existing tests continue to pass
  • Lint and TypeScript build clean

paschal533 requested a review from a team as a code owner March 31, 2026 10:05
paschal533 marked this pull request as draft March 31, 2026 10:06
paschal533 marked this pull request as ready for review March 31, 2026 10:37
Comment thread packages/kad-dht/src/reprovider.ts Outdated
this.log('starting reprovide/cleanup')

// collect CIDs that need reproviding so we can sort them before queueing
const toReprovide: CID[] = []
Member


This needs to be done in a way that does not let the array grow without bound; otherwise it will OOM with large CID collections.

Contributor Author


Fixed. I replaced the unbounded array with a fixed-size batch approach (sortBatchSize, default 512). CIDs are sorted in 512-entry windows; each batch is flushed before the next begins, bounding peak memory to O(sortBatchSize). I also added a test that verifies within-batch Kademlia ordering with sortBatchSize set to 2.

Collect all CIDs that need reproviding before queueing them, sort by
their Kademlia key, then queue in that order. XOR-adjacent CIDs share
the same K closest peers, so connections opened for one CID are reused
for the next, reducing the number of new dials across a reprovide run.

Ports the SweepingProvider optimisation from go-libp2p:
libp2p/go-libp2p#2774

Replaces the unbounded toReprovide array with a fixed-size batch
approach. CIDs are accumulated into a batch of at most sortBatchSize
(default 512) entries; each full batch is sorted by Kademlia key and
queued before the next batch begins. A final flush handles the
remainder. This bounds peak memory to O(sortBatchSize) regardless of
the total number of stored CIDs, while still delivering the connection-
reuse benefit for XOR-adjacent CIDs within each batch.

Also adds sortBatchSize to ReproviderInit and a test that verifies
within-batch Kademlia ordering when batches are smaller than the CID
count.
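The batching described above can be sketched roughly as follows. This is a simplified illustration rather than the PR's implementation: `forEachSortedBatch` and its `flush` callback are hypothetical names, plain numbers stand in for Kademlia keys, and an in-memory array stands in for the datastore query, which in the real reprovider would be consumed as an async stream.

```typescript
// Accumulate items into a batch of at most `batchSize` entries, sort each
// full batch, hand it to `flush`, then start a new batch. Only one batch is
// held at a time, so the working set stays at O(batchSize).
function forEachSortedBatch<T> (
  source: T[],
  batchSize: number,
  compare: (a: T, b: T) => number,
  flush: (batch: T[]) => void
): void {
  if (batchSize < 1) {
    throw new Error('batchSize must be at least 1')
  }

  let batch: T[] = []

  for (const item of source) {
    batch.push(item)

    if (batch.length >= batchSize) {
      flush(batch.sort(compare))
      batch = []
    }
  }

  // Final flush for the remainder that did not fill a whole batch.
  if (batch.length > 0) {
    flush(batch.sort(compare))
  }
}

// Five made-up keys with a batch size of 2, similar in spirit to the
// sortBatchSize: 2 test mentioned in the conversation.
const flushedBatches: number[][] = []
forEachSortedBatch([9, 3, 7, 1, 8], 2, (a, b) => a - b, (batch) => {
  flushedBatches.push(batch)
})
console.log(flushedBatches)
```

Ordering is only guaranteed within each batch, not across batches, which is the trade-off accepted in exchange for bounded memory.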
paschal533 force-pushed the fix/kad-dht-reprovide-kademlia-key-order branch from cad7641 to e2d6d61 Compare April 20, 2026 10:51
…lint rule

The sortBatchSize check was nested 5 levels deep (try → for-await → try →
if-shouldReprovide → if-batchSize), exceeding the max-depth:4 eslint rule.
Invert the shouldReprovide guard to an early continue so the batch logic
sits at depth 4.
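The guard inversion described in this commit message might look something like the sketch below. It is illustrative only: the `Entry` shape and `collectBatches` are hypothetical, and the surrounding try and for-await blocks from the real reprovider are omitted. The point is that replacing `if (shouldReprovide) { ... }` with an early `continue` lets the batch-size check sit one nesting level shallower.

```typescript
// Hypothetical stand-in for a stored provider record.
interface Entry {
  cid: string
  shouldReprovide: boolean
}

function collectBatches (entries: Entry[], batchSize: number): string[][] {
  const flushed: string[][] = []
  let batch: string[] = []

  for (const entry of entries) {
    // Inverted guard: skip early instead of wrapping the batch logic in
    // `if (entry.shouldReprovide) { ... }`, which would add a nesting level.
    if (!entry.shouldReprovide) {
      continue
    }

    batch.push(entry.cid)

    // This check now sits directly inside the loop body.
    if (batch.length >= batchSize) {
      flushed.push(batch)
      batch = []
    }
  }

  // Flush whatever is left over after the loop.
  if (batch.length > 0) {
    flushed.push(batch)
  }

  return flushed
}

const guardResult = collectBatches([
  { cid: 'a', shouldReprovide: true },
  { cid: 'b', shouldReprovide: false },
  { cid: 'c', shouldReprovide: true },
  { cid: 'd', shouldReprovide: true }
], 2)
console.log(guardResult)
```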
2 participants