fix(kad-dht): reprovide CIDs in Kademlia key order to reduce new dials #3426
Open
paschal533 wants to merge 3 commits into libp2p:main from
achingbrain
reviewed
Apr 19, 2026
this.log('starting reprovide/cleanup')

// collect CIDs that need reproviding so we can sort them before queueing
const toReprovide: CID[] = []
Member
This needs to be done in a way that does not let the array grow without bound, otherwise it will OOM with large CID collections.
Contributor
Author
Fixed. I replaced the unbounded array with a fixed-size batch approach (sortBatchSize, default 512). CIDs are sorted in 512-entry windows; each batch is flushed before the next begins, bounding peak memory to O(sortBatchSize). I also added a test that verifies within-batch Kademlia ordering with sortBatchSize: 2.
Collect all CIDs that need reproviding before queueing them, sort by their Kademlia key, then queue in that order. XOR-adjacent CIDs share the same K closest peers, so connections opened for one CID are reused for the next, reducing the number of new dials across a reprovide run. Ports the SweepingProvider optimisation from go-libp2p: libp2p/go-libp2p#2774
Replaces the unbounded toReprovide array with a fixed-size batch approach. CIDs are accumulated into a batch of at most sortBatchSize (default 512) entries; each full batch is sorted by Kademlia key and queued before the next batch begins. A final flush handles the remainder. This bounds peak memory to O(sortBatchSize) regardless of the total number of stored CIDs, while still delivering the connection-reuse benefit for XOR-adjacent CIDs within each batch. Also adds sortBatchSize to ReproviderInit and a test that verifies within-batch Kademlia ordering when batches are smaller than the CID count.
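The batch-and-flush behaviour described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the SortBatcher class, its constructor parameters, and the enqueue callback are hypothetical names, and the items are plain numbers rather than CIDs. The sketch is push-style, so add() could be called from inside an asynchronous datastore loop.

```typescript
// Hypothetical helper (not part of js-libp2p): buffers up to `batchSize`
// items, sorts each full buffer, and hands the items to `enqueue` in order.
class SortBatcher<T> {
  private batch: T[] = []

  constructor (
    private readonly batchSize: number,
    private readonly compare: (a: T, b: T) => number,
    private readonly enqueue: (item: T) => void
  ) {}

  // Add one item; flush automatically once the batch is full, so the
  // buffer never holds more than `batchSize` items.
  add (item: T): void {
    this.batch.push(item)

    if (this.batch.length >= this.batchSize) {
      this.flush()
    }
  }

  // Sort whatever is buffered, queue it, and start a fresh batch.
  // Call once more at the end of the run to handle the remainder.
  flush (): void {
    this.batch.sort(this.compare)

    for (const item of this.batch) {
      this.enqueue(item)
    }

    this.batch = []
  }
}

// Usage: five items with a batch size of 2. Ordering is only guaranteed
// within each batch, not across the whole run.
const queued: number[] = []
const batcher = new SortBatcher<number>(2, (a, b) => a - b, (n) => { queued.push(n) })

for (const n of [5, 3, 4, 1, 2]) {
  batcher.add(n)
}
batcher.flush()

console.log(queued) // [3, 5, 1, 4, 2]
```

The trade-off is visible in the output: each window is sorted, but the run as a whole is not, so connection reuse is limited to CIDs that land in the same batch.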
cad7641 to e2d6d61
…lint rule
The sortBatchSize check was nested 5 levels deep (try → for-await → try → if-shouldReprovide → if-batchSize), exceeding the max-depth:4 eslint rule. Invert the shouldReprovide guard to an early continue so the batch logic sits at depth 4.
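The guard inversion described in this commit can be illustrated with a simplified loop. This is a sketch under stated assumptions, not the reprovider's code: StoredRecord, needsReprovide, and collectSortedBatches are hypothetical, the loop is synchronous rather than for-await, and string keys stand in for CIDs.

```typescript
// Hypothetical record shape, for illustration only.
interface StoredRecord {
  key: string
  needsReprovide: boolean
}

function collectSortedBatches (records: StoredRecord[], sortBatchSize: number): string[][] {
  const flushed: string[][] = []
  let batch: string[] = []

  try { // depth 1
    for (const record of records) { // depth 2
      try { // depth 3
        // Inverted guard: skip early instead of wrapping the batch logic
        // in `if (record.needsReprovide) { ... }`.
        if (!record.needsReprovide) { // depth 4
          continue
        }

        batch.push(record.key)

        // This check now sits at depth 4; nested inside the guard it
        // would have been at depth 5.
        if (batch.length >= sortBatchSize) {
          flushed.push(batch.sort())
          batch = []
        }
      } catch (err) {
        // Assumed behaviour: one bad record does not abort the run.
      }
    }

    // Final flush for the remainder.
    if (batch.length > 0) {
      flushed.push(batch.sort())
    }
  } finally {
    // Placeholder for whatever cleanup the real code performs.
  }

  return flushed
}
```

The behaviour is unchanged by the inversion; only the nesting depth of the batch-size check differs.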
Summary
During a reprovide run, the reprovider iterates all stored CIDs and calls provide() for each one. Each provide() opens connections to the K closest peers for that CID. Because CIDs are processed in datastore iteration order (essentially random), each CID's K closest peers are likely different, so every CID requires a fresh set of dials.
This ports the SweepingProvider optimisation from go-libp2p: collect all CIDs that need reproviding, sort them by their Kademlia key, then queue them in that order.
XOR-adjacent CIDs share the same K closest peers. By processing them consecutively, the connections opened for one CID are still live when the next CID is queued, so they get reused instead of requiring new dials. Over a full reprovide run with many CIDs this significantly reduces the total number of new connections opened.
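To make the ordering argument concrete, here is a small sketch of why sorting by key places XOR-close entries next to each other. It assumes keys are compared byte by byte, which may differ from how the PR sorts; the KeyedEntry shape, the helper names, and the hard-coded 2-byte keys are all hypothetical (real Kademlia keys are much longer and derived by hashing).

```typescript
// Hypothetical entry: a label standing in for a CID plus its Kademlia key.
interface KeyedEntry {
  label: string
  kadKey: Uint8Array
}

// Byte-wise (big-endian) comparison of two keys.
function compareKadKeys (a: Uint8Array, b: Uint8Array): number {
  for (let i = 0; i < a.length && i < b.length; i++) {
    if (a[i] !== b[i]) {
      return a[i] - b[i]
    }
  }
  return a.length - b.length
}

// Number of leading bits two keys have in common. A longer shared prefix
// means a smaller XOR distance between the keys.
function commonPrefixBits (a: Uint8Array, b: Uint8Array): number {
  let bits = 0
  for (let i = 0; i < a.length && i < b.length; i++) {
    const diff = a[i] ^ b[i]
    if (diff === 0) {
      bits += 8
      continue
    }
    // Count leading zero bits of the first differing byte.
    for (let mask = 0x80; (diff & mask) === 0; mask >>= 1) {
      bits++
    }
    break
  }
  return bits
}

// Entries in an arbitrary "datastore" order.
const entries: KeyedEntry[] = [
  { label: 'a', kadKey: new Uint8Array([0xf0, 0x01]) },
  { label: 'b', kadKey: new Uint8Array([0x10, 0x80]) },
  { label: 'c', kadKey: new Uint8Array([0xf0, 0x03]) },
  { label: 'd', kadKey: new Uint8Array([0x10, 0x81]) }
]

// Sort a copy so the original order is left untouched.
const sortedEntries = entries.slice().sort((x, y) => compareKadKeys(x.kadKey, y.kadKey))

console.log(sortedEntries.map(e => e.label)) // ['b', 'd', 'a', 'c']
```

In the unsorted order every consecutive pair shares 0 leading bits, while after sorting the pairs b/d and a/c share 15 and 14 bits respectively, so their closest peers are much more likely to overlap.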
Changes
src/reprovider.ts: collect CIDs into an array before queueing, sort by Kademlia key, then queue in sorted order
test/reprovider.spec.ts: inserts 5 CIDs in reverse Kademlia key order and verifies contentRouting.provide is called in ascending Kademlia key order
Test plan
should reprovide in Kademlia key order: inserts CIDs in reverse order, verifies provide calls arrive in correct ascending Kademlia key order