perf!: replace BTreeSet+Cow with Bitset, Arc candidate_order#46
Draft
evanlinjin wants to merge 3 commits into
Draft
perf!: replace BTreeSet+Cow with Bitset, Arc candidate_order#46evanlinjin wants to merge 3 commits into
evanlinjin wants to merge 3 commits into
Conversation
CoinSelector previously stored selected/banned as Cow<BTreeSet<usize>> and candidate_order as Cow<[usize]>. The Cow's borrowed variant was never used, so the wrapper paid for no benefit. BnB also clones the selector at every node, eagerly copying the BTreeSets via Cow::Owned -- which dominates the BnB cost. This commit: - Replaces both BTreeSet<usize> fields with a hand-rolled Bitset (Vec<u64> + bit_capacity, with trailing/leading_zeros iteration). Clones become contiguous memcpy of n/64 words. - Drops Cow on candidate_order in favour of Arc<Vec<usize>>, since it is write-once (sort) then read-many (BnB). - Bumps the crate to 0.5.0; selected_indices() and banned() now return &Bitset. Benchmarks (5-run mean, release, single-threaded): bnb: 1.55s -> 0.67s (2.3x) lowest_fee: 25.8s -> 11.7s (2.2x) No new dependencies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover edge cases of the hand-rolled Bitset: - Empty/zero-capacity, default, capacity reporting - insert/remove return values, out-of-range safety - Bits at word boundaries (0, 63, 64, 65, 127, 128, ...) - Non-word-aligned capacity, iteration with size_hint - DoubleEndedIterator: front/back interleaving, full back-drain - Equality Two proptest properties guarantee equivalence with BTreeSet: - Any random sequence of insert/remove/contains/len/is_empty/iter produces identical results. - Interleaved next() / next_back() yields each set bit exactly once, in the order dictated by the call pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two benchmark groups in benches/coin_selector.rs:
- `clone/{n}`: direct cost of CoinSelector::clone() at n =
64/256/1024/4096 candidates. Measures the per-node BnB allocation
the bitset/Arc layout was introduced to make cheap.
- `run_bnb_lowest_fee/{n}`: end-to-end run_bnb throughput on a
deterministic synthetic pool with the LowestFee metric, at n =
20/50/100/200 candidates. Reflects realistic library usage.
The pool generator is deterministic so results are comparable across
commits. Criterion is added as a dev-dependency (does not affect
downstream consumers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace
Cow<'_, BTreeSet<usize>>forselected/bannedandCow<'_, [usize]>forcandidate_orderwith:Bitset(Vec<u64>+ bit capacity, withtrailing_zeros/leading_zerositeration) forselectedandbanned.Arc<Vec<usize>>forcandidate_order— sorted once inBnbIter::new, then read-only.The previous
Cowfields were always constructed::Owned— the borrowed variant was never used — so the wrapper paid for no benefit while still triggering eagerBTreeSetclones during BnB.Bumps the crate to
0.5.0because the public types ofselected_indices()andbanned()change from&BTreeSet<usize>to&Bitset.Benchmarks (Criterion median)
benches/coin_selector.rs, release, single-threaded.clone/64clone/256clone/1024clone/4096run_bnb_lowest_fee/50run_bnb_lowest_fee/100run_bnb_lowest_fee/200clonescales near-flat (Arc bump +n/64-word memcpy) versus master's linear scaling (BTreeSet node-by-node copy). BnB benefit is largest in the mid range where clone cost is a meaningful fraction of total work; for very large n the BnB search itself dominates so faster clones matter less.Breaking changes
CoinSelector::selected_indices()now returns&Bitset(previously&BTreeSet<usize>).CoinSelector::banned()now returns&Bitset.Bitsetexposescontains/len/is_empty/iter(ExactSize + DoubleEnded), so most call patterns port across with little or no change.What this does NOT do
A flamegraph on this branch (
cargo flamegraph --bench coin_selector -- --bench --profile-time 10 run_bnb_lowest_fee/100) shows ~65% of remaining BnB time goes toselected_value+input_weightrecomputation. That's a separate, deeper refactor (delta-aware metric evaluator); it'll be explored in a follow-up branch.Test plan
cargo test— all existing tests passBitset(src/bitset.rs, 13 unit tests + 2 proptests comparing every operation againstBTreeSet)benches/coin_selector.rs) documenting the gains and acting as a regression checkCommits
feat!: replace BTreeSet+Cow with Bitset, Arc candidate_order— the main changetest: add unit and property tests for Bitset— coveragebench: add Criterion benchmarks for clone and BnB— perf regression netNo new runtime dependencies (criterion is dev-only).
🤖 Generated with Claude Code