perf!: replace BTreeSet+Cow with Bitset, Arc candidate_order by evanlinjin · Pull Request #46 · bitcoindevkit/coin-select

evanlinjin · 2026-05-17T21:33:53Z

Summary

Replace Cow<'_, BTreeSet<usize>> for selected/banned and Cow<'_, [usize]> for candidate_order with:

A hand-rolled Bitset (Vec<u64> + bit capacity, with trailing_zeros/leading_zeros iteration) for selected and banned.
Arc<Vec<usize>> for candidate_order — sorted once in BnbIter::new, then read-only.

The previous Cow fields were always constructed ::Owned — the borrowed variant was never used — so the wrapper paid for no benefit while still triggering eager BTreeSet clones during BnB.

Bumps the crate to 0.5.0 because the public types of selected_indices() and banned() change from &BTreeSet<usize> to &Bitset.

Benchmarks (Criterion median)

benches/coin_selector.rs, release, single-threaded.

	master	this PR	speedup
`clone/64`	91 ns	51 ns	1.8×
`clone/256`	278 ns	39 ns	7.2×
`clone/1024`	1.34 µs	50 ns	27×
`clone/4096`	5.68 µs	58 ns	99×
`run_bnb_lowest_fee/50`	26.0 ms	10.3 ms	2.5×
`run_bnb_lowest_fee/100`	73.2 ms	40.2 ms	1.8×
`run_bnb_lowest_fee/200`	1.27 s	1.10 s	1.15× (CIs overlap)

clone scales near-flat (Arc bump + n/64-word memcpy) versus master's linear scaling (BTreeSet node-by-node copy). BnB benefit is largest in the mid range where clone cost is a meaningful fraction of total work; for very large n the BnB search itself dominates so faster clones matter less.

Breaking changes

CoinSelector::selected_indices() now returns &Bitset (previously &BTreeSet<usize>).
CoinSelector::banned() now returns &Bitset.

Bitset exposes contains/len/is_empty/iter (ExactSize + DoubleEnded), so most call patterns port across with little or no change.

What this does NOT do

A flamegraph on this branch (cargo flamegraph --bench coin_selector -- --bench --profile-time 10 run_bnb_lowest_fee/100) shows ~65% of remaining BnB time goes to selected_value + input_weight recomputation. That's a separate, deeper refactor (delta-aware metric evaluator); it'll be explored in a follow-up branch.

Test plan

cargo test — all existing tests pass
New unit + proptest coverage for Bitset (src/bitset.rs, 13 unit tests + 2 proptests comparing every operation against BTreeSet)
Criterion benchmark added (benches/coin_selector.rs) documenting the gains and acting as a regression check
Reviewer to confirm the API break is acceptable for 0.5.0

Commits

feat!: replace BTreeSet+Cow with Bitset, Arc candidate_order — the main change
test: add unit and property tests for Bitset — coverage
bench: add Criterion benchmarks for clone and BnB — perf regression net

No new runtime dependencies (criterion is dev-only).

🤖 Generated with Claude Code

CoinSelector previously stored selected/banned as Cow<BTreeSet<usize>> and candidate_order as Cow<[usize]>. The Cow's borrowed variant was never used, so the wrapper paid for no benefit. BnB also clones the selector at every node, eagerly copying the BTreeSets via Cow::Owned -- which dominates the BnB cost. This commit: - Replaces both BTreeSet<usize> fields with a hand-rolled Bitset (Vec<u64> + bit_capacity, with trailing/leading_zeros iteration). Clones become contiguous memcpy of n/64 words. - Drops Cow on candidate_order in favour of Arc<Vec<usize>>, since it is write-once (sort) then read-many (BnB). - Bumps the crate to 0.5.0; selected_indices() and banned() now return &Bitset. Benchmarks (5-run mean, release, single-threaded): bnb: 1.55s -> 0.67s (2.3x) lowest_fee: 25.8s -> 11.7s (2.2x) No new dependencies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cover edge cases of the hand-rolled Bitset: - Empty/zero-capacity, default, capacity reporting - insert/remove return values, out-of-range safety - Bits at word boundaries (0, 63, 64, 65, 127, 128, ...) - Non-word-aligned capacity, iteration with size_hint - DoubleEndedIterator: front/back interleaving, full back-drain - Equality Two proptest properties guarantee equivalence with BTreeSet: - Any random sequence of insert/remove/contains/len/is_empty/iter produces identical results. - Interleaved next() / next_back() yields each set bit exactly once, in the order dictated by the call pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two benchmark groups in benches/coin_selector.rs: - `clone/{n}`: direct cost of CoinSelector::clone() at n = 64/256/1024/4096 candidates. Measures the per-node BnB allocation the bitset/Arc layout was introduced to make cheap. - `run_bnb_lowest_fee/{n}`: end-to-end run_bnb throughput on a deterministic synthetic pool with the LowestFee metric, at n = 20/50/100/200 candidates. Reflects realistic library usage. The pool generator is deterministic so results are comparable across commits. Criterion is added as a dev-dependency (does not affect downstream consumers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

evanlinjin and others added 3 commits May 17, 2026 20:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf!: replace BTreeSet+Cow with Bitset, Arc candidate_order#46

perf!: replace BTreeSet+Cow with Bitset, Arc candidate_order#46
evanlinjin wants to merge 3 commits into
bitcoindevkit:masterfrom
evanlinjin:fix/better-memory

evanlinjin commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

evanlinjin commented May 17, 2026

Summary

Benchmarks (Criterion median)

Breaking changes

What this does NOT do

Test plan

Commits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant