Skip to content

perf!: replace BTreeSet+Cow with Bitset, Arc candidate_order#46

Draft
evanlinjin wants to merge 3 commits into
bitcoindevkit:masterfrom
evanlinjin:fix/better-memory
Draft

perf!: replace BTreeSet+Cow with Bitset, Arc candidate_order#46
evanlinjin wants to merge 3 commits into
bitcoindevkit:masterfrom
evanlinjin:fix/better-memory

Conversation

@evanlinjin
Copy link
Copy Markdown
Member

Summary

Replace Cow<'_, BTreeSet<usize>> for selected/banned and Cow<'_, [usize]> for candidate_order with:

  • A hand-rolled Bitset (Vec<u64> + bit capacity, with trailing_zeros/leading_zeros iteration) for selected and banned.
  • Arc<Vec<usize>> for candidate_order — sorted once in BnbIter::new, then read-only.

The previous Cow fields were always constructed ::Owned — the borrowed variant was never used — so the wrapper paid for no benefit while still triggering eager BTreeSet clones during BnB.

Bumps the crate to 0.5.0 because the public types of selected_indices() and banned() change from &BTreeSet<usize> to &Bitset.

Benchmarks (Criterion median)

benches/coin_selector.rs, release, single-threaded.

master this PR speedup
clone/64 91 ns 51 ns 1.8×
clone/256 278 ns 39 ns 7.2×
clone/1024 1.34 µs 50 ns 27×
clone/4096 5.68 µs 58 ns 99×
run_bnb_lowest_fee/50 26.0 ms 10.3 ms 2.5×
run_bnb_lowest_fee/100 73.2 ms 40.2 ms 1.8×
run_bnb_lowest_fee/200 1.27 s 1.10 s 1.15× (CIs overlap)

clone scales near-flat (Arc bump + n/64-word memcpy) versus master's linear scaling (BTreeSet node-by-node copy). BnB benefit is largest in the mid range where clone cost is a meaningful fraction of total work; for very large n the BnB search itself dominates so faster clones matter less.

Breaking changes

  • CoinSelector::selected_indices() now returns &Bitset (previously &BTreeSet<usize>).
  • CoinSelector::banned() now returns &Bitset.

Bitset exposes contains/len/is_empty/iter (ExactSize + DoubleEnded), so most call patterns port across with little or no change.

What this does NOT do

A flamegraph on this branch (cargo flamegraph --bench coin_selector -- --bench --profile-time 10 run_bnb_lowest_fee/100) shows ~65% of remaining BnB time goes to selected_value + input_weight recomputation. That's a separate, deeper refactor (delta-aware metric evaluator); it'll be explored in a follow-up branch.

Test plan

  • cargo test — all existing tests pass
  • New unit + proptest coverage for Bitset (src/bitset.rs, 13 unit tests + 2 proptests comparing every operation against BTreeSet)
  • Criterion benchmark added (benches/coin_selector.rs) documenting the gains and acting as a regression check
  • Reviewer to confirm the API break is acceptable for 0.5.0

Commits

  1. feat!: replace BTreeSet+Cow with Bitset, Arc candidate_order — the main change
  2. test: add unit and property tests for Bitset — coverage
  3. bench: add Criterion benchmarks for clone and BnB — perf regression net

No new runtime dependencies (criterion is dev-only).

🤖 Generated with Claude Code

evanlinjin and others added 3 commits May 17, 2026 20:48
CoinSelector previously stored selected/banned as Cow<BTreeSet<usize>>
and candidate_order as Cow<[usize]>. The Cow's borrowed variant was
never used, so the wrapper paid for no benefit. BnB also clones the
selector at every node, eagerly copying the BTreeSets via Cow::Owned --
which dominates the BnB cost.

This commit:
- Replaces both BTreeSet<usize> fields with a hand-rolled Bitset
  (Vec<u64> + bit_capacity, with trailing/leading_zeros iteration).
  Clones become contiguous memcpy of n/64 words.
- Drops Cow on candidate_order in favour of Arc<Vec<usize>>, since
  it is write-once (sort) then read-many (BnB).
- Bumps the crate to 0.5.0; selected_indices() and banned() now
  return &Bitset.

Benchmarks (5-run mean, release, single-threaded):
  bnb:        1.55s -> 0.67s  (2.3x)
  lowest_fee: 25.8s -> 11.7s  (2.2x)

No new dependencies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover edge cases of the hand-rolled Bitset:
- Empty/zero-capacity, default, capacity reporting
- insert/remove return values, out-of-range safety
- Bits at word boundaries (0, 63, 64, 65, 127, 128, ...)
- Non-word-aligned capacity, iteration with size_hint
- DoubleEndedIterator: front/back interleaving, full back-drain
- Equality

Two proptest properties guarantee equivalence with BTreeSet:
- Any random sequence of insert/remove/contains/len/is_empty/iter
  produces identical results.
- Interleaved next() / next_back() yields each set bit exactly once,
  in the order dictated by the call pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two benchmark groups in benches/coin_selector.rs:

- `clone/{n}`: direct cost of CoinSelector::clone() at n =
  64/256/1024/4096 candidates. Measures the per-node BnB allocation
  the bitset/Arc layout was introduced to make cheap.
- `run_bnb_lowest_fee/{n}`: end-to-end run_bnb throughput on a
  deterministic synthetic pool with the LowestFee metric, at n =
  20/50/100/200 candidates. Reflects realistic library usage.

The pool generator is deterministic so results are comparable across
commits. Criterion is added as a dev-dependency (does not affect
downstream consumers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant