Optimize binary search methods #3110

Merged: IvoDD merged 1 commit into master from binary-search-utils-optimization on May 21, 2026

Conversation

Collaborator

@IvoDD IvoDD commented May 14, 2026

Reference Issues/PRs

Optimizations on top of #3091
Used in #3062

What does this implement or fix?

Some micro optimizations on binary search methods:

  • Don't keep TypedBlockData in ColumnDataIterator. Instead only keep block_data_ and block_size_
  • Don't recalculate block pointer and size when we already know them during gallop
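
To illustrate the first bullet, here is a minimal sketch of an iterator that caches only a raw pointer and a size for the current block instead of holding a richer block wrapper. `Block`, `BlockCachingIterator`, `at_end` and `sum_all` are hypothetical names made up for this example, not ArcticDB types; only the idea of keeping `block_data_` and `block_size_` comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one contiguous block of a column.
struct Block {
    std::vector<std::int64_t> values;
};

// Sketch of an iterator that caches only the current block's pointer and size.
class BlockCachingIterator {
public:
    explicit BlockCachingIterator(const std::vector<Block>& blocks)
        : blocks_(&blocks) {
        load_current_block();
    }

    std::int64_t operator*() const {
        assert(block_data_ != nullptr);
        return block_data_[offset_];
    }

    BlockCachingIterator& operator++() {
        // The hot path touches only the cached pointer, size and offset.
        if (++offset_ == block_size_) {
            ++block_idx_;
            load_current_block();
        }
        return *this;
    }

    // A null block pointer marks the end of the column.
    bool at_end() const { return block_data_ == nullptr; }

private:
    void load_current_block() {
        offset_ = 0;
        // Skip empty blocks so that dereferencing is always valid.
        while (block_idx_ < blocks_->size() && (*blocks_)[block_idx_].values.empty()) {
            ++block_idx_;
        }
        if (block_idx_ < blocks_->size()) {
            block_data_ = (*blocks_)[block_idx_].values.data();
            block_size_ = (*blocks_)[block_idx_].values.size();
        } else {
            block_data_ = nullptr;
            block_size_ = 0;
        }
    }

    const std::vector<Block>* blocks_;
    std::size_t block_idx_ = 0;
    const std::int64_t* block_data_ = nullptr;
    std::size_t block_size_ = 0;
    std::size_t offset_ = 0;
};

// Sums every value by walking the blocks through the iterator.
std::int64_t sum_all(const std::vector<Block>& blocks) {
    std::int64_t total = 0;
    for (BlockCachingIterator it(blocks); !it.at_end(); ++it) {
        total += *it;
    }
    return total;
}
```

In this sketch the block list is consulted only when a block boundary is crossed, which is the case the one-row-per-block benchmark stresses most.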

Any other comments?

Benchmarks for all search and iteration methods:

| Benchmark | Before (ns) | After (ns) | Delta |
|---|---:|---:|---:|
| iterate_irregular_blocks_1 (one row per block) | 478,496 | 311,163 | −35.0% |
| iterate_with_iterator (100 rows) | 798 | 719 | −9.9% |
| exponential_lb_single_block (in first 100) | 356 | 323 | −9.2% |
| exponential_lb_single_block (full gallop) | 458 | 424 | −7.4% |
| exponential_lb_regular (in first 100) | 364 | 339 | −6.7% |
| exponential_lb_irregular_1000 (in first 100) | 360 | 335 | −6.7% |
| exponential_lb_irregular_1000 (full gallop) | 496 | 476 | −3.9% |
| exponential_lb_regular (full gallop) | 504 | 489 | −2.9% |
| exponential_lb_irregular_1 (in first 100) | 464 | 455 | −2.0% |
| exponential_lb_irregular_1 (full gallop) | 687 | 679 | −1.3% |
| lower_bound_single_block | 411 | 394 | −4.1% |
| lower_bound_irregular_1000 | 444 | 431 | −3.0% |
| lower_bound_irregular_1 | 595 | 579 | −2.8% |
| lower_bound_regular_blocks | 443 | 436 | −1.4% |
| iterate_single_block | 27,305 | 27,247 | −0.2% |
| iterate_regular_blocks | 29,051 | 28,734 | −1.1% |
| iterate_irregular_blocks_1000 | 28,136 | 27,893 | −0.9% |
| iterate_with_scalar_at (100 rows) | 182,183,122 | 182,088,026 | −0.1% |

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@IvoDD IvoDD added the no-release-notes ("This PR shouldn't be added to release notes.") and patch ("Small change, should increase patch version") labels May 14, 2026
Contributor

claude Bot commented May 14, 2026

ArcticDB Code Review Summary

No items requiring attention. The optimization is correct, well-scoped, and the benchmark deltas in the PR description validate it.

Verified:

  • gallop_bracket first-block lambdas are safe: prev_block/cur_block remain first_block_idx throughout the first-block probing phase, so the optimized variants do not need to track the block field.
  • The raw-pointer block_begin_ replacing std::optional<TypedBlockData<TDT>> is consistently propagated (copy constructor, dereference, end-sentinel). All callers in column_algorithms.hpp and test_column.cpp updated to current_block_data() == nullptr.
  • New load_current_block computes block->logical_size() / sizeof(RawType), equivalent to the previous TypedBlockData::row_count() for Dim0 (which is static_assert-enforced by the search code paths).
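
The last point rests on simple arithmetic: for one-dimensional data, the number of rows in a block is its logical byte size divided by the size of one element. A minimal sketch of that computation follows, assuming a byte-backed block. `ByteBlock`, `row_count`, `read_row` and `make_block` are hypothetical helpers written for this illustration; only the expression `logical_size() / sizeof(RawType)` is taken from the review text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical byte-backed block; logical_size() is the payload size in bytes.
struct ByteBlock {
    std::vector<std::uint8_t> bytes;
    std::size_t logical_size() const { return bytes.size(); }
    const std::uint8_t* data() const { return bytes.data(); }
};

// Row count for scalar (dimension 0) data: total bytes / bytes per element.
template <typename RawType>
std::size_t row_count(const ByteBlock& block) {
    assert(block.logical_size() % sizeof(RawType) == 0);
    return block.logical_size() / sizeof(RawType);
}

// Reads one row with memcpy, which avoids alignment and aliasing pitfalls.
template <typename RawType>
RawType read_row(const ByteBlock& block, std::size_t row) {
    assert(row < row_count<RawType>(block));
    RawType out;
    std::memcpy(&out, block.data() + row * sizeof(RawType), sizeof(RawType));
    return out;
}

// Test helper: packs 64-bit integers into a byte-backed block.
ByteBlock make_block(const std::vector<std::int64_t>& values) {
    ByteBlock block;
    block.bytes.resize(values.size() * sizeof(std::int64_t));
    if (!values.empty()) {
        std::memcpy(block.bytes.data(), values.data(), block.bytes.size());
    }
    return block;
}
```

The division is only a valid row count when every row is a single scalar, which is why the review ties the equivalence to Dim0.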

Comment on lines +468 to +472
    auto record_probe_in_first_block = [&](size_t next_offset, RawType probe_value) {
        prev_offset = cur_offset;
        cur_offset = next_offset;
        return is_before(probe_value, value);
    };
Collaborator

Do these two extra assignments that are omitted really make a difference?

Collaborator Author

Most probably they do not.

Most of the benefit is from reusing the already calculated first_block_row_count and first_block_data in make_iter_in_first_block.

It also made sense to add a first-block variant of record_probe, to make the invariant clearer.
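
For readers unfamiliar with galloping, the following is a minimal sketch of an exponential lower bound confined to a single block whose data pointer and row count are already known, which is the situation the reply above describes. It is not ArcticDB's `gallop_bracket`: `gallop_lower_bound` and its local variables are names assumed for this example, and the probing scheme here is one common variant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first element >= value in sorted data[0, size).
// The block pointer and size are passed in once and reused for every probe.
std::size_t gallop_lower_bound(const std::int64_t* data, std::size_t size, std::int64_t value) {
    std::size_t lo = 0;          // every element before lo is known to be < value
    std::size_t cur_offset = 0;  // next offset to probe
    std::size_t step = 1;

    // Probe offsets 0, 1, 3, 7, ... until a probe is not before the target.
    while (cur_offset < size && data[cur_offset] < value) {
        lo = cur_offset + 1;
        cur_offset += step;
        step *= 2;
    }

    // The answer lies in [lo, hi]; finish with a plain binary search.
    const std::size_t hi = std::min(cur_offset, size);
    const std::int64_t* it = std::lower_bound(data + lo, data + hi, value);
    return static_cast<std::size_t>(it - data);
}
```

Because the loop and the final binary search both work directly on `data` and `size`, nothing about the block has to be looked up again between probes.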

Base automatically changed from binary-search-utils to master May 21, 2026 09:15
Additional micro optimizations on binary search methods:
- Don't keep `TypedBlockData` in `ColumnDataIterator`
- Don't recalculate block pointer and size when we already know them
  during gallop
@IvoDD IvoDD force-pushed the binary-search-utils-optimization branch from 0c2d98c to 6120021 Compare May 21, 2026 09:18
@IvoDD IvoDD merged commit f7767c2 into master May 21, 2026
226 checks passed
@IvoDD IvoDD deleted the binary-search-utils-optimization branch May 21, 2026 12:06

Labels

no-release-notes: This PR shouldn't be added to release notes.
patch: Small change, should increase patch version


3 participants