
Resampling performance improvement and sparse aggregation columns support #3062

Open
IvoDD wants to merge 8 commits into master from sparse-resampling-support



@IvoDD
Collaborator

@IvoDD IvoDD commented Apr 30, 2026

Reference Issues/PRs

Monday ref: 11679866800

Depends on PRs #3091 and #3110

Issues

  • There is complicated bucket hopping logic in three places: generate_output_index_column, generate_resampling_output_column, SortedAggregator::aggregate
  • The bucket hopping logic involves many branches with loads of checks

Changes (split per commit for easier review)

  1. Adds C++ benchmarks that measure the CPU-intensive part of resampling.
  2. Pure move of the generate_output_index_column to sorted_aggregation.cpp.
    • This way all bucket hopping logic is in one place.
  3. Construct a ResampleMapping in generate_output_index_column and use it directly in other methods.
    • ResampleMapping simply maps each output_row to its start position (start_column_index, start_column_offset) and end position (end_column_index, end_column_offset) in the input.
    • Replaces the three places with similar logic with a single implementation.
    • Makes the implementation of sparse aggregation easier.
  4. Use galloping search in generate_output_index_column to skip past all rows in a single bucket at once.
    • Index column construction was the bottleneck: aggregation vectorises well but index iteration does not.
    • Changes complexity from O(num_input_rows + num_buckets) to O(num_buckets × log(rows_per_bucket)).
    • Always ≤ O(num_input_rows + num_buckets) even when num_buckets ≥ num_input_rows.
  5. Preallocate the output index column to min(num_buckets, num_input_rows) instead of num_buckets.
    • Galloping search has a higher constant than linear scan and regresses at low rows per bucket.
    • Slightly improves the case where most buckets are empty due to smaller allocation.
  6. Use a runtime heuristic to choose between linear scan and galloping search.
    • Linear scan is faster below ~32 rows/bucket (because of its smaller constant factor and better branch prediction); galloping search is faster above that.
    • The threshold was determined empirically from benchmarks at intermediate bucket counts, using a more finely parametrized version of the existing benchmark. That extra benchmark is not kept in the PR to avoid a large amount of benchmarking code.
    • Recovers the Dense-10 (100k buckets) and Empty regressions introduced by galloping search (item 4) while retaining all gains elsewhere.
  7. Implement sparse resampling.
    • Small change, made straightforward by the ResampleMapping introduced in item 3.
    • Minimal overhead for the dense case.
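Items 3 to 6 fit together roughly as sketched below. This is a minimal, hypothetical illustration rather than the PR's code: it assumes a single input row-slice of sorted int64 timestamps and left-closed buckets `[bounds[b], bounds[b + 1])`, and the names `ResampleMapping`, `linear_lower_bound`, `gallop_lower_bound`, `build_mapping`, `sum_aggregate` and `kLinearScanThreshold` (set to the ~32 rows/bucket crossover reported above) are invented for the example. The real ResampleMapping stores (column index, offset) pairs so that buckets can span row-slices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical crossover from the benchmarks above: below ~32 rows/bucket linear scan wins.
constexpr std::size_t kLinearScanThreshold = 32;

// Hypothetical single-row-slice mapping: output row i aggregates input rows [start[i], end[i]).
struct ResampleMapping {
    std::vector<std::int64_t> output_index;  // left boundary (label) of each non-empty bucket
    std::vector<std::size_t> start;
    std::vector<std::size_t> end;
};

// First index >= from with data[index] >= value (data.size() if none), by linear scan.
std::size_t linear_lower_bound(const std::vector<std::int64_t>& data, std::size_t from, std::int64_t value) {
    while (from < data.size() && data[from] < value) ++from;
    return from;
}

// Same contract via galloping (exponential) search: double the step until the probe
// overshoots, then binary search the last window. Cost is O(log(distance)), not O(distance).
std::size_t gallop_lower_bound(const std::vector<std::int64_t>& data, std::size_t from, std::int64_t value) {
    const std::size_t n = data.size();
    if (from >= n || data[from] >= value) return from;
    std::size_t prev = from;  // invariant: data[prev] < value
    std::size_t step = 1;
    std::size_t probe = from + 1;
    while (probe < n && data[probe] < value) {
        prev = probe;
        step *= 2;
        probe = prev + step;
    }
    const std::size_t hi = std::min(probe, n);  // answer lies in (prev, hi]
    const std::int64_t* base = data.data();
    return static_cast<std::size_t>(std::lower_bound(base + prev + 1, base + hi, value) - base);
}

// Bucket b covers [bounds[b], bounds[b + 1]). Empty buckets produce no output row.
ResampleMapping build_mapping(const std::vector<std::int64_t>& ts, const std::vector<std::int64_t>& bounds) {
    assert(std::is_sorted(ts.begin(), ts.end()) && std::is_sorted(bounds.begin(), bounds.end()));
    ResampleMapping mapping;
    if (bounds.size() < 2 || ts.empty()) return mapping;
    const std::size_t num_buckets = bounds.size() - 1;
    // Bounded allocation: at most min(num_buckets, num_rows) buckets can be non-empty.
    const std::size_t capacity = std::min(num_buckets, ts.size());
    mapping.output_index.reserve(capacity);
    mapping.start.reserve(capacity);
    mapping.end.reserve(capacity);
    // Runtime heuristic, written as a multiplication so num_buckets is never a divisor.
    const bool gallop = ts.size() >= kLinearScanThreshold * num_buckets;

    // Rows before the first boundary belong to no bucket.
    std::size_t row = static_cast<std::size_t>(std::lower_bound(ts.begin(), ts.end(), bounds.front()) - ts.begin());
    std::size_t bucket = 0;
    while (row < ts.size() && ts[row] < bounds.back()) {
        // Jump straight to the bucket containing ts[row], skipping any empty buckets.
        const auto next = std::upper_bound(bounds.begin() + static_cast<std::ptrdiff_t>(bucket + 1), bounds.end(), ts[row]);
        bucket = static_cast<std::size_t>(next - bounds.begin()) - 1;
        const std::int64_t right = bounds[bucket + 1];
        // Skip past every row of this bucket at once.
        const std::size_t end = gallop ? gallop_lower_bound(ts, row, right) : linear_lower_bound(ts, row, right);
        mapping.output_index.push_back(bounds[bucket]);
        mapping.start.push_back(row);
        mapping.end.push_back(end);
        row = end;
    }
    return mapping;
}

// Downstream stages only consume the mapping, e.g. a sum aggregation over one column.
std::vector<double> sum_aggregate(const ResampleMapping& mapping, const std::vector<double>& values) {
    std::vector<double> out;
    out.reserve(mapping.start.size());
    for (std::size_t i = 0; i < mapping.start.size(); ++i) {
        double acc = 0.0;
        for (std::size_t r = mapping.start[i]; r < mapping.end[i]; ++r) acc += values[r];
        out.push_back(acc);
    }
    return out;
}
```

The point of the design is that the bucket-hopping decisions are made once, in `build_mapping`; aggregators then iterate plain `[start, end)` ranges with no boundary checks, which is what lets them vectorise.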

Resample benchmark timings

BM_resample/<rows_per_seg>/<num_segs>/<num_buckets>/<num_cols>. Total rows ~1M.
Source: cpp/arcticdb/processing/test/benchmark_resample.cpp. Times in ms, --benchmark_min_time=2s.

| Regime | Args | Rows/bucket | Description |
|---|---|---:|---|
| Dense-1k | 100k × 10, 1k buckets | ~1000 | Many rows/bucket, single row-slice |
| Dense-100 | 100k × 10, 10k buckets | ~100 | Medium rows/bucket, single row-slice |
| Dense-10 | 100k × 10, 100k buckets | ~10 | Few rows/bucket, single row-slice |
| Spanning | 2k × 500, 100 buckets | ~10k | Buckets span multiple row-slices |
| Empty | 100k × 10, 10M buckets | <1 | Bucket smaller than row spacing; most empty |

1 aggregation column

| # | Change | D-1k | D-100 | D-10 | Spanning | Empty |
|---:|---|---:|---:|---:|---:|---:|
| 0 | Baseline | 1.27 | 1.34 | 1.47 | 1.65 | 11.1 |
| 1 | Code move | 1.02 (−20%) | 1.12 (−16%) | 1.27 (−14%) | 1.40 (−15%) | 11.1 (0%) |
| 2 | ResampleMapping | 1.02 (−20%) | 1.12 (−16%) | 1.32 (−10%) | 1.40 (−15%) | 11.8 (+6%) |
| 3 | Galloping search | 0.059 (−95%) | 0.385 (−71%) | 2.94 (+100%) | 0.285 (−83%) | 21.9 (+97%) |
| 4 | Bounded allocation | 0.058 (−95%) | 0.396 (−70%) | 2.91 (+98%) | 0.291 (−82%) | 21.5 (+94%) |
| 5 | Heuristic (lin/EUB) | 0.059 (−95%) | 0.383 (−71%) | 1.27 (−14%) | 0.293 (−82%) | 11.5 (+4%) |
| 6 | Sparse-input support | 0.068 (−95%) | 0.449 (−66%) | 1.28 (−13%) | 0.296 (−82%) | 11.5 (+4%) |

100 aggregation columns

| # | Change | D-1k | D-100 | D-10 | Spanning | Empty |
|---:|---|---:|---:|---:|---:|---:|
| 0 | Baseline | 1.37 | 1.43 | 1.56 | 6.22 | 48.0 |
| 1 | Code move | 1.11 (−19%) | 1.18 (−17%) | 1.34 (−14%) | 5.92 (−5%) | 46.2 (−4%) |
| 2 | ResampleMapping | 1.11 (−19%) | 1.19 (−17%) | 1.39 (−11%) | 5.87 (−6%) | 50.4 (+5%) |
| 3 | Galloping search | 0.148 (−89%) | 0.471 (−67%) | 2.96 (+90%) | 4.65 (−25%) | 63.1 (+31%) |
| 4 | Bounded allocation | 0.148 (−89%) | 0.480 (−66%) | 2.95 (+89%) | 4.67 (−25%) | 44.1 (−8%) |
| 5 | Heuristic (lin/EUB) | 0.149 (−89%) | 0.477 (−67%) | 1.33 (−15%) | 4.70 (−24%) | 35.9 (−25%) |
| 6 | Sparse-input support | 0.158 (−88%) | 0.537 (−62%) | 1.35 (−13%) | 4.94 (−21%) | 36.0 (−25%) |

Deltas vs baseline (row 0).

Notes on benchmark results

  • Machine load varied across runs, so some results contain artifacts, e.g. the apparent speed-up from the pure "Code move" commit.
  • Galloping search significantly improves speed when a single bucket contains many rows. More thorough benchmarking showed that the exponential upper bound (EUB) search becomes faster than linear scan at ~32 rows per bucket, which explains the regressions in the ~10 rows per bucket (D-10) and mostly empty (Empty) regimes.
  • Bounded allocation mostly helps the Empty regime, as expected.
  • Using the heuristic to choose between EUB and linear scan fixes the regimes with rows_per_bucket < 32. The result is even faster than the baseline thanks to slightly better branch prediction (improved use of ARCTICDB_LIKELY and ARCTICDB_UNLIKELY).
  • Final state: every regime is faster than baseline except Empty with 1 aggregation column, which stays around baseline (+4%). Dense-1k is the biggest winner, with a ~19x speed-up (1.27 ms → 0.068 ms) with 1 aggregation column.

@IvoDD IvoDD changed the base branch from master to arrow-use-in-memory-storage-for-unit-tests April 30, 2026 10:17
@maxim-morozov maxim-morozov self-requested a review April 30, 2026 16:42
@IvoDD IvoDD force-pushed the arrow-use-in-memory-storage-for-unit-tests branch from 419c30a to 0de92a2 Compare May 5, 2026 14:11
Base automatically changed from arrow-use-in-memory-storage-for-unit-tests to master May 7, 2026 11:30
@IvoDD IvoDD force-pushed the sparse-resampling-support branch from a5ac868 to a9e8ee4 Compare May 11, 2026 09:18
@IvoDD IvoDD changed the base branch from master to binary-search-utils May 11, 2026 09:18
@IvoDD IvoDD force-pushed the sparse-resampling-support branch 2 times, most recently from 36122bc to 4231a4f Compare May 12, 2026 15:18
@IvoDD IvoDD force-pushed the binary-search-utils branch 2 times, most recently from 5679aa0 to 4b7e881 Compare May 13, 2026 08:20
@IvoDD IvoDD force-pushed the sparse-resampling-support branch 3 times, most recently from 210a17b to 086284c Compare May 13, 2026 14:53
@IvoDD IvoDD mentioned this pull request May 14, 2026
5 tasks
@IvoDD IvoDD force-pushed the sparse-resampling-support branch from 086284c to 5e4edb7 Compare May 14, 2026 12:10
@IvoDD IvoDD added the patch Small change, should increase patch version label May 14, 2026
@IvoDD IvoDD changed the title [Draft] Sparse resampling support Resampling performance improvement and sparse aggragation columns support May 14, 2026
@IvoDD IvoDD changed the base branch from binary-search-utils to binary-search-utils-optimization May 14, 2026 13:12
@IvoDD IvoDD marked this pull request as ready for review May 14, 2026 13:47
@claude
Contributor

claude Bot commented May 14, 2026

ArcticDB Code Review Summary

Delta since last review is a single commit (89d9fd8 "Address Alex comments") touching cpp/arcticdb/processing/sorted_aggregation.cpp and cpp/arcticdb/processing/test/test_resample.cpp. The accumulate refactor, multiplication-based threshold (avoiding division by zero), and added comments are all correct improvements over the previous code. No new issues introduced.

PR Title and Description

  • Typo "aggragation" -> "aggregation" fixed in title.

Documentation

  • Sparse resampling support still not reflected in technical docs: docs/claude/cpp/PROCESSING.md and docs/claude/python/QUERY_PROCESSING.md describe resampling but do not mention that sparse aggregation columns are now supported. The previously-thrown "Cannot aggregate sparse column" schema error has been removed (and the corresponding test_resampling_sparse_data deleted), so a brief note that sparse aggregation columns are now supported would keep the docs aligned with behaviour.
  • No user-facing doc or tutorial mention of the new sparse resampling capability. Optional but worth considering before release.

Notes (no action required)

  • Multiplication form total_input_rows < linear_scan_threshold * num_buckets is mathematically equivalent to the old total_input_rows / num_buckets < linear_scan_threshold for realistic inputs; overflow would require more than 5e17 buckets, so not a practical concern.
  • Removing the post-advance_boundary_past_value re-check of current_bucket.contains(it->value()) in generate_output_index_column is safe because advance_boundary_past_value always lands bucket_end_it on a bucket whose half-open interval contains the probed value for both LEFT- and RIGHT-closed boundaries. The previous defensive re-check was redundant.
  • This PR is stacked on binary-search-utils-optimization and depends on PRs #3091 (Binary search utils) and #3110 (Optimize binary search methods), so merge order matters.
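The multiplication-form threshold from the first note can be illustrated with a small sketch. The helper names are hypothetical and 64-bit unsigned counts are assumed. For non-negative integers `a`, `b > 0` and integer `t`, `floor(a / b) < t` holds exactly when `a < t * b`, so the two forms agree whenever the division is defined, and the multiplication form needs no guard for `b == 0`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: prefer a linear scan when the average rows per bucket is below `threshold`.

// Division form: requires num_buckets > 0, otherwise it divides by zero.
bool use_linear_scan_div(std::uint64_t total_rows, std::uint64_t num_buckets, std::uint64_t threshold) {
    assert(num_buckets > 0);
    return total_rows / num_buckets < threshold;
}

// Multiplication form: agrees with the division form for num_buckets > 0 and returns false
// (i.e. use galloping search) when num_buckets == 0. It can only overflow when
// threshold * num_buckets exceeds 2^64 - 1, e.g. more than ~5.8e17 buckets for threshold 32.
bool use_linear_scan(std::uint64_t total_rows, std::uint64_t num_buckets, std::uint64_t threshold) {
    return total_rows < threshold * num_buckets;
}
```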

@IvoDD IvoDD changed the title Resampling performance improvement and sparse aggragation columns support Resampling performance improvement and sparse aggregation columns support May 14, 2026
Comment thread cpp/arcticdb/processing/test/test_resample.cpp
Comment thread cpp/arcticdb/processing/sorted_aggregation.cpp Outdated
Comment thread cpp/arcticdb/processing/sorted_aggregation.cpp Outdated
Comment thread cpp/arcticdb/processing/sorted_aggregation.cpp
Comment thread cpp/arcticdb/processing/sorted_aggregation.cpp
@IvoDD IvoDD force-pushed the binary-search-utils-optimization branch from 0c2d98c to 6120021 Compare May 21, 2026 09:18
IvoDD added a commit that referenced this pull request May 21, 2026
#### Reference Issues/PRs
Optimizations on top of #3091
Used in #3062 

#### What does this implement or fix?
Some micro optimizations on binary search methods:
- Don't keep `TypedBlockData` in `ColumnDataIterator`. Instead only keep
`block_data_` and `block_size_`
- Don't recalculate block pointer and size when we already know them
during gallop
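The first micro-optimization can be sketched as an iterator that caches the raw pointer and size of the current block and recomputes them only when it crosses a block boundary. This is a hypothetical illustration: `Blocks` and `BlockIterator` are invented names standing in for an ArcticDB column and its `ColumnDataIterator`, not their actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column made of contiguous blocks of varying sizes.
using Blocks = std::vector<std::vector<std::int64_t>>;

// Hypothetical forward iterator that keeps only the current block's pointer and size,
// so operator++ is usually one increment and one comparison. Block metadata is reloaded
// only when the iterator moves past the end of the current block.
class BlockIterator {
public:
    explicit BlockIterator(const Blocks& blocks) : blocks_(&blocks) { load_block(0); }

    bool at_end() const { return block_index_ >= blocks_->size(); }

    // Precondition: !at_end().
    std::int64_t operator*() const { return block_data_[pos_]; }

    // Precondition: !at_end().
    BlockIterator& operator++() {
        if (++pos_ == block_size_) load_block(block_index_ + 1);
        return *this;
    }

private:
    // Move to the first non-empty block at or after `index`, caching its pointer and size.
    void load_block(std::size_t index) {
        while (index < blocks_->size() && (*blocks_)[index].empty()) ++index;
        block_index_ = index;
        pos_ = 0;
        if (index < blocks_->size()) {
            block_data_ = (*blocks_)[index].data();
            block_size_ = (*blocks_)[index].size();
        } else {
            block_data_ = nullptr;
            block_size_ = 0;
        }
    }

    const Blocks* blocks_;
    std::size_t block_index_ = 0;
    const std::int64_t* block_data_ = nullptr;
    std::size_t block_size_ = 0;
    std::size_t pos_ = 0;
};
```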

#### Any other comments?
Benchmarks for all search and iteration methods:

| Benchmark | Before (ns) | After (ns) | Delta |
|---|---:|---:|---:|
| iterate_irregular_blocks_1 (one row per block) | 478,496 | 311,163 | −35.0% |
| iterate_with_iterator (100 rows) | 798 | 719 | −9.9% |
| exponential_lb_single_block (in first 100) | 356 | 323 | −9.2% |
| exponential_lb_single_block (full gallop) | 458 | 424 | −7.4% |
| exponential_lb_regular (in first 100) | 364 | 339 | −6.7% |
| exponential_lb_irregular_1000 (in first 100) | 360 | 335 | −6.7% |
| exponential_lb_irregular_1000 (full gallop) | 496 | 476 | −3.9% |
| exponential_lb_regular (full gallop) | 504 | 489 | −2.9% |
| exponential_lb_irregular_1 (in first 100) | 464 | 455 | −2.0% |
| exponential_lb_irregular_1 (full gallop) | 687 | 679 | −1.3% |
| lower_bound_single_block | 411 | 394 | −4.1% |
| lower_bound_irregular_1000 | 444 | 431 | −3.0% |
| lower_bound_irregular_1 | 595 | 579 | −2.8% |
| lower_bound_regular_blocks | 443 | 436 | −1.4% |
| iterate_single_block | 27,305 | 27,247 | −0.2% |
| iterate_regular_blocks | 29,051 | 28,734 | −1.1% |
| iterate_irregular_blocks_1000 | 28,136 | 27,893 | −0.9% |
| iterate_with_scalar_at (100 rows) | 182,183,122 | 182,088,026 | −0.1% |

#### Checklist

<details>
  <summary>
   Checklist for code changes...
  </summary>
 
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>


Co-authored-by: Ivo <ivo.dilov@man.com>
Base automatically changed from binary-search-utils-optimization to master May 21, 2026 12:06
Ivo added 8 commits May 21, 2026 15:19
Previously each of `generate_output_index_column`,
`generate_resample_output_column` and `aggregate` had complicated logic
to identify which row corresponds to which output column.

This is simplified by creating a `ResampleMapping` when building the
output index column to store which output row corresponds to which input
values. Then `ResampleMapping` is used in the other methods.

A lot of resampling runtime was spent during generation of the output index
column. This can be sped up significantly in the common case where the
number of buckets is much smaller than the number of input rows by using
exponential binary search.

Helps speed up and decrease memory usage for the very rare case where num_buckets >> num_input_rows.

With benchmarking of various rows_per_bucket it was confirmed that
exponential_search becomes faster than linear scan at around 32 elements.

For <32 rows per bucket the linear pass is faster. For >32 the
exponential search is faster.

Construct output agg column based on rs_index of input sparse columns.

Then use sparse iterators to populate the values.
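Sparse aggregation on top of a mapping can be sketched as follows. This is a hypothetical illustration, not the PR's implementation: `SparseColumn` and `sparse_sum` are invented names, the sparse input is modelled as a sorted list of populated row numbers with matching values, and output row `i` covers input rows `[start[i], end[i])` within a single row-slice. Because both the buckets and the populated rows are sorted, one forward cursor visits each populated value once, and output rows whose bucket holds no values are left out of the (also sparse) result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparse column: values exist only for the (sorted) row numbers in `rows`.
struct SparseColumn {
    std::vector<std::size_t> rows;  // row numbers that hold a value
    std::vector<double> values;     // values[k] belongs to row rows[k]
};

// Sums each bucket's populated values. An output row is present only if its bucket
// contained at least one value, so the result is sparse as well.
SparseColumn sparse_sum(const SparseColumn& input,
                        const std::vector<std::size_t>& start,
                        const std::vector<std::size_t>& end) {
    assert(start.size() == end.size());
    SparseColumn output;
    std::size_t k = 0;  // cursor into input.rows; only moves forward because buckets are sorted
    for (std::size_t i = 0; i < start.size(); ++i) {
        // Skip populated rows before this bucket (e.g. rows that fall outside every bucket).
        while (k < input.rows.size() && input.rows[k] < start[i]) ++k;
        double acc = 0.0;
        bool any = false;
        while (k < input.rows.size() && input.rows[k] < end[i]) {
            acc += input.values[k];
            any = true;
            ++k;
        }
        if (any) {
            output.rows.push_back(i);
            output.values.push_back(acc);
        }
    }
    return output;
}
```

In the dense case every bucket has values and the cursor logic reduces to a plain range sum, which is consistent with the small overhead reported for the "Sparse-input support" row.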
@IvoDD IvoDD force-pushed the sparse-resampling-support branch from 5e4edb7 to 89d9fd8 Compare May 21, 2026 14:20
