Skip to content

Add strategy-focused InList benchmarks#21648

Merged
adriangb merged 2 commits intoapache:mainfrom
geoffreyclaude:perf/in_list_benchmarks
Apr 16, 2026
Merged

Add strategy-focused InList benchmarks#21648
adriangb merged 2 commits intoapache:mainfrom
geoffreyclaude:perf/in_list_benchmarks

Conversation

@geoffreyclaude
Copy link
Copy Markdown
Contributor

@geoffreyclaude geoffreyclaude commented Apr 15, 2026

Which issue does this PR close?

Rationale for this change

IN LIST has become the target of several specialized execution strategies, but the existing benchmark coverage in datafusion/physical-expr/benches/in_list.rs is mostly end-to-end and historical in nature. That broad coverage is useful for regression tracking, but it is not ideal for answering more focused questions such as:

  • how a specific strategy behaves as the list size crosses a threshold
  • whether the fast path helps both hit-heavy and miss-heavy workloads
  • how null handling affects the strategy-specific implementations
  • how two-stage string filters behave in their worst-case collision patterns

This PR adds a dedicated strategy benchmark harness for IN LIST so future performance work can be evaluated against a stable, repeatable, strategy-focused corpus.

This PR does not change InList execution behavior. It only adds benchmark coverage.

What changes are included in this PR?

  • Adds datafusion/physical-expr/benches/in_list_strategy.rs
  • Registers the new benchmark target in datafusion/physical-expr/Cargo.toml
  • Keeps the existing benches/in_list.rs benchmark suite intact for broader historical comparison
  • Adds targeted Criterion coverage for the main IN LIST strategy families, including:
    • bitmap-style paths for narrow integer cases
    • branchless and hash/probe-style paths for primitive values at different list-size thresholds
    • reinterpretation-heavy cases such as floats and timestamps
    • string and string-view cases, including stage-2 / prefix-collision stress inputs
    • null-heavy and NOT IN scenarios
    • dictionary and fixed-size-binary coverage used by the broader IN LIST implementation

Are these changes tested?

Yes. I validated this PR with:

  • cargo fmt --all
  • cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings
  • cargo test -p datafusion-physical-expr in_list --lib

Are there any user-facing changes?

No user-facing changes. This PR only adds benchmark coverage for development and performance evaluation.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Apr 15, 2026
Copy link
Copy Markdown
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments refer to optimizations which don't exist. Can we maybe reword this to be more general or refer to these as "cases" and things like "short strings, large list", "large strings, short list", etc.

CI failures are unrelated.

@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_benchmarks branch 2 times, most recently from cffffd5 to 78c9d26 Compare April 15, 2026 19:36
@geoffreyclaude
Copy link
Copy Markdown
Contributor Author

The comments refer to optimizations which don't exist. Can we maybe reword this to be more general or refer to these as "cases" and things like "short strings, large list", "large strings, short list", etc.

CI failures are unrelated.

Good remark, I've reworked the comments and naming to avoid refering to the still hypothetical optims in a new commit.

geoffreyclaude and others added 2 commits April 15, 2026 17:03
Add a new in_list_strategy benchmark file with targeted coverage of each optimization strategy, without replacing the existing in_list benchmarks which are kept intact for historical comparison.

(cherry picked from commit d6e645d)
@adriangb adriangb force-pushed the perf/in_list_benchmarks branch from 78c9d26 to fa93228 Compare April 15, 2026 22:03
@adriangb
Copy link
Copy Markdown
Contributor

Thanks! I've rebased and plan to merge this once CI passes

@adriangb adriangb added this pull request to the merge queue Apr 16, 2026
@adriangb
Copy link
Copy Markdown
Contributor

Thanks @geoffreyclaude !

Merged via the queue into apache:main with commit ef9a80c Apr 16, 2026
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants