refactor(query): support partitioned hash join #19553
zhang2014 wants to merge 39 commits into databendlabs:main
Under hash shuffle, build and probe data are already partitioned by thread. This replaces the shared hash table (atomic CAS, Mutex, Barrier) with a Doris-style compact hash table (4 bytes/row, index-based chain) that each thread builds and probes independently, eliminating all synchronization overhead.

- Reorganize memory/ into unpartitioned/ (broadcast) and partitioned/ (shuffle)
- Add CompactJoinHashTable<I: RowIndex> with index-based chaining
- Add PartitionedBuild with fixed 65536-row chunks and bit-shift addressing
- Implement all 7 join types for the partitioned path
- Route hash shuffle joins through partitioned pipeline in physical_hash_join

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
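To make the "4 bytes/row, index-based chain" and "bit-shift addressing" ideas concrete, here is a minimal sketch. It is not the PR's `CompactJoinHashTable` or `PartitionedBuild`: the names `CompactTable`, `first`, `next`, and `locate` are hypothetical, row indices are assumed to be 1-based `u32` values with 0 as the end-of-chain marker, and the table is assumed to be owned by a single thread so no atomics or locks appear.

```rust
/// Hypothetical sketch of an index-chained hash table (not the PR's actual type).
/// Row indices are 1-based so that 0 can mean "empty bucket" / "end of chain".
pub struct CompactTable {
    /// first[bucket] = most recently inserted row in that bucket (0 = empty).
    first: Vec<u32>,
    /// next[row] = previously inserted row in the same bucket (0 = end).
    /// This is the only per-row cost: one u32, i.e. 4 bytes per build row.
    next: Vec<u32>,
    mask: usize,
}

impl CompactTable {
    /// Build from precomputed hashes of the build rows, single-threaded.
    pub fn build(hashes: &[u64]) -> Self {
        let buckets = hashes.len().max(1).next_power_of_two();
        let mask = buckets - 1;
        let mut first = vec![0u32; buckets];
        let mut next = vec![0u32; hashes.len() + 1]; // slot 0 is unused
        for (i, h) in hashes.iter().enumerate() {
            let row = (i + 1) as u32;
            let bucket = (*h as usize) & mask;
            next[row as usize] = first[bucket]; // link to the old chain head
            first[bucket] = row; // new row becomes the chain head
        }
        Self { first, next, mask }
    }

    /// Walk the chain of candidate rows for a probe hash.
    /// Rows in the same bucket may have different keys, so a real join
    /// still has to compare keys for each candidate.
    pub fn probe(&self, hash: u64) -> impl Iterator<Item = u32> + '_ {
        let mut cur = self.first[(hash as usize) & self.mask];
        std::iter::from_fn(move || {
            if cur == 0 {
                return None;
            }
            let row = cur;
            cur = self.next[row as usize];
            Some(row)
        })
    }
}

/// Assumed chunk size of 65536 rows, so 16 bits address a row inside a chunk.
const CHUNK_BITS: u32 = 16;
const CHUNK_MASK: u32 = (1 << CHUNK_BITS) - 1;

/// Map a 1-based row index to (chunk index, offset within chunk)
/// using a shift and a mask instead of division and modulo.
pub fn locate(row: u32) -> (usize, usize) {
    let zero_based = row - 1;
    (
        (zero_based >> CHUNK_BITS) as usize,
        (zero_based & CHUNK_MASK) as usize,
    )
}
```

With build hashes `[1, 2, 1, 7]`, `probe(1)` yields rows 3 and then 1 (newest first), and `locate(65537)` resolves to the first row of the second chunk. Because each thread owns its own table, the build loop needs nothing beyond plain stores.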
…eam for partitioned hash join

Move the visited bitmap from CompactJoinHashTable to PartitionedBuild so the hash table is fully immutable after build. Introduce CompactProbeStream, which implements the ProbeStream trait for streaming probe with index-based chaining. Replace the eager probe()/probe_and_mark_visited() with streaming create_probe_matched/create_probe factory methods.

Rewrite all 7 join types (inner, left, left semi, left anti, right, right semi, right anti) with dedicated streaming JoinStream implementations. Right-side joins use field-level split borrowing to avoid borrow conflicts between immutable hash table access and mutable visited marking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
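The field-level split borrowing mentioned for right-side joins can be illustrated with a small sketch. The types below (`Table`, `Chain`, `BuildSide`) are hypothetical stand-ins, not the PR's `CompactJoinHashTable`, `CompactProbeStream`, or `PartitionedBuild`, and a `Vec<bool>` is assumed in place of a real bitmap. The point is only that borrowing two fields separately lets a streaming iterator hold a shared reference to the table while the visited flags are mutated.

```rust
/// Hypothetical immutable chained table (1-based row indices, 0 = end of chain).
pub struct Table {
    first: Vec<u32>,
    next: Vec<u32>,
    mask: usize,
}

/// Streaming iterator over one bucket chain; holds a shared borrow of the table.
pub struct Chain<'a> {
    table: &'a Table,
    cur: u32,
}

impl<'a> Iterator for Chain<'a> {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.cur == 0 {
            return None;
        }
        let row = self.cur;
        self.cur = self.table.next[row as usize];
        Some(row)
    }
}

impl Table {
    pub fn build(hashes: &[u64]) -> Self {
        let mask = hashes.len().max(1).next_power_of_two() - 1;
        let mut first = vec![0u32; mask + 1];
        let mut next = vec![0u32; hashes.len() + 1];
        for (i, h) in hashes.iter().enumerate() {
            let bucket = (*h as usize) & mask;
            next[i + 1] = first[bucket];
            first[bucket] = (i + 1) as u32;
        }
        Self { first, next, mask }
    }

    pub fn chain(&self, hash: u64) -> Chain<'_> {
        Chain { table: self, cur: self.first[(hash as usize) & self.mask] }
    }
}

/// Hypothetical build side: the table is never mutated after build,
/// while the visited flags live next to it and are updated during probe.
pub struct BuildSide {
    table: Table,
    visited: Vec<bool>, // index 0 unused, one flag per build row
}

impl BuildSide {
    pub fn new(hashes: &[u64]) -> Self {
        Self { table: Table::build(hashes), visited: vec![false; hashes.len() + 1] }
    }

    /// Mark every candidate row for `hash` as visited; returns the match count.
    pub fn probe_and_mark(&mut self, hash: u64) -> usize {
        // Split borrow: two disjoint fields, one shared and one mutable.
        // Calling a `&mut self` helper inside the loop would not compile,
        // because the chain iterator already borrows `self.table`.
        let table = &self.table;
        let visited = &mut self.visited;
        let mut matched = 0;
        for row in table.chain(hash) {
            visited[row as usize] = true;
            matched += 1;
        }
        matched
    }

    /// Build rows never matched by any probe row.
    pub fn unvisited_rows(&self) -> Vec<u32> {
        (1..self.visited.len() as u32)
            .filter(|&r| !self.visited[r as usize])
            .collect()
    }
}
```

After probing is finished, a right-side join can scan the flags: in this sketch, `unvisited_rows()` returns the build rows that no probe row reached, which is the kind of information a right outer or right anti join needs.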
Docker Image for PR
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b9fbcb46a
...service/src/pipelines/processors/transforms/new_hash_join/partitioned/transform_hash_join.rs
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
refactor(query): support partitioned hash join
Tests
Type of change