Add group join physical optimizer#21995
Conversation
73f4713 to
5fe7219
Compare
|
run benchmark tpch tpcds tpch10 |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing groupjoin-eliminate-extra-hash-build (5fe7219) to 2f2fe8f (merge-base) diff using: tpch10 File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing groupjoin-eliminate-extra-hash-build (5fe7219) to 2f2fe8f (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing groupjoin-eliminate-extra-hash-build (5fe7219) to 2f2fe8f (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch10 — base (merge-base)
tpch10 — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
|
@Dandandan Thanks for running the benchmarks. There are additional equivalencies/optimizations of hashjoin + groupby that can be turned into groupjoin from the paper. I wanted to make this PR just the initial optimization + create the groupjoin rule. If its okay, I will ping you in another draft PR which will contain all optimizations in the paper so you can review + benchmark. I listed the optimizations im talking about at the bottom of this PRs description. Ignore the cost based one I think this is not really currently applicable |
Sounds great! |
|
This looks awesome! I have a question, why physical optimizer rule? It looks simpler to implement a logical optimizer rule instead. |
Rationale for this change
Some queries combining a join and a group-by on the same key can be executed as a single groupjoin operator. This optimization targets a common analytical pattern — dimension-fact joins where a smaller dimension table is joined with a larger fact table and aggregated by the join key.
This is based on research: Moerkotte & Neumann PVLDB 2011. The paper introduces the groupjoin algebraic equivalence and proves its correctness for both inner and outer joins, provided the join key is a key of the build side.
This PR implements the groupjoin operator and optimizer rule, using the memoizing groupjoin strategy from the paper: a single hash table serves as both the join lookup and the aggregation group table, with probe-side rows updating accumulators in-place. This eliminates the redundant hash table construction and intermediate result materialization that occur when the join and aggregate run as separate operators. This addresses #13243.
locally I saw: TPC-H Q13 (SF10): 299ms → 254ms (~15% faster), not zero regressions
What changes are included in this PR?
New physical operator —
GroupJoinExec(physical-plan/src/joins/group_join.rs):GroupValueshash table from the left (build) sideGroupsAccumulators in-place for matching rowsNew physical optimizer rule —
GroupJoinOptimizer(physical-optimizer/src/group_join.rs):AggregateExecaboveHashJoinExec(looking through intermediateProjectionExec)GroupsAccumulatorCombinePartialFinalAggregatein the optimizer pipelineHow can this be extended?
The paper describes three additional strategies and optimizations we did not implement:
Eager Right Aggregation (Strategy 1) — Pre-aggregate the probe side before the join, reducing its cardinality. For Q13, this would reduce the 15M order rows to ~1.5M pre-aggregated groups before joining with 1.5M customers. The paper reports >2x improvement on Q13 with this strategy.
Superset GROUP BY (Theorem 3 in the paper) — Handle cases where GROUP BY keys are a superset of the join keys (extra keys from the build side). This would enable queries like Q3 (
GROUP BY l_orderkey, o_orderdate, o_shipprioritywith join ono_orderkey = l_orderkey). Requires the probe side to look up by the join key subset while the hash table is keyed by the full GROUP BY.Cost-model strategy selection (Section 4 of Fent et al.) — Choose between the four strategies at optimization time based on input cardinalities and selectivities, rather than always using Strategy 2.