You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SPARK-56677][SQL] Propagate filter conditions through Join nodes in PlanMerger
### What changes were proposed in this pull request?
`PlanMerger` now supports filter propagation through `Join` nodes when merging similar subplans. Previously, when two subplans contained identical `Join` nodes but differed only in a filter applied to one of the join's children, they could not be merged.
This PR adds the ability to propagate such filter conditions through a `Join` and into the parent `Aggregate`'s `FILTER` clause. A new `filterSafeForJoin` helper checks that the filter originates from the non-nullable (preserved) side of the join: the left side of `LeftOuter`/`LeftSemi`/`LeftAnti`, the right side of `RightOuter`, or either side of `Inner`/`Cross`. `FullOuter` joins are not eligible.
The feature is gated by a new SQL config `spark.sql.optimizer.mergeSubplans.filterPropagation.throughJoin.enabled` (default: `false`).
### Why are the changes needed?
Without this change, scalar subqueries that differ only in a filter on one side of an identical join cannot be merged, resulting in redundant scans and compute. For example:
SELECT
(SELECT sum(key) FROM t1 JOIN t2 ON t1.id = t2.id),
(SELECT sum(key) FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.b > 1)
Both subqueries scan `t1` and `t2` in full even though they share the same base join. After this change a single merged scan is used and the second subquery's result is derived from it via an aggregate `FILTER` clause.
### Does this PR introduce _any_ user-facing change?
Yes. When `spark.sql.optimizer.mergeSubplans.filterPropagation.filterPropagationThroughJoin.enabled` is set to `true`, the optimizer may merge scalar subqueries that were previously kept separate, reducing the number of scan and join operations.
### How was this patch tested?
Added unit tests in `MergeSubplansSuite`:
- Merge with filter on left inner join child
- Merge with filter on right inner join child
- No merge when both join children have independent filters
- Merge with filter on the preserved side of a `LeftSemi` join
- No merge when filter is on the non-output side of a `LeftSemi` join
- No merge when filter is on the nullable side of an outer join
- No merge when the feature is disabled via config
Added integration test in `PlanMergeSuite` verifying correctness (`checkAnswer`) and plan shape (`SubqueryExec`/`ReusedSubqueryExec` counts) for both the enabled and disabled config cases, with and without AQE.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6
Closes#55628 from peter-toth/SPARK-56677-filter-propagation-through-join.
Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
0 commit comments