feat: transitive predicate propagation across multi-table join chains #21423

Closed
xanderbailey wants to merge 1 commit into apache:main from xanderbailey:xb/transitive_pushdown
@xanderbailey xanderbailey commented Apr 6, 2026

Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

Right now, PushDownFilter does single-hop predicate inference through join equi-conditions. If you have a.x = b.y as a join key and WHERE a.x > 5, it'll infer b.y > 5 and push it down. Great.

But for multi-table join chains like:

SELECT * FROM a
  JOIN b ON a.x = b.y
  JOIN c ON b.y = c.z
WHERE a.x > 5

...it never derives c.z > 5. The problem is that b.y > 5 gets pushed below the first join and becomes invisible when processing the second join. This means table c gets scanned without any filter, which is a missed optimization that matters for any query joining 3+ tables on shared keys.

Spark and Calcite both do transitive predicate closure -- DataFusion should too.

What changes are included in this PR?

Rather than adding a new optimizer rule, this extends the existing PushDownFilter rule with equivalence-class-based inference.

The key additions in push_down_filter.rs:

  • ColumnEquivalences -- a simple union-find over Column that tracks which columns are transitively equal
  • collect_descendant_equalities() -- walks the plan subtree collecting column equalities from descendant INNER join ON clauses. Only collects from INNER joins (outer join equalities don't unconditionally hold). Stops at projections, aggregates, limits, and other nodes that change column identity or row cardinality.
  • infer_predicates_from_equivalence_classes() -- for each single-column predicate, generates equivalent predicates for all columns in the same equivalence class
  • The existing infer_join_predicates() is modified to build equivalence classes from the current join's keys + descendant joins + WHERE-clause column equalities, then use them for inference
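
To make the equivalence-class idea above concrete, here is a minimal, self-contained sketch. It is not the code in this PR: columns and predicates are plain strings, and the type and function names (`ColumnEquivalences`, `infer_predicates`, `example_chain`) are illustrative stand-ins only. It shows a union-find that merges join keys into classes, then copies a single-column predicate to every other column in the same class.

```rust
use std::collections::HashMap;

/// Toy union-find over column names such as "a.x".
/// Hypothetical stand-in for the PR's `ColumnEquivalences`, not DataFusion's type.
#[derive(Default)]
struct ColumnEquivalences {
    parent: HashMap<String, String>,
}

impl ColumnEquivalences {
    /// Return the class representative, compressing the path on the way back.
    fn find(&mut self, col: &str) -> String {
        let parent = self
            .parent
            .entry(col.to_string())
            .or_insert_with(|| col.to_string())
            .clone();
        if parent == col {
            return parent;
        }
        let root = self.find(&parent);
        self.parent.insert(col.to_string(), root.clone());
        root
    }

    /// Record `a = b` by merging the two classes.
    fn union(&mut self, a: &str, b: &str) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent.insert(ra, rb);
        }
    }

    /// Every known column equal to `col`, excluding `col` itself, sorted.
    fn equivalents(&mut self, col: &str) -> Vec<String> {
        let root = self.find(col);
        let keys: Vec<String> = self.parent.keys().cloned().collect();
        let mut out = Vec::new();
        for k in keys {
            if k != col && self.find(&k) == root {
                out.push(k);
            }
        }
        out.sort();
        out
    }
}

/// A single-column comparison such as `a.x > 5`, kept as text for brevity.
struct Predicate {
    column: &'static str,
    op: &'static str,
    value: &'static str,
}

/// For each predicate, emit the same comparison on every equivalent column.
fn infer_predicates(eq: &mut ColumnEquivalences, preds: &[Predicate]) -> Vec<String> {
    let mut out = Vec::new();
    for p in preds {
        for other in eq.equivalents(p.column) {
            out.push(format!("{} {} {}", other, p.op, p.value));
        }
    }
    out
}

/// The three-table chain from the rationale section.
fn example_chain() -> Vec<String> {
    let mut eq = ColumnEquivalences::default();
    eq.union("a.x", "b.y"); // JOIN b ON a.x = b.y
    eq.union("b.y", "c.z"); // JOIN c ON b.y = c.z
    let filter = Predicate { column: "a.x", op: ">", value: "5" };
    infer_predicates(&mut eq, &[filter])
}
```

In this sketch, `example_chain()` yields predicates for both `b.y` and `c.z`, which is the second hop that single-hop inference misses. A real implementation would work on `Column` and `Expr` values rather than strings and would need to deduplicate against predicates that already exist.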

The existing ON-filter inference path (infer_join_predicates_from_on_filters) is kept as-is since it correctly handles join-type-specific directionality for ON clause predicates.
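
For readers unfamiliar with why ON-clause inference is direction-sensitive, the sketch below encodes the general outer-join rule as I understand it; it is an assumption about the semantics, not a copy of what `infer_join_predicates_from_on_filters` does. A predicate derived from an ON-clause filter can only be pushed to an input whose rows are not preserved by the join, because filtering a preserved input would drop rows that should still appear NULL-padded. The `JoinType`, `Side`, and `may_infer_onto` names are hypothetical.

```rust
/// Toy join types; hypothetical, a subset of what a real planner supports.
#[derive(Clone, Copy)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Which join input an inferred predicate would be pushed to.
#[derive(Clone, Copy)]
enum Side {
    Left,
    Right,
}

/// True if a predicate inferred from an ON-clause filter may be pushed to
/// `target` without changing results (assumed rule: only onto inputs whose
/// rows the join does not preserve).
fn may_infer_onto(join: JoinType, target: Side) -> bool {
    match (join, target) {
        // Neither input is preserved, so both directions are safe.
        (JoinType::Inner, _) => true,
        // LEFT JOIN preserves left rows: only the right input may be filtered.
        (JoinType::Left, Side::Right) => true,
        // RIGHT JOIN preserves right rows: only the left input may be filtered.
        (JoinType::Right, Side::Left) => true,
        // FULL JOIN preserves both inputs; all remaining cases are unsafe.
        _ => false,
    }
}
```

For example, with `a LEFT JOIN b ON a.x = b.y AND a.x > 5`, deriving `b.y > 5` for the scan of `b` is safe under this rule, while deriving a filter on `a` from an ON-clause predicate on `b` is not.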

Also adds push_down_filter_transitive.slt with end-to-end SQL tests that verify both EXPLAIN plans and query correctness.

Are these changes tested?

Yes -- both unit tests and sqllogictests:

Unit tests (new tests in push_down_filter.rs):

  • 3-table and 4-table INNER join chains
  • Mixed INNER + LEFT join chain
  • WHERE-clause column equality (WHERE a.x = b.y AND a.x > 5)
  • FULL JOIN (verifies correct handling at the current join level)
  • Self-join with aliases
  • Projection boundary (verifies recursion stops correctly)
  • Complex ON expressions (verifies a.x + 1 = b.y is not treated as column equality)
  • ColumnEquivalences unit tests (basic operations + path compression)
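
The boundary cases in the list above (outer joins, projection boundary, complex ON expressions) can be illustrated with a toy plan walker. This is a sketch under stated assumptions, not the PR's `collect_descendant_equalities()`: the `Plan` and `Expr` enums are hypothetical and far simpler than DataFusion's `LogicalPlan`, and the walker conservatively does not descend below a non-INNER join, which may be stricter than what the PR does.

```rust
/// Toy expression: either a bare column or anything else (e.g. `a.x + 1`).
enum Expr {
    Column(&'static str),
    Other(&'static str),
}

/// Toy logical plan; hypothetical stand-in for a real plan tree.
enum Plan {
    Scan(&'static str),
    Join {
        inner: bool,
        on: Vec<(Expr, Expr)>,
        left: Box<Plan>,
        right: Box<Plan>,
    },
    Projection(Box<Plan>),
}

/// Collect column-to-column equalities from INNER join ON clauses.
fn collect_equalities(plan: &Plan, out: &mut Vec<(&'static str, &'static str)>) {
    match plan {
        Plan::Join { inner, on, left, right } => {
            // Outer-join equalities do not hold unconditionally; as a
            // conservative assumption this sketch stops there entirely.
            if !*inner {
                return;
            }
            for (l, r) in on {
                // Only bare column = column pairs count; `a.x + 1 = b.y` is skipped.
                if let (Expr::Column(a), Expr::Column(b)) = (l, r) {
                    out.push((*a, *b));
                }
            }
            collect_equalities(left, out);
            collect_equalities(right, out);
        }
        // A projection may rename or compute columns, so recursion stops here.
        Plan::Projection(_) | Plan::Scan(_) => {}
    }
}

/// (a JOIN b) JOIN Projection(c JOIN d), with one non-column ON expression.
fn example_plan_equalities() -> Vec<(&'static str, &'static str)> {
    let a_join_b = Plan::Join {
        inner: true,
        on: vec![
            (Expr::Column("a.x"), Expr::Column("b.y")),
            (Expr::Other("a.x + 1"), Expr::Column("b.w")),
        ],
        left: Box::new(Plan::Scan("a")),
        right: Box::new(Plan::Scan("b")),
    };
    let c_join_d = Plan::Join {
        inner: true,
        on: vec![(Expr::Column("c.z"), Expr::Column("d.w"))],
        left: Box::new(Plan::Scan("c")),
        right: Box::new(Plan::Scan("d")),
    };
    let top = Plan::Join {
        inner: true,
        on: vec![(Expr::Column("b.y"), Expr::Column("c.z"))],
        left: Box::new(a_join_b),
        right: Box::new(Plan::Projection(Box::new(c_join_d))),
    };
    let mut out = Vec::new();
    collect_equalities(&top, &mut out);
    out
}
```

Here the walker picks up `b.y = c.z` and `a.x = b.y`, ignores `a.x + 1 = b.w`, and never sees `c.z = d.w` because it sits under the projection.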

SQL logic tests (push_down_filter_transitive.slt):

  • 3-table chain: EXPLAIN shows filter pushed to all tables + correct results
  • 4-table chain: same
  • Mixed INNER + LEFT JOIN: filter propagates through both
  • Correctness checks with/without filters

Are there any user-facing changes?

No API changes.

github-actions Bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels Apr 6, 2026
@xanderbailey xanderbailey force-pushed the xb/transitive_pushdown branch from dd6a6d9 to 00d7e8d Compare April 7, 2026 14:39
@xanderbailey xanderbailey force-pushed the xb/transitive_pushdown branch from 00d7e8d to 74201d8 Compare April 7, 2026 15:04
github-actions Bot commented Jun 7, 2026

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions Bot added the Stale (PR has not had any activity for some time) label Jun 7, 2026
@github-actions github-actions Bot closed this Jun 15, 2026