Commit b5c2b68
committed
fix: tighten and clean up null-rejection analysis in EliminateOuterJoin
The null-rejection analysis used by `EliminateOuterJoin` to decide
whether an outer join can be tightened had a soundness gap:
`IS TRUE` / `IS FALSE` / `IS NOT UNKNOWN` recursed into their operand
at any depth. These predicates return a definite FALSE on NULL input
(never NULL), so they're null-rejecting at the top of a WHERE clause
but not under a surrounding `NOT` (or any context that resets
`top_level`), where the FALSE inverts to TRUE and accepts NULL rows.
The existing `IS NOT NULL` arm already gated on `top_level`; the
three others now share the same gate.
While here, clean up several other rough edges:
* Defer to `Operator::returns_null_on_null()` instead of maintaining
a hand-rolled list of null-propagating binary operators in
`extract_non_nullable_columns`. The hand list previously only
covered the six comparison operators; the upstream method is the
single source of truth and also picks up arithmetic / bitwise /
regex / SIMILAR TO / array-containment operators that were missed.
* Rename `extract_non_nullable_columns` to
`extract_null_rejecting_columns`. The returned columns are columns
the predicate rejects NULL on (a predicate property), not columns
that are themselves non-nullable (a schema property). The two are
distinct.
* Rewrite the OR / nested-AND merge logic from an O(n*m) double-loop
with break into bucketed per-side lookups.
* Handle `Expr::Negative` alongside `Expr::Not` (both propagate NULL
on their operand).
* Unify `IsNotNull` / `IsTrue` / `IsFalse` / `IsNotUnknown` into a
single match arm now that they share the same `top_level` gate.
* Rewrite the function-level doc and inline comments.
Tests:
* 7 new unit tests covering the soundness fix (three
`NOT(... IS TRUE/FALSE/NOT UNKNOWN)` regressions), the IS DISTINCT
FROM / IS NOT DISTINCT FROM negative cases, and positive coverage
for the arithmetic / unary-minus shapes that the operator-list
extension enables.
* 1 new sqllogictest case asserting both the logical plan
(`Left Join` preserved) and the row set (LEFT-padded NULL rows
survive) for `WHERE NOT((t2.y > 150) IS TRUE)` — the row-set
assertion is what catches a future regression that an explain-only
test would let through under snapshot updates.1 parent 50d74a7 commit b5c2b68
2 files changed
Lines changed: 324 additions & 103 deletions
File tree
- datafusion
- optimizer/src
- sqllogictest/test_files
0 commit comments