Skip to content

Commit b5c2b68

Browse files
committed
fix: tighten and clean up null-rejection analysis in EliminateOuterJoin
The null-rejection analysis used by `EliminateOuterJoin` to decide whether an outer join can be tightened had a soundness gap: `IS TRUE` / `IS FALSE` / `IS NOT UNKNOWN` recursed into their operand at any depth. These predicates return a definite FALSE on NULL input (never NULL), so they're null-rejecting at the top of a WHERE clause but not under a surrounding `NOT` (or any context that resets `top_level`), where the FALSE inverts to TRUE and accepts NULL rows. The existing `IS NOT NULL` arm already gated on `top_level`; the three others now share the same gate. While here, clean up several other rough edges: * Defer to `Operator::returns_null_on_null()` instead of maintaining a hand-rolled list of null-propagating binary operators in `extract_non_nullable_columns`. The hand list previously only covered the six comparison operators; the upstream method is the single source of truth and also picks up arithmetic / bitwise / regex / SIMILAR TO / array-containment operators that were missed. * Rename `extract_non_nullable_columns` to `extract_null_rejecting_columns`. The returned columns are columns the predicate rejects NULL on (a predicate property), not columns that are themselves non-nullable (a schema property). The two are distinct. * Rewrite the OR / nested-AND merge logic from an O(n*m) double-loop with break into bucketed per-side lookups. * Handle `Expr::Negative` alongside `Expr::Not` (both propagate NULL on their operand). * Unify `IsNotNull` / `IsTrue` / `IsFalse` / `IsNotUnknown` into a single match arm now that they share the same `top_level` gate. * Rewrite the function-level doc and inline comments. Tests: * 7 new unit tests covering the soundness fix (three `NOT(... IS TRUE/FALSE/NOT UNKNOWN)` regressions), the IS DISTINCT FROM / IS NOT DISTINCT FROM negative cases, and positive coverage for the arithmetic / unary-minus shapes that the operator-list extension enables. * 1 new sqllogictest case asserting both the logical plan (`Left Join` preserved) and the row set (LEFT-padded NULL rows survive) for `WHERE NOT((t2.y > 150) IS TRUE)` — the row-set assertion is what catches a future regression that an explain-only test would let through under snapshot updates.
1 parent 50d74a7 commit b5c2b68

2 files changed

Lines changed: 324 additions & 103 deletions

File tree

0 commit comments

Comments
 (0)