fix: handle IS TRUE correctly in EliminateOuterJoin#22444
Open
neilconway wants to merge 1 commit into
Open
Conversation
IS TRUE properly in EliminateOuterJoinIS TRUE correctly in EliminateOuterJoin
Contributor
Author
|
@SubhamSinghal @2010YOUY01 FYI, I'd love feedback on this if you have a moment (related to #21549) |
The null-rejection analysis used by `EliminateOuterJoin` to decide whether an outer join can be tightened had a soundness gap: `IS TRUE` / `IS FALSE` / `IS NOT UNKNOWN` recursed into their operand at any depth. These predicates return a definite FALSE on NULL input (never NULL), so they're null-rejecting at the top of a WHERE clause but not under a surrounding `NOT` (or any context that resets `top_level`), where the FALSE inverts to TRUE and accepts NULL rows. The existing `IS NOT NULL` arm already gated on `top_level`; the three others now share the same gate. While here, clean up several other rough edges: * Defer to `Operator::returns_null_on_null()` instead of maintaining a hand-rolled list of null-propagating binary operators in `extract_non_nullable_columns`. The hand list previously only covered the six comparison operators; the upstream method is the single source of truth and also picks up arithmetic / bitwise / regex / SIMILAR TO / array-containment operators that were missed. * Rename `extract_non_nullable_columns` to `extract_null_rejecting_columns`. The returned columns are columns the predicate rejects NULL on (a predicate property), not columns that are themselves non-nullable (a schema property). The two are distinct. * Rewrite the OR / nested-AND merge logic from an O(n*m) double-loop with break into bucketed per-side lookups. * Handle `Expr::Negative` alongside `Expr::Not` (both propagate NULL on their operand). * Unify `IsNotNull` / `IsTrue` / `IsFalse` / `IsNotUnknown` into a single match arm now that they share the same `top_level` gate. * Rewrite the function-level doc and inline comments. Tests: * 7 new unit tests covering the soundness fix (three `NOT(... IS TRUE/FALSE/NOT UNKNOWN)` regressions), the IS DISTINCT FROM / IS NOT DISTINCT FROM negative cases, and positive coverage for the arithmetic / unary-minus shapes that the operator-list extension enables. * 1 new sqllogictest case asserting both the logical plan (`Left Join` preserved) and the row set (LEFT-padded NULL rows survive) for `WHERE NOT((t2.y > 150) IS TRUE)` — the row-set assertion is what catches a future regression that an explain-only test would let through under snapshot updates.
b5c2b68 to
0fd7fd5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
IS TRUEcorrectly #22441.Rationale for this change
EliminateOuterJoinneeds to identify "null-rejecting" columns; a column is null-rejecting with respect to an expression if a NULL value in the column yields a NULL or false value for the expression.This analysis was unsound with respect to
IS TRUE,IS FALSE, andIS NOT UNKNOWNoperators: those operators are null-rejecting at the toplevel of theWHEREclause, but they may not be null-rejecting when nested inside an expression tree. The analysis checked this correctly forIS NOT NULLbut neglected to apply similar logic for these other three operators. This resulted in incorrectly converting outer joins to inner joins in some cases, producing incorrect query results.As part of fixing this, this PR also makes a bunch of improvements to the null-rejection analysis, enumerated below, resulting in more accurate null-rejection analysis.
What changes are included in this PR?
extract_non_nullable_columnstoextract_null_rejecting_columns: "non_nullable" is a property of a column, "null-rejecting" is a more complex property describing the relationship between a column and an expression.Operator::returns_null_on_null()instead of maintaining a hand-rolled and very incomplete list of null-propagating binary operators. We now compute null-rejection correctly for arithmetic, bitwise, and regex operators, for example.Expr::NegativeORand nestedANDoperators to be more clear, and also more efficientAre these changes tested?
Yes; new unit and SLT tests added.
Are there any user-facing changes?
Yes, query result correctness fix.