Skip to content

Commit ae86ece

Browse files
committed
Address review comments on #5488
Per @penghuo review: PlanUtils.java - Revert mayBeFilterFromBucketNonNull to the original ternary form; drop the early-return refactor (no behavior change, just cleaner). - Drop the !rexCall.getOperands().isEmpty() guard before .get(0): IS NOT NULL is always unary in Calcite, so the check is dead. - Trim the JavaDoc to the essentials (un-merged vs merged-AND shape; concrete partition-key match happens in PPLSimplifyDedupRule#apply). - Promote isNotNullOnRef from package-private to public so the dedup rule can reuse it from a different package. PPLSimplifyDedupRule.java - isNotNullOnPartitionKey now delegates the IS NOT NULL($ref) structural check to PlanUtils.isNotNullOnRef and adds the partition-key index check on top. CalciteExplainIT.java - Add testDedupAfterWherePushDown: an end-to-end regression that runs the shape `... | where <pred> | dedup <field>` and asserts (a) LogicalDedup appears in the explain output (PPLSimplifyDedupRule fired even after FilterMergeRule had a chance to merge the two filters), and (b) EnumerableWindow does NOT appear (the in-memory ROW_NUMBER fallback the bug caused is gone). Signed-off-by: Jialiang Liang <jiallian@amazon.com>
1 parent 0bf8f7c commit ae86ece

3 files changed

Lines changed: 37 additions & 36 deletions

File tree

core/src/main/java/org/opensearch/sql/calcite/plan/rule/PPLSimplifyDedupRule.java

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -172,11 +172,11 @@ protected void apply(
172172
}
173173

174174
private static boolean isNotNullOnPartitionKey(RexNode rex, Set<Integer> partitionKeyIndices) {
175-
return rex instanceof RexCall rexCall
176-
&& rexCall.isA(SqlKind.IS_NOT_NULL)
177-
&& !rexCall.getOperands().isEmpty()
178-
&& rexCall.getOperands().get(0) instanceof RexInputRef ref
179-
&& partitionKeyIndices.contains(ref.getIndex());
175+
if (!PlanUtils.isNotNullOnRef(rex)) {
176+
return false;
177+
}
178+
RexInputRef ref = (RexInputRef) ((RexCall) rex).getOperands().get(0);
179+
return partitionKeyIndices.contains(ref.getIndex());
180180
}
181181

182182
private static @Nullable RelCollation extractCollationFromWindow(RexWindow window) {

core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java

Lines changed: 9 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -786,44 +786,22 @@ static Mapping mapping(List<RexNode> rexNodes, RelDataType schema) {
786786
}
787787

788788
/**
789-
* Loose structural check for the bucket-non-null filter PPL emits as part of the {@code dedup}
790-
* pattern. Returns true for either:
791-
*
792-
* <ul>
793-
* <li>a pure {@code IS NOT NULL($ref)}, or
794-
* <li>any {@code AND} whose conjuncts contain at least one {@code IS NOT NULL($ref)}.
795-
* </ul>
796-
*
797-
* <p>This intentionally accepts the un-merged shape ({@code IS NOT NULL} alone) and the merged
798-
* shape that {@link org.apache.calcite.rel.rules.FilterMergeRule} produces when a user {@code
799-
* where} clause precedes the dedup ({@code AND(IS NOT NULL($pk), <user predicate>)}). The
800-
* concrete partition-key match — and the split-out of any user predicate that was folded in —
801-
* happens inside {@code PPLSimplifyDedupRule#apply}, which has access to the partition keys
802-
* declared by the {@code ROW_NUMBER} window above this filter. Keeping the operand predicate
803-
* order-independent means the simplify rule fires regardless of whether {@code FilterMergeRule}
804-
* has already run, so dedup pushdown is no longer load-bearing on HEP rule ordering.
805-
*
806-
* <p>Misclassification is bounded by the surrounding four-level operand chain: a project that
807-
* does not contain the {@code _row_number_dedup_} column, above a filter whose condition is
808-
* {@code _row_number_ <= N} and which does contain {@code _row_number_dedup_}, above a project
809-
* that does contain {@code _row_number_dedup_} (with a {@code ROW_NUMBER} window function), above
810-
* this filter. That signature is unique to the PPL-emitted dedup pattern, so a user-written
811-
* filter such as {@code IS NOT NULL(x) AND ...} cannot accidentally match.
789+
* Accepts the un-merged {@code IS NOT NULL($ref)} shape and the merged-{@code AND} shape that
790+
* {@link org.apache.calcite.rel.rules.FilterMergeRule} produces when a user {@code where}
791+
* precedes the dedup. The concrete partition-key match — and the split-out of any user predicate
792+
* folded into the AND — happens in {@link PPLSimplifyDedupRule#apply}.
812793
*/
813794
static boolean mayBeFilterFromBucketNonNull(LogicalFilter filter) {
814795
RexNode condition = filter.getCondition();
815-
if (isNotNullOnRef(condition)) {
816-
return true;
817-
}
818-
return condition instanceof RexCall rexCall
819-
&& rexCall.getOperator().equals(SqlStdOperatorTable.AND)
820-
&& rexCall.getOperands().stream().anyMatch(PlanUtils::isNotNullOnRef);
796+
return isNotNullOnRef(condition)
797+
|| (condition instanceof RexCall rexCall
798+
&& rexCall.getOperator().equals(SqlStdOperatorTable.AND)
799+
&& rexCall.getOperands().stream().anyMatch(PlanUtils::isNotNullOnRef));
821800
}
822801

823-
static boolean isNotNullOnRef(RexNode rex) {
802+
public static boolean isNotNullOnRef(RexNode rex) {
824803
return rex instanceof RexCall rexCall
825804
&& rexCall.isA(SqlKind.IS_NOT_NULL)
826-
&& !rexCall.getOperands().isEmpty()
827805
&& rexCall.getOperands().get(0) instanceof RexInputRef;
828806
}
829807

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2153,6 +2153,29 @@ public void testTransposeExplain() throws IOException {
21532153
+ "| transpose 4 column_name='column_names'"));
21542154
}
21552155

2156+
/**
2157+
* Regression test for #5482: a user {@code where} clause adjacent to dedup's bucket-non-null
2158+
* filter must still let {@code PPLSimplifyDedupRule} fire even after {@code FilterMergeRule} has
2159+
* merged the two filters into a single AND. Without the fix, the plan keeps the in-memory {@code
2160+
* ROW_NUMBER} window over a row-fetching scan and dedup pushdown is silently disabled.
2161+
*/
2162+
@Test
2163+
public void testDedupAfterWherePushDown() throws IOException {
2164+
enabledOnlyWhenPushdownIsEnabled();
2165+
String result =
2166+
explainQueryToString(
2167+
"source=opensearch-sql_test_index_account | where age > 25 | dedup gender");
2168+
assertTrue(
2169+
"Expected LogicalDedup in the plan (simplify rule should fire after FilterMergeRule):\n"
2170+
+ result,
2171+
result.contains("LogicalDedup"));
2172+
assertFalse(
2173+
"Unexpected EnumerableWindow in the plan — dedup fell back to the in-memory ROW_NUMBER"
2174+
+ " form, which means PPLSimplifyDedupRule did not fire:\n"
2175+
+ result,
2176+
result.contains("EnumerableWindow"));
2177+
}
2178+
21562179
public void testComplexDedup() throws IOException {
21572180
enabledOnlyWhenPushdownIsEnabled();
21582181
String expected = loadExpectedPlan("explain_dedup_complex1.yaml");

0 commit comments

Comments
 (0)