Skip to content

Commit e5de8f6

Browse files
[BugFix] Restore dedup pushdown when combined with WHERE clause (#5482) (#5488)
* [BugFix] Restore dedup pushdown when combined with WHERE clause (#5482) Run PPLSimplifyDedupRule before FilterMergeRule in the HEP optimizer so the bucket-non-null filter PPL emits for dedup is matched as-is. With the previous order, an upstream user where filter sat adjacent to the bucket-non-null filter; FilterMergeRule fired first and merged them into a conjunction that no longer satisfied PPLSimplifyDedupRule's operand predicate, defeating dedup pushdown to the shard. Use sequential addRuleInstance phases for explicit ordering rather than addRuleCollection, which is documented as non-deterministic in firing order. Adds two regression tests in CalcitePPLDedupTest: one that asserts LogicalDedup is produced under the fixed order, and one that pins the buggy behavior under the swapped order. Signed-off-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> * [BugFix] Drop issue-link reference from regression-test JavaDoc (#5488) Per maintainer review feedback, the regression-test JavaDoc for testDedupAfterWhereProducesLogicalDedup mentioned the originating issue URL. The remaining JavaDoc paragraphs already describe the bug shape and the rule-ordering invariant, so the explicit issue link is unnecessary noise. Signed-off-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> * [BugFix] Make dedup simplify operand order-independent (#5488) Address review feedback on #5488: extend mayBeFilterFromBucketNonNull to accept the merged conjunction shape FilterMergeRule produces, so PPLSimplifyDedupRule fires regardless of whether FilterMergeRule has already merged the user where clause into the bucket-non-null filter. PPLSimplifyDedupRule.apply now splits the bottom filter into IS_NOT_NULL conjuncts on partition keys (absorbed into LogicalDedup semantics) and any remaining conjuncts (preserved as a separate filter below the new LogicalDedup), so a user predicate that was folded in is no longer dropped. With the operand predicate order-independent, the HEP rule order is no longer a load-bearing invariant. Revert the addRuleCollection -> addRuleInstance change in CalciteToolsHelper.HEP_PROGRAM that the previous patch introduced. Replace the regression test that pinned the buggy rule order with one that asserts the user-visible contract: with where preceding dedup, a LogicalDedup is produced and the user predicate is preserved regardless of which order FilterMergeRule and PPLSimplifyDedupRule fire. Signed-off-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> * Address review comments on #5488 Per @penghuo review: PlanUtils.java - Revert mayBeFilterFromBucketNonNull to the original ternary form; drop the early-return refactor (no behavior change, just cleaner). - Drop the !rexCall.getOperands().isEmpty() guard before .get(0): IS NOT NULL is always unary in Calcite, so the check is dead. - Trim the JavaDoc to the essentials (un-merged vs merged-AND shape; concrete partition-key match happens in PPLSimplifyDedupRule#apply). - Promote isNotNullOnRef from package-private to public so the dedup rule can reuse it from a different package. PPLSimplifyDedupRule.java - isNotNullOnPartitionKey now delegates the IS NOT NULL($ref) structural check to PlanUtils.isNotNullOnRef and adds the partition-key index check on top. CalciteExplainIT.java - Add testDedupAfterWherePushDown: an end-to-end regression that runs the shape `... | where <pred> | dedup <field>` and asserts (a) LogicalDedup appears in the explain output (PPLSimplifyDedupRule fired even after FilterMergeRule had a chance to merge the two filters), and (b) EnumerableWindow does NOT appear (the in-memory ROW_NUMBER fallback the bug caused is gone). Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Push user where filter into scan when blocking dedup pushdown PPLSimplifyDedupRule correctly produces Dedup -> Filter(user where) -> Scan when a `where` precedes `dedup`. The Filter between Dedup and Scan blocks DedupPushdownRule's strict Dedup -> Project -> Scan operand chain, so Volcano falls back to PPLDedupConvertRule and the plan ends up with an in-memory ROW_NUMBER window instead of the pushed-down composite + top_hits aggregation. Add a WITH_FILTER operand variant to DedupPushdownRule that matches Dedup -> Filter -> Scan, pushes the filter into the scan, then runs the standard apply() on the resulting Dedup -> Project -> Scan shape. Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Bail when filter is only partially pushable pushDownFilter returns a Filter (not a CalciteLogicalIndexScan) when the predicate analyzer can only partially push the condition. The previous cast would have thrown ClassCastException in that case. Use an instanceof-pattern check so the rule bails out cleanly and leaves the plan untouched, letting other rules handle the residual. Also drop a stale issue-link reference from a test comment. Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Apply spotless formatting to dedup unit test Reflowed the JavaDoc on testDedupAfterWhereProducesLogicalDedupWithProductionHepProgram to match Google Java Format's preferred line break, fixing the spotlessJavaCheck violation that failed the unit-test matrix on CI. Signed-off-by: Jialiang Liang <jiallian@amazon.com> --------- Signed-off-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> Signed-off-by: Jialiang Liang <jiallian@amazon.com> Co-authored-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> Co-authored-by: Jialiang Liang <jiallian@amazon.com>
1 parent cf14aba commit e5de8f6

7 files changed

Lines changed: 264 additions & 2 deletions

File tree

core/src/main/java/org/opensearch/sql/calcite/plan/rule/PPLSimplifyDedupRule.java

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,11 +6,14 @@
66
package org.opensearch.sql.calcite.plan.rule;
77

88
import java.util.ArrayList;
9+
import java.util.HashSet;
910
import java.util.List;
11+
import java.util.Set;
1012
import java.util.function.Predicate;
1113
import java.util.stream.Collectors;
1214
import javax.annotation.Nullable;
1315
import org.apache.calcite.plan.RelOptRuleCall;
16+
import org.apache.calcite.plan.RelOptUtil;
1417
import org.apache.calcite.plan.RelRule;
1518
import org.apache.calcite.rel.RelCollation;
1619
import org.apache.calcite.rel.RelCollations;
@@ -22,6 +25,7 @@
2225
import org.apache.calcite.rex.RexInputRef;
2326
import org.apache.calcite.rex.RexLiteral;
2427
import org.apache.calcite.rex.RexNode;
28+
import org.apache.calcite.rex.RexUtil;
2529
import org.apache.calcite.rex.RexWindow;
2630
import org.apache.calcite.sql.SqlKind;
2731
import org.apache.calcite.sql.type.SqlTypeName;
@@ -115,8 +119,41 @@ protected void apply(
115119

116120
RelCollation inputCollation = extractCollationFromWindow(windows.get(0));
117121

122+
// Split the bucket-non-null filter into two parts:
123+
// 1) IS_NOT_NULL conjuncts on a partition key — these are the bucket-non-null guards PPL
124+
// emits as part of the dedup pattern; LogicalDedup absorbs their semantics.
125+
// 2) Everything else — for example, a user `where` predicate that FilterMergeRule may
126+
// have folded into the same conjunction, or a user IS_NOT_NULL filter on a non-partition
127+
// column. These must be preserved as a separate filter below the new LogicalDedup so
128+
// user-visible behavior is unchanged regardless of whether FilterMergeRule fired.
129+
Set<Integer> partitionKeyIndices = new HashSet<>();
130+
for (RexNode key : dedupColumns) {
131+
if (key instanceof RexInputRef ref) {
132+
partitionKeyIndices.add(ref.getIndex());
133+
}
134+
}
135+
List<RexNode> bucketNonNullConjuncts = new ArrayList<>();
136+
List<RexNode> remainingConjuncts = new ArrayList<>();
137+
for (RexNode conjunct : RelOptUtil.conjunctions(bucketNonNullFilter.getCondition())) {
138+
if (isNotNullOnPartitionKey(conjunct, partitionKeyIndices)) {
139+
bucketNonNullConjuncts.add(conjunct);
140+
} else {
141+
remainingConjuncts.add(conjunct);
142+
}
143+
}
144+
// Defensive: if no IS_NOT_NULL conjunct on a partition key is present, this filter is not
145+
// actually a bucket-non-null filter — bail out without transforming. The loose operand
146+
// predicate may have matched on an unrelated AND that happens to contain an IS_NOT_NULL on
147+
// some other ref.
148+
if (bucketNonNullConjuncts.isEmpty()) {
149+
return;
150+
}
151+
118152
RelBuilder relBuilder = call.builder();
119153
relBuilder.push(bucketNonNullFilter.getInput());
154+
if (!remainingConjuncts.isEmpty()) {
155+
relBuilder.filter(RexUtil.composeConjunction(relBuilder.getRexBuilder(), remainingConjuncts));
156+
}
120157
List<Pair<RexNode, String>> targetProjections =
121158
projectWithWindow.getNamedProjects().stream()
122159
.filter(p -> !p.getKey().isA(SqlKind.ROW_NUMBER))
@@ -134,6 +171,14 @@ protected void apply(
134171
call.transformTo(relBuilder.build());
135172
}
136173

174+
private static boolean isNotNullOnPartitionKey(RexNode rex, Set<Integer> partitionKeyIndices) {
175+
if (!PlanUtils.isNotNullOnRef(rex)) {
176+
return false;
177+
}
178+
RexInputRef ref = (RexInputRef) ((RexCall) rex).getOperands().get(0);
179+
return partitionKeyIndices.contains(ref.getIndex());
180+
}
181+
137182
private static @Nullable RelCollation extractCollationFromWindow(RexWindow window) {
138183
if (window.orderKeys.isEmpty()) {
139184
return null;

core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -788,15 +788,21 @@ static Mapping mapping(List<RexNode> rexNodes, RelDataType schema) {
788788
return Mappings.target(getSelectColumns(rexNodes), schema.getFieldCount());
789789
}
790790

791+
/**
792+
* Accepts the un-merged {@code IS NOT NULL($ref)} shape and the merged-{@code AND} shape that
793+
* {@link org.apache.calcite.rel.rules.FilterMergeRule} produces when a user {@code where}
794+
* precedes the dedup. The concrete partition-key match — and the split-out of any user predicate
795+
* folded into the AND — happens in {@link PPLSimplifyDedupRule#apply}.
796+
*/
791797
static boolean mayBeFilterFromBucketNonNull(LogicalFilter filter) {
792798
RexNode condition = filter.getCondition();
793799
return isNotNullOnRef(condition)
794800
|| (condition instanceof RexCall rexCall
795801
&& rexCall.getOperator().equals(SqlStdOperatorTable.AND)
796-
&& rexCall.getOperands().stream().allMatch(PlanUtils::isNotNullOnRef));
802+
&& rexCall.getOperands().stream().anyMatch(PlanUtils::isNotNullOnRef));
797803
}
798804

799-
private static boolean isNotNullOnRef(RexNode rex) {
805+
public static boolean isNotNullOnRef(RexNode rex) {
800806
return rex instanceof RexCall rexCall
801807
&& rexCall.isA(SqlKind.IS_NOT_NULL)
802808
&& rexCall.getOperands().get(0) instanceof RexInputRef;

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2153,6 +2153,29 @@ public void testTransposeExplain() throws IOException {
21532153
+ "| transpose 4 column_name='column_names'"));
21542154
}
21552155

2156+
/**
2157+
* With a user {@code where} clause preceding {@code dedup}, the physical plan must push both the
2158+
* filter and the dedup-as-aggregation into the OpenSearch scan, not fall back to an in-memory
2159+
* {@code ROW_NUMBER} window above a row-fetching scan.
2160+
*/
2161+
@Test
2162+
public void testDedupAfterWherePushDown() throws IOException {
2163+
enabledOnlyWhenPushdownIsEnabled();
2164+
String result =
2165+
explainQueryToString(
2166+
"source=opensearch-sql_test_index_account | where age > 25 | dedup gender");
2167+
assertTrue(
2168+
"Expected user where filter pushed down to the scan:\n" + result,
2169+
result.contains("FILTER->>($8, 25)"));
2170+
assertTrue(
2171+
"Expected dedup pushed down as AGGREGATION (composite + top_hits):\n" + result,
2172+
result.contains("AGGREGATION->"));
2173+
assertFalse(
2174+
"Unexpected EnumerableWindow — dedup fell back to the in-memory ROW_NUMBER form:\n"
2175+
+ result,
2176+
result.contains("EnumerableWindow"));
2177+
}
2178+
21562179
public void testComplexDedup() throws IOException {
21572180
enabledOnlyWhenPushdownIsEnabled();
21582181
String expected = loadExpectedPlan("explain_dedup_complex1.yaml");

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717
import org.apache.calcite.rel.RelCollations;
1818
import org.apache.calcite.rel.RelFieldCollation;
1919
import org.apache.calcite.rel.logical.LogicalAggregate;
20+
import org.apache.calcite.rel.logical.LogicalFilter;
2021
import org.apache.calcite.rel.logical.LogicalProject;
2122
import org.apache.calcite.rel.rules.SubstitutionRule;
2223
import org.apache.calcite.rex.RexInputRef;
@@ -47,6 +48,24 @@ protected DedupPushdownRule(Config config) {
4748
@Override
4849
protected void onMatchImpl(RelOptRuleCall call) {
4950
final LogicalDedup logicalDedup = call.rel(0);
51+
if (call.rels[1] instanceof LogicalFilter filter) {
52+
// Push the filter into the scan, then synthesize an identity project so the standard
53+
// apply() can run on the resulting Dedup → Project → Scan shape. If the filter is only
54+
// partially pushable, pushDownFilter returns a Filter wrapping the residual condition over
55+
// the new scan; we can't strip a residual filter without breaking semantics, so bail.
56+
final CalciteLogicalIndexScan scan = call.rel(2);
57+
if (!(scan.pushDownFilter(filter) instanceof CalciteLogicalIndexScan newScan)) {
58+
return;
59+
}
60+
RelBuilder relBuilder = call.builder();
61+
relBuilder.push(newScan);
62+
// force=true so the identity project is materialized (apply() requires a LogicalProject)
63+
relBuilder.project(
64+
relBuilder.fields(), newScan.getRowType().getFieldNames(), /* force= */ true);
65+
LogicalProject identityProject = (LogicalProject) relBuilder.build();
66+
apply(call, logicalDedup, identityProject, newScan);
67+
return;
68+
}
5069
final LogicalProject projectWithExpr = call.rel(1);
5170
final CalciteLogicalIndexScan scan = call.rel(2);
5271
apply(call, logicalDedup, projectWithExpr, scan);
@@ -226,6 +245,27 @@ public interface Config extends OpenSearchRuleConfig {
226245
.predicate(Config::tableScanChecker)
227246
.noInputs())));
228247

248+
// +- LogicalDedup
249+
// +- LogicalFilter (e.g. a `where` predicate preserved below dedup by
250+
// +- CalciteLogicalIndexScan PPLSimplifyDedupRule when an upstream `where` is folded
251+
// into the bucket-non-null filter)
252+
Config WITH_FILTER =
253+
ImmutableDedupPushdownRule.Config.builder()
254+
.build()
255+
.withDescription("Dedup-to-Aggregate-WithFilter")
256+
.withOperandSupplier(
257+
b0 ->
258+
b0.operand(LogicalDedup.class)
259+
.predicate(dedup -> !dedup.getKeepEmpty())
260+
.oneInput(
261+
b1 ->
262+
b1.operand(LogicalFilter.class)
263+
.oneInput(
264+
b2 ->
265+
b2.operand(CalciteLogicalIndexScan.class)
266+
.predicate(Config::tableScanChecker)
267+
.noInputs())));
268+
229269
/**
230270
* Project must be not pushed since the name of expression would lose after project pushed. E.g.
231271
* in query "eval new_a = a + 1 | dedup b", the "new_a" will lose.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,8 @@ public class OpenSearchIndexRules {
5050
SortIndexScanRule.Config.DEFAULT.toRule();
5151
private static final DedupPushdownRule DEDUP_PUSH_DOWN =
5252
DedupPushdownRule.Config.DEFAULT.toRule();
53+
private static final DedupPushdownRule DEDUP_PUSH_DOWN_WITH_FILTER =
54+
DedupPushdownRule.Config.WITH_FILTER.toRule();
5355
private static final SortProjectExprTransposeRule SORT_PROJECT_EXPR_TRANSPOSE =
5456
SortProjectExprTransposeRule.Config.DEFAULT.toRule();
5557
private static final ExpandCollationOnProjectExprRule EXPAND_COLLATION_ON_PROJECT_EXPR =
@@ -75,6 +77,7 @@ public class OpenSearchIndexRules {
7577
LIMIT_INDEX_SCAN,
7678
SORT_INDEX_SCAN,
7779
DEDUP_PUSH_DOWN,
80+
DEDUP_PUSH_DOWN_WITH_FILTER,
7881
SORT_PROJECT_EXPR_TRANSPOSE,
7982
SORT_AGGREGATION_METRICS_RULE,
8083
RARE_TOP_PUSH_DOWN,

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAbstractTest.java

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,20 @@ public RelNode getRelNode(String ppl) {
110110
return root;
111111
}
112112

113+
/**
114+
* Get the root RelNode of the given PPL query without merging adjacent filters. Useful for
115+
* regression tests that need to exercise rule ordering against the un-merged shape that PPL
116+
* actually emits to the production HEP optimizer.
117+
*/
118+
public RelNode getRelNodeRaw(String ppl) {
119+
CalcitePlanContext context = createBuilderContext();
120+
Query query = (Query) plan(pplParser, ppl);
121+
planTransformer.analyze(query.getPlan(), context);
122+
RelNode root = context.relBuilder.build();
123+
System.out.println(root.explain());
124+
return root;
125+
}
126+
113127
private RelNode mergeAdjacentFilters(RelNode relNode) {
114128
HepProgram program =
115129
new HepProgramBuilder().addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule()).build();

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.java

Lines changed: 131 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,9 +5,15 @@
55

66
package org.opensearch.sql.ppl.calcite;
77

8+
import org.apache.calcite.plan.hep.HepPlanner;
9+
import org.apache.calcite.plan.hep.HepProgram;
10+
import org.apache.calcite.plan.hep.HepProgramBuilder;
811
import org.apache.calcite.rel.RelNode;
12+
import org.apache.calcite.rel.rules.FilterMergeRule;
913
import org.apache.calcite.test.CalciteAssert;
14+
import org.junit.Assert;
1015
import org.junit.Test;
16+
import org.opensearch.sql.calcite.plan.rule.PPLSimplifyDedupRule;
1117

1218
public class CalcitePPLDedupTest extends CalcitePPLAbstractTest {
1319

@@ -353,4 +359,129 @@ public void testSortFieldProjectedAwayBeforeDedup() {
353359
+ " LogicalTableScan(table=[[scott, EMP]])\n";
354360
verifyLogical(root, expectedLogical);
355361
}
362+
363+
/**
364+
* When a user {@code where} precedes {@code dedup}, the user filter sits adjacent to the
365+
* bucket-non-null filter that PPL emits for the dedup pattern. {@link PPLSimplifyDedupRule} must
366+
* fold the dedup pattern into a {@link org.opensearch.sql.calcite.plan.rel.LogicalDedup} so that
367+
* {@code DedupPushdownRule} can match it; otherwise dedup falls through to the in-memory {@code
368+
* ROW_NUMBER} window form, defeating dedup pushdown to the shard. The user predicate must be
369+
* preserved as a separate filter below the new {@code LogicalDedup}.
370+
*/
371+
@Test
372+
public void testDedupAfterWhereProducesLogicalDedup() {
373+
String ppl = "source=EMP | where SAL > 1000 | dedup DEPTNO";
374+
RelNode raw = getRelNodeRaw(ppl);
375+
376+
// Sanity: the un-merged plan has both the bucket-non-null filter and the user where filter
377+
// adjacent to each other above the scan — exactly the shape that triggered the original bug.
378+
String rawExplain = raw.explain();
379+
Assert.assertTrue(
380+
"Raw plan should contain the bucket-non-null filter:\n" + rawExplain,
381+
rawExplain.contains("IS NOT NULL"));
382+
Assert.assertTrue(
383+
"Raw plan should contain the user where filter:\n" + rawExplain,
384+
rawExplain.contains(">($5, 1000)"));
385+
Assert.assertTrue(
386+
"Raw plan should contain ROW_NUMBER prior to simplification:\n" + rawExplain,
387+
rawExplain.contains("ROW_NUMBER"));
388+
389+
// Apply rules in the order PPLSimplifyDedupRule -> FilterMergeRule.
390+
HepProgram program =
391+
new HepProgramBuilder()
392+
.addRuleInstance(PPLSimplifyDedupRule.DEDUP_SIMPLIFY_RULE)
393+
.addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule())
394+
.build();
395+
RelNode optimized = runHepPlanner(raw, program);
396+
397+
String optimizedExplain = optimized.explain();
398+
Assert.assertTrue(
399+
"Optimized plan should contain LogicalDedup so DedupPushdownRule can match it:\n"
400+
+ optimizedExplain,
401+
optimizedExplain.contains("LogicalDedup"));
402+
Assert.assertFalse(
403+
"Optimized plan should not retain ROW_NUMBER after simplification:\n" + optimizedExplain,
404+
optimizedExplain.contains("ROW_NUMBER"));
405+
Assert.assertTrue(
406+
"User where predicate must be preserved as a filter below LogicalDedup:\n"
407+
+ optimizedExplain,
408+
optimizedExplain.contains(">($5, 1000)"));
409+
}
410+
411+
/**
412+
* Companion to {@link #testDedupAfterWhereProducesLogicalDedup} that pins the user-visible
413+
* contract: with a {@code where} preceding {@code dedup}, a {@code LogicalDedup} must be produced
414+
* regardless of the order in which {@link FilterMergeRule} and {@link PPLSimplifyDedupRule} fire.
415+
*
416+
* <p>The simplify rule's bucket-non-null operand predicate is order-independent — it accepts both
417+
* a pure {@code IS NOT NULL} and an {@code AND} that contains an {@code IS NOT NULL} on a
418+
* partition key — so {@code FilterMergeRule} firing first no longer disables dedup pushdown. This
419+
* guards against re-introducing the original regression by reordering, removing, or adding rules
420+
* to {@code CalciteToolsHelper#HEP_PROGRAM}.
421+
*/
422+
@Test
423+
public void testDedupAfterWhereProducesLogicalDedupRegardlessOfRuleOrder() {
424+
String ppl = "source=EMP | where SAL > 1000 | dedup DEPTNO";
425+
RelNode raw = getRelNodeRaw(ppl);
426+
427+
HepProgram program =
428+
new HepProgramBuilder()
429+
.addRuleInstance(FilterMergeRule.Config.DEFAULT.toRule())
430+
.addRuleInstance(PPLSimplifyDedupRule.DEDUP_SIMPLIFY_RULE)
431+
.build();
432+
RelNode optimized = runHepPlanner(raw, program);
433+
434+
String optimizedExplain = optimized.explain();
435+
Assert.assertTrue(
436+
"Even with FilterMergeRule firing first, PPLSimplifyDedupRule must still produce"
437+
+ " LogicalDedup so DedupPushdownRule can match it:\n"
438+
+ optimizedExplain,
439+
optimizedExplain.contains("LogicalDedup"));
440+
Assert.assertFalse(
441+
"ROW_NUMBER window form should be removed after simplification:\n" + optimizedExplain,
442+
optimizedExplain.contains("ROW_NUMBER"));
443+
Assert.assertTrue(
444+
"User where predicate must be preserved as a filter below LogicalDedup, even when"
445+
+ " FilterMergeRule fired first and folded it into the bucket-non-null filter:\n"
446+
+ optimizedExplain,
447+
optimizedExplain.contains(">($5, 1000)"));
448+
}
449+
450+
/**
451+
* Mirrors the exact shape of {@code CalciteToolsHelper.HEP_PROGRAM} used in production ({@code
452+
* addRuleCollection(List.of(FilterMergeRule, PPLSimplifyDedupRule))}). The two sequenced-{@code
453+
* addRuleInstance} tests above prove the rule fires under either ordering, but production runs
454+
* both rules in a single collection instruction. This test pins that exact shape to catch any
455+
* addRuleCollection-vs-addRuleInstance traversal differences.
456+
*/
457+
@Test
458+
public void testDedupAfterWhereProducesLogicalDedupWithProductionHepProgram() {
459+
String ppl = "source=EMP | where SAL > 1000 | dedup DEPTNO";
460+
RelNode raw = getRelNodeRaw(ppl);
461+
462+
HepProgram program =
463+
new HepProgramBuilder()
464+
.addRuleCollection(
465+
java.util.List.of(
466+
FilterMergeRule.Config.DEFAULT.toRule(),
467+
PPLSimplifyDedupRule.DEDUP_SIMPLIFY_RULE))
468+
.build();
469+
RelNode optimized = runHepPlanner(raw, program);
470+
471+
String optimizedExplain = optimized.explain();
472+
Assert.assertTrue(
473+
"Production HEP_PROGRAM (addRuleCollection) must still produce LogicalDedup:\n"
474+
+ optimizedExplain,
475+
optimizedExplain.contains("LogicalDedup"));
476+
Assert.assertFalse(
477+
"ROW_NUMBER window form should be removed under production HEP_PROGRAM:\n"
478+
+ optimizedExplain,
479+
optimizedExplain.contains("ROW_NUMBER"));
480+
}
481+
482+
private static RelNode runHepPlanner(RelNode root, HepProgram program) {
483+
HepPlanner planner = new HepPlanner(program);
484+
planner.setRoot(root);
485+
return planner.findBestExp();
486+
}
356487
}

0 commit comments

Comments
 (0)