Commit 419b45a
* [BugFix] Fix sort order not preserved through dedup in Calcite engine (#3922)
Signed-off-by: Songkan Tang <songkant@amazon.com>
* Update integration test expected outputs for sort-then-dedup plan changes
Update CalcitePPLDedupIT.testSortThenDedupKeepEmpty expected rows from 7 to 9
to match the corrected dedup behavior. Update all CalciteExplainIT expected
YAML/JSON output files to reflect the new logical/physical plan structure where
Sort is restored above dedup and ROW_NUMBER includes ORDER BY. Skip testExplain
in CalciteNoPushdownIT since pushdown-disabled produces different physical plans.
Signed-off-by: Songkan Tang <songkant@amazon.com>
* refactor(dedup): propagate sort collation through LogicalDedup and push to top_hits
Add inputCollation field to Dedup/LogicalDedup so sort order from
upstream Sort nodes is preserved across rule transformations. The
collation flows through the full pipeline:
- visitDedupe strips the Sort and captures the remapped collation
- PPLSimplifyDedupRule extracts ORDER BY from ROW_NUMBER window
- PPLDedupConvertRule uses inputCollation for ORDER BY and restores the Sort
- DedupPushdownRule passes sort info via hint to AggregateAnalyzer
- AggregateAnalyzer sets top_hits sort for correct intra-bucket ordering
Also fixes an edge case where the sort field is projected away before dedup
(e.g. sort DEPTNO | fields ENAME, JOB | dedup 1 JOB) by remapping
collation field indices through intermediate Projects in stripInputSort.
Signed-off-by: Songkan Tang <songkant@amazon.com>
* fix(dedup): correct sort field mapping and null ordering for dedup pushdown
Fixes CI failures on PR #5353 by addressing three issues introduced by the
sort-preserved-through-dedup work:
1. DedupPushdownRule passed the *reordered* targetChildProject's field names to
addDedupSortHintToAggregate, but the collation's field indices are relative to
the dedup's input row type (the original project). This caused the wrong field
to be pushed into top_hits' inner sort (e.g. sort on `age` instead of
`new_gender`). Pass `project.getRowType().getFieldNames()` instead.
2. The pushed-down top_hits sort inherited OpenSearch's default null ordering
(ASC -> NULLS LAST), which disagreed with PPL's ASC-NULLS-FIRST default. This
made dedup select a different row between pushdown-on and pushdown-off paths.
Apply `.missing("_first")` / `"_last"` on the top_hits sort in the LITERAL_AGG
(dedup) branch only, so other top_hits callers (FIRST/LAST/MIN/MAX/ARG_*)
keep their existing sort semantics.
3. testSortThenDedup's expected rows assumed NULLS LAST (expected `[Z, B]` for
name=B) but PPL default is NULLS FIRST, under which B's first occurrence is
the null-category row (#14). Correct the expected rows to match Calcite
semantics.
Regenerate the expected explain YAML/JSON outputs for the changed plans:
- EnumerableWindow now carries `order by [1 ASC-nulls-first]` because of this
PR's window ORDER BY change.
- top_hits inner sort includes the newly pushed `"sort":[{field:{order,missing}}]`
clause for dedup-pushdown tests.
- Extended-mode Janino comparator code now calls `compareNullsFirst` because
the window has an ORDER BY.
Signed-off-by: Songkan Tang <songkant@amazon.com>
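As an illustration of item 2 above, the mapping from a sort key's null placement to the `missing` option of the pushed-down sort could be sketched as follows. This is a minimal, self-contained sketch and not the actual `AggregateAnalyzer` code: the class `NullOrderingSketch`, its enums, and the plain `Map` standing in for an OpenSearch sort builder are all hypothetical names introduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the dedup branch that builds the top_hits sort.
class NullOrderingSketch {
  enum Direction { ASC, DESC }

  enum Nulls { FIRST, LAST }

  // Translate an explicit null placement into the value of the sort's "missing" option.
  static String missingFor(Nulls nulls) {
    return nulls == Nulls.FIRST ? "_first" : "_last";
  }

  // Build one sort entry as a plain map; a real implementation would use a sort builder.
  static Map<String, String> sortEntry(Direction direction, Nulls nulls) {
    Map<String, String> entry = new LinkedHashMap<>();
    entry.put("order", direction == Direction.ASC ? "asc" : "desc");
    entry.put("missing", missingFor(nulls));
    return entry;
  }

  public static void main(String[] args) {
    // PPL's default for ASC is NULLS FIRST, so the pushed sort must say so explicitly.
    System.out.println(sortEntry(Direction.ASC, Nulls.FIRST));
  }
}
```

The sketch takes the null placement as an explicit argument rather than deriving it from the direction, so the collation decides it and both execution paths agree.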
* feat(dedup): push all sort fields to top_hits and capture collation field names
Extend the dedup sort hint to carry every field collation instead of only the
first one, so a multi-field PPL `sort` preserves its full ordering through the
pushed-down `top_hits`:
- `PPLHintUtils.addDedupSortHintToAggregate` now encodes the full collation as
`field:ORDER|field:ORDER|...` in a single hint option; the getter returns a
`List<DedupSortKey>`. `AggregateAnalyzer` emits one sort entry per key,
preserving ASC/DESC and the Calcite-aligned NULL ordering.
- Collation field names are now captured on `LogicalDedup` itself at creation
time (against the row type that produced the collation), rather than being
resolved later in `DedupPushdownRule` via the dedup's current input row type.
This is necessary because planner rules can narrow the dedup's input between
`PPLSimplifyDedupRule` and `DedupPushdownRule`, making the index-based
`RelCollation` unsafe to resolve against the (then-narrower) input.
- New integration test `testMultiColumnSortThenDedup` verifies that
`sort state, age, account_number | dedup 1 gender` returns the exact rows
that the full three-field ordering dictates — impossible to achieve if only
the first sort field were pushed down.
- Update the four existing dedup-expr / dedup-with-expr / complex1 expected
plans to reflect the second-field sort entry now present in `top_hits`.
Signed-off-by: Songkan Tang <songkant@amazon.com>
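The `field:ORDER|field:ORDER|...` hint encoding described above can be sketched roughly as below. Only the wire format and the name `DedupSortKey` come from the commit message. The record's components, the `encode`/`decode` method names, and the class `DedupSortHintSketch` are assumptions made for illustration, and the real `PPLHintUtils` signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of packing a multi-field collation into one hint option.
class DedupSortHintSketch {
  // Assumed shape: a field name plus "ASC" or "DESC".
  record DedupSortKey(String field, String order) {}

  // Join every key as field:ORDER, separated by '|'.
  static String encode(List<DedupSortKey> keys) {
    return keys.stream()
        .map(k -> k.field() + ":" + k.order())
        .collect(Collectors.joining("|"));
  }

  // Split on '|' and on the last ':' so dotted field names survive.
  // Field names containing '|' are not supported by this sketch.
  static List<DedupSortKey> decode(String hint) {
    List<DedupSortKey> keys = new ArrayList<>();
    if (hint == null || hint.isEmpty()) {
      return keys;
    }
    for (String part : hint.split("\\|")) {
      int sep = part.lastIndexOf(':');
      if (sep <= 0) {
        throw new IllegalArgumentException("malformed sort key: " + part);
      }
      keys.add(new DedupSortKey(part.substring(0, sep), part.substring(sep + 1)));
    }
    return keys;
  }

  public static void main(String[] args) {
    List<DedupSortKey> keys =
        List.of(new DedupSortKey("state", "ASC"), new DedupSortKey("age", "DESC"));
    System.out.println(encode(keys));
    System.out.println(decode(encode(keys)));
  }
}
```

Decoding yields the keys in their original order, which is what lets a consumer emit one sort entry per key.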
* refactor(dedup): address PR review — simplify collation handling
Two PR review comments:
1. core/.../CalciteRelNodeVisitor.java:735 — the private `backtrackForCollation`
method was a one-line wrapper around `PlanUtils.findInputCollation`. Remove
the wrapper and call `PlanUtils.findInputCollation` directly; make the util
public since it is now used from another package.
2. core/.../PlanUtils.java:660 — `remapCollationThroughProjects` hand-rolled a
project-to-top index remapping that was fragile (it only understood Logical
Projects with simple `RexInputRef`s). Replace it with Calcite's metadata
framework: `RelMetadataQuery.collations(input)` returns the subtree's output
collation with project remapping handled by `RelMdCollation`, which knows
about more operators than our manual walk did.
No behavior change expected — every existing CalciteExplainIT / CalcitePPLDedupIT
/ CalcitePPLSortIT / CalciteSortCommandIT / CalciteReverseCommandIT integration
test still passes, as do the core/opensearch/ppl unit tests.
Signed-off-by: Songkan Tang <songkant@amazon.com>
* refactor(dedup): permute collation to scan schema instead of name workaround
Replace the name-based `inputCollationFieldNames` captured on `LogicalDedup`
with Calcite's standard index-based collation propagation. The name approach
would break under any downstream rename (`RelCollation` itself stores only
indices, never names — a rename changes the name but keeps the index).
`DedupPushdownRule` now permutes `dedup.inputCollation` into scan-schema
indices via `RexInputRef`s of the immediate child `LogicalProject` — mirroring
what `Project.getMapping` + `RelCollations.permute` do for Calcite's own
trait-propagation paths (cf. `EnumerableMergeJoin.passThroughTraits`).
Two cases are handled:
- Collation still addresses the child project's output (no rule inserted a
narrower project in between): permute each key through the project's
projection list. A non-`RexInputRef` projection (computed column, e.g.
`lower(gender)`) cannot be expressed as an OS field sort, so the hint is
dropped — Calcite's outer sort still restores order.
- Collation indices are out of range for the child project's output but
within the scan's schema (a narrower project got pushed below dedup after
creation): indices are already scan-schema indices; use as-is.
Drop `Dedup.inputCollationFieldNames` and the 7-arg `copy` signature — the
name capture was a workaround that couldn't survive renames and is no longer
needed.
Update the two `explain_dedup_expr_complex1*` YAMLs: in that test the sort
keys are computed columns (`lower(gender)` / `lower(state)`), which we now
correctly refuse to push as `top_hits` sort entries. The outer Calcite sort
still runs and the test's final row order is unaffected.
Signed-off-by: Songkan Tang <songkant@amazon.com>
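The two cases above can be modelled without Calcite types, as in the sketch below. It is a simplified stand-in, not the code in `DedupPushdownRule`: the project is reduced to a list holding the scan index of each `RexInputRef` (or `null` for a computed expression), sort directions are omitted, and the names `CollationPermuteSketch` and `toScanIndices` are hypothetical. An empty result plays the role of dropping the hint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of permuting collation indices into scan-schema indices.
class CollationPermuteSketch {
  // projectRefs.get(i) is the scan index that project output i refers to,
  // or null when output i is a computed expression.
  static Optional<List<Integer>> toScanIndices(
      List<Integer> collation, List<Integer> projectRefs, int scanFieldCount) {
    boolean addressesProject =
        collation.stream().allMatch(i -> i >= 0 && i < projectRefs.size());
    if (addressesProject) {
      // Case 1: permute each key through the projection list.
      List<Integer> permuted = new ArrayList<>();
      for (int key : collation) {
        Integer ref = projectRefs.get(key);
        if (ref == null) {
          return Optional.empty(); // computed column: not expressible as a field sort
        }
        permuted.add(ref);
      }
      return Optional.of(permuted);
    }
    // Case 2: indices exceed the project's output but fit the scan schema.
    boolean addressesScan =
        collation.stream().allMatch(i -> i >= 0 && i < scanFieldCount);
    return addressesScan ? Optional.of(List.copyOf(collation)) : Optional.empty();
  }

  public static void main(String[] args) {
    List<Integer> projectRefs = Arrays.asList(3, null, 0);
    System.out.println(toScanIndices(List.of(2, 0), projectRefs, 5));
    System.out.println(toScanIndices(List.of(1), projectRefs, 5));
  }
}
```

Treating a collation as scan-addressed as soon as any key is out of range for the project is a simplification made by this sketch.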
* fix(dedup): resolve stale inputCollation after planner narrows dedup's input
CI revealed that after `PPLSimplifyDedupRule` captures `inputCollation` against
the dedup's original input row type, a later Calcite rule (typically scan
absorbing a narrowing project) can swap in a different input row type without
going through `Dedup.copy()` — so the collation's indices become stale.
Symptom: `IndexOutOfBoundsException` in `PPLDedupConvertRule.collationToOrderKeys`
(`field ordinal [7] out of range; input fields are: [...]`) in NoPushdown mode,
and silently-wrong top_hits sort in pushdown mode.
Fix:
- Reintroduce `Dedup.inputCollationFieldNames` captured at `LogicalDedup.create`
time, strictly as a name-based *fallback anchor* for the "replacement is not a
Project" case (scans don't rename, so names are stable there; rename-through-
Project scenarios are already handled by Calcite's own index propagation).
- `PPLDedupConvertRule.onMatch` now resolves the collation to the current
input's schema: if indices are still valid, use as-is; otherwise look each
sort key up by original name in the current row type.
- `DedupPushdownRule` uses the same two-case resolution (Project permute →
name-based fallback) to produce scan-schema indices for the top_hits sort
hint, dropping the hint cleanly if any key still can't be resolved.
Verified on the integ-test worktree: full `CalciteExplainIT`,
`CalcitePPLDedupIT`, and the entire `CalciteNoPushdownIT` suite all pass, as
do core/opensearch/ppl unit tests.
Signed-off-by: Songkan Tang <songkant@amazon.com>
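The index-then-name resolution described in the fix can be sketched as below. This is an illustrative model only: it uses plain lists instead of `RelCollation` and `RelDataType`, it treats "still valid" as a simple range check, and the names `CollationResolveSketch` and `resolve` are hypothetical rather than taken from `PPLDedupConvertRule`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of resolving a possibly stale collation against the current input.
class CollationResolveSketch {
  // indices: the collation's field ordinals captured at creation time.
  // capturedNames: the field name of each sort key, captured at the same time.
  // currentFields: field names of the dedup's current input row type.
  static Optional<List<Integer>> resolve(
      List<Integer> indices, List<String> capturedNames, List<String> currentFields) {
    boolean inRange =
        indices.stream().allMatch(i -> i >= 0 && i < currentFields.size());
    if (inRange) {
      return Optional.of(List.copyOf(indices)); // indices still usable as-is
    }
    // Fallback: look each sort key up by its original name.
    List<Integer> resolved = new ArrayList<>();
    for (String name : capturedNames) {
      int index = currentFields.indexOf(name);
      if (index < 0) {
        return Optional.empty(); // unresolved key: the caller drops the hint
      }
      resolved.add(index);
    }
    return Optional.of(resolved);
  }

  public static void main(String[] args) {
    // Ordinal 7 is stale after the input was narrowed to two fields.
    System.out.println(resolve(List.of(7), List.of("age"), List.of("gender", "age")));
  }
}
```

Returning an empty `Optional` instead of throwing keeps the unresolved case a clean fallback rather than an `IndexOutOfBoundsException`.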
* test(dedup): extend #3922 fixture to actually reproduce the bug
The original 5-row fixture happened to pass even without the fix because the
single-partition EnumerableWindow path preserved input order by
accident. Adding 5 more rows across a wider category range forces
the row order to get shuffled pre-fix (verified locally: output
reorders to [X,X,Z,A,B,C,D]), so the expected-datarows match now
fails without the fix.
Signed-off-by: Songkan Tang <songkant@amazon.com>
* refactor(dedup): address PR review comments
- PPLDedupConvertRule / DedupPushdownRule: replace fully-qualified
org.apache.calcite.rel.* references with imports.
- PPLHintUtils: import java.util.Objects; throw IllegalStateException on
out-of-range collation index instead of silently skipping — the index
is resolved against scan schema upstream, so mismatch is a bug signal.
- CalciteExplainIT: drop the testExplain override that forced
pushdown-only. Update the corresponding no-pushdown expected plan
(explain_output.yaml under calcite_no_pushdown/) to reflect the
post-dedup Sort + ROW_NUMBER ORDER BY introduced by this PR.
Signed-off-by: Songkan Tang <songkant@amazon.com>
---------
Signed-off-by: Songkan Tang <songkant@amazon.com>
1 parent fc2dd11 commit 419b45a
28 files changed
Lines changed: 948 additions & 252 deletions
File tree
- core/src/main/java/org/opensearch/sql/calcite
- plan
- rel
- rule
- utils
- integ-test/src
- test
- java/org/opensearch/sql/calcite/remote
- resources/expectedOutput
- calcite_no_pushdown
- calcite
- yamlRestTest/resources/rest-api-spec/test/issues
- opensearch/src/main/java/org/opensearch/sql/opensearch
- planner/rules
- request
- ppl/src/test/java/org/opensearch/sql/ppl/calcite
Lines changed: 7 additions & 60 deletions
Lines changed: 34 additions & 6 deletions
Lines changed: 51 additions & 5 deletions