Commit 4c04a4d
authored
Fix transpose VARCHAR length drift in unpivot/pivot lowering (#5479)
* Fix transpose VARCHAR length drift in unpivot/pivot lowering
PPL `transpose` lowers via `RelBuilder.unpivot()` + `pivot()`. The unpivot
synthesizes a VALUES leaf carrying axis literals (the input field names),
e.g. `VALUES('firstname'), ('age'), ('balance')`. Calcite types each
RexLiteral as `CHAR(literalLen)` and types the VALUES column as
`CHAR(maxAxisLiteralLen)` — the longest literal wins on column-level
type inference.
This bites the analytics-engine route end-to-end:
1. After unpivot the `column` axis column is `CHAR(9)` (from "firstname").
Through Calcite's TRIM (`TO_VARYING`) it becomes `VARCHAR(9)`.
2. The `_value_transpose_` value column is built from
`CAST(input_field AS VARCHAR)` — unbounded VARCHAR.
3. `MAX(_value_transpose_)` aggCall is created with declared return type
= unbounded VARCHAR (inferred from arg 0 at call-construction time).
4. The downstream non-prefix groupSet aggregate
(`group=[{1}], MAX($0)`) splits into PARTIAL/FINAL on the analytics
path. PARTIAL hoists group keys to the output prefix, so FINAL's
`argList=[0]` reads the group-key slot — `VARCHAR(9)` — instead of
the agg-state slot. Calcite's `Aggregate.<init>` then runs
`typeMatchesInferred` and rejects the plan: declared `VARCHAR` ≠
inferred `VARCHAR(9)`.
5. Even when the aggregate validation passes, the substrait/Arrow path
sees `FixedChar(maxAxisLiteralLen)` schema vs runtime arrays whose
actual values are shorter (e.g. "age" with length 3) and trips
`Row field type (FixedChar{length=3}) does not match schema field
type (FixedChar{length=9})`.
Two fixes, both in the lowering site:
* Build every axis literal at the same `CHAR(maxAxisLiteralLen)` type.
Calcite then space-pads the shorter literals at value-construction
time, so the runtime CHAR vector and the declared schema both have
the same fixed length. The downstream TRIM strips the padding.
* Wrap the trimmed-axis group key in an explicit
`CAST(... AS VARCHAR)` to unbounded VARCHAR. This makes the group
key type match `_value_transpose_`'s unbounded VARCHAR end-to-end,
so the aggregate's row-type check sees consistent types regardless
of which side the analytics-engine split rule places the group key
on.
These have to live in sql plugin, not in the analytics-engine planner:
the typing decisions are made by Calcite's `RelBuilder.unpivot()`
implementation when it constructs the VALUES leaf — long before any
analytics-engine rule sees the plan. By the time the plan reaches the
analytics-engine route, the precision drift is already baked into the
RelDataType chain. Fixing it downstream would require pattern-matching
on transpose-shaped sub-trees inside the planner, which is fragile and
mis-attributes the root cause. The lowering author owns the type
contract for the operators it emits.
Adds:
- `testTransposeColumnAxisUsesUnboundedVarchar` regression assertion
pinning the output `column` field's type to unbounded VARCHAR. Catches
any future change that re-introduces axis-literal precision into the
group key.
- Updated plan-shape assertions across the existing transpose tests to
reflect the padded axis literals (`'cnt '`, `'COMM '`, etc.) and
the `CAST(TRIM(...) AS VARCHAR)` group key.
Verified end-to-end: `CalciteTransposeCommandIT` 5/5 pass with
`tests.analytics.parquet_indices=true`.
Signed-off-by: Songkan Tang <songkant@amazon.com>
* Update explain_transpose.yaml expected plan for new lowering shape
CalciteExplainIT.testTransposeExplain regenerated against the updated
transpose lowering: axis literals are now padded to a uniform CHAR(N)
(N = max axis literal length, i.e. 14 for 'account_number'), and the
group-key TRIM output is wrapped in a CAST(... AS VARCHAR) to unbounded
VARCHAR. The plan-shape diff exactly mirrors the documented behavior
change in the parent commit:
* `$f20=[TRIM(...)]` → `$f20=[CAST(TRIM(...)):VARCHAR NOT NULL]`
* axis literals e.g. 'firstname' → 'firstname ' (padded)
* LogicalValues row type tuples are correspondingly padded
Verified locally: `./gradlew :integ-test:integTestRemote --tests
"*CalciteExplainIT.testTransposeExplain"` passes.
Signed-off-by: Songkan Tang <songkant@amazon.com>
---------
Signed-off-by: Songkan Tang <songkant@amazon.com>1 parent 050baee commit 4c04a4d
3 files changed
Lines changed: 85 additions & 72 deletions
File tree
- core/src/main/java/org/opensearch/sql/calcite
- integ-test/src/test/resources/expectedOutput/calcite
- ppl/src/test/java/org/opensearch/sql/ppl/calcite
Lines changed: 13 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
915 | 915 | | |
916 | 916 | | |
917 | 917 | | |
| 918 | + | |
| 919 | + | |
| 920 | + | |
918 | 921 | | |
919 | 922 | | |
920 | 923 | | |
| |||
932 | 935 | | |
933 | 936 | | |
934 | 937 | | |
935 | | - | |
| 938 | + | |
| 939 | + | |
936 | 940 | | |
937 | 941 | | |
938 | 942 | | |
939 | 943 | | |
940 | 944 | | |
941 | 945 | | |
942 | | - | |
943 | | - | |
944 | | - | |
945 | | - | |
946 | | - | |
| 946 | + | |
| 947 | + | |
| 948 | + | |
| 949 | + | |
| 950 | + | |
| 951 | + | |
| 952 | + | |
| 953 | + | |
947 | 954 | | |
948 | 955 | | |
949 | 956 | | |
| |||
Lines changed: 7 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
6 | | - | |
| 6 | + | |
7 | 7 | | |
8 | | - | |
| 8 | + | |
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
13 | | - | |
| 13 | + | |
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
22 | | - | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
0 commit comments