Recover ANY-typed column schema on the analytics route (fixes eval max/min)#5557
Merged
ahkcs merged 2 commits intoJun 17, 2026
Merged
Conversation
cdedfca to
4362926
Compare
…oute
The analytics-engine response schema was built purely from the planned
RelDataType, so a column whose planned type is ANY (e.g. the SCALAR_MAX /
SCALAR_MIN UDFs behind PPL eval max()/min(), which declare ANY because they
accept mixed numeric/string operands) was reported as 'undefined'. The
v2/Calcite path already rescues this in OpenSearchExecutionEngine.buildResultSet
by reading the actual runtime ExprValue type.
Mirror that on the analytics path: when a column's converted type is UNDEFINED
and there is a result row, take the concrete type from the first row's value.
The rows are already converted before the schema is built, so no extra work is
needed. Behavior is unchanged on the v2 path.
This fixes eval max()/min() over homogeneous operands on the analytics route
(e.g. max('apple','sam',dog_name) now reports 'string' instead of 'undefined').
Signed-off-by: Kai Huang <ahkcs@amazon.com>
With the ANY-rescue fix, five of CalcitePPLEvalMaxMinFunctionIT's eight tests
now pass on the analytics route. Skip the three that still diverge via the
assumeNotAnalytics(...) registry plus matching excludeTestsMatching entries:
- EVAL_MAX_MIN_MIXED_TYPES: mixed numeric+string operands throw 'Cannot infer
return type for GREATEST' (the DataFusion GREATEST/LEAST can't unify
heterogeneous operand types) — testEvalMax/MinNumericAndString.
- EVAL_MAX_MIN_INT_WIDENING: max() over int operands reports bigint on the
route (DataFusion widens integers to Int64) where v2/Calcite reports int —
testEvalMaxNumeric.
Results (-Dtests.analytics.parquet_indices=true against the analytics route):
CalcitePPLEvalMaxMinFunctionIT: 0/8 -> 5/8 pass, 3 excluded, 0 fail
v2/Calcite route unchanged: 8/8 pass.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
4362926 to
fd5736d
Compare
Swiddis
approved these changes
Jun 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Brings
CalcitePPLEvalMaxMinFunctionITto parity on the analytics-engine route (-Dtests.analytics.parquet_indices=true) by fixing a real schema-type defect on the analytics path, then skipping the two genuinely-unsupported residual cases. The IT passes on the v2/Calcite route; the engine fix is shared, the skips are analytics-route-only.The defect (commit 1)
PPL
eval max(...)/min(...)lower to theSCALAR_MAX/SCALAR_MINUDFs, which declare their return type asANY(they accept mixed numeric/string operands, with OpenSearch's "strings sort above numbers" semantics).ANYmaps toUNDEFINED→ the schema reports the result column as"undefined".The v2/Calcite path already handles this:
OpenSearchExecutionEngine.buildResultSetseesANYand recovers the concrete type from the first row's runtimeExprValue. The analytics path (AnalyticsExecutionEngine.buildSchema) built the schema purely from the plannedRelDataType, with no such recovery — so the column came backundefinedeven though the values were correct.Fix: mirror the Calcite-path behavior in
AnalyticsExecutionEngine.buildSchema— when a column's converted type isUNDEFINEDand a result row exists, take the concrete type from the first row's value. The rows are already converted before the schema is built, so there's no extra work. Behavior on the v2 path is unchanged. This is generic — it fixes anyANY-typed UDF column on the analytics route, not just max/min. Covered by two newAnalyticsExecutionEngineTestcases (recovery from first row; staysUNDEFINEDwhen empty, matching the Calcite fallback).Residual divergences (commit 2)
With the fix, 5 of 8 tests pass on the analytics route. The remaining 3 are genuine route behaviors, skipped via
assumeNotAnalytics(...)+excludeTestsMatching:EVAL_MAX_MIN_MIXED_TYPES(2 tests) — mixed numeric+string operands (max(14, age, 'Fred', holdersName)) throwSqlValidatorException: Cannot infer return type for GREATEST. The UDF is converted to a DataFusionGREATEST/LEASTduring Substrait conversion (in OpenSearch core), which can't unify heterogeneous operand types. Out of scope for this repo.EVAL_MAX_MIN_INT_WIDENING(1 test) —max(1, 3, age)reportsbiginton the route (DataFusion widens integer results to Int64) where v2/Calcite reportsint. Values match; only the integer width differs.Results
Run via
:integ-test:integTestRemoteagainst an analytics:runcluster.CalcitePPLEvalMaxMinFunctionITThe 8 failures before split into 6
undefined-schema (now fixed by the engine change) and 2 mixed-type 500s. After the fix, 5 pass; the 3 skips are 2 mixed-type + 1 int-widening. The v2 route runs all 8 (the engine fix doesn't change v2 types;assumeNotAnalyticsis a no-op off-route; gradle excludes apply only tointegTestRemote).Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.