Skip to content

Recover ANY-typed column schema on the analytics route (fixes eval max/min)#5557

Merged
ahkcs merged 2 commits into
opensearch-project:mainfrom
ahkcs:fix-evalmaxmin-analytics-parity
Jun 17, 2026
Merged

Recover ANY-typed column schema on the analytics route (fixes eval max/min)#5557
ahkcs merged 2 commits into
opensearch-project:mainfrom
ahkcs:fix-evalmaxmin-analytics-parity

Conversation

@ahkcs

@ahkcs ahkcs commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Description

Brings CalcitePPLEvalMaxMinFunctionIT to parity on the analytics-engine route (-Dtests.analytics.parquet_indices=true) by fixing a real schema-type defect on the analytics path, then skipping the two genuinely-unsupported residual cases. The IT passes on the v2/Calcite route; the engine fix is shared, the skips are analytics-route-only.

The defect (commit 1)

PPL eval max(...) / min(...) lower to the SCALAR_MAX / SCALAR_MIN UDFs, which declare their return type as ANY (they accept mixed numeric/string operands, with OpenSearch's "strings sort above numbers" semantics). ANY maps to UNDEFINED → the schema reports the result column as "undefined".

The v2/Calcite path already handles this: OpenSearchExecutionEngine.buildResultSet sees ANY and recovers the concrete type from the first row's runtime ExprValue. The analytics path (AnalyticsExecutionEngine.buildSchema) built the schema purely from the planned RelDataType, with no such recovery — so the column came back undefined even though the values were correct.

Fix: mirror the Calcite-path behavior in AnalyticsExecutionEngine.buildSchema — when a column's converted type is UNDEFINED and a result row exists, take the concrete type from the first row's value. The rows are already converted before the schema is built, so there's no extra work. Behavior on the v2 path is unchanged. This is generic — it fixes any ANY-typed UDF column on the analytics route, not just max/min. Covered by two new AnalyticsExecutionEngineTest cases (recovery from first row; stays UNDEFINED when empty, matching the Calcite fallback).

Residual divergences (commit 2)

With the fix, 5 of 8 tests pass on the analytics route. The remaining 3 are genuine route behaviors, skipped via assumeNotAnalytics(...) + excludeTestsMatching:

  • EVAL_MAX_MIN_MIXED_TYPES (2 tests) — mixed numeric+string operands (max(14, age, 'Fred', holdersName)) throw SqlValidatorException: Cannot infer return type for GREATEST. The UDF is converted to a DataFusion GREATEST/LEAST during Substrait conversion (in OpenSearch core), which can't unify heterogeneous operand types. Out of scope for this repo.
  • EVAL_MAX_MIN_INT_WIDENING (1 test) — max(1, 3, age) reports bigint on the route (DataFusion widens integer results to Int64) where v2/Calcite reports int. Values match; only the integer width differs.

Results

Run via :integ-test:integTestRemote against an analytics :run cluster.

Test class Analytics route — before Analytics route — after v2/Calcite route
CalcitePPLEvalMaxMinFunctionIT 0/8 pass, 8 fail 5/8 pass, 3 excluded, 0 fail 8/8 pass

The 8 failures before split into 6 undefined-schema (now fixed by the engine change) and 2 mixed-type 500s. After the fix, 5 pass; the 3 skips are 2 mixed-type + 1 int-widening. The v2 route runs all 8 (the engine fix doesn't change v2 types; assumeNotAnalytics is a no-op off-route; gradle excludes apply only to integTestRemote).

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ahkcs ahkcs added the infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. label Jun 16, 2026
@ahkcs ahkcs force-pushed the fix-evalmaxmin-analytics-parity branch from cdedfca to 4362926 Compare June 16, 2026 22:04
@ahkcs ahkcs changed the title Skip CalcitePPLEvalMaxMinFunctionIT on the analytics-engine route Recover ANY-typed column schema on the analytics route (fixes eval max/min) Jun 16, 2026
ahkcs added 2 commits June 16, 2026 16:55
…oute

The analytics-engine response schema was built purely from the planned
RelDataType, so a column whose planned type is ANY (e.g. the SCALAR_MAX /
SCALAR_MIN UDFs behind PPL eval max()/min(), which declare ANY because they
accept mixed numeric/string operands) was reported as 'undefined'. The
v2/Calcite path already rescues this in OpenSearchExecutionEngine.buildResultSet
by reading the actual runtime ExprValue type.

Mirror that on the analytics path: when a column's converted type is UNDEFINED
and there is a result row, take the concrete type from the first row's value.
The rows are already converted before the schema is built, so no extra work is
needed. Behavior is unchanged on the v2 path.

This fixes eval max()/min() over homogeneous operands on the analytics route
(e.g. max('apple','sam',dog_name) now reports 'string' instead of 'undefined').

Signed-off-by: Kai Huang <ahkcs@amazon.com>
With the ANY-rescue fix, five of CalcitePPLEvalMaxMinFunctionIT's eight tests
now pass on the analytics route. Skip the three that still diverge via the
assumeNotAnalytics(...) registry plus matching excludeTestsMatching entries:
  - EVAL_MAX_MIN_MIXED_TYPES: mixed numeric+string operands throw 'Cannot infer
    return type for GREATEST' (the DataFusion GREATEST/LEAST can't unify
    heterogeneous operand types) — testEvalMax/MinNumericAndString.
  - EVAL_MAX_MIN_INT_WIDENING: max() over int operands reports bigint on the
    route (DataFusion widens integers to Int64) where v2/Calcite reports int —
    testEvalMaxNumeric.

Results (-Dtests.analytics.parquet_indices=true against the analytics route):
  CalcitePPLEvalMaxMinFunctionIT: 0/8 -> 5/8 pass, 3 excluded, 0 fail
v2/Calcite route unchanged: 8/8 pass.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the fix-evalmaxmin-analytics-parity branch from 4362926 to fd5736d Compare June 16, 2026 23:58
@ahkcs ahkcs merged commit 9ed9b1a into opensearch-project:main Jun 17, 2026
28 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants