Skip to content

Bring CalcitePPLEnhancedCoalesceIT to parity on the analytics-engine route#5552

Merged
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:analytics/coalesce-it-parity
Jun 16, 2026
Merged

Bring CalcitePPLEnhancedCoalesceIT to parity on the analytics-engine route#5552
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:analytics/coalesce-it-parity

Conversation

@ahkcs

@ahkcs ahkcs commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Description

Brings CalcitePPLEnhancedCoalesceIT to parity on the analytics-engine route (:integ-test:integTestRemote -Dtests.analytics.parquet_indices=true): 4 failing → 0. coalesce itself computes correctly on the route (it's wired via STANDARD_PROJECT_OPS); every failure was a row-ordering or capability edge case, not a coalesce bug.

Where a sort makes the test deterministic, the test is fixed in place — not skipped. Only the cases a sort can't make correct on both routes are skipped, behind assumeNotAnalytics(...) + the gradle exclude list (the AnalyticsRouteLimitation registry / pattern from #5546).

Fixed in place with an explicit sort (no skip)

These used head N without a sort. On the analytics route the scan surfaces the raw-PUT docs (added in init()) ahead of the bulk-loaded docs, so head returned a different row set. Adding sort - age makes them deterministic with their original expectations unchanged, because the sort key is unique over the head window (or the tie projects identical values):

Test Fix
testCoalesceNested (head 2) sort - age — top-2 ages (70, 30) unique
testCoalesceEmptyFieldWithFallback (head 1) sort - age — max age (70) unique
testCoalesceWithMultipleNonExistentFields (head 1) sort - age — max age (70) unique
testCoalesceWithNullLiteralAndIntegerField (head 3) sort - age — the age=25 tie projects identical [25, 25], so the expectation holds regardless of tie order

Skipped on the analytics route (a sort can't fix these)

Test(s) Reason
testCoalesceBasic, testCoalesceWithMixedTypes (head 3) The third row needs a nullable tiebreak: at the age=25 tie, the real doc (John) vs. a raw-PUT null-name doc. Null placement under sort diverges between the v2/Calcite and analytics routes, so no sort makes the expectation hold on both.
testCoalesceWithAllNonExistentFields coalesce(field1, field2, field3) where every operand is untyped NULL is rejected by the analytics-engine capability registry: No backend supports scalar function [COALESCE] among [datafusion]. The v2/Calcite path returns an undefined-typed null instead (#5175). Genuine engine-side gap.

Pass rate

Route Before After
analytics-engine (:integTestRemote -Dtests.analytics.parquet_indices=true) 4 failing / 19 0 failing — 16 run, 3 excluded (stable across 3 reruns)
v2 (:integ-test:integTest, fresh cluster) 19 pass 19 pass, 0 skip

Note for reviewers

The skips are genuine route divergences (recorded as AnalyticsRouteLimitation constants): the COALESCE-over-all-null-operands one is an engine capability gap, and the head-without-stable-sort + nullable-tiebreak one is a route ordering difference that a test-side sort can't reconcile. The four sort fixes are pure determinism additions — expectations and the v2 path are unchanged.

Check List

  • New functionality includes testing.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Swiddis Swiddis added the infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. label Jun 15, 2026
…route

Four ordering-sensitive tests failed on the analytics route only because they
used 'head N' without a sort, and the analytics scan surfaces raw-PUT docs
ahead of bulk-loaded docs (non-deterministic row set). Fixed in place by adding
an explicit 'sort - age' where the sort key is unique over the head window (or
the tie projects identical values), so the expectations are unchanged and hold
on both routes:
  - testCoalesceNested
  - testCoalesceEmptyFieldWithFallback
  - testCoalesceWithMultipleNonExistentFields
  - testCoalesceWithNullLiteralAndIntegerField

Three tests are skipped on the analytics route (assumeNotAnalytics + gradle
exclude list, recorded as AnalyticsRouteLimitation constants):
  - testCoalesceBasic, testCoalesceWithMixedTypes: 'head 3' whose third row
    needs a nullable tiebreak (age=25 tie between a real doc and a raw-PUT
    null-name doc); null placement diverges between the v2/Calcite and
    analytics routes, so a sort can't make the expectation hold on both.
  - testCoalesceWithAllNonExistentFields: COALESCE over all-untyped-null
    operands is rejected by the analytics-engine capability registry
    (No backend supports scalar function [COALESCE]); the v2 path returns an
    undefined-typed null (opensearch-project#5175). Genuine engine gap.

The v2/Calcite path is unchanged: all 19 tests run and pass there.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the analytics/coalesce-it-parity branch from 36f14ca to f19eeda Compare June 15, 2026 22:45
@ahkcs ahkcs merged commit 7099773 into opensearch-project:main Jun 16, 2026
28 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants