Bring CalciteBinCommandIT and CalciteMultisearchCommandIT to parity on the analytics-engine route #5551

Merged: ahkcs merged 1 commit into opensearch-project:main from ahkcs:analytics/bin-multisearch-parity-main on Jun 15, 2026

@ahkcs (Collaborator) commented Jun 15, 2026

Description

Brings CalciteBinCommandIT and CalciteMultisearchCommandIT to parity on the analytics-engine route (:integ-test:integTestRemote -Dtests.analytics.parquet_indices=true), taking the two suites from 8 failing tests to 0. The tests that exercise behavior where the route inherently diverges are skipped behind assumeNotAnalytics(...) and added to the gradle exclude list (same pattern as #5546). All skips are no-ops off the analytics route, so the v2 / Calcite path is unchanged.

Each skip reason is recorded as an AnalyticsRouteLimitation constant (the single greppable registry introduced in #5546), and the excluded tests are listed in the analyticsEnabled block of integ-test/build.gradle so the AE-route skip set stays countable in one place.
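The skip pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the plugin's actual code: the enum constants, the `SkippedException` class, the `run` helper, and the `assumeNotAnalytics(boolean, ...)` signature are all hypothetical stand-ins, and a real test would rely on the test framework's own assumption mechanism rather than a hand-rolled exception.

```java
public class AnalyticsSkipSketch {

  /** Hypothetical registry: one greppable constant per documented divergence. */
  enum AnalyticsRouteLimitation {
    BIN_ON_TIME_FIELD("bin on a time field: bucket column typed string, different bucket set"),
    MULTISEARCH_SAME_INDEX("multisearch: same-index subsearches share the first filter"),
    MULTISEARCH_COLUMN_ORDER("multisearch: merged column order differs");

    final String reason;

    AnalyticsRouteLimitation(String reason) {
      this.reason = reason;
    }
  }

  /** Stand-in for a test framework's "assumption violated" signal. */
  static class SkippedException extends RuntimeException {
    SkippedException(String message) {
      super(message);
    }
  }

  /** Skips the calling test only when the analytics route is active; otherwise a no-op. */
  static void assumeNotAnalytics(boolean analyticsEnabled, AnalyticsRouteLimitation limitation) {
    if (analyticsEnabled) {
      throw new SkippedException(limitation.name() + ": " + limitation.reason);
    }
  }

  /** Runs a placeholder test body and reports whether it passed or was skipped. */
  static String run(boolean analyticsEnabled, AnalyticsRouteLimitation limitation) {
    try {
      assumeNotAnalytics(analyticsEnabled, limitation);
      return "pass"; // the real assertions would run here
    } catch (SkippedException e) {
      return "skip";
    }
  }

  public static void main(String[] args) {
    // Off the analytics route the guard does nothing, so the test body runs.
    System.out.println(run(false, AnalyticsRouteLimitation.BIN_ON_TIME_FIELD));
    // On the analytics route the same test is reported as skipped.
    System.out.println(run(true, AnalyticsRouteLimitation.BIN_ON_TIME_FIELD));
  }
}
```

Keeping the reasons in one enum means the skip set can be counted by searching for references to a single type, which is the property the description attributes to the AnalyticsRouteLimitation registry.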

Pass rate

| IT | Route | Before | After |
|---|---|---|---|
| CalciteBinCommandIT | analytics-engine | 4 failing / 46 run | 0 failing: 42 pass, 4 documented skips |
| CalciteBinCommandIT | v2 (:integ-test:integTest, fresh cluster) | 46 pass | 46 pass, 0 skip |
| CalciteMultisearchCommandIT | analytics-engine | 4 failing / 21 run | 0 failing: 17 pass, 4 documented skips |
| CalciteMultisearchCommandIT | v2 (:integ-test:integTest, fresh cluster) | 21 pass | 21 pass, 0 skip |

(Bin "run" counts exclude the 24 pre-existing enabledOnlyWhenPushdownIsEnabled/pushdown skips that are unrelated to this change.)

Documented skips (inherent analytics-route divergences)

| Test(s) | Reason |
|---|---|
| CalciteBinCommandIT: testStatsWithBinsOnTimeField_Count, _Avg, testStatsWithBinsOnTimeAndTermField_Count, _Avg | `bin <timefield> bins=N \| stats … by <timefield>` diverges on the analytics route: the date-histogram bucket column comes back typed string rather than timestamp, and the route produces a different bucket set (different auto-histogram span / empty buckets not filtered), so both the schema type and the row counts differ. A plainly-projected @timestamp (not a histogram bucket) is unaffected on both routes. |
| CalciteMultisearchCommandIT: testMultisearchWithThreeSubsearches, testMultisearchWithComplexAggregation, testMultisearchWithoutFurtherProcessing | Same-index subsearch conflation: when every multisearch subsearch reads the same index, the analytics route applies the first subsearch's filter to all of them (each keeps its own eval label), so later subsearches silently return the first subsearch's rows → wrong counts. E.g. the three-subsearch case returns [22,California],[22,Illinois],[22,Tennessee] (all get IL's count) instead of [17,CA],[22,IL],[25,TN]. |
| CalciteMultisearchCommandIT: testMultisearchWithTimestampInterleaving | Column-order divergence: a multisearch over heterogeneous indices returns the merged columns in a different order than the v2/Calcite path (trailing value/timestamp swapped). Values are correct; only the column order differs. |
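The same-index conflation in the table above can be reproduced with a toy model. The sketch below is not the engine's code: the `Arm` class, the synthetic rows, and the arm order (Illinois first) are assumptions chosen so that the per-state counts match the 17 / 22 / 25 figures quoted above. With `conflate` set, every arm reuses the first arm's filter while keeping its own label, which is the symptom described.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConflationSketch {

  /** One multisearch arm: the state it filters on and the label its eval attaches. */
  static final class Arm {
    final String filterState;
    final String label;

    Arm(String filterState, String label) {
      this.filterState = filterState;
      this.label = label;
    }
  }

  /** Counts rows per arm. When conflate is true, every arm reuses the first arm's filter. */
  static List<String> multisearch(List<String> rows, List<Arm> arms, boolean conflate) {
    List<String> out = new ArrayList<>();
    for (Arm arm : arms) {
      String filter = conflate ? arms.get(0).filterState : arm.filterState;
      long count = rows.stream().filter(filter::equals).count();
      out.add("[" + count + "," + arm.label + "]");
    }
    return out;
  }

  /** Synthetic index: one row per account, holding only its state code. */
  static List<String> sampleRows() {
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < 17; i++) rows.add("CA");
    for (int i = 0; i < 22; i++) rows.add("IL");
    for (int i = 0; i < 25; i++) rows.add("TN");
    return rows;
  }

  /** Assumed arm order: the Illinois subsearch comes first. */
  static List<Arm> sampleArms() {
    return Arrays.asList(
        new Arm("IL", "Illinois"), new Arm("CA", "California"), new Arm("TN", "Tennessee"));
  }

  public static void main(String[] args) {
    // Each arm applies its own filter.
    System.out.println(multisearch(sampleRows(), sampleArms(), false));
    // Every arm applies the first arm's filter but keeps its own label.
    System.out.println(multisearch(sampleRows(), sampleArms(), true));
  }
}
```

Because the labels stay distinct, the conflated result has the right shape and raises no error, which is why the wrong counts are easy to miss.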

Why skip instead of fix these in this PR?

These are skipped (not fixed) here for three reasons:

  1. The defects are engine-side, not SQL-plugin. For every one of these tests the SQL-plugin lowering is correct — the Calcite RelNode reaching the backend is identical regardless of which engine executes it (verified via /_plugins/_ppl/_explain). The divergence is entirely in the analytics-engine execution path (opensearch-project/OpenSearch, analytics-engine sandbox), so the fix belongs in that repo, not this one. This PR is SQL-side test parity.

  2. They are not one-line capability adds — they're scheduled engine work.

    • bin bins=N on a time field: the v2 path leans on OpenSearch's native auto_date_histogram (calendar-nice intervals, timestamp-typed output). The analytics route has no equivalent rewrite, so it runs the generic WIDTH_BUCKET UDF literally (raw equal-width buckets, string-typed label). Reaching parity needs a planner rewrite + a calendar-aware bucketing UDF — a feature, not a flag.
    • multisearch same-index conflation: root-caused to a delegated-predicate bug (it only conflates when arms filter with predicates that delegate to the Lucene backend, e.g. keyword equality; native range/int-equality predicates are correct). The shard-fragment execution model assumes one scan per fragment, so a same-table multi-arm union collapses all arms' delegated predicates onto a single scan. The fix is a deep, multi-leaf-fragment rework (or gating the co-location fast path) in the engine.
  3. The most impactful one is already tracked upstream. The same-index multisearch conflation is already captured by an @AwaitsFix on the QA-side MultisearchCommandIT.testMultisearchThreeBranchesByStr0 in OpenSearch core, so the engine team already owns it. Landing the test skips here keeps the SQL-side suite green and consistent with that tracker rather than duplicating/forking the fix.
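To make the bin bins=N point in reason 2 concrete, the sketch below contrasts raw equal-width bucketing with a calendar-aware span choice. It is a simplified illustration under stated assumptions: `widthBucket` follows the standard SQL WIDTH_BUCKET definition, while `niceSpan` and its short `NICE_SPANS` list are hypothetical and only approximate the idea of picking a round interval that yields at most N buckets. Neither is taken from the plugin or the engine.

```java
public class BinSketch {

  /**
   * SQL-style WIDTH_BUCKET: 1-based index of the equal-width bucket holding value.
   * Values below low map to 0; values at or above high map to buckets + 1.
   */
  static long widthBucket(long value, long low, long high, int buckets) {
    if (value < low) return 0;
    if (value >= high) return buckets + 1L;
    return (value - low) * buckets / (high - low) + 1;
  }

  /** Illustrative "nice" spans in seconds: 1s, 1m, 1h, 3h, 12h, 1d. */
  static final long[] NICE_SPANS = {1, 60, 3600, 3 * 3600, 12 * 3600, 86400};

  /** Smallest nice span whose aligned buckets cover [low, high) in at most maxBuckets buckets. */
  static long niceSpan(long low, long high, int maxBuckets) {
    for (long span : NICE_SPANS) {
      long first = Math.floorDiv(low, span);
      long last = Math.floorDiv(high - 1, span);
      if (last - first + 1 <= maxBuckets) return span;
    }
    return NICE_SPANS[NICE_SPANS.length - 1];
  }

  public static void main(String[] args) {
    long low = 0;
    long high = 10 * 3600; // a 10-hour range, in seconds
    int bins = 4;

    // Raw equal-width buckets are 2.5 hours wide, which is not a calendar-friendly span.
    System.out.println("raw width (s): " + (high - low) / bins);
    System.out.println("bucket of t=10000: " + widthBucket(10000, low, high, bins));

    // The calendar-aware choice rounds up to a 3-hour span instead.
    System.out.println("nice span (s): " + niceSpan(low, high, bins));
  }
}
```

The two approaches generally produce different bucket boundaries and bucket counts for the same input, which is consistent with the schema and row-count differences reported for the skipped bin tests.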

The same-index multisearch conflation in particular produces silently wrong results (no error, just incorrect counts), which makes it the highest-priority follow-up of the three.

Check List

  • New functionality includes testing.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…n the analytics-engine route

Skip the analytics-route divergences in both ITs behind assumeNotAnalytics(...)
and the gradle exclude list, mirroring the CalciteWhereCommandIT pattern.

CalciteBinCommandIT (4 tests): bin on a time field then grouping by it
(bin <timefield> bins=N | stats ... by <timefield>) diverges — the
date-histogram bucket column is typed string (not timestamp) and the route
produces a different bucket set (auto-histogram span / empty-bucket filtering
differ), so both schema and row counts diverge.

CalciteMultisearchCommandIT (4 tests): same-index subsearch conflation (every
subsearch executes the first subsearch's filter, producing wrong counts) on 3
tests, and merged-column-order divergence over heterogeneous indices on 1.

Both are analytics-engine behaviors, recorded as AnalyticsRouteLimitation
constants. The v2/Calcite path is unchanged — all tests still run and pass there.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@Swiddis Swiddis added the infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. label Jun 15, 2026
@ahkcs ahkcs merged commit a50d81a into opensearch-project:main Jun 15, 2026
29 of 36 checks passed
@ahkcs ahkcs deleted the analytics/bin-multisearch-parity-main branch June 16, 2026 19:03