Skip to content

Stabilize subquery PPL ITs on the analytics-engine route#5555

Merged
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:fix-subquery-analytics-accumulation
Jun 16, 2026
Merged

Stabilize subquery PPL ITs on the analytics-engine route#5555
ahkcs merged 1 commit into
opensearch-project:mainfrom
ahkcs:fix-subquery-analytics-accumulation

Conversation

@ahkcs

@ahkcs ahkcs commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Description

Brings CalcitePPLScalarSubqueryIT and CalcitePPLInSubqueryIT to parity on the analytics-engine route (-Dtests.analytics.parquet_indices=true), where every test-created index is composite/parquet-backed and PPL queries route through the analytics engine (DataFusion). Both ITs already pass on the v2/Calcite route; this change is test-only.

Root cause 1 — append-only accumulation (fixes 19 of 23 failures)

Both ITs seed an extra worker doc (_id 7, "Tommy") via an unconditional raw PUT in init(). init() runs as @Before, before every test method, and preserveClusterUponCompletion() keeps indices across methods. On the analytics-engine parquet-backed store a PUT with an existing _id appends a new row instead of replacing, so Tommy accumulated one duplicate per test method and inflated row counts across the suite (e.g. a 7-row expectation returned 20 = 6 bulk docs + 14 duplicate Tommys).

The bulk loadIndex(...) calls don't have this problem because they're already isIndexExist-guarded. Fix: capture isIndexExist(...) before loadIndex creates the index, then seed Tommy only on first creation. End state is identical on the v2/Calcite route.

Root cause 2 — two unsupported behaviors on the route (the remaining 4 failures)

These are genuine analytics-engine behaviors, not test bugs (each verified directly against the route). They're skipped via the assumeNotAnalytics(...) registry (AnalyticsRouteLimitation) introduced in #5551, plus matching excludeTestsMatching entries in integTestRemote so the skip set stays countable in one place. Two new AnalyticsRouteLimitation constants:

  • TEXT_FIELD_EXACT_MATCH — exact equality (= / ==) on an explicitly text-mapped field (no .keyword sub-field) returns no rows on the DataFusion scan. The sibling of the existing DYNAMIC_STRING_NO_KEYWORD for fields mapped text on purpose rather than by dynamic mapping. Verified directly: where department = 'DATA' returns no rows, while like(department, 'DATA') and keyword-field equality both work. Covers testTwoUncorrelatedScalarSubqueriesInOr, testInSubqueryWithTableAlias, testInCorrelatedSubquery (each filters department/occupation with = in the subsearch).
  • SUBSEARCH_MAXOUT_IN_SUBQUERY — the subsearch.maxout cap is lowered as a LIMIT on the right-hand side of the IN-subquery semi-join (LogicalSystemLimit(fetch=N, type=SUBSEARCH_MAXOUT)), which the route does not honor, so the subsearch returns all rows regardless of the cap. Verified: with subsearch.maxout=1 an id in [...] subquery still returns every matching row. Covers testSubsearchMaxOut.

Note on the text-equality root cause: an earlier revision of this PR described it as "text fields can't do exact match", which was too coarse — testSelfInSubquery filters a text field (country = 'USA') inside a self-index subquery and passes. Re-verified directly against the cluster: standalone text = returns no rows for every text field, keyword = works, and like() works; the three skipped tests all hinge on a text = filter in a cross-index subsearch. The gap is the analyzed-text exact-match scan limitation (same family as DYNAMIC_STRING_NO_KEYWORD), now captured precisely in TEXT_FIELD_EXACT_MATCH.

Results

Run via :integ-test:integTestRemote against an analytics :run cluster.

Test class Analytics route — before Analytics route — after v2/Calcite route
CalcitePPLScalarSubqueryIT 2/14 pass, 12 fail 13/13 run, 0 fail (1 excluded) 14/14 pass
CalcitePPLInSubqueryIT 7/18 pass, 11 fail 14/15 run, 0 fail (3 excluded; 1 pre-existing @Ignore) 17/18 pass (1 @Ignore)

On the analytics route the skipped tests are removed by excludeTestsMatching so they don't run; each also carries an in-test assumeNotAnalytics(...) as a safety net. The v2/Calcite route runs all of them (the gradle excludes apply only to integTestRemote, and assumeNotAnalytics is a no-op when the analytics flag is off).

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

CalcitePPLScalarSubqueryIT and CalcitePPLInSubqueryIT both seed an extra
worker doc via an unconditional raw PUT in init(). init() runs as @before
before every test method, and the analytics-engine parquet-backed store is
append-only on same-_id PUT, so the doc accumulated a duplicate per method
and inflated row counts across the suite. Guard the seed on a pre-loadIndex
isIndexExist check so it runs exactly once; behavior is unchanged on the
v2/Calcite route (same end state).

Skip the four tests that exercise behaviors the analytics-engine route does
not support, using the assumeNotAnalytics(...) registry (AnalyticsRouteLimitation)
plus matching excludeTestsMatching entries in integTestRemote so the skip set
stays countable in one place. Two new AnalyticsRouteLimitation constants:
  - TEXT_FIELD_EXACT_MATCH: exact = / == on an explicitly text-mapped field
    (no .keyword sub-field) returns no rows on the DataFusion scan. Sibling of
    DYNAMIC_STRING_NO_KEYWORD for explicitly-mapped (not dynamic) text fields.
    Covers testTwoUncorrelatedScalarSubqueriesInOr, testInSubqueryWithTableAlias,
    testInCorrelatedSubquery (each filters department/occupation = '...').
  - SUBSEARCH_MAXOUT_IN_SUBQUERY: the subsearch.maxout cap is lowered as a LIMIT
    on the in-subquery semi-join's right side, which the route does not honor, so
    the subsearch returns all rows. Covers testSubsearchMaxOut.

Results (-Dtests.analytics.parquet_indices=true against the analytics route):
  CalcitePPLScalarSubqueryIT: 2/14 -> 13/13 run, 0 fail (1 excluded)
  CalcitePPLInSubqueryIT:     7/18 -> 14/15 run, 0 fail (3 excluded; 1 @ignore)
v2/Calcite route unchanged: 14/14 and 17/17 (1 pre-existing @ignore).

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the fix-subquery-analytics-accumulation branch from 8ee2fc4 to c34ffd4 Compare June 16, 2026 19:26
@ahkcs ahkcs merged commit 3829dfa into opensearch-project:main Jun 16, 2026
29 of 33 checks passed
dai-chen added a commit to dai-chen/sql-1 that referenced this pull request Jun 16, 2026
Rename AnalyticsRouteLimitation to Capability and add a declarative
@RequiresCapability annotation enforced by CapabilityRule (JUnit4) on
SQLIntegTestCase, so both SQL and PPL ITs can gate by capability. Replace
all assumeNotAnalytics(...) calls with method-level @RequiresCapability
(including the cases added by opensearch-project#5555 and opensearch-project#5556). Skip stays AE-route-only.

Verified: ./gradlew :integ-test:compileTestJava and :integ-test:spotlessCheck pass.
Signed-off-by: Chen Dai <daichen@amazon.com>
dai-chen added a commit to dai-chen/sql-1 that referenced this pull request Jun 16, 2026
Rename AnalyticsRouteLimitation to Capability and add a declarative
@RequiresCapability annotation enforced by CapabilityRule (JUnit4),
wired into SQLIntegTestCase so SQL and PPL ITs gate by capability.
Gate logic lives in BackendCapabilities (skips on the analytics-engine
route, which supports none of the defined capabilities today). Replace
all assumeNotAnalytics(...) with @RequiresCapability (incl. opensearch-project#5555/opensearch-project#5556).

Verified: ./gradlew :integ-test:compileTestJava and :integ-test:spotlessCheck pass.
Signed-off-by: Chen Dai <daichen@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants