Stabilize subquery PPL ITs on the analytics-engine route#5555
Merged
ahkcs merged 1 commit intoJun 16, 2026
Merged
Conversation
mengweieric
approved these changes
Jun 16, 2026
CalcitePPLScalarSubqueryIT and CalcitePPLInSubqueryIT both seed an extra worker doc via an unconditional raw PUT in init(). init() runs as @before before every test method, and the analytics-engine parquet-backed store is append-only on same-_id PUT, so the doc accumulated a duplicate per method and inflated row counts across the suite. Guard the seed on a pre-loadIndex isIndexExist check so it runs exactly once; behavior is unchanged on the v2/Calcite route (same end state). Skip the four tests that exercise behaviors the analytics-engine route does not support, using the assumeNotAnalytics(...) registry (AnalyticsRouteLimitation) plus matching excludeTestsMatching entries in integTestRemote so the skip set stays countable in one place. Two new AnalyticsRouteLimitation constants: - TEXT_FIELD_EXACT_MATCH: exact = / == on an explicitly text-mapped field (no .keyword sub-field) returns no rows on the DataFusion scan. Sibling of DYNAMIC_STRING_NO_KEYWORD for explicitly-mapped (not dynamic) text fields. Covers testTwoUncorrelatedScalarSubqueriesInOr, testInSubqueryWithTableAlias, testInCorrelatedSubquery (each filters department/occupation = '...'). - SUBSEARCH_MAXOUT_IN_SUBQUERY: the subsearch.maxout cap is lowered as a LIMIT on the in-subquery semi-join's right side, which the route does not honor, so the subsearch returns all rows. Covers testSubsearchMaxOut. Results (-Dtests.analytics.parquet_indices=true against the analytics route): CalcitePPLScalarSubqueryIT: 2/14 -> 13/13 run, 0 fail (1 excluded) CalcitePPLInSubqueryIT: 7/18 -> 14/15 run, 0 fail (3 excluded; 1 @ignore) v2/Calcite route unchanged: 14/14 and 17/17 (1 pre-existing @ignore). Signed-off-by: Kai Huang <ahkcs@amazon.com>
8ee2fc4 to
c34ffd4
Compare
dai-chen
added a commit
to dai-chen/sql-1
that referenced
this pull request
Jun 16, 2026
Rename AnalyticsRouteLimitation to Capability and add a declarative @RequiresCapability annotation enforced by CapabilityRule (JUnit4) on SQLIntegTestCase, so both SQL and PPL ITs can gate by capability. Replace all assumeNotAnalytics(...) calls with method-level @RequiresCapability (including the cases added by opensearch-project#5555 and opensearch-project#5556). Skip stays AE-route-only. Verified: ./gradlew :integ-test:compileTestJava and :integ-test:spotlessCheck pass. Signed-off-by: Chen Dai <daichen@amazon.com>
dai-chen
added a commit
to dai-chen/sql-1
that referenced
this pull request
Jun 16, 2026
Rename AnalyticsRouteLimitation to Capability and add a declarative @RequiresCapability annotation enforced by CapabilityRule (JUnit4), wired into SQLIntegTestCase so SQL and PPL ITs gate by capability. Gate logic lives in BackendCapabilities (skips on the analytics-engine route, which supports none of the defined capabilities today). Replace all assumeNotAnalytics(...) with @RequiresCapability (incl. opensearch-project#5555/opensearch-project#5556). Verified: ./gradlew :integ-test:compileTestJava and :integ-test:spotlessCheck pass. Signed-off-by: Chen Dai <daichen@amazon.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Brings
CalcitePPLScalarSubqueryITandCalcitePPLInSubqueryITto parity on the analytics-engine route (-Dtests.analytics.parquet_indices=true), where every test-created index is composite/parquet-backed and PPL queries route through the analytics engine (DataFusion). Both ITs already pass on the v2/Calcite route; this change is test-only.Root cause 1 — append-only accumulation (fixes 19 of 23 failures)
Both ITs seed an extra
workerdoc (_id 7, "Tommy") via an unconditional rawPUTininit().init()runs as@Before, before every test method, andpreserveClusterUponCompletion()keeps indices across methods. On the analytics-engine parquet-backed store aPUTwith an existing_idappends a new row instead of replacing, so Tommy accumulated one duplicate per test method and inflated row counts across the suite (e.g. a 7-row expectation returned 20 = 6 bulk docs + 14 duplicate Tommys).The bulk
loadIndex(...)calls don't have this problem because they're alreadyisIndexExist-guarded. Fix: captureisIndexExist(...)beforeloadIndexcreates the index, then seed Tommy only on first creation. End state is identical on the v2/Calcite route.Root cause 2 — two unsupported behaviors on the route (the remaining 4 failures)
These are genuine analytics-engine behaviors, not test bugs (each verified directly against the route). They're skipped via the
assumeNotAnalytics(...)registry (AnalyticsRouteLimitation) introduced in #5551, plus matchingexcludeTestsMatchingentries inintegTestRemoteso the skip set stays countable in one place. Two newAnalyticsRouteLimitationconstants:TEXT_FIELD_EXACT_MATCH— exact equality (=/==) on an explicitlytext-mapped field (no.keywordsub-field) returns no rows on the DataFusion scan. The sibling of the existingDYNAMIC_STRING_NO_KEYWORDfor fields mappedtexton purpose rather than by dynamic mapping. Verified directly:where department = 'DATA'returns no rows, whilelike(department, 'DATA')and keyword-field equality both work. CoverstestTwoUncorrelatedScalarSubqueriesInOr,testInSubqueryWithTableAlias,testInCorrelatedSubquery(each filtersdepartment/occupationwith=in the subsearch).SUBSEARCH_MAXOUT_IN_SUBQUERY— thesubsearch.maxoutcap is lowered as aLIMITon the right-hand side of the IN-subquery semi-join (LogicalSystemLimit(fetch=N, type=SUBSEARCH_MAXOUT)), which the route does not honor, so the subsearch returns all rows regardless of the cap. Verified: withsubsearch.maxout=1anid in [...]subquery still returns every matching row. CoverstestSubsearchMaxOut.Results
Run via
:integ-test:integTestRemoteagainst an analytics:runcluster.CalcitePPLScalarSubqueryITCalcitePPLInSubqueryIT@Ignore)@Ignore)On the analytics route the skipped tests are removed by
excludeTestsMatchingso they don't run; each also carries an in-testassumeNotAnalytics(...)as a safety net. The v2/Calcite route runs all of them (the gradle excludes apply only tointegTestRemote, andassumeNotAnalyticsis a no-op when the analytics flag is off).Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.