Stabilize CalcitePPLConditionBuiltinFunctionIT on the analytics-engine route #5556
Merged
ahkcs merged 1 commit on Jun 16, 2026
31593a4 to 9ccd9d8
…e route

init() seeds two extra docs into state_country_with_null via unconditional raw PUTs. init() runs as @Before before every test method, and the analytics-engine parquet-backed store is append-only on same-_id PUT, so the docs accumulated a duplicate per method and inflated row counts across the suite. Guard the seed on a pre-loadIndex isIndexExist check so it runs exactly once; behavior is unchanged on the v2/Calcite route (same end state).

Skip the six tests that exercise behaviors the analytics-engine route does not support, using the assumeNotAnalytics(...) registry (AnalyticsRouteLimitation) plus matching excludeTestsMatching entries in integTestRemote so the skip set stays countable in one place. NESTED_FIELDS is reused; three new constants:

- STRUCT_PARENT_FIELD: querying an object/struct parent field directly (isnull/isnotnull(aws)) resolves to FIELD_NOT_FOUND. The route flattens objects to dotted leaf columns and the parent is not a queryable column.
- CONCAT_NULL_AS_EMPTY: concat('H', null) = 'H' on the route (DataFusion NULL-as-empty) vs null on v2/Calcite (NULL-propagating).
- EARLIEST_LATEST_NOW_CLOCK: earliest('now', utc_timestamp()) is true on the route (same instant) but false on v2 (clock-source divergence).

The two nested tests reuse NESTED_FIELDS (nested fields are stripped at index creation on this route, opensearch-project#5541).

Results (-Dtests.analytics.parquet_indices=true against the analytics route):
CalcitePPLConditionBuiltinFunctionIT: 6/24 -> 18/18 run, 0 fail (6 excluded)
v2/Calcite route unchanged: 24/24 pass.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
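The accumulation and the guard described in this commit message can be sketched with a small in-memory model. This is only an illustration under stated assumptions: SeedGuardSketch, its put/loadIndex/init methods, and the base load of 6 rows are hypothetical stand-ins, not the real test infrastructure or the analytics-engine API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory model of an append-only store; not the real analytics-engine API. */
final class SeedGuardSketch {
    private final Set<String> indices = new HashSet<>();
    private final List<String> rows = new ArrayList<>();

    boolean isIndexExist(String index) {
        return indices.contains(index);
    }

    // Guarded bulk load: creates the index and loads rows only if it is missing.
    // The base row count of 6 is an assumption made for this sketch.
    void loadIndex(String index) {
        if (indices.add(index)) {
            for (int id = 1; id <= 6; id++) {
                rows.add(index + "/" + id);
            }
        }
    }

    // Append-only PUT: a repeated _id adds another row instead of replacing one.
    void put(String index, String id) {
        rows.add(index + "/" + id);
    }

    // Models init() running before every test method.
    void init(boolean guarded) {
        String index = "state_country_with_null";
        boolean existed = isIndexExist(index); // captured BEFORE loadIndex creates the index
        loadIndex(index);
        if (!guarded || !existed) {
            put(index, "7");
            put(index, "8");
        }
    }

    static int rowsAfter(int testMethods, boolean guarded) {
        SeedGuardSketch store = new SeedGuardSketch();
        for (int i = 0; i < testMethods; i++) {
            store.init(guarded);
        }
        return store.rows.size();
    }

    public static void main(String[] args) {
        System.out.println(rowsAfter(3, false)); // 12: two extra rows per method
        System.out.println(rowsAfter(3, true));  // 8: seeded exactly once
    }
}
```

Capturing the existence flag before the load matters in this model: checking after loadIndex would always report that the index exists, and the seed would never run.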
9ccd9d8 to db62766
Swiddis approved these changes on Jun 16, 2026
dai-chen added a commit to dai-chen/sql-1 that referenced this pull request on Jun 16, 2026
Rename AnalyticsRouteLimitation to Capability and add a declarative @RequiresCapability annotation enforced by CapabilityRule (JUnit4) on SQLIntegTestCase, so both SQL and PPL ITs can gate by capability. Replace all assumeNotAnalytics(...) calls with method-level @RequiresCapability (including the cases added by opensearch-project#5555 and opensearch-project#5556). Skip stays AE-route-only. Verified: ./gradlew :integ-test:compileTestJava and :integ-test:spotlessCheck pass. Signed-off-by: Chen Dai <daichen@amazon.com>
dai-chen added a commit to dai-chen/sql-1 that referenced this pull request on Jun 16, 2026
Rename AnalyticsRouteLimitation to Capability and add a declarative @RequiresCapability annotation enforced by CapabilityRule (JUnit4), wired into SQLIntegTestCase so SQL and PPL ITs gate by capability. Gate logic lives in BackendCapabilities (skips on the analytics-engine route, which supports none of the defined capabilities today). Replace all assumeNotAnalytics(...) with @RequiresCapability (incl. opensearch-project#5555/opensearch-project#5556). Verified: ./gradlew :integ-test:compileTestJava and :integ-test:spotlessCheck pass. Signed-off-by: Chen Dai <daichen@amazon.com>
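A rough sketch of how a declarative capability gate of this kind could be evaluated, using only the JDK. The referenced commit enforces the annotation through a JUnit4 rule (CapabilityRule); here a plain reflection helper stands in for that rule. The shouldRun helper, the SampleIT class, and the analyticsRoute()/v2Route() helpers are hypothetical, and the enum constants simply reuse names from this PR; the actual Capability enum may look different.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical, JDK-only model of a capability gate; not the project's CapabilityRule. */
final class CapabilityGateSketch {

    // Constant names borrowed from this PR purely for illustration.
    enum Capability { STRUCT_PARENT_FIELD, NESTED_FIELDS, CONCAT_NULL_AS_EMPTY, EARLIEST_LATEST_NOW_CLOCK }

    @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at run time
    @Target(ElementType.METHOD)
    @interface RequiresCapability {
        Capability[] value();
    }

    // Stand-in for a test class with one gated and one ungated method.
    static class SampleIT {
        @RequiresCapability(Capability.NESTED_FIELDS)
        public void testIsNullWithNested() {}

        public void testIsNull() {}
    }

    // Per the commit message, the analytics-engine route supports none of the capabilities today.
    static Set<Capability> analyticsRoute() {
        return EnumSet.noneOf(Capability.class);
    }

    // Assumption for this sketch: the v2/Calcite route supports all of them.
    static Set<Capability> v2Route() {
        return EnumSet.allOf(Capability.class);
    }

    // A method runs only if every capability it requires is supported by the backend.
    static boolean shouldRun(Method method, Set<Capability> supported) {
        RequiresCapability required = method.getAnnotation(RequiresCapability.class);
        if (required == null) {
            return true; // ungated tests always run
        }
        for (Capability capability : required.value()) {
            if (!supported.contains(capability)) {
                return false;
            }
        }
        return true;
    }

    static boolean shouldRun(String methodName, Set<Capability> supported) {
        try {
            return shouldRun(SampleIT.class.getDeclaredMethod(methodName), supported);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldRun("testIsNullWithNested", analyticsRoute())); // false
        System.out.println(shouldRun("testIsNullWithNested", v2Route()));        // true
        System.out.println(shouldRun("testIsNull", analyticsRoute()));           // true
    }
}
```

In a JUnit4 rule the false branch would typically translate into a skipped test rather than a failure, which matches the "skip stays AE-route-only" intent stated in the commit message.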
Description
Brings CalcitePPLConditionBuiltinFunctionIT to parity on the analytics-engine route (-Dtests.analytics.parquet_indices=true), where every test-created index is composite/parquet-backed and PPL queries route through the analytics engine (DataFusion). The IT already passes on the v2/Calcite route; this change is test-only.

Root cause 1: append-only accumulation (fixes 12 of 18 failures)

init() seeds two extra docs (_id 7, _id 8) into state_country_with_null via unconditional raw PUTs. init() runs as @Before, before every test method, and preserveClusterUponCompletion() keeps indices across methods. On the analytics-engine parquet-backed store a PUT with an existing _id appends a new row instead of replacing, so the two docs accumulated one duplicate per method and inflated row counts across the suite (the "was" counts grow monotonically: 13, 16, 25, 26, 32, 36, 42, 45, 47, 50, 52). The bulk loadIndex(...) calls don't have this problem because they're already isIndexExist-guarded.

Fix: capture isIndexExist(...) before loadIndex creates the index, then seed the extra docs only on first creation. End state is identical on the v2/Calcite route.

Root cause 2: four unsupported behaviors on the route (the remaining 6 failures)

These are genuine analytics-engine behaviors, not test bugs (each verified directly against the route). They're skipped via the assumeNotAnalytics(...) registry (AnalyticsRouteLimitation) introduced in #5551, plus matching excludeTestsMatching entries in integTestRemote so the skip set stays countable in one place. NESTED_FIELDS is reused; three new AnalyticsRouteLimitation constants:

- STRUCT_PARENT_FIELD: querying an object/struct parent field directly (isnull(aws) / isnotnull(aws) on big5.aws) resolves to FIELD_NOT_FOUND. The route flattens objects into dotted leaf columns (aws.cloudwatch.log_group scans fine) but the struct parent is not a queryable column. Distinct from NESTED_FIELDS: object parents survive in the OpenSearch mapping but still can't be referenced as a whole. (testIsNullWithStruct, testIsNotNullWithStruct)
- NESTED_FIELDS (reused): nested_simple.address is type nested, which the route cannot store, so the test infra strips it at index creation ([analytics-engine] Strip AE-unsupported fields from test data and exclude the ITs doomed by it #5541) and the field resolves to FIELD_NOT_FOUND. (testIsNullWithNested, testIsNotNullWithNested)
- CONCAT_NULL_AS_EMPTY: DataFusion concat treats NULL as an empty string (concat('H', null) = 'H'), whereas v2/Calcite propagates NULL (= null), so the null-name row diverges. (testNullIfWithExpression)
- EARLIEST_LATEST_NOW_CLOCK: earliest('now', utc_timestamp()): the relative-time 'now' and utc_timestamp() resolve to the same instant on the route (now >= now → true), but differ on v2/Calcite (false), a clock-source divergence. (testEarliestWithEval)

Results

Run via :integ-test:integTestRemote against an analytics :run cluster.

CalcitePPLConditionBuiltinFunctionIT

On the analytics route the skipped tests are removed by excludeTestsMatching so they don't run; each also carries an in-test assumeNotAnalytics(...) as a safety net. The v2/Calcite route runs all 24 (the gradle excludes apply only to integTestRemote, and assumeNotAnalytics is a no-op when the analytics flag is off).

Check List

--signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
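The CONCAT_NULL_AS_EMPTY divergence listed under root cause 2 comes down to two different NULL conventions for string concatenation. A minimal sketch of both in plain Java follows; ConcatNullSketch and its two methods are hypothetical names used only to model the described behaviors, not DataFusion or Calcite code.

```java
/** Hypothetical model of the two concat NULL conventions described in this PR. */
final class ConcatNullSketch {

    // NULL-propagating: any null argument makes the whole result null
    // (the behavior the description attributes to v2/Calcite).
    static String concatPropagating(String... parts) {
        StringBuilder out = new StringBuilder();
        for (String part : parts) {
            if (part == null) {
                return null;
            }
            out.append(part);
        }
        return out.toString();
    }

    // NULL-as-empty: null arguments are skipped
    // (the behavior the description attributes to DataFusion concat).
    static String concatNullAsEmpty(String... parts) {
        StringBuilder out = new StringBuilder();
        for (String part : parts) {
            if (part != null) {
                out.append(part);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatPropagating("H", null)); // null
        System.out.println(concatNullAsEmpty("H", null)); // H
    }
}
```

A test whose expected rows include a null produced by concat would therefore see a non-null value under the second convention, which is the row-level divergence the skip accounts for.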