Stabilize PPL ITs on the analytics-engine route (case/string/full-text/like/appendpipe/datatype)#5561

Open
ahkcs wants to merge 1 commit into opensearch-project:main from ahkcs:fix-casefunction-analytics-parity

@ahkcs ahkcs commented Jun 17, 2026

Collaborator

Description

Analytics-engine route (-Dtests.analytics.parquet_indices=true) parity pass across several PPL IT classes. All changes are test-only; the v2/Calcite route is unchanged (every gated test still runs there — the assumeNotAnalytics(...) guards are no-ops off-route and the gradle excludes apply only to integTestRemote).
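
A minimal sketch of how a route guard of this kind can be a no-op off-route. Only the system property name comes from the description above; `RouteGuardSketch`, `SkippedException`, and the single-string signature are hypothetical stand-ins, and the real `assumeNotAnalytics(...)` helper may look different.

```java
// Hypothetical sketch; not the helper used in this PR.
final class RouteGuardSketch {
    /** Stand-in for a test framework's "assumption failed, skip this test" signal. */
    static final class SkippedException extends RuntimeException {
        SkippedException(String reason) {
            super(reason);
        }
    }

    /** True only when the JVM runs with -Dtests.analytics.parquet_indices=true. */
    static boolean onAnalyticsRoute() {
        return Boolean.getBoolean("tests.analytics.parquet_indices");
    }

    /** Does nothing off-route; skips the calling test on the analytics-engine route. */
    static void assumeNotAnalytics(String limitation) {
        if (onAnalyticsRoute()) {
            throw new SkippedException("skipped on analytics route: " + limitation);
        }
    }

    public static void main(String[] args) {
        try {
            assumeNotAnalytics("DOC_MUTATION");
            System.out.println("test body runs");
        } catch (SkippedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the property set, the guard returns immediately and the test body runs, which is why the v2/Calcite route would be unaffected under this model.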

CalcitePPLCaseFunctionIT (3/9 → 8/9 pass, 1 excluded)

  • Accumulation: appendDataForBadResponse() raw-PUTs 4 weblogs docs unconditionally in init(); the append-only AE store can't replace on same-_id PUT, so they accumulated per method. Guarded on a pre-loadIndex isIndexExist check.
  • otel_logs multi-value reject: the dataset has a multi-value attributes.email.invalid_recipients array the parquet store rejects (Cannot accept multiple values for field ... of type keyword), aborting init(). Only testNestedCaseAggWithAutoDateHistogram uses it, so the load is skipped on the AE route.
  • BIN_TIME_FIELD_BUCKETING (existing): bin @timestamp | stats by @timestamp returns the bucket column typed string not timestamp. Skips testNestedCaseAggWithAutoDateHistogram.
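
The accumulation and its guard can be modelled with a toy store. This is a sketch under stated assumptions: `AppendOnlyStore` and its methods are hypothetical, the document ids are made up, and it assumes the guard seeds only when the index did not exist before the load step.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class GuardedSeedingSketch {
    /** Toy append-only store: a PUT with an existing _id adds a row instead of replacing it. */
    static final class AppendOnlyStore {
        private final Set<String> indices = new HashSet<>();
        private final List<String> rows = new ArrayList<>();

        boolean indexExists(String index) {
            return indices.contains(index);
        }

        void createIndex(String index) {
            indices.add(index);
        }

        void put(String index, String id, String doc) {
            rows.add(index + "/" + id + ":" + doc);
        }

        int count() {
            return rows.size();
        }
    }

    /** Models init(), which runs once per test method. */
    static void init(AppendOnlyStore store, boolean guarded) {
        // Check existence BEFORE loading, because loading creates the index.
        boolean existedBeforeLoad = store.indexExists("weblogs");
        if (!existedBeforeLoad) {
            store.createIndex("weblogs"); // stands in for loadIndex
        }
        if (!guarded || !existedBeforeLoad) {
            for (int id = 1; id <= 4; id++) {
                store.put("weblogs", "bad-" + id, "{}"); // the 4 seeded docs
            }
        }
    }

    /** Row count after init() has run for the given number of test methods. */
    static int rowsAfter(int testMethods, boolean guarded) {
        AppendOnlyStore store = new AppendOnlyStore();
        for (int i = 0; i < testMethods; i++) {
            init(store, guarded);
        }
        return store.count();
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + rowsAfter(3, false)); // 12
        System.out.println("guarded:   " + rowsAfter(3, true));  // 4
    }
}
```

The existence check has to precede the load step; checking afterwards would always report the index as present and the seed docs would never be written.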

CalcitePPLStringBuiltinFunctionIT (7 tests skipped — DOC_MUTATION)

testConcatWithField/ConcatWs/Reverse/Right/Trim/RTrim/LTrim re-PUT a shared _id (5/6/7) with different data, relying on PUT-replace. On the AE store same-_id PUT appends and DELETE is unsupported, so these accumulate and cross-contaminate (counts are order-dependent). Skipped via the existing DOC_MUTATION limitation.

Full-text relevance functions (3 tests skipped — new FULLTEXT_RELEVANCE_FUNC)

MultiMatchIT.test_wildcard_multi_match, QueryStringIT.wildcard_test, SimpleQueryStringIT.test_wildcard_simple_query_string use multi_match / query_string / simple_query_string — Lucene relevance functions with no DataFusion equivalent (they return no rows).

CalciteLikeQueryIT.test_the_default_3rd_option (skipped — new LIKE_CASE_SENSITIVITY)

The v3 branch expects case-sensitive LIKE (0 rows for 'test Wildcard%' vs lowercase data); the AE route's LIKE is case-insensitive (DataFusion) and returns 7.
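
To make the mismatch concrete, here is a small sketch that evaluates a LIKE pattern both ways. It is not how Calcite or DataFusion implement LIKE; it translates `%` and `_` to a regular expression, ignores escape characters, and the sample value is a hypothetical lowercase row.

```java
import java.util.regex.Pattern;

final class LikeSensitivitySketch {
    /** Translates a SQL LIKE pattern to a regex: % matches any run, _ matches one character. */
    static Pattern likeToRegex(String like, boolean caseSensitive) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        int flags = Pattern.DOTALL | (caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
        return Pattern.compile(regex.toString(), flags);
    }

    static boolean like(String value, String pattern, boolean caseSensitive) {
        return likeToRegex(pattern, caseSensitive).matcher(value).matches();
    }

    public static void main(String[] args) {
        String value = "test wildcard sample"; // hypothetical lowercase data
        System.out.println(like(value, "test Wildcard%", true));  // false
        System.out.println(like(value, "test Wildcard%", false)); // true
    }
}
```

The same pattern and data yield no match under case-sensitive rules and a match under case-insensitive rules, which is the shape of the 0-row versus 7-row difference described above.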

CalcitePPLAppendPipeCommandIT.testDoubleAppendPipeWithFilter (skipped — new APPENDPIPE_MAIN_RESULT_DROPPED)

appendpipe [subpipe] drops the main pipeline's rows on the AE route: the subpipe's filter is applied to the main result instead of being appended, so the originals are lost (verified: stats ... | appendpipe [where gender='F'] returns only the F rows, not the originals plus the filtered copy).
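
The difference between the expected and the reported result can be sketched over plain lists. The row strings are hypothetical, and `reportedOnAeRoute` only models the behaviour described above; it is not taken from the engine's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

final class AppendPipeSketch {
    /** Models the subpipe [where gender='F'] over rows encoded as strings. */
    static final UnaryOperator<List<String>> WHERE_F =
        rows -> rows.stream().filter(r -> r.equals("gender=F")).collect(Collectors.toList());

    /** Expected semantics: the main rows, followed by the subpipe applied to those rows. */
    static List<String> appendPipe(List<String> main, UnaryOperator<List<String>> subpipe) {
        List<String> out = new ArrayList<>(main);
        out.addAll(subpipe.apply(main));
        return out;
    }

    /** Behaviour reported on the AE route: the subpipe's output replaces the main result. */
    static List<String> reportedOnAeRoute(List<String> main, UnaryOperator<List<String>> subpipe) {
        return subpipe.apply(main);
    }

    public static void main(String[] args) {
        List<String> main = List.of("gender=F", "gender=M");
        System.out.println(appendPipe(main, WHERE_F));        // [gender=F, gender=M, gender=F]
        System.out.println(reportedOnAeRoute(main, WHERE_F)); // [gender=F]
    }
}
```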

DataTypeIT exclude globs (coverage fix)

test_nonnumeric_data_types, test_alias_data_type, and SystemFunctionIT.typeof_opensearch_types were excluded with org.opensearch.sql.ppl.* globs that did not match the Calcite subclasses. Broadened to * so CalciteDataTypeIT / CalciteSystemFunctionIT are covered too.
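
A rough illustration of why a package-scoped glob misses subclasses that live elsewhere. This uses a plain `*` wildcard matcher, not Gradle's actual test-filter logic, and both patterns as well as the `org.opensearch.sql.calcite.remote` package name are assumptions made for the example.

```java
import java.util.regex.Pattern;

final class ExcludeGlobSketch {
    /** Plain wildcard match: '*' matches any run of characters, everything else is literal. */
    static boolean globMatches(String glob, String name) {
        String[] parts = glob.split("\\*", -1); // -1 keeps empty leading/trailing parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    public static void main(String[] args) {
        // Hypothetical patterns and class locations, for illustration only.
        String narrow = "org.opensearch.sql.ppl.*.test_nonnumeric_data_types";
        String broad = "*.test_nonnumeric_data_types";
        String base = "org.opensearch.sql.ppl.DataTypeIT.test_nonnumeric_data_types";
        String calcite = "org.opensearch.sql.calcite.remote.CalciteDataTypeIT.test_nonnumeric_data_types";

        System.out.println(globMatches(narrow, base));    // true
        System.out.println(globMatches(narrow, calcite)); // false
        System.out.println(globMatches(broad, calcite));  // true
    }
}
```

Under this model, a pattern anchored to one package excludes only the base class, so the inherited test still runs (and fails) in the subclass until the prefix is widened.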

Results (analytics route, per-listed-test)

| Test | Before | After |
| --- | --- | --- |
| CalcitePPLCaseFunctionIT.testCaseWhen{InFilter,InSubquery,WithCast,WithIn} | fail (accumulation) | pass |
| CalcitePPLStringBuiltinFunctionIT.test{LTrim,Reverse,Trim} | fail (accumulation) | excluded (DOC_MUTATION) |
| CalciteMultiMatchIT.test_wildcard_multi_match | fail | excluded |
| CalciteQueryStringIT.wildcard_test | fail | excluded |
| CalciteSimpleQueryStringIT.test_wildcard_simple_query_string | fail | excluded |
| CalciteLikeQueryIT.test_the_default_3rd_option | fail | excluded |
| CalcitePPLAppendPipeCommandIT.testDoubleAppendPipeWithFilter | fail | excluded |
| CalciteDataTypeIT.test_nonnumeric_data_types | fail | excluded (glob fix) |

v2/Calcite route: all edited classes 100% pass, 0 skips.

Not included (deferred; investigated but not cleanly fixable here): CalciteChartCommandIT has 5 analytics-route failures that need their own multi-bucket triage (partly the known otel_logs ip-scan defect). NfwPplDashboardIT.testTopLongLivedTCPFlows returns 0 rows via the IT harness, but the identical query returns the correct 2 rows in a direct probe. Its cause isn't reproducible or classifiable from outside the harness, so it is left for separate investigation rather than shipping a mislabeled skip.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ahkcs ahkcs force-pushed the fix-casefunction-analytics-parity branch from cd45dd1 to ad0121c Compare June 17, 2026 18:06
@ahkcs ahkcs added the infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. label Jun 17, 2026
@ahkcs ahkcs changed the title Stabilize CalcitePPLCaseFunctionIT on the analytics-engine route Stabilize PPL ITs on the analytics-engine route (case/string/full-text/like/appendpipe/datatype) Jun 17, 2026
@ahkcs ahkcs force-pushed the fix-casefunction-analytics-parity branch from aa67346 to d352b23 Compare June 17, 2026 22:07
…t/like/appendpipe)

Analytics-engine route parity for several PPL IT classes; test-only. Uses the
@RequiresCapability annotation + Capability registry (opensearch-project#5560) plus matching
excludeTestsMatching entries.

CalcitePPLCaseFunctionIT:
  - Guard the weblogs raw-PUT seeding (appendDataForBadResponse) on a pre-load
    isIndexExist check — the append-only AE store inflated counts per method.
  - Skip the otel_logs load on the AE route (multi-value keyword the parquet
    store rejects); only testNestedCaseAggWithAutoDateHistogram uses it, and
    that test requires BIN_TIME_FIELD_BUCKETING (bucket column typed string).

CalcitePPLStringBuiltinFunctionIT: 7 tests re-PUT a shared _id with different
data; the append-only AE store can't replace docs (DELETE unsupported) ->
DOC_MUTATION.

MultiMatchIT / QueryStringIT / SimpleQueryStringIT wildcard tests: full-text
relevance functions with no DataFusion equivalent -> new FULLTEXT_RELEVANCE_FUNC.

CalciteLikeQueryIT.test_the_default_3rd_option: AE LIKE is case-insensitive but
v2/Calcite is case-sensitive -> new LIKE_CASE_SENSITIVITY.

CalcitePPLAppendPipeCommandIT.testDoubleAppendPipeWithFilter: appendpipe drops
the main pipeline's rows on the AE route -> new APPENDPIPE_MAIN_RESULT_DROPPED.

v2/Calcite route unchanged (all run, 0 skips).

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the fix-casefunction-analytics-parity branch from d352b23 to 7bf75e0 Compare June 17, 2026 22:26