Skip to content

Exclude full-text search-filter operator tests on the analytics-engine route#5527

Merged
penghuo merged 1 commit into
opensearch-project:mainfrom
ahkcs:fix-operatorit-analytics-route
Jun 9, 2026
Merged

Exclude full-text search-filter operator tests on the analytics-engine route#5527
penghuo merged 1 commit into
opensearch-project:mainfrom
ahkcs:fix-operatorit-analytics-route

Conversation

@ahkcs

@ahkcs ahkcs commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Description

The 7 comparison-operator tests in OperatorIT / CalciteOperatorIT (testEqualOperator, testNotEqualOperator, testLessOperator, testLteOperator, testGreaterOperator, testGteOperator, testNotOperator) fail on the analytics-engine route (-Dtests.analytics.parquet_indices=true).

Root cause. They use the implicit search-filter syntax (source=idx age = 32), which the PPL search command lowers to a Lucene query_string predicate:

LogicalFilter(condition=[query_string(MAP('query', 'age:32'))])

Executing that on the analytics route fails with:

java.lang.NullPointerException: Cannot invoke
"org.opensearch.be.lucene.LuceneReader.searcher(...)" because "luceneReader" is null
  at io.substrait.isthmus.SubstraitRelVisitor.fromAggCall
  at org.opensearch.be.datafusion.DataFusionFragmentConvertor.convertToSubstrait

query_string is full-text search — it needs an inverted-index LuceneReader/searcher. A parquet-backed analytics index (composite.primary_data_format=parquet, no Lucene secondary) has no Lucene reader, and DataFusion is a columnar engine, not a full-text engine. So full-text search is genuinely unsupported on the analytics route, not a bug.

Fix. Exclude these tests on the analytics route, gated on tests.analytics.parquet_indices, following the existing integ-test/build.gradle exclusion pattern — recording that full-text search isn't supported there rather than changing what the tests assert. The v2 / Calcite (Lucene-backed) path is unaffected; the tests run and pass there.

Testing

CalciteOperatorIT (21 tests):

route search-filter operator tests rest of class
analytics-engine (parquet) excluded (7) ✅ 14/14 pass
v2 / Calcite ✅ run & pass

Check List

  • New functionality includes testing.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Verify duplicate expected row

The expected rows for the not-equal operator test include duplicate rows(36). Verify
this is intentional based on the test data. If the test index contains only one
record with age 36, remove the duplicate to prevent test failures.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteOperatorIT.java [55-61]

 @Override
 public void testNotEqualOperator() throws IOException {
   if (isAnalyticsParquetIndicesEnabled()) {
-    assertViaWhere("age != 32", rows(28), rows(33), rows(34), rows(36), rows(36), rows(39));
+    assertViaWhere("age != 32", rows(28), rows(33), rows(34), rows(36), rows(39));
   } else {
     super.testNotEqualOperator();
   }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential issue with duplicate rows(36) in the expected results. This could indicate either intentional test data (two records with age 36) or an error. Since this is asking for verification rather than asserting a definite bug, and the duplicate could be legitimate based on the test data, it receives a moderate score.

Medium

@ahkcs ahkcs changed the title Exercise CalciteOperatorIT comparison operators via where on the analytics-engine route Exclude full-text search-filter operator tests on the analytics-engine route Jun 8, 2026
@ahkcs ahkcs force-pushed the fix-operatorit-analytics-route branch 2 times, most recently from 1e0d113 to 0f9fa3f Compare June 8, 2026 23:00
…e route

The 7 comparison-operator tests in OperatorIT / CalciteOperatorIT use the search-filter
syntax (source=idx age = 32), which lowers to a Lucene query_string predicate. Executing
it on the analytics-engine route fails with a null LuceneReader: query_string needs an
inverted-index searcher, which parquet-backed analytics indices don't have (DataFusion is
not a full-text engine). Full-text search is genuinely unsupported there.

Exclude these tests only when the analytics route is actually enabled
(Boolean.parseBoolean(tests.analytics.parquet_indices), matching
AnalyticsIndexConfig.isEnabled) so the v2 path still runs them, following the established
build.gradle exclusion pattern.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the fix-operatorit-analytics-route branch from 0f9fa3f to 56a5c96 Compare June 8, 2026 23:13
@dai-chen dai-chen added the infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc. label Jun 9, 2026
@penghuo penghuo merged commit ab58c4d into opensearch-project:main Jun 9, 2026
39 of 42 checks passed
ahkcs added a commit to ahkcs/sql that referenced this pull request Jun 15, 2026
Add the 9 analytics-route skips from this PR to the analyticsEnabled
excludeTestsMatching block in integ-test/build.gradle, following the
opensearch-project#5527 / opensearch-project#5541 precedent, so the full set of tests skipped on the
analytics-engine route is countable in one place.

These complement the in-test assumeNotAnalytics(...) calls (which record
the per-test reason via AnalyticsRouteLimitation); the gradle list is the
single tracking surface for how many tests the route skips.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit that referenced this pull request Jun 15, 2026
…5546)

* Bring CalciteWhereCommandIT to parity on the analytics-engine route

Branch the tests that exercise analytics-route-incompatible behavior behind
isAnalyticsParquetIndicesEnabled():
- 6 nested-field tests (in the Calcite subclass): nested fields are stripped on
  the parquet/composite route, which has no nested-document support.
- 2 _id metadata-field tests: the analytics route doesn't expose the _id
  metadata field.
- testDoubleEqualWithSpecialCharacters: email is dynamically mapped to a text
  field without a .keyword sub-field on the parquet route (the composite
  dataformat's dynamic mapping omits the keyword sub-field standard OpenSearch
  adds), so exact equality on the analyzed text field returns no rows on the
  DataFusion scan. firstname (explicit text+keyword) still exercises == there.

Analytics-engine route: 9 failing -> 0 (32 pass, 9 documented skips).
v2 route: 41/41 pass.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Centralize analytics-route skip reasons in AnalyticsRouteLimitation enum

Replace per-test assumeFalse(reason, isAnalyticsParquetIndicesEnabled())
calls with assumeNotAnalytics(LIMITATION), backed by a single
AnalyticsRouteLimitation enum that holds every known analytics-engine
route divergence and its skip reason.

This gives one greppable registry of route gaps (addressing review
feedback to make the skips trackable in one place) and removes the
PUT+DELETE skip reason that was copy-pasted across four IT classes.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Track CalciteWhereCommandIT AE-route skips in the gradle exclude list

Add the 9 analytics-route skips from this PR to the analyticsEnabled
excludeTestsMatching block in integ-test/build.gradle, following the
#5527 / #5541 precedent, so the full set of tests skipped on the
analytics-engine route is countable in one place.

These complement the in-test assumeNotAnalytics(...) calls (which record
the per-test reason via AnalyticsRouteLimitation); the gradle list is the
single tracking surface for how many tests the route skips.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Changes to infrastructure, testing, CI/CD, pipelines, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants