Skip to content

test(integ-test): provision special-index search test through parquet-backed create on the analytics route#5447

Closed
ahkcs wants to merge 2 commits into
opensearch-project:mainfrom
ahkcs:pr/search-special-index-parquet
Closed

test(integ-test): provision special-index search test through parquet-backed create on the analytics route#5447
ahkcs wants to merge 2 commits into
opensearch-project:mainfrom
ahkcs:pr/search-special-index-parquet

Conversation

@ahkcs

@ahkcs ahkcs commented May 15, 2026

Copy link
Copy Markdown
Collaborator

What this PR does

Rebased onto current main. The two original correctness commits were dropped — both are now redundant with merged work:

Dropped commit Why it's no longer needed
Preserve datetime schema-column types in AnalyticsExecutionEngine Superseded by #5454 + opensearch-project/OpenSearch#21748, which delete DatetimeOutputCastRule / DatetimeOutputCastRewriter. Datetime root columns are no longer wrapped in CAST(... AS VARCHAR), so AnalyticsExecutionEngine reports the real timestamp / date / time type natively. There is no cast slot left to recover from.
Convert byte[] cells to base64 for IP / BINARY Superseded by #5463 (Added UDT for IP and Binary support), which dispatches on IpType / BinaryType in AnalyticsExecutionEngine.toExprValue and renders IP cells in canonical dotted form — strictly better than the base64 fallback. The unsupported object class [B crash is gone.

What remains are two integ-test-only commits for testSearchCommandWithSpecialIndexName:

  1. Route the logs-2021.01.11 / logs-7.10.0-2021.01.11 creates through TestUtils.createIndexByRestClient so the tests.analytics.parquet_indices toggle backs them with composite/parquet storage (the raw PUTs bypassed it).
  2. Pass a minimal {"mappings":{"properties":{}}} body so OpenSearchSchemaBuilder.buildSchema surfaces the otherwise mapping-less index in the analytics catalog.

No main-code changes.

Effect on the analytics-engine route

These commits fix the SQL-side plumbing but do not flip the test green on the analytics route — they advance its failure throw site:

  • Before: Table 'logs-2021.01.11' not found (the catalog never surfaced the index).
  • After: IllegalStateException: Cannot apply function on indexer class org.opensearch.index.engine.DataFormatAwareEngine directly on IndexShard — the composite engine can't open an empty parquet shard. That's an analytics-engine-side concern, tracked separately.

On the legacy / Calcite path the create still produces a plain Lucene index and the test passes exactly as before.

Pass rate — CalciteSearchCommandIT on the analytics route (-Dtests.analytics.parquet_indices=true)

Run against current main (which already carries #5454 / #21748 / #5463) on a force-routed parquet cluster:

Step PASS / 52
Rebased main baseline 26
+ this PR 26

The two commits don't change the pass rate — they correct SQL-side plumbing and shift one test's failure to the analytics-engine layer. The 25 failures are all the Lucene-secondary query_string family (numeric / date-range literal re-typing, compound-AND, wildcard vs NOT semantics), which is fixed upstream in the analytics engine, not in this repo.

Check list

  • Commits are signed per the DCO using --signoff

@ahkcs ahkcs changed the title Inject parquet settings in testSearchCommandWithSpecialIndexName Lower structured search predicates to native RexCall; analytics-route coverage fixes May 15, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0cef8c8

@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch from 602e1f0 to 0cef8c8 Compare May 15, 2026 21:17
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0cef8c8

1 similar comment
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0cef8c8

@ahkcs ahkcs added calcite calcite migration releated PPL Piped processing language enhancement New feature or request labels May 15, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 828b80d

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 52ffeb8

@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 96b33cf

@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch from 96b33cf to c6e07a0 Compare May 15, 2026 22:34
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c6e07a0

@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch from c6e07a0 to 51f4213 Compare May 18, 2026 19:03
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 51f4213

@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch from 51f4213 to 8416a3b Compare May 18, 2026 19:16
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8416a3b

@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch from 8416a3b to 02807c7 Compare May 18, 2026 19:54
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 02807c7

@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch 3 times, most recently from 20b7bc3 to 57178c1 Compare May 19, 2026 17:01
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 57178c1

@ahkcs ahkcs changed the title Lower structured search predicates to native RexCall; analytics-route coverage fixes Correctness fixes for analytics-engine search route (datetime schema + IP/BINARY byte[]) May 19, 2026
@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch from 57178c1 to adf0887 Compare May 19, 2026 18:23
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit adf0887

// `OpenSearchSchemaBuilder` + `schema_coerce.rs`) and route by type here.
if (bytes.length == 4 || bytes.length == 16) {
try {
return new ExprIpValue(java.net.InetAddress.getByAddress(bytes).getHostAddress());

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: This is going to trip someone up at some point. In particular, any 4-letter words in ASCII will hit this. I don't want to Norway our customers.

Is there not a way to somehow tag the bytes with whether they're actually IP data or not?

@ahkcs ahkcs May 19, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you're right. Dropped the length-based heuristic in the latest force-push: byte[] cells now always render as base64, no IP-guessing.

Trade-off: genuine IPv4 / IPv6 cells render as base64 strings instead of dotted-decimal until the schema actually tells us "this column is IP." That's the upstream TODO — OpenSearchSchemaBuilder.mapFieldType currently collapses both ip and binary into VARBINARY. Splitting them into separate UDTs is a cross-repo refactor (analytics-api + isthmus + Substrait + SQL plugin), so leaving it as a follow-up.
The conservative fix here still resolves the original crash (every PPL query scanning an ip column was throwing unsupported object class [B) — just without the misinterpretation risk you flagged.

@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch from adf0887 to cdac62e Compare May 19, 2026 19:20
@github-actions

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit cdac62e

@opensearch-project opensearch-project deleted a comment from github-actions Bot May 19, 2026
@opensearch-project opensearch-project deleted a comment from github-actions Bot May 19, 2026
@ahkcs ahkcs requested a review from Swiddis May 19, 2026 22:13
ahkcs added 2 commits May 28, 2026 15:56
The test creates `logs-2021.01.11` and `logs-7.10.0-2021.01.11` directly
via raw PUTs to exercise the parser's handling of dot/version-prefixed
index names. The raw path bypasses `TestUtils.createIndexByRestClient`,
so the analytics-engine `tests.analytics.parquet_indices=true` toggle
never injects the composite/parquet settings. On the analytics-engine
route this surfaces as `Table 'logs-2021.01.11' not found` (the
catalog-builder only exposes indices that match an analytics-engine-
recognized data format), failing the test.

Routing the creates through `TestUtils.createIndexByRestClient` makes
both code paths honor the toggle: legacy / Calcite continues to create
plain Lucene-backed indices, analytics-engine gets parquet-backed
composite indices. The index names are still tested verbatim.

Surfaced by `CalciteSearchCommandIT` on the analytics-engine route.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ialIndexName

`OpenSearchSchemaBuilder.buildSchema` skips any index whose mapping has
no `properties` block — the analytics catalog then doesn't surface the
index, and the PPL parser fails with `Table 'logs-2021.01.11' not
found` before reaching the row type. Routing the special-index create
through `TestUtils.createIndexByRestClient` with `null` mapping
produces an empty `{}` body, so the index is created with no mapping
at all and the catalog still skips it.

Pass a minimal `{"mappings":{"properties":{}}}` body. OpenSearch
strips the empty `properties` to `{}` server-side but the schema
builder sees a non-null mapping and registers the (zero-column) table.
The query then parses cleanly and the test's `verifyDataRows()` (no
expected rows) succeeds against the empty index.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the pr/search-special-index-parquet branch from cdac62e to 01a0aff Compare May 29, 2026 00:10
@ahkcs ahkcs changed the title Correctness fixes for analytics-engine search route (datetime schema + IP/BINARY byte[]) test(integ-test): provision special-index search test through parquet-backed create on the analytics route May 29, 2026
@github-actions

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Add cleanup for created test indices

The test creates indices but doesn't clean them up after execution. This can cause
test pollution and failures in subsequent test runs. Add cleanup logic in a teardown
method or try-finally block to delete the created indices.

integ-test/src/test/java/org/opensearch/sql/ppl/SearchCommandIT.java [62-69]

 String minimalMapping = "{\"mappings\":{\"properties\":{}}}";
-org.opensearch.sql.legacy.TestUtils.createIndexByRestClient(
-    client(), "logs-2021.01.11", minimalMapping);
-verifyDataRows(executeQuery("search source=logs-2021.01.11"));
+try {
+  org.opensearch.sql.legacy.TestUtils.createIndexByRestClient(
+      client(), "logs-2021.01.11", minimalMapping);
+  verifyDataRows(executeQuery("search source=logs-2021.01.11"));
 
-org.opensearch.sql.legacy.TestUtils.createIndexByRestClient(
-    client(), "logs-7.10.0-2021.01.11", minimalMapping);
-verifyDataRows(executeQuery("search source=logs-7.10.0-2021.01.11"));
+  org.opensearch.sql.legacy.TestUtils.createIndexByRestClient(
+      client(), "logs-7.10.0-2021.01.11", minimalMapping);
+  verifyDataRows(executeQuery("search source=logs-7.10.0-2021.01.11"));
+} finally {
+  executeRequest(new Request("DELETE", "/logs-2021.01.11"));
+  executeRequest(new Request("DELETE", "/logs-7.10.0-2021.01.11"));
+}
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that test indices are created but not cleaned up, which could cause test pollution. However, the score is moderate because: (1) the test framework may already handle cleanup through other means (e.g., test class teardown methods or test infrastructure), and (2) the suggestion asks to verify/ensure proper cleanup rather than fixing a critical bug. The improved code demonstrates a valid approach using try-finally blocks.

Medium

@ahkcs

ahkcs commented May 29, 2026

Copy link
Copy Markdown
Collaborator Author

Closing this — after rebasing onto current main, there's nothing left here worth landing.

The two correctness fixes are now redundant with merged work

  • Preserve datetime schema-column types → superseded by Remove DatetimeOutputCastRule on the analytics-engine route (#5420) #5454 + opensearch-project/OpenSearch#21748, which delete DatetimeOutputCastRule / DatetimeOutputCastRewriter. Datetime root columns are no longer cast to VARCHAR, so AnalyticsExecutionEngine reports timestamp / date / time natively. Verified on a current-main force-routed parquet cluster: the CalciteSearchCommandIT time-modifier tests pass verifySchema(..., "timestamp") and fail only on the downstream query_string row-count assertions. No recovery walk needed.
  • Convert byte[] cells to base64 for IP / BINARY → superseded by Added UDT for IP and Binary support #5463 (Added UDT for IP and Binary support), which dispatches on IpType / BinaryType in AnalyticsExecutionEngine.toExprValue. The unsupported object class [B crash is gone.

The remaining test plumbing makes the score worse, not better

The two testSearchCommandWithSpecialIndexName commits don't unblock anything. On a -Dtests.analytics.parquet_indices=true run:

special-index test total
main (raw PUTLucene index → routed to Calcite) ✅ passes 27 / 52
with these commits (parquet-backed → routed to analytics engine) ❌ fails 26 / 52

The pre-PR "pass" is a false green: under parquet_indices=true the raw-PUT index is Lucene-backed, so RestUnifiedQueryAction.isAnalyticsIndex routes it to Calcite and the test never touches the analytics engine. Making it genuinely parquet-backed (the point of the commits) exposes a real OpenSearch-side failure when the analytics route searches the empty composite shard:

java.lang.IllegalStateException: Cannot apply function on indexer class
  org.opensearch.index.engine.DataFormatAwareEngine directly on IndexShard
    at org.opensearch.index.shard.IndexShard.applyOnEngine(IndexShard.java:6331)
    at org.opensearch.index.shard.IndexShard.acquireSearcherSupplier(IndexShard.java:2391)
    at org.opensearch.search.SearchService.createOrGetReaderContext(SearchService.java:1131)
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:877)

That fix belongs in the analytics engine / composite-engine (DataFormatAwareEngine searcher acquisition), not in this repo. Tracking it upstream; this SQL-side IT will flip green on its own once that lands. Closing.

@ahkcs ahkcs closed this May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

calcite calcite migration releated enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants