test(integ-test): provision special-index search test through parquet-backed create on the analytics route#5447
Conversation
|
Persistent review updated to latest commit 0cef8c8 |
602e1f0 to
0cef8c8
Compare
|
Persistent review updated to latest commit 0cef8c8 |
1 similar comment
|
Persistent review updated to latest commit 0cef8c8 |
|
Persistent review updated to latest commit 828b80d |
|
Persistent review updated to latest commit 52ffeb8 |
|
Persistent review updated to latest commit 96b33cf |
96b33cf to
c6e07a0
Compare
|
Persistent review updated to latest commit c6e07a0 |
c6e07a0 to
51f4213
Compare
|
Persistent review updated to latest commit 51f4213 |
51f4213 to
8416a3b
Compare
|
Persistent review updated to latest commit 8416a3b |
8416a3b to
02807c7
Compare
|
Persistent review updated to latest commit 02807c7 |
20b7bc3 to
57178c1
Compare
|
Persistent review updated to latest commit 57178c1 |
57178c1 to
adf0887
Compare
|
Persistent review updated to latest commit adf0887 |
| // `OpenSearchSchemaBuilder` + `schema_coerce.rs`) and route by type here. | ||
| if (bytes.length == 4 || bytes.length == 16) { | ||
| try { | ||
| return new ExprIpValue(java.net.InetAddress.getByAddress(bytes).getHostAddress()); |
There was a problem hiding this comment.
issue: This is going to trip someone up at some point. In particular, any 4-letter words in ASCII will hit this. I don't want to Norway our customers.
Is there not a way to somehow tag the bytes with whether they're actually IP data or not?
There was a problem hiding this comment.
Good catch — you're right. Dropped the length-based heuristic in the latest force-push: byte[] cells now always render as base64, no IP-guessing.
Trade-off: genuine IPv4 / IPv6 cells render as base64 strings instead of dotted-decimal until the schema actually tells us "this column is IP." That's the upstream TODO — OpenSearchSchemaBuilder.mapFieldType currently collapses both ip and binary into VARBINARY. Splitting them into separate UDTs is a cross-repo refactor (analytics-api + isthmus + Substrait + SQL plugin), so leaving it as a follow-up.
The conservative fix here still resolves the original crash (every PPL query scanning an ip column was throwing unsupported object class [B) — just without the misinterpretation risk you flagged.
adf0887 to
cdac62e
Compare
|
Persistent review updated to latest commit cdac62e |
The test creates `logs-2021.01.11` and `logs-7.10.0-2021.01.11` directly via raw PUTs to exercise the parser's handling of dot/version-prefixed index names. The raw path bypasses `TestUtils.createIndexByRestClient`, so the analytics-engine `tests.analytics.parquet_indices=true` toggle never injects the composite/parquet settings. On the analytics-engine route this surfaces as `Table 'logs-2021.01.11' not found` (the catalog-builder only exposes indices that match an analytics-engine- recognized data format), failing the test. Routing the creates through `TestUtils.createIndexByRestClient` makes both code paths honor the toggle: legacy / Calcite continues to create plain Lucene-backed indices, analytics-engine gets parquet-backed composite indices. The index names are still tested verbatim. Surfaced by `CalciteSearchCommandIT` on the analytics-engine route. Signed-off-by: Kai Huang <ahkcs@amazon.com>
…ialIndexName
`OpenSearchSchemaBuilder.buildSchema` skips any index whose mapping has
no `properties` block — the analytics catalog then doesn't surface the
index, and the PPL parser fails with `Table 'logs-2021.01.11' not
found` before reaching the row type. Routing the special-index create
through `TestUtils.createIndexByRestClient` with `null` mapping
produces an empty `{}` body, so the index is created with no mapping
at all and the catalog still skips it.
Pass a minimal `{"mappings":{"properties":{}}}` body. OpenSearch
strips the empty `properties` to `{}` server-side but the schema
builder sees a non-null mapping and registers the (zero-column) table.
The query then parses cleanly and the test's `verifyDataRows()` (no
expected rows) succeeds against the empty index.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
cdac62e to
01a0aff
Compare
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
|
Closing this — after rebasing onto current The two correctness fixes are now redundant with merged work
The remaining test plumbing makes the score worse, not betterThe two
The pre-PR "pass" is a false green: under That fix belongs in the analytics engine / composite-engine ( |
What this PR does
Rebased onto current
main. The two original correctness commits were dropped — both are now redundant with merged work:Preserve datetime schema-column types in AnalyticsExecutionEngineDatetimeOutputCastRule/DatetimeOutputCastRewriter. Datetime root columns are no longer wrapped inCAST(... AS VARCHAR), soAnalyticsExecutionEnginereports the realtimestamp/date/timetype natively. There is no cast slot left to recover from.Convert byte[] cells to base64 for IP / BINARYIpType/BinaryTypeinAnalyticsExecutionEngine.toExprValueand renders IP cells in canonical dotted form — strictly better than the base64 fallback. Theunsupported object class [Bcrash is gone.What remains are two
integ-test-only commits fortestSearchCommandWithSpecialIndexName:logs-2021.01.11/logs-7.10.0-2021.01.11creates throughTestUtils.createIndexByRestClientso thetests.analytics.parquet_indicestoggle backs them with composite/parquet storage (the rawPUTs bypassed it).{"mappings":{"properties":{}}}body soOpenSearchSchemaBuilder.buildSchemasurfaces the otherwise mapping-less index in the analytics catalog.No
main-code changes.Effect on the analytics-engine route
These commits fix the SQL-side plumbing but do not flip the test green on the analytics route — they advance its failure throw site:
Table 'logs-2021.01.11' not found(the catalog never surfaced the index).IllegalStateException: Cannot apply function on indexer class org.opensearch.index.engine.DataFormatAwareEngine directly on IndexShard— the composite engine can't open an empty parquet shard. That's an analytics-engine-side concern, tracked separately.On the legacy / Calcite path the create still produces a plain Lucene index and the test passes exactly as before.
Pass rate —
CalciteSearchCommandITon the analytics route (-Dtests.analytics.parquet_indices=true)Run against current
main(which already carries #5454 / #21748 / #5463) on a force-routed parquet cluster:mainbaselineThe two commits don't change the pass rate — they correct SQL-side plumbing and shift one test's failure to the analytics-engine layer. The 25 failures are all the Lucene-secondary
query_stringfamily (numeric / date-range literal re-typing, compound-AND, wildcard vs NOT semantics), which is fixed upstream in the analytics engine, not in this repo.Check list
--signoff