| 1 | +# `search` command on the analytics-engine route — current status |
| 2 | + |
| 3 | +Snapshot of `CalciteSearchCommandIT` against the analytics-engine path |
| 4 | +(`-Dtests.analytics.force_routing=true -Dtests.analytics.parquet_indices=true` |
| 5 | +on `:integ-test:analyticsCompatibilityTest`), as of 2026-05-15. |
| 6 | + |
| 7 | +## Pass / fail summary |
| 8 | + |
| 9 | +| Run | Pass | Fail | Notes | |
| 10 | +|---|---|---|---| |
| 11 | +| Baseline (current `main` + `feature/ppl-coverage-bundle` toggles, default cluster build) | 3 / 52 | 49 | Only `testSearchAllFields`, `testSearchCommandWithoutSearchKeyword`, and the `@Ignore`d test pass; everything else fails at `OpenSearchTableScanRule` ("No backend can scan all requested fields") or at runtime ("Failed to start streaming fragment") | |
| 12 | +| + OS-side: `IP/BINARY/MATCH_ONLY_TEXT` in `SUPPORTED_FIELD_TYPES` ([opensearch-project/OpenSearch#21681][pr-os]) | 4 | 48 | First test the visitor's native lowering unlocks: `testSearchCommandWithLogicalExpression` (BANK `firstname='Hattie'`) | |
| 13 | +| + OS-side: `date_nanos → TIMESTAMP` in `OpenSearchSchemaBuilder.mapFieldType` | 4 | 48 | No new passes, but unblocks Calcite's TIMESTAMP type recognition, which is required for subsequent native lowerings of `@timestamp` comparisons | |
| 14 | +| + OS-side: catch `Litmus.THROW` `AssertionError` in `DefaultPlanExecutor` | 4 | 48 | No new passes but stops the cluster JVM from dying on Calcite assertion paths (`severityNumber="not-a-number"` previously took down the cluster, killing 21 cascading tests) | |
| 15 | +| + OS-side: temp `ip` field skip in `OpenSearchSchemaBuilder.addLeafFields` | 24 | 28 | **+20 OTEL-logs tests** that don't touch the IP column — the BinaryView/Utf8 Substrait mismatch no longer blocks the full row-type schema | |
| 16 | +| + Test-infra: `composite.secondary_data_formats=[lucene]` on parquet-backed test indices | 35 | 17 | **+11 free-text / wildcard / time-modifier tests** — the Lucene secondary provides a reader for the `query_string(...)` fragments the visitor leaves in fallback form | |
| 17 | +| + SQL visitor v3 (NOT + wildcard value guards) | 35 | 17 | No new passes, but cleaner failure modes for `testWildcardPatternMatching` and `testDifferenceBetweenNOTAndNotEquals` (previously Calcite-side regressions, now a correct Lucene fallback) | |
| 18 | +| + OS-side: `ScalarFunction.TIMESTAMP` in `STANDARD_PROJECT_OPS` | 35 | 17 | Moves `testSearchWithDateFormats` one layer deeper (was "No backend supports", now "Substrait conversion error") | |
| 19 | +| + SQL test fix: minimal `{"properties":{}}` mapping in `testSearchCommandWithSpecialIndexName` | 35 | 17 | No new passes: the empty `properties` object still gets stripped server-side, and a placeholder field would change the row-count assertions. The real fix requires OS-side empty-shard handling | |
| 20 | +| **Final** | **35 / 52** | **17** | — | |
| 21 | + |
| 22 | +Reproduce: |
| 23 | + |
| 24 | +```bash |
| 25 | +./gradlew :integ-test:analyticsCompatibilityTest \ |
| 26 | + -Dtests.rest.cluster=localhost:9200 \ |
| 27 | + -Dtests.cluster=localhost:9300 \ |
| 28 | + -Dtests.clustername=runTask \ |
| 29 | + --tests "org.opensearch.sql.calcite.remote.CalciteSearchCommandIT" |
| 30 | +``` |
| 31 | + |
| 32 | +(The `tests.analytics.*` system properties are set automatically by the |
| 33 | +`analyticsCompatibilityTest` task. `integTestRemote` in this repo |
| 34 | +still auto-bootstraps a per-task test cluster that has zero sandbox |
| 35 | +plugins, so `-Dtests.rest.cluster=...` would be silently ignored.) |
| 36 | + |
| 37 | +## Remaining 17 failures, well-bucketed |
| 38 | + |
| 39 | +| # | Bucket | Failures | Where the fix lands | |
| 40 | +|---|---|---|---| |
| 41 | +| 1 | Datetime output cast: nanosecond loss and schema type | 7 | OS-side follow-up to [#21650](https://github.com/opensearch-project/OpenSearch/pull/21650). That PR closed [opensearch-project/sql#5420](https://github.com/opensearch-project/sql/issues/5420), but the `to_char` format string is seconds-only (`'%Y-%m-%d %H:%M:%S'`), which drops nanoseconds, and `TO_CHAR` returns VARCHAR, so the schema layer reports `@timestamp` as `string` rather than `timestamp`. Either widen the format string or keep a TIMESTAMP-typed wrapper that prints as a string but advertises as timestamp | |
| 42 | +| 2 | Lucene-secondary edge cases | 5 | `testWildcardPatternMatching`, `testWildcardEscaping`, `testSearchWithIPAddress`, `testDifferenceBetweenNOTAndNotEquals`, `testSearchWithDateRangeComparisons`. My standalone probes against a fresh `composite.secondary_data_formats=[lucene]` index reproduce wildcard `*` / `?` matching correctly — the IT failures look fixture-specific (multi-shard merge timing? compound predicates routed independently?). Worth a dedicated Lucene-secondary investigation | |
| 43 | +| 3 | Substrait conversion for TIMESTAMP scalar | 1 | `testSearchWithDateFormats` — `Unable to convert call TIMESTAMP(string)`. The capability is now registered (after this PR cycle) so the planner accepts it, but the Substrait emission still fails. Needs an isthmus adapter mapping `TIMESTAMP(string) → to_timestamp(string)` | |
| 44 | +| 4 | IN expression type coercion (TIMESTAMP) | 1 | `testSearchWithDateINOperator` — `In expression types are incompatible: fields type TIMESTAMP, values type [STRING, STRING]`. Analyzer-side fix: coerce string literals in an IN list to the field type when the field is TIMESTAMP. Adjacent to the existing `Compare` coercion | |
| 45 | +| 5 | Calcite RelCompositeTrait cast on impossible-range fold | 1 | `testSearchWithImpossibleRange` — `severityNumber>30 AND severityNumber<5` simplifies to an empty rel; the simplifier produces a `RelCompositeTrait` that Calcite's sort rule then can't cast back to `RelCollation`. Calcite-library issue; could file upstream or wrap with a defensive fold | |
| 46 | +| 6 | Calcite NULL-comparison assertion | 1 | `testSearchWithTypeMismatch` — `severityNumber="not-a-number"` folds to `=(SAFE_CAST, null)` which trips `Comparison with NULL in pulledUpPredicates`. Now caught by the Litmus catch (cluster survives) but still surfaces as HTTP 500. Either suppress the assert or pre-fold the comparison to FALSE | |
| 47 | +| 7 | Special-index empty-mapping streaming | 1 | `testSearchCommandWithSpecialIndexName` — index created with `{"mappings":{"properties":{}}}` ends up with an empty composite shard, which the streaming runtime can't open. Either OS-side empty-shard handling (return empty stream) or convert the test to skip-on-empty | |
| 48 | + |
| 49 | +## What this PR cycle ships |
| 50 | + |
| 51 | +**[opensearch-project/OpenSearch#21681][pr-os] — analytics-engine search-command coverage fixes (4 commits):** |
| 52 | +- `[analytics-backend-datafusion] Declare scan support for IP / BINARY / MATCH_ONLY_TEXT` (original commit, unchanged) |
| 53 | +- `[analytics-engine] Map date_nanos to TIMESTAMP and skip ip columns in OpenSearchSchemaBuilder` — the date_nanos schema match + the temp ip-skip workaround until BinaryView ↔ Utf8 conversion lands |
| 54 | +- `[analytics-backend-datafusion] Register TIMESTAMP in STANDARD_PROJECT_OPS` — wires PPL `timestamp(expr)` through the project rule |
| 55 | +- `[analytics-engine] Catch Calcite Litmus.THROW AssertionError in DefaultPlanExecutor` — mirrors the SQL-plugin-side catch so neither layer can take down the cluster |
| 56 | + |
| 57 | +**[opensearch-project/sql#5447][pr-sql] — search-command native lowering + test fixes (3 commits):** |
| 58 | +- `Inject parquet settings in testSearchCommandWithSpecialIndexName` (original commit) |
| 59 | +- `Lower structured search predicates to native RexCall in visitSearch` — the meat: walks the typed `SearchExpression` AST and emits native PPL filter AST for the structured fragments |
| 60 | +- `Pass minimal {"properties":{}} mapping in testSearchCommandWithSpecialIndexName` — refinement so the analytics catalog actually surfaces the special-named index |
| 61 | + |
| 62 | +[pr-os]: https://github.com/opensearch-project/OpenSearch/pull/21681 |
| 63 | +[pr-sql]: https://github.com/opensearch-project/sql/pull/5447 |
| 64 | + |
| 65 | +## Files to read once before adjacent work |
| 66 | + |
| 67 | +- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` — `visitSearch` lowering entrypoint, `buildSearchFilter`, `lowerSearchExpression`, `isOpenSearchDateMath`, `containsLuceneWildcard`, `isLowerableField` |
| 68 | +- `core/src/main/java/org/opensearch/sql/ast/expression/Search{Expression,And,Or,Not,Group,Comparison,In,Literal}.java` — the typed AST the visitor consumes |
| 69 | +- `core/src/main/java/org/opensearch/sql/ast/tree/Search.java` — carries both the raw `queryString` (fallback) and the typed `originalExpression` |
| 70 | +- `OpenSearch/sandbox/libs/analytics-api/src/main/java/org/opensearch/analytics/schema/OpenSearchSchemaBuilder.java` — analytics-engine catalog builder; the per-field-type mapping that needs date_nanos / ip handling |
| 71 | +- `OpenSearch/sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionAnalyticsBackendPlugin.java` — capability registry the visitor's lowered RexCalls have to match |
| 72 | + |
| 73 | +## Related per-command status docs |
| 74 | + |
| 75 | +- [`stats-command-analytics-route-status.md`](stats-command-analytics-route-status.md) |
| 76 | +- [`top-rare-command-analytics-route-status.md`](top-rare-command-analytics-route-status.md) |
| 77 | +- [`appendcol-command-analytics-route-status.md`](appendcol-command-analytics-route-status.md) |
| 78 | +- [`union-command-analytics-route-status.md`](union-command-analytics-route-status.md) |
| 79 | +- [`ppl-analytics-engine-routing.md`](ppl-analytics-engine-routing.md) — playbook that produced this work |