Commit 602e1f0

Add search-command analytics-route status doc
Documents the 3 → 35 / 52 progression for `CalciteSearchCommandIT` on the analytics-engine route across this PR + opensearch-project/OpenSearch#21681, plus the well-bucketed remaining 17 failures and where each one's fix lands. Mirrors the existing per-command status docs for stats, top/rare, union, and appendcol.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent 0cef8c8 commit 602e1f0

1 file changed

Lines changed: 79 additions & 0 deletions

@@ -0,0 +1,79 @@
# `search` command on the analytics-engine route — current status

Snapshot of `CalciteSearchCommandIT` against the analytics-engine path
(`-Dtests.analytics.force_routing=true -Dtests.analytics.parquet_indices=true`
on `:integ-test:analyticsCompatibilityTest`), as of 2026-05-15.

## Pass / fail summary

| Run | Pass | Fail | Notes |
|---|---|---|---|
| Baseline (current `main` + `feature/ppl-coverage-bundle` toggles, default cluster build) | 3 / 52 | 49 | Only `testSearchAllFields`, `testSearchCommandWithoutSearchKeyword`, and the `@Ignore`d test pass; everything else fails at `OpenSearchTableScanRule` ("No backend can scan all requested fields") or at runtime ("Failed to start streaming fragment") |
| + OS-side: `IP/BINARY/MATCH_ONLY_TEXT` in `SUPPORTED_FIELD_TYPES` ([opensearch-project/OpenSearch#21681][pr-os]) | 4 | 48 | First test the visitor's native lowering unlocks: `testSearchCommandWithLogicalExpression` (BANK `firstname='Hattie'`) |
| + OS-side: `date_nanos → TIMESTAMP` in `OpenSearchSchemaBuilder.mapFieldType` | 4 | 48 | No new passes, but unblocks Calcite's TIMESTAMP type recognition, which subsequent native lowerings of `@timestamp` comparisons require |
| + OS-side: catch `Litmus.THROW` `AssertionError` in `DefaultPlanExecutor` | 4 | 48 | No new passes, but stops the cluster JVM from dying on Calcite assertion paths (`severityNumber="not-a-number"` previously took down the cluster, killing 21 cascading tests) |
| + OS-side: temporary `ip` field skip in `OpenSearchSchemaBuilder.addLeafFields` | 24 | 28 | **+20 OTEL-logs tests** that don't touch the IP column; the BinaryView/Utf8 Substrait mismatch no longer blocks the full row-type schema |
| + Test-infra: `composite.secondary_data_formats=[lucene]` on parquet-backed test indices | 35 | 17 | **+11 free-text / wildcard / time-modifier tests**; the Lucene secondary provides a reader for the `query_string(...)` fragments the visitor leaves in fallback form |
| + SQL visitor v3 (NOT + wildcard value guards) | 35 | 17 | No new passes, but cleaner failure modes for `testWildcardPatternMatching` and `testDifferenceBetweenNOTAndNotEquals` (previously Calcite-side regressions, now a correct Lucene fallback) |
| + OS-side: `ScalarFunction.TIMESTAMP` in `STANDARD_PROJECT_OPS` | 35 | 17 | Moves `testSearchWithDateFormats` one layer deeper (was "No backend supports", now "Substrait conversion error") |
| + SQL test fix: minimal `{"properties":{}}` mapping in `testSearchCommandWithSpecialIndexName` | 35 | 17 | Empty `properties` still gets stripped server-side, and a placeholder field would change row-count assertions. The real fix requires OS-side empty-shard handling |
| **Final** | **35 / 52** | **17** | |

Reproduce:

```bash
./gradlew :integ-test:analyticsCompatibilityTest \
  -Dtests.rest.cluster=localhost:9200 \
  -Dtests.cluster=localhost:9300 \
  -Dtests.clustername=runTask \
  --tests "org.opensearch.sql.calcite.remote.CalciteSearchCommandIT"
```

(The `tests.analytics.*` system properties are set automatically by the
`analyticsCompatibilityTest` task. `integTestRemote` in this repo
still auto-bootstraps a per-task test cluster that has zero sandbox
plugins, so `-Dtests.rest.cluster=...` would be silently ignored.)
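
The pass counts in the summary table can be cross-checked by summing the net new passes per step. The sketch below is only a bookkeeping aid: the class name `PassTally` and the step labels are illustrative, and the numbers are transcribed from the table (steps with no new passes are omitted).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PassTally {
    // Total number of tests in CalciteSearchCommandIT, per the summary table.
    static final int TOTAL = 52;

    // Net new passes contributed by each step, in table order.
    static Map<String, Integer> deltas() {
        Map<String, Integer> d = new LinkedHashMap<>();
        d.put("baseline", 3);
        d.put("IP/BINARY/MATCH_ONLY_TEXT scan support", 1);
        d.put("temporary ip field skip", 20);
        d.put("lucene secondary data format", 11);
        return d;
    }

    // Sum of all deltas, i.e. the final pass count.
    static int finalPass() {
        return deltas().values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        int pass = 0;
        for (Map.Entry<String, Integer> e : deltas().entrySet()) {
            pass += e.getValue();
            System.out.printf("%-40s pass=%2d fail=%2d%n", e.getKey(), pass, TOTAL - pass);
        }
    }
}
```

The running totals land on 35 passes and 17 failures, matching the **Final** row.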

## Remaining 17 failures, well-bucketed

| # | Bucket | Failures | Where the fix lands |
|---|---|---|---|
| 1 | Datetime output cast (nanos and schema type) | 7 | OS-side follow-up to [#21650](https://github.com/opensearch-project/OpenSearch/pull/21650). That PR closed [opensearch-project/sql#5420](https://github.com/opensearch-project/sql/issues/5420), but the `to_char` format string is seconds-only (`'%Y-%m-%d %H:%M:%S'`), which loses nanoseconds, and `TO_CHAR` returns VARCHAR, so the schema layer reports `@timestamp` as `string` rather than `timestamp`. Either widen the format string or keep a TIMESTAMP-typed wrapper that prints as a string but advertises as a timestamp |
| 2 | Lucene-secondary edge cases | 5 | `testWildcardPatternMatching`, `testWildcardEscaping`, `testSearchWithIPAddress`, `testDifferenceBetweenNOTAndNotEquals`, `testSearchWithDateRangeComparisons`. My standalone probes against a fresh `composite.secondary_data_formats=[lucene]` index reproduce wildcard `*` / `?` matching correctly, so the IT failures look fixture-specific (multi-shard merge timing? compound predicates routed independently?). Worth a dedicated Lucene-secondary investigation |
| 3 | Substrait conversion for TIMESTAMP scalar | 1 | `testSearchWithDateFormats`: `Unable to convert call TIMESTAMP(string)`. The capability is now registered (after this PR cycle), so the planner accepts it, but the Substrait emission still fails. Needs an isthmus adapter mapping `TIMESTAMP(string) → to_timestamp(string)` |
| 4 | IN expression type coercion (TIMESTAMP) | 1 | `testSearchWithDateINOperator`: `In expression types are incompatible: fields type TIMESTAMP, values type [STRING, STRING]`. Analyzer-side fix: coerce string literals in an IN list to the field type when the field is TIMESTAMP. Adjacent to the existing `Compare` coercion |
| 5 | Calcite RelCompositeTrait cast on impossible-range fold | 1 | `testSearchWithImpossibleRange`: `severityNumber>30 AND severityNumber<5` simplifies to an empty rel; the simplifier produces a `RelCompositeTrait` that Calcite's sort rule then can't cast back to `RelCollation`. Calcite-library issue; could file upstream or wrap with a defensive fold |
| 6 | Calcite NULL-comparison assertion | 1 | `testSearchWithTypeMismatch`: `severityNumber="not-a-number"` folds to `=(SAFE_CAST, null)`, which trips `Comparison with NULL in pulledUpPredicates`. Now caught by the Litmus catch (the cluster survives) but still surfaces as HTTP 500. Either suppress the assert or pre-fold the comparison to FALSE |
| 7 | Special-index empty-mapping streaming | 1 | `testSearchCommandWithSpecialIndexName`: an index created with `{"mappings":{"properties":{}}}` ends up with an empty composite shard, which the streaming runtime can't open. Either add OS-side empty-shard handling (return an empty stream) or convert the test to skip-on-empty |
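
To make the nanosecond loss in bucket 1 concrete, here is a minimal sketch using `java.time`. It is an analogy only: the actual cast uses a `to_char` call with a strftime-style pattern, whereas this example assumes equivalent `DateTimeFormatter` patterns, and the class name `NanosFormatDemo` is illustrative.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class NanosFormatDemo {
    // Assumed java.time equivalent of the seconds-only pattern '%Y-%m-%d %H:%M:%S'.
    static final DateTimeFormatter SECONDS_ONLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Widened pattern: nine fraction-of-second digits keep nanosecond precision.
    static final DateTimeFormatter WITH_NANOS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS");

    static String secondsOnly(LocalDateTime ts) {
        return ts.format(SECONDS_ONLY);
    }

    static String withNanos(LocalDateTime ts) {
        return ts.format(WITH_NANOS);
    }

    public static void main(String[] args) {
        // A date_nanos-style value with a non-zero nanosecond component.
        LocalDateTime ts = LocalDateTime.of(2024, 1, 15, 10, 30, 0, 123_456_789);
        System.out.println(secondsOnly(ts)); // 2024-01-15 10:30:00
        System.out.println(withNanos(ts));   // 2024-01-15 10:30:00.123456789
    }
}
```

Widening the pattern restores the fractional digits, but the result is still a string, so it would not by itself address the `string` vs `timestamp` schema mismatch described in the same bucket.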

## What this PR cycle ships

**[opensearch-project/OpenSearch#21681][pr-os] (analytics-engine search-command coverage fixes, 4 commits):**
- `[analytics-backend-datafusion] Declare scan support for IP / BINARY / MATCH_ONLY_TEXT` (original commit, unchanged)
- `[analytics-engine] Map date_nanos to TIMESTAMP and skip ip columns in OpenSearchSchemaBuilder`: the `date_nanos` schema match plus the temporary `ip`-skip workaround until BinaryView ↔ Utf8 conversion lands
- `[analytics-backend-datafusion] Register TIMESTAMP in STANDARD_PROJECT_OPS`: wires PPL `timestamp(expr)` through the project rule
- `[analytics-engine] Catch Calcite Litmus.THROW AssertionError in DefaultPlanExecutor`: mirrors the SQL-plugin-side catch so neither layer can take down the cluster
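
The idea behind the `AssertionError` catch in the list above can be sketched in plain Java. This is not the code in `DefaultPlanExecutor`: `AssertionGuardSketch`, `Outcome`, and `runGuarded` are hypothetical names, and the planning step is simulated with a `Supplier` rather than a real Calcite call.

```java
import java.util.function.Supplier;

public class AssertionGuardSketch {
    /** Hypothetical result holder, not a type from OpenSearch or the SQL plugin. */
    static final class Outcome {
        final boolean ok;
        final String detail;

        Outcome(boolean ok, String detail) {
            this.ok = ok;
            this.detail = detail;
        }

        @Override
        public String toString() {
            return (ok ? "OK: " : "FAILED: ") + detail;
        }
    }

    // Runs a planning step and turns an AssertionError into a failed Outcome
    // instead of letting it propagate up the calling thread.
    static Outcome runGuarded(Supplier<String> planStep) {
        try {
            return new Outcome(true, planStep.get());
        } catch (AssertionError e) {
            // AssertionError is an Error, so `catch (Exception e)` would not see it.
            return new Outcome(false, "planner assertion: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println(runGuarded(() -> "plan-ok"));
        System.out.println(runGuarded(() -> {
            throw new AssertionError("Comparison with NULL in pulledUpPredicates");
        }));
    }
}
```

With a guard of this shape, the failing query surfaces as an error for that one request (the doc reports an HTTP 500) while the process keeps running.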

**[opensearch-project/sql#5447][pr-sql] (search-command native lowering and test fixes, 3 commits):**
- `Inject parquet settings in testSearchCommandWithSpecialIndexName` (original commit)
- `Lower structured search predicates to native RexCall in visitSearch`: the core change. It walks the typed `SearchExpression` AST and emits native PPL filter AST for the structured fragments
- `Pass minimal {"properties":{}} mapping in testSearchCommandWithSpecialIndexName`: a refinement so the analytics catalog actually surfaces the special-named index
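
The lowering described above (structured fragments go native, everything else stays in `query_string(...)` fallback form) can be illustrated with a toy walker. This is a simplified sketch, not the visitor's implementation: the AST classes, the `LOWERABLE` field set, and the string output are all hypothetical stand-ins, and the wildcard check ignores escaping.

```java
import java.util.Set;

public class SearchLoweringSketch {
    interface Expr {}

    static final class Comparison implements Expr {
        final String field, op, value;
        Comparison(String field, String op, String value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    static final class And implements Expr {
        final Expr left, right;
        And(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }
    }

    // Hypothetical stand-in for a "can this field be lowered natively?" check.
    static final Set<String> LOWERABLE = Set.of("firstname", "severityNumber");

    // Simplified wildcard guard: any '*' or '?' keeps the fragment on the fallback path.
    static boolean containsWildcard(String value) {
        return value.indexOf('*') >= 0 || value.indexOf('?') >= 0;
    }

    // Lowers structured fragments to a native call; leaves the rest as query_string.
    static String lower(Expr e) {
        if (e instanceof And) {
            And a = (And) e;
            return "AND(" + lower(a.left) + ", " + lower(a.right) + ")";
        }
        Comparison c = (Comparison) e;
        if (LOWERABLE.contains(c.field) && !containsWildcard(c.value)) {
            return c.op + "(" + c.field + ", '" + c.value + "')";
        }
        return "query_string('" + c.field + ":" + c.value + "')";
    }

    public static void main(String[] args) {
        Expr e = new And(new Comparison("firstname", "=", "Hattie"),
                         new Comparison("body", "=", "err*"));
        System.out.println(lower(e));
    }
}
```

In this toy version the first comparison lowers natively while the wildcard comparison stays as a `query_string` fragment, which is the kind of mixed plan the Lucene secondary format is needed to serve.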

[pr-os]: https://github.com/opensearch-project/OpenSearch/pull/21681
[pr-sql]: https://github.com/opensearch-project/sql/pull/5447

## Files to read once before adjacent work

- `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`: `visitSearch` lowering entrypoint, `buildSearchFilter`, `lowerSearchExpression`, `isOpenSearchDateMath`, `containsLuceneWildcard`, `isLowerableField`
- `core/src/main/java/org/opensearch/sql/ast/expression/Search{Expression,And,Or,Not,Group,Comparison,In,Literal}.java`: the typed AST the visitor consumes
- `core/src/main/java/org/opensearch/sql/ast/tree/Search.java`: carries both the raw `queryString` (fallback) and the typed `originalExpression`
- `OpenSearch/sandbox/libs/analytics-api/src/main/java/org/opensearch/analytics/schema/OpenSearchSchemaBuilder.java`: analytics-engine catalog builder; the per-field-type mapping that needs `date_nanos` / `ip` handling
- `OpenSearch/sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionAnalyticsBackendPlugin.java`: capability registry the visitor's lowered RexCalls have to match

## Related per-command status docs

- [`stats-command-analytics-route-status.md`](stats-command-analytics-route-status.md)
- [`top-rare-command-analytics-route-status.md`](top-rare-command-analytics-route-status.md)
- [`appendcol-command-analytics-route-status.md`](appendcol-command-analytics-route-status.md)
- [`union-command-analytics-route-status.md`](union-command-analytics-route-status.md)
- [`ppl-analytics-engine-routing.md`](ppl-analytics-engine-routing.md): the playbook that produced this work
