Commit c4d4a8c

spath: Arrow Map response marshalling + parquet-backed test indices
Two complementary changes for closing PPL `spath` parity on the analytics-engine route. Pairs with opensearch-project/OpenSearch#21664 (the analytics-engine `json_extract_all` UDF + ITEM-on-MAP dispatch + MAP capability registrations). Both PRs together take `CalcitePPLSpathCommandIT` on the analytics-engine route from 0 / 16 to 16 / 16 without regressing the v2 / Calcite path (still 16 / 16).

## Pass rate

| IT | Route | Before | After |
|---|---|---|---|
| `CalcitePPLSpathCommandIT` | analytics-engine (`-Dtests.analytics.force_routing=true -Dtests.analytics.parquet_indices=true`) | 0 / 16 | **16 / 16** |
| `CalcitePPLSpathCommandIT` | default v2 / Calcite (no flags) | 16 / 16 | 16 / 16 (no regression) |

The analytics-route number depends on this PR and #21664 landing together; neither PR alone moves it off 0 / 16.

## Changes

1. **`core/.../ExprValueUtils.java`**: `fromObjectValue` gains a FQN-keyed branch for `org.apache.arrow.vector.util.Text` (decoded via `toString()`). Arrow's MapVector / StructVector emit values as `Text` (a UTF-8 byte-buffer wrapper that does NOT implement `CharSequence`), which none of the typed `instanceof` branches recognized. Without this branch, any UDF returning `Map<Utf8, Utf8>` through the analytics-engine route surfaces as `ExpressionEvaluationException: unsupported object class org.apache.arrow.vector.util.Text`. Matching by FQN keeps `core/` free of an Arrow dependency.

2. **`integ-test/.../CalcitePPLSpathCommandIT.java`**: refactor `init()` to pre-create each of the four test indices (`test_spath`, `test_spath_auto`, `test_spath_cmd`, `test_spath_null`) through `TestUtils.createIndexByRestClient` before the per-doc PUTs. The mapping argument is `null`, so dynamic mapping still infers the doc fields from the subsequent PUTs. Without an explicit createIndex, the first PUT auto-creates the index and bypasses the `tests.analytics.parquet_indices=true` parquet injection (the toggle only fires inside `TestUtils.createIndexByRestClient`). The analytics-engine fragment driver then fails with `UnsupportedOperationException: acquireReader is not supported in EngineBackedIndexer` for any test that reaches the runtime. `testSimpleSpath` was the only test affected before this PR (the other 15 failed earlier at the planner capability check). Each creation is guarded by `TestUtils.isIndexExist`, so the cluster-reuse pattern between `@Test` methods keeps working. No change for the v2 / Calcite path (the helper is a no-op for non-parquet runs).

Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent 1efb6c3 commit c4d4a8c

2 files changed

Lines changed: 79 additions & 32 deletions

core/src/main/java/org/opensearch/sql/data/model/ExprValueUtils.java

Lines changed: 23 additions & 0 deletions
@@ -111,6 +111,17 @@ public static ExprValue nullValue() {
     return ExprNullValue.of();
   }
 
+  private static final String ARROW_TEXT_CLASS_NAME = "org.apache.arrow.vector.util.Text";
+
+  /**
+   * Whether {@code o} is an Arrow {@code Text} (the UTF-8 byte-buffer wrapper that arrow's Map /
+   * Struct / List vectors emit for string values). FQN match keeps {@code core/} free of an Arrow
+   * dependency.
+   */
+  private static boolean isArrowText(Object o) {
+    return o != null && ARROW_TEXT_CLASS_NAME.equals(o.getClass().getName());
+  }
+
   /** Construct ExprValue from Object. */
   public static ExprValue fromObjectValue(Object o) {
     if (null == o) {
@@ -143,6 +154,18 @@ public static ExprValue fromObjectValue(Object o) {
       return new ExprDoubleValue(d);
     } else if (o instanceof String) {
       return stringValue((String) o);
+    } else if (isArrowText(o)) {
+      // Arrow MapVector / StructVector yields values as
+      // `org.apache.arrow.vector.util.Text`, a UTF-8 byte-buffer wrapper that
+      // does NOT implement CharSequence and therefore wouldn't match any of the
+      // typed branches above. `Text.toString()` decodes to a real Java String.
+      // Matched by FQN rather than instanceof so `core/` doesn't acquire an
+      // Arrow dependency for one type-system bridge. Without this branch the
+      // analytics-engine route surfaces `ExpressionEvaluationException:
+      // unsupported object class org.apache.arrow.vector.util.Text` from any
+      // UDF returning Map<Utf8, Utf8> (first such UDF is `json_extract_all`
+      // powering PPL `spath`).
+      return stringValue(o.toString());
     } else if (o instanceof Float f) {
       if (!Float.isFinite(f)) {
         return LITERAL_NULL;
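
The class-name match in the hunk above can be tried in isolation. The following is a minimal sketch, not the project's code: `Utf8Wrapper` is a hypothetical stand-in for Arrow's `Text` so the example runs without Arrow on the classpath, and `convert` only imitates the shape of the `fromObjectValue` branch chain.

```java
import java.nio.charset.StandardCharsets;

public class FqnMatchSketch {
  /** Hypothetical stand-in for Arrow's Text: wraps UTF-8 bytes and is not a CharSequence. */
  static final class Utf8Wrapper {
    private final byte[] bytes;

    Utf8Wrapper(String s) {
      this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String toString() {
      return new String(bytes, StandardCharsets.UTF_8);
    }
  }

  // The commit uses the literal "org.apache.arrow.vector.util.Text" here.
  // This sketch targets the stand-in class so it stays self-contained.
  static final String TARGET_CLASS_NAME = Utf8Wrapper.class.getName();

  /** True only when the runtime class name equals the target name exactly. */
  static boolean isTarget(Object o) {
    return o != null && TARGET_CLASS_NAME.equals(o.getClass().getName());
  }

  /** Imitates a typed branch chain: String first, then the name-matched wrapper. */
  static String convert(Object o) {
    if (o instanceof String) {
      return "string:" + (String) o;
    } else if (isTarget(o)) {
      // No cast is needed: toString() is available on Object.
      return "string:" + o.toString();
    }
    throw new IllegalStateException(
        "unsupported object class " + (o == null ? "null" : o.getClass().getName()));
  }

  public static void main(String[] args) {
    System.out.println(convert("plain"));
    System.out.println(convert(new Utf8Wrapper("from-wrapper")));
  }
}
```

One trade-off of this approach: an exact name comparison does not match subclasses the way `instanceof` would, so it only covers the one concrete class named in the constant.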

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java

Lines changed: 56 additions & 32 deletions
@@ -15,9 +15,17 @@
 import org.json.JSONObject;
 import org.junit.jupiter.api.Test;
 import org.opensearch.client.Request;
+import org.opensearch.sql.legacy.TestUtils;
 import org.opensearch.sql.ppl.PPLIntegTestCase;
 
 public class CalcitePPLSpathCommandIT extends PPLIntegTestCase {
+  // Pre-create each test index through TestUtils.createIndexByRestClient so the
+  // analytics-engine compatibility run (tests.analytics.parquet_indices=true)
+  // provisions them as parquet-backed composite stores. Raw `PUT /<idx>/_doc/N`
+  // bypasses the helper, yielding a Lucene-only index that DataFusion cannot
+  // acquireReader on (`UnsupportedOperationException: acquireReader is not
+  // supported in EngineBackedIndexer`). Mapping passed as null; dynamic
+  // mapping infers the doc fields from the subsequent PUTs.
   @Override
   public void init() throws Exception {
     super.init();
@@ -26,48 +34,64 @@ public void init() throws Exception {
     loadIndex(Index.BANK);
 
     // Simple JSON docs for path-based extraction
-    Request request1 = new Request("PUT", "/test_spath/_doc/1?refresh=true");
-    request1.setJsonEntity("{\"doc\": \"{\\\"n\\\": 1}\"}");
-    client().performRequest(request1);
+    if (!TestUtils.isIndexExist(client(), "test_spath")) {
+      TestUtils.createIndexByRestClient(client(), "test_spath", null);
 
-    Request request2 = new Request("PUT", "/test_spath/_doc/2?refresh=true");
-    request2.setJsonEntity("{\"doc\": \"{\\\"n\\\": 2}\"}");
-    client().performRequest(request2);
+      Request request1 = new Request("PUT", "/test_spath/_doc/1?refresh=true");
+      request1.setJsonEntity("{\"doc\": \"{\\\"n\\\": 1}\"}");
+      client().performRequest(request1);
 
-    Request request3 = new Request("PUT", "/test_spath/_doc/3?refresh=true");
-    request3.setJsonEntity("{\"doc\": \"{\\\"n\\\": 3}\"}");
-    client().performRequest(request3);
+      Request request2 = new Request("PUT", "/test_spath/_doc/2?refresh=true");
+      request2.setJsonEntity("{\"doc\": \"{\\\"n\\\": 2}\"}");
+      client().performRequest(request2);
+
+      Request request3 = new Request("PUT", "/test_spath/_doc/3?refresh=true");
+      request3.setJsonEntity("{\"doc\": \"{\\\"n\\\": 3}\"}");
+      client().performRequest(request3);
+    }
 
     // Auto-extract mode: flatten rules and edge cases (empty, malformed)
-    Request autoExtractDoc = new Request("PUT", "/test_spath_auto/_doc/1?refresh=true");
-    autoExtractDoc.setJsonEntity(
-        "{\"nested_doc\": \"{\\\"user\\\":{\\\"name\\\":\\\"John\\\"}}\","
-            + " \"array_doc\": \"{\\\"tags\\\":[\\\"java\\\",\\\"sql\\\"]}\","
-            + " \"merge_doc\": \"{\\\"a\\\":{\\\"b\\\":1},\\\"a.b\\\":2}\","
-            + " \"stringify_doc\": \"{\\\"n\\\":30,\\\"b\\\":true,\\\"x\\\":null}\","
-            + " \"empty_doc\": \"{}\","
-            + " \"malformed_doc\": \"{\\\"user\\\":{\\\"name\\\":\"}");
-    client().performRequest(autoExtractDoc);
+    if (!TestUtils.isIndexExist(client(), "test_spath_auto")) {
+      TestUtils.createIndexByRestClient(client(), "test_spath_auto", null);
+
+      Request autoExtractDoc = new Request("PUT", "/test_spath_auto/_doc/1?refresh=true");
+      autoExtractDoc.setJsonEntity(
+          "{\"nested_doc\": \"{\\\"user\\\":{\\\"name\\\":\\\"John\\\"}}\","
+              + " \"array_doc\": \"{\\\"tags\\\":[\\\"java\\\",\\\"sql\\\"]}\","
+              + " \"merge_doc\": \"{\\\"a\\\":{\\\"b\\\":1},\\\"a.b\\\":2}\","
+              + " \"stringify_doc\": \"{\\\"n\\\":30,\\\"b\\\":true,\\\"x\\\":null}\","
+              + " \"empty_doc\": \"{}\","
+              + " \"malformed_doc\": \"{\\\"user\\\":{\\\"name\\\":\"}");
+      client().performRequest(autoExtractDoc);
+    }
 
     // Auto-extract mode: 2-doc index for spath + command (eval/where/stats/sort) tests
-    Request cmdDoc1 = new Request("PUT", "/test_spath_cmd/_doc/1?refresh=true");
-    cmdDoc1.setJsonEntity(
-        "{\"doc\": \"{\\\"user\\\":{\\\"name\\\":\\\"John\\\",\\\"age\\\":30}}\"}");
-    client().performRequest(cmdDoc1);
+    if (!TestUtils.isIndexExist(client(), "test_spath_cmd")) {
+      TestUtils.createIndexByRestClient(client(), "test_spath_cmd", null);
+
+      Request cmdDoc1 = new Request("PUT", "/test_spath_cmd/_doc/1?refresh=true");
+      cmdDoc1.setJsonEntity(
+          "{\"doc\": \"{\\\"user\\\":{\\\"name\\\":\\\"John\\\",\\\"age\\\":30}}\"}");
+      client().performRequest(cmdDoc1);
 
-    Request cmdDoc2 = new Request("PUT", "/test_spath_cmd/_doc/2?refresh=true");
-    cmdDoc2.setJsonEntity(
-        "{\"doc\": \"{\\\"user\\\":{\\\"name\\\":\\\"Alice\\\",\\\"age\\\":25}}\"}");
-    client().performRequest(cmdDoc2);
+      Request cmdDoc2 = new Request("PUT", "/test_spath_cmd/_doc/2?refresh=true");
+      cmdDoc2.setJsonEntity(
+          "{\"doc\": \"{\\\"user\\\":{\\\"name\\\":\\\"Alice\\\",\\\"age\\\":25}}\"}");
+      client().performRequest(cmdDoc2);
+    }
 
     // Auto-extract mode: null input handling (doc 1 establishes mapping, doc 2 has null)
-    Request nullDoc1 = new Request("PUT", "/test_spath_null/_doc/1?refresh=true");
-    nullDoc1.setJsonEntity("{\"doc\": \"{\\\"n\\\": 1}\"}");
-    client().performRequest(nullDoc1);
+    if (!TestUtils.isIndexExist(client(), "test_spath_null")) {
+      TestUtils.createIndexByRestClient(client(), "test_spath_null", null);
+
+      Request nullDoc1 = new Request("PUT", "/test_spath_null/_doc/1?refresh=true");
+      nullDoc1.setJsonEntity("{\"doc\": \"{\\\"n\\\": 1}\"}");
+      client().performRequest(nullDoc1);
 
-    Request nullDoc2 = new Request("PUT", "/test_spath_null/_doc/2?refresh=true");
-    nullDoc2.setJsonEntity("{\"doc\": null}");
-    client().performRequest(nullDoc2);
+      Request nullDoc2 = new Request("PUT", "/test_spath_null/_doc/2?refresh=true");
+      nullDoc2.setJsonEntity("{\"doc\": null}");
+      client().performRequest(nullDoc2);
+    }
   }
 
   @Test
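
The exists-then-create guard used in `init()` can be illustrated without a cluster. This is a sketch under stated assumptions: `FakeCluster` and its methods are hypothetical in-memory stand-ins, not OpenSearch or `TestUtils` APIs, and the sketch models only the control flow (create once, load once), not parquet provisioning.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IdempotentInitSketch {
  /** Hypothetical in-memory stand-in for a cluster; not an OpenSearch API. */
  static final class FakeCluster {
    final Map<String, Map<String, String>> indices = new LinkedHashMap<>();
    int createCalls = 0;

    boolean indexExists(String name) {
      return indices.containsKey(name);
    }

    void createIndex(String name) {
      // Like a real cluster, refuse to create an index that already exists.
      if (indices.containsKey(name)) {
        throw new IllegalStateException("index already exists: " + name);
      }
      createCalls++;
      indices.put(name, new LinkedHashMap<>());
    }

    void putDoc(String index, String id, String json) {
      // Imitates auto-creation on first write when no explicit create happened.
      indices.computeIfAbsent(index, k -> new LinkedHashMap<>()).put(id, json);
    }
  }

  /** Safe to call before every test: the guard skips creation and loading on reuse. */
  static void init(FakeCluster cluster) {
    if (!cluster.indexExists("test_spath")) {
      cluster.createIndex("test_spath");
      cluster.putDoc("test_spath", "1", "{\"doc\": \"{\\\"n\\\": 1}\"}");
      cluster.putDoc("test_spath", "2", "{\"doc\": \"{\\\"n\\\": 2}\"}");
    }
  }

  public static void main(String[] args) {
    FakeCluster cluster = new FakeCluster();
    init(cluster); // first call creates the index and loads the docs
    init(cluster); // second call is a no-op because the index already exists
    System.out.println(
        cluster.createCalls + " create call(s), "
            + cluster.indices.get("test_spath").size() + " doc(s)");
  }
}
```

Because the document PUTs sit inside the guard, a run that finds the index present but partially loaded would not reload the missing documents. That is acceptable when the index and its documents are always created together, as in the sketch.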
