Commit f718b15
[Analytics Engine] Carry array-typed cells through RowResponseCodec without JSON-stringifying
The row-oriented fragment-execution wire format (`FragmentExecutionResponse`,
used when arrow-flight streaming is disabled, which today means every
single-node test cluster) ships each cell through OpenSearch's
`writeGenericValue` / `readGenericValue`, which preserves `List` values as
`ArrayList<Object>`. On the coordinator side, `RowResponseCodec.decode` then
re-materializes the rows into a `VectorSchemaRoot` for consumers that expect
`Iterable<VectorSchemaRoot>`.
Two bugs in that re-materialization corrupted array values:
1. `inferArrowType` walked the rows for the first non-null cell and matched it
against {Long, Integer, …, CharSequence, byte[], Number}. {@code List}
wasn't in the chain, so a list cell fell through to {@code break} and the
fallback {@link ArrowType.Utf8}: every array column became a VARCHAR
column.
2. `setVectorValue` for {@link VarCharVector} called {@code value.toString()}.
For a {@code JsonStringArrayList} that returns the JSON form
{@code "[2,3,4]"}, which was then serialized as a JSON string in the
final response. Tests such as {@code testMvindexRangePositive} therefore got
the string `"[2,3,4]"` back instead of the array `[2, 3, 4]`.
Fix:
* Replace {@code inferArrowType} with {@code inferField}, which returns a
full {@link Field}. For {@code List} cells, build a list field with the
inner element type inferred from the first non-null element (with a
fallback that scans later rows in case the first list is empty or all-null).
* Add a {@code ListVector} arm to {@code setVectorValue} that delegates to
a new {@code writeListValue}. The writer bypasses {@link UnionListWriter}
entirely: it writes directly to the list's offset and validity buffers and
to the inner data vector via the inner vector's typed `setSafe`. The
writer-based API requires per-element `ArrowBuf` allocations for varchar
elements, which are easy to leak or to use after they are freed; the direct
path is simpler and avoids both classes of bug.
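The two fix bullets above can be sketched without Arrow on the classpath. This is a minimal sketch, not the commit's code: `FieldSketch`, `ListColumn`, `inferScalar`, and `writeListColumn` are hypothetical stand-ins (plain Java types in place of Arrow's `Field`, the `ListVector` buffers, and the typed child vector). They model the same two ideas: list-field inference with a fallback scan over later rows, and filling a list column as one validity bit per row plus `rowCount + 1` offsets into a flat child vector, which is the layout Arrow's list type uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListColumnSketch {
    // Hypothetical stand-in for an Arrow Field: a type name plus an optional child field.
    static final class FieldSketch {
        final String type;
        final FieldSketch child;
        FieldSketch(String type, FieldSketch child) { this.type = type; this.child = child; }
        @Override public String toString() { return child == null ? type : type + "<" + child + ">"; }
    }

    // Simplified scalar inference (not the codec's real chain).
    static String inferScalar(Object v) {
        if (v instanceof Long || v instanceof Integer) return "Int";
        if (v instanceof Double || v instanceof Float) return "FloatingPoint";
        return "Utf8"; // CharSequence and the catch-all fallback
    }

    // If the first non-null cell is a List, build a List field whose element type comes from
    // the first non-null element, scanning later rows when earlier lists are empty or all-null.
    static FieldSketch inferField(List<?> columnCells) {
        Object first = null;
        for (Object cell : columnCells) {
            if (cell != null) { first = cell; break; }
        }
        if (first == null) return new FieldSketch("Utf8", null);
        if (!(first instanceof List)) return new FieldSketch(inferScalar(first), null);
        for (Object cell : columnCells) {
            if (!(cell instanceof List)) continue;
            for (Object element : (List<?>) cell) {
                if (element != null) return new FieldSketch("List", new FieldSketch(inferScalar(element), null));
            }
        }
        return new FieldSketch("List", new FieldSketch("Utf8", null)); // no non-null element anywhere
    }

    // Hypothetical model of the buffers a direct list writer fills: one validity bit per row,
    // rowCount + 1 offsets into a flat child vector, and the child values themselves.
    static final class ListColumn {
        final boolean[] validity;
        final int[] offsets;
        final List<Object> childValues = new ArrayList<>();
        ListColumn(int rowCount) {
            validity = new boolean[rowCount];
            offsets = new int[rowCount + 1]; // offsets[0] == 0
        }
    }

    static ListColumn writeListColumn(List<? extends List<?>> cells) {
        ListColumn out = new ListColumn(cells.size());
        for (int row = 0; row < cells.size(); row++) {
            List<?> cell = cells.get(row);
            if (cell != null) {
                out.validity[row] = true;
                out.childValues.addAll(cell); // null elements stay null (child-level nulls)
            }
            // A null row gets an empty slot: its end offset equals its start offset.
            out.offsets[row + 1] = out.childValues.size();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> column = Arrays.asList(null, List.of(), Arrays.asList(null, 7L));
        System.out.println(inferField(column)); // List<Int>

        List<List<?>> cells = Arrays.asList(List.of(2L, 3L, 4L), null, List.of());
        ListColumn col = writeListColumn(cells);
        System.out.println(Arrays.toString(col.validity)); // [true, false, true]
        System.out.println(Arrays.toString(col.offsets));  // [0, 3, 3, 3]
        System.out.println(col.childValues);               // [2, 3, 4]
    }
}
```

Writing offsets and child values directly like this mirrors the commit's reasoning: there is no per-element writer state or temporary buffer to manage, only appends to flat arrays.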
A separate Arrow gotcha also surfaced once arrays started flowing
through correctly:
* {@code ListVector.getObject} for a {@code VarCharVector} child returns a
{@code JsonStringArrayList} whose elements are Arrow's {@link Text} class,
not Java {@link String}. {@code ExprValueUtils.fromObjectValue} doesn't
recognize {@code Text}, so it threw "unsupported object class
org.apache.arrow.vector.util.Text". {@code ArrowValues.toJavaValue} now
mirrors its top-level VarChar branch for list cells: when a list value
comes back from a {@code ListVector}, each {@code Text} element is
normalized to a {@link String} before the list is handed upward.
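A minimal sketch of that normalization step, assuming (as the top-level VarChar branch presumably relies on) that calling `toString()` on an Arrow `Text` yields the decoded string. To stay runnable without Arrow, `Utf8Bytes` is a hypothetical stand-in for `Text`, and `normalizeListElements` is a hypothetical helper rather than the actual `ArrowValues.toJavaValue` code; real code would test against `Text` itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListTextNormalization {
    // Hypothetical stand-in for org.apache.arrow.vector.util.Text: holds UTF-8 bytes and
    // decodes them in toString(). It is NOT the Arrow class.
    static final class Utf8Bytes {
        private final byte[] bytes;
        Utf8Bytes(String s) { this.bytes = s.getBytes(StandardCharsets.UTF_8); }
        @Override public String toString() { return new String(bytes, StandardCharsets.UTF_8); }
    }

    // Copy a list cell, replacing each Text-like element with a java.lang.String.
    // Nulls and non-text elements pass through unchanged.
    static List<Object> normalizeListElements(List<?> raw) {
        List<Object> out = new ArrayList<>(raw.size());
        for (Object element : raw) {
            out.add(element instanceof Utf8Bytes ? element.toString() : element);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> raw = Arrays.asList(new Utf8Bytes("a"), null, new Utf8Bytes("b"));
        List<Object> normalized = normalizeListElements(raw);
        System.out.println(normalized);                          // [a, null, b]
        System.out.println(normalized.get(0) instanceof String); // true
    }
}
```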
* Before: 12/60 (the mvindex range tests still showed an expected-vs-actual
diff because `[2,3,4]` came back as a JSON string, not an array).
* After: 26/60.
Newly passing:
testMvindexRangePositive, testMvindexRangeNegative, testMvindexRangeMixed,
testMvindexRangeFirstThree, testMvindexRangeLastThree,
testMvindexRangeSingleElement,
testMvdedupWithDuplicates, testMvdedupWithAllDuplicates,
testMvdedupWithNoDuplicates, testMvdedupWithStrings,
testArrayWithString,
testSplitWithSemicolonDelimiter, testSplitWithMultiCharDelimiter,
testSplitWithEmptyDelimiter.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent 27f1c99
2 files changed
Lines changed: 131 additions & 20 deletions
- sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec
- stage
Lines changed: 18 additions & 3 deletions
Lines changed: 113 additions & 17 deletions