Commit 5dd98b5
Enable PPL parse command on the analytics-engine route
Wire PPL `parse <field> '<regex>'` through PPL → Calcite → Substrait → DataFusion.
The command lowers to one `ITEM(PARSE(input, regex, "regex"), '<group>')` per named
group; PARSE returns `map<utf8, utf8>` of all named groups, ITEM extracts each
value. Both UDFs sit on the Rust side of the analytics backend.
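As a rough illustration of that lowering (not the commit's Rust code), the sketch below models PARSE as a function returning a map of every named group and ITEM as a plain map lookup. The class and method names are hypothetical, group names are discovered with a simplified scan of the pattern text, and mapping a non-participating group to `""` is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: names here are hypothetical, not taken from the commit.
final class ParseLoweringSketch {
    // Simplified scan for named-group declarations such as (?<host>...).
    // It does not handle escaped parentheses; good enough for a sketch.
    private static final Pattern GROUP_NAME =
        Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Analogue of PARSE(input, regex, "regex"): every named group mapped to its value.
    // A row that does not match the whole pattern yields "" for every group.
    static Map<String, String> parse(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean matched = m.matches(); // full-match semantics, as in Matcher.matches()
        Map<String, String> out = new LinkedHashMap<>();
        Matcher names = GROUP_NAME.matcher(regex);
        while (names.find()) {
            String name = names.group(1);
            String value = matched ? m.group(name) : null;
            // Assumption: a group that did not participate is reported as "".
            out.put(name, value == null ? "" : value);
        }
        return out;
    }

    // Analogue of ITEM(map, '<group>'): extract one value from the map.
    static String item(Map<String, String> groups, String key) {
        return groups.get(key);
    }

    public static void main(String[] args) {
        String regex = "(?<user>[^@]+)@(?<host>.+)";
        Map<String, String> groups = parse("alice@example.com", regex);
        // One ITEM call per named group, mirroring the lowering described above.
        System.out.println(item(groups, "user"));
        System.out.println(item(groups, "host"));
    }
}
```

Computing the map once and projecting each group out of it keeps the per-group expressions trivial, which is presumably why the lowering emits one ITEM per group over a shared PARSE call.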
Highlights:
* New Rust UDFs `parse` and `item`
(`sandbox/plugins/analytics-backend-datafusion/rust/src/udf/{parse,item}.rs`).
`parse` anchors the user pattern with `^(?:…)$` to honour the Java
`Matcher.matches()` semantics that the legacy `RegexExpression` relies on, so a
row that doesn't match consumes nothing and every named group yields `""`, the
same observable behaviour as the legacy path.
* `ParseAdapter` validates the pattern + method operands as non-null string
literals at plan time and gates the method to `regex` (rejects `grok` /
`patterns` with a clear error pointing users at the legacy engine until those
methods land).
* `FieldType.MAP`, `ScalarFunction.PARSE`, `ScalarFunction.ITEM`,
`STANDARD_PROJECT_OPS` (ITEM is added; PARSE is registered separately for
`FieldType.MAP` because no real OS mapping is a map and we don't want every
scalar registering against the MAP bucket), `FunctionMappings.s` entries for
`parse` and `item`, and YAML extension declarations.
* Codec MapVector handling at three sites (`ArrowValues`, `DatafusionResultStream`,
`RowResponseCodec`): `MapVector.getObject()` builds a `JsonStringHashMap`
whose `<clinit>` references jackson-datatype-jsr310, which is not on the
arrow-flight-rpc parent plugin's classloader, so each site reads the
offset buffer + key/value sub-vectors directly.
* `session_context::create_session_context` now calls `udf::register_all`. The
`executeWithContextAsync` fragment path was the only SessionContext creator
that wasn't registering OpenSearch UDFs, so any analytics query through that
path (the production fragment route) failed with "Unsupported function name".
Pre-existing UDFs (`convert_tz`, `to_unixtime`) shared this gap silently
because no IT exercised them through the same path.
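The direct MapVector read mentioned in the codec highlight can be pictured with plain arrays. This is a minimal sketch assuming the usual Arrow map layout (row `i` owns the entries in `[offsets[i], offsets[i+1])` of the flattened key and value children). It deliberately uses no Arrow classes, ignores validity bitmaps, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: plain arrays stand in for the Arrow offset buffer
// and the key/value sub-vectors; null handling is omitted.
final class MapOffsetsSketch {
    // offsets has one more element than there are rows.
    static List<Map<String, String>> readRows(int[] offsets, String[] keys, String[] values) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int row = 0; row + 1 < offsets.length; row++) {
            Map<String, String> entries = new LinkedHashMap<>();
            // Walk this row's slice of the flattened key/value children.
            for (int i = offsets[row]; i < offsets[row + 1]; i++) {
                entries.put(keys[i], values[i]);
            }
            rows.add(entries);
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] offsets = {0, 2, 2, 3}; // three rows; the middle one is empty
        String[] keys = {"user", "host", "user"};
        String[] values = {"alice", "example.com", "bob"};
        System.out.println(readRows(offsets, keys, values));
    }
}
```

Building an ordinary map this way avoids touching `JsonStringHashMap` at all, which is the point of the workaround described above.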
`grok` and `patterns` parse methods are deliberately left on the legacy engine.
The Rust UDF rejects them with an explicit message; future onboardings will be
deliberate flips rather than silent semantics changes.
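The method gate can be sketched as a small validation step. This is an assumption-laden illustration, not the actual `ParseAdapter` or Rust UDF code: the class name, method name, exact-match comparison, and error wording are all hypothetical.

```java
// Illustrative sketch only: names and messages are hypothetical.
final class ParseMethodGateSketch {
    // Accepts only the "regex" method; anything else fails loudly instead of
    // silently running with different semantics.
    static void requireRegexMethod(String method) {
        if (method == null) {
            throw new IllegalArgumentException("parse method must be a non-null string literal");
        }
        if (!"regex".equals(method)) {
            throw new IllegalArgumentException(
                "parse method '" + method + "' is not supported on the analytics engine yet; "
                    + "use the legacy engine");
        }
    }

    public static void main(String[] args) {
        requireRegexMethod("regex"); // passes
        try {
            requireRegexMethod("grok");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```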
Verified end-to-end via `CalciteParseCommandIT` under
`tests.analytics.force_routing=true`: 7/7 passing (was 4/7 before; only the
testParseError* tests passed, because they throw at AST builder time before
reaching the analytics planner). The +3 delta covers `testParseCommand`,
`testParseCommandReplaceOriginalField`, and `testParseCommandWithOtherRunTimeFields`.
Sandbox QA `ParseCommandIT` (8/8) covers the same code paths against the
analytics path directly without depending on the SQL plugin worktree.
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
1 parent 6aa070b commit 5dd98b5
15 files changed
Lines changed: 1432 additions & 8 deletions
File tree
- sandbox
  - libs/analytics-framework/src/main/java/org/opensearch/analytics/spi
  - plugins
    - analytics-backend-datafusion
      - rust
        - src
          - udf
      - src
        - main
          - java/org/opensearch/be/datafusion
          - resources
        - test/java/org/opensearch/be/datafusion
    - analytics-engine/src/main/java/org/opensearch/analytics/exec
  - qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa
Lines changed: 11 additions & 1 deletion
Lines changed: 9 additions & 0 deletions
Lines changed: 1 addition & 1 deletion
Lines changed: 5 additions & 4 deletions