Commit bec171e
refactor: route regex expressions through codegen dispatcher instead of hand-written UDFs
Replace the six hand-written `RegExp*UDF` / `StringSplitUDF` JVM UDF
implementations with the Arrow-direct codegen dispatcher introduced in
PR apache#4417 (`CometScalaUDF.emitJvmCodegenDispatch`). The dispatcher
Janino-compiles Spark's own `doGenCode` for the expression, so the
regex family inherits Spark-identical semantics with no per-expression
glue code.
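Spark's generated code for the regex family is built on `java.util.regex`, so running it inside the dispatcher should give the same per-row results Spark does. The plain-Scala sketch below illustrates what that means for `RegExpExtract`. It mirrors Spark's documented `regexp_extract` behavior: a null input yields null, no match yields an empty string, and an optional group that did not participate also yields an empty string. `RegexpExtractSketch` and `regexpExtract` are hypothetical names. This is not the Janino-compiled code and not Comet's dispatcher API.

```scala
import java.util.regex.Pattern

// Hypothetical illustration of regexp_extract row semantics; not Spark's generated code.
object RegexpExtractSketch {
  def regexpExtract(subject: String, regex: String, idx: Int): String =
    if (subject == null || regex == null) null // null in, null out
    else {
      val m = Pattern.compile(regex).matcher(subject)
      if (!m.find()) "" // no match: empty string, not null
      else {
        // Spark also rejects out-of-range group indexes (its error type differs).
        require(idx >= 0 && idx <= m.groupCount(), s"group index $idx out of range")
        // A group that did not participate in the match returns null from Matcher.
        Option(m.group(idx)).getOrElse("")
      }
    }
}
```

Spark's generated code typically caches the compiled `Pattern` across rows when the regex does not change. The sketch recompiles on every call for brevity.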
Changes:
- Delete `spark/src/main/scala/org/apache/comet/udf/RegExp*UDF.scala`
and `StringSplitUDF.scala`. Their behavior is now provided by
Spark's `doGenCode` running inside the dispatcher.
- Rewrite the regex serdes in `strings.scala`. Expressions with no
native Rust path (`RegExpExtract`, `RegExpExtractAll`, `RegExpInStr`)
share a new `CometRegexpCodegenOnly` base; expressions with a native
path (`RLike`, `RegExpReplace`, `StringSplit`) keep an explicit
route table where the JVM arm now delegates to
`CometScalaUDF.emitJvmCodegenDispatch`.
- Drop the `spark.comet.jvmUdf.enabled` config. The codegen dispatcher
already has its own master switch
(`spark.comet.exec.scalaUDF.codegen.enabled`); gating the regex
family on the same flag avoids two flags for the same path.
`spark.comet.exec.regexp.engine` keeps the `java`/`rust` selector
semantics, and `engine=java` now requires the codegen flag.
- Revert the native Rust additions in `jvm_udf/mod.rs` and
`jni-bridge/src/lib.rs`. The codegen dispatcher constructs Arrow
output fields JVM-side via `CometBatchKernelCodegenOutput.toFfiArrowField`,
so the list-vector field-name normalization cast is unnecessary.
- Update `CometRegExpJvmSuite`, `CometRegExpBenchmark`, the regex SQL
test fixtures, and the regex compatibility doc to reflect the new
gating.
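The routing described in the changes above can be modeled as a small decision function. This is a hypothetical sketch, not Comet's serde code. `RegexRouteSketch`, `Route`, and its cases are invented names, and expressions are identified by class-name strings for simplicity. Two behaviors are assumptions that this commit does not state:

- An unsupported combination falls back to Spark.
- The codegen-only expressions need only `spark.comet.exec.scalaUDF.codegen.enabled`, regardless of `spark.comet.exec.regexp.engine`.

```scala
// Hypothetical model of the regex routing in this commit; not Comet's actual serde code.
object RegexRouteSketch {
  sealed trait Route
  case object NativeRust extends Route    // evaluated by the native Rust path
  case object JvmCodegen extends Route    // evaluated via CometScalaUDF.emitJvmCodegenDispatch
  case object SparkFallback extends Route // ASSUMPTION: unsupported combinations fall back to Spark

  // Expressions that keep a native Rust path (explicit route table).
  private val nativeCapable = Set("RLike", "RegExpReplace", "StringSplit")
  // Expressions served only by the codegen dispatcher (CometRegexpCodegenOnly).
  private val codegenOnly = Set("RegExpExtract", "RegExpExtractAll", "RegExpInStr")

  /** `engine` models spark.comet.exec.regexp.engine; `codegenEnabled` models
    * spark.comet.exec.scalaUDF.codegen.enabled. */
  def route(expr: String, engine: String, codegenEnabled: Boolean): Route =
    if (codegenOnly.contains(expr)) {
      // ASSUMPTION: codegen-only expressions ignore the engine selector.
      if (codegenEnabled) JvmCodegen else SparkFallback
    } else if (nativeCapable.contains(expr)) {
      engine match {
        case "rust"                   => NativeRust
        case "java" if codegenEnabled => JvmCodegen // engine=java requires the codegen flag
        case _                        => SparkFallback
      }
    } else SparkFallback
}
```

For example, under these assumptions `route("RLike", "java", false)` returns `SparkFallback`, because the Java engine is only usable when the codegen flag is on.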
Test plan:
- `CometRegExpJvmSuite`: 45/45 pass (covers all six regex expressions
through the codegen dispatcher).
- `CometSqlFileTestSuite`: 289/289 pass.
- `CometStringExpressionSuite`: 33/33 pass.
- `CometCodegenSuite`: 60/60 pass.
- `cargo clippy --all-targets --workspace -- -D warnings`: clean.

Parent commit: 29428e5
19 files changed
Lines changed: 121 additions & 952 deletions
File tree
- docs/source/user-guide/latest/compatibility
- native
  - jni-bridge/src
  - spark-expr/src/jvm_udf
- spark/src
  - main/scala/org/apache/comet
    - serde
    - udf
  - test
    - resources/sql-tests/expressions/string
    - scala/org/apache
      - comet
      - spark/sql/benchmark