@@ -30,19 +30,36 @@ expression inside Comet's Arrow-direct codegen dispatcher (the same dispatcher u
 - `rust` — run the Rust engine when an expression has a native implementation. Setting this is itself
   the opt-in for the semantic differences between Java and Rust regex (no separate `allowIncompatible`
   flag needed). Expressions without a native Rust implementation (`regexp_extract`,
-  `regexp_extract_all`, `regexp_instr`) fall back to Spark.
+  `regexp_extract_all`, `regexp_instr`) fall through to the Java engine so users still get Comet
+  acceleration with full Spark semantics.
 
 With pure defaults (`engine=java`, `scalaUDF.codegen.enabled=true`), all regex expressions run on
 the Comet path with full Spark compatibility.
 
+## Disabling Comet for individual regex expressions
+
+Each regex expression has a per-class `spark.comet.expression.<ClassName>.enabled` flag (default
+`true`) that disables Comet's serde for that expression and forces a Spark fallback. This is
+useful for narrowing a regression or comparing performance on a single operator without changing
+the engine selector:
+
+| Expression           | Config                                                  |
+| -------------------- | ------------------------------------------------------- |
+| `rlike`              | `spark.comet.expression.RLike.enabled=false`            |
+| `regexp_extract`     | `spark.comet.expression.RegExpExtract.enabled=false`    |
+| `regexp_extract_all` | `spark.comet.expression.RegExpExtractAll.enabled=false` |
+| `regexp_instr`       | `spark.comet.expression.RegExpInStr.enabled=false`      |
+| `regexp_replace`     | `spark.comet.expression.RegExpReplace.enabled=false`    |
+| `split`              | `spark.comet.expression.StringSplit.enabled=false`      |
+
 ## Choosing an engine
 
-|                      | Rust engine                             | Java engine (default)                                                                                               |
-| -------------------- | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
-| **Compatibility**    | Differs from Java regex (see below)     | 100% compatible with Spark                                                                                          |
-| **Feature coverage** | `rlike`, `regexp_replace`, `split` only | All regexp expressions (`rlike`, `regexp_extract`, `regexp_extract_all`, `regexp_instr`, `regexp_replace`, `split`) |
-| **Performance**      | Fully native, no JNI overhead           | One JNI round-trip per batch (Arrow vectors stay columnar)                                                          |
-| **Pattern support**  | Linear-time subset only                 | All Java regex features (backreferences, lookaround, etc.)                                                          |
+|                      | Rust engine                                                                                                         | Java engine (default)                                                                                               |
+| -------------------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
+| **Compatibility**    | Differs from Java regex (see below)                                                                                 | 100% compatible with Spark                                                                                          |
+| **Feature coverage** | `rlike`, `regexp_replace`, `split` natively; `regexp_extract`, `regexp_extract_all`, `regexp_instr` via fallthrough | All regexp expressions (`rlike`, `regexp_extract`, `regexp_extract_all`, `regexp_instr`, `regexp_replace`, `split`) |
+| **Performance**      | Fully native, no JNI overhead                                                                                       | One JNI round-trip per batch (Arrow vectors stay columnar)                                                          |
+| **Pattern support**  | Linear-time subset only                                                                                             | All Java regex features (backreferences, lookaround, etc.)                                                          |
 
 The **Rust engine** is faster but cannot match Java regex semantics for every pattern. Because the engine
 choice is itself the opt-in, setting `spark.comet.exec.regexp.engine=rust` declares acceptance of those
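Note on the per-expression flags: the table added in this hunk follows the `spark.comet.expression.<ClassName>.enabled` pattern, so the flags can be generated rather than typed by hand when bisecting a regression. Below is a minimal sketch under stated assumptions: the mapping is copied from the table in the diff, while the helper name `disable_confs` and the idea of emitting `--conf` arguments for `spark-submit` are illustrative and not part of Comet.

```python
# Hypothetical helper (not part of Comet). The SQL-function-to-class mapping
# is copied from the table in the diff above.
EXPRESSION_CLASSES = {
    "rlike": "RLike",
    "regexp_extract": "RegExpExtract",
    "regexp_extract_all": "RegExpExtractAll",
    "regexp_instr": "RegExpInStr",
    "regexp_replace": "RegExpReplace",
    "split": "StringSplit",
}


def disable_confs(*sql_functions: str) -> list[str]:
    """Build `--conf <key>=false` arguments that disable Comet for the given functions."""
    args: list[str] = []
    for name in sql_functions:
        class_name = EXPRESSION_CLASSES[name]  # raises KeyError for names not in the table
        args += ["--conf", f"spark.comet.expression.{class_name}.enabled=false"]
    return args


if __name__ == "__main__":
    # Example: force a Spark fallback for regexp_instr and split only.
    print(" ".join(disable_confs("regexp_instr", "split")))
```

Because these flags are independent of `spark.comet.exec.regexp.engine`, the same arguments should apply whichever engine is selected, which matches the stated purpose of isolating a single operator without changing the engine selector.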