@@ -20,7 +20,7 @@ under the License.
 # Regular Expressions
 
 Comet provides two regexp engines for evaluating regular expressions: a **Rust engine** that uses the Rust
-[`regex`] crate natively, and an experimental **Java engine** that runs Spark's own `doGenCode` for the
+[`regex`] crate natively, and a **Java engine** that runs Spark's own `doGenCode` for the
 expression inside Comet's Arrow-direct codegen dispatcher (the same dispatcher used by Comet's
 `ScalaUDF` codegen path). The engine is selected with `spark.comet.exec.regexp.engine`, which accepts:
 
@@ -33,8 +33,8 @@ expression inside Comet's Arrow-direct codegen dispatcher (the same dispatcher u
   `regexp_extract_all`, `regexp_instr`) fall through to the Java engine so users still get Comet
   acceleration with full Spark semantics.
 
-The codegen dispatcher is experimental and disabled by default. With pure defaults
-(`engine=java`, `scalaUDF.codegen.enabled=false`), all regex expressions fall back to Spark.
+With `engine=java` and `scalaUDF.codegen.enabled=true`, all regex expressions run on the Comet
+path with full Spark compatibility.
 
 ## Disabling Comet for individual regex expressions
 
@@ -54,7 +54,7 @@ the engine selector:
 
 ## Choosing an engine
 
-| | Rust engine | Java engine (experimental, default) |
+| | Rust engine | Java engine (default) |
 | -------------------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
 | **Compatibility** | Differs from Java regex (see below) | 100% compatible with Spark |
 | **Feature coverage** | `rlike`, `regexp_replace`, `split` natively; `regexp_extract`, `regexp_extract_all`, `regexp_instr` via fallthrough | All regexp expressions (`rlike`, `regexp_extract`, `regexp_extract_all`, `regexp_instr`, `regexp_replace`, `split`) |
@@ -65,9 +65,8 @@ The **Rust engine** is faster but cannot match Java regex semantics for every pa
 choice is itself the opt-in, setting `spark.comet.exec.regexp.engine=rust` declares acceptance of those
 differences without a separate per-expression flag.
 
-The **Java engine** is the default but the underlying codegen dispatcher is experimental and gated behind
-`spark.comet.exec.scalaUDF.codegen.enabled=true`; the behavior, configuration, and supported expressions
-may change in future releases.
+The **Java engine** is the default and is gated behind `spark.comet.exec.scalaUDF.codegen.enabled`
+so the codegen dispatcher can be disabled globally without changing the regex engine selector.
 
 ## Why the engines differ
 
@@ -129,7 +128,7 @@ shape and want to avoid the JNI overhead of the Java engine, switching to the Ru
 `allowIncompatible=true` is generally safe.
 
 For anything that uses backreferences, lookaround, or relies on Java's specific Unicode or line-handling
-defaults, use the experimental Java engine.
+defaults, use the Java engine.
 
 [`java.util.regex`]: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
 [`regex`]: https://docs.rs/regex/latest/regex/
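
As a rough illustration of the guidance in the last hunk, the sketch below flags patterns that use backreferences or lookaround, the two syntactic features the text says call for the Java engine. `needs_java_engine` is a hypothetical helper, not part of Comet, and the check is a plain text heuristic rather than a regex parser: it can misfire on escaped backslashes or on these characters inside a character class, and a `False` result says nothing about the Unicode or line-handling differences mentioned above.

```python
import re

# Hypothetical helper (not part of Comet): a text heuristic that spots
# backreferences and lookaround in a pattern string.
_JAVA_ONLY_SYNTAX = re.compile(
    r"""
      \\[1-9]        # numeric backreference, e.g. \1
    | \\k<           # named backreference, e.g. \k<name>
    | \(\?<?[=!]     # lookahead (?= (?!  or lookbehind (?<= (?<!
    """,
    re.VERBOSE,
)


def needs_java_engine(pattern: str) -> bool:
    """Return True if the pattern appears to use backreferences or lookaround."""
    return bool(_JAVA_ONLY_SYNTAX.search(pattern))


if __name__ == "__main__":
    for p in [r"(\w)\1", r"foo(?=bar)", r"(?<!x)y", r"^[a-z]+\d*$", r"(?<name>a)b"]:
        print(f"{p!r}: needs Java engine = {needs_java_engine(p)}")
```

A check like this could run before choosing a value for `spark.comet.exec.regexp.engine`, but it is only a first filter; patterns that pass it may still behave differently on the Rust engine.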