Commit 7be7783

docs: drop experimental language from regex compatibility guide
Remove the remaining experimental/disabled-by-default framing from regex.md so the Java engine reads as a normal, supported regex engine gated behind spark.comet.exec.scalaUDF.codegen.enabled.
1 parent 57c471c commit 7be7783

1 file changed

Lines changed: 7 additions & 8 deletions

docs/source/user-guide/latest/compatibility/regex.md

@@ -20,7 +20,7 @@ under the License.
 # Regular Expressions
 
 Comet provides two regexp engines for evaluating regular expressions: a **Rust engine** that uses the Rust
-[`regex`] crate natively, and an experimental **Java engine** that runs Spark's own `doGenCode` for the
+[`regex`] crate natively, and a **Java engine** that runs Spark's own `doGenCode` for the
 expression inside Comet's Arrow-direct codegen dispatcher (the same dispatcher used by Comet's
 `ScalaUDF` codegen path). The engine is selected with `spark.comet.exec.regexp.engine`, which accepts:
 
@@ -33,8 +33,8 @@ expression inside Comet's Arrow-direct codegen dispatcher (the same dispatcher u
 `regexp_extract_all`, `regexp_instr`) fall through to the Java engine so users still get Comet
 acceleration with full Spark semantics.
 
-The codegen dispatcher is experimental and disabled by default. With pure defaults
-(`engine=java`, `scalaUDF.codegen.enabled=false`), all regex expressions fall back to Spark.
+With `engine=java` and `scalaUDF.codegen.enabled=true`, all regex expressions run on the Comet
+path with full Spark compatibility.
 
 ## Disabling Comet for individual regex expressions
 
@@ -54,7 +54,7 @@ the engine selector:
 
 ## Choosing an engine
 
-| | Rust engine | Java engine (experimental, default) |
+| | Rust engine | Java engine (default) |
 | -------------------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
 | **Compatibility** | Differs from Java regex (see below) | 100% compatible with Spark |
 | **Feature coverage** | `rlike`, `regexp_replace`, `split` natively; `regexp_extract`, `regexp_extract_all`, `regexp_instr` via fallthrough | All regexp expressions (`rlike`, `regexp_extract`, `regexp_extract_all`, `regexp_instr`, `regexp_replace`, `split`) |
@@ -65,9 +65,8 @@ The **Rust engine** is faster but cannot match Java regex semantics for every pa
 choice is itself the opt-in, setting `spark.comet.exec.regexp.engine=rust` declares acceptance of those
 differences without a separate per-expression flag.
 
-The **Java engine** is the default but the underlying codegen dispatcher is experimental and gated behind
-`spark.comet.exec.scalaUDF.codegen.enabled=true`; the behavior, configuration, and supported expressions
-may change in future releases.
+The **Java engine** is the default and is gated behind `spark.comet.exec.scalaUDF.codegen.enabled`
+so the codegen dispatcher can be disabled globally without changing the regex engine selector.
 
 ## Why the engines differ
 
@@ -129,7 +128,7 @@ shape and want to avoid the JNI overhead of the Java engine, switching to the Ru
 `allowIncompatible=true` is generally safe.
 
 For anything that uses backreferences, lookaround, or relies on Java's specific Unicode or line-handling
-defaults, use the experimental Java engine.
+defaults, use the Java engine.
 
 [`java.util.regex`]: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
 [`regex`]: https://docs.rs/regex/latest/regex/
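
For readers who want to try the two settings this change documents, here is a minimal sketch that builds `--conf` arguments for `spark-submit`. The configuration keys `spark.comet.exec.regexp.engine` and `spark.comet.exec.scalaUDF.codegen.enabled` come from the diff above. The helper name `regex_engine_conf` is hypothetical, and the accepted engine values (`rust`, `java`) are inferred from the `engine=rust` and `engine=java` mentions in the guide, not from a full list of options.

```python
def regex_engine_conf(engine="java", codegen_enabled=True):
    """Build spark-submit --conf arguments for Comet's regex engine settings.

    Hypothetical helper for illustration only; it is not part of Comet.
    """
    # Assumption: only these two engine names are valid, as mentioned in the guide.
    if engine not in ("rust", "java"):
        raise ValueError(f"unknown regex engine: {engine!r}")

    settings = {
        # Selects the regex engine.
        "spark.comet.exec.regexp.engine": engine,
        # Gates the codegen dispatcher that the Java engine runs on.
        "spark.comet.exec.scalaUDF.codegen.enabled": str(codegen_enabled).lower(),
    }

    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the arguments on one line so they can be pasted after `spark-submit`.
    print(" ".join(regex_engine_conf()))
```

The helper only assembles strings; whether a given combination is accelerated by Comet depends on the behavior described in the guide itself.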
