Commit 0f21e19

fix: address CI failures for java-regexp PR
Drop redundant interpolators in COMET_REGEXP_ENGINE doc string and remove the redundant CometConf self-import in CometStringExpressionSuite to satisfy scalafix. Switch existing rlike/regexp_replace tests to opt in via COMET_REGEXP_ENGINE=rust now that the engine selector is the gate for the Rust path, and reformat regex.md via prettier.
1 parent be487f1 commit 0f21e19

6 files changed

Lines changed: 14 additions & 15 deletions

docs/source/user-guide/latest/compatibility/regex.md

Lines changed: 6 additions & 6 deletions
@@ -36,12 +36,12 @@ The JVM UDF framework is experimental and disabled by default. With pure default
 
 ## Choosing an engine
 
-| | Rust engine | Java engine (experimental, default) |
-| -------------------- | --------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
-| **Compatibility** | Differs from Java regex (see below) | 100% compatible with Spark |
-| **Feature coverage** | `rlike`, `regexp_replace`, `split` only | All regexp expressions (`rlike`, `regexp_extract`, `regexp_extract_all`, `regexp_instr`, `regexp_replace`, `split`) |
-| **Performance** | Fully native, no JNI overhead | One JNI round-trip per batch (Arrow vectors stay columnar) |
-| **Pattern support** | Linear-time subset only | All Java regex features (backreferences, lookaround, etc.) |
+| | Rust engine | Java engine (experimental, default) |
+| -------------------- | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
+| **Compatibility** | Differs from Java regex (see below) | 100% compatible with Spark |
+| **Feature coverage** | `rlike`, `regexp_replace`, `split` only | All regexp expressions (`rlike`, `regexp_extract`, `regexp_extract_all`, `regexp_instr`, `regexp_replace`, `split`) |
+| **Performance** | Fully native, no JNI overhead | One JNI round-trip per batch (Arrow vectors stay columnar) |
+| **Pattern support** | Linear-time subset only | All Java regex features (backreferences, lookaround, etc.) |
 
 The **Rust engine** is faster but cannot match Java regex semantics for every pattern. Because the engine
 choice is itself the opt-in, setting `spark.comet.exec.regexp.engine=rust` declares acceptance of those
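
The "Pattern support" row of the table can be made concrete with a small sketch. It uses only `java.util.regex.Pattern`, the class the Java engine is described as routing through, to exercise a backreference and a lookahead. These are the kinds of constructs a linear-time engine such as Rust's `regex` crate does not accept. The object and method names are hypothetical and are not part of Comet.

```scala
import java.util.regex.Pattern

// Hypothetical demo object; not part of the Comet code base.
object JavaRegexFeatures {
  // Backreference: \1 must repeat exactly what group 1 captured.
  val doubledWord: Pattern = Pattern.compile("(\\w+) \\1")

  // Lookahead: match "foo" only when it is immediately followed by "bar".
  val fooBeforeBar: Pattern = Pattern.compile("foo(?=bar)")

  // True when the pattern matches anywhere in the input.
  def found(p: Pattern, input: String): Boolean = p.matcher(input).find()
}
```

Queries that depend on patterns like these would need the Java engine, since the table lists backreferences and lookaround under the Java column only.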

spark/src/main/scala/org/apache/comet/CometConf.scala

Lines changed: 3 additions & 3 deletions
@@ -400,15 +400,15 @@ object CometConf extends ShimCometConf {
 conf("spark.comet.exec.regexp.engine")
 .category(CATEGORY_EXEC)
 .doc(
-s"Selects the engine used to evaluate Spark regular-expression expressions. " +
+"Selects the engine used to evaluate Spark regular-expression expressions. " +
 s"`$REGEXP_ENGINE_JAVA` (default) routes through a JVM-side UDF " +
-s"(java.util.regex.Pattern) for Spark-compatible semantics, at the cost of JNI " +
+"(java.util.regex.Pattern) for Spark-compatible semantics, at the cost of JNI " +
 s"roundtrips per batch; this requires ${COMET_JVM_UDF_ENABLED.key}=true and " +
 s"falls back to Spark otherwise. `$REGEXP_ENGINE_RUST` runs the native DataFusion " +
 "regexp engine when an implementation exists; setting this is itself the opt-in " +
 "for the semantic differences between Java and Rust regex. Affected expressions: " +
 "rlike, regexp_extract, regexp_extract_all, regexp_replace, regexp_instr, and " +
-s"split (the extract/instr family has no native Rust path; they fall back to Spark " +
+"split (the extract/instr family has no native Rust path; they fall back to Spark " +
 s"under `$REGEXP_ENGINE_RUST`).")
 .stringConf
 .transform(_.toLowerCase(Locale.ROOT))
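
The interpolator change in this hunk can be shown in isolation. The sketch below assumes that `REGEXP_ENGINE_JAVA` holds the string `"java"`; the diff does not show its value, so `engineJava` is a hypothetical stand-in. It illustrates that an `s` prefix on a literal with no `$` placeholder produces the same string as the plain literal, which is why the prefix can be dropped on those lines and kept on the others.

```scala
// Hypothetical demo object; not part of CometConf.
object RedundantInterpolatorDemo {
  // Assumed value, standing in for CometConf.REGEXP_ENGINE_JAVA.
  val engineJava = "java"

  // No `$` placeholder: the `s` prefix has no effect on the result.
  val redundant: String = s"Selects the engine used to evaluate expressions. "
  val plain: String = "Selects the engine used to evaluate expressions. "

  // A `$` placeholder is present, so the `s` prefix is required here.
  val needed: String = s"`$engineJava` (default) routes through a JVM-side UDF "
}
```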

spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

Lines changed: 3 additions & 3 deletions
@@ -934,7 +934,7 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
 // add repetitive data to trigger dictionary encoding
 Range(0, 100).map(_ => "John Smith")
 withParquetFile(data.zipWithIndex, withDictionary) { file =>
-withSQLConf(CometConf.getExprAllowIncompatConfigKey("regexp") -> "true") {
+withSQLConf(CometConf.COMET_REGEXP_ENGINE.key -> CometConf.REGEXP_ENGINE_RUST) {
 spark.read.parquet(file).createOrReplaceTempView(table)
 val query = sql(s"select _2 as id, _1 rlike 'R[a-z]+s [Rr]ose' from $table")
 checkSparkAnswerAndOperator(query)
@@ -1006,7 +1006,7 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
 // "Smith$",
 "Smith\\Z",
 "Smith\\z")
-withSQLConf(CometConf.getExprAllowIncompatConfigKey("regexp") -> "true") {
+withSQLConf(CometConf.COMET_REGEXP_ENGINE.key -> CometConf.REGEXP_ENGINE_RUST) {
 patterns.foreach { pattern =>
 val query2 = sql(s"select name, '$pattern', name rlike '$pattern' from $table")
 checkSparkAnswerAndOperator(query2)
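
The pattern list in this hunk includes both `\Z` and `\z`, two anchors that are easy to confuse. As a reference point for the Java side only, the sketch below shows how `java.util.regex.Pattern` treats them on input that ends with a newline. It makes no claim about how the Rust engine handles either anchor, and the object name is hypothetical.

```scala
import java.util.regex.Pattern

// Hypothetical demo object; not part of the Comet test suites.
object EndAnchorDemo {
  // \Z matches at the end of input, or just before a final line terminator.
  val upperZ: Pattern = Pattern.compile("Smith\\Z")

  // \z matches only at the very end of input.
  val lowerZ: Pattern = Pattern.compile("Smith\\z")

  // True when the pattern matches anywhere in the input.
  def found(p: Pattern, input: String): Boolean = p.matcher(input).find()
}
```

With the input `"John Smith\n"`, `upperZ` finds a match and `lowerZ` does not; with `"John Smith"`, both match.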
@@ -1066,7 +1066,7 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
 "\\V")
 val qualifiers = Seq("", "+", "*", "?", "{1,}")
 
-withSQLConf(CometConf.getExprAllowIncompatConfigKey("regexp") -> "true") {
+withSQLConf(CometConf.COMET_REGEXP_ENGINE.key -> CometConf.REGEXP_ENGINE_RUST) {
 // testing every possible combination takes too long, so we pick some
 // random combinations
 for (_ <- 0 until 100) {
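
All three hunks in this file swap the key and value passed to `withSQLConf` while leaving the test bodies unchanged. The set-then-restore pattern that scoped helpers of this kind follow can be sketched without Spark. This is not Spark's or Comet's implementation: `ScopedConf`, `settings`, and `withConf` are hypothetical names, and the in-memory map stands in for a session's configuration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; not Spark's SQLConf.
object ScopedConf {
  val settings: mutable.Map[String, String] = mutable.Map.empty

  // Apply the given pairs, run the body, then restore the previous values,
  // even if the body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (key, _) => key -> settings.get(key) }
    pairs.foreach { case (key, value) => settings(key) = value }
    try body
    finally previous.foreach {
      case (key, Some(old)) => settings(key) = old
      case (key, None) => settings.remove(key)
    }
  }
}
```

A call such as `ScopedConf.withConf("spark.comet.exec.regexp.engine" -> "rust") { ... }` mirrors the shape of the updated tests: the engine setting applies only inside the block.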

spark/src/test/scala/org/apache/comet/CometFuzzIcebergSuite.scala

Lines changed: 1 addition & 1 deletion
@@ -133,7 +133,7 @@ class CometFuzzIcebergSuite extends CometFuzzIcebergBase {
 }
 
 test("regexp_replace") {
-withSQLConf(CometConf.getExprAllowIncompatConfigKey("regexp") -> "true") {
+withSQLConf(CometConf.COMET_REGEXP_ENGINE.key -> CometConf.REGEXP_ENGINE_RUST) {
 val df = spark.table(icebergTableName)
 // We want to make sure that the schema generator wasn't modified to accidentally omit
 // StringType, since then this test would not run any queries and silently pass.

spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala

Lines changed: 1 addition & 1 deletion
@@ -219,7 +219,7 @@ class CometFuzzTestSuite extends CometFuzzTestBase {
 }
 
 test("regexp_replace") {
-withSQLConf(CometConf.getExprAllowIncompatConfigKey("regexp") -> "true") {
+withSQLConf(CometConf.COMET_REGEXP_ENGINE.key -> CometConf.REGEXP_ENGINE_RUST) {
 val df = spark.read.parquet(filename)
 df.createOrReplaceTempView("t1")
 // We want to make sure that the schema generator wasn't modified to accidentally omit

spark/src/test/scala/org/apache/comet/CometStringExpressionSuite.scala

Lines changed: 0 additions & 1 deletion
@@ -26,7 +26,6 @@ import org.apache.spark.sql.{CometTestBase, DataFrame}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
 
-import org.apache.comet.CometConf
 import org.apache.comet.testing.{DataGenOptions, FuzzDataGenerator}
 
 class CometStringExpressionSuite extends CometTestBase {
