Commit 87fbb79

feat: enable JVM Scala UDF codegen dispatch by default
Flip `spark.comet.exec.scalaUDF.codegen.enabled` to default `true` so that eligible Spark `ScalaUDF` expressions are routed through Comet's Arrow-direct codegen dispatcher without requiring opt-in. The feature is no longer marked experimental. Update the Scala/Java UDF and Iceberg user guides to reflect that the dispatcher is on by default and document how to disable it.
1 parent 053080b commit 87fbb79
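
For readers who want the previous behavior, the opt-out described in the commit message could be applied per application roughly as follows. This is a minimal sketch, not part of the commit: only the config key comes from the change itself, while the object name and app name are hypothetical. It assumes Spark and Comet are already on the classpath, that Comet is enabled through the usual deployment settings, and that the job is launched with `spark-submit` so the master is supplied externally.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; only the config key below comes from the commit.
object DisableScalaUdfCodegen {
  val CodegenKey = "spark.comet.exec.scalaUDF.codegen.enabled"

  def main(args: Array[String]): Unit = {
    // Assumption: Comet itself is enabled elsewhere (for example in
    // spark-defaults.conf); this sketch only overrides the new default.
    val spark = SparkSession
      .builder()
      .appName("disable-scala-udf-codegen") // hypothetical app name
      .config(CodegenKey, "false") // operators enclosing a ScalaUDF fall back to Spark
      .getOrCreate()

    // Read the value back to confirm the session picked up the override.
    println(s"$CodegenKey = ${spark.conf.get(CodegenKey)}")

    spark.stop()
  }
}
```

Setting the key at session build time keeps the override local to one application, which may be preferable to changing a cluster-wide default while evaluating the new behavior.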

3 files changed

Lines changed: 10 additions & 10 deletions

docs/source/user-guide/latest/iceberg.md

Lines changed: 6 additions & 6 deletions
@@ -157,12 +157,12 @@ Iceberg ships several `ScalaUDF`s that surface in user queries and maintenance a
 (`INT_ORDERED_BYTES`, `LONG_ORDERED_BYTES`, ..., `INTERLEAVE_BYTES`) over the sort key columns
 during compaction.

-By default these UDFs cause the enclosing operator to fall back to Spark, which forces a
-columnar-to-row roundtrip and demotes the surrounding shuffle from `CometExchange` to
-`CometColumnarExchange`. Enabling the experimental
-[Scala UDF and Java UDF Support](scala_java_udfs.md) feature
-(`spark.comet.exec.scalaUDF.codegen.enabled=true`) routes these UDFs through native execution so
-the project, exchange, and sort operators around them stay on the Comet path end-to-end.
+[Scala UDF and Java UDF Support](scala_java_udfs.md) is enabled by default
+(`spark.comet.exec.scalaUDF.codegen.enabled=true`), so these UDFs run through native execution and
+the project, exchange, and sort operators around them stay on the Comet path end-to-end. Setting
+`spark.comet.exec.scalaUDF.codegen.enabled=false` causes the enclosing operator to fall back to
+Spark, which forces a columnar-to-row roundtrip and demotes the surrounding shuffle from
+`CometExchange` to `CometColumnarExchange`.

 ### Task input metrics

docs/source/user-guide/latest/scala_java_udfs.md

Lines changed: 2 additions & 2 deletions
@@ -23,13 +23,13 @@ Comet executes Spark's Scala and Java [scalar user-defined functions (UDFs)](htt

 This page covers Spark's `ScalaUDF` (Scala `udf(...)`, `spark.udf.register(...)` over Scala or Java functional interfaces, and SQL `CREATE FUNCTION ... AS 'com.example.MyUDF'`). Other UDF kinds (Python / Pandas, Hive, aggregate) are out of scope and continue to fall back to Spark.

-This feature is experimental and disabled by default.
+This feature is enabled by default. Set `spark.comet.exec.scalaUDF.codegen.enabled` to `false` to route plans containing a `ScalaUDF` back to Spark for the enclosing operator.

 ## Configuration

 | Key | Default | Description |
 | ------------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------ |
-| `spark.comet.exec.scalaUDF.codegen.enabled` | `false` | When `true`, eligible `ScalaUDF`s run on the Comet path. When `false`, the enclosing operator falls back to Spark. |
+| `spark.comet.exec.scalaUDF.codegen.enabled` | `true` | When `true`, eligible `ScalaUDF`s run on the Comet path. When `false`, the enclosing operator falls back to Spark. |

 ## Supported

spark/src/main/scala/org/apache/comet/CometConf.scala

Lines changed: 2 additions & 2 deletions
@@ -365,13 +365,13 @@ object CometConf extends ShimCometConf {
   val COMET_SCALA_UDF_CODEGEN_ENABLED: ConfigEntry[Boolean] =
     conf("spark.comet.exec.scalaUDF.codegen.enabled")
       .category(CATEGORY_EXEC)
-      .doc("Experimental. Whether to route Spark `ScalaUDF` expressions through Comet's " +
+      .doc("Whether to route Spark `ScalaUDF` expressions through Comet's " +
         "Arrow-direct codegen dispatcher. When enabled, a supported ScalaUDF is compiled into " +
         "a per-batch kernel that reads and writes Arrow vectors directly from native " +
         "execution. When disabled, plans containing a ScalaUDF fall back to Spark for the " +
         "enclosing operator.")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)

   val COMET_EXEC_SHUFFLE_WITH_HASH_PARTITIONING_ENABLED: ConfigEntry[Boolean] =
     conf("spark.comet.native.shuffle.partitioning.hash.enabled")
