
Commit 1c00e1b

test: disable Comet shuffle in pyarrow UDF pytest session
CometSparkSessionExtensions.isCometLoaded short-circuits the whole extension (it returns false, so no rules are registered) when spark.comet.exec.shuffle.enabled is true but spark.shuffle.manager is not set to Comet's shuffle manager. The pytest conftest sets only the basic Comet configs, so this guard fired and CometScanRule never ran. The plan stayed a vanilla Parquet scan, the rewrite chain never found a Comet columnar producer to match, and every [accelerated] assertion that checks for CometMapInBatch failed.

These tests do not exercise shuffle, so disable Comet shuffle in the session. Comet's scan and exec rules then run normally and the rewrite fires.

Diagnoses the wholesale PyArrow UDF failure in the Spark 4.0 CI run on #4234.
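The guard described above can be modeled as a small predicate over the session's configs. This is a hypothetical Python sketch, not Comet's code (the real check is Scala inside CometSparkSessionExtensions). The function name comet_rules_registered is made up for illustration, and the Comet shuffle manager class name is an assumption taken from Comet's documentation that should be verified against the Comet version in use.

```python
# Hypothetical Python model of the isCometLoaded shuffle guard.
# Not Comet's implementation; the real check is Scala code.

# Assumed class name of Comet's shuffle manager (verify for your Comet version).
COMET_SHUFFLE_MANAGER = (
    "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
)


def comet_rules_registered(conf: dict[str, str]) -> bool:
    """Return True if, under this model, Comet would register its rules."""
    # Per the commit, spark.comet.exec.shuffle.enabled defaults to true.
    shuffle_enabled = (
        conf.get("spark.comet.exec.shuffle.enabled", "true").strip().lower()
        == "true"
    )
    # Spark's built-in default shuffle manager is "sort".
    shuffle_manager = conf.get("spark.shuffle.manager", "sort")
    # The guard: Comet shuffle is on, but Spark is not using Comet's manager.
    if shuffle_enabled and shuffle_manager != COMET_SHUFFLE_MANAGER:
        return False  # the whole extension short-circuits; no rules registered
    return True


# The session configs before this commit: the guard fires.
before = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.comet.enabled": "true",
    "spark.comet.exec.enabled": "true",
}
# The fix in this commit: disable Comet shuffle explicitly.
after = {**before, "spark.comet.exec.shuffle.enabled": "false"}

print(comet_rules_registered(before))  # False
print(comet_rules_registered(after))   # True
```

Under this model there are two ways to satisfy the guard: turn Comet shuffle off (what this commit does) or point spark.shuffle.manager at Comet's manager. The tests take the first route because they never shuffle.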
1 parent a520321 commit 1c00e1b

1 file changed

Lines changed: 6 additions & 0 deletions


spark/src/test/resources/pyspark/test_pyarrow_udf.py

@@ -63,6 +63,12 @@ def spark():
         .config("spark.plugins", "org.apache.spark.CometPlugin")
         .config("spark.comet.enabled", "true")
         .config("spark.comet.exec.enabled", "true")
+        # spark.comet.exec.shuffle.enabled defaults to true, and
+        # CometSparkSessionExtensions.isCometLoaded refuses to register Comet's rules
+        # at all when shuffle is on but spark.shuffle.manager is not the Comet manager.
+        # These tests do not need Comet shuffle, so disable it explicitly to keep
+        # Comet's scan and exec rules active without configuring shuffle.
+        .config("spark.comet.exec.shuffle.enabled", "false")
         .config("spark.memory.offHeap.enabled", "true")
         .config("spark.memory.offHeap.size", "2g")
         .getOrCreate()
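If the fixture ever needs Comet shuffle, the session configs could be gathered in one place with an explicit switch between the two valid setups. The sketch below is hypothetical: comet_test_confs and its use_comet_shuffle flag are illustrative names, not part of the test file, and the Comet shuffle manager class name is an assumption from Comet's documentation. Enabling Comet shuffle may also need further settings depending on the Comet version.

```python
# Hypothetical helper collecting the fixture's Spark configs as a dict.

# Assumed class name of Comet's shuffle manager (verify for your Comet version).
COMET_SHUFFLE_MANAGER = (
    "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
)


def comet_test_confs(use_comet_shuffle: bool = False) -> dict[str, str]:
    """Return Spark configs for a Comet-enabled test session."""
    confs = {
        "spark.plugins": "org.apache.spark.CometPlugin",
        "spark.comet.enabled": "true",
        "spark.comet.exec.enabled": "true",
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": "2g",
    }
    if use_comet_shuffle:
        # Keep Comet shuffle on and satisfy the guard with Comet's manager.
        confs["spark.comet.exec.shuffle.enabled"] = "true"
        confs["spark.shuffle.manager"] = COMET_SHUFFLE_MANAGER
    else:
        # What this commit does: turn Comet shuffle off so the guard passes.
        confs["spark.comet.exec.shuffle.enabled"] = "false"
    return confs
```

A fixture could apply the dict by looping over it and calling builder.config(key, value) on a SparkSession.builder before getOrCreate(). Note that spark.shuffle.manager is read when the SparkContext starts, so it cannot be changed on an already-running session.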
