
Commit d58c541

fix(ci): skip Spark 4.1 Python data source probe in local SQL tests
Spark 4.1's DataSourceManager probes for Python data sources during query analysis by spawning a python3 worker. The CI amd64/rust container has no python3, so the probe is skipped there. On a developer machine that has python3, the worker can hang indefinitely because the JVM-side read has no idle timeout by default, which stalls suites such as GlobalTempViewSuite. Point PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON at a nonexistent interpreter so the probe is skipped, matching CI. Both values can be overridden by developers who want to run the Python-dependent suites.
1 parent 21ce7b6 commit d58c541

2 files changed

Lines changed: 25 additions & 0 deletions


dev/ci/spark-sql-tests/README.md

Lines changed: 9 additions & 0 deletions
@@ -78,6 +78,15 @@ PASS/FAIL summary is printed at the end.
 | `SPARK_REF` | `v4.1.1` | Git ref checked out for the Spark sources. |
 | `SBT_MEM` | `4096` | sbt heap size in MB. |
 | `LC_ALL` | `C.UTF-8` | Locale for the sbt run. Use `en_US.UTF-8` on macOS if `C.UTF-8` is unavailable. |
+| `PYSPARK_PYTHON` | a nonexistent path | Python interpreter for Spark. The default skips Spark 4.1's Python data source probe, which can hang on machines that have `python3`. Export a real interpreter to run the Python-dependent suites. |
+
+> **Note on Python:** Spark 4.1 probes for Python data sources during query
+> analysis by spawning a Python worker. The CI `amd64/rust` container has no
+> `python3`, so the probe is skipped. On a developer machine that has `python3`
+> the worker can hang indefinitely (the JVM-side read has no idle timeout),
+> stalling suites such as `GlobalTempViewSuite`. `run.sh` therefore points
+> `PYSPARK_PYTHON` / `PYSPARK_DRIVER_PYTHON` at a nonexistent path by default so
+> the probe is skipped, matching CI.
 
 ## How it works
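The note says that pointing PySpark at a nonexistent path skips the probe. A minimal shell sketch of that idea follows. It assumes the probe only starts when the configured interpreter can actually be executed; `probe_would_run` is a hypothetical helper and does not reproduce Spark's own check, which lives in its Scala code. With `python3` on `PATH`, the first candidate reports a running probe, while the fallback path from `run.sh` always reports a skip.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not Spark code: succeeds only if the given
# interpreter can actually be executed. It models the assumption that a
# missing interpreter means the Python data source probe never starts.
probe_would_run() {
  local interpreter="$1"
  case "$interpreter" in
    */*)
      # Explicit path: it must exist, be executable, and not be a directory.
      [ -x "$interpreter" ] && [ ! -d "$interpreter" ]
      ;;
    *)
      # Bare name such as python3: look it up on PATH.
      command -v "$interpreter" >/dev/null 2>&1
      ;;
  esac
}

# The fallback path run.sh exports when PYSPARK_PYTHON is unset or empty.
no_python="/nonexistent/comet-disable-python-datasources"

for candidate in python3 "$no_python"; do
  if probe_would_run "$candidate"; then
    echo "$candidate: interpreter found, probe would run"
  else
    echo "$candidate: interpreter missing, probe skipped"
  fi
done
```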

dev/ci/spark-sql-tests/run.sh

Lines changed: 16 additions & 0 deletions
@@ -48,6 +48,10 @@ Environment variables:
 SPARK_REF Git ref for the Spark sources (default: v$SPARK_VERSION).
 SBT_MEM sbt heap size in MB (default: 4096).
 LC_ALL Locale for the sbt run (default: C.UTF-8; use en_US.UTF-8 on macOS).
+PYSPARK_PYTHON Python interpreter for Spark. Defaults to a nonexistent
+path so Spark 4.1's Python data source probe is skipped
+(it can hang on machines that have python3). Export a
+real interpreter to run the Python-dependent suites.
 EOF
 }

@@ -139,13 +143,25 @@ for m in "${modules_to_run[@]}"; do
 # Stale Parquet cache workaround (mirrors spark_sql_test.yml).
 rm -rf "$maven_repo/org/apache/parquet"
 
+# Spark 4.1's DataSourceManager probes for Python data sources during query
+# analysis by spawning a Python worker. The CI amd64/rust container has no
+# python3, so the probe is skipped there. On a developer machine that does
+# have python3 (every macOS install does) the worker can hang indefinitely:
+# the JVM-side read has no idle timeout by default, so suites such as
+# GlobalTempViewSuite stall forever instead of failing fast. Point PySpark at
+# a nonexistent interpreter so the probe is skipped, matching CI. A developer
+# who wants the Python suites can export PYSPARK_PYTHON themselves.
+no_python="/nonexistent/comet-disable-python-datasources"
+
 (
 cd "$COMET_SPARK_DIR" || exit 1
 NOLINT_ON_COMPILE=true \
 ENABLE_COMET=true \
 ENABLE_COMET_ONHEAP=true \
 ENABLE_COMET_LOG_FALLBACK_REASONS=false \
 SERIAL_SBT_TESTS=1 \
+PYSPARK_DRIVER_PYTHON="${PYSPARK_DRIVER_PYTHON:-$no_python}" \
+PYSPARK_PYTHON="${PYSPARK_PYTHON:-$no_python}" \
 build/sbt -Dsbt.log.noformat=true -mem "$SBT_MEM" \
 'set Global / concurrentRestrictions := Seq(Tags.limit(Tags.ForkedTestGroup, 1))' \
 "$sbt_args"

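The override in `run.sh` comes from the `${VAR:-default}` parameter expansion on the two PySpark variables. The sketch below isolates that behavior; `resolve_pyspark_env` is a hypothetical helper that prints the values `run.sh` would pass to sbt, using the same fallback path.

```shell
#!/usr/bin/env bash
# Same fallback path run.sh uses for both PySpark variables.
no_python="/nonexistent/comet-disable-python-datasources"

# Hypothetical helper: prints the values run.sh would pass to sbt.
# ${VAR:-$no_python} keeps a non-empty exported value and falls back to
# the nonexistent path when VAR is unset or empty.
resolve_pyspark_env() {
  printf 'PYSPARK_DRIVER_PYTHON=%s\n' "${PYSPARK_DRIVER_PYTHON:-$no_python}"
  printf 'PYSPARK_PYTHON=%s\n' "${PYSPARK_PYTHON:-$no_python}"
}

# Example: override only PYSPARK_PYTHON; the driver variable is left empty.
PYSPARK_PYTHON=/usr/bin/python3 PYSPARK_DRIVER_PYTHON= resolve_pyspark_env
# prints:
# PYSPARK_DRIVER_PYTHON=/nonexistent/comet-disable-python-datasources
# PYSPARK_PYTHON=/usr/bin/python3
```

Because `:-` treats an empty value like an unset one, exporting `PYSPARK_PYTHON=` (empty) still selects the nonexistent path; only a non-empty value overrides it. The two variables fall back independently, so exporting `PYSPARK_PYTHON` alone leaves `PYSPARK_DRIVER_PYTHON` pointing at the nonexistent path.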