fix(gso): stop 9-judge eval SIGSEGV from concurrent Spark Connect #227
Open
hiydavid wants to merge 1 commit into
The syntax_validity scorer issued spark.sql("EXPLAIN ...") directly on the
shared Spark Connect session. MLflow runs scorers across an 8-worker thread
pool, so up to 8 threads drove that session concurrently — mutating session
state (USE CATALOG/SCHEMA) and issuing EXPLAINs over the same gRPC channel.
Spark Connect's client and the underlying gRPC/pyarrow C extensions aren't
thread-safe under that load, crashing the kernel with a native SIGSEGV
(exit 139) during Step 2b of baseline_eval. Preflight passed because it uses
Spark single-threaded; only the concurrent eval crashed, on any compute.
Route the scorer's EXPLAIN through the thread-safe SQL Warehouse Statement
Execution API (_execute_sql_via_warehouse) when a warehouse is available —
each call is an independent HTTP request with no shared session, matching the
benchmark precheck. Add a process-wide spark_serialized() lock for the
no-warehouse fallback so the shared session is never driven by two threads at
once. EXPLAIN's full catalog-aware validation is preserved.
Problem
GSO's baseline_eval task crashed the Python kernel with a fatal SIGSEGV (exit code 139, "The Python kernel is unresponsive") during Step 2b: Run 9-Judge Evaluation. It was reproducible across serverless and dedicated compute, and reducing target_benchmark_count did not help.

Root cause
The 9-judge evaluation runs scorers across an 8-worker thread pool (scorer_workers=8). Of the 9 judges, only syntax_validity touches Spark: it called spark.sql("USE CATALOG …") / spark.sql("EXPLAIN …") directly inside its scoring closure. Under the thread pool, up to 8 threads drove the shared Spark Connect session concurrently, mutating session state and issuing EXPLAINs over the same gRPC channel. Spark Connect's client and the underlying gRPC / pyarrow C extensions are not safe under that concurrency, producing a native segfault.
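To make the access pattern concrete, here is a minimal sketch of an 8-worker pool driving one shared object. FakeSharedSession and run_scorers are hypothetical stand-ins, not Spark Connect or MLflow code; the sketch only counts how many threads are inside sql() at once and does not reproduce the segfault.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeSharedSession:
    """Hypothetical stand-in for the shared session; it only counts callers."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the counters, not sql() itself
        self._active = 0
        self.peak = 0

    def sql(self, statement):
        with self._guard:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.05)  # simulate a round trip on the shared channel
        with self._guard:
            self._active -= 1
        return statement


def run_scorers(session, statements, workers=8):
    # Same shape as the eval: every worker calls into the one shared session.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(session.sql, statements))


session = FakeSharedSession()
run_scorers(session, [f"EXPLAIN SELECT {i}" for i in range(32)])
print("peak concurrent callers:", session.peak)
```

With no serialization the peak is bounded only by the pool size, which is the condition the shared session was exposed to.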
Evidence:
- Preflight, which uses Spark single-threaded, passes; only Step 2b (8-way concurrent Spark) crashes.
- … join; the watchdog did not time out, so the crash came from inside the eval.

Fix
- Route the scorer's EXPLAIN through the thread-safe SQL Warehouse Statement Execution API (_execute_sql_via_warehouse) when a warehouse_id is available: each call is an independent HTTP request with no shared session. This mirrors the existing benchmark precheck, and surfaces planning errors from the returned plan column (the warehouse EXPLAIN returns a plan rather than throwing).
- Add a process-wide spark_serialized() lock (common/spark_concurrency.py) as defense-in-depth for the no-warehouse fallback, so the shared Spark Connect session is never driven by two threads at once.
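The two items above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: explain_sql, run_on_warehouse, run_on_spark, and fake_spark are hypothetical names, the executors are injected so the sketch runs without Spark or a warehouse, and the choice of a re-entrant lock is an assumption.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

# One lock for the whole process (assumption: re-entrant, so nested use by the
# same thread cannot deadlock).
_SPARK_LOCK = threading.RLock()


@contextmanager
def spark_serialized():
    """Hold the process-wide lock while the shared Spark session is in use."""
    with _SPARK_LOCK:
        yield


def explain_sql(sql, *, warehouse_id=None, run_on_warehouse=None, run_on_spark=None):
    """Return plan text for sql, preferring the warehouse path.

    run_on_warehouse(statement, warehouse_id) and run_on_spark(statement) are
    injected callables standing in for the real executors (hypothetical).
    """
    statement = f"EXPLAIN {sql}"
    if warehouse_id and run_on_warehouse is not None:
        # Independent request per call: no shared session, so no lock needed.
        return run_on_warehouse(statement, warehouse_id)
    if run_on_spark is None:
        raise RuntimeError("no warehouse and no Spark executor available")
    with spark_serialized():
        # Fallback: only one thread at a time may touch the shared session.
        return run_on_spark(statement)


# Demo: a fake Spark executor that records how many threads are inside it.
calls = {"active": 0, "peak": 0}


def fake_spark(statement):
    calls["active"] += 1
    calls["peak"] = max(calls["peak"], calls["active"])
    time.sleep(0.01)
    calls["active"] -= 1
    return f"plan for: {statement}"


with ThreadPoolExecutor(max_workers=8) as pool:
    plans = list(pool.map(lambda q: explain_sql(q, run_on_spark=fake_spark), ["SELECT 1"] * 16))

print(calls["peak"])  # prints 1: the lock admits one caller at a time
```

The warehouse branch deliberately skips the lock, so the common path keeps the full parallelism of the thread pool; only the fallback pays the serialization cost.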
EXPLAIN's full catalog-aware validation (unresolved columns/tables/functions, not just syntax) is preserved: the warehouse path runs the same EXPLAIN.

Changes
- common/spark_concurrency.py (new): module-level lock + spark_serialized().
- scorers/syntax_validity.py: new _explain_sql() helper (warehouse-first, serialized-Spark fallback); factory gains w + warehouse_id.
- scorers/__init__.py: make_all_scorers(...) gains warehouse_id, threaded into the syntax scorer.
- harness.py: all 3 make_all_scorers call sites pass warehouse_id=resolve_warehouse_id("").

Verification
No local dev server. Verify by deploying (./scripts/deploy.sh --update) and running an Auto-Optimize pass against a test Genie Space:
- baseline_eval → Step 2b completes without the exit-139 SIGSEGV.
- syntax_validity still emits yes/no verdicts: valid SQL → yes, SQL with a bad column/function → no with the right failure type (proves the warehouse-routed EXPLAIN does real catalog-aware validation).
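Because the warehouse EXPLAIN returns a plan rather than throwing, the yes/no verdict has to come from inspecting the plan text. The PR does not show how the scorer does this; the sketch below is one possible shape, assuming the plan text carries Spark error-class names when planning fails. verdict_from_plan and the failure-type labels are hypothetical.

```python
import re

# Assumed markers: Spark error-class names that may appear in the plan text.
# The failure-type labels on the left are hypothetical, not the scorer's own.
_FAILURE_PATTERNS = [
    ("unresolved_column", re.compile(r"UNRESOLVED_COLUMN")),
    ("missing_table", re.compile(r"TABLE_OR_VIEW_NOT_FOUND")),
    ("unresolved_function", re.compile(r"UNRESOLVED_ROUTINE")),
    ("syntax_error", re.compile(r"PARSE_SYNTAX_ERROR")),
]


def verdict_from_plan(plan_text):
    """Map EXPLAIN plan text to a (verdict, failure_type) pair."""
    for failure_type, pattern in _FAILURE_PATTERNS:
        if pattern.search(plan_text):
            return "no", failure_type
    return "yes", None


good = "== Physical Plan ==\n*(1) Project [1 AS one#0]"
bad = "[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column with name `foo` cannot be resolved."
print(verdict_from_plan(good))
print(verdict_from_plan(bad))
```

A check like this only covers the offline half of the second item; the deployed run is still needed to confirm that the warehouse actually returns such text for a bad column or function.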