
[SPARK-57787][CONNECT] Reuse a persistent local Spark Connect server for faster local startup #56907

Open
ericm-db wants to merge 2 commits into
apache:master from
ericm-db:local-connect-reuse


ericm-db commented Jun 30, 2026

Contributor

What changes were proposed in this pull request?

Adds an opt-in fast path for local Spark Connect development. Today
SparkSession.builder.remote("local[*]").getOrCreate() starts a fresh in-process Connect server in
every process (SparkSession._start_connect_server), so each run re-pays the cold start (JVM warmup
+ SparkContext + server boot).

When SPARK_LOCAL_CONNECT_REUSE=1 (or .config("spark.local.connect.reuse", "true")) is set, a
local-mode remote session instead reconnects to a persistent local Connect server, starting one on
the first run:

  1. it reads a discovery file (~/.spark/connect-local.json, overridable via
    SPARK_LOCAL_CONNECT_DISCOVERY) and reuses the recorded server if its pid is alive, its port is
    open, and its Spark version matches; otherwise
  2. it launches a detached server (pyspark/sql/connect/local_server.py), waits until it is
    reachable, and records it for the next process.

The user's code is unchanged. The first run pays the cold start once; later runs reconnect in a
fraction of a second.
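
The reuse decision in step 1 can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names and the JSON key names (`host`, `port`, `pid`, `version`) are assumptions based on the description above, and the pid check assumes POSIX semantics for `os.kill(pid, 0)`.

```python
import json
import os
import socket
from pathlib import Path
from typing import Optional


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX semantics assumed)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_reusable_server(discovery_path: Path, spark_version: str) -> Optional[dict]:
    """Return the recorded server if it passes all three checks, else None."""
    try:
        info = json.loads(discovery_path.read_text())
    except (OSError, ValueError):
        return None  # missing, unreadable, or corrupt discovery file
    if not isinstance(info, dict):
        return None
    try:
        reusable = (
            info["version"] == spark_version
            and pid_alive(int(info["pid"]))
            and port_open(info["host"], int(info["port"]))
        )
    except (KeyError, TypeError, ValueError):
        return None  # record is missing a field or has the wrong types
    return info if reusable else None
```

A `None` result corresponds to step 2: the caller would launch a detached server and record it for the next process.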

Notes:

  • Off by default; the existing in-process path is untouched. Python-only -- no protocol or Scala
    changes.
  • The server mints a stable token, written with host/port/pid/version to the discovery file (mode
    0600), which the client uses to authenticate.
  • Each run is its own Connect session, so session-local state (temp views, runtime SQL confs,
    isolated artifacts) is fresh per run; only shared SparkContext state (catalog, global temp
    views, cached data) carries across runs.
  • The server self-terminates after spark.local.connect.server.idleTimeout seconds idle (default
    3600; 0 disables).
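
As a rough sketch of the discovery-file note above, a server process could mint a token and write the record with owner-only permissions like this. The function name, the JSON key names, and the temp-file-then-rename approach are assumptions for illustration; the PR's actual `local_server.py` may differ.

```python
import json
import os
import secrets
from pathlib import Path


def write_discovery_file(path: Path, host: str, port: int, version: str) -> dict:
    """Write host/port/pid/version plus a fresh token to `path` with mode 0600."""
    record = {
        "host": host,
        "port": port,
        "pid": os.getpid(),
        "version": version,
        "token": secrets.token_urlsafe(32),  # hypothetical token format
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    # Create the file as 0600 from the start so the token is never world-readable.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
    os.chmod(tmp, 0o600)  # covers a leftover temp file with wider permissions
    os.replace(tmp, path)  # atomic swap so readers never see a partial file
    return record
```

Writing to a temporary file and renaming it means a client that reads the discovery file mid-update sees either the old record or the new one, never a truncated file.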

Open questions for the dev list:

  • the opt-in name (SPARK_LOCAL_CONNECT_REUSE / spark.local.connect.reuse);
  • whether auto-spawning a detached server from a library call is acceptable, or an explicit start
    command is preferred;
  • the idle-timeout default and mechanism.
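
To make the idle-timeout question concrete, one possible mechanism is a watchdog thread that the server "touches" on each request. This is a sketch of one option under that assumption, not a description of what the PR implements; `IdleWatchdog` and its parameters are hypothetical names.

```python
import threading
import time
from typing import Callable, Optional


class IdleWatchdog:
    """Call `on_idle` once no activity has been recorded for `idle_timeout` seconds."""

    def __init__(self, idle_timeout: float, on_idle: Callable[[], None],
                 poll_interval: float = 1.0) -> None:
        self._timeout = idle_timeout
        self._on_idle = on_idle
        self._poll = poll_interval
        self._last = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def touch(self) -> None:
        """Record activity, e.g. on every incoming request."""
        with self._lock:
            self._last = time.monotonic()

    def start(self) -> Optional[threading.Thread]:
        """Start the watchdog thread; a timeout of 0 (or less) disables it."""
        if self._timeout <= 0:
            return None
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()

    def _run(self) -> None:
        # Event.wait returns False on timeout, so this loop polls until stopped.
        while not self._stop.wait(self._poll):
            with self._lock:
                idle = time.monotonic() - self._last
            if idle >= self._timeout:
                self._on_idle()
                return
```

In a server, `on_idle` would be whatever shuts the process down cleanly, and `idle_timeout` would come from `spark.local.connect.server.idleTimeout`.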

Why are the changes needed?

Creating a local Spark session for a quick edit/run loop takes a few seconds, and that cost is
one-time-per-process -- it does not amortize across separate runs. Keeping a warm server alive and
reconnecting to it is the only way to make a repeated local dev/test loop fast. This makes that
behavior available behind a single opt-in, without changing user code or default behavior.

Does this PR introduce any user-facing change?

Only when the opt-in is enabled. With SPARK_LOCAL_CONNECT_REUSE=1 (or
spark.local.connect.reuse=true), SparkSession.builder.remote("local[*]").getOrCreate() starts a
persistent local Connect server on the first run and reconnects on later runs, instead of booting a
fresh in-process server each time. With the opt-in unset (the default), behavior is unchanged. A new
documentation section describes the feature.
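
For illustration, enabling the opt-in from a script might look like the sketch below. It assumes a PySpark build that includes this PR; on other builds the environment variable and config key are not documented here and would presumably have no effect. `get_session` is a hypothetical helper name.

```python
import os

# Opt in before the session is created (variable name from the PR description).
os.environ["SPARK_LOCAL_CONNECT_REUSE"] = "1"


def get_session():
    """Create (or reconnect to) a local Spark Connect session."""
    # Imported lazily so this module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .remote("local[*]")
        # Alternative to the environment variable, per the PR description:
        # .config("spark.local.connect.reuse", "true")
        .getOrCreate()
    )
```

With the opt-in active, the first call to `get_session()` would pay the cold start and later processes would reconnect to the same server.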

How was this patch tested?

New python/pyspark/sql/tests/connect/test_connect_local_server.py: unit tests for the discovery
and reuse-decision logic (version mismatch, dead pid, alive-and-listening, safe no-op stop), plus an
end-to-end test that starts a real detached server, confirms a second call reconnects to it (same
pid, no respawn), runs queries over two independent connections, and checks that a temp view in one
does not leak into the other.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

ericm-db added 2 commits June 30, 2026 17:46
…for faster local startup

Adds an opt-in (SPARK_LOCAL_CONNECT_REUSE / spark.local.connect.reuse) so that
`SparkSession.builder.remote("local[*]").getOrCreate()` reconnects to a persistent local
Spark Connect server -- starting a detached one on the first run and reconnecting to it on
later runs -- instead of booting a fresh in-process server in every process. The first run
pays the cold start once; later runs reconnect in a fraction of a second.

Default behavior is unchanged when the opt-in is off. No protocol or Scala changes.

Trim verbose comments/docstrings, drop redundant noqa and the private-method example from the
user docs, and consolidate exception handling. No functional change.
@gaogaotiantian

Contributor

ericm-db commented Jun 30, 2026

Contributor Author

Oh yeah, this was to address a comment that Nicholas left on the doc he linked in that thread.
