[SPARK-57787][CONNECT] Reuse a persistent local Spark Connect server for faster local startup by ericm-db · Pull Request #56907 · apache/spark

ericm-db · 2026-06-30T17:47:05Z

What changes were proposed in this pull request?

Adds an opt-in fast path for local Spark Connect development. Today
SparkSession.builder.remote("local[*]").getOrCreate() starts a fresh in-process Connect server in
every process (SparkSession._start_connect_server), so each run re-pays the cold start (JVM warmup
+ SparkContext + server boot).

When SPARK_LOCAL_CONNECT_REUSE=1 (or .config("spark.local.connect.reuse", "true")) is set, a
local-mode remote session instead reconnects to a persistent local Connect server, starting one on
the first run:

1. It reads a discovery file (~/.spark/connect-local.json, overridable via
   SPARK_LOCAL_CONNECT_DISCOVERY) and reuses the recorded server if its pid is alive, its port is
   open, and its Spark version matches.
2. Otherwise, it launches a detached server (pyspark/sql/connect/local_server.py), waits until it is
   reachable, and records it for the next process.
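
The reuse decision in step 1 can be sketched roughly as follows. This is an illustrative approximation, not the code in pyspark/sql/connect/local_server.py: the function names (`pid_alive`, `port_open`, `find_reusable_server`) and the JSON key names (`pid`, `host`, `port`, `version`) are assumptions made for the example, and the pid probe is POSIX-only.

```python
import json
import os
import socket
from pathlib import Path
from typing import Optional


def pid_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks for existence without delivering a signal."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_reusable_server(discovery: Path, client_version: str) -> Optional[dict]:
    """Return the recorded server if it looks reusable, else None (caller starts a new one)."""
    try:
        info = json.loads(discovery.read_text())
        pid, host, port = int(info["pid"]), str(info["host"]), int(info["port"])
    except (OSError, ValueError, KeyError, TypeError):
        return None  # missing, unreadable, or malformed discovery file
    if info.get("version") != client_version:
        return None  # the server was started by a different Spark version
    if not pid_alive(pid) or not port_open(host, port):
        return None  # stale record: the process is gone or no longer listening
    return info
```

Any failed check is treated the same way: the record is considered stale and the caller falls through to step 2.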

The user's code is unchanged. The first run pays the cold start once; later runs reconnect in a
fraction of a second.

Notes:

- Off by default; the existing in-process path is untouched. Python-only -- no protocol or Scala
  changes.
- The server mints a stable token, written with host/port/pid/version to the discovery file (mode
  0600), which the client uses to authenticate.
- Each run is its own Connect session, so session-local state (temp views, runtime SQL confs,
  isolated artifacts) is fresh per run; only shared SparkContext state (catalog, global temp
  views, cached data) carries across runs.
- The server self-terminates after spark.local.connect.server.idleTimeout seconds idle (default
  3600; 0 disables).
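
As a rough illustration of the discovery-file note above, a record with a freshly minted token could be written with owner-only permissions like this. The helper name `write_discovery`, the JSON key names, and the write-to-temp-then-rename approach are assumptions for the sketch, not details confirmed by this PR; `os.fchmod` makes it POSIX-only.

```python
import json
import os
import secrets
from pathlib import Path


def write_discovery(path: Path, host: str, port: int, version: str) -> dict:
    """Write a discovery record readable only by the owner and return it."""
    record = {
        "host": host,
        "port": port,
        "pid": os.getpid(),
        "version": version,
        "token": secrets.token_urlsafe(32),  # random secret the client presents later
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    # Create the temp file with mode 0600 so the token is never world-readable.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        os.fchmod(f.fileno(), 0o600)  # enforce the mode even if the file already existed
        json.dump(record, f)
    os.replace(tmp, path)  # atomic rename: readers never see a half-written file
    return record
```

Writing to a temporary file and renaming it means a concurrent client reads either the old record or the new one, never a partial file.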

Open questions for the dev list:

- the opt-in name (SPARK_LOCAL_CONNECT_REUSE / spark.local.connect.reuse);
- whether auto-spawning a detached server from a library call is acceptable, or an explicit start
  command is preferred;
- the idle-timeout default and mechanism.
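
For the idle-timeout question, one possible mechanism is a background watchdog that the server pokes on every request. The description does not say how the PR detects idleness, so everything here (the `IdleWatchdog` class, its `touch` method, the polling approach) is a hypothetical sketch for discussion only. It does follow the stated convention that a timeout of 0 disables the check.

```python
import threading
import time
from typing import Callable


class IdleWatchdog:
    """Calls on_idle once when no activity has been recorded for timeout_s seconds."""

    def __init__(self, timeout_s: float, on_idle: Callable[[], None], poll_s: float = 1.0):
        self._timeout_s = timeout_s
        self._on_idle = on_idle
        self._poll_s = poll_s
        self._last = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def touch(self) -> None:
        """Record activity; a server would call this whenever it handles a request."""
        with self._lock:
            self._last = time.monotonic()

    def idle_for(self) -> float:
        with self._lock:
            return time.monotonic() - self._last

    def start(self) -> None:
        if self._timeout_s <= 0:  # a timeout of 0 disables the watchdog
            return
        threading.Thread(target=self._run, daemon=True).start()

    def stop(self) -> None:
        self._stop.set()

    def _run(self) -> None:
        # Event.wait returns False on timeout, so the loop polls until stop() is called.
        while not self._stop.wait(self._poll_s):
            if self.idle_for() >= self._timeout_s:
                self._on_idle()
                return
```

In a real server, `on_idle` would trigger a clean shutdown and remove the discovery file; a monotonic clock is used so that wall-clock adjustments cannot cause a spurious shutdown.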

Why are the changes needed?

Creating a local Spark session for a quick edit/run loop takes a few seconds, and that cost is
one-time-per-process -- it does not amortize across separate runs. Keeping a warm server alive and
reconnecting to it is the only way to make a repeated local dev/test loop fast. This makes that
behavior available behind a single opt-in, without changing user code or default behavior.

Does this PR introduce any user-facing change?

Only when the opt-in is enabled. With SPARK_LOCAL_CONNECT_REUSE=1 (or
spark.local.connect.reuse=true), SparkSession.builder.remote("local[*]").getOrCreate() starts a
persistent local Connect server on the first run and reconnects on later runs, instead of booting a
fresh in-process server each time. With the opt-in unset (the default), behavior is unchanged. A new
documentation section describes the feature.

How was this patch tested?

New python/pyspark/sql/tests/connect/test_connect_local_server.py: unit tests for the discovery
and reuse-decision logic (version mismatch, dead pid, alive-and-listening, safe no-op stop), plus an
end-to-end test that starts a real detached server, confirms a second call reconnects to it (same
pid, no respawn), runs queries over two independent connections, and checks that a temp view in one
does not leak into the other.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

…for faster local startup Adds an opt-in (SPARK_LOCAL_CONNECT_REUSE / spark.local.connect.reuse) so that `SparkSession.builder.remote("local[*]").getOrCreate()` reconnects to a persistent local Spark Connect server -- starting a detached one on the first run and reconnecting to it on later runs -- instead of booting a fresh in-process server in every process. The first run pays the cold start once; later runs reconnect in a fraction of a second. Default behavior is unchanged when the opt-in is off. No protocol or Scala changes.

Trim verbose comments/docstrings, drop redundant noqa and the private-method example from the user docs, and consolidate exception handling. No functional change.

gaogaotiantian · 2026-06-30T18:55:09Z

Is this part of https://lists.apache.org/thread/sg9o2gbb3nttz74f0s01v8f167zy8ltt ?

ericm-db · 2026-06-30T19:01:03Z

Oh yeah this was to address a comment that Nicholas had left on the doc he had linked in that thread.

ericm-db added 2 commits June 30, 2026 17:46

[SPARK-57787][CONNECT] Tighten comments, docstrings and docs

7976488


ericm-db wants to merge 2 commits into apache:master from ericm-db:local-connect-reuse
