[SPARK-57793][CONNECT] Add PathAwareChannelBuilder for SparkConnect python client #56933
hiboyang wants to merge 3 commits into
…ython client
Support URL path in the Spark Connect connection string, e.g. `sc://host1/path1:15002`, so a Kubernetes Ingress can route by URL path to the right Spark Connect driver endpoint. Adds `PathAwareChannelBuilder`, which parses an ingress path prefix from the connection string and prepends it to every gRPC method via an interceptor.
See apache#56816 and https://issues.apache.org/jira/browse/SPARK-57793
Generated-by: Claude Code
Review feedback
Two issues in
1. IPv6 endpoints are rejected (
… methods
The gRPC interceptor methods lacked type annotations, failing mypy's `disallow_untyped_defs` check (`[no-untyped-def]`).
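For readers unfamiliar with the shape of such an interceptor, the following is a minimal, fully annotated sketch of a path-prefix interceptor. It is not the code from this PR: the names `CallDetails` and `PathPrefixInterceptor` are hypothetical, and the sketch deliberately avoids importing `grpc` so that it runs standalone. With grpcio installed, the class would presumably subclass `grpc.UnaryUnaryClientInterceptor` and `grpc.UnaryStreamClientInterceptor`, and the details object would also implement `grpc.ClientCallDetails`.

```python
from typing import Any, Callable, NamedTuple, Optional, Sequence, Tuple


class CallDetails(NamedTuple):
    """Hypothetical stand-in carrying the fields of grpc.ClientCallDetails."""

    method: str
    timeout: Optional[float]
    metadata: Optional[Sequence[Tuple[str, str]]]
    credentials: Any
    wait_for_ready: Optional[bool]
    compression: Any


class PathPrefixInterceptor:
    """Prepends a fixed path prefix to the method name of each intercepted call."""

    def __init__(self, prefix: str) -> None:
        # Normalize "path1", "/path1" and "/path1/" to "/path1"; empty means no prefix.
        cleaned = prefix.strip("/")
        self._prefix = f"/{cleaned}" if cleaned else ""

    def _with_prefix(self, details: Any) -> CallDetails:
        # Copy the call details, changing only the method name.
        return CallDetails(
            method=self._prefix + details.method,
            timeout=details.timeout,
            metadata=details.metadata,
            credentials=details.credentials,
            wait_for_ready=getattr(details, "wait_for_ready", None),
            compression=getattr(details, "compression", None),
        )

    def intercept_unary_unary(
        self,
        continuation: Callable[[Any, Any], Any],
        client_call_details: Any,
        request: Any,
    ) -> Any:
        return continuation(self._with_prefix(client_call_details), request)

    def intercept_unary_stream(
        self,
        continuation: Callable[[Any, Any], Any],
        client_call_details: Any,
        request: Any,
    ) -> Any:
        return continuation(self._with_prefix(client_call_details), request)
```

Annotating every parameter and return type, including `-> None` on `__init__`, is what satisfies `disallow_untyped_defs`; `Any` is used here only because the sketch does not import the real gRPC types.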
Thanks for the comments! Will make these changes after I get initial PR check passing.
Thanks for the update. The two issues from the earlier round (IPv6 handling and the path port being lost with the trailing-slash
A few more issues from a closer pass, roughly in order of importance:
1. The 443-implies-TLS rule doesn't match the docstring, and one form gets port 443 without TLS
The docstring says "TLS is enabled implicitly when the resolved port is 443 and
2. A
3. Consider subclassing
The scheme check in
Since
4. The connection-string spec and other clients
5. Path-derived port skips validation the netloc port gets
The netloc form goes through
6.
Minor: the class docstring's doctest is malformed (the
What changes were proposed in this pull request?
This PR adds a new `PathAwareChannelBuilder` to the Spark Connect Python client to support a URL path in the Spark Connect connection string, e.g. `sc://host1:15002/path1`.
The new builder accepts both the standard connection form (`sc://host[:port][/;params]`) and the path-routed form `sc://gateway:<port>/<prefix>`. When a path prefix is present, it is prepended to every gRPC method via a client interceptor.
Related discussion: #56816
JIRA: https://issues.apache.org/jira/browse/SPARK-57793
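To make the two accepted forms concrete, here is a rough sketch of how a connection string could be split into host, port, path prefix, and parameters using only the standard library. This is an illustration under stated assumptions, not the parsing logic from this PR: `ParsedEndpoint` and `parse_connection_string` are hypothetical names, the fallback to port 15002 assumes the usual Spark Connect default, and the alternative form with the port in the final path segment (`sc://host1/path1:15002`) is not handled.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

DEFAULT_PORT = 15002  # assumption: Spark Connect's usual default port


class ParsedEndpoint(NamedTuple):
    host: str
    port: int
    prefix: str  # path prefix without surrounding slashes; "" when absent
    params: str  # raw ";"-separated parameter string; "" when absent


def parse_connection_string(url: str) -> ParsedEndpoint:
    """Split sc://host[:port][/<prefix>][/;params] into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "sc":
        raise ValueError(f"expected an sc:// URL, got {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")
    # Everything after the first ";" is treated as the parameter section.
    path, _, params = parts.path.partition(";")
    prefix = path.strip("/")
    # urlsplit strips the brackets from IPv6 hosts and raises ValueError on a bad port.
    port = parts.port if parts.port is not None else DEFAULT_PORT
    return ParsedEndpoint(parts.hostname, port, prefix, params)


print(parse_connection_string("sc://host1:15002/path1"))
print(parse_connection_string("sc://host1/;token=abc"))
```

A string without a path yields an empty `prefix`, which is the case where no interceptor would be needed.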
Why are the changes needed?
In Kubernetes, users commonly set up an Ingress to expose a Spark Connect driver endpoint behind a matching URL, e.g. `http://host1/path1` routed to the driver endpoint. The existing channel builder cannot carry a path prefix, so the Spark Connect client needs to be updated to support this path-based routing scenario.
Does this PR introduce any user-facing change?
No behavior change for existing connection strings: the existing host-based `sc://` connection strings continue to work unchanged. This PR adds a new opt-in `PathAwareChannelBuilder` that additionally supports a path inside the URL.
How was this patch tested?
Added unit tests in `python/pyspark/sql/tests/connect/test_connect_channel.py` covering path parsing, port extraction from the final path segment, and the path-prefix interceptor. Also manually tested in a local environment.
Was this patch authored or co-authored using generative AI tooling?
assisted by: Claude Code