Fall back to remote runtime on Spark Connect when the legacy namespace is unavailable by Divyansh-db · Pull Request #1469 · databricks/databricks-sdk-py

Divyansh-db · 2026-06-09T15:07:31Z

Summary

Importing databricks.sdk.runtime on a Spark Connect runtime (e.g. shared-access-mode clusters) no longer raises CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT at import time. When the legacy user namespace cannot be materialized, the import now logs a warning and falls back to the existing Spark Connect-compatible remote implementation, so WorkspaceClient() construction succeeds on such clusters.

Fixes #1463. Carries forward @sd-db's work from #1464 (closed because fork PRs in this repo cannot run tests).

Why

WorkspaceClient.__init__ eagerly builds dbutils via _make_dbutils, which on a cluster does from databricks.sdk.runtime import dbutils. That import calls UserNamespaceInitializer.getOrCreate().get_namespace_globals(), materializing a legacy SparkContext. On a Spark Connect cluster this raises CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT — a pyspark.errors.PySparkRuntimeError, not an ImportError — so the existing except ImportError: does not catch it and the error escapes the import, crashing WorkspaceClient construction before any API call. This is what databricks/dbt-databricks#1252 hits in Python models on shared clusters.
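The handler mismatch described above can be reproduced without pyspark or a cluster. The following is a minimal sketch, not the SDK's code: `FakePySparkRuntimeError` and `materialize_namespace` are hypothetical stand-ins for `pyspark.errors.PySparkRuntimeError` and the `get_namespace_globals()` call.

```python
class FakePySparkRuntimeError(RuntimeError):
    """Hypothetical stand-in for pyspark.errors.PySparkRuntimeError."""


def materialize_namespace():
    # Simulates get_namespace_globals() failing on a Spark Connect cluster.
    raise FakePySparkRuntimeError("CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT")


def import_with_narrow_handler():
    try:
        materialize_namespace()
    except ImportError:
        # Only reached when the runtime package is missing altogether.
        return "fallback"
    return "runtime"


try:
    outcome = import_with_narrow_handler()
except FakePySparkRuntimeError as err:
    # The narrow handler did not match, so the error reached the caller.
    outcome = f"escaped: {err}"

print(outcome)
```

Because the raised type is unrelated to `ImportError`, the `except ImportError` clause never runs and the caller sees the exception.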

The existing except ImportError branch is already the Spark Connect-compatible path (it builds spark via DatabricksSession and dbutils via RemoteDbUtils), so this PR routes the materialization failure there.

A complementary follow-up — making WorkspaceClient.dbutils lazy via a cached_property so consumers that never touch it skip the build entirely — is noted in #1463 as a separate discussion since it touches generated code. Related issue #986 (off-cluster eager RemoteDbUtils auth failure) is the symmetric case and is intentionally not addressed here; the lazy-dbutils follow-up would unify both.

What changed

Behavioral changes

On a Spark Connect runtime, importing databricks.sdk.runtime now logs a WARNING and uses the remote implementation instead of raising at import time. When dbruntime is absent (off-cluster) or the namespace materializes successfully (classic runtime), behavior is unchanged.

Internal changes

databricks/sdk/runtime/__init__.py: the runtime-namespace block is restructured into a single try with sibling except ImportError (existing — "not in a classic runtime") and except Exception (new — Spark Connect / CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT, logged) clauses, plus an if not _use_runtime_namespace: guard over the existing — unchanged — OSS/remote block. The catch is intentionally broad rather than typed on PySparkRuntimeError to avoid pulling pyspark in at SDK import time just to narrow the exception type; the inline comment notes this.
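The control flow described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the actual module: `select_implementation` and the three materializer functions are hypothetical, and only the `_use_runtime_namespace` flag name and the two sibling `except` clauses come from the description.

```python
import logging

logger = logging.getLogger("runtime_sketch")  # hypothetical logger name


def select_implementation(materialize):
    """Return (implementation, is_local_implementation) for a materializer."""
    _use_runtime_namespace = False
    try:
        materialize()
        _use_runtime_namespace = True
    except ImportError:
        # Existing branch: not in a classic runtime (dbruntime is absent).
        pass
    except Exception as err:
        # New branch: kept broad so pyspark need not be imported just to
        # narrow the exception type.
        logger.warning("Runtime namespace unavailable, falling back: %s", err)

    if not _use_runtime_namespace:
        return "remote", True
    return "runtime-namespace", False


def classic_runtime():
    return {"dbutils": object()}


def off_cluster():
    raise ImportError("No module named 'dbruntime'")


def spark_connect():
    raise RuntimeError("CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT")


results = {
    "classic": select_implementation(classic_runtime),
    "off_cluster": select_implementation(off_cluster),
    "spark_connect": select_implementation(spark_connect),
}
print(results)
```

Only the Spark Connect case logs a warning; the off-cluster and classic cases behave as they did before.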

How is this tested?

New tests/test_runtime.py simulates a Spark Connect runtime by injecting a fake dbruntime whose get_namespace_globals() raises CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT, and asserts that:

- reloading databricks.sdk.runtime survives the failure and falls back (is_local_implementation is True, dbutils is not None)
- WorkspaceClient(config=…) constructs without raising (the direct reproduction of the reported failure)

Verified red→green locally. Full unit test suite (2098 tests) passes with no regressions.
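For readers unfamiliar with the injection technique, here is a minimal stdlib-only sketch of faking `dbruntime` through `sys.modules`. It is not the contents of tests/test_runtime.py: it uses `unittest.mock.patch.dict` rather than pytest's monkeypatch, `probe_runtime` is a hypothetical stand-in for the import-time block, and the `from dbruntime import UserNamespaceInitializer` path is an assumption.

```python
import sys
import types
from unittest import mock


class _FailingInitializer:
    """Fake initializer whose namespace materialization always fails."""

    @classmethod
    def getOrCreate(cls):
        return cls()

    def get_namespace_globals(self):
        raise RuntimeError("CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT")


fake_dbruntime = types.ModuleType("dbruntime")
fake_dbruntime.UserNamespaceInitializer = _FailingInitializer


def probe_runtime():
    """Hypothetical import-time probe; reports which branch was taken."""
    try:
        from dbruntime import UserNamespaceInitializer  # assumed import path

        UserNamespaceInitializer.getOrCreate().get_namespace_globals()
    except ImportError:
        return "no-runtime"
    except Exception:
        return "materialization-failed"
    return "classic"


# patch.dict restores sys.modules on exit, so the fake cannot leak.
with mock.patch.dict(sys.modules, {"dbruntime": fake_dbruntime}):
    branch = probe_runtime()

leaked = sys.modules.get("dbruntime") is fake_dbruntime
print(branch, leaked)
```

The fake module is visible to `import` only inside the `with` block, which is the same teardown guarantee a fixture would need.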

…e is unavailable

On a Databricks shared-access-mode (Spark Connect) cluster, importing databricks.sdk.runtime (which happens when WorkspaceClient.__init__ eagerly builds dbutils) materializes a legacy SparkContext via UserNamespaceInitializer.get_namespace_globals() and raises CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT (a PySparkRuntimeError, not ImportError). The surrounding 'except ImportError' does not catch it, so the error escapes the import and crashes WorkspaceClient construction.

Treat a namespace-materialization failure the same as 'not in a classic runtime': log a warning and fall back to the existing OSS/remote implementation, which is Spark Connect-compatible (DatabricksSession + RemoteDbUtils).

Fixes #1463

Signed-off-by: Shubham Dhal <shubham.dhal@databricks.com>

Add a sentence to the inline comment explaining why the catch is broad rather than typed on PySparkRuntimeError — avoids importing pyspark at SDK import time and keeps unexpected runtime-namespace errors surfaced as a warning + safe fallback instead of a constructor crash.

Use monkeypatch.setitem for the sys.modules injection (auto-teardown instead of manual save/restore), move the runtime reload into the fixture so test bodies stay focused on the assertion, inline the fake initializer, and strengthen the first assertion to an isinstance check against RemoteDbUtils, so it explicitly proves the Spark Connect fallback path was taken rather than just that some dbutils exists.

…tests

test_notebook_oauth.py caches a fake ``databricks.sdk.runtime`` directly in ``sys.modules`` without going through the import machinery, which leaves the ``runtime`` attribute on ``databricks.sdk`` unset. The previous fixture's ``import databricks.sdk.runtime`` then hit the cached fake (skipping the loader), and the follow-up ``importlib.reload(databricks.sdk.runtime)`` died with AttributeError when CI happened to run test_notebook_oauth.py first.

Drop the eager import + reload from the fixture; just delitem the stale ``sys.modules`` entry via monkeypatch so the next ``import`` in the test body triggers a fresh load (which correctly sets both ``sys.modules`` and the parent attribute).

Verified locally that the suite passes both in isolation and when ordered after test_notebook_oauth.py.
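The test-ordering failure above follows from how the import cache works, and it can be shown in isolation. This is a minimal sketch with a hypothetical package name (`fakepkg`) standing in for `databricks.sdk`; it does not touch the real SDK.

```python
import sys
import types
from unittest import mock

# Hypothetical parent package and submodule, both placed straight into
# sys.modules without going through the import machinery.
parent = types.ModuleType("fakepkg")
parent.__path__ = []  # mark the module as a package
child = types.ModuleType("fakepkg.runtime")

with mock.patch.dict(sys.modules, {"fakepkg": parent, "fakepkg.runtime": child}):
    # Served from the sys.modules cache, so no loader runs and nothing
    # binds `runtime` as an attribute on the parent package.
    import fakepkg.runtime

    parent_attr_set = hasattr(fakepkg, "runtime")
    try:
        fakepkg.runtime  # the expression a later reload(...) call would evaluate
        lookup_error = None
    except AttributeError as err:
        lookup_error = type(err).__name__

print(parent_attr_set, lookup_error)
```

The `import` statement succeeds, yet the attribute lookup on the parent fails, which matches the AttributeError the fixture hit before it reached the reload itself.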

Regenerated ``databricks/sdk/__init__.py`` with the updated template (imports ``functools.cached_property``, drops the eager ``self._dbutils = _make_dbutils(self._config)`` from ``__init__``, emits ``dbutils`` as a ``@cached_property`` that calls ``_make_dbutils`` on first access).

Adds four ``tests/test_client.py`` tests that lock in the contract:

- ``dbutils`` is a ``functools.cached_property`` descriptor on ``WorkspaceClient``.
- ``WorkspaceClient.__init__`` does not invoke ``_make_dbutils``.
- The first ``ws.dbutils`` read invokes ``_make_dbutils`` once; subsequent reads return the cached value without re-invoking.
- Constructing ``WorkspaceClient`` on a faked Spark Connect runtime (whose ``dbruntime`` raises ``CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT`` on any namespace materialization) succeeds without importing ``databricks.sdk.runtime`` at all (the durable sidestep of databricks/dbt-databricks#1252).

Complements #1469 (which catches the same failure at runtime-module import time as a defense-in-depth fallback).

Signed-off-by: Divyansh Vijayvergia <171924202+Divyansh-db@users.noreply.github.com>

github-actions · 2026-06-11T12:16:44Z

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

PR number: 1469
Commit SHA: ad5a86173022e030bd96501c11e3fcd33bee8681

Checks will be approved automatically on success.

## Summary

Makes `WorkspaceClient.dbutils` a `functools.cached_property` so consumers that never read it pay no construction cost and, on Spark Connect runtimes, never touch the legacy `SparkContext` path that `databricks.sdk.runtime` materializes on import. Includes four regression tests that lock in the contract.

## Why

`WorkspaceClient.__init__` used to call `_make_dbutils(self._config)` eagerly, which on a cluster imports `databricks.sdk.runtime`. On a Spark Connect (shared-access-mode) cluster, that import materializes a legacy `SparkContext` and raises `CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT`, crashing the constructor before any API call. Downstream consumers that never touch `.dbutils` (notably `dbt-databricks` Python models) hit this for no reason; see databricks#1463 and databricks/dbt-databricks#1252.

databricks#1469 patches the runtime side as a defense-in-depth fallback (catch the materialization failure, fall back to `RemoteDbUtils`). This PR is the durable fix: callers that don't read `.dbutils` never trigger the build at all, sidestepping the entire code path. The first read still calls `_make_dbutils` once, lazily; subsequent reads hit the cached attribute in `__dict__` at plain-attribute speed.

## What changed

`databricks/sdk/__init__.py` (generated from updated template):

- `from functools import cached_property` added to the imports.
- The eager `self._dbutils = _make_dbutils(self._config)` line is removed from `__init__`.
- `@property def dbutils` (which returned the cached `self._dbutils`) becomes `@cached_property def dbutils` that calls `_make_dbutils(self._config)` on first access.

`_dbutils` was a private attribute with no external consumers (verified across the codebase), so removing it does not break any public surface.

`tests/test_client.py`, four new tests:

- `test_dbutils_is_a_cached_property`: descriptor type check.
- `test_workspace_client_init_does_not_build_dbutils`: spies `_make_dbutils`, constructs a `WorkspaceClient`, asserts the spy was never called.
- `test_dbutils_first_access_builds_exactly_once`: first read invokes `_make_dbutils` once (returns the spy's sentinel); second read still shows `call_count == 1` and same identity.
- `test_workspace_client_constructs_on_spark_connect_without_touching_runtime`: fakes `dbruntime` to raise `CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT` on any namespace materialization; asserts `WorkspaceClient(config=...)` succeeds and `databricks.sdk.runtime` is never imported during construction. This is the strongest evidence that the dbt-databricks failure mode is sidestepped by this change alone.

## How is this tested?

- 4/4 new tests pass locally (0.03s).
- Existing `tests/test_client.py` autospec tests untouched, still pass.
- The fourth test is the negative-space proof: asserts `databricks.sdk.runtime` is *not* in `sys.modules` after `WorkspaceClient(config=...)`, i.e., the constructor literally does not reach for the runtime module.

NO_CHANGELOG=true
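The lazy-build contract described above can be illustrated with a small self-contained sketch. `LazyClient` and its `factory` parameter are hypothetical; the generated `WorkspaceClient` calls `_make_dbutils(self._config)` directly rather than taking an injected builder.

```python
from functools import cached_property
from unittest import mock


class LazyClient:
    """Hypothetical client that defers building dbutils until first read."""

    def __init__(self, config, factory):
        self._config = config
        self._factory = factory  # stands in for _make_dbutils

    @cached_property
    def dbutils(self):
        # Runs once; the result is then stored in the instance __dict__.
        return self._factory(self._config)


spy = mock.Mock(return_value="sentinel")
client = LazyClient(config={"host": "example"}, factory=spy)

calls_after_init = spy.call_count   # construction does not build dbutils
first = client.dbutils              # first read triggers the build
second = client.dbutils             # second read is served from __dict__
calls_after_reads = spy.call_count

is_descriptor = isinstance(LazyClient.__dict__["dbutils"], cached_property)
print(calls_after_init, calls_after_reads, first is second, is_descriptor)
```

A spy passed in as the builder makes the three properties checkable: no call during construction, one call on first read, and the same cached object afterwards.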

sd-db and others added 2 commits June 8, 2026 14:08

Merge branch 'main' into fix/runtime-spark-connect-import

337cc38

Divyansh-db temporarily deployed to test-trigger-is June 9, 2026 15:07 — with GitHub Actions Inactive

Divyansh-db force-pushed the fix/runtime-spark-connect-import branch from 3d8e600 to 61cc935 Compare June 9, 2026 15:14

Divyansh-db temporarily deployed to test-trigger-is June 9, 2026 15:14 — with GitHub Actions Inactive

Divyansh-db temporarily deployed to test-trigger-is June 9, 2026 15:15 — with GitHub Actions Inactive

Divyansh-db temporarily deployed to test-trigger-is June 9, 2026 15:19 — with GitHub Actions Inactive

Divyansh-db temporarily deployed to test-trigger-is June 9, 2026 16:45 — with GitHub Actions Inactive

Divyansh-db temporarily deployed to test-trigger-is June 9, 2026 16:46 — with GitHub Actions Inactive

Divyansh-db mentioned this pull request Jun 9, 2026

Fall back to remote runtime on Spark Connect when the legacy namespace is unavailable #1464

Closed

Divyansh-db requested a review from hectorcast-db June 9, 2026 18:17

Divyansh-db mentioned this pull request Jun 9, 2026

Make WorkspaceClient.dbutils lazy via cached_property #1470

Merged

hectorcast-db approved these changes Jun 10, 2026


Merge branch 'main' into fix/runtime-spark-connect-import

ad5a861

Signed-off-by: Divyansh Vijayvergia <171924202+Divyansh-db@users.noreply.github.com>

Divyansh-db temporarily deployed to test-trigger-is June 11, 2026 12:16 — with GitHub Actions Inactive

Divyansh-db enabled auto-merge June 11, 2026 12:16

Divyansh-db temporarily deployed to test-trigger-is June 11, 2026 12:17 — with GitHub Actions Inactive

Divyansh-db added this pull request to the merge queue Jun 11, 2026

Merged via the queue into main with commit 8d80062 Jun 11, 2026
14 checks passed

Divyansh-db deleted the fix/runtime-spark-connect-import branch June 11, 2026 13:02

Divyansh-db merged 6 commits into main from fix/runtime-spark-connect-import

3 participants
