Commit d01f89a

Make WorkspaceClient.dbutils lazy via cached_property (#1470)
## Summary

Makes `WorkspaceClient.dbutils` a `functools.cached_property` so consumers that never read it pay no construction cost and, on Spark Connect runtimes, never touch the legacy `SparkContext` path that `databricks.sdk.runtime` materializes on import. Includes four regression tests that lock in the contract.

## Why

`WorkspaceClient.__init__` used to call `_make_dbutils(self._config)` eagerly, which on a cluster imports `databricks.sdk.runtime`. On a Spark Connect (shared-access-mode) cluster, that import materializes a legacy `SparkContext` and raises `CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT`, crashing the constructor before any API call. Downstream consumers that never touch `.dbutils` (notably `dbt-databricks` Python models) hit this for no reason; see #1463 and databricks/dbt-databricks#1252.

#1469 patches the runtime side as a defense-in-depth fallback (catch the materialization failure, fall back to `RemoteDbUtils`). This PR is the durable fix: callers that don't read `.dbutils` never trigger the build at all, sidestepping the entire code path. The first read still calls `_make_dbutils` once, lazily; subsequent reads hit the cached attribute in `__dict__` at plain-attribute speed.

## What changed

`databricks/sdk/__init__.py` (generated from updated template):

- `from functools import cached_property` added to the imports.
- The eager `self._dbutils = _make_dbutils(self._config)` line is removed from `__init__`.
- `@property def dbutils` (which returned the cached `self._dbutils`) becomes `@cached_property def dbutils` that calls `_make_dbutils(self._config)` on first access.

`_dbutils` was a private attribute with no external consumers (verified across the codebase), so removing it does not break any public surface.

`tests/test_client.py`, four new tests:

- `test_dbutils_is_a_cached_property`: descriptor type check.
- `test_workspace_client_init_does_not_build_dbutils`: spies `_make_dbutils`, constructs a `WorkspaceClient`, asserts the spy was never called.
- `test_dbutils_first_access_builds_exactly_once`: first read invokes `_make_dbutils` once (returns the spy's sentinel); second read still shows `call_count == 1` and same identity.
- `test_workspace_client_constructs_on_spark_connect_without_touching_runtime`: fakes `dbruntime` to raise `CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT` on any namespace materialization; asserts `WorkspaceClient(config=...)` succeeds and `databricks.sdk.runtime` is never imported during construction. This is the strongest evidence that the dbt-databricks failure mode is sidestepped by this change alone.

## How is this tested?

- 4/4 new tests pass locally (0.03s).
- Existing `tests/test_client.py` autospec tests untouched, still pass.
- The fourth test is the negative-space proof: it asserts `databricks.sdk.runtime` is *not* in `sys.modules` after `WorkspaceClient(config=...)`, i.e., the constructor literally does not reach for the runtime module.

NO_CHANGELOG=true
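The eager-to-lazy change described above can be illustrated with a minimal sketch. This is not the SDK's code: `Client`, `helper`, `_make_helper`, and `build_calls` are hypothetical names standing in for `WorkspaceClient`, `dbutils`, and `_make_dbutils`; only `functools.cached_property` is the real mechanism.

```python
from functools import cached_property

build_calls = []  # records every run of the (pretend) expensive builder


def _make_helper(config):
    """Hypothetical stand-in for an expensive factory such as _make_dbutils."""
    build_calls.append(config)
    return {"config": config}


class Client:
    """Toy client for illustration only; not the real WorkspaceClient."""

    def __init__(self, config):
        # Construction stores the config and builds nothing expensive.
        self._config = config

    @cached_property
    def helper(self):
        # Runs on first access only; the result is then stored in
        # self.__dict__["helper"], so later reads bypass this function.
        return _make_helper(self._config)


client = Client("cfg")
assert build_calls == []          # constructor did not build the helper

first = client.helper             # first read triggers the build
second = client.helper            # second read is served from __dict__
assert first is second
assert "helper" in client.__dict__
```

A caller that never reads `client.helper` never runs `_make_helper`, which is the property the PR relies on to keep the constructor away from the runtime import.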
1 parent c05cc6f commit d01f89a

2 files changed

Lines changed: 71 additions & 3 deletions

databricks/sdk/__init__.py

Lines changed: 6 additions & 3 deletions
Some generated files are not rendered by default.

tests/test_client.py

Lines changed: 65 additions & 0 deletions
@@ -1,3 +1,6 @@
+import functools
+import sys
+import types
 from unittest.mock import create_autospec
 
 import pytest
@@ -15,3 +18,65 @@ def test_autospec_fails_on_setting_unknown_property():
     w = create_autospec(WorkspaceClient, spec_set=True)
     with pytest.raises(AttributeError):
         w.bar = 1
+
+
+def test_dbutils_is_a_cached_property():
+    """``dbutils`` is a ``functools.cached_property`` so consumers that never read it
+    pay no build cost — and, on Spark Connect runtimes, never touch the legacy
+    ``SparkContext`` path that ``databricks.sdk.runtime`` materializes on import."""
+    descriptor = WorkspaceClient.__dict__["dbutils"]
+    assert isinstance(descriptor, functools.cached_property)
+
+
+def test_workspace_client_init_does_not_build_dbutils(config, mocker):
+    """Constructing a ``WorkspaceClient`` must not invoke ``_make_dbutils``."""
+    spy = mocker.patch("databricks.sdk._make_dbutils")
+
+    WorkspaceClient(config=config)
+
+    spy.assert_not_called()
+
+
+def test_dbutils_first_access_builds_exactly_once(config, mocker):
+    """First read of ``.dbutils`` calls ``_make_dbutils`` once; subsequent reads
+    return the cached value without re-invoking."""
+    sentinel = object()
+    spy = mocker.patch("databricks.sdk._make_dbutils", return_value=sentinel)
+    ws = WorkspaceClient(config=config)
+
+    first = ws.dbutils
+    assert spy.call_count == 1
+    assert first is sentinel
+
+    second = ws.dbutils
+    assert spy.call_count == 1  # still 1 — cached_property short-circuits via __dict__
+    assert second is sentinel
+
+
+def test_workspace_client_constructs_on_spark_connect_without_touching_runtime(monkeypatch, config):
+    """End-to-end Layer 2 win: with the lazy property, ``WorkspaceClient(config=...)``
+    on a Spark Connect cluster succeeds without ever importing
+    ``databricks.sdk.runtime`` — so the legacy ``SparkContext`` materialization that
+    raises ``CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT`` is never even attempted.
+
+    Faked ``dbruntime`` raises on any namespace materialization; if anything during
+    construction triggered ``databricks.sdk.runtime``'s import, this test would crash.
+    """
+
+    class _Initializer:
+        @staticmethod
+        def getOrCreate():
+            raise RuntimeError(
+                "[CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT] Calls to SparkContext are not "
+                "supported on a Spark Connect cluster."
+            )
+
+    fake = types.ModuleType("dbruntime")
+    fake.UserNamespaceInitializer = _Initializer
+    monkeypatch.setitem(sys.modules, "dbruntime", fake)
+    monkeypatch.delitem(sys.modules, "databricks.sdk.runtime", raising=False)
+
+    ws = WorkspaceClient(config=config)
+
+    assert ws is not None
+    assert "databricks.sdk.runtime" not in sys.modules
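The fake-module technique used by the last test can be tried outside the SDK with a small standalone sketch. Everything named here is an assumption for illustration: `fake_runtime_dep`, `get_or_create`, and `LazyClient` are hypothetical, and plain `sys.modules` assignment is used where a pytest test would normally use `monkeypatch.setitem` for automatic cleanup.

```python
import importlib
import sys
import types
from functools import cached_property

# Hypothetical module name standing in for a runtime-only dependency.
FAKE_MODULE = "fake_runtime_dep"


def _get_or_create():
    # Simulates a dependency that fails as soon as it is actually used.
    raise RuntimeError("context unavailable")


fake = types.ModuleType(FAKE_MODULE)
fake.get_or_create = _get_or_create
sys.modules[FAKE_MODULE] = fake  # imports of FAKE_MODULE now resolve to the fake


class LazyClient:
    """Toy client; the failing dependency is only reached through `helper`."""

    @cached_property
    def helper(self):
        dep = importlib.import_module(FAKE_MODULE)  # served from sys.modules
        return dep.get_or_create()


try:
    client = LazyClient()  # succeeds: construction never touches the fake
    try:
        client.helper      # first access reaches the fake and fails
    except RuntimeError as exc:
        failure = str(exc)
    else:
        failure = None
finally:
    sys.modules.pop(FAKE_MODULE, None)  # undo the global change
```

Because the property body raised, `cached_property` stores nothing in `client.__dict__`, so a later access would retry the build. The real test goes one step further than this sketch by asserting that `databricks.sdk.runtime` is absent from `sys.modules` after construction.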
