Commit b0b0c48

Add documentation of the new feature
Signed-off-by: Federico Manuel Gomez Peter <federico.gomez@payclip.com>
1 parent 003b634 commit b0b0c48

2 files changed: +70 -1 lines changed

dbt/adapters/databricks/python_models/python_submissions.py

Lines changed: 6 additions & 1 deletion
@@ -33,7 +33,12 @@ def submit(self, compiled_code: str) -> None:
     def _prepare_code_with_notebook_scoped_packages(
         self, compiled_code: str, separator: str = NOTEBOOK_SEPARATOR
     ) -> str:
-        """Prepend notebook-scoped package installation commands to the compiled code."""
+        """
+        Prepend notebook-scoped package installation commands to the compiled code.
+
+        If notebook-scoped flag is not set, or if there are no packages to install,
+        returns the original compiled code.
+        """
         if not self.packages_config.packages or not self.packages_config.notebook_scoped:
             return compiled_code
 
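
The hunk above shows only the docstring and the guard clause of `_prepare_code_with_notebook_scoped_packages`. As a rough illustration of what "prepending installation commands" could look like, here is a minimal standalone sketch. It is not the adapter's implementation: `PackagesConfig`, the value of `NOTEBOOK_SEPARATOR`, the cell layout, and the mapping of `index_url` to pip's `--index-url` option are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed value: the real NOTEBOOK_SEPARATOR constant is not shown in this diff.
NOTEBOOK_SEPARATOR = "\n\n# COMMAND ----------\n\n"


@dataclass
class PackagesConfig:
    """Hypothetical stand-in for the adapter's packages configuration."""

    packages: List[str] = field(default_factory=list)
    notebook_scoped: bool = False
    index_url: Optional[str] = None


def prepare_code_with_notebook_scoped_packages(
    config: PackagesConfig, compiled_code: str, separator: str = NOTEBOOK_SEPARATOR
) -> str:
    # Guard clause mirroring the diff: no packages, or the flag is not set.
    if not config.packages or not config.notebook_scoped:
        return compiled_code

    pip_command = "%pip install -q " + " ".join(config.packages)
    if config.index_url:
        # Assumption: a custom index is passed through pip's --index-url option.
        pip_command += f" --index-url {config.index_url}"

    # Install step, interpreter restart, then the original model code.
    cells = [pip_command, "dbutils.library.restartPython()", compiled_code]
    return separator.join(cells)
```

With `notebook_scoped=False` or an empty package list the function returns `compiled_code` untouched, which is the behavior the new docstring describes.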

docs/workflow-job-submission.md

Lines changed: 64 additions & 0 deletions
@@ -70,6 +70,11 @@ models:
         runtime_engine: "{{ var('job_cluster_defaults.runtime_engine') }}"
         data_security_mode: "{{ var('job_cluster_defaults.data_security_mode') }}"
         autoscale: { "min_workers": 1, "max_workers": 4 }
+
+      # Python package configuration
+      packages: ["pandas", "numpy==1.24.0"]
+      index_url: "https://pypi.org/simple" # Optional custom PyPI index
+      notebook_scoped_libraries: false # Set to true for notebook-scoped installation
 ```
 
 ### Configuration
@@ -173,6 +178,65 @@ grants:
   manage: []
 ```
 
+#### Python Packages
+
+You can install Python packages for your models using the `packages` configuration. There are two ways to install packages:
+
+##### Cluster-level installation (default)
+
+By default, packages are installed at the cluster level using Databricks libraries. This is the traditional approach where packages are installed when the cluster starts.
+
+```yaml
+models:
+  - name: my_model
+    config:
+      packages: ["pandas", "numpy==1.24.0", "scikit-learn>=1.0"]
+      index_url: "https://pypi.org/simple" # Optional: custom PyPI index
+      notebook_scoped_libraries: false # Default behavior
+```
+
+**Benefits:**
+- Packages are available for the entire cluster lifecycle
+- Faster model execution (no installation overhead per run)
+
+**Limitations:**
+- Requires cluster restart to update packages
+- All tasks on the cluster share the same package versions
+
+##### Notebook-scoped installation
+
+When `notebook_scoped_libraries: true`, packages are installed at the notebook level using `%pip install` magic commands. This prepends installation commands to your compiled code.
+
+```yaml
+models:
+  - name: my_model
+    config:
+      packages: ["pandas", "numpy==1.24.0", "scikit-learn>=1.0"]
+      index_url: "https://pypi.org/simple" # Optional: custom PyPI index
+      notebook_scoped_libraries: true # Enable notebook-scoped installation
+```
+
+**Benefits:**
+- Packages are installed per model execution
+- No cluster restart required to change packages
+- Different models can use different package versions
+- Works with serverless compute and all-purpose clusters
+
+**How it works:**
+The adapter prepends the following commands to your model code:
+```python
+%pip install -q pandas numpy==1.24.0 scikit-learn>=1.0
+dbutils.library.restartPython()
+# Your model code follows...
+```
+
+**Supported submission methods:**
+- `all_purpose_cluster` (Command API)
+- `job_cluster` (Notebook Job Run)
+- `workflow_job` (Workflow Job)
+
+**Note:** For Databricks Runtime 13.0 and above, `dbutils.library.restartPython()` is automatically added after package installation to ensure packages are properly loaded.
+
 #### Post hooks
 
 It is possible to add in python hooks by using the `config.python_job_config.post_hook_tasks`
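
For the cluster-level path described in the "Python Packages" section of this hunk, the `packages` list presumably ends up as Databricks library specifications attached to the cluster or job. The sketch below shows one way such a conversion might look, using the `pypi` library shape (`package` plus optional `repo`) from the Databricks Libraries API. The helper name `to_pypi_libraries` is hypothetical, and whether the adapter builds its payload this way is an assumption.

```python
from typing import Any, Dict, List, Optional


def to_pypi_libraries(
    packages: List[str], index_url: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Hypothetical helper: map package specifiers to PyPI library specs."""
    libraries: List[Dict[str, Any]] = []
    for package in packages:
        pypi: Dict[str, Any] = {"package": package}
        if index_url:
            # Assumption: the model's index_url maps to the spec's "repo" field.
            pypi["repo"] = index_url
        libraries.append({"pypi": pypi})
    return libraries


# Example input taken from the documentation added in this commit.
libraries = to_pypi_libraries(
    ["pandas", "numpy==1.24.0"], index_url="https://pypi.org/simple"
)
```

Each entry keeps the version specifier inside the `package` string, so pinned and ranged requirements pass through unchanged.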
