Genie Workbench is deployed as a Databricks App. This guide covers both supported install paths:
- Local terminal installer: `scripts/install.sh` plus `scripts/deploy.sh`
- Databricks notebook installer: `notebooks/install.py` from a Databricks Git folder
Both paths deploy the same app and provision the same core resources. The local path uses the Databricks CLI and Asset Bundles. The notebook path uses notebook-native WorkspaceClient() authentication and deploys from a generated workspace source folder.
- A Databricks workspace with:
  - Apps enabled
  - A SQL Warehouse (Serverless recommended)
  - A Unity Catalog where the GSO schema can be created
  - Lakebase Autoscaling available (optional but recommended for persistent scan history, starred spaces, and agent sessions)
  - MLflow Prompt Registry enabled (required for Auto-Optimize judge prompts)
  - Databricks Foundation Model APIs enabled for the curated Create Agent and Auto-Optimize model list
The installer creates apps, UC objects, Lakebase projects, MLflow experiments, and jobs, and grants the app service principal (SP) access to all of them. The person running the installer typically needs workspace-admin-equivalent permissions. Non-admins can install only if they hold every entitlement listed under Installer Permissions in Authentication & Permissions, which also maps each install step to the specific permission it needs.
If you are uncertain, the safest path is to have a workspace admin run the installer (or pair with you while you run it).
Additional local tooling:
- Databricks CLI v0.297.2+ (validated by preflight)
- uv - Python package manager
- Node.js ^20.19.0 or >=22.12.0 and npm
- Python 3.11+
- Network access to your configured npm registry. Databricks internal users can use `npm config set registry https://npm-proxy.dev.databricks.com/`; external users should use `npm config set registry https://registry.npmjs.org/`.
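
The version requirements above can be checked mechanically. The following is a minimal sketch of such a check, not the installer's actual preflight code; the function names `parse_version`, `node_ok`, `cli_ok`, and `python_ok` are hypothetical and only the version thresholds come from this guide.

```python
import re


def parse_version(text: str) -> tuple:
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def node_ok(version: tuple) -> bool:
    """Node.js must satisfy ^20.19.0 or >=22.12.0."""
    in_20_line = version[0] == 20 and version >= (20, 19, 0)
    return in_20_line or version >= (22, 12, 0)


def cli_ok(version: tuple) -> bool:
    """Databricks CLI must be v0.297.2 or newer."""
    return version >= (0, 297, 2)


def python_ok(version: tuple) -> bool:
    """Python must be 3.11 or newer."""
    return version[:2] >= (3, 11)
```

Note that a Node.js 21.x release fails the check even though it is newer than 20.19.0, because the caret range `^20.19.0` stays within major version 20.
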
Additional requirements:
- Repo cloned into a Databricks Git folder
- A Databricks compute session that can run `%pip install`
- No local Databricks CLI profile, Node, npm, or uv setup required
Choose one install path. Do not mix the local terminal installer and the notebook installer for the same app instance unless you know exactly which source path is being deployed.

    git clone <repo-url>
    cd databricks-genie-workbench

    databricks auth login --profile <workspace-profile>

Do NOT run `databricks bundle init`; it overwrites the project configuration.

    ./scripts/install.sh

The installer will:
- Check prerequisites (CLI version, Node, Python, npm, uv)
- Ask for your Databricks CLI profile
- Ask for catalog (auto-discovered from your workspace)
- Ask for SQL warehouse (auto-discovered)
- Optionally configure MLflow tracing (creates or links an experiment)
- Ask for app name
- Ask how to handle Lakebase: create a fresh Lakebase Autoscaling project, choose a different new name, skip persistence, or attach an existing project (advanced)
- Write `.env.deploy` with your configuration, including the default LLM endpoint
- Run `scripts/deploy.sh` to build and deploy the app
- Resolve the app's service principal
- Optionally grant the SP access to your existing Genie Spaces
Use this path when you are already working inside Databricks and do not want a local terminal, Databricks CLI profile, local Node/npm, or local uv setup.
- Clone the repo into a Databricks Git folder.
- Open `notebooks/install.py`.
- Set the notebook widgets:
| Widget | Required | Description |
|---|---|---|
| `app_name` | Yes | Databricks App name to create or update |
| `catalog` | Yes | Unity Catalog for GSO tables and artifacts |
| `warehouse_id` | Yes | SQL Warehouse ID used by the app and GSO |
| `lakebase_mode` | Yes | `create`, `existing`, or `skip` |
| `lakebase_project_name` | Conditional | Lakebase project name for `create` or `existing`; defaults to `<app-name>-lakebase` for `create` |
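
The widget rules in the table can be expressed as a small validation step. This is an illustrative sketch only, not the notebook's real code: `resolve_widgets` is a hypothetical helper, and the rule that `existing` requires an explicit project name is an assumption inferred from the "Conditional" column.

```python
VALID_LAKEBASE_MODES = {"create", "existing", "skip"}
REQUIRED_WIDGETS = ("app_name", "catalog", "warehouse_id", "lakebase_mode")


def resolve_widgets(widgets: dict) -> dict:
    """Validate widget values and fill in the documented Lakebase default."""
    values = {key: (value or "").strip() for key, value in widgets.items()}
    for name in REQUIRED_WIDGETS:
        if not values.get(name):
            raise ValueError(f"widget {name!r} is required")

    mode = values["lakebase_mode"]
    if mode not in VALID_LAKEBASE_MODES:
        raise ValueError(f"lakebase_mode must be one of {sorted(VALID_LAKEBASE_MODES)}")

    project = values.get("lakebase_project_name", "")
    if mode == "create" and not project:
        # Documented default for new projects.
        project = f"{values['app_name']}-lakebase"
    elif mode == "existing" and not project:
        # Assumption: attaching an existing project needs an explicit name.
        raise ValueError("lakebase_project_name is required for mode 'existing'")
    elif mode == "skip":
        project = ""

    values["lakebase_project_name"] = project
    return values
```

For example, `app_name="genie-workbench"` with `lakebase_mode="create"` and no project name resolves to `genie-workbench-lakebase`.
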
The notebook user must hold the same permissions listed in Authentication & Permissions — Installer Permissions. The notebook installer automatically grants visible Genie Spaces to the app service principal, so the notebook user needs CAN_MANAGE on each visible space they want the app to manage.
- Run the notebook from the top.
The notebook:
- Uses notebook-native Databricks auth via `WorkspaceClient()`
- Creates or updates the Databricks App
- Resolves the app service principal
- Generates a clean source folder under `/Workspace/Users/<you>/.genie-workbench-deploy/<app-name>/app`
- Excludes deploy-only files, docs, tests, notebooks, `scripts/`, `.git`, `.databricks`, `.env*`, `node_modules`, and `requirements.txt`
- Provisions the UC schema, volume, GSO tables, CDF, and permissions
- Provisions or attaches Lakebase when requested
- Creates or updates the `gso-optimization-job` job with the SDK/Jobs API
- Renders a patched `app.yaml` into the generated source folder
- Patches app OAuth scopes and resources
- Deploys the app from the generated source folder
- Grants the app SP access to visible Genie Spaces
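
To make the exclusion step concrete, here is a rough sketch of how a path filter for the generated source folder might look. It is not the code in `scripts.deploy_lib`; `is_excluded` is a hypothetical name, the directory names `docs`, `tests`, and `notebooks` are assumed spellings of the categories listed above, matching by name at any depth is an assumption, and the unspecified "deploy-only files" are left out.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Assumed directory names for the excluded categories listed in this guide.
EXCLUDED_DIRS = {"docs", "tests", "notebooks", "scripts", ".git", ".databricks", "node_modules"}
EXCLUDED_FILE_PATTERNS = (".env*", "requirements.txt")


def is_excluded(relative_path: str) -> bool:
    """Return True if a repo-relative path should be left out of the generated folder."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return False
    # Anything inside an excluded directory is skipped, at any depth.
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return True
    name = parts[-1]
    return name in EXCLUDED_DIRS or any(fnmatch(name, pattern) for pattern in EXCLUDED_FILE_PATTERNS)
```

With these rules, `scripts/deploy.sh`, `.env.deploy`, and `requirements.txt` are filtered out, while `app.yaml` and `frontend/dist/index.html` are kept.
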
The Git folder remains unchanged. The generated workspace folder is deployment output; do not edit it by hand. To update a notebook-installed app, pull the latest repo changes in Databricks Git and re-run notebooks/install.py from the top.
Lakebase provides persistent storage for scan history, starred spaces, and agent sessions. Without it, the app uses in-memory storage (data lost on restart).
The guided installer recommends creating a fresh Lakebase Autoscaling project
for each new app instance. It defaults to <app-name>-lakebase and, if that
name already exists, suggests a numbered fresh name instead. If you choose to
skip Lakebase, the app still deploys but history and starred spaces are stored
only in memory.
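
The naming behavior described above could be sketched as follows. The default `<app-name>-lakebase` comes from this guide; the exact numbering scheme (`-2`, `-3`, ...) and the helper name `suggest_lakebase_name` are assumptions for illustration, and the installer may number fresh names differently.

```python
def suggest_lakebase_name(app_name: str, existing_projects: set) -> str:
    """Return the default project name, or a numbered variant if it is taken."""
    base = f"{app_name}-lakebase"
    if base not in existing_projects:
        return base
    suffix = 2  # Assumed starting number for a fresh name.
    while f"{base}-{suffix}" in existing_projects:
        suffix += 1
    return f"{base}-{suffix}"
```
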
For a new or deliberately attached Lakebase project, setup is fully automated by the installer:
- Creates the Lakebase Autoscaling project via the SDK (`scripts/setup_lakebase.py`) if it does not exist
- Creates a Postgres role for the app's service principal
- Grants database permissions (CONNECT, CREATE ON DATABASE)
- Attaches the `postgres` resource to the app via the Apps API
The local terminal path runs this through `deploy.sh`. The notebook path runs the same resource flow through `scripts.deploy_lib.lakebase`. The app creates the `genie` schema and tables on first startup. Because the SP executes the DDL, it owns all objects, so no manual grants are needed.
The local terminal installer writes the project name as GENIE_LAKEBASE_INSTANCE in
.env.deploy. The notebook installer reads the Lakebase project from widgets.
If you skip Lakebase during install, set GENIE_LAKEBASE_INSTANCE later and
run ./scripts/deploy.sh --update for the local path, or set the notebook
lakebase_mode/lakebase_project_name widgets and rerun notebooks/install.py.
Attaching an existing Lakebase project is an advanced path that requires
explicit confirmation because cross-app reuse can fail on object ownership.
Note: The GRANT step requires `psycopg[binary]` in the project venv (installed by `uv sync`). If unavailable, the script prints the commands to run manually in the Lakebase SQL Editor.
Lakebase app state is tied to the Databricks App service principal that first
created the genie schema. For normal updates, keep GENIE_APP_NAME
unchanged and update through the same install path:

    ./scripts/deploy.sh --update

For notebook-installed apps, rerun `notebooks/install.py` with the same
`app_name` and Lakebase widget values.
Do not point a new app instance at a Lakebase project that already contains a
genie schema from an older app instance. A new Databricks App gets a new
service principal, so existing tables and sequences can remain owned by the
old app principal. In that state, IQ scans can fail with:

    permission denied for sequence scan_results_id_seq

If you need a new app instance, use a fresh Lakebase project name. Cross-app
Lakebase reuse is not a supported install path unless a Lakebase project owner
or workspace admin deliberately migrates ownership of the existing genie
schema, tables, and sequences.
`deploy.sh` runs these steps in order:
1. Pre-flight checks: validates tools, CLI profile, warehouse, catalog, app state
2. Build frontend: `npm ci` + `npm run build` (strict lockfile)
3. Create app: `databricks apps create` (skipped if app already exists)
4. Sync files: `databricks sync --full` + explicit `frontend/dist/` upload
5. Grant UC permissions: resolves app SP, creates GSO schema/tables, grants SP access, enables CDF
6. Set up optimization job: builds GSO wheel, uploads notebooks, creates/finds the Databricks job, grants SP CAN_MANAGE
7. Redeploy app: patches `app.yaml` with config values, configures scopes, deploys
8. Verify: checks critical files, waits for deployment to succeed
The notebook installer uses the shared scripts.deploy_lib Python library. It keeps app.yaml in the Git folder as a template and writes only the patched copy into the generated workspace source folder.
Key differences from deploy.sh:
- Auth uses notebook-native `WorkspaceClient()`, not a local CLI profile.
- App source is generated under `/Workspace/Users/<you>/.genie-workbench-deploy/<app-name>/app`.
- `requirements.txt` is intentionally excluded so Databricks Apps uses `uv sync` from `pyproject.toml` and `uv.lock`.
- The GSO job is created or reset through the SDK/Jobs API instead of `databricks bundle deploy`.
- The checked-in `app.yaml`, `databricks.yml`, `scripts/install.sh`, and `scripts/deploy.sh` are not mutated.

    ./scripts/deploy.sh                          # Full deploy
    ./scripts/deploy.sh --update                 # Code-only update (skips app creation)
    ./scripts/deploy.sh --destroy                # Tear down app and clean up jobs
    ./scripts/deploy.sh --destroy --auto-approve # Tear down without confirmation

`--update` skips step 3 (app creation). Use it for iterating on code changes after the initial deploy.
--destroy deletes:
- The Databricks App
- Runtime-created jobs
- The bundle-managed optimization job
It does not remove:
- Lakebase data (the `genie` schema in `databricks_postgres`)
- Unity Catalog schema/tables (`<catalog>.genie_space_optimizer` and its tables)
- Genie Space SP permissions granted during install
- MLflow experiments created during install
- Synced tables (if manually created)
Clean these up manually if you want a full teardown.
For the local terminal installer, set these in .env.deploy or as environment variables. For the notebook installer, the equivalent values come from notebook widgets.
| Variable | Required | Default | Description |
|---|---|---|---|
| `GENIE_WAREHOUSE_ID` | Yes | none | SQL Warehouse ID |
| `GENIE_CATALOG` | Yes | none | Unity Catalog name (needs CREATE SCHEMA) |
| `GENIE_APP_NAME` | No | `genie-workbench` | Databricks App name (unique in workspace) |
| `GENIE_DEPLOY_PROFILE` | No | `DEFAULT` | Databricks CLI profile name |
| `GENIE_LLM_MODEL` | No | `databricks-claude-sonnet-4-6` | Default LLM serving endpoint; users can override per Create Agent session or Auto-Optimize run when the model is in the curated compatibility list |
| `GENIE_LAKEBASE_INSTANCE` | No | empty | Lakebase Autoscaling project to use or create; installer defaults new installs to `<app-name>-lakebase`; keep stable for the same app, use a fresh project for a new app instance |
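
As an illustration of how the required and default values in this table interact, the sketch below parses `.env.deploy`-style text and applies the documented defaults. It is a simplified stand-in, not what `deploy.sh` does internally; `load_deploy_config` is a hypothetical helper, and it ignores process environment variables entirely.

```python
DEFAULTS = {
    "GENIE_APP_NAME": "genie-workbench",
    "GENIE_DEPLOY_PROFILE": "DEFAULT",
    "GENIE_LLM_MODEL": "databricks-claude-sonnet-4-6",
    "GENIE_LAKEBASE_INSTANCE": "",
}
REQUIRED = ("GENIE_WAREHOUSE_ID", "GENIE_CATALOG")


def load_deploy_config(env_text: str) -> dict:
    """Parse KEY=VALUE lines, apply defaults, and check required variables."""
    config = dict(DEFAULTS)
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip('"').strip("'")

    missing = [name for name in REQUIRED if not config.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    return config
```

A file that sets only `GENIE_WAREHOUSE_ID` and `GENIE_CATALOG` therefore yields an app named `genie-workbench` on the `DEFAULT` profile with no Lakebase project configured.
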
If you prefer non-interactive local terminal setup:

    cat > .env.deploy <<'EOF'
    GENIE_WAREHOUSE_ID=<your-sql-warehouse-id>
    GENIE_CATALOG=<your-catalog-name>
    GENIE_APP_NAME=genie-workbench
    GENIE_DEPLOY_PROFILE=genie-workbench
    GENIE_LLM_MODEL=databricks-claude-sonnet-4-6
    GENIE_LAKEBASE_INSTANCE=genie-workbench-lakebase
    EOF

    ./scripts/deploy.sh

The Databricks Apps platform detects `package.json` at the root and runs `npm install` then `npm run build`. To avoid cross-platform failures and redundant rebuilds:
- Root `postinstall`: No-op. It does not invoke nested npm commands during `npm install`.
- Root `build`: Checks for pre-built `frontend/dist/index.html`. If present (uploaded by `deploy.sh`), skips the rebuild. If dist is missing, runs `cd frontend && npm ci && npm run build`.
- Python deps: Use `uv sync` on the platform (because `requirements.txt` is excluded via `.databricksignore`). This gives a clean venv with SHA256-verified hashes.
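
The root `build` decision described above amounts to a single file check. The sketch below restates that logic in Python purely for illustration; the real script lives in the root `package.json` and may be written differently, and `root_build_action` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

REBUILD_COMMAND = "cd frontend && npm ci && npm run build"


def root_build_action(project_root: Path) -> Optional[str]:
    """Return the rebuild command, or None when a pre-built frontend is present."""
    prebuilt_index = project_root / "frontend" / "dist" / "index.html"
    if prebuilt_index.is_file():
        return None  # Uploaded dist exists, so the platform build is skipped.
    return REBUILD_COMMAND
```
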
All dependencies are pinned to exact versions with integrity hashes. Lock files are the source of truth.
| File | Covers | Verification |
|---|---|---|
| `uv.lock` | Root Python transitive deps | SHA256 hashes |
| `packages/genie-space-optimizer/uv.lock` | GSO Python deps | SHA256 hashes |
| `frontend/package-lock.json` | Frontend npm deps | SHA-512 integrity |
| `packages/genie-space-optimizer/package-lock.json` | GSO UI npm deps | SHA-512 integrity |

    uv lock --upgrade-package <package-name>
    uv export --frozen --no-dev --no-hashes --format requirements-txt > requirements.txt
    git add uv.lock requirements.txt

Do not edit `requirements.txt` manually. It is generated from `uv.lock`.

    cd frontend
    npm install <package>@<new-version>
    # Update package.json to exact version (remove ^ prefix)
    git add package.json package-lock.json

Committed npm lockfiles must stay registry-neutral. Keep `omit-lockfile-registry-resolved=true` in project `.npmrc` files so future updates do not commit private registry hosts. Public `registry.npmjs.org` lockfile URLs are safe because npm can rewrite them to the configured registry; configure private/public npm registry hosts in user or global npm config only.
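
One way to check the registry-neutral rule before committing is to scan a lockfile for `resolved` URLs that point anywhere other than the public registry. The following is a hedged sketch rather than part of the project's tooling; `non_neutral_entries` is a hypothetical helper, and it assumes a lockfile with a top-level `packages` map (lockfile version 2 or 3).

```python
import json
from urllib.parse import urlparse

ALLOWED_HOSTS = {"registry.npmjs.org"}


def non_neutral_entries(lockfile_text: str) -> list:
    """Return (package path, resolved URL) pairs that use a non-public registry host."""
    lock = json.loads(lockfile_text)
    offenders = []
    for package_path, entry in lock.get("packages", {}).items():
        resolved = entry.get("resolved")
        if not resolved:
            continue  # Omitted, as with omit-lockfile-registry-resolved=true.
        parsed = urlparse(resolved)
        # Only HTTP(S) registry URLs are checked; local links are ignored.
        if parsed.scheme in ("http", "https") and parsed.hostname not in ALLOWED_HOSTS:
            offenders.append((package_path, resolved))
    return offenders
```

An empty result means no private registry host would be committed with that lockfile.
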

    # Local terminal path, first time
    ./scripts/install.sh
    # Local terminal path, after code changes
    ./scripts/deploy.sh --update
    # Local terminal path, tear down
    ./scripts/deploy.sh --destroy

For the Databricks notebook path, pull the latest repo changes in the Databricks Git folder and rerun `notebooks/install.py` from the top.
- Operations Guide — post-deploy monitoring and management
- Authentication & Permissions — SP permissions granted during deploy
- Troubleshooting — common deployment issues
- Environment Variables — full variable reference