
73 improving installsh experience #74

Merged
hiydavid merged 6 commits into main from 73-improving-installsh-experience
Apr 18, 2026

Conversation

@ryanbates99
Collaborator

I created this code and then refined it with Claude; I'd love extra eyes on this to confirm it works properly.

Improve install.sh UX and fix deploy resource configuration

Description:

This PR overhauls the install.sh interactive setup experience and fixes several bugs discovered during real-world install testing.

UX improvements (install.sh)

  • Step 2 — Profile selection: Replaced free-text profile input with a numbered menu. Profiles are displayed in sections: DEFAULT first, then a
    "Logged in" group (with checkmarks), then a "Not logged in" group. Auth status is discovered live via databricks auth profiles.
  • Step 3 — Catalog: Auto-discovers available catalogs and presents a numbered selection list instead of a free-text prompt.
  • Step 4 — Warehouse: Auto-discovers available SQL warehouses with a numbered selection list.
  • Step 5 — LLM model: Numbered model selection with Enter accepting the recommended model (databricks-claude-sonnet-4-6) as the default.
  • Step 6 — MLflow tracing: Default changed to enabled (Y). Experiment path prompt makes clear that Enter accepts the default
    /Shared/genie-workbench-agent-tracing.
  • Step 7 — Lakebase: Auto-discovers available Lakebase instances and includes a Skip option.
  • All Y/N prompts standardized to [Y/N] format.
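
A numbered selection menu like the ones described above can be sketched in a few lines of POSIX shell. The function and variable names below are illustrative, not the installer's actual code:

```shell
#!/bin/sh
# Hypothetical sketch of a numbered menu: print options with indices,
# then resolve a 1-based choice back to the option text.
choose_from_list() {
  # $1 = newline-separated options, $2 = 1-based choice
  printf '%s\n' "$1" | sed -n "${2}p"
}

CATALOGS="main
dev
samples"

i=1
printf '%s\n' "$CATALOGS" | while IFS= read -r c; do
  echo "  $i) $c"
  i=$((i + 1))
done

SELECTED=$(choose_from_list "$CATALOGS" 2)
echo "Selected: $SELECTED"   # Selected: dev
```

In the real installer the option list would come from the workspace (e.g. the output of a `databricks` CLI call) rather than a hard-coded string.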

Bug fixes

  • MLflow experiment creation was silently failing: The CLI subcommand databricks experiments create-experiment does not exist. Switched to the
    REST API (POST /api/2.0/mlflow/experiments/create) with a search-by-name fallback for experiments that already exist.
  • SQL Warehouse not assigned as App Resource: The PATCH to /api/2.0/apps/{name} included user_api_scopes, which fails silently unless the
    workspace has "user token passthrough" enabled. Removed user_api_scopes from the PATCH payload. Scopes are now set exclusively via app.yaml
    during apps deploy.
  • Lakebase defaulting to app name when empty: deploy-config.sh had LAKEBASE_INSTANCE="${GENIE_LAKEBASE_INSTANCE:-$APP_NAME}" which resolved to
    the app name when .env.deploy contained an empty string. Fixed to ${GENIE_LAKEBASE_INSTANCE:-} (no fallback).
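
The Lakebase fix comes down to shell parameter-expansion semantics: `${VAR:-fallback}` substitutes the fallback when the variable is unset or empty, so a deliberately empty value was being overwritten. A minimal sketch (variable names from the PR, values illustrative):

```shell
#!/bin/sh
APP_NAME="genie-workbench"
GENIE_LAKEBASE_INSTANCE=""   # .env.deploy wrote an empty string: Lakebase was skipped

# Old behavior: ":-" also substitutes when the variable is set but EMPTY,
# so a skipped Lakebase silently became the app name.
OLD_LAKEBASE_INSTANCE="${GENIE_LAKEBASE_INSTANCE:-$APP_NAME}"

# Fixed behavior: no fallback value, so the empty string is preserved
# and deploy can detect "no Lakebase selected".
NEW_LAKEBASE_INSTANCE="${GENIE_LAKEBASE_INSTANCE:-}"

echo "old=$OLD_LAKEBASE_INSTANCE"   # old=genie-workbench
echo "new=$NEW_LAKEBASE_INSTANCE"   # new=
```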

Files changed

  • scripts/install.sh — Full UX overhaul of the installer
  • scripts/deploy-config.sh — Fix Lakebase empty-string default
  • scripts/deploy.sh — Remove user_api_scopes from app PATCH payload
  • app.yaml — Add resources block so apps deploy attaches the SQL warehouse as an App Resource

- Steps 2, 3, 4, 7: Replaced free-text prompts with numbered selection menus backed by auto-discovery. Profiles, catalogs, warehouses, and Lakebase instances are now fetched from the workspace and presented as a list; no more copy-pasting IDs.
- Step 5: Added a recommended default to the model picker. Pressing Enter now accepts Claude Sonnet 4.6 without typing a number. The "Other" path no longer incorrectly defaults to Claude when the user is trying to pick something different.
- Step 6: MLflow tracing now defaults to enabled. Fixed a silent bug where databricks experiments create-experiment (a CLI subcommand that doesn't exist) was called and failed with no visible error, leaving MLflow disabled regardless of user input. Replaced it with the REST API already used elsewhere in the script, and added a search-by-name fallback so re-runs don't fail when the experiment already exists.
- Step 8: Added input validation and auto-fix for app names (enforces lowercase and hyphens only, and suggests corrections).
- Y/N prompts: Standardized all prompts to [Y/N]; previously they mixed [Y/n] and [y/N] depending on which option was the default.
- .env.deploy values: Quoted all variable assignments to handle edge cases with special characters.
- Removed APP_NAME_DEFAULT: it was used to build the MLflow experiment path before the app name was collected, causing a mismatch if the user chose a different name.
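
The Step 8 auto-fix can be sketched roughly as follows; the function name and exact normalization rules are assumptions for illustration, not the PR's actual implementation:

```shell
#!/bin/sh
# Hypothetical sketch of the app-name auto-fix: lowercase the name,
# collapse runs of invalid characters into a single hyphen, and trim
# leading/trailing hyphens.
suggest_app_name() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9-]\{1,\}/-/g' -e 's/^-*//' -e 's/-*$//'
}

FIXED=$(suggest_app_name 'My Genie_App!')
echo "$FIXED"   # my-genie-app
```

The installer would then offer `$FIXED` as the suggested correction and ask the user to confirm or re-enter.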

Test plan

- Run ./scripts/install.sh end-to-end on a workspace with existing catalogs, warehouses, and Lakebase instances; verify selection menus populate correctly
- Run with MLflow enabled, accept default path; verify experiment is created and ID appears in the Step 9 summary
- Run a second time with the same MLflow path; verify the search fallback finds the existing experiment instead of failing
- Enter an invalid app name; verify the auto-fix suggestion and confirm/reject flow works
- Verify .env.deploy is written with quoted values
- install.sh: replace free-text prompts with numbered menus for
  profiles, catalogs, warehouses, Lakebase, and LLM model selection;
  add auth status sections (Default / Logged in / Not logged in) to
  profile picker; support Enter-to-accept for recommended model;
  default MLflow tracing to enabled; fix experiment creation to use
  REST API instead of non-existent CLI subcommand with search-by-name
  fallback; standardize Y/N prompts to [Y/N]; add app name validation
  with auto-fix; remove misleading defaults on LLM manual entry path
- app.yaml: add resources section declaring sql-warehouse with
  __WAREHOUSE_ID__ placeholder so apps deploy attaches the warehouse
  as an app resource automatically
- deploy.sh: inject __WAREHOUSE_ID__ via sed alongside other
  placeholders; remove user_api_scopes from resource PATCH (caused
  silent failure on workspaces without user token passthrough)
- deploy-config.sh: fix LAKEBASE_INSTANCE default from app name to
  empty so skipping Lakebase during install is respected on redeploy
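
The app.yaml placeholder injection can be sketched like this. Only the `__WAREHOUSE_ID__` placeholder name comes from the change itself; the resources fragment below is a guess at the block's shape, not the repository's actual app.yaml:

```shell
#!/bin/sh
# Sketch: write an app.yaml fragment with the placeholder, then inject
# the real warehouse ID via sed, as deploy.sh does for its placeholders.
WAREHOUSE_ID="abc123def456"

cat > app.yaml.tmp <<'EOF'
resources:
  - name: sql-warehouse
    sql_warehouse:
      id: __WAREHOUSE_ID__
EOF

sed "s/__WAREHOUSE_ID__/${WAREHOUSE_ID}/" app.yaml.tmp > app.yaml.patched
cat app.yaml.patched
```

Since warehouse IDs are plain alphanumeric strings, a simple `s///` substitution is safe here; values containing sed metacharacters would need escaping.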

Co-authored-by: Isaac
- install.sh: kept auto-discovery approach for Lakebase instances
  (main used APP_NAME_DEFAULT prompt which was removed in this branch)
- deploy.sh: took main's Lakebase Autoscaling setup and postgres
  resource resolution blocks; kept this branch's fix of omitting
  user_api_scopes from the resources PATCH (fails on workspaces
  without user token passthrough enabled)

Co-authored-by: Isaac
@ryanbates99 ryanbates99 linked an issue Apr 16, 2026 that may be closed by this pull request
@ryanbates99
Collaborator Author

Test Plan

First-time install (./scripts/install.sh)

Step 2 — Profile selection

  • Profiles list displays in sections: DEFAULT first, then "Logged in" (with checkmarks), then "Not logged in"
  • Entering a valid number selects the correct profile
  • After selection, confirm message shows both the email and profile name (e.g. Authenticated as user@example.com (profile: my-profile))
  • Invalid input (letters, out-of-range number) shows an error and re-prompts

Step 3 — Catalog

  • Available catalogs are auto-discovered and shown as a numbered list
  • Selecting a number sets the correct catalog

Step 4 — SQL Warehouse

  • Available warehouses are auto-discovered and shown as a numbered list
  • Selecting a number sets the correct warehouse

Step 5 — LLM model

  • Pressing Enter (no input) accepts databricks-claude-sonnet-4-6 as the default without typing anything
  • Selecting a different number picks the correct model

Step 6 — MLflow tracing

  • Default answer is Y (enabled), not N
  • Pressing Enter accepts Y and proceeds to the experiment path prompt
  • Pressing Enter at the experiment path prompt accepts /Shared/genie-workbench-agent-tracing
  • After completing install, the summary at Step 9 shows MLflow as enabled with the correct experiment ID (not )
  • Entering N skips MLflow and Step 9 shows

Step 7 — Lakebase

  • If no Lakebase instances exist, the skip option is shown and selecting it leaves Lakebase empty
  • If instances exist, selecting one sets it correctly
  • After install with no Lakebase selected, deploying does not attempt to attach a Lakebase resource

Deploy (./scripts/deploy.sh)

SQL Warehouse as App Resource

  • After a full deploy, go to the Databricks App in the UI and check App Resources — the SQL warehouse should be listed automatically (not
    requiring manual addition)
  • The deploy log shows ✓ app.yaml patched (WAREHOUSE=, ...) with the correct warehouse ID

Lakebase empty-string fix

  • If .env.deploy has GENIE_LAKEBASE_INSTANCE="", deploy does not attempt to attach a Lakebase postgres resource and does not use the app name as
    a fallback instance name

No regression on existing deploys

  • ./scripts/deploy.sh --update completes without errors on a workspace that was previously deployed
  • App starts and loads correctly after deploy

@hiydavid
Collaborator

Lakebase discovery (Step 7) queries the wrong API

install.sh Step 7 lists Lakebase Provisioned instances:

databricks api get /api/2.0/database/instances

But the entire downstream pipeline — deploy.sh, setup_lakebase.py, and the Lakebase database resolution — all treat LAKEBASE_INSTANCE as a Lakebase Autoscaling project name:

# deploy.sh
--project-name "$LAKEBASE_INSTANCE"
databricks api get "/api/2.0/postgres/projects/$LAKEBASE_INSTANCE/branches/production/databases"

# setup_lakebase.py
w.postgres.get_project(name=f"projects/{project_name}")
w.postgres.create_project(project_id=project_name, ...)

A user could select a provisioned instance from the menu, then deploy.sh would call the Autoscaling project API with that name — resulting in a 404 or hitting a different resource entirely.

The discovery call should be /api/2.0/postgres/projects to list Autoscaling projects, matching what the rest of the pipeline expects.

Updated the code to support only Lakebase Autoscaling, not provisioned Lakebase.
@ryanbates99
Collaborator Author

@hiydavid This has been updated!


@hiydavid hiydavid left a comment


Missing user_api_scopes in deploy PATCH

The PR removed user_api_scopes from the deploy.sh PATCH payload (line ~562) with the assumption that databricks apps deploy applies scopes from app.yaml. It doesn't — scopes defined in app.yaml are documentation-only; the API PATCH is what actually configures them on the app.

Result: After a fresh deploy, the app only has the two default scopes (iam.current-user:read, iam.access-control:read) and is missing all the required ones:

  • sql
  • dashboards.genie
  • serving.serving-endpoints
  • catalog.catalogs:read
  • catalog.schemas:read
  • catalog.tables:read
  • files.files

Fix: Add user_api_scopes back to the PATCH payload in deploy.sh (~line 562-595). The scopes list that was removed needs to be restored:

scopes = ['sql', 'dashboards.genie', 'serving.serving-endpoints',
          'catalog.catalogs:read', 'catalog.schemas:read',
          'catalog.tables:read', 'files.files']

And the print at the end should include them again:

print(json.dumps({'user_api_scopes': scopes, 'resources': list(by_name.values())}))

The comment about "user token passthrough" in the old code was incorrect — user_api_scopes on the PATCH API works in all workspaces. That's a separate feature from what the PR description describes.
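
As a sketch, the restored payload construction might look like the following. Here `by_name` is a stand-in for the resource map deploy.sh builds from the app's existing resources, and the warehouse entry's fields are illustrative; only the seven scopes come from the review:

```shell
#!/bin/sh
# Build the PATCH body with user_api_scopes restored, using an embedded
# Python heredoc in the style deploy.sh already uses.
PATCH_BODY=$(python3 - <<'EOF'
import json

scopes = ['sql', 'dashboards.genie', 'serving.serving-endpoints',
          'catalog.catalogs:read', 'catalog.schemas:read',
          'catalog.tables:read', 'files.files']

# Stand-in for the resource map deploy.sh assembles; fields illustrative.
by_name = {'sql-warehouse': {'name': 'sql-warehouse',
                             'sql_warehouse': {'id': 'abc123'}}}

print(json.dumps({'user_api_scopes': scopes,
                  'resources': list(by_name.values())}))
EOF
)
echo "$PATCH_BODY"
```

The resulting JSON would then be sent as the body of the PATCH to /api/2.0/apps/{name}.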

The PATCH to /api/2.0/apps/{name} is the mechanism that actually
configures OAuth scopes on a Databricks App. app.yaml user_api_scopes
are not applied by apps deploy. The prior commit removed user_api_scopes
based on incorrect reasoning about user token passthrough — these scopes
work in all workspaces.

Without this, deployed apps only get the two default scopes
(iam.current-user:read, iam.access-control:read) and are missing the
seven required ones: sql, dashboards.genie, serving.serving-endpoints,
catalog.catalogs:read, catalog.schemas:read, catalog.tables:read,
files.files.

Co-authored-by: Isaac
@ryanbates99
Collaborator Author

Fixed in 42d4722. Restored user_api_scopes to the PATCH payload with the seven required scopes. The user token passthrough reasoning was incorrect — the PATCH API configures scopes in all workspaces, and app.yaml scopes are not applied by apps deploy. Updated the comment to reflect this correctly.

- _prompt_yn now shows which option Enter accepts ([Y/N, Enter=Y])
  instead of the ambiguous static [Y/N] hint
- PROFILES_EXIT is now captured correctly by running databricks auth
  profiles before the pipeline so the exit code reaches the parent shell
  (process substitution subshell prevented the original assignment)
- MLflow experiment path is passed via sys.argv / os.environ instead
  of shell-interpolated into Python/JSON literals, preventing breakage
  on paths containing single or double quotes
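
The quoting fix in the last bullet can be sketched as follows (the path is illustrative): the experiment path travels as a real argument via sys.argv rather than being spliced into the Python source, so embedded quotes can no longer break the script or the JSON it emits:

```shell
#!/bin/sh
# Illustrative path containing both quote styles.
EXPERIMENT_PATH='/Shared/it'\''s a "tricky" path'

# Interpolating $EXPERIMENT_PATH into the Python source would break on
# the quotes; passing it as an argument keeps it intact, and json.dumps
# handles the escaping.
REQUEST_JSON=$(python3 -c '
import json, sys
print(json.dumps({"name": sys.argv[1]}))
' "$EXPERIMENT_PATH")

echo "$REQUEST_JSON"
```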

Co-authored-by: Isaac
@hiydavid hiydavid self-requested a review April 18, 2026 15:00
@hiydavid hiydavid merged commit d164842 into main Apr 18, 2026