
[Enhancement] Allow DBSQL-only installation without Lakebase dependency #3

@varunrao

Description

Enhancement: Backend Storage Selection During Installation

Summary

The current deployment pipeline (deploy.sh) is tightly coupled to Lakebase -- every installation creates a Lakebase instance, configures PostgreSQL roles, sets up database schemas, links app resources, and manages OAuth token rotation. While the app already has a DB_BACKEND env var and a USE_LAKEBASE toggle, these are secondary switches layered on top of a Lakebase-first deployment. There is no clean way to deploy the workshop app using only a Databricks SQL (DBSQL) warehouse as the backend, skipping Lakebase entirely.

This enhancement requests adding a backend selection step during installation that lets users choose between Lakebase and DBSQL as their data backend, with the deployment script adapting accordingly.


1. Current State: Lakebase Is Required

The deployment script (scripts/deploy.sh) has Lakebase baked into every step:

| Deploy Step | Lakebase Dependency |
|-------------|---------------------|
| Step 0 | Discovers Lakebase host/endpoint, updates app.yaml with LAKEBASE_HOST, LAKEBASE_SCHEMA, ENDPOINT_NAME |
| Step 1 | Bundle deploys Lakebase infrastructure (autoscaling project or provisioned instance via DAB) |
| Step 2a | Grants Unity Catalog ALL_PRIVILEGES on the Lakebase catalog to the app service principal |
| Step 2b | Creates PostgreSQL DATABRICKS_SUPERUSER roles for the app SP, deploying user, and workspace users |
| Step 2b | Grants CAN_USE on the Lakebase instance to all workspace users |
| Step 3 | Runs setup-lakebase.sh --recreate to create and seed PostgreSQL tables (DDL + DML) |
| Step 3b | Tags the Lakebase catalog, schema, and tables in Unity Catalog |
| Step 4c | Links Lakebase instance as an app resource (provisioned mode) or configures env vars for OAuth (autoscaling mode) |
| Step 5 | Verifies and re-applies all Lakebase permissions (roles, resource link, CAN_USE) |

The backend code has partial DBSQL support:

  • app.yaml already defines DB_BACKEND: "dbsql", LAKEHOUSE_CATALOG, LAKEHOUSE_SCHEMA, and DATABRICKS_SQL_WAREHOUSE_HTTP_PATH
  • user-config.yaml has a dbsql: section with warehouse config
  • routes.py uses USE_LAKEBASE to gate Lakebase reads with YAML fallback

But the deploy script has no DBSQL-only path -- it always attempts Lakebase discovery, role creation, table setup, and resource linking. Skipping these steps requires manual --skip-tables --skip-permissions flags and leaves orphaned Lakebase config in app.yaml.


2. Desired State: Backend Selection at Install Time

During installation (via ./vibe2value install or ./scripts/deploy.sh), the user should be prompted to choose their backend:

```text
Select data backend for the workshop:
  [1] Lakebase (PostgreSQL) - Full database with autoscaling/provisioned modes
  [2] DBSQL (SQL Warehouse) - Uses existing Databricks SQL warehouse, no Lakebase required
  [3] YAML-only (No database) - Minimal install, static config from YAML files
```

Each option should result in a different deployment path:

Option 1: Lakebase (current behavior)

  • Full Lakebase deployment as it works today
  • Steps 0-5 all execute
  • Session state, config, and prompts stored in PostgreSQL

Option 2: DBSQL-Only (new)

  • Skip: Lakebase discovery (Step 0), Lakebase infrastructure (Step 1 Lakebase portion), PostgreSQL roles (Step 2b), table setup (Step 3), resource linking (Step 4c), Lakebase verification (Step 5)
  • Keep: Bundle deploy for the app itself (Step 1 App portion), UC permissions for the DBSQL catalog (Step 2a adapted), app CAN_USE (Step 4c-ii), app code deploy (Step 4d-g)
  • Add: DBSQL schema/table creation via SQL warehouse (equivalent of setup-lakebase.sh but using databricks sql statements), warehouse permission setup
  • app.yaml generated with: USE_LAKEBASE: "false", DB_BACKEND: "dbsql", LAKEBASE_HOST: "", ENDPOINT_NAME: ""
  • Session state and config stored in Delta tables via the SQL warehouse

Option 3: YAML-Only (new, lightweight)

  • Skip: All database steps (Lakebase and DBSQL)
  • Keep: Bundle deploy for the app, app CAN_USE, app code deploy
  • app.yaml generated with: USE_LAKEBASE: "false", DB_BACKEND: "none"
  • Static config from prompts_config.yaml, no session persistence
  • Useful for quick demos or environments without database access

3. What Needs to Change

3a. user-config.yaml

Add a top-level backend selection:

```yaml
database:
  backend: "lakebase"  # "lakebase" | "dbsql" | "none"
```

This already partially exists (database.backend: "dbsql" is in the current user-config.yaml) but is not used by deploy.sh.
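A minimal sketch of how deploy.sh could read this value without adding a yq dependency. This is not existing script code: the sed pattern, the temp-file sample, and the "lakebase" default are all assumptions, and the plain-sed approach is fragile if another `backend:` key ever appears earlier in the file.

```shell
# Hypothetical snippet for deploy.sh: extract database.backend from
# user-config.yaml with plain sed, defaulting to "lakebase" when absent.
# A temp file stands in for the real user-config.yaml to keep this self-contained.
CONFIG_FILE="$(mktemp)"
cat > "$CONFIG_FILE" <<'EOF'
database:
  backend: "dbsql"  # "lakebase" | "dbsql" | "none"
EOF

# Match the first 'backend:' line, strip optional quotes and trailing comment.
DB_MODE=$(sed -n 's/^[[:space:]]*backend:[[:space:]]*"*\([a-z]*\)"*.*/\1/p' "$CONFIG_FILE" | head -n 1)
DB_MODE="${DB_MODE:-lakebase}"
echo "Selected backend: $DB_MODE"
```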

3b. scripts/deploy.sh

Read database.backend from user-config.yaml and branch the deployment flow:

  • lakebase: Current full path (Steps 0-5)
  • dbsql: Skip Lakebase steps, add DBSQL table creation, adapt permission steps for the SQL warehouse catalog
  • none: Skip all database steps, deploy app with YAML fallback

Key changes:

  • Wrap Steps 0, 2b, 3, 4c (resource link), 5 in if [[ "$DB_MODE" == "lakebase" ]] guards
  • Add a new DBSQL setup step that creates Delta tables equivalent to the Lakebase DDL
  • Adapt Step 2a UC permissions to use the correct catalog (LAKEBASE_CATALOG vs LAKEHOUSE_CATALOG)
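The branching could be centralized in one place rather than repeating the mode test at every step. The following is a hypothetical skeleton, not existing deploy.sh code; step bodies are reduced to flags and echoes, and `DB_MODE` is assumed to hold the value read from user-config.yaml:

```shell
# Hypothetical branching skeleton for deploy.sh. Real step bodies are elided;
# the flags just record which deployment paths would run.
DB_MODE="dbsql"   # assumed: read from user-config.yaml earlier in the script

RUN_LAKEBASE_STEPS=false   # Steps 0, 2b, 3, 4c (resource link), 5
RUN_DBSQL_SETUP=false      # new Delta-table creation step

case "$DB_MODE" in
  lakebase) RUN_LAKEBASE_STEPS=true ;;
  dbsql)    RUN_DBSQL_SETUP=true ;;
  none)     ;;   # YAML fallback: no database steps at all
  *)        echo "Unknown database.backend: $DB_MODE" >&2; exit 1 ;;
esac

if [ "$RUN_LAKEBASE_STEPS" = "true" ]; then
  echo "Running Lakebase Steps 0-5"
fi
if [ "$RUN_DBSQL_SETUP" = "true" ]; then
  echo "Running DBSQL Delta table setup"
fi
```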

3c. app.yaml / app.yaml.template

Template should conditionally include Lakebase env vars:

  • Lakebase mode: Include LAKEBASE_HOST, LAKEBASE_SCHEMA, ENDPOINT_NAME, USE_LAKEBASE: "true"
  • DBSQL mode: Include only LAKEHOUSE_CATALOG, LAKEHOUSE_SCHEMA, DATABRICKS_SQL_WAREHOUSE_HTTP_PATH, USE_LAKEBASE: "false"
  • YAML mode: Minimal env vars, USE_LAKEBASE: "false", DB_BACKEND: "none"
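For illustration, a DBSQL-mode env section of the generated app.yaml might look like the sketch below. The variable names come from the existing app.yaml; the placeholder values and the choice to omit Lakebase variables entirely (rather than emit them empty) are assumptions.

```yaml
# Hypothetical DBSQL-mode env block for app.yaml.template.
env:
  - name: DB_BACKEND
    value: "dbsql"
  - name: USE_LAKEBASE
    value: "false"
  - name: LAKEHOUSE_CATALOG
    value: "main"            # assumed placeholder
  - name: LAKEHOUSE_SCHEMA
    value: "workshop"        # assumed placeholder
  - name: DATABRICKS_SQL_WAREHOUSE_HTTP_PATH
    value: "/sql/1.0/warehouses/<warehouse-id>"
```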

3d. databricks.yml / databricks.yml.template

The DAB template currently includes Lakebase resources (postgres projects/branches or database instances). These should be conditional:

  • Lakebase mode: Include Lakebase resource blocks
  • DBSQL mode: No Lakebase resources, optionally include SQL warehouse config
  • YAML mode: App resource only
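Since DAB has no native conditionals, one option is a bundle target per backend so the Lakebase resource blocks only exist in the lakebase target. The sketch below is illustrative only; the resource and field names are assumptions, not copied from the real databricks.yml:

```yaml
# Hypothetical sketch: one bundle target per backend.
targets:
  lakebase:
    resources:
      database_instances:        # Lakebase resource type in recent DAB releases
        workshop_db:
          name: workshop-db
          capacity: CU_1
  dbsql:
    # No Lakebase resources; only the app resource defined at the bundle root.
  yaml_only:
    # App resource only.
```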

3e. Backend Service Abstraction

src/backend/services/lakebase.py is currently 2,000+ lines of PostgreSQL-specific code. For DBSQL mode, the app needs an alternative service that:

  • Uses the Databricks SQL connector (or HTTP API) instead of psycopg
  • Reads/writes the same data model (sessions, workshop parameters, use case descriptions, section input prompts) from Delta tables
  • Implements the same interface (execute_query, execute_insert, is_lakebase_configured, etc.)

This could be either:

  • A new src/backend/services/dbsql.py with a matching interface
  • A common abstract interface with Lakebase and DBSQL implementations

Either way, routes.py already uses USE_LAKEBASE to gate access, so the routing layer is partially ready.
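One possible shape for the shared contract is sketched below. This is hypothetical code, not an excerpt from the repo: the existing service exposes execute_query, execute_insert, and is_lakebase_configured, and the backend-neutral is_configured name plus the YamlFallbackBackend no-op class are my assumptions.

```python
# Hypothetical shared contract for Lakebase and DBSQL backends.
from abc import ABC, abstractmethod


class DatabaseBackend(ABC):
    """Interface both lakebase.py and a new dbsql.py would implement."""

    @abstractmethod
    def execute_query(self, sql: str, params: tuple = ()) -> list[dict]:
        """Run a read query and return rows as dicts."""

    @abstractmethod
    def execute_insert(self, sql: str, params: tuple = ()) -> int:
        """Run a write statement and return the affected row count."""

    @abstractmethod
    def is_configured(self) -> bool:
        """True when the backend has enough config to accept queries."""


class YamlFallbackBackend(DatabaseBackend):
    """No-op backend for DB_BACKEND=none: the app serves static YAML config."""

    def execute_query(self, sql: str, params: tuple = ()) -> list[dict]:
        return []

    def execute_insert(self, sql: str, params: tuple = ()) -> int:
        return 0

    def is_configured(self) -> bool:
        return False
```

With this in place, routes.py could select the implementation once at startup from DB_BACKEND instead of gating each call on USE_LAKEBASE.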

3f. DBSQL DDL Equivalent

Create DBSQL versions of the Lakebase DDL (db/lakebase/ddl/) as Delta table CREATE statements:

  • db/dbsql/ddl/ with equivalent table schemas using Delta Lake types
  • db/dbsql/dml_seed/ with equivalent seed data
  • A setup-dbsql.sh script (or extend setup-lakebase.sh) that runs these via databricks sql
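As an illustration, a Delta equivalent of a Lakebase sessions table might look like the fragment below. The table and column names are assumptions; the real mapping has to come from the DDL in db/lakebase/ddl/, and the `${...}` placeholders are assumed to be substituted by the setup script before execution.

```sql
-- Hypothetical Delta DDL for db/dbsql/ddl/.
CREATE TABLE IF NOT EXISTS ${LAKEHOUSE_CATALOG}.${LAKEHOUSE_SCHEMA}.sessions (
  session_id STRING NOT NULL,
  payload    STRING,      -- a PostgreSQL JSONB column stored as a JSON string
  updated_at TIMESTAMP
) USING DELTA;
```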

4. Benefits

  • Lower barrier to entry: Not all workspaces have Lakebase enabled; DBSQL is universally available
  • Simpler deployment: DBSQL-only mode eliminates ~60% of the deployment script (Lakebase discovery, role management, resource linking, token rotation)
  • Faster installation: No waiting for Lakebase instance provisioning or autoscaling endpoint creation
  • Cost flexibility: Users can choose their price/performance tradeoff (serverless SQL warehouse vs. dedicated Lakebase instance)
  • Workshop portability: YAML-only mode enables instant demos with zero infrastructure setup

5. Risks and Considerations

  • Feature parity: DBSQL-backed session persistence may behave differently from the PostgreSQL implementation (e.g., transaction semantics, upsert syntax, concurrent writes)
  • Schema migration: If users switch backends after initial deployment, data migration is not addressed
  • DBSQL latency: SQL warehouse cold starts may add latency vs. Lakebase's always-on PostgreSQL endpoint
  • Maintenance surface: Two backend implementations means two code paths to maintain and test
  • vibe2value CLI: The ./vibe2value configure tool generates app.yaml and databricks.yml from templates -- it needs to understand the backend selection

References

  • Current deployment script: scripts/deploy.sh (1,735 lines, Steps 0-5 all Lakebase-dependent)
  • Current Lakebase service: src/backend/services/lakebase.py (2,000+ lines)
  • Current app config: app.yaml (already has DB_BACKEND and USE_LAKEBASE env vars)
  • User config: user-config.yaml (already has database.backend and dbsql: section)
  • Lakebase DDL: db/lakebase/ddl/
  • Lakebase DML seeds: db/lakebase/dml_seed/
  • Lakebase setup script: scripts/setup-lakebase.sh
  • Lakebase manager: scripts/lakebase_manager.py
