Enhancement: Backend Storage Selection During Installation
Summary
The current deployment pipeline (deploy.sh) is tightly coupled to Lakebase -- every installation creates a Lakebase instance, configures PostgreSQL roles, sets up database schemas, links app resources, and manages OAuth token rotation. While the app already has a DB_BACKEND env var and a USE_LAKEBASE toggle, these are secondary switches layered on top of a Lakebase-first deployment. There is no clean way to deploy the workshop app using only a Databricks SQL (DBSQL) warehouse as the backend, skipping Lakebase entirely.
This enhancement requests adding a backend selection step during installation that lets users choose between Lakebase and DBSQL as their data backend, with the deployment script adapting accordingly.
1. Current State: Lakebase Is Required
The deployment script (scripts/deploy.sh) has Lakebase baked into every step:
| Deploy Step | Lakebase Dependency |
|---|---|
| Step 0 | Discovers Lakebase host/endpoint, updates `app.yaml` with `LAKEBASE_HOST`, `LAKEBASE_SCHEMA`, `ENDPOINT_NAME` |
| Step 1 | Bundle deploys Lakebase infrastructure (autoscaling project or provisioned instance via DAB) |
| Step 2a | Grants Unity Catalog `ALL_PRIVILEGES` on the Lakebase catalog to the app service principal |
| Step 2b | Creates PostgreSQL `DATABRICKS_SUPERUSER` roles for the app SP, deploying user, and workspace users |
| Step 2b | Grants `CAN_USE` on the Lakebase instance to all workspace users |
| Step 3 | Runs `setup-lakebase.sh --recreate` to create and seed PostgreSQL tables (DDL + DML) |
| Step 3b | Tags the Lakebase catalog, schema, and tables in Unity Catalog |
| Step 4c | Links the Lakebase instance as an app resource (provisioned mode) or configures env vars for OAuth (autoscaling mode) |
| Step 5 | Verifies and re-applies all Lakebase permissions (roles, resource link, `CAN_USE`) |
The backend code has partial DBSQL support:
- `app.yaml` already defines `DB_BACKEND: "dbsql"`, `LAKEHOUSE_CATALOG`, `LAKEHOUSE_SCHEMA`, and `DATABRICKS_SQL_WAREHOUSE_HTTP_PATH`
- `user-config.yaml` has a `dbsql:` section with warehouse config
- `routes.py` uses `USE_LAKEBASE` to gate Lakebase reads with YAML fallback
But the deploy script has no DBSQL-only path -- it always attempts Lakebase discovery, role creation, table setup, and resource linking. Skipping these steps requires manual --skip-tables --skip-permissions flags and leaves orphaned Lakebase config in app.yaml.
2. Desired State: Backend Selection at Install Time
During installation (via ./vibe2value install or ./scripts/deploy.sh), the user should be prompted to choose their backend:
```
Select data backend for the workshop:
  [1] Lakebase (PostgreSQL)   - Full database with autoscaling/provisioned modes
  [2] DBSQL (SQL Warehouse)   - Uses existing Databricks SQL warehouse, no Lakebase required
  [3] YAML-only (No database) - Minimal install, static config from YAML files
```
Each option should result in a different deployment path:
Option 1: Lakebase (current behavior)
- Full Lakebase deployment as it works today
- Steps 0-5 all execute
- Session state, config, and prompts stored in PostgreSQL
Option 2: DBSQL-Only (new)
- Skip: Lakebase discovery (Step 0), Lakebase infrastructure (Step 1 Lakebase portion), PostgreSQL roles (Step 2b), table setup (Step 3), resource linking (Step 4c), Lakebase verification (Step 5)
- Keep: Bundle deploy for the app itself (Step 1 App portion), UC permissions for the DBSQL catalog (Step 2a adapted), app CAN_USE (Step 4c-ii), app code deploy (Step 4d-g)
- Add: DBSQL schema/table creation via the SQL warehouse (equivalent of `setup-lakebase.sh` but using `databricks sql` statements), warehouse permission setup
- `app.yaml` generated with: `USE_LAKEBASE: "false"`, `DB_BACKEND: "dbsql"`, `LAKEBASE_HOST: ""`, `ENDPOINT_NAME: ""`
- Session state and config stored in Delta tables via the SQL warehouse
Option 3: YAML-Only (new, lightweight)
- Skip: All database steps (Lakebase and DBSQL)
- Keep: Bundle deploy for the app, app CAN_USE, app code deploy
- `app.yaml` generated with: `USE_LAKEBASE: "false"`, `DB_BACKEND: "none"`
- Static config from `prompts_config.yaml`, no session persistence
- Useful for quick demos or environments without database access
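The three-way prompt above could feed the deployment flow through a small helper. A minimal sketch, assuming a bash installer; the function and variable names here are illustrative, not taken from the actual `deploy.sh` or `vibe2value` code:

```shell
#!/usr/bin/env bash
# Map the installer's menu choice (1/2/3) to a backend mode string.
choice_to_backend() {
  case "$1" in
    1) echo "lakebase" ;;
    2) echo "dbsql" ;;
    3) echo "none" ;;
    *) echo "unknown choice: $1" >&2; return 1 ;;
  esac
}

# In the real installer this would follow an interactive prompt, e.g.:
#   read -r -p "Select data backend [1-3]: " choice
choice=2
DB_MODE="$(choice_to_backend "$choice")"
echo "Selected backend: $DB_MODE"   # Selected backend: dbsql
```

The resulting mode string would then be persisted to `user-config.yaml` so re-runs of the deploy script are non-interactive.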
3. What Needs to Change
3a. user-config.yaml
Add a top-level backend selection:
```yaml
database:
  backend: "lakebase"  # "lakebase" | "dbsql" | "none"
```
This already partially exists (database.backend: "dbsql" is in the current user-config.yaml) but is not used by deploy.sh.
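Since `deploy.sh` would need to read this value, here is one way it could be parsed without adding a dependency. This is a hypothetical sketch assuming the two-line YAML layout shown above; a real implementation might use `yq` or a Python one-liner instead:

```shell
#!/usr/bin/env bash
# Extract database.backend from a user-config.yaml-style file using only sed.
# Assumes `backend:` is indented under a top-level `database:` key.
read_backend() {
  local config_file="$1"
  sed -n '/^database:/,/^[^ ]/ s/^[[:space:]]*backend:[[:space:]]*"\{0,1\}\([a-z]*\)"\{0,1\}.*/\1/p' \
    "$config_file" | head -n1
}

# Demo with a temporary config file mirroring the snippet above.
cat > /tmp/user-config-example.yaml <<'EOF'
database:
  backend: "dbsql"   # "lakebase" | "dbsql" | "none"
EOF

DB_MODE="$(read_backend /tmp/user-config-example.yaml)"
echo "$DB_MODE"   # dbsql
```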
3b. scripts/deploy.sh
Read database.backend from user-config.yaml and branch the deployment flow:
- `lakebase`: Current full path (Steps 0-5)
- `dbsql`: Skip Lakebase steps, add DBSQL table creation, adapt permission steps for the SQL warehouse catalog
- `none`: Skip all database steps, deploy app with YAML fallback
Key changes:
- Wrap Steps 0, 2b, 3, 4c (resource link), and 5 in `if [[ "$DB_MODE" == "lakebase" ]]` guards
- Add a new DBSQL setup step that creates Delta tables equivalent to the Lakebase DDL
- Adapt Step 2a UC permissions to use the correct catalog (`LAKEBASE_CATALOG` vs `LAKEHOUSE_CATALOG`)
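The guard structure could look like the following. This is a sketch of the branching logic only; the step functions are placeholders standing in for the existing `deploy.sh` step bodies:

```shell
#!/usr/bin/env bash
# Hypothetical branching skeleton for deploy.sh; function names are illustrative.
lakebase_enabled() { [[ "$1" == "lakebase" ]]; }
dbsql_enabled()    { [[ "$1" == "dbsql" ]]; }

deploy() {
  local db_mode="$1"   # read from user-config.yaml: database.backend
  if lakebase_enabled "$db_mode"; then
    echo "Steps 0, 2b, 3, 4c (resource link), 5: Lakebase path"
  elif dbsql_enabled "$db_mode"; then
    echo "New step: create Delta tables via the SQL warehouse"
  else
    echo "YAML-only: all database steps skipped"
  fi
  echo "Step 1 (app bundle) and app code deploy: always run"
}

deploy "dbsql"
```

Keeping the mode check in small predicate functions (rather than repeating the string comparison at each step) makes it easier to audit which of the nine steps are gated.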
3c. app.yaml / app.yaml.template
Template should conditionally include Lakebase env vars:
- Lakebase mode: Include `LAKEBASE_HOST`, `LAKEBASE_SCHEMA`, `ENDPOINT_NAME`, `USE_LAKEBASE: "true"`
- DBSQL mode: Include only `LAKEHOUSE_CATALOG`, `LAKEHOUSE_SCHEMA`, `DATABRICKS_SQL_WAREHOUSE_HTTP_PATH`, `USE_LAKEBASE: "false"`
- YAML mode: Minimal env vars, `USE_LAKEBASE: "false"`, `DB_BACKEND: "none"`
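One way to render the conditional env block is a mode-keyed emitter that the template step calls. A sketch under the assumption that the template is assembled in shell; the `<...>` values are placeholders for discovered settings, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Emit the mode-specific env var block for app.yaml.
# Placeholder values like <host> would be filled in from discovery/user-config.
emit_env_block() {
  case "$1" in
    lakebase)
      printf '%s\n' 'USE_LAKEBASE: "true"' 'LAKEBASE_HOST: "<host>"' \
        'LAKEBASE_SCHEMA: "<schema>"' 'ENDPOINT_NAME: "<endpoint>"' ;;
    dbsql)
      printf '%s\n' 'USE_LAKEBASE: "false"' 'DB_BACKEND: "dbsql"' \
        'LAKEHOUSE_CATALOG: "<catalog>"' 'LAKEHOUSE_SCHEMA: "<schema>"' \
        'DATABRICKS_SQL_WAREHOUSE_HTTP_PATH: "<http-path>"' ;;
    none)
      printf '%s\n' 'USE_LAKEBASE: "false"' 'DB_BACKEND: "none"' ;;
  esac
}

emit_env_block dbsql
```

Generating the block from one function keeps the three modes from drifting apart as env vars are added or renamed.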
3d. databricks.yml / databricks.yml.template
The DAB template currently includes Lakebase resources (postgres projects/branches or database instances). These should be conditional:
- Lakebase mode: Include Lakebase resource blocks
- DBSQL mode: No Lakebase resources, optionally include SQL warehouse config
- YAML mode: App resource only
3e. Backend Service Abstraction
src/backend/services/lakebase.py is currently 2,000+ lines of PostgreSQL-specific code. For DBSQL mode, the app needs an alternative service that:
- Uses the Databricks SQL connector (or HTTP API) instead of psycopg
- Reads/writes the same data model (sessions, workshop parameters, use case descriptions, section input prompts) from Delta tables
- Implements the same interface (`execute_query`, `execute_insert`, `is_lakebase_configured`, etc.)
This could be:
- A new `src/backend/services/dbsql.py` with a matching interface
- A common abstract interface with Lakebase and DBSQL implementations
- `routes.py` already uses `USE_LAKEBASE` to gate access, so the routing layer is partially ready
3f. DBSQL DDL Equivalent
Create DBSQL versions of the Lakebase DDL (db/lakebase/ddl/) as Delta table CREATE statements:
- `db/dbsql/ddl/` with equivalent table schemas using Delta Lake types
- `db/dbsql/dml_seed/` with equivalent seed data
- A `setup-dbsql.sh` script (or extend `setup-lakebase.sh`) that runs these via `databricks sql`
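A `setup-dbsql.sh` could be little more than a loop over the DDL and seed directories. The sketch below is hypothetical: the directory layout matches the proposal above, the dry-run mode is an invented convenience, and the `databricks api post` call against the SQL Statement Execution endpoint is one possible execution mechanism, not a confirmed part of the repo:

```shell
#!/usr/bin/env bash
# Run every .sql file in a directory against the warehouse, in name order.
# dry_run=true only prints what would be executed (useful for testing).
run_sql_dir() {
  local dir="$1" dry_run="${2:-false}"
  for f in "$dir"/*.sql; do
    [ -e "$f" ] || continue
    if [ "$dry_run" = "true" ]; then
      echo "would execute: $f"
    else
      # Illustrative execution path via the SQL Statement Execution API;
      # assumes WAREHOUSE_ID is set and jq is available to JSON-escape the file.
      databricks api post /api/2.0/sql/statements --json \
        "{\"warehouse_id\": \"$WAREHOUSE_ID\", \"statement\": $(jq -Rs . < "$f")}"
    fi
  done
}

# Demo with a throwaway DDL file.
mkdir -p /tmp/dbsql-ddl
echo 'CREATE TABLE IF NOT EXISTS sessions (id STRING)' > /tmp/dbsql-ddl/001_sessions.sql
run_sql_dir /tmp/dbsql-ddl true
```

Ordering by filename mirrors how numbered DDL directories are conventionally applied, so `db/dbsql/ddl/` could reuse the same numbering scheme as `db/lakebase/ddl/`.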
4. Benefits
- Lower barrier to entry: Not all workspaces have Lakebase enabled; DBSQL is universally available
- Simpler deployment: DBSQL-only mode eliminates ~60% of the deployment script (Lakebase discovery, role management, resource linking, token rotation)
- Faster installation: No waiting for Lakebase instance provisioning or autoscaling endpoint creation
- Cost flexibility: Users can choose their price/performance tradeoff (serverless SQL warehouse vs. dedicated Lakebase instance)
- Workshop portability: YAML-only mode enables instant demos with zero infrastructure setup
5. Risks and Considerations
- Feature parity: DBSQL-backed session persistence may behave differently than PostgreSQL (e.g., transaction semantics, upsert syntax, concurrent writes)
- Schema migration: If users switch backends after initial deployment, data migration is not addressed
- DBSQL latency: SQL warehouse cold starts may add latency vs. Lakebase's always-on PostgreSQL endpoint
- Maintenance surface: Two backend implementations means two code paths to maintain and test
- vibe2value CLI: The `./vibe2value configure` tool generates `app.yaml` and `databricks.yml` from templates -- it needs to understand the backend selection
References
- Current deployment script: `scripts/deploy.sh` (1,735 lines, Steps 0-5 all Lakebase-dependent)
- Current Lakebase service: `src/backend/services/lakebase.py` (2,000+ lines)
- Current app config: `app.yaml` (already has `DB_BACKEND` and `USE_LAKEBASE` env vars)
- User config: `user-config.yaml` (already has `database.backend` and a `dbsql:` section)
- Lakebase DDL: `db/lakebase/ddl/`
- Lakebase DML seeds: `db/lakebase/dml_seed/`
- Lakebase setup script: `scripts/setup-lakebase.sh`
- Lakebase manager: `scripts/lakebase_manager.py`