
[PFR] Integrate Databricks AppKit SDK to replace custom React + FastAPI + Lakebase stack #1

@varunrao


Product Feature Request: AppKit Integration

Summary

The Vibe Coding Workshop app currently uses a custom-built stack (React + FastAPI + psycopg + manual OAuth + custom deployment scripts) for its Databricks App and Lakebase integration. Databricks AppKit is the official TypeScript SDK for building production-ready Databricks applications, and it provides plugin-based replacements for nearly every custom component in this repo — including Lakebase connectivity, Lakehouse SQL queries, conversational AI (Genie), and file management.

This PFR requests evaluating and integrating AppKit to replace the current hand-rolled infrastructure.


1. Current State

The app is deployed as a Databricks App using a custom multi-layer stack:

| Component | Current Implementation |
|---|---|
| Frontend | React 19 + Vite 7 + Tailwind CSS 4 (src/components/, src/hooks/) |
| Backend | FastAPI (Python) with a monolithic route file (src/backend/api/routes.py) |
| Lakebase connection | Custom psycopg2/psycopg3 pool with manual OAuth token refresh (src/backend/services/lakebase.py) |
| Lakehouse queries | Manual DBSQL integration via the DB_BACKEND=dbsql environment variable and hand-coded HTTP calls |
| LLM / AI | OpenAI-compatible client pointed at the Foundation Model API (databricks-sdk + openai) |
| Deployment | Custom deploy.sh + setup-lakebase.sh + lakebase_manager.py + DAB templates (databricks.yml.template, app.yaml.template) |
| Type safety | Pydantic models (Python side only); no frontend type generation from SQL |
| Config management | YAML fallback (prompts_config.yaml) + Lakebase tables + user-config.yaml template system |

Pain points with the current approach:

  • Manual OAuth token refresh logic for Lakebase (tokens expire after roughly 60 minutes)
  • No automatic type generation from SQL queries — frontend/backend contract is manual
  • Monolithic routes.py (dozens of endpoints) with no plugin separation
  • Complex multi-step deployment pipeline (deploy.sh is 300+ lines orchestrating discovery, permissions, schema setup, bundle deploy, and app resource linking)
  • Python backend limits use of AppKit's TypeScript-native ecosystem
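The token-refresh pain point above comes down to caching a short-lived credential and renewing it shortly before it expires, without letting concurrent requests each trigger their own refresh. The sketch below shows that pattern in TypeScript; per section 2, this is the kind of logic the Lakebase plugin is expected to take over. It is a minimal illustration under assumptions: `DbCredential`, `CredentialFetcher`, `CachedTokenProvider`, and the 5-minute refresh margin are hypothetical names and values, not AppKit or databricks-sdk APIs.

```typescript
// Short-lived credential returned by a hypothetical fetcher (not an SDK type).
interface DbCredential {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical: e.g. a wrapper around the existing generate_database_credential() call.
type CredentialFetcher = () => Promise<DbCredential>;

class CachedTokenProvider {
  private current: DbCredential | null = null;
  private inFlight: Promise<DbCredential> | null = null;
  private readonly fetchCredential: CredentialFetcher;
  private readonly refreshMarginMs: number;
  private readonly now: () => number;

  constructor(
    fetchCredential: CredentialFetcher,
    refreshMarginMs: number = 5 * 60 * 1000, // assumed margin: renew 5 minutes early
    now: () => number = Date.now, // injectable clock, useful for tests
  ) {
    this.fetchCredential = fetchCredential;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  // Returns the cached token, fetching a new one if missing or about to expire.
  async getToken(): Promise<string> {
    const cached = this.current;
    if (cached !== null && this.now() < cached.expiresAt - this.refreshMarginMs) {
      return cached.token;
    }
    // Concurrent callers share a single refresh instead of each fetching a credential.
    if (this.inFlight === null) {
      this.inFlight = this.fetchCredential()
        .then((cred) => {
          this.current = cred;
          return cred;
        })
        .finally(() => {
          this.inFlight = null;
        });
    }
    const fresh = await this.inFlight;
    return fresh.token;
  }
}
```

A connection pool would call getToken() whenever it opens a new connection, so expiry handling lives in one place instead of being scattered across query code.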

2. Desired State: What AppKit Provides

AppKit (v0.21.0+) offers equivalents for most of the above, largely through plugins:

| Component | AppKit Equivalent |
|---|---|
| Lakebase connection | Lakebase Plugin: automatic OAuth token management, connection pooling, OLTP operations out of the box |
| Lakehouse queries | Analytics Plugin: type-safe SQL queries in config/queries/*.sql with automatic caching, parameterization, and npm run typegen for TypeScript types |
| Conversational AI | Genie Plugin: Databricks AI/BI Genie interface; could enhance the workshop's AI prompt generation flow |
| File management | Files Plugin: browse, upload, and manage files in Unity Catalog Volumes |
| Frontend | @databricks/appkit-ui/react components: BarChart, LineChart, DataTable, useAnalyticsQuery hook |
| Backend | Express + tRPC (TypeScript) with plugin lifecycle phases; modular by design |
| Deployment | databricks apps init → databricks apps deploy (single command) |
| Type safety | End-to-end TypeScript with auto-generated appKitTypes.d.ts from SQL files |
| Developer experience | Remote hot reload, file-based queries, AI-assisted development via Agent Skills |

AppKit core principles (from official docs):

  • Highly opinionated defaults with layered extensibility
  • Zero-trust security by default
  • Production-ready from day one (built-in caching, telemetry, retry logic, error handling)
  • Optimized for both human developers and AI agents

3. Gap Analysis

| Area | Current App | AppKit | Gap / Impact |
|---|---|---|---|
| Lakebase auth | Manual psycopg + WorkspaceClient.postgres.generate_database_credential() with periodic refresh | Lakebase plugin handles OAuth automatically | AppKit eliminates ~100 lines of custom connection code |
| SQL type safety | None; Pydantic models are defined manually | npm run typegen → appKitTypes.d.ts | Full end-to-end type safety from SQL to React |
| Data visualization | Custom React components with manual data fetching | `<BarChart queryKey="..." />`, `<DataTable />` with automatic query binding | Declarative, type-safe visualization components |
| API layer | FastAPI routes (Python) | tRPC (TypeScript) for mutations; SQL files for reads | Language shift from Python to TypeScript |
| Deployment | deploy.sh (300+ lines) + DAB templates + lakebase_manager.py | databricks apps deploy (single command) | Major simplification |
| LLM integration | Custom OpenAI client + Foundation Model API | Not built in (would still need a custom tRPC route) | Partial gap: LLM integration stays custom |
| Workshop-specific logic | Session management, leaderboard, prompt generation in Lakebase tables | No equivalent; workshop domain logic is custom | AppKit provides the platform; domain logic stays custom |
| Config admin UI | Custom /config route for editing prompts/use cases | No equivalent admin scaffolding | Would need custom tRPC routes |
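The LLM integration row is the main gap that stays custom. Because the current Python code uses an OpenAI-compatible client against the Foundation Model API, the same call should port to a small TypeScript function that a custom tRPC route could wrap. The sketch below only builds the HTTP request and parses the reply. It assumes the base URL the Python openai client is normally pointed at ("<workspace host>/serving-endpoints", so chat calls go to /serving-endpoints/chat/completions) with the serving endpoint name passed as "model"; the function names, the 1024 max_tokens default, and the token source are hypothetical.

```typescript
// Minimal OpenAI-style chat types (only the fields used here).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionResponse {
  choices: { message: { role: string; content: string } }[];
}

// Shape accepted by fetch(url, init); declared locally to avoid DOM typings.
interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (does not send) an OpenAI-compatible chat completion request.
// Assumption: "<host>/serving-endpoints" base URL, endpoint name as "model".
function buildChatCompletionRequest(
  workspaceHost: string,
  token: string,
  endpointName: string,
  messages: ChatMessage[],
  maxTokens: number = 1024,
): PreparedRequest {
  const base = workspaceHost.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/serving-endpoints/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: endpointName, messages, max_tokens: maxTokens }),
    },
  };
}

// Extracts the assistant text from the first choice, failing loudly if absent.
function firstReply(response: ChatCompletionResponse): string {
  const choice = response.choices[0];
  if (choice === undefined) {
    throw new Error("chat completion returned no choices");
  }
  return choice.message.content;
}
```

A custom tRPC mutation would send the prepared request with fetch(url, init) and pass the parsed JSON body to firstReply. How that route obtains a token inside the AppKit server is left open here.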

4. Proposed Migration Path

Phase 1: Scaffold and Evaluate

  • Run databricks apps init --features analytics alongside the existing app
  • Validate that the Lakebase plugin works with the existing autoscaling Lakebase instance
  • Test npm run typegen with the existing DDL (db/lakebase/ddl/)

Phase 2: Migrate Lakebase Layer

  • Replace src/backend/services/lakebase.py with the AppKit Lakebase plugin
  • Port session CRUD, workshop parameters, and use case description queries
  • Validate that the OAuth token lifecycle is handled automatically

Phase 3: Migrate SQL Queries to Analytics Plugin

  • Move Lakehouse queries (DBSQL) into config/queries/*.sql
  • Run npm run typegen to generate TypeScript types
  • Replace manual DBSQL HTTP calls with useAnalyticsQuery hooks
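Before moving queries into config/queries/*.sql, it may help to inventory which parameters each file will need, since those become part of the typed contract. The sketch below is a planning aid, not AppKit's typegen: it assumes (a) the query key is the file's base name without ".sql", which matches how the queryKey="..." examples read but is not confirmed against AppKit, and (b) Databricks SQL named parameter markers of the form :name. The function and type names are hypothetical, and the comment/string scrubbing is deliberately rough.

```typescript
interface QueryInventoryEntry {
  queryKey: string; // assumed: file base name without ".sql"
  parameters: string[]; // distinct named markers, in first-seen order
}

function inventoryQuery(filePath: string, sql: string): QueryInventoryEntry {
  const baseName = filePath.split("/").pop() ?? filePath;
  const queryKey = baseName.replace(/\.sql$/i, "");

  // Rough scrub so markers inside comments and string literals are ignored.
  const scrubbed = sql
    .replace(/\/\*[\s\S]*?\*\//g, " ") // block comments
    .replace(/--[^\n]*/g, " ") // line comments
    .replace(/'(?:[^']|'')*'/g, "''"); // single-quoted strings

  // ":name" preceded by a non-colon, so "::INT" casts are skipped.
  const marker = /(^|[^:]):([A-Za-z_][A-Za-z0-9_]*)/g;
  const parameters: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = marker.exec(scrubbed)) !== null) {
    const name = match[2];
    if (name !== undefined && !parameters.includes(name)) {
      parameters.push(name);
    }
  }
  return { queryKey, parameters };
}
```

Running this over the existing DBSQL query strings gives a quick checklist of which useAnalyticsQuery calls will need parameters once the queries are file-based.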

Phase 4: Migrate Frontend

  • Replace custom React components with @databricks/appkit-ui/react where applicable
  • Use <DataTable>, <BarChart>, etc. for data display
  • Keep workshop-specific UI (workflow diagram, prompt editor, leaderboard) as custom components

Phase 5: Retire Custom Infrastructure

  • Replace deploy.sh / setup-lakebase.sh / lakebase_manager.py with databricks apps deploy
  • Remove DAB template generation (databricks.yml.template, app.yaml.template, vibe2value configure)
  • Archive src/backend/ (FastAPI) in favor of AppKit's Express + tRPC server

5. Benefits

  • Reduced maintenance: Eliminate an estimated 1,000+ lines of custom infrastructure code (Lakebase connection, deployment scripts, token refresh)
  • Official support: AppKit is the Databricks-recommended SDK; bug fixes and new features arrive through SDK upgrades rather than in-house maintenance
  • End-to-end type safety: SQL → TypeScript types → React components with zero manual model definitions
  • Plugin ecosystem: Future Databricks plugins (e.g., new data sources, auth methods) should integrate with little or no custom code
  • AI-optimized DX: AppKit is designed for AI-assisted development — better Agent Skills integration
  • Simpler onboarding: New contributors use databricks apps init instead of learning custom deploy scripts

6. Risks and Considerations

  • Migration effort: Significant rewrite from Python (FastAPI) backend to TypeScript (Express + tRPC)
  • LLM integration: AppKit has no built-in Foundation Model plugin — prompt generation logic stays custom
  • Workshop domain logic: Session management, leaderboard, and config admin are app-specific and won't benefit from AppKit plugins directly
  • AppKit maturity: At v0.21.0, some plugins may have gaps vs. the battle-tested custom implementation
  • Python ecosystem: Any Python-specific libraries (e.g., PyMuPDF for PDF processing) would need TypeScript alternatives or a separate service

