|
1 | 1 | # Genie Workbench |
2 | 2 |
|
3 | | -Databricks App for creating, scoring, and optimizing Genie Spaces. FastAPI backend + React/Vite frontend deployed together on Databricks Apps. |
| 3 | +## Project Overview |
4 | 4 |
|
5 | | -## Commands |
| 5 | +Genie Workbench is a Databricks App that acts as a quality control and optimization platform for Genie Space administrators. It helps builders understand why their Genie Space isn't performing well and fix it. |
6 | 6 |
|
7 | | -```bash |
8 | | -# Backend (from project root) |
9 | | -uv pip install -e . # Install Python deps |
10 | | -uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload # Dev server |
11 | | - |
12 | | -# Frontend (from frontend/) |
13 | | -cd frontend && npm install && npm run build # Build for production |
14 | | -cd frontend && npm run dev # Vite dev server (port 5173, proxies /api to :8000) |
15 | | -cd frontend && npm run lint # ESLint |
16 | | - |
17 | | -# Full build (what Databricks Apps runs) |
18 | | -npm install # Triggers postinstall -> cd frontend && npm install |
19 | | -npm run build # Triggers cd frontend && npm run build |
20 | | - |
21 | | -# Deploy |
22 | | -databricks sync --watch . /Workspace/Users/<email>/genie-workbench |
23 | | -databricks apps deploy <app-name> --source-code-path /Workspace/Users/<email>/genie-workbench |
24 | | - |
25 | | -# Tests (require running backend at localhost:8000) |
26 | | -python tests/test_e2e_local.py # E2E create agent tests |
27 | | -python tests/test_full_schema.py # Schema validation |
28 | | -# Deployed E2E tests require: pip install playwright && playwright install chromium |
29 | | -python tests/test_e2e_deployed.py |
30 | | -``` |
31 | | - |
32 | | -## Architecture |
33 | | - |
34 | | -``` |
35 | | -backend/ |
36 | | - main.py # FastAPI app entry point, OBO middleware, static file serving |
37 | | - models.py # All Pydantic models (shared between routers/services) |
38 | | - routers/ |
39 | | - analysis.py # /api/space/*, /api/analyze/*, /api/optimize, /api/genie/*, /api/sql/* |
40 | | - spaces.py # /api/spaces/* (list, scan, history, star, fix) |
41 | | - admin.py # /api/admin/* (dashboard, leaderboard, alerts) |
42 | | - auth.py # /api/auth/me |
43 | | - create.py # /api/create/* (agent chat, UC discovery, wizard) |
44 | | - services/ |
45 | | - auth.py # OBO auth (ContextVar), SP fallback, WorkspaceClient mgmt |
46 | | - genie_client.py # Databricks Genie API (fetch space, list spaces, query for SQL) |
47 | | - scanner.py # Rule-based IQ scoring engine (0-100, 4 dimensions) |
48 | | - analyzer.py # LLM-based deep analysis against best-practices checklist |
49 | | - optimizer.py # LLM-based optimization from benchmark feedback |
50 | | - fix_agent.py # LLM agent that generates JSON patches and applies via Genie API |
51 | | - create_agent.py # Multi-turn LLM agent for creating new Genie Spaces |
52 | | - create_agent_session.py # Session persistence for create agent (Lakebase) |
53 | | - create_agent_tools.py # Tool definitions for create agent (UC discovery, SQL, etc.) |
54 | | - lakebase.py # PostgreSQL persistence (asyncpg pool, in-memory fallback) |
55 | | - llm_utils.py # OpenAI-compatible LLM client via Databricks serving endpoints |
56 | | - uc_client.py # Unity Catalog browsing (catalogs, schemas, tables) |
57 | | - prompts/ # Prompt templates for analysis |
58 | | - prompts_create/ # Prompt templates for create agent (multi-file, modular) |
59 | | - references/schema.md # Genie Space JSON schema reference |
60 | | -frontend/ |
61 | | - src/ |
62 | | - App.tsx # Root: SpaceList | SpaceDetail | AdminDashboard | CreateAgentChat |
63 | | - lib/api.ts # All API calls (fetch, SSE streaming helpers) |
64 | | - types/index.ts # TypeScript types mirroring backend Pydantic models |
65 | | - components/ # UI components (analysis, optimization, fix agent, etc.) |
66 | | - pages/ # SpaceList, SpaceDetail, AdminDashboard, HistoryTab, IQScoreTab |
67 | | - hooks/ # useAnalysis, useTheme |
68 | | - vite.config.ts # Vite config with /api proxy to localhost:8000 |
69 | | -``` |
70 | | - |
71 | | -## Key Patterns |
72 | | - |
73 | | -### Authentication (OBO) |
74 | | -On Databricks Apps, user identity flows via `x-forwarded-access-token` header. `OBOAuthMiddleware` in `main.py` stores the token in a `ContextVar`. All services call `get_workspace_client()` which returns the OBO client if set, otherwise the SP singleton. Some Genie API calls require SP auth (missing `genie` OAuth scope) — see `_is_scope_error()` fallback in `genie_client.py`. |
75 | | - |
76 | | -### SSE Streaming |
77 | | -Multiple endpoints use `StreamingResponse` with `text/event-stream`: |
78 | | -- `/api/analyze/stream` — analysis progress |
79 | | -- `/api/optimize` — optimization with heartbeat keepalives (15s) |
80 | | -- `/api/spaces/{id}/fix` — fix agent patches |
81 | | -- `/api/create/agent/chat` — multi-turn agent with typed events (session, step, thinking, tool_call, tool_result, message_delta, message, created, error, done) |
82 | | - |
83 | | -Frontend consumes these via manual `fetch` + `ReadableStream` in `lib/api.ts` (not EventSource). Buffer splitting on `\n\n`. |
| 7 | +- **Backend:** Python (FastAPI), deployed as a Databricks App |
| 8 | +- **Frontend:** React/TypeScript (Vite) |
| 9 | +- **Storage:** Lakebase (with in-memory fallback for local dev) |
| 10 | +- **Tracing:** Optional MLflow integration |
84 | 11 |
|
85 | | -### Lakebase Persistence |
86 | | -`services/lakebase.py` uses asyncpg with graceful fallback to in-memory dicts when `LAKEBASE_HOST` is not set. Credentials auto-generated via Databricks SDK (`/api/2.0/database/credentials`). Schema defined in `sql/setup_lakebase.sql`. |
| 12 | +## GenieRX Specification |
87 | 13 |
|
88 | | -### LLM Calls |
89 | | -All LLM calls go through Databricks model serving endpoints using OpenAI-compatible API. Model configured via `LLM_MODEL` env var (default: `databricks-claude-sonnet-4-6`). MLflow tracing is optional — controlled by `MLFLOW_EXPERIMENT_ID`. |
| 14 | +The GenieRX spec (`docs/genierx-spec.md`) defines the core analysis and recommendation framework used throughout this project. **Always consult it when working on analysis, scoring, or recommendation features.** |
90 | 15 |
|
91 | | -## Environment Variables |
| 16 | +Key concepts from the spec: |
92 | 17 |
|
93 | | -Defined in `app.yaml`. Key ones: |
94 | | -- `SQL_WAREHOUSE_ID` — from app resource `sql-warehouse` |
95 | | -- `LLM_MODEL` — serving endpoint name |
96 | | -- `LAKEBASE_HOST`, `LAKEBASE_PORT`, `LAKEBASE_DATABASE`, `LAKEBASE_INSTANCE_NAME` — Lakebase config |
97 | | -- `MLFLOW_EXPERIMENT_ID` — enables MLflow tracing (validated at startup, cleared if invalid) |
98 | | -- `GENIE_TARGET_DIRECTORY` — where new spaces are created (default `/Shared/`) |
99 | | -- `DEV_USER_EMAIL` — local dev only |
| 18 | +- **Authoritative Facts** — raw data from systems of record, safe to surface directly |
| 19 | +- **Canonical Metrics** — governed KPIs with stable definitions and cross-team agreement |
| 20 | +- **Heuristic Signals** — derived fields with subjective thresholds; must always carry caveats |
100 | 21 |
|
101 | | -Local dev uses `.env.local` (loaded first with override) then `.env`. |
| 22 | +When implementing or modifying any analyzer, scorer, or recommender logic, ensure field classifications align with this taxonomy. Heuristic signals must never be presented as authoritative facts in Genie answers. |
102 | 23 |
|
103 | | -## Dev/Test Workflow |
| 24 | +## Key Documentation |
104 | 25 |
|
105 | | -There is no local dev server — all testing is done by syncing code to Databricks and redeploying: |
| 26 | +- `docs/genierx-spec.md` — GenieRX analyzer/recommender specification |
| 27 | +- `docs/genie-space-schema.md` — Genie space schema reference |
| 28 | +- `docs/checklist-by-schema.md` — Analysis checklist organized by schema section |
| 29 | +- `CUJ.md` — Core user journeys and product analysis |
106 | 30 |
|
107 | | -1. Edit code locally |
108 | | -2. `databricks sync --watch . /Workspace/Users/<email>/genie-workbench` picks up changes automatically |
109 | | -3. Re-run `databricks apps deploy <app-name> --source-code-path /Workspace/Users/<email>/genie-workbench` to trigger a new deployment |
110 | | -4. Test in the deployed Databricks App |
| 31 | +## Development |
111 | 32 |
|
112 | | -Do NOT suggest running `uvicorn` or `npm run dev` locally. The app depends on Databricks-managed resources (OBO auth, Lakebase, serving endpoints) that aren't available outside a Databricks App environment. |
113 | | - |
114 | | -## Gotchas |
115 | | - |
116 | | -- **frontend/dist/ is gitignored but NOT databricksignored** — the built React app must be synced to workspace for deployment. Build before `databricks sync`. |
117 | | -- **`.databricksignore` excludes `*.md`** but explicitly includes `backend/references/schema.md` (needed at runtime by the analyzer). |
118 | | -- **OBO ContextVar and streaming** — for SSE endpoints, the ContextVar is NOT cleared after `call_next` because the response streams lazily. Streaming handlers stash the token on `request.state` and re-set it inside the generator. |
119 | | -- **Two separate "analysis" paths** — IQ Scan (`scanner.py`, rule-based, instant) and Deep Analysis (`analyzer.py`, LLM-based, streaming). They produce different outputs and don't cross-reference. |
120 | | -- **Two separate "fix" paths** — Fix Agent (from scan findings, auto-applies patches) and Optimize flow (from benchmark labeling, produces suggestions for a new space). They're independent. |
121 | | -- **Vite proxy** — dev frontend at :5173 proxies `/api` to :8000. In production, FastAPI serves static files from `frontend/dist/` directly. |
122 | | -- **Python 3.11+** required (`pyproject.toml`). Uses `uv` for dependency management (`uv.lock` present). |
123 | | -- **Root `package.json`** exists solely as a build hook for Databricks Apps — `postinstall` chains to `frontend/npm install`, `build` chains to `frontend/npm run build`. |
| 33 | +```bash |
| 34 | +# Backend (from repo root) |
| 35 | +uv run start-server |
124 | 36 |
|
125 | | -## Code Style |
| 37 | +# Frontend |
| 38 | +cd frontend && npm run dev |
| 39 | +``` |
126 | 40 |
|
127 | | -- Backend: Python, Pydantic models, FastAPI routers, no class-based views |
128 | | -- Frontend: React 19 + TypeScript + Tailwind CSS v4 + Vite 7, functional components only |
129 | | -- UI primitives in `frontend/src/components/ui/` (button, card, badge, etc.) using `class-variance-authority` |
130 | | -- Path alias `@` maps to `frontend/src/` (configured in `vite.config.ts` and `tsconfig.app.json`) |
131 | | -- All API routes prefixed with `/api` |
132 | | -- Pydantic models in `backend/models.py`, TypeScript mirrors in `frontend/src/types/index.ts` — keep in sync |
| 41 | +The frontend runs at `localhost:5173` and proxies API calls to the backend at `localhost:8000`.
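
The GenieRX taxonomy added in this diff (Authoritative Facts, Canonical Metrics, Heuristic Signals) can be made concrete with a small sketch. The names below (`FieldClass`, `Finding`, `render`) are hypothetical and do not come from `docs/genierx-spec.md` or the codebase; this only illustrates one way the rule "heuristic signals must always carry caveats" might be enforced where output is rendered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FieldClass(Enum):
    """Hypothetical encoding of the three GenieRX field classifications."""
    AUTHORITATIVE_FACT = "authoritative_fact"
    CANONICAL_METRIC = "canonical_metric"
    HEURISTIC_SIGNAL = "heuristic_signal"


@dataclass(frozen=True)
class Finding:
    """A single value to surface, tagged with its classification (assumed shape)."""
    label: str
    value: str
    field_class: FieldClass
    caveat: Optional[str] = None


def render(finding: Finding) -> str:
    """Format a finding; heuristic signals are rejected unless they carry a caveat."""
    text = f"{finding.label}: {finding.value}"
    if finding.field_class is FieldClass.HEURISTIC_SIGNAL:
        if not finding.caveat:
            raise ValueError(f"heuristic signal {finding.label!r} has no caveat")
        return f"{text} (heuristic: {finding.caveat})"
    return text
```

Under these assumptions, an authoritative fact renders as plain text, while a heuristic signal without a caveat fails loudly instead of being presented as fact.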