databricks-solutions · forrestmurray-db · Jan 7, 2026 · Jan 12, 2026
diff --git a/.claude/commands/verify.md b/.claude/commands/verify.md
diff --git a/.claude/settings.json b/.claude/settings.json
@@ -5,17 +5,17 @@
       "Bash(git push --force:*)",
       "Bash(git reset --hard:*)",
       "Bash(alembic revision:*)",
-      "Bash(alembic downgrade:*)",
-      "Edit(specs/*)",
-      "Write(specs/*)"
+      "Bash(alembic downgrade:*)"
     ],
     "ask": [
       "Bash(git push:*)",
       "Bash(alembic upgrade:*)",
       "Edit(server/models/*)",
       "Edit(alembic/*)",
       "Edit(.github/*)",
-      "Write(alembic/*)"
+      "Write(alembic/*)",
+      "Edit(specs/*)",
+      "Write(specs/*)"
     ],
     "allow": [
       "Bash(just:*)",
@@ -29,4 +29,4 @@
       "Read(*)"
     ]
   }
-}
+}
diff --git a/.claude/skills/verification-testing/SKILL.md b/.claude/skills/verification-testing/SKILL.md
@@ -1,97 +1,156 @@
 ---
 name: verification-testing
-description: "Code verification and testing for the Human Evaluation Workshop. Use when (1) running tests after code changes, (2) writing new unit tests (pytest/vitest), (3) writing E2E tests with Playwright/TestScenario, (4) debugging test failures, (5) understanding what to mock in E2E tests, (6) verifying a feature implementation. Covers the full test pyramid: unit tests -> integration tests -> E2E tests."
+description: "Code verification and testing for the Human Evaluation Workshop. Use when (1) checking implementation progress against specs, (2) running tests after code changes, (3) writing new tests, (4) debugging test failures. Covers unit tests, integration tests, and E2E tests."
 ---
 
 # Verification & Testing
 
-## Quick Verification Commands
+*IMPORTANT:* BEHAVIORS THAT ARE NOT YET IMPLEMENTED, OR ARE ONLY STUBBED WITH A PLACEHOLDER, MUST NOT YIELD PASSING TESTS!
 
-Run these commands to verify code changes:
+## Common Questions (Start Here)
 
-| Command | Purpose | When to Use |
-|---------|---------|-------------|
-| `just test-server` | Python unit tests | After backend changes |
-| `just ui-test-unit` | React unit tests | After frontend changes |
-| `just ui-lint` | TypeScript/ESLint | Before committing |
-| `just e2e` | Full E2E tests | After any feature change |
+### "How far along is SPEC_NAME implementation?"
 
-## Verification Workflow
+**Use `just spec-status` - do NOT run tests unnecessarily.**
 
-### After Implementing a Feature
+```bash
+just spec-status SPEC_NAME
+```
 
-1. **Read the relevant spec** in `specs/` to understand success criteria
-2. **Run unit tests** for the layer you changed:
-   - Backend: `just test-server`
-   - Frontend: `just ui-test-unit`
-3. **Run linting**: `just ui-lint`
-4. **Run E2E tests**: `just e2e`
-5. **Add new tests** if the feature isn't covered
+This shows the coverage percentage and any recent test results. To list the uncovered requirements in detail:
 
-## Reference Files
+```bash
+just spec-coverage --specs SPEC_NAME --json | jq '{
+  coverage: .specs.SPEC_NAME.coverage_percent,
+  covered: .specs.SPEC_NAME.covered_count,
+  total: .specs.SPEC_NAME.requirement_count,
+  uncovered: .specs.SPEC_NAME.uncovered
+}'
+```
+
+**Summarize results for the user** - don't just dump JSON output.
+
+### "Which tests cover SPEC_NAME?"
+
+```bash
+# Python tests
+grep -r "@pytest.mark.spec(\"SPEC_NAME\")" tests/
+
+# E2E tests
+grep -l "@spec:SPEC_NAME" client/tests/e2e/*.spec.ts
+
+# All test counts by type
+just spec-coverage --specs SPEC_NAME --json | jq '.specs.SPEC_NAME.tests_by_type'
+```
 
-| Reference | Purpose | When to Read |
-|-----------|---------|--------------|
-| `e2e-patterns.md` | TestScenario builder API | When writing E2E tests |
-| `mocking.md` | E2E mocking + MLflow/external service mocking | When adding new endpoints or testing integrations |
-| `unit-tests.md` | pytest and vitest patterns | When writing unit tests |
+### "Are the tests passing?"
 
-## Key Concepts
+**Only run tests if the user asks to verify implementation works**, not just to check progress.
 
-### Test Pyramid
+```bash
+# After running tests, get concise summary
+just test-summary
 
+# Or filter by spec
+just test-summary --spec SPEC_NAME
 ```
-        ┌─────────┐
-        │   E2E   │  ← Playwright (slow, high confidence)
-        └────┬────┘
-     ┌───────┴───────┐
-     │  Integration  │  ← API tests (medium speed)
-     └───────┬───────┘
-┌────────────┴────────────┐
-│       Unit Tests        │  ← pytest/vitest (fast)
-└─────────────────────────┘
+
+### "Which requirements are uncovered?"
+
+```bash
+just spec-coverage --json | jq '.specs | to_entries[] | select(.value.uncovered | length > 0) | {spec: .key, uncovered: .value.uncovered}'
 ```
 
-### E2E Mocking Strategy
+---
 
-**Mock by default** - The test infrastructure mocks all API calls unless you opt out:
+## Quick Commands Reference
 
-```typescript
-// Everything mocked (default)
-const scenario = await TestScenario.create(page)
-  .withWorkshop()
-  .build();
-
-// Selective real API
-const scenario = await TestScenario.create(page)
-  .withWorkshop()
-  .withReal('/users/auth/login')  // Only auth is real
-  .build();
-
-// Full integration (no mocks)
-const scenario = await TestScenario.create(page)
-  .withWorkshop()
-  .withRealApi()
-  .build();
+| Command | Purpose |
+|---------|---------|
+| `just spec-status SPEC_NAME` | Coverage + recent test results for a spec |
+| `just spec-coverage` | Full coverage report (all specs) |
+| `just spec-coverage --affected` | Coverage for specs affected by recent changes |
+| `just test-summary` | Concise test results after running tests |
+| `just test-server` | Run all Python unit tests |
+| `just ui-test-unit` | Run all React unit tests |
+| `just e2e` | Run all E2E tests |
+| `just e2e-spec SPEC_NAME` | Run E2E tests for a specific spec |
+
+## Running Tests for a Specific Spec
+
+```bash
+# Python unit tests
+just test-server-spec SPEC_NAME
+
+# React unit tests
+just ui-test-unit-spec SPEC_NAME
+
+# E2E tests (headless by default)
+just e2e-spec SPEC_NAME
+
+# E2E with visible browser
+just e2e-spec SPEC_NAME headed
+```
+
+### E2E Timeout Configuration
+
+If tests are timing out, increase Playwright timeouts via environment variables:
+
+```bash
+# Increase test timeout (default: 30s) and expect timeout (default: 5s)
+PW_TEST_TIMEOUT=60000 PW_EXPECT_TIMEOUT=10000 just e2e-spec SPEC_NAME
 ```
 
-### Adding Mocks for New Endpoints
+Server logs are suppressed by default during E2E tests. To view them:
+- Check `.test-results/api-server.log` and `.test-results/ui-server.log`
+- Or run servers manually with `just e2e-servers` (logs to stdout)
+
+## Test Tagging
+
+All tests must be tagged with spec markers:
+
+**Python (pytest):**
+```python
+@pytest.mark.spec("SPEC_NAME")
+@pytest.mark.req("requirement text")  # optional, links to specific requirement
+def test_something(): ...
+```
 
-If you add a new API endpoint, add a mock handler in `client/tests/lib/mocks/api-mocker.ts`:
+**Playwright (E2E):**
+```typescript
+test.use({ tag: ['@spec:SPEC_NAME', '@req:requirement text'] });
+```
 
+**Vitest (unit):**
 ```typescript
-this.routes.push({
-  pattern: /\/workshops\/([a-f0-9-]+)\/your-endpoint$/i,
-  get: async (route) => {
-    await route.fulfill({ json: this.store.yourData });
-  },
-});
+// @spec SPEC_NAME
+// @req requirement text
 ```
 
-## Critical Files
+## Test File Locations
+
+| Type | Location | Tag Format |
+|------|----------|------------|
+| Python unit | `tests/unit/` | `@pytest.mark.spec("SPEC")` |
+| Python integration | `tests/integration/` | `@pytest.mark.spec("SPEC")` |
+| React unit | `client/src/**/*.test.ts` | `// @spec SPEC` comment |
+| E2E | `client/tests/e2e/*.spec.ts` | `test.use({ tag: ['@spec:SPEC'] })` |
+
+## Verification Workflow
+
+After implementing a feature:
+
+1. **Check coverage**: `just spec-status SPEC_NAME`
+2. **Run relevant tests**:
+   - Backend changes: `just test-server-spec SPEC_NAME`
+   - Frontend changes: `just ui-test-unit-spec SPEC_NAME`
+   - Full feature: `just e2e-spec SPEC_NAME`
+3. **Get results**: `just test-summary`
+4. **Lint**: `just ui-lint`
+
+## Reference Files
 
-- `specs/TESTING_SPEC.md` - Full testing specification
-- `client/tests/lib/README.md` - E2E test infrastructure docs
-- `client/tests/lib/mocks/api-mocker.ts` - Mock handlers
-- `client/tests/lib/scenario-builder.ts` - TestScenario class
-- `justfile` - All test commands
+For detailed patterns, see:
+- `e2e-patterns.md` - TestScenario builder API for E2E tests
+- `mocking.md` - How to mock API endpoints in E2E tests
+- `unit-tests.md` - pytest and vitest patterns
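Reviewer sketch (not part of the patch): the SKILL.md hunk above asks agents to summarize `just spec-coverage --json` output rather than dump the raw JSON. A minimal Python sketch of such a summary follows. It assumes the report has the shape implied by the `jq` filters in the hunk (`.specs.<name>.coverage_percent`, `covered_count`, `requirement_count`, `uncovered`) and that `uncovered` is a list of requirement strings. The function name `summarize_spec`, the spec name `EXAMPLE_SPEC`, and the sample data are hypothetical.

```python
import json


def summarize_spec(report: dict, spec_name: str) -> str:
    """Turn one spec's entry from a coverage report into a short text summary.

    Assumes the field names used by the jq filters in SKILL.md and that
    `uncovered` is a list of requirement strings (an assumption, not
    confirmed by the patch).
    """
    spec = report["specs"][spec_name]
    uncovered = spec.get("uncovered", [])
    lines = [
        f"{spec_name}: {spec['covered_count']}/{spec['requirement_count']} "
        f"requirements covered ({spec['coverage_percent']}%)"
    ]
    if uncovered:
        lines.append("Uncovered requirements:")
        lines.extend(f"  - {req}" for req in uncovered)
    else:
        lines.append("All requirements have at least one tagged test.")
    return "\n".join(lines)


# Hypothetical sample data standing in for `just spec-coverage --json` output.
SAMPLE = json.loads(
    """
    {
      "specs": {
        "EXAMPLE_SPEC": {
          "coverage_percent": 66.7,
          "covered_count": 2,
          "requirement_count": 3,
          "uncovered": ["third requirement"]
        }
      }
    }
    """
)

if __name__ == "__main__":
    print(summarize_spec(SAMPLE, "EXAMPLE_SPEC"))
```

If the real report matches these assumptions, the same function could be fed the parsed output of `just spec-coverage --specs SPEC_NAME --json` in place of `SAMPLE`.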
diff --git a/.coverage b/.coverage
diff --git a/.cursor/rules/always-use-uv.mdc b/.cursor/rules/always-use-uv.mdc
@@ -0,0 +1,4 @@
+---
+description: When running Python commands or Python executables such as pytest, use `uv run python ...` or `uv run <cmd>`
+alwaysApply: false
+---
diff --git a/.cursor/rules/use-just-recipes.mdc b/.cursor/rules/use-just-recipes.mdc
@@ -0,0 +1,4 @@
+---
+description: When running tasks such as unit tests, E2E tests, or migrations, first look for a corresponding just recipe in @justfile
+alwaysApply: false
+---
diff --git a/.gitignore b/.gitignore
@@ -19,4 +19,10 @@ workshop.db
 *.db-shm
 *.db-wal
 *.csv
-.databricks/
+.databricks/
+
+# Test results (JSON reports for LLM agents)
+.test-results/
+htmlcov/
+coverage/
+.coverage
diff --git a/CODEOWNERS.txt b/CODEOWNERS.txt
@@ -1,2 +1,4 @@
 vivian-xie-db
-Pallavi Koppol
+Pallavi Koppol
+forrestmurray-db
+Forrest Murray