databricks-solutions
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
Lines changed: 63 additions & 0 deletions
diff --git a/.claude/commands/verify.md b/.claude/commands/verify.md
Lines changed: 50 additions & 0 deletions
diff --git a/.claude/settings.json b/.claude/settings.json
Lines changed: 32 additions & 0 deletions
diff --git a/.claude/skills/verification-testing/SKILL.md b/.claude/skills/verification-testing/SKILL.md
Lines changed: 97 additions & 0 deletions
diff --git a/.claude/skills/verification-testing/references/e2e-patterns.md b/.claude/skills/verification-testing/references/e2e-patterns.md
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,63 @@
+# Claude Code Instructions
+
+## Purpose
+
+Human Evaluation Workshop - a collaborative platform for annotating and evaluating LLM traces with MLflow integration. Built for Databricks Apps deployment.
+
+## Tech Stack
+
+- **Backend**: Python 3.11+, FastAPI, SQLAlchemy, Alembic (SQLite)
+- **Frontend**: React, TypeScript, Vite, Tailwind CSS
+- **Testing**: pytest, Vitest, Playwright
+- **Task runner**: `just` (see `justfile`)
+
+## Key Directories
+
+| Directory | Contents |
+|-----------|----------|
+| `/specs/` | Declarative specifications (source of truth) |
+| `/server/` | FastAPI backend |
+| `/client/` | React frontend |
+| `/tests/` | Python tests |
+| `/client/tests/` | Frontend unit + E2E tests |
+
+## Spec-Driven Development
+
+**This repo uses specs as source of truth.** Before implementing:
+
+1. Search `/specs/README.md` for the relevant spec (the index is keyword-based)
+2. Read the spec - it defines expected behavior and success criteria
+3. Check `/specs/SPEC_COVERAGE_MAP.md` for existing test coverage
+
+## Core Rules
+
+- **Read spec before coding** - No feature work without understanding the spec
+- **Tag all tests to specs** - Use `@pytest.mark.spec("SPEC_NAME")` or equivalent
+- **Verify before completing** - Run tests, ensure they pass
+- **Ask if spec is unclear** - Don't guess at undefined behavior
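The rule above names `@pytest.mark.spec("SPEC_NAME")` for Python but leaves the frontend "equivalent" open. One possible convention, sketched here purely as an assumption, is to carry the spec name in the test title. The `@spec:` prefix and the helper names `specTag`, `taggedTitle`, and `specsInTitle` are hypothetical and do not come from the repo.

```typescript
// Hypothetical helpers: the "@spec:" prefix and these names are illustrative only.

/** Build a tag such as "@spec:TESTING_SPEC" from a spec name. */
function specTag(specName: string): string {
  const trimmed = specName.trim();
  if (!/^[A-Z0-9_]+$/.test(trimmed)) {
    throw new Error('Unexpected spec name: "' + specName + '"');
  }
  return "@spec:" + trimmed;
}

/** Prefix a test title with its spec tag so the link is visible in reports. */
function taggedTitle(specName: string, title: string): string {
  return specTag(specName) + " " + title;
}

/** Recover every spec name mentioned in a test title. */
function specsInTitle(title: string): string[] {
  const pattern = /@spec:([A-Z0-9_]+)/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(title)) !== null) {
    found.push(match[1]);
  }
  return found;
}
```

A Vitest or Playwright test could then be declared with `taggedTitle("TESTING_SPEC", "facilitator can create a rubric")`, and a coverage script could collect spec names with `specsInTitle`. Whether the repo wants title-based tags at all is an open question for the spec authors.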
+
+## Protected Operations (Ask First)
+
+- Modifying files in `/specs/`
+- Creating database migrations
+- Changing auth logic
+- Deleting files
+- Destructive git operations
+
+## Commands
+
+```bash
+just test-server     # Python unit tests
+just ui-test-unit    # React unit tests
+just ui-lint         # TypeScript/ESLint
+just e2e [headless|headed] [extra-args]   # End-to-end tests
+```
+
+If you want to do something not covered here, consult @justfile
+
+## References
+
+- **Workflow details**: See `CONTRIBUTING.md`
+- **Test patterns**: `.claude/skills/verification-testing/SKILL.md`
+- **MLflow patterns**: `.claude/skills/mlflow-evaluation/SKILL.md`
+- **Spec index**: `/specs/README.md`
@@ -0,0 +1,50 @@
+---
+allowed-tools: Bash(just:*), Bash(git status:*)
+argument-hint: [scope] - "all", "affected", "backend", "frontend", "e2e"
+description: Run tests to verify code changes
+---
+
+# Verify Code Changes
+
+You are verifying code changes using the project's test infrastructure.
+
+## Scope: $ARGUMENTS
+
+Based on the scope provided, run the appropriate verification:
+
+### If scope is "all" or empty:
+Run the full verification suite:
+1. `just test-server` - Python unit tests
+2. `just ui-lint` - TypeScript/ESLint checks
+3. `just ui-test-unit` - React unit tests
+4. `just e2e` - End-to-end tests
+
+### If scope is "affected":
+1. First, check `git status` to see which files changed
+2. If Python files changed: run `just test-server`
+3. If TypeScript/React files changed: run `just ui-lint && just ui-test-unit`
+4. If E2E-relevant files changed (UI components, API routes): run `just e2e`
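The "affected" steps above amount to a mapping from changed paths to `just` recipes. A minimal sketch of that mapping follows. The classification rules are assumptions: the file does not say how "Python", "TypeScript/React", or "E2E-relevant" files are detected, so the extension checks and the `client/` and `server/` prefixes are hypothetical, as is the name `checksForChangedFiles`.

```typescript
// Hypothetical classification rules: extensions and path prefixes are assumptions.
type Check = "just test-server" | "just ui-lint" | "just ui-test-unit" | "just e2e";

function checksForChangedFiles(files: string[]): Check[] {
  const isPython = (f: string) => /\.py$/.test(f);
  const isFrontend = (f: string) => /\.(ts|tsx)$/.test(f);
  // Assumed reading of "E2E-relevant": anything under client/ or server/.
  const isE2eRelevant = (f: string) => /^(client|server)\//.test(f);

  const checks: Check[] = [];
  if (files.some(isPython)) checks.push("just test-server");
  if (files.some(isFrontend)) checks.push("just ui-lint", "just ui-test-unit");
  if (files.some(isE2eRelevant)) checks.push("just e2e");
  return checks; // same order as the steps listed above
}
```

Under these assumed rules, a change to `server/app.py` selects the backend tests and the E2E run, while a change limited to `README.md` selects nothing.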
+
+### If scope is "backend":
+Run only backend verification:
+1. `just test-server`
+
+### If scope is "frontend":
+Run only frontend verification:
+1. `just ui-lint`
+2. `just ui-test-unit`
+
+### If scope is "e2e":
+Run only E2E tests:
+1. `just e2e`
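The fixed scopes above can be summarized as a lookup table. The sketch below restates them in TypeScript. The recipe names and their order come from the sections above, while `FIXED_SCOPES` and `commandsForScope` are hypothetical names, and treating "affected" as an error here is a simplification because that scope depends on `git status`.

```typescript
// Recipe lists copied from the scope sections above; the function itself is illustrative.
const FIXED_SCOPES: { [scope: string]: string[] } = {
  all: ["just test-server", "just ui-lint", "just ui-test-unit", "just e2e"],
  backend: ["just test-server"],
  frontend: ["just ui-lint", "just ui-test-unit"],
  e2e: ["just e2e"],
};

function commandsForScope(scope: string): string[] {
  // An empty scope is treated the same as "all".
  const key = scope.trim() === "" ? "all" : scope.trim();
  if (key === "affected") {
    throw new Error("'affected' depends on git status and is resolved separately");
  }
  if (!Object.prototype.hasOwnProperty.call(FIXED_SCOPES, key)) {
    throw new Error("Unknown scope: " + scope);
  }
  return FIXED_SCOPES[key];
}
```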
+
+## On Failure
+
+If any test fails:
+1. Report which tests failed with the error output
+2. Suggest fixes based on the error messages
+3. Ask if the user wants you to fix the issues
+
+## Reference
+
+See the verification-testing skill in `.claude/skills/verification-testing/` for detailed testing patterns and mocking guidance.
@@ -0,0 +1,32 @@
+{
+  "permissions": {
+    "deny": [
+      "Bash(rm -rf:*)",
+      "Bash(git push --force:*)",
+      "Bash(git reset --hard:*)",
+      "Bash(alembic revision:*)",
+      "Bash(alembic downgrade:*)",
+      "Edit(specs/*)",
+      "Write(specs/*)"
+    ],
+    "ask": [
+      "Bash(git push:*)",
+      "Bash(alembic upgrade:*)",
+      "Edit(server/models/*)",
+      "Edit(alembic/*)",
+      "Edit(.github/*)",
+      "Write(alembic/*)"
+    ],
+    "allow": [
+      "Bash(just:*)",
+      "Bash(uv run:*)",
+      "Bash(npm:*)",
+      "Bash(git status:*)",
+      "Bash(git diff:*)",
+      "Bash(git log:*)",
+      "Bash(git add:*)",
+      "Bash(git commit:*)",
+      "Read(*)"
+    ]
+  }
+}
@@ -0,0 +1,97 @@
+---
+name: verification-testing
+description: "Code verification and testing for the Human Evaluation Workshop. Use when (1) running tests after code changes, (2) writing new unit tests (pytest/vitest), (3) writing E2E tests with Playwright/TestScenario, (4) debugging test failures, (5) understanding what to mock in E2E tests, (6) verifying a feature implementation. Covers the full test pyramid: unit tests -> integration tests -> E2E tests."
+---
+
+# Verification & Testing
+
+## Quick Verification Commands
+
+Run these commands to verify code changes:
+
+| Command | Purpose | When to Use |
+|---------|---------|-------------|
+| `just test-server` | Python unit tests | After backend changes |
+| `just ui-test-unit` | React unit tests | After frontend changes |
+| `just ui-lint` | TypeScript/ESLint | Before committing |
+| `just e2e` | Full E2E tests | After any feature change |
+
+## Verification Workflow
+
+### After Implementing a Feature
+
+1. **Read the relevant spec** in `specs/` to understand success criteria
+2. **Run unit tests** for the layer you changed:
+   - Backend: `just test-server`
+   - Frontend: `just ui-test-unit`
+3. **Run linting**: `just ui-lint`
+4. **Run E2E tests**: `just e2e`
+5. **Add new tests** if the feature isn't covered
+
+## Reference Files
+
+| Reference | Purpose | When to Read |
+|-----------|---------|--------------|
+| `e2e-patterns.md` | TestScenario builder API | When writing E2E tests |
+| `mocking.md` | E2E mocking + MLflow/external service mocking | When adding new endpoints or testing integrations |
+| `unit-tests.md` | pytest and vitest patterns | When writing unit tests |
+
+## Key Concepts
+
+### Test Pyramid
+
+```
+        ┌─────────┐
+        │   E2E   │  ← Playwright (slow, high confidence)
+        └────┬────┘
+     ┌───────┴───────┐
+     │  Integration  │  ← API tests (medium speed)
+     └───────┬───────┘
+┌────────────┴────────────┐
+│       Unit Tests        │  ← pytest/vitest (fast)
+└─────────────────────────┘
+```
+
+### E2E Mocking Strategy
+
+**Mock by default** - The test infrastructure mocks all API calls unless you opt out:
+
+```typescript
+// Everything mocked (default)
+const scenario = await TestScenario.create(page)
+  .withWorkshop()
+  .build();
+
+// Selective real API
+const scenario = await TestScenario.create(page)
+  .withWorkshop()
+  .withReal('/users/auth/login')  // Only auth is real
+  .build();
+
+// Full integration (no mocks)
+const scenario = await TestScenario.create(page)
+  .withWorkshop()
+  .withRealApi()
+  .build();
+```
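The three builder calls above imply a simple decision per request: mock unless the scenario opted out. The sketch below shows one way such a decision could be expressed. It is not the repo's implementation: `MockPolicy` and `shouldMock` are hypothetical names, only path-style arguments to `.withReal()` are handled, and substring matching is an assumption.

```typescript
// Hypothetical sketch of mock-by-default; not the TestScenario implementation.
interface MockPolicy {
  realApi: boolean;    // what a call like .withRealApi() might set
  realPaths: string[]; // what calls like .withReal(path) might collect
}

function shouldMock(requestPath: string, policy: MockPolicy): boolean {
  // Full integration: nothing is mocked.
  if (policy.realApi) return false;
  // Opted-out paths go to the real backend; everything else stays mocked.
  const optedOut = policy.realPaths.some((p) => requestPath.indexOf(p) !== -1);
  return !optedOut;
}
```

With `realPaths` set to `['/users/auth/login']`, the login request would reach the real backend and every other request would still be served from mocks, which matches the "Only auth is real" comment above.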
+
+### Adding Mocks for New Endpoints
+
+If you add a new API endpoint, add a mock handler in `client/tests/lib/mocks/api-mocker.ts`:
+
+```typescript
+this.routes.push({
+  pattern: /\/workshops\/([a-f0-9-]+)\/your-endpoint$/i,
+  get: async (route) => {
+    await route.fulfill({ json: this.store.yourData });
+  },
+});
+```
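Before registering a handler, it can help to check what the route pattern actually matches. The snippet below reuses the regular expression from the handler above and assumes, without confirmation from the repo, that the mocker tests it against the full request URL. The helper name `workshopIdFrom` and the sample URLs are made up for illustration.

```typescript
// Pattern copied from the handler above; URLs and helper name are illustrative.
const pattern = /\/workshops\/([a-f0-9-]+)\/your-endpoint$/i;

/** Return the captured workshop ID, or null when the URL does not match. */
function workshopIdFrom(url: string): string | null {
  const match = pattern.exec(url);
  return match ? match[1] : null;
}

// Matches: hex-and-dash ID, path ends exactly at the endpoint name.
const plain = workshopIdFrom("http://localhost:8000/workshops/3f2a-9b1c/your-endpoint");

// Does not match: the trailing `$` rejects a query string.
const withQuery = workshopIdFrom("http://localhost:8000/workshops/3f2a-9b1c/your-endpoint?page=2");

// Does not match: "xyz" falls outside the [a-f0-9-] character class.
const nonHexId = workshopIdFrom("http://localhost:8000/workshops/xyz/your-endpoint");
```

If the new endpoint accepts query parameters, the trailing `$` would need to be relaxed, for example to `(\?.*)?$`.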
+
+## Critical Files
+
+- `specs/TESTING_SPEC.md` - Full testing specification
+- `client/tests/lib/README.md` - E2E test infrastructure docs
+- `client/tests/lib/mocks/api-mocker.ts` - Mock handlers
+- `client/tests/lib/scenario-builder.ts` - TestScenario class
+- `justfile` - All test commands
@@ -0,0 +1,149 @@
+# E2E Test Patterns
+
+## TestScenario Builder API
+
+Location: `client/tests/lib/`
+
+### Basic Usage
+
+```typescript
+import { test, expect } from '@playwright/test';
+import { TestScenario } from './lib';
+
+test('facilitator can create a rubric', async ({ page }) => {
+  const scenario = await TestScenario.create(page)
+    .withWorkshop({ name: 'My Workshop' })
+    .withFacilitator()
+    .withParticipants(2)
+    .withTraces(5)
+    .inPhase('rubric')
+    .build();
+
+  await scenario.loginAs(scenario.facilitator);
+  await expect(page.getByRole('heading', { name: 'My Workshop' })).toBeVisible();
+  await scenario.cleanup();
+});
+```
+
+### Workshop Configuration
+
+```typescript
+.withWorkshop()                              // Default workshop
+.withWorkshop({ name: 'Custom Name' })       // Named workshop
+.withWorkshop({ name: 'W', description: 'D' }) // With description
+```
+
+### User Configuration
+
+```typescript
+.withFacilitator()                           // Default facilitator
+.withFacilitator({ email: 'a@b.com' })       // Custom email
+.withParticipants(3)                         // 3 participants
+.withSMEs(2)                                 // 2 SME users
+.withUser('participant', { name: 'Alice' })  // Named user
+```
+
+### Data Configuration
+
+```typescript
+.withTraces(5)                               // 5 mock traces
+.withRubric({ question: 'How helpful?' })    // Add rubric
+.withDiscoveryFinding({ insight: '...' })    // Add finding
+.withDiscoveryComplete()                     // Mark discovery done
+.withAnnotation({ rating: 4, comment: '...' }) // Add annotation
+```
+
+### Phase Configuration
+
+```typescript
+.inPhase('intake')       // Initial phase
+.inPhase('discovery')    // Discovery phase
+.inPhase('rubric')       // Rubric creation
+.inPhase('annotation')   // Annotation phase
+.inPhase('results')      // Results phase
+```
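The phase names above can be captured in a union type so that a typo in `.inPhase(...)` fails at compile time. The sketch below assumes the phases advance linearly in the order listed, which the file implies but does not state. `PHASES`, `Phase`, `phasesUpTo`, and `isBefore` are hypothetical names, not exports of the test library.

```typescript
// Assumption: phases advance linearly in the order the list above shows them.
const PHASES = ["intake", "discovery", "rubric", "annotation", "results"] as const;
type Phase = (typeof PHASES)[number];

/** Phases a workshop would pass through to reach `target`, inclusive. */
function phasesUpTo(target: Phase): Phase[] {
  return PHASES.slice(0, PHASES.indexOf(target) + 1);
}

/** True when phase `a` comes strictly before phase `b`. */
function isBefore(a: Phase, b: Phase): boolean {
  return PHASES.indexOf(a) < PHASES.indexOf(b);
}
```

A test that starts `.inPhase('rubric')` could use `phasesUpTo('rubric')` to decide which fixture data, such as discovery findings, needs to exist beforehand.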
+
+### Mock vs Real API
+
+```typescript
+// Default: everything mocked
+.build()
+
+// Selective real endpoints
+.withReal('/users/auth/login')
+.withReal('WorkshopsService')
+
+// No mocking (full integration)
+.withRealApi()
+```
+
+## Accessing Scenario Data
+
+```typescript
+scenario.workshop           // Workshop object
+scenario.facilitator        // First facilitator
+scenario.users.participant  // Array of participants
+scenario.users.sme          // Array of SMEs
+scenario.traces             // Array of traces
+scenario.rubric             // Rubric (if created)
+scenario.findings           // Discovery findings
+scenario.annotations        // Annotations
+```
+
+## Actions
+
+```typescript
+// Authentication
+await scenario.loginAs(scenario.facilitator);
+await scenario.logout();
+
+// Navigation
+await scenario.goToPhase('discovery');
+await scenario.goToTab('Rubric Questions');
+
+// API-level phase advancement
+await scenario.advanceToPhase('rubric');
+
+// Data creation
+await scenario.createRubricQuestion({ question: '...' });
+await scenario.submitFinding({ trace: scenario.traces[0], insight: '...' });
+await scenario.submitAnnotation({ rating: 4 });
+await scenario.completeDiscovery();
+```
+
+## Multi-Browser Tests
+
+```typescript
+test('multi-user workflow', async ({ browser }) => {
+  const scenario = await TestScenario.create(browser)
+    .withWorkshop()
+    .withFacilitator()
+    .withParticipants(2)
+    .build();
+
+  const facilitatorPage = await scenario.newPageAs(scenario.facilitator);
+  const alicePage = await scenario.newPageAs(scenario.users.participant[0]);
+
+  // Actions scoped to a page
+  await scenario.using(alicePage).submitFinding({ ... });
+});
+```
+
+## API Access for Assertions
+
+```typescript
+const workshop = await scenario.api.getWorkshop();
+const rubric = await scenario.api.getRubric();
+const traces = await scenario.api.getTraces();
+const findings = await scenario.api.getFindings(userId);
+const annotations = await scenario.api.getAnnotations();
+const status = await scenario.api.getDiscoveryCompletionStatus();
+```
+
+## Running E2E Tests
+
+```bash
+just e2e              # Headless
+just e2e headed       # Visible browser
+just e2e ui           # Playwright UI mode
+```