
Commit 25e0e95

Merge pull request #31 from forrestmurray-db/feature-traceview-jsonpath
Feature traceview jsonpath
2 parents 4519fc0 + fdc9eba commit 25e0e95

71 files changed

Lines changed: 11991 additions & 72 deletions


.claude/CLAUDE.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
# Claude Code Instructions
2+
3+
## Purpose
4+
5+
Human Evaluation Workshop - a collaborative platform for annotating and evaluating LLM traces with MLflow integration. Built for Databricks Apps deployment.
6+
7+
## Tech Stack
8+
9+
- **Backend**: Python 3.11+, FastAPI, SQLAlchemy, Alembic (SQLite)
10+
- **Frontend**: React, TypeScript, Vite, Tailwind CSS
11+
- **Testing**: pytest, Vitest, Playwright
12+
- **Task runner**: `just` (see `justfile`)
13+
14+
## Key Directories
15+
16+
| Directory | Contents |
|-----------|----------|
| `/specs/` | Declarative specifications (source of truth) |
| `/server/` | FastAPI backend |
| `/client/` | React frontend |
| `/tests/` | Python tests |
| `/client/tests/` | Frontend unit + E2E tests |
23+
24+
## Spec-Driven Development
25+
26+
**This repo uses specs as source of truth.** Before implementing:
27+
28+
1. Search `/specs/README.md` for relevant spec (keyword indexed)
29+
2. Read the spec - it defines expected behavior and success criteria
30+
3. Check `/specs/SPEC_COVERAGE_MAP.md` for existing test coverage
31+
32+
## Core Rules
33+
34+
- **Read spec before coding** - No feature work without understanding the spec
35+
- **Tag all tests to specs** - Use `@pytest.mark.spec("SPEC_NAME")` or equivalent
36+
- **Verify before completing** - Run tests, ensure they pass
37+
- **Ask if spec is unclear** - Don't guess at undefined behavior
38+
39+
## Protected Operations (Ask First)
40+
41+
- Modifying files in `/specs/`
42+
- Creating database migrations
43+
- Changing auth logic
44+
- Deleting files
45+
- Destructive git operations
46+
47+
## Commands
48+
49+
```bash
just test-server                            # Python unit tests
just ui-test-unit                           # React unit tests
just ui-lint                                # TypeScript/ESLint
just e2e [headless|headed] [extra-args]     # End-to-end tests
```
55+
56+
If you want to do something not covered here, consult @justfile.
57+
58+
## References
59+
60+
- **Workflow details**: See `CONTRIBUTING.md`
61+
- **Test patterns**: `.claude/skills/verification-testing/SKILL.md`
62+
- **MLflow patterns**: `.claude/skills/mlflow-evaluation/SKILL.md`
63+
- **Spec index**: `/specs/README.md`

.claude/commands/verify.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
---
2+
allowed-tools: Bash(just:*), Bash(git status:*)
3+
argument-hint: [scope] - "all", "affected", "backend", "frontend", "e2e"
4+
description: Run tests to verify code changes
5+
---
6+
7+
# Verify Code Changes
8+
9+
You are verifying code changes using the project's test infrastructure.
10+
11+
## Scope: $ARGUMENTS
12+
13+
Based on the scope provided, run the appropriate verification:
14+
15+
### If scope is "all" or empty:
16+
Run the full verification suite:
17+
1. `just test-server` - Python unit tests
18+
2. `just ui-lint` - TypeScript/ESLint checks
19+
3. `just ui-test-unit` - React unit tests
20+
4. `just e2e` - End-to-end tests
21+
22+
### If scope is "affected":
23+
1. First, check `git status` to see what files changed
24+
2. If Python files changed: run `just test-server`
25+
3. If TypeScript/React files changed: run `just ui-lint && just ui-test-unit`
26+
4. If E2E-relevant changes (UI components, API routes): run `just e2e`
27+
28+
### If scope is "backend":
29+
Run only backend verification:
30+
1. `just test-server`
31+
32+
### If scope is "frontend":
33+
Run only frontend verification:
34+
1. `just ui-lint`
35+
2. `just ui-test-unit`
36+
37+
### If scope is "e2e":
38+
Run only E2E tests:
39+
1. `just e2e`
40+
41+
## On Failure
42+
43+
If any test fails:
44+
1. Report which tests failed with the error output
45+
2. Suggest fixes based on the error messages
46+
3. Ask if the user wants you to fix the issues
47+
48+
## Reference
49+
50+
See the verification-testing skill in `.claude/skills/verification-testing/` for detailed testing patterns and mocking guidance.
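
The scope rules above amount to a lookup from scope to `just` recipes. The following is a minimal TypeScript sketch of that lookup, not code from this commit: the helper names `commandsForScope` and `commandsForAffected` are hypothetical, and what counts as "E2E-relevant" is left to a caller-supplied predicate because the command file does not define it precisely.

```typescript
// Hypothetical helpers illustrating the scope rules; not part of the repo.
type Scope = "all" | "backend" | "frontend" | "e2e";

const RECIPES: Record<Scope, string[]> = {
  all: ["just test-server", "just ui-lint", "just ui-test-unit", "just e2e"],
  backend: ["just test-server"],
  frontend: ["just ui-lint", "just ui-test-unit"],
  e2e: ["just e2e"],
};

// An empty scope is treated like "all", as the command file states.
function commandsForScope(scope: string): string[] {
  const key = scope.trim() === "" ? "all" : scope.trim();
  if (!Object.prototype.hasOwnProperty.call(RECIPES, key)) {
    throw new Error(`Unknown scope: ${scope}`);
  }
  return RECIPES[key as Scope];
}

// "affected" scope: choose recipes from the changed file list (e.g. from `git status`).
// The E2E predicate is an assumption; by default nothing is treated as E2E-relevant.
function commandsForAffected(
  changedFiles: string[],
  isE2eRelevant: (file: string) => boolean = () => false,
): string[] {
  const commands: string[] = [];
  if (changedFiles.some((f) => f.endsWith(".py"))) {
    commands.push("just test-server");
  }
  if (changedFiles.some((f) => /\.(ts|tsx)$/.test(f))) {
    commands.push("just ui-lint", "just ui-test-unit");
  }
  if (changedFiles.some((f) => isE2eRelevant(f))) {
    commands.push("just e2e");
  }
  return commands;
}
```

Keeping the mapping in one table makes it easy to see that "frontend" runs linting before unit tests, in the same order the command file lists them.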

.claude/settings.json

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
{
  "permissions": {
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(git push --force:*)",
      "Bash(git reset --hard:*)",
      "Bash(alembic revision:*)",
      "Bash(alembic downgrade:*)",
      "Edit(specs/*)",
      "Write(specs/*)"
    ],
    "ask": [
      "Bash(git push:*)",
      "Bash(alembic upgrade:*)",
      "Edit(server/models/*)",
      "Edit(alembic/*)",
      "Edit(.github/*)",
      "Write(alembic/*)"
    ],
    "allow": [
      "Bash(just:*)",
      "Bash(uv run:*)",
      "Bash(npm:*)",
      "Bash(git status:*)",
      "Bash(git diff:*)",
      "Bash(git log:*)",
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Read(*)"
    ]
  }
}
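
The `deny` and `ask` lists overlap on purpose: `git push --force` is denied while plain `git push` only prompts. The sketch below shows how such overlap could resolve. It rests on two assumptions that should be checked against the Claude Code documentation: rules are checked in the order deny, ask, allow, and `Bash(prefix:*)` matches any command starting with `prefix`. All names are hypothetical, and only Bash rules are modelled (path rules such as `Edit(specs/*)` are ignored).

```typescript
// Simplified, hypothetical model of rule precedence; not Claude Code's implementation.
type Decision = "deny" | "ask" | "allow" | "unmatched";

interface Permissions {
  deny: string[];
  ask: string[];
  allow: string[];
}

// Extracts the command prefix from a rule like "Bash(git push:*)".
// Returns null for any other rule shape (for example "Edit(specs/*)").
function bashPrefix(rule: string): string | null {
  const match = /^Bash\((.*):\*\)$/.exec(rule);
  return match?.[1] ?? null;
}

// Assumption: a Bash rule matches when the command starts with the rule's prefix.
function ruleMatches(rule: string, command: string): boolean {
  const prefix = bashPrefix(rule);
  return prefix !== null && command.startsWith(prefix);
}

// Assumption: deny is checked first, then ask, then allow; the first match wins.
function decide(command: string, permissions: Permissions): Decision {
  for (const level of ["deny", "ask", "allow"] as const) {
    if (permissions[level].some((rule) => ruleMatches(rule, command))) {
      return level;
    }
  }
  return "unmatched";
}

// A subset of the rules from the settings file above.
const permissions: Permissions = {
  deny: ["Bash(rm -rf:*)", "Bash(git push --force:*)", "Bash(git reset --hard:*)", "Edit(specs/*)"],
  ask: ["Bash(git push:*)", "Bash(alembic upgrade:*)"],
  allow: ["Bash(just:*)", "Bash(git status:*)", "Bash(git commit:*)"],
};
```

Under these assumptions, `decide("git push --force origin main", permissions)` resolves to `"deny"` even though the broader `git push` rule in `ask` also matches.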
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
1+
---
2+
name: verification-testing
3+
description: "Code verification and testing for the Human Evaluation Workshop. Use when (1) running tests after code changes, (2) writing new unit tests (pytest/vitest), (3) writing E2E tests with Playwright/TestScenario, (4) debugging test failures, (5) understanding what to mock in E2E tests, (6) verifying a feature implementation. Covers the full test pyramid: unit tests -> integration tests -> E2E tests."
4+
---
5+
6+
# Verification & Testing
7+
8+
## Quick Verification Commands
9+
10+
Run these commands to verify code changes:
11+
12+
| Command | Purpose | When to Use |
|---------|---------|-------------|
| `just test-server` | Python unit tests | After backend changes |
| `just ui-test-unit` | React unit tests | After frontend changes |
| `just ui-lint` | TypeScript/ESLint | Before committing |
| `just e2e` | Full E2E tests | After any feature change |
18+
19+
## Verification Workflow
20+
21+
### After Implementing a Feature
22+
23+
1. **Read the relevant spec** in `specs/` to understand success criteria
24+
2. **Run unit tests** for the layer you changed:
25+
- Backend: `just test-server`
26+
- Frontend: `just ui-test-unit`
27+
3. **Run linting**: `just ui-lint`
28+
4. **Run E2E tests**: `just e2e`
29+
5. **Add new tests** if the feature isn't covered
30+
31+
## Reference Files
32+
33+
| Reference | Purpose | When to Read |
|-----------|---------|--------------|
| `e2e-patterns.md` | TestScenario builder API | When writing E2E tests |
| `mocking.md` | E2E mocking + MLflow/external service mocking | When adding new endpoints or testing integrations |
| `unit-tests.md` | pytest and vitest patterns | When writing unit tests |
38+
39+
## Key Concepts
40+
41+
### Test Pyramid
42+
43+
```
        ┌─────────┐
        │   E2E   │  ← Playwright (slow, high confidence)
        └────┬────┘
     ┌───────┴───────┐
     │  Integration  │  ← API tests (medium speed)
     └───────┬───────┘
┌────────────┴────────────┐
│       Unit Tests        │  ← pytest/vitest (fast)
└─────────────────────────┘
```
54+
55+
### E2E Mocking Strategy
56+
57+
**Mock by default** - The test infrastructure mocks all API calls unless you opt out:
58+
59+
```typescript
// Everything mocked (default)
const scenario = await TestScenario.create(page)
  .withWorkshop()
  .build();

// Selective real API
const scenario = await TestScenario.create(page)
  .withWorkshop()
  .withReal('/users/auth/login')  // Only auth is real
  .build();

// Full integration (no mocks)
const scenario = await TestScenario.create(page)
  .withWorkshop()
  .withRealApi()
  .build();
```
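
The three variants above come down to one decision per request: mock it unless the test opted out. The sketch below illustrates that decision with hypothetical names (`MockConfig`, `shouldMock`) and is not the builder's actual code. It assumes `.withReal(path)` records an endpoint path and `.withRealApi()` sets a global flag; service-name arguments such as `'WorkshopsService'` are not modelled.

```typescript
// Hypothetical model of the mock-by-default rule; the real builder may differ.
interface MockConfig {
  realApi: boolean;        // assumed to be set by a call like .withRealApi()
  realEndpoints: string[]; // assumed to be collected from calls like .withReal(path)
}

// Returns true when a request to `path` should be served by a mock handler.
function shouldMock(path: string, config: MockConfig): boolean {
  if (config.realApi) {
    return false; // full integration: nothing is mocked
  }
  // Opted-out endpoints go to the real API; everything else stays mocked.
  return !config.realEndpoints.some((real) => path.endsWith(real));
}

const authOnlyReal: MockConfig = {
  realApi: false,
  realEndpoints: ["/users/auth/login"],
};
```

With `authOnlyReal`, a login request reaches the real backend while every other path is still mocked, which matches the "Selective real API" variant.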
77+
78+
### Adding Mocks for New Endpoints
79+
80+
If you add a new API endpoint, add a mock handler in `client/tests/lib/mocks/api-mocker.ts`:
81+
82+
```typescript
this.routes.push({
  pattern: /\/workshops\/([a-f0-9-]+)\/your-endpoint$/i,
  get: async (route) => {
    await route.fulfill({ json: this.store.yourData });
  },
});
```
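
It can help to check a new route pattern in isolation before wiring it into the mocker. The sketch below exercises the regular expression from the handler above; `matchWorkshopId` is a hypothetical helper, and `your-endpoint` is the placeholder name from the example.

```typescript
// The pattern from the handler above; "your-endpoint" is a placeholder.
const pattern = /\/workshops\/([a-f0-9-]+)\/your-endpoint$/i;

// Hypothetical helper: returns the captured workshop id, or null when the path does not match.
function matchWorkshopId(path: string): string | null {
  const match = pattern.exec(path);
  return match?.[1] ?? null;
}

// The capture group accepts hex digits and hyphens (case-insensitive), so UUID-style ids match.
const id = matchWorkshopId("/workshops/3f2a-91bc/your-endpoint");

// The `$` anchor rejects anything after the endpoint name, including sub-paths.
const nested = matchWorkshopId("/workshops/3f2a-91bc/your-endpoint/extra");
```

Because of the `$` anchor, a path with a trailing query string would also fail to match; whether that matters depends on what string the mocker tests the pattern against, which this sketch does not assume.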
90+
91+
## Critical Files
92+
93+
- `specs/TESTING_SPEC.md` - Full testing specification
94+
- `client/tests/lib/README.md` - E2E test infrastructure docs
95+
- `client/tests/lib/mocks/api-mocker.ts` - Mock handlers
96+
- `client/tests/lib/scenario-builder.ts` - TestScenario class
97+
- `justfile` - All test commands
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
1+
# E2E Test Patterns
2+
3+
## TestScenario Builder API
4+
5+
Location: `client/tests/lib/`
6+
7+
### Basic Usage
8+
9+
```typescript
import { test, expect } from '@playwright/test';
import { TestScenario } from './lib';

test('facilitator can create a rubric', async ({ page }) => {
  const scenario = await TestScenario.create(page)
    .withWorkshop({ name: 'My Workshop' })
    .withFacilitator()
    .withParticipants(2)
    .withTraces(5)
    .inPhase('rubric')
    .build();

  await scenario.loginAs(scenario.facilitator);
  await expect(page.getByRole('heading', { name: 'My Workshop' })).toBeVisible();
  await scenario.cleanup();
});
```
27+
28+
### Workshop Configuration
29+
30+
```typescript
31+
.withWorkshop() // Default workshop
32+
.withWorkshop({ name: 'Custom Name' }) // Named workshop
33+
.withWorkshop({ name: 'W', description: 'D' }) // With description
34+
```
35+
36+
### User Configuration
37+
38+
```typescript
39+
.withFacilitator() // Default facilitator
40+
.withFacilitator({ email: 'a@b.com' }) // Custom email
41+
.withParticipants(3) // 3 participants
42+
.withSMEs(2) // 2 SME users
43+
.withUser('participant', { name: 'Alice' }) // Named user
44+
```
45+
46+
### Data Configuration
47+
48+
```typescript
49+
.withTraces(5) // 5 mock traces
50+
.withRubric({ question: 'How helpful?' }) // Add rubric
51+
.withDiscoveryFinding({ insight: '...' }) // Add finding
52+
.withDiscoveryComplete() // Mark discovery done
53+
.withAnnotation({ rating: 4, comment: '...' }) // Add annotation
54+
```
55+
56+
### Phase Configuration
57+
58+
```typescript
59+
.inPhase('intake') // Initial phase
60+
.inPhase('discovery') // Discovery phase
61+
.inPhase('rubric') // Rubric creation
62+
.inPhase('annotation') // Annotation phase
63+
.inPhase('results') // Results phase
64+
```
65+
66+
### Mock vs Real API
67+
68+
```typescript
69+
// Default: everything mocked
70+
.build()
71+
72+
// Selective real endpoints
73+
.withReal('/users/auth/login')
74+
.withReal('WorkshopsService')
75+
76+
// No mocking (full integration)
77+
.withRealApi()
78+
```
79+
80+
## Accessing Scenario Data
81+
82+
```typescript
83+
scenario.workshop // Workshop object
84+
scenario.facilitator // First facilitator
85+
scenario.users.participant // Array of participants
86+
scenario.users.sme // Array of SMEs
87+
scenario.traces // Array of traces
88+
scenario.rubric // Rubric (if created)
89+
scenario.findings // Discovery findings
90+
scenario.annotations // Annotations
91+
```
92+
93+
## Actions
94+
95+
```typescript
// Authentication
await scenario.loginAs(scenario.facilitator);
await scenario.logout();

// Navigation
await scenario.goToPhase('discovery');
await scenario.goToTab('Rubric Questions');

// API-level phase advancement
await scenario.advanceToPhase('rubric');

// Data creation
await scenario.createRubricQuestion({ question: '...' });
await scenario.submitFinding({ trace: scenario.traces[0], insight: '...' });
await scenario.submitAnnotation({ rating: 4 });
await scenario.completeDiscovery();
```
113+
114+
## Multi-Browser Tests
115+
116+
```typescript
test('multi-user workflow', async ({ browser }) => {
  const scenario = await TestScenario.create(browser)
    .withWorkshop()
    .withFacilitator()
    .withParticipants(2)
    .build();

  const facilitatorPage = await scenario.newPageAs(scenario.facilitator);
  const alicePage = await scenario.newPageAs(scenario.users.participant[0]);

  // Actions scoped to a page
  await scenario.using(alicePage).submitFinding({ ... });
});
```
131+
132+
## API Access for Assertions
133+
134+
```typescript
135+
const workshop = await scenario.api.getWorkshop();
136+
const rubric = await scenario.api.getRubric();
137+
const traces = await scenario.api.getTraces();
138+
const findings = await scenario.api.getFindings(userId);
139+
const annotations = await scenario.api.getAnnotations();
140+
const status = await scenario.api.getDiscoveryCompletionStatus();
141+
```
142+
143+
## Running E2E Tests
144+
145+
```bash
146+
just e2e # Headless
147+
just e2e headed # Visible browser
148+
just e2e ui # Playwright UI mode
149+
```
