Draft
27 commits
70d5ee2 adds generated questions and summaries (forrestmurray-db, Jan 7, 2026)
590aa62 add dspy (forrestmurray-db, Jan 7, 2026)
ea1f582 update to use new assisted facilitation approach (forrestmurray-db, Jan 7, 2026)
d330218 synthetic trace generation + e2e test (forrestmurray-db, Jan 12, 2026)
e69f569 Add assisted facilitation spec (forrestmurray-db, Jan 20, 2026)
0e834bd Implement Assisted Facilitation v2 specification (forrestmurray-db, Jan 20, 2026)
59ed109 Add E2E tests and update authentication flow for assisted facilitation (forrestmurray-db, Jan 20, 2026)
e1f300f Merge branch 'refactor/verification-skills' into feat/discovery-update (forrestmurray-db, Jan 20, 2026)
f899abf Fix assisted facilitation e2e tests and add proper spec tagging (forrestmurray-db, Jan 20, 2026)
7d702f8 skill and tool updates (forrestmurray-db, Jan 21, 2026)
f0409e7 Merge branch 'main' into feat/discovery-update (forrestmurray-db, Jan 21, 2026)
077ff0b Update discovery questions migrations and refactor Claude settings (forrestmurray-db, Jan 22, 2026)
fed37d9 Merge refactor/verification-skills branch (forrestmurray-db, Jan 22, 2026)
1d5653a update skill for better test filtering (forrestmurray-db, Jan 22, 2026)
2291544 Add token-efficient test reporting for LLM agents (forrestmurray-db, Jan 22, 2026)
1a33489 Add requirement-level spec coverage with test pyramid and affected mode (forrestmurray-db, Jan 22, 2026)
a954c72 Fix test-summary to extract spec from class-level pytest markers (forrestmurray-db, Jan 22, 2026)
c1a93f5 Tag all existing tests with @req markers for spec coverage tracking (forrestmurray-db, Jan 22, 2026)
ca7d810 Fix test-server-spec to filter tests by spec marker argument (forrestmurray-db, Jan 22, 2026)
c026d9d Add discovery findings category classification and persistence (forrestmurray-db, Jan 23, 2026)
d765c27 Fix test mocks for assisted facilitation spec tests (forrestmurray-db, Jan 23, 2026)
5ec0551 Add DSPy-based disagreement detection for discovery findings (forrestmurray-db, Jan 23, 2026)
268c27e Add database migration for assisted facilitation v2 tables (forrestmurray-db, Jan 26, 2026)
10d4f7f Fix LLM classification integration for assisted facilitation v2 (forrestmurray-db, Jan 26, 2026)
e3aad5b Fix facilitator E2E login and add browser error capture (forrestmurray-db, Jan 28, 2026)
4ee3e50 Fix ESLint errors and add TanStack Query plugin (forrestmurray-db, Jan 28, 2026)
e752ca8 fix typecheck and lint errors (forrestmurray-db, Jan 29, 2026)
50 changes: 0 additions & 50 deletions .claude/commands/verify.md

This file was deleted.

10 changes: 5 additions & 5 deletions .claude/settings.json
@@ -5,17 +5,17 @@
 "Bash(git push --force:*)",
 "Bash(git reset --hard:*)",
 "Bash(alembic revision:*)",
-"Bash(alembic downgrade:*)",
-"Edit(specs/*)",
-"Write(specs/*)"
+"Bash(alembic downgrade:*)"
 ],
 "ask": [
 "Bash(git push:*)",
 "Bash(alembic upgrade:*)",
 "Edit(server/models/*)",
 "Edit(alembic/*)",
 "Edit(.github/*)",
-"Write(alembic/*)"
+"Write(alembic/*)",
+"Edit(specs/*)",
+"Write(specs/*)"
 ],
 "allow": [
 "Bash(just:*)",
@@ -29,4 +29,4 @@
 "Read(*)"
 ]
 }
-}
+}
195 changes: 127 additions & 68 deletions .claude/skills/verification-testing/SKILL.md
@@ -1,97 +1,156 @@
 ---
 name: verification-testing
-description: "Code verification and testing for the Human Evaluation Workshop. Use when (1) running tests after code changes, (2) writing new unit tests (pytest/vitest), (3) writing E2E tests with Playwright/TestScenario, (4) debugging test failures, (5) understanding what to mock in E2E tests, (6) verifying a feature implementation. Covers the full test pyramid: unit tests -> integration tests -> E2E tests."
+description: "Code verification and testing for the Human Evaluation Workshop. Use when (1) checking implementation progress against specs, (2) running tests after code changes, (3) writing new tests, (4) debugging test failures. Covers unit tests, integration tests, and E2E tests."
 ---

 # Verification & Testing

-## Quick Verification Commands
+*IMPORTANT:* BEHAVIORS WHICH AREN'T YET IMPLEMENTED OR STUBBED WITH A PLACEHOLDER SHOULD NOT YIELD PASSING TESTS!

-Run these commands to verify code changes:
+## Common Questions (Start Here)

-| Command | Purpose | When to Use |
-|---------|---------|-------------|
-| `just test-server` | Python unit tests | After backend changes |
-| `just ui-test-unit` | React unit tests | After frontend changes |
-| `just ui-lint` | TypeScript/ESLint | Before committing |
-| `just e2e` | Full E2E tests | After any feature change |
+### "How far along is SPEC_NAME implementation?"

-## Verification Workflow
+**Use `just spec-status` - do NOT run tests unnecessarily.**

-### After Implementing a Feature
+```bash
+just spec-status SPEC_NAME
+```

-1. **Read the relevant spec** in `specs/` to understand success criteria
-2. **Run unit tests** for the layer you changed:
-- Backend: `just test-server`
-- Frontend: `just ui-test-unit`
-3. **Run linting**: `just ui-lint`
-4. **Run E2E tests**: `just e2e`
-5. **Add new tests** if the feature isn't covered
+This shows coverage percentage and any recent test results. To get detailed uncovered requirements:

-## Reference Files
+```bash
+just spec-coverage --specs SPEC_NAME --json | jq '{
+coverage: .specs.SPEC_NAME.coverage_percent,
+covered: .specs.SPEC_NAME.covered_count,
+total: .specs.SPEC_NAME.requirement_count,
+uncovered: .specs.SPEC_NAME.uncovered
+}'
+```
+
+**Summarize results for the user** - don't just dump JSON output.
+
+### "Which tests cover SPEC_NAME?"
+
+```bash
+# Python tests
+grep -r "@pytest.mark.spec(\"SPEC_NAME\")" tests/
+
+# E2E tests
+grep -l "@spec:SPEC_NAME" client/tests/e2e/*.spec.ts
+
+# All test counts by type
+just spec-coverage --specs SPEC_NAME --json | jq '.specs.SPEC_NAME.tests_by_type'
+```

-| Reference | Purpose | When to Read |
-|-----------|---------|--------------|
-| `e2e-patterns.md` | TestScenario builder API | When writing E2E tests |
-| `mocking.md` | E2E mocking + MLflow/external service mocking | When adding new endpoints or testing integrations |
-| `unit-tests.md` | pytest and vitest patterns | When writing unit tests |
+### "Are the tests passing?"

-## Key Concepts
+**Only run tests if the user asks to verify implementation works**, not just to check progress.

-### Test Pyramid
+```bash
+# After running tests, get concise summary
+just test-summary

+# Or filter by spec
+just test-summary --spec SPEC_NAME
 ```
-┌─────────┐
-│ E2E │ ← Playwright (slow, high confidence)
-└────┬────┘
-┌───────┴───────┐
-│ Integration │ ← API tests (medium speed)
-└───────┬───────┘
-┌────────────┴────────────┐
-│ Unit Tests │ ← pytest/vitest (fast)
-└─────────────────────────┘
+
+### "Which requirements are uncovered?"
+
+```bash
+just spec-coverage --json | jq '.specs | to_entries[] | select(.value.uncovered | length > 0) | {spec: .key, uncovered: .value.uncovered}'
 ```

-### E2E Mocking Strategy
+---

-**Mock by default** - The test infrastructure mocks all API calls unless you opt out:
+## Quick Commands Reference

-```typescript
-// Everything mocked (default)
-const scenario = await TestScenario.create(page)
-.withWorkshop()
-.build();
-
-// Selective real API
-const scenario = await TestScenario.create(page)
-.withWorkshop()
-.withReal('/users/auth/login') // Only auth is real
-.build();
-
-// Full integration (no mocks)
-const scenario = await TestScenario.create(page)
-.withWorkshop()
-.withRealApi()
-.build();
+| Command | Purpose |
+|---------|---------|
+| `just spec-status SPEC_NAME` | Coverage + recent test results for a spec |
+| `just spec-coverage` | Full coverage report (all specs) |
+| `just spec-coverage --affected` | Coverage for specs affected by recent changes |
+| `just test-summary` | Concise test results after running tests |
+| `just test-server` | Run all Python unit tests |
+| `just ui-test-unit` | Run all React unit tests |
+| `just e2e` | Run all E2E tests |
+| `just e2e-spec SPEC_NAME` | Run E2E tests for a specific spec |
+
+## Running Tests for a Specific Spec
+
+```bash
+# Python unit tests
+just test-server-spec SPEC_NAME
+
+# React unit tests
+just ui-test-unit-spec SPEC_NAME
+
+# E2E tests (headless by default)
+just e2e-spec SPEC_NAME
+
+# E2E with visible browser
+just e2e-spec SPEC_NAME headed
+```
+
+### E2E Timeout Configuration
+
+If tests are timing out, increase Playwright timeouts via environment variables:
+
+```bash
+# Increase test timeout (default: 30s) and expect timeout (default: 5s)
+PW_TEST_TIMEOUT=60000 PW_EXPECT_TIMEOUT=10000 just e2e-spec SPEC_NAME
 ```

-### Adding Mocks for New Endpoints
+Server logs are suppressed by default during E2E tests. To view them:
+- Check `.test-results/api-server.log` and `.test-results/ui-server.log`
+- Or run servers manually with `just e2e-servers` (logs to stdout)

+## Test Tagging
+
+All tests must be tagged with spec markers:
+
+**Python (pytest):**
+```python
+@pytest.mark.spec("SPEC_NAME")
+@pytest.mark.req("requirement text") # optional, links to specific requirement
+def test_something(): ...
+```
+
-If you add a new API endpoint, add a mock handler in `client/tests/lib/mocks/api-mocker.ts`:
+**Playwright (E2E):**
+```typescript
+test.use({ tag: ['@spec:SPEC_NAME', '@req:requirement text'] });
+```

+**Vitest (unit):**
 ```typescript
-this.routes.push({
-pattern: /\/workshops\/([a-f0-9-]+)\/your-endpoint$/i,
-get: async (route) => {
-await route.fulfill({ json: this.store.yourData });
-},
-});
+// @spec SPEC_NAME
+// @req requirement text
 ```

-## Critical Files
+## Test File Locations

+| Type | Location | Tag Format |
+|------|----------|------------|
+| Python unit | `tests/unit/` | `@pytest.mark.spec("SPEC")` |
+| Python integration | `tests/integration/` | `@pytest.mark.spec("SPEC")` |
+| React unit | `client/src/**/*.test.ts` | `// @spec SPEC` comment |
+| E2E | `client/tests/e2e/*.spec.ts` | `test.use({ tag: ['@spec:SPEC'] })` |
+
+## Verification Workflow
+
+After implementing a feature:
+
+1. **Check coverage**: `just spec-status SPEC_NAME`
+2. **Run relevant tests**:
+- Backend changes: `just test-server-spec SPEC_NAME`
+- Frontend changes: `just ui-test-unit-spec SPEC_NAME`
+- Full feature: `just e2e-spec SPEC_NAME`
+3. **Get results**: `just test-summary`
+4. **Lint**: `just ui-lint`
+
+## Reference Files
+
-- `specs/TESTING_SPEC.md` - Full testing specification
-- `client/tests/lib/README.md` - E2E test infrastructure docs
-- `client/tests/lib/mocks/api-mocker.ts` - Mock handlers
-- `client/tests/lib/scenario-builder.ts` - TestScenario class
-- `justfile` - All test commands
+For detailed patterns, see:
+- `e2e-patterns.md` - TestScenario builder API for E2E tests
+- `mocking.md` - How to mock API endpoints in E2E tests
+- `unit-tests.md` - pytest and vitest patterns
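The new Test Tagging section relies on custom `spec` and `req` pytest markers, and commit a954c72 concerns reading the spec from class-level markers. Below is a minimal sketch of a `conftest.py` that registers those markers and reads them back, assuming they are not already registered elsewhere (for example in `pyproject.toml`). The helpers `spec_names`, `req_texts` and `_marker_args` are hypothetical names for illustration; only `pytest_configure`, `config.addinivalue_line` and `item.iter_markers` are pytest APIs.

```python
# conftest.py (hypothetical sketch): register the spec/req markers used for
# spec tagging and read them back from collected test items.
from typing import Any, List


def pytest_configure(config: Any) -> None:
    # Registered markers no longer trigger PytestUnknownMarkWarning and are
    # accepted when pytest runs with --strict-markers.
    config.addinivalue_line("markers", "spec(name): link a test to a spec in specs/")
    config.addinivalue_line("markers", "req(text): link a test to a specific requirement")


def _marker_args(item: Any, name: str) -> List[str]:
    # iter_markers() yields markers from the test function first, then from
    # its enclosing class and module, so class-level tags are included.
    values: List[str] = []
    for mark in item.iter_markers(name=name):
        if mark.args and mark.args[0] not in values:
            values.append(mark.args[0])
    return values


def spec_names(item: Any) -> List[str]:
    """Spec names attached to a test via @pytest.mark.spec("...")."""
    return _marker_args(item, "spec")


def req_texts(item: Any) -> List[str]:
    """Requirement texts attached to a test via @pytest.mark.req("...")."""
    return _marker_args(item, "req")
```

A collection hook such as `pytest_collection_modifyitems(config, items)` could call `spec_names(item)` on each collected item to group tests by spec; this is one possible approach, not necessarily how `just test-summary` is implemented.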
Binary file modified .coverage
Binary file not shown.
4 changes: 4 additions & 0 deletions .cursor/rules/always-use-uv.mdc
@@ -0,0 +1,4 @@
+---
+description: when running python commands or python executables like pytest use `uv run python ...` or `uv run <cmd>`
+alwaysApply: false
+---
4 changes: 4 additions & 0 deletions .cursor/rules/use-just-recipes.mdc
@@ -0,0 +1,4 @@
+---
+description: When running things like unit tests, e2e tests, migrations, first look for a corresponding just recipe @justfile
+alwaysApply: false
+---
8 changes: 7 additions & 1 deletion .gitignore
@@ -19,4 +19,10 @@ workshop.db
 *.db-shm
 *.db-wal
 *.csv
-.databricks/
+.databricks/
+
+# Test results (JSON reports for LLM agents)
+.test-results/
+htmlcov/
+coverage/
+.coverage
4 changes: 3 additions & 1 deletion CODEOWNERS.txt
@@ -1,2 +1,4 @@
 vivian-xie-db
-Pallavi Koppol
+Pallavi Koppol
+forrestmurray-db
+Forrest Murray