Feature: Assisted Discovery Facilitation #9
Draft
forrestmurray-db wants to merge 27 commits into
bfe6088 to 9d90924
Defines the discovery phase facilitation feature:
- Per-trace structured view with 5 category buckets
- Real-time classification of participant findings
- Facilitator-controlled question generation (broadcast per trace)
- Auto-detected disagreements between participants
- Promotion workflow to draft rubric staging area
- Fuzzy progress for participants (no category bias)
Co-Authored-By: Claude <noreply@anthropic.com>
Complete implementation of the Assisted Facilitation v2 architecture across all 6 phases:

Phase 1: Database & Models
- Added 5 new database tables (ClassifiedFinding, Disagreement, TraceDiscoveryQuestion, TraceDiscoveryThreshold, DraftRubricItem)
- Added corresponding Pydantic models with proper configuration
- Updated relationships in WorkshopDB and TraceDB

Phase 2: Classification Service
- Created ClassificationService with real-time finding classification
- Added DSPy signatures for ClassifyFinding and DetectDisagreements
- Implemented local classification heuristic (placeholder for LLM)

Phase 3: Discovery Service Updates
- Added 6 new methods, including submit_finding_v2, get_trace_discovery_state, get_fuzzy_progress, promote_finding, update_trace_thresholds
- Implemented fuzzy progress (exploring/good_coverage/complete) for participants
- Added structured discovery state for facilitators

Phase 4: API Endpoints
- Added 7 new REST endpoints for assisted facilitation v2
- Endpoints cover findings submission, discovery state, promotion, and threshold updates

Phase 5: Client Updates
- Created TraceDiscoveryPanel component for facilitator view
- Created DraftRubricPanel component for promoted findings
- Added 5 new React hooks to useWorkshopApi.ts

Phase 6: Testing
- Created test_classification_service.py with 7 tests (all passing)
- Created test_discovery_service_v2.py with 6 tests (all passing)
- 100% test pass rate (13/13 tests)

Key Features:
- Real-time finding classification (5 categories)
- Fuzzy progress indicator for participants
- Disagreement detection framework
- Draft rubric staging area
- Per-trace threshold management
Co-Authored-By: Claude <noreply@anthropic.com>
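The fuzzy progress idea from Phase 3 can be sketched roughly as follows. This is a minimal illustration and not the PR's get_fuzzy_progress: the function name fuzzy_progress, the "at least half of the categories" cut-off, and the default threshold of 1 are all assumptions. Only the three labels and the five category names are taken from the commit messages in this PR.

```python
from typing import Mapping

# Category names as listed in this PR's commit messages.
CATEGORIES = (
    "themes",
    "edge_cases",
    "boundary_conditions",
    "failure_modes",
    "missing_info",
)


def fuzzy_progress(counts: Mapping[str, int], thresholds: Mapping[str, int]) -> str:
    """Collapse per-category finding counts into one coarse label.

    Participants only see the label, never the per-category numbers,
    so the indicator does not steer them toward a specific category.
    """
    # Count how many categories have reached their per-trace threshold.
    # A missing threshold defaults to 1 (hypothetical default).
    met = sum(
        1
        for category in CATEGORIES
        if counts.get(category, 0) >= thresholds.get(category, 1)
    )
    if met == len(CATEGORIES):
        return "complete"
    # Hypothetical cut-off: at least half of the categories are covered.
    if met * 2 >= len(CATEGORIES):
        return "good_coverage"
    return "exploring"
```

Returning a single label rather than counts is what keeps the participant view free of category bias; the facilitator view would still read the per-category numbers directly.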
- Add E2E tests for classification, dashboard, discovery, and rubric promotion
- Update auth test helper to handle workshop selection for facilitators
- Update API mocker to support new workshop scenarios
- Replace migration 0004 with 0008 for randomization columns
- Update workshop router and database service
Co-Authored-By: Claude <noreply@anthropic.com>
Brings in verification and testing improvements including:
- Claude configuration and contributing guidelines
- Enhanced test coverage and test utilities
- Updated E2E tests with better workshop selection handling
- Improved justfile commands
- Updated spec coverage documentation
Co-Authored-By: Claude <noreply@anthropic.com>
- Add .withRealApi() to all TestScenario builders so tests use real database
- Add @SPEC:ASSISTED_FACILITATION_SPEC tag to all 5 test files
- Add ASSISTED_FACILITATION_SPEC to known specs in analyzer and docs
- Fix justfile to export E2E_API_URL for dynamic port support
Co-Authored-By: Claude <noreply@anthropic.com>
Resolve conflicts by merging both feature sets:
- Keep assisted facilitation v2 features (discovery questions, classification)
- Keep JSONPath display customization features
- Merge database columns: discovery_questions_model_name + input/output_jsonpath
- Merge frontend hooks for both feature sets
- Update SPEC_COVERAGE_MAP with both RUBRIC_SPEC and TRACE_DISPLAY_SPEC updates
Co-Authored-By: Claude <noreply@anthropic.com>
0cd9b12 to f0409e7
Add new migration for randomization columns and reorganize migration files. Refactor Claude settings to move spec operations to ask list for better control.
Co-Authored-By: Claude <noreply@anthropic.com>
Configure all test runners to write JSON reports to .test-results/ and add test-summary tool for concise output grouped by spec.
- Add pytest-json-report plugin for pytest JSON output
- Update playwright.config.ts with JSON reporter (PW_JSON_REPORT=1)
- Update vite.config.ts with JSON reporter (VITEST_JSON_REPORT=1)
- Create tools/test_summary.py for parsing and summarizing reports
- Add just test-summary and just spec-status commands
- Update SKILL.md with token-efficient testing patterns
Co-Authored-By: Claude <noreply@anthropic.com>
Enhances spec coverage analyzer to provide pytest-cov style reporting:
- Parse success criteria from specs and track requirement-level coverage
- Classify tests by type (unit, integration, e2e-mocked, e2e-real)
- Add @Req marker support for linking tests to specific requirements
- Add --affected flag to detect specs impacted by git changes
- Add --specs flag to filter coverage to specific specs
- Add JSON output mode for programmatic analysis
- Add just test-affected command to run tests for changed specs only
Co-Authored-By: Claude <noreply@anthropic.com>
pytest-json-report doesn't include marker args for class-level decorators
like @pytest.mark.spec("SPEC_NAME"). Added fallback to parse the source
file and extract the spec name, with caching to avoid repeated reads.
Co-Authored-By: Claude <noreply@anthropic.com>
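A rough sketch of the source-file fallback described in this commit, assuming a regular-expression scan is acceptable. The helper name spec_from_source and the choice of functools.lru_cache for the caching are illustrative assumptions, not the code in tools/test_summary.py.

```python
import re
from functools import lru_cache
from pathlib import Path
from typing import Optional

# Matches @pytest.mark.spec("SPEC_NAME") or @pytest.mark.spec('SPEC_NAME').
_SPEC_MARK = re.compile(r"""@pytest\.mark\.spec\(\s*["']([^"']+)["']\s*\)""")


@lru_cache(maxsize=None)
def spec_from_source(path: str) -> Optional[str]:
    """Return the first spec name found in a test file, or None.

    lru_cache ensures each file is read at most once per process, which
    matters when a JSON report lists many tests from the same file.
    """
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError:
        # Unreadable or missing file: report "no spec" instead of failing.
        return None
    match = _SPEC_MARK.search(text)
    return match.group(1) if match else None
```

This version returns only the first marker in the file, so a file mixing several specs would need a per-class lookup (for example via the ast module) instead.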
Added @pytest.mark.req() markers to 100+ unit tests and Playwright E2E tests to link them to specific requirement success criteria. This enables granular requirement-level test coverage tracking.

Key improvements:
- ASSISTED_FACILITATION_SPEC: 100% (7/7 requirements) with 26 unit tests
- JUDGE_EVALUATION_SPEC: 76% (10/13 requirements) with 29 unit tests
- TRACE_DISPLAY_SPEC: 100% discovery with 23 unit tests now visible
- AUTHENTICATION_SPEC: 57% (4/7 requirements) with 8 unit tests
- ANNOTATION_SPEC: 22% (2/9 requirements) with 9 tests (4 unit + 5 E2E)
- Overall: 100 unit tests + 44 E2E tests now tagged to specs

Coverage tools can now report requirement-level coverage (which requirements have test coverage) vs just test counts.
Co-Authored-By: Claude <noreply@anthropic.com>
The -k flag only matches test names, not marker arguments. Added custom
pytest --spec option that inspects @pytest.mark.spec("SPEC_NAME") markers
to properly select tests for a given spec.
Co-Authored-By: Claude <noreply@anthropic.com>
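A --spec option of this kind could look roughly like the conftest.py sketch below. The hooks (pytest_addoption, pytest_collection_modifyitems) and item.iter_markers are standard pytest APIs; the helper has_spec and the exact matching rule (first positional marker argument equals the requested name) are assumptions about how the PR implements it.

```python
# conftest.py (sketch)


def pytest_addoption(parser):
    # Register a --spec option, e.g. `pytest --spec ASSISTED_FACILITATION_SPEC`.
    parser.addoption(
        "--spec",
        action="store",
        default=None,
        help="Only run tests marked with @pytest.mark.spec(<name>).",
    )


def has_spec(item, spec_name):
    """True if the test (or its class/module) carries spec(spec_name)."""
    # iter_markers walks function-, class-, and module-level markers,
    # so class-level decorators are matched as well.
    return any(
        mark.args and mark.args[0] == spec_name
        for mark in item.iter_markers(name="spec")
    )


def pytest_collection_modifyitems(config, items):
    wanted = config.getoption("--spec")
    if not wanted:
        return  # option not given: leave the collection untouched
    selected = [item for item in items if has_spec(item, wanted)]
    deselected = [item for item in items if not has_spec(item, wanted)]
    if deselected:
        # Report dropped tests so the summary shows "N deselected".
        config.hook.pytest_deselected(items=deselected)
    items[:] = selected
```

Unlike -k, which matches against test names and keywords, this inspects the marker's arguments directly, so the spec name does not have to appear in any test name.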
Extends discovery service to classify findings into categories (themes, edge_cases, boundary_conditions, failure_modes, missing_info) and persist the category with the finding. Updates get_trace_findings to return real findings grouped by category instead of placeholder data.
- Add category field to DiscoveryFinding model and migration
- Update submit_finding to classify and save findings with category
- Implement get_trace_findings to query and group by category
- Add add_classified_finding to database service
- Update E2E tests for category-aware discovery workflow
- Add Playwright timeout configuration via environment variables
- Update test coverage and documentation
Co-Authored-By: Claude <noreply@anthropic.com>
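Grouping persisted findings by category might look like the sketch below. The Finding dataclass and group_by_category are hypothetical stand-ins for the PR's DiscoveryFinding model and get_trace_findings; only the five category names come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Category names as listed in the commit message.
CATEGORIES = (
    "themes",
    "edge_cases",
    "boundary_conditions",
    "failure_modes",
    "missing_info",
)


@dataclass
class Finding:
    """Hypothetical, minimal stand-in for a persisted finding."""

    user_id: str
    text: str
    category: str


def group_by_category(findings: Iterable[Finding]) -> Dict[str, List[Finding]]:
    """Bucket findings by category, keeping empty buckets visible."""
    # Pre-seed every known category so a view can render all five
    # buckets even when some have no findings yet.
    grouped: Dict[str, List[Finding]] = {category: [] for category in CATEGORIES}
    for finding in findings:
        # Unknown categories get their own bucket instead of being dropped.
        grouped.setdefault(finding.category, []).append(finding)
    return grouped
```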
Update mock services to have correct interface matching actual database service methods. Tests now fail for the right reasons (unimplemented features) rather than missing mock methods.
- Add get_classified_findings_by_trace to mocks
- Add get_findings method returning findings list
- Add MockFinding class with proper attributes
- Fix threshold test to check individual values
Co-Authored-By: Claude <noreply@anthropic.com>
Implement automatic disagreement detection using DSPy when participants submit findings. The system now:
- Defines a DetectFindingDisagreements DSPy signature that analyzes findings from different users and identifies conflicting viewpoints
- Calls the LLM to detect semantic conflicts after each finding submission
- Persists detected disagreements to the database
- Returns disagreements in the trace discovery state for facilitators

The detection only runs when LLM configuration is available (model name, MLflow config, Databricks token). Falls back gracefully otherwise.
Co-Authored-By: Claude <noreply@anthropic.com>
Creates tables for the new assisted facilitation workflow:
- classified_findings: LLM-categorized findings
- disagreements: Auto-detected conflicts between participants
- trace_discovery_questions: Trace-level broadcast questions
- trace_discovery_thresholds: Per-trace category coverage thresholds
- draft_rubric_items: Promoted findings staging area
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix configureMLflow endpoint URL to use /workshops/{id}/mlflow-config
- Add update_discovery_questions_model_name to DatabaseService
- Include discovery_questions_model_name in Workshop model mapping
- Handle databricks/ prefix in DSPy model names to avoid duplication
- Create proper DSPy Signatures for classification and disagreement detection
- Update ClassificationService to use new DSPy signatures
- Fix disagreement persistence to use save_disagreement method
- Switch to ClassifiedFindingDB table for v2 findings (allows multiple
findings per user per trace, unlike legacy upsert behavior)
- Update get_trace_discovery_state to read from ClassifiedFindingDB
Co-Authored-By: Claude <noreply@anthropic.com>
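The databricks/ prefix handling mentioned above amounts to an idempotent normalization step. A minimal sketch, assuming the function name to_dspy_model_name and the example endpoint name, neither of which appears in the PR:

```python
def to_dspy_model_name(model_name: str, provider: str = "databricks") -> str:
    """Return a provider-prefixed model name without doubling the prefix.

    A stored name may or may not already include "databricks/", so the
    prefix is added only when it is missing.
    """
    prefix = f"{provider}/"
    name = model_name.strip()
    if name.startswith(prefix):
        return name  # already prefixed: do not produce "databricks/databricks/..."
    return prefix + name
```

For example, to_dspy_model_name("my-endpoint") and to_dspy_model_name("databricks/my-endpoint") both give "databricks/my-endpoint", so the function is safe to apply more than once.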
- Fix FacilitatorDashboard.tsx: Move expandedTraceId state declaration before hooks that use it, fixing "expandedTraceId is not defined" error that was crashing React and causing blank pages
- Improve loginAs() in auth.ts:
  - Always navigate explicitly to login page
  - Wait for React to mount before interacting
  - Use proper loading state waits instead of hardcoded timeouts
  - Handle facilitator post-login flow (clicking workshop card)
- Add automatic browser error capture to TestScenario:
  - Capture pageerror events (uncaught JS exceptions)
  - Capture console.error messages (filtered for non-critical 404s)
  - Errors are logged and cause test failure via scenario.cleanup()
- Add baseURL to browser contexts created from browser fixture
Co-Authored-By: Claude (databricks-claude-opus-4-5) <noreply@anthropic.com>
- Fix rules-of-hooks violations by moving hooks before early returns in FacilitatorDashboard, RoleBasedWorkflow, AnnotationReviewPage, AnnotationDemo, RubricCreationDemo, TraceViewerDemo
- Fix real bug: add missing discoveryQuestions dep in TraceViewerDemo
- Add @tanstack/eslint-plugin-query for queryKey validation
- Fix TanStack Query exhaustive-deps in useDatabricksApi and useWorkshopApi
- Create useMLflowStatus hook to replace useEffect+fetch anti-pattern
- Add useMemo for derived state (perMetricScores, traceData, rubricQuestions)
- Add useCallback for client state functions (handleSetWorkshopId, saveStateToStorage, parseLoadedComment)
- Fix empty catch blocks with comments
- Fix empty interface with type alias
- Add eslint-disable comments for intentional dep exclusions with explanations
- Add EXHAUSTIVE_DEPS_ANALYSIS.md documenting the analysis
Co-Authored-By: Claude (databricks-claude-opus-4-5) <noreply@anthropic.com>
This feature is designed to improve facilitation and participation in the discovery stage. The current problems in short:
The assisted experience: