Feature: Assisted Discovery Facilitation by forrestmurray-db · Pull Request #9 · databricks-solutions/project-0xfffff

forrestmurray-db · 2026-01-07T17:49:41Z

This feature is designed to improve facilitation and participation in the discovery stage. The current problems in short:

- Sparse findings from participants. The current questions allow low-information responses, which makes formulating rubric questions much harder.
- High cognitive load on the facilitator. Facilitators must understand the agent's domain and the nuances of quality while simultaneously thinking of ways to drive discussion, which is very difficult.

The assisted experience:

- Generates targeted questions for individual participants to spark deep thought.
- Aggregates and summarizes findings to help the facilitator facilitate.

Defines the discovery phase facilitation feature:
- Per-trace structured view with 5 category buckets
- Real-time classification of participant findings
- Facilitator-controlled question generation (broadcast per trace)
- Auto-detected disagreements between participants
- Promotion workflow to draft rubric staging area
- Fuzzy progress for participants (no category bias)
Co-Authored-By: Claude <noreply@anthropic.com>

Complete implementation of the Assisted Facilitation v2 architecture across all 6 phases:
Phase 1: Database & Models
- Added 5 new database tables (ClassifiedFinding, Disagreement, TraceDiscoveryQuestion, TraceDiscoveryThreshold, DraftRubricItem)
- Added corresponding Pydantic models with proper configuration
- Updated relationships in WorkshopDB and TraceDB
Phase 2: Classification Service
- Created ClassificationService with real-time finding classification
- Added DSPy signatures for ClassifyFinding and DetectDisagreements
- Implemented local classification heuristic (placeholder for LLM)
Phase 3: Discovery Service Updates
- Added 6 new methods: submit_finding_v2, get_trace_discovery_state, get_fuzzy_progress, promote_finding, update_trace_thresholds
- Implemented fuzzy progress (exploring/good_coverage/complete) for participants
- Added structured discovery state for facilitators
Phase 4: API Endpoints
- Added 7 new REST endpoints for assisted facilitation v2
- Endpoints cover findings submission, discovery state, promotion, and threshold updates
Phase 5: Client Updates
- Created TraceDiscoveryPanel component for facilitator view
- Created DraftRubricPanel component for promoted findings
- Added 5 new React hooks to useWorkshopApi.ts
Phase 6: Testing
- Created test_classification_service.py with 7 tests (all passing)
- Created test_discovery_service_v2.py with 6 tests (all passing)
- 100% test pass rate (13/13 tests)
Key Features:
- Real-time finding classification (5 categories)
- Fuzzy progress indicator for participants
- Disagreement detection framework
- Draft rubric staging area
- Per-trace threshold management
Co-Authored-By: Claude <noreply@anthropic.com>
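As an illustration of the fuzzy progress idea in Phase 3, the sketch below collapses per-category counts into one of the three states named in the commit message. The category names are taken from a later commit in this PR; the function name `fuzzy_progress`, the threshold dictionary, and the cut-off rule (all thresholds met means complete, at least half means good_coverage) are assumptions made for this example and are not confirmed by the PR.

```python
from typing import Dict

# The five category buckets named elsewhere in this PR.
CATEGORIES = (
    "themes",
    "edge_cases",
    "boundary_conditions",
    "failure_modes",
    "missing_info",
)


def fuzzy_progress(counts: Dict[str, int], thresholds: Dict[str, int]) -> str:
    """Collapse per-category finding counts into one coarse status.

    Participants would see only the returned status, never the
    per-category numbers, so they are not nudged toward a category.
    """
    # Hypothetical rule: count the categories that reached their threshold.
    # A missing threshold defaults to 1 finding.
    met = sum(
        1 for c in CATEGORIES if counts.get(c, 0) >= thresholds.get(c, 1)
    )
    if met == len(CATEGORIES):
        return "complete"
    if met * 2 >= len(CATEGORIES):
        return "good_coverage"
    return "exploring"
```

With default thresholds of one finding per category, a trace with findings in three of the five categories would report good_coverage under this rule.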

- Add E2E tests for classification, dashboard, discovery, and rubric promotion
- Update auth test helper to handle workshop selection for facilitators
- Update API mocker to support new workshop scenarios
- Replace migration 0004 with 0008 for randomization columns
- Update workshop router and database service
Co-Authored-By: Claude <noreply@anthropic.com>

Brings in verification and testing improvements including:
- Claude configuration and contributing guidelines
- Enhanced test coverage and test utilities
- Updated E2E tests with better workshop selection handling
- Improved justfile commands
- Updated spec coverage documentation
Co-Authored-By: Claude <noreply@anthropic.com>

- Add .withRealApi() to all TestScenario builders so tests use real database
- Add @SPEC:ASSISTED_FACILITATION_SPEC tag to all 5 test files
- Add ASSISTED_FACILITATION_SPEC to known specs in analyzer and docs
- Fix justfile to export E2E_API_URL for dynamic port support
Co-Authored-By: Claude <noreply@anthropic.com>

Resolve conflicts by merging both feature sets:
- Keep assisted facilitation v2 features (discovery questions, classification)
- Keep JSONPath display customization features
- Merge database columns: discovery_questions_model_name + input/output_jsonpath
- Merge frontend hooks for both feature sets
- Update SPEC_COVERAGE_MAP with both RUBRIC_SPEC and TRACE_DISPLAY_SPEC updates
Co-Authored-By: Claude <noreply@anthropic.com>

Add new migration for randomization columns and reorganize migration files. Refactor Claude settings to move spec operations to ask list for better control.
Co-Authored-By: Claude <noreply@anthropic.com>

Configure all test runners to write JSON reports to .test-results/ and add test-summary tool for concise output grouped by spec.
- Add pytest-json-report plugin for pytest JSON output
- Update playwright.config.ts with JSON reporter (PW_JSON_REPORT=1)
- Update vite.config.ts with JSON reporter (VITEST_JSON_REPORT=1)
- Create tools/test_summary.py for parsing and summarizing reports
- Add just test-summary and just spec-status commands
- Update SKILL.md with token-efficient testing patterns
Co-Authored-By: Claude <noreply@anthropic.com>

@Req

Enhances spec coverage analyzer to provide pytest-cov style reporting:
- Parse success criteria from specs and track requirement-level coverage
- Classify tests by type (unit, integration, e2e-mocked, e2e-real)
- Add @Req marker support for linking tests to specific requirements
- Add --affected flag to detect specs impacted by git changes
- Add --specs flag to filter coverage to specific specs
- Add JSON output mode for programmatic analysis
- Add just test-affected command to run tests for changed specs only
Co-Authored-By: Claude <noreply@anthropic.com>

pytest-json-report doesn't include marker args for class-level decorators like @pytest.mark.spec("SPEC_NAME"). Added fallback to parse the source file and extract the spec name, with caching to avoid repeated reads.
Co-Authored-By: Claude <noreply@anthropic.com>
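A minimal sketch of such a source-file fallback is shown below. It is not the code from this PR: the function name `spec_from_source` and the regular expression are assumptions, and for simplicity it returns only the first spec marker found in a file. Caching is done with the standard library's `functools.lru_cache`, keyed by file path.

```python
import re
from functools import lru_cache
from pathlib import Path
from typing import Optional

# Matches @pytest.mark.spec("SPEC_NAME") or @pytest.mark.spec('SPEC_NAME').
_SPEC_RE = re.compile(
    r"""@pytest\.mark\.spec\(\s*["']([A-Za-z0-9_]+)["']\s*\)"""
)


@lru_cache(maxsize=None)
def spec_from_source(path: str) -> Optional[str]:
    """Return the first spec name declared in a test file, or None.

    Results are cached per path so each file is read at most once,
    however many tests in the report point at it.
    """
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError:
        # Unreadable or missing file: report no spec rather than fail.
        return None
    match = _SPEC_RE.search(text)
    return match.group(1) if match else None
```

A file containing several classes tagged with different specs would need per-class parsing, which this sketch does not attempt.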

Added @pytest.mark.req() markers to 100+ unit tests and Playwright E2E tests to link them to specific requirement success criteria. This enables granular requirement-level test coverage tracking.
Key improvements:
- ASSISTED_FACILITATION_SPEC: 100% (7/7 requirements) with 26 unit tests
- JUDGE_EVALUATION_SPEC: 76% (10/13 requirements) with 29 unit tests
- TRACE_DISPLAY_SPEC: 100% discovery with 23 unit tests now visible
- AUTHENTICATION_SPEC: 57% (4/7 requirements) with 8 unit tests
- ANNOTATION_SPEC: 22% (2/9 requirements) with 9 tests (4 unit + 5 E2E)
- Overall: 100 unit tests + 44 E2E tests now tagged to specs
Coverage tools can now report requirement-level coverage (which requirements have test coverage) vs just test counts.
Co-Authored-By: Claude <noreply@anthropic.com>
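To make the distinction between requirement-level coverage and raw test counts concrete, here is a small sketch of how such a percentage could be computed. The function name `requirement_coverage` and the input shape (requirement text mapped to the tests linked to it) are hypothetical. Rounding down is also an assumption; it happens to reproduce the figures reported above (10/13 gives 76, 4/7 gives 57, 2/9 gives 22).

```python
from typing import Dict, List


def requirement_coverage(tests_by_req: Dict[str, List[str]]) -> int:
    """Percent of requirements with at least one linked test, rounded down.

    A requirement counts as covered once, however many tests link to it,
    which is what separates this figure from a plain test count.
    """
    if not tests_by_req:
        return 0
    covered = sum(1 for tests in tests_by_req.values() if tests)
    return covered * 100 // len(tests_by_req)
```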

The -k flag only matches test names, not marker arguments. Added custom pytest --spec option that inspects @pytest.mark.spec("SPEC_NAME") markers to properly select tests for a given spec.
Co-Authored-By: Claude <noreply@anthropic.com>
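A custom option of this kind is usually built from two standard pytest hooks in conftest.py: `pytest_addoption` to register the flag and `pytest_collection_modifyitems` to filter collected tests. The sketch below shows one way to do that; it is an assumption about the shape of the change, not the PR's actual conftest.py.

```python
# conftest.py (sketch)

def pytest_addoption(parser):
    # Register a --spec command-line option, e.g. --spec ASSISTED_FACILITATION_SPEC.
    parser.addoption(
        "--spec",
        action="store",
        default=None,
        help="Run only tests marked with @pytest.mark.spec(NAME)",
    )


def pytest_collection_modifyitems(config, items):
    wanted = config.getoption("--spec")
    if not wanted:
        return  # option not given: leave the collection untouched

    selected, deselected = [], []
    for item in items:
        # iter_markers also yields markers applied at class or module level.
        names = {m.args[0] for m in item.iter_markers(name="spec") if m.args}
        (selected if wanted in names else deselected).append(item)

    if deselected:
        # Tell pytest which tests were filtered out so the summary reports them.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

Because the hook reads marker arguments directly, it selects class-level markers that a -k expression would miss.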

Extends discovery service to classify findings into categories (themes, edge_cases, boundary_conditions, failure_modes, missing_info) and persist the category with the finding. Updates get_trace_findings to return real findings grouped by category instead of placeholder data.
- Add category field to DiscoveryFinding model and migration
- Update submit_finding to classify and save findings with category
- Implement get_trace_findings to query and group by category
- Add add_classified_finding to database service
- Update E2E tests for category-aware discovery workflow
- Add Playwright timeout configuration via environment variables
- Update test coverage and documentation
Co-Authored-By: Claude <noreply@anthropic.com>
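The grouping step can be pictured with the sketch below. The `Finding` dataclass is a hypothetical stand-in for the DiscoveryFinding model (its fields are assumed), and `group_by_category` is an illustrative name rather than the service's real method. Every known category gets a bucket, even when empty, so a facilitator view can always render all five.

```python
from dataclasses import dataclass
from typing import Dict, List

CATEGORIES = (
    "themes",
    "edge_cases",
    "boundary_conditions",
    "failure_modes",
    "missing_info",
)


@dataclass
class Finding:
    # Hypothetical stand-in for the DiscoveryFinding model.
    user_id: str
    text: str
    category: str


def group_by_category(findings: List[Finding]) -> Dict[str, List[Finding]]:
    """Group findings by category, keeping empty buckets for unused ones."""
    grouped: Dict[str, List[Finding]] = {c: [] for c in CATEGORIES}
    for finding in findings:
        # setdefault tolerates a category outside the known five.
        grouped.setdefault(finding.category, []).append(finding)
    return grouped
```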

Update mock services to have correct interface matching actual database service methods. Tests now fail for the right reasons (unimplemented features) rather than missing mock methods.
- Add get_classified_findings_by_trace to mocks
- Add get_findings method returning findings list
- Add MockFinding class with proper attributes
- Fix threshold test to check individual values
Co-Authored-By: Claude <noreply@anthropic.com>

Implement automatic disagreement detection using DSPy when participants submit findings. The system now:
- Defines a DetectFindingDisagreements DSPy signature that analyzes findings from different users and identifies conflicting viewpoints
- Calls the LLM to detect semantic conflicts after each finding submission
- Persists detected disagreements to the database
- Returns disagreements in the trace discovery state for facilitators
The detection only runs when LLM configuration is available (model name, MLflow config, Databricks token). Falls back gracefully otherwise.
Co-Authored-By: Claude <noreply@anthropic.com>
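The "only runs when configured, otherwise falls back" behaviour might be structured as in the sketch below. All names here (`LLMConfig`, `detect_disagreements`, the `detector` callable) are hypothetical, and the LLM call itself is abstracted behind a plain callable rather than a real DSPy module. Returning an empty list when the detector raises is one possible reading of "falls back gracefully", not something the PR confirms.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A finding is represented here as (user_id, text); the real model is richer.
FindingPair = Tuple[str, str]


@dataclass
class LLMConfig:
    # Hypothetical container for the three settings named in the commit.
    model_name: Optional[str] = None
    mlflow_config: Optional[dict] = None
    databricks_token: Optional[str] = None

    def is_complete(self) -> bool:
        return bool(
            self.model_name and self.mlflow_config and self.databricks_token
        )


def detect_disagreements(
    config: LLMConfig,
    findings: List[FindingPair],
    detector: Callable[[List[FindingPair]], List[dict]],
) -> List[dict]:
    """Run the detector only when it can work; otherwise return nothing."""
    if not config.is_complete():
        return []  # no LLM configured: skip detection silently
    if len({user_id for user_id, _ in findings}) < 2:
        return []  # a disagreement needs at least two different users
    try:
        return detector(findings)
    except Exception:
        # Assumed fallback: a failed LLM call must not block the submission.
        return []
```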

Creates tables for the new assisted facilitation workflow:
- classified_findings: LLM-categorized findings
- disagreements: Auto-detected conflicts between participants
- trace_discovery_questions: Trace-level broadcast questions
- trace_discovery_thresholds: Per-trace category coverage thresholds
- draft_rubric_items: Promoted findings staging area
Co-Authored-By: Claude <noreply@anthropic.com>

- Fix configureMLflow endpoint URL to use /workshops/{id}/mlflow-config
- Add update_discovery_questions_model_name to DatabaseService
- Include discovery_questions_model_name in Workshop model mapping
- Handle databricks/ prefix in DSPy model names to avoid duplication
- Create proper DSPy Signatures for classification and disagreement detection
- Update ClassificationService to use new DSPy signatures
- Fix disagreement persistence to use save_disagreement method
- Switch to ClassifiedFindingDB table for v2 findings (allows multiple findings per user per trace, unlike legacy upsert behavior)
- Update get_trace_discovery_state to read from ClassifiedFindingDB
Co-Authored-By: Claude <noreply@anthropic.com>
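The databricks/ prefix fix listed above presumably guards against names such as databricks/databricks/... when a stored model name already carries the prefix. A minimal sketch of an idempotent normalizer follows; the function name `with_databricks_prefix` and the example endpoint name are hypothetical.

```python
def with_databricks_prefix(model_name: str) -> str:
    """Return the model name with exactly one leading 'databricks/' prefix."""
    prefix = "databricks/"
    name = model_name.strip()
    # Add the prefix only when it is not already present, so applying
    # the function twice gives the same result as applying it once.
    return name if name.startswith(prefix) else prefix + name
```

For example, both "my-endpoint" and "databricks/my-endpoint" normalize to "databricks/my-endpoint".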

- Fix FacilitatorDashboard.tsx: Move expandedTraceId state declaration before hooks that use it, fixing "expandedTraceId is not defined" error that was crashing React and causing blank pages
- Improve loginAs() in auth.ts:
  - Always navigate explicitly to login page
  - Wait for React to mount before interacting
  - Use proper loading state waits instead of hardcoded timeouts
  - Handle facilitator post-login flow (clicking workshop card)
- Add automatic browser error capture to TestScenario:
  - Capture pageerror events (uncaught JS exceptions)
  - Capture console.error messages (filtered for non-critical 404s)
  - Errors are logged and cause test failure via scenario.cleanup()
- Add baseURL to browser contexts created from browser fixture
Co-Authored-By: Claude (databricks-claude-opus-4-5) <noreply@anthropic.com>

- Fix rules-of-hooks violations by moving hooks before early returns in FacilitatorDashboard, RoleBasedWorkflow, AnnotationReviewPage, AnnotationDemo, RubricCreationDemo, TraceViewerDemo
- Fix real bug: add missing discoveryQuestions dep in TraceViewerDemo
- Add @tanstack/eslint-plugin-query for queryKey validation
- Fix TanStack Query exhaustive-deps in useDatabricksApi and useWorkshopApi
- Create useMLflowStatus hook to replace useEffect+fetch anti-pattern
- Add useMemo for derived state (perMetricScores, traceData, rubricQuestions)
- Add useCallback for client state functions (handleSetWorkshopId, saveStateToStorage, parseLoadedComment)
- Fix empty catch blocks with comments
- Fix empty interface with type alias
- Add eslint-disable comments for intentional dep exclusions with explanations
- Add EXHAUSTIVE_DEPS_ANALYSIS.md documenting the analysis
Co-Authored-By: Claude (databricks-claude-opus-4-5) <noreply@anthropic.com>

forrestmurray-db force-pushed the feat/discovery-update branch from bfe6088 to 9d90924 on January 12, 2026 21:01

forrestmurray-db and others added 11 commits January 20, 2026 11:46

adds generated questions and summaries

70d5ee2

add dspy

590aa62

update to use new assisted facilitation approach

ea1f582

synthetic trace generation + e2e test

d330218

skill and tool updates

7d702f8

forrestmurray-db force-pushed the feat/discovery-update branch from 0cd9b12 to f0409e7 on January 21, 2026 21:45

forrestmurray-db and others added 16 commits January 22, 2026 08:50

Update discovery questions migrations and refactor Claude settings

077ff0b

Merge refactor/verification-skills branch

fed37d9

update skill for better test filtering

1d5653a

fix typecheck and lint errors

e752ca8

forrestmurray-db wants to merge 27 commits into databricks-solutions:main from forrestmurray-db:feat/discovery-update
