
Commit 0cd9b12

Add assisted facilitation spec
Defines the discovery phase facilitation feature:

- Per-trace structured view with 5 category buckets
- Real-time classification of participant findings
- Facilitator-controlled question generation (broadcast per trace)
- Auto-detected disagreements between participants
- Promotion workflow to draft rubric staging area
- Fuzzy progress for participants (no category bias)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 007b75c commit 0cd9b12

2 files changed

Lines changed: 277 additions & 0 deletions


Lines changed: 255 additions & 0 deletions
@@ -0,0 +1,255 @@

# Assisted Facilitation Specification

## Overview

Assisted Facilitation helps stakeholders identify what makes an AI response high- or low-quality in their specific domain. During the Discovery phase, participants examine traces and submit findings, which are automatically classified into structured categories. Facilitators use this structured view to guide discussions and to promote findings into draft rubric elements.

### Goals

1. Surface raw material (findings) that can be refined into rubric questions
2. Give facilitators structured visibility into participant observations without requiring domain expertise
3. Bridge discovery insights directly into rubric creation through a promotion workflow

---

## Core Concepts

### Discovery Categories

Every finding is classified into exactly one category:

| Category | Description | Example |
|----------|-------------|---------|
| `themes` | Major patterns of quality or issues | "Response lacks source citations" |
| `edge_cases` | Unusual scenarios or boundary inputs | "Behavior is inconsistent when dates span timezones" |
| `boundary_conditions` | What separates one quality level from another | "Acceptable but would be better with primary sources" |
| `failure_modes` | Ways the response can fail | "Hallucinates when context is ambiguous" |
| `missing_info` | Information gaps or missing context | "No mention of limitations or caveats" |
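
The data model stores `category` as a plain string. Below is a minimal sketch of keeping that string within the five allowed values, assuming a `str`-valued `Enum`. `DiscoveryCategory` and `parse_category` are hypothetical names and are not part of this spec.

```python
from enum import Enum


class DiscoveryCategory(str, Enum):
    """The five discovery categories; each finding gets exactly one."""

    THEMES = "themes"
    EDGE_CASES = "edge_cases"
    BOUNDARY_CONDITIONS = "boundary_conditions"
    FAILURE_MODES = "failure_modes"
    MISSING_INFO = "missing_info"


def parse_category(label: str) -> DiscoveryCategory:
    """Normalize a raw label (for example, a classifier's output).

    Raises ValueError if the label is not one of the five categories.
    """
    return DiscoveryCategory(label.strip().lower())
```

The enum mixes in `str`, so a member compares equal to its raw value (`DiscoveryCategory.THEMES == "themes"`). That means it could be stored in the existing `category: str` field unchanged.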

### Disagreements

Disagreements are detected automatically when participants submit conflicting observations about the same trace. They are surfaced to facilitators separately, to drive discussion.

### Promotion

Facilitators can "promote" findings to a draft rubric staging area. A promoted finding becomes raw material for rubric construction:

- "Temporal context should accompany temporal questions" → potential grading criterion
- "Acceptable but would be better with primary sources" → defines a Likert-scale boundary

---

## Participant Experience

### What Participants See

1. **Trace view** with input/output content
2. **Questions** to answer about the trace:
   - Q1 (always present): "What makes this response effective or ineffective?"
   - Q2+ (if generated by the facilitator): targeted follow-up questions
3. **Fuzzy progress indicator** showing overall workshop progress
   - Does NOT show a category-level breakdown
   - Does NOT show per-trace detail
   - Avoids biasing findings, which could happen if participants saw which categories are "missing"
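
The spec says what the indicator must hide, but not how progress is computed. Below is one possible sketch. It assumes progress is the share of per-category thresholds met across all traces, with each cell capped at its threshold, and that progress is reported only as a coarse label. `fuzzy_progress` and the bucket boundaries are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical coarse buckets: (exclusive upper bound on the coverage ratio, label).
BUCKETS: List[Tuple[float, str]] = [
    (0.25, "Just getting started"),
    (0.50, "Making progress"),
    (0.75, "About halfway"),
    (1.00, "Almost there"),
]


def fuzzy_progress(
    counts: Dict[str, Dict[str, int]],
    thresholds: Dict[str, Dict[str, int]],
) -> str:
    """Collapse per-trace, per-category coverage into a single coarse label.

    counts[trace_id][category] is the number of findings so far, and
    thresholds[trace_id][category] is the facilitator's "good enough" count.
    Only the returned label is sent to participants, so no category or
    trace-level detail is exposed.
    """
    needed = covered = 0
    for trace_id, per_category in thresholds.items():
        for category, threshold in per_category.items():
            have = counts.get(trace_id, {}).get(category, 0)
            needed += threshold
            # Cap each cell so one over-filled category cannot hide gaps elsewhere.
            covered += min(have, threshold)
    if needed == 0 or covered >= needed:
        return "Complete"
    ratio = covered / needed
    for upper, label in BUCKETS:
        if ratio < upper:
            return label
    return BUCKETS[-1][1]  # unreachable: ratio < 1.0 here
```

Because each cell is capped, a trace with many `themes` findings cannot make the workshop look finished while other categories are empty. This label could be what `GET /workshops/{id}/discovery-progress` returns, although the spec does not define that response.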

### Participant Actions

- Submit findings (free-form text answering the displayed questions)
- Navigate between traces
- Mark discovery complete when finished

---

## Facilitator Experience

### Per-Trace Structured View

For each trace, facilitators see:

```
Trace 1                                   [Generate Question]
├── themes         ████░░  2/3   [finding₁] [finding₂]
├── edge_cases     ██░░░░  1/3   [finding₃]
├── boundary_cond  ░░░░░░  0/3
├── failure_modes  ████░░  2/3   [finding₄] [finding₅]
├── missing_info   ░░░░░░  0/3
└── disagreements  █░░░░░  1/?   [Alice vs Bob: "..."]

Each finding shows: user attribution, text, [Promote] button
```

### Progress Bars

- Each category shows `count / threshold`
- Thresholds are configurable per trace (default: 3 per category)
- Facilitators adjust thresholds based on trace complexity
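
Below is a small sketch of the bar format in the facilitator view. It assumes six cells, as drawn in the diagram, and rounds down so that a bar only looks full once its threshold is met. `render_bar` is a hypothetical helper.

```python
def render_bar(count: int, threshold: int, width: int = 6) -> str:
    """Render `count / threshold` as a fixed-width bar, as in the facilitator view.

    The bar fills in proportion to count / threshold (rounded down), so it only
    looks full once the threshold is met; counts above the threshold are still
    reported exactly.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    filled = min(width, count * width // threshold)
    return "█" * filled + "░" * (width - filled) + f" {count}/{threshold}"
```

`render_bar(2, 3)` produces the same fill as the `themes` row in the diagram (`████░░ 2/3`). The disagreements row (`1/?`) has no threshold, so it falls outside this sketch.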

### Facilitator Actions

1. **Generate Question**: Creates a follow-up question for a trace that targets gaps in coverage, and broadcasts it to all participants viewing that trace.

2. **Promote Finding**: Sends a finding to the draft rubric staging area for later use in rubric composition.

3. **Adjust Thresholds**: Changes the "good enough" count for each category on a trace.
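
Below is a sketch of validating a threshold change. It assumes that an update may name only the categories being changed and that thresholds must be positive integers; the spec states neither. `apply_threshold_update` and `default_thresholds` are hypothetical.

```python
from typing import Dict, Mapping

CATEGORIES = ("themes", "edge_cases", "boundary_conditions", "failure_modes", "missing_info")
DEFAULT_THRESHOLD = 3  # the spec's default per category


def default_thresholds() -> Dict[str, int]:
    """Thresholds for a new trace: 3 for every category."""
    return {category: DEFAULT_THRESHOLD for category in CATEGORIES}


def apply_threshold_update(current: Mapping[str, int], update: Mapping[str, int]) -> Dict[str, int]:
    """Merge a (possibly partial) update into a trace's thresholds.

    Every entry is validated before anything is returned, and `current` is never
    mutated, so a rejected update leaves the stored thresholds unchanged.
    """
    merged = dict(current)
    for category, value in update.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"threshold for {category!r} must be a positive integer")
        merged[category] = value
    return merged
```

The function returns a new dict instead of mutating in place. As a result, a request that fails validation partway through leaves the trace's stored thresholds untouched.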

---

## Classification

### When Classification Happens

Classification occurs in real time, when a participant submits a finding.

### Classification Process

1. Participant submits finding text
2. An LLM classifies it into one of the 5 categories (`themes`, `edge_cases`, `boundary_conditions`, `failure_modes`, `missing_info`)
3. The finding is stored with its assigned category
4. Disagreement detection runs against other findings for the same trace
5. The facilitator view updates
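
The steps above are sketched here for a single submission. The spec does not name a model or LLM library, so the sketch makes several assumptions, and every name below is hypothetical:

- The classifier is injected as a plain `classify(text) -> str` callable.
- Storage is a list.
- Steps 4 and 5 are delegated to an `on_stored` hook.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List

CATEGORIES = {"themes", "edge_cases", "boundary_conditions", "failure_modes", "missing_info"}


@dataclass
class ClassifiedFinding:
    id: str
    trace_id: str
    user_id: str
    text: str
    category: str
    question_id: str
    promoted: bool = False
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def submit_finding(
    trace_id: str,
    user_id: str,
    question_id: str,
    text: str,
    classify: Callable[[str], str],  # hypothetical stand-in for the LLM call
    store: List[ClassifiedFinding],  # hypothetical stand-in for persistence
    on_stored: Callable[[ClassifiedFinding], None] = lambda finding: None,
) -> ClassifiedFinding:
    """Handle one submission (step 1) through steps 2-5."""
    # Step 2: classify, then normalize the raw label.
    label = classify(text).strip().lower()
    if label not in CATEGORIES:
        # Reject instead of guessing, so every stored finding has exactly one valid category.
        raise ValueError(f"classifier returned unknown category {label!r}")
    # Step 3: store the finding with its assigned category.
    finding = ClassifiedFinding(
        id=str(uuid.uuid4()),
        trace_id=trace_id,
        user_id=user_id,
        text=text,
        category=label,
        question_id=question_id,
    )
    store.append(finding)
    # Steps 4-5: disagreement detection and the facilitator view refresh hang off this hook.
    on_stored(finding)
    return finding
```

Rejecting out-of-range labels keeps the "exactly one category" rule enforceable. A real implementation might retry the LLM call instead of raising the error.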

### Disagreement Detection

After each finding is submitted, it is compared against the other findings for the same trace:

- If conflicting viewpoints are detected, a Disagreement record is created
- Each Disagreement includes the participating users, a summary of the conflict, and the source finding IDs
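
Below is a sketch of the comparison. It assumes the new finding is checked pairwise against other participants' findings on the same trace. The LLM judgment is injected as a hypothetical `judge(a, b)` callable that returns a conflict summary or `None`. `detect_disagreements` and the trimmed `Finding` class are also hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class Finding:  # trimmed-down ClassifiedFinding
    id: str
    trace_id: str
    user_id: str
    text: str


@dataclass
class Disagreement:
    id: str
    trace_id: str
    user_ids: List[str]
    finding_ids: List[str]
    summary: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def detect_disagreements(
    new: Finding,
    existing: List[Finding],
    judge: Callable[[str, str], Optional[str]],  # hypothetical LLM stand-in
) -> List[Disagreement]:
    """Compare a new finding against other participants' findings on the same trace."""
    found = []
    for other in existing:
        # Skip other traces and the submitter's own findings.
        if other.trace_id != new.trace_id or other.user_id == new.user_id:
            continue
        summary = judge(other.text, new.text)
        if summary:
            found.append(Disagreement(
                id=str(uuid.uuid4()),
                trace_id=new.trace_id,
                user_ids=[other.user_id, new.user_id],
                finding_ids=[other.id, new.id],
                summary=summary,
            ))
    return found
```

Pairwise checks cost one judgment per earlier finding on the trace. The spec leaves open an alternative: batching all of a trace's findings into a single prompt.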

---

## Question Generation

### Fixed Question (Q1)

Every trace has Q1: "What makes this response effective or ineffective?"

### Generated Questions (Q2+)

The facilitator triggers generation per trace. The generated question:

- Targets categories with gaps (below threshold)
- Probes unresolved disagreements, if any
- Is broadcast to ALL participants on that trace (not per user)

### Generation Logic

```
1. Check coverage: which categories are below threshold?
2. Check disagreements: are there any unresolved conflicts?
3. If disagreements exist and have not yet been probed → generate a disagreement question
4. Else → generate a question targeting the lowest-coverage category
5. Store the question at the trace level
6. All participants see the new question on their next load/refresh
```
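
Steps 1-5 are sketched below, with two assumptions:

- "Lowest coverage" means the lowest `count / threshold` ratio.
- A disagreement counts as probed once a question has targeted it.

`choose_target` and `next_question_id` are hypothetical. Generating the question text itself (an LLM call) is left out.

```python
from typing import Dict, List, Set, Tuple


def choose_target(
    counts: Dict[str, int],
    thresholds: Dict[str, int],
    unresolved_disagreement_ids: List[str],
    probed_disagreement_ids: Set[str],
) -> Tuple[str, str]:
    """Pick what the next generated question for one trace should probe.

    Returns ("disagreement", disagreement_id) or ("category", category_name).
    """
    # Steps 2-3: an unresolved disagreement that has not been probed yet takes priority.
    for disagreement_id in unresolved_disagreement_ids:
        if disagreement_id not in probed_disagreement_ids:
            return "disagreement", disagreement_id
    # Steps 1 and 4: otherwise target the category with the lowest count/threshold ratio,
    # preferring categories still below threshold. min() keeps the first of any tie,
    # so the choice follows the order of `thresholds`.
    below = [c for c, t in thresholds.items() if counts.get(c, 0) < t]
    candidates = below or list(thresholds)
    target = min(candidates, key=lambda c: counts.get(c, 0) / max(thresholds[c], 1))
    return "category", target


def next_question_id(existing_ids: List[str]) -> str:
    """Step 5: follow the spec's q_1, q_2, ... naming; q_1 is always the fixed Q1."""
    return f"q_{len(existing_ids) + 1}"
```

The question is stored per trace rather than per user. Every participant on the trace therefore sees the same question on their next load, as step 6 describes.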

---

## Data Model

### TraceDiscoveryState

```python
from __future__ import annotations  # lets annotations reference classes defined below

from typing import Dict, List


class TraceDiscoveryState:
    trace_id: str
    workshop_id: str

    # Findings by category
    themes: List[ClassifiedFinding]
    edge_cases: List[ClassifiedFinding]
    boundary_conditions: List[ClassifiedFinding]
    failure_modes: List[ClassifiedFinding]
    missing_info: List[ClassifiedFinding]

    # Auto-detected disagreements
    disagreements: List[Disagreement]

    # Questions (Q1 fixed, Q2+ generated)
    questions: List[DiscoveryQuestion]

    # Configurable per-category thresholds
    thresholds: Dict[str, int]  # default: 3 per category
```

### ClassifiedFinding

```python
from datetime import datetime


class ClassifiedFinding:
    id: str
    trace_id: str
    user_id: str
    text: str
    category: str  # themes | edge_cases | boundary_conditions | failure_modes | missing_info
    question_id: str  # Which question this finding answers
    promoted: bool
    created_at: datetime
```

### Disagreement

```python
from datetime import datetime
from typing import List


class Disagreement:
    id: str
    trace_id: str
    user_ids: List[str]  # Participants who disagree
    finding_ids: List[str]  # The conflicting findings
    summary: str  # LLM-generated description
    created_at: datetime
```

### DiscoveryQuestion

```python
from datetime import datetime
from typing import Optional


class DiscoveryQuestion:
    id: str  # q_1, q_2, etc.
    trace_id: str
    prompt: str
    placeholder: Optional[str]
    target_category: Optional[str]  # Category this question targets
    is_fixed: bool  # True for Q1
    created_at: datetime
```

### DraftRubricItem

```python
from datetime import datetime


class DraftRubricItem:
    id: str
    source_finding_id: str
    source_trace_id: str
    workshop_id: str
    text: str
    promoted_by: str  # Facilitator user_id
    promoted_at: datetime
```
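
Below is a sketch of the promotion workflow using trimmed copies of the classes above. It assumes that promotion is idempotent (promoting a finding twice returns the existing item) and that it copies the finding text; the spec states neither behavior. `promote` and the dict-based staging area are hypothetical.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass
class ClassifiedFinding:  # only the fields promotion needs
    id: str
    trace_id: str
    text: str
    promoted: bool = False


@dataclass
class DraftRubricItem:
    id: str
    source_finding_id: str
    source_trace_id: str
    workshop_id: str
    text: str
    promoted_by: str
    promoted_at: datetime


def promote(
    finding: ClassifiedFinding,
    workshop_id: str,
    facilitator_id: str,
    staging: Dict[str, DraftRubricItem],  # hypothetical store, keyed by source finding ID
) -> DraftRubricItem:
    """Copy a finding into the draft rubric staging area and flag it as promoted.

    Promoting the same finding twice returns the existing item instead of duplicating it.
    """
    if finding.id in staging:
        return staging[finding.id]
    item = DraftRubricItem(
        id=str(uuid.uuid4()),
        source_finding_id=finding.id,
        source_trace_id=finding.trace_id,
        workshop_id=workshop_id,
        text=finding.text,  # copied, not referenced
        promoted_by=facilitator_id,
        promoted_at=datetime.now(timezone.utc),
    )
    staging[finding.id] = item
    finding.promoted = True
    return item
```

Copying the text decouples the draft rubric from the source finding. Any later edit to a draft item would therefore not alter what the participant wrote.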

---

## API Endpoints

### Participant Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/workshops/{id}/traces/{trace_id}/discovery-questions` | Get questions for a trace |
| POST | `/workshops/{id}/findings` | Submit a finding |
| GET | `/workshops/{id}/discovery-progress` | Get fuzzy global progress |

### Facilitator Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/workshops/{id}/traces/{trace_id}/discovery-state` | Get the full structured state |
| POST | `/workshops/{id}/traces/{trace_id}/generate-question` | Generate and broadcast a question |
| PUT | `/workshops/{id}/traces/{trace_id}/thresholds` | Update thresholds |
| POST | `/workshops/{id}/findings/{finding_id}/promote` | Promote a finding to the draft rubric |
| GET | `/workshops/{id}/draft-rubric` | Get promoted findings |

---

## Success Criteria

1. Findings are classified in real time as participants submit them
2. Facilitators see a per-trace structured view with a category breakdown
3. Facilitators can generate targeted questions that are broadcast to all participants on a trace
4. Disagreements are auto-detected and surfaced
5. Participants see only fuzzy progress (no category bias)
6. Findings can be promoted to the draft rubric staging area
7. Thresholds are configurable per category per trace

specs/README.md

Lines changed: 22 additions & 0 deletions
@@ -6,6 +6,7 @@ This directory contains declarative specifications for the Human Evaluation Work

 | Spec | Domain | Key Concepts |
 |------|--------|--------------|
+| [ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md) | Discovery & Facilitation | discovery, facilitation, findings, classification, promotion, rubric bridge |
 | [AUTHENTICATION_SPEC](./AUTHENTICATION_SPEC.md) | Auth & Sessions | login, permissions, session, Databricks auth, fallback |
 | [ANNOTATION_SPEC](./ANNOTATION_SPEC.md) | Annotation System | annotation, rating, editing, MLflow feedback, comments |
 | [DATASETS_SPEC](./DATASETS_SPEC.md) | Trace Datasets | dataset, labeling dataset, composition, randomization, per-user order |
@@ -23,6 +24,27 @@ This directory contains declarative specifications for the Human Evaluation Work

 Use this index to find relevant specs by keyword.

+### Discovery & Assisted Facilitation
+- **discovery**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md), [DISCOVERY_TRACE_ASSIGNMENT_SPEC](./DISCOVERY_TRACE_ASSIGNMENT_SPEC.md)
+- **assisted facilitation**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **finding**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **classification**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **themes**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **edge_cases**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **boundary_conditions**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **failure_modes**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **disagreement**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **promote**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **promotion**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **draft rubric**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **progress bar**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **fuzzy progress**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **question generation**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **broadcast**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **DSPy**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **TraceDiscoveryState**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+- **ClassifiedFinding**[ASSISTED_FACILITATION_SPEC](./ASSISTED_FACILITATION_SPEC.md)
+
 ### Authentication & Authorization
 - **login**[AUTHENTICATION_SPEC](./AUTHENTICATION_SPEC.md)
 - **logout**[AUTHENTICATION_SPEC](./AUTHENTICATION_SPEC.md)
