| 1 | +# Assisted Facilitation Specification |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Assisted Facilitation helps stakeholders identify what distinguishes high-quality from low-quality AI responses in their specific domain. During the Discovery phase, participants examine traces and submit findings, which are automatically classified into structured categories. Facilitators use this structured view to guide discussions and promote findings into draft rubric elements. |
| 6 | + |
| 7 | +### Goals |
| 8 | + |
| 9 | +1. Surface raw material (findings) that can be refined into rubric questions |
| 10 | +2. Provide facilitators with structured visibility into participant observations without requiring domain expertise |
| 11 | +3. Bridge discovery insights directly into rubric creation through a promotion workflow |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## Core Concepts |
| 16 | + |
| 17 | +### Discovery Categories |
| 18 | + |
| 19 | +Every finding is classified into exactly one category: |
| 20 | + |
| 21 | +| Category | Description | Example | |
| 22 | +|----------|-------------|---------| |
| 23 | +| `themes` | Major patterns of quality or issues | "Response lacks source citations" | |
| 24 | +| `edge_cases` | Unusual scenarios or boundary inputs | "Behavior is inconsistent when dates span timezones" | |
| 25 | +| `boundary_conditions` | What separates quality levels | "Acceptable but would be better with primary sources" | |
| 26 | +| `failure_modes` | Ways the response can fail | "Hallucinates when context is ambiguous" | |
| 27 | +| `missing_info` | Information gaps or missing context | "No mention of limitations or caveats" | |
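As a minimal sketch, the five categories could be represented as an enumeration so that classifier output outside the table is rejected early. The names `DiscoveryCategory` and `parse_category` are hypothetical and not part of the spec.

```python
from enum import Enum


class DiscoveryCategory(str, Enum):
    """The five categories from the table above (class name is an assumption)."""

    THEMES = "themes"
    EDGE_CASES = "edge_cases"
    BOUNDARY_CONDITIONS = "boundary_conditions"
    FAILURE_MODES = "failure_modes"
    MISSING_INFO = "missing_info"


def parse_category(label: str) -> DiscoveryCategory:
    """Normalize a raw label; raises ValueError for anything outside the five categories."""
    return DiscoveryCategory(label.strip().lower())
```

Because the enum subclasses `str`, its members compare equal to the plain string values used in the data model, such as `"themes"`.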
| 28 | + |
| 29 | +### Disagreements |
| 30 | + |
| 31 | +Disagreements are automatically detected when participants submit conflicting observations about the same trace. They are surfaced separately to facilitators to drive discussion. |
| 32 | + |
| 33 | +### Promotion |
| 34 | + |
| 35 | +Facilitators can "promote" findings to a draft rubric staging area. A promoted finding becomes raw material for rubric construction: |
| 36 | + |
| 37 | +- "Temporal context should accompany temporal questions" → potential grading criterion |
| 38 | +- "Acceptable but would be better with primary sources" → defines Likert scale boundary |
| 39 | + |
| 40 | +--- |
| 41 | + |
| 42 | +## Participant Experience |
| 43 | + |
| 44 | +### What Participants See |
| 45 | + |
| 46 | +1. **Trace view** with input/output content |
| 47 | +2. **Questions** to answer about the trace: |
| 48 | + - Q1 (always present): "What makes this response effective or ineffective?" |
| 49 | + - Q2+ (if generated by facilitator): Targeted follow-up questions |
| 50 | +3. **Fuzzy progress indicator** showing overall workshop progress |
| 51 | + - Does NOT show category-level breakdown |
| 52 | + - Does NOT show per-trace detail |
| 53 | + - Prevents biasing findings by revealing what's "missing" |
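One way the fuzzy indicator might be derived is to collapse all per-trace, per-category counts into a single coarse label, so no breakdown leaks to participants. This is a sketch only: the function name, the label wording, and the cut-off values are assumptions, since the spec does not define them.

```python
from typing import Dict

# Hypothetical cut-offs and labels; the spec does not specify either.
FUZZY_LEVELS = [
    (0.25, "getting started"),
    (0.50, "making progress"),
    (0.85, "well underway"),
]


def fuzzy_progress(
    counts: Dict[str, Dict[str, int]],
    thresholds: Dict[str, Dict[str, int]],
) -> str:
    """Collapse {trace_id: {category: n}} data into one workshop-level label."""
    met = 0
    needed = 0
    for trace_id, per_category in thresholds.items():
        for category, threshold in per_category.items():
            count = counts.get(trace_id, {}).get(category, 0)
            # Findings beyond the threshold do not add further progress.
            met += min(count, threshold)
            needed += threshold
    ratio = met / needed if needed else 1.0
    for upper_bound, label in FUZZY_LEVELS:
        if ratio < upper_bound:
            return label
    return "nearly complete"
```

Only the returned label would be sent to participants; the per-category numbers stay on the facilitator side.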
| 54 | + |
| 55 | +### Participant Actions |
| 56 | + |
| 57 | +- Submit findings (free-form text answering the displayed questions) |
| 58 | +- Navigate between traces |
| 59 | +- Mark discovery complete when finished |
| 60 | + |
| 61 | +--- |
| 62 | + |
| 63 | +## Facilitator Experience |
| 64 | + |
| 65 | +### Per-Trace Structured View |
| 66 | + |
| 67 | +For each trace, facilitators see: |
| 68 | + |
| 69 | +``` |
| 70 | +Trace 1 [Generate Question] |
| 71 | +├── themes ████░░ 2/3 [finding₁] [finding₂] |
| 72 | +├── edge_cases ██░░░░ 1/3 [finding₃] |
| 73 | +├── boundary_cond ░░░░░░ 0/3 |
| 74 | +├── failure_modes ████░░ 2/3 [finding₄] [finding₅] |
| 75 | +├── missing_info ░░░░░░ 0/3 |
| 76 | +└── disagreements █░░░░░ 1/? [Alice vs Bob: "..."] |
| 77 | + |
| 78 | +Each finding shows: user attribution, text, [Promote] button |
| 79 | +``` |
| 80 | + |
| 81 | +### Progress Bars |
| 82 | + |
| 83 | +- Each category shows `count / threshold` |
| 84 | +- Thresholds are configurable per trace (default: 3 per category) |
| 85 | +- Facilitators adjust thresholds based on trace complexity |
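The `count / threshold` bars in the structured view could be rendered along these lines. The six-cell width is taken from the diagram above; the function name `render_progress` and the handling of rows without a threshold (shown as `1/?` for disagreements) are assumptions.

```python
from typing import Optional

BAR_WIDTH = 6  # the structured view draws six cells per bar


def render_progress(count: int, threshold: Optional[int]) -> str:
    """Render one row as a bar followed by `count/threshold`."""
    if threshold is None or threshold <= 0:
        # Assumed: with no usable threshold, fill one cell per item and show "?".
        filled = min(count, BAR_WIDTH)
        label = f"{count}/?"
    else:
        # Integer arithmetic avoids rounding surprises; the bar is capped when full.
        filled = min(BAR_WIDTH, (BAR_WIDTH * count) // threshold)
        label = f"{count}/{threshold}"
    return "█" * filled + "░" * (BAR_WIDTH - filled) + " " + label
```

With the default threshold of 3, two findings fill four of the six cells, which matches the `themes` row in the diagram.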
| 86 | + |
| 87 | +### Facilitator Actions |
| 88 | + |
| 89 | +1. **Generate Question**: Creates a follow-up question for a trace, targeting gaps in coverage. Broadcasts to all participants viewing that trace. |
| 90 | + |
| 91 | +2. **Promote Finding**: Sends a finding to the draft rubric staging area for later use in rubric composition. |
| 92 | + |
| 93 | +3. **Adjust Thresholds**: Changes the "good enough" count per category per trace. |
| 94 | + |
| 95 | +--- |
| 96 | + |
| 97 | +## Classification |
| 98 | + |
| 99 | +### When Classification Happens |
| 100 | + |
| 101 | +Classification occurs in real time when a participant submits a finding. |
| 102 | + |
| 103 | +### Classification Process |
| 104 | + |
| 105 | +1. Participant submits finding text |
| 106 | +2. LLM classifies into one of 5 categories (themes, edge_cases, boundary_conditions, failure_modes, missing_info) |
| 107 | +3. Finding is stored with assigned category |
| 108 | +4. Disagreement detection runs against other findings for the same trace |
| 109 | +5. Facilitator view updates |
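The five steps above can be sketched as a single submission handler. Everything here is illustrative: `TraceState` is a simplified stand-in for `TraceDiscoveryState`, and `classify` and `on_update` are hypothetical callables standing in for the LLM call and the facilitator-view update.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

CATEGORIES = (
    "themes",
    "edge_cases",
    "boundary_conditions",
    "failure_modes",
    "missing_info",
)


@dataclass
class TraceState:
    """Simplified stand-in for TraceDiscoveryState: finding texts by category."""

    findings: Dict[str, List[str]] = field(
        default_factory=lambda: {c: [] for c in CATEGORIES}
    )


def submit_finding(
    state: TraceState,
    text: str,
    classify: Callable[[str], str],
    on_update: Callable[[TraceState], None],
) -> str:
    """Run one submission through the pipeline and return the assigned category."""
    category = classify(text)  # step 2: stand-in for the LLM classifier
    if category not in CATEGORIES:
        raise ValueError(f"classifier returned unknown category: {category!r}")
    state.findings[category].append(text)  # step 3: store with assigned category
    # Step 4 (disagreement detection) would run here against the trace's other findings.
    on_update(state)  # step 5: refresh the facilitator view
    return category
```

Passing the classifier in as a callable keeps the handler testable without a model call.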
| 110 | + |
| 111 | +### Disagreement Detection |
| 112 | + |
| 113 | +After each finding submission, compare the new finding against the other findings for the same trace: |
| 114 | +- If conflicting viewpoints are detected, create a Disagreement record |
| 115 | +- Each Disagreement includes: participating users, summary of conflict, source finding IDs |
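A minimal sketch of this comparison follows. The `conflicts` judge is hypothetical (for example, an LLM call) and returns a short summary when two texts disagree, or `None` otherwise. Skipping findings from the same participant is also an assumption; the spec only says conflicts arise between participants.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Finding:
    id: str
    trace_id: str
    user_id: str
    text: str


@dataclass
class Disagreement:
    trace_id: str
    user_ids: List[str]
    finding_ids: List[str]
    summary: str


def detect_disagreements(
    new: Finding,
    existing: List[Finding],
    conflicts: Callable[[str, str], Optional[str]],
) -> List[Disagreement]:
    """Compare a new finding with earlier findings on the same trace."""
    found: List[Disagreement] = []
    for other in existing:
        # Assumed: only different participants on the same trace can disagree.
        if other.trace_id != new.trace_id or other.user_id == new.user_id:
            continue
        summary = conflicts(other.text, new.text)
        if summary is not None:
            found.append(
                Disagreement(
                    trace_id=new.trace_id,
                    user_ids=[other.user_id, new.user_id],
                    finding_ids=[other.id, new.id],
                    summary=summary,
                )
            )
    return found
```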
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +## Question Generation |
| 120 | + |
| 121 | +### Fixed Question (Q1) |
| 122 | + |
| 123 | +Every trace has Q1: "What makes this response effective or ineffective?" |
| 124 | + |
| 125 | +### Generated Questions (Q2+) |
| 126 | + |
| 127 | +The facilitator triggers generation per trace. The generated question: |
| 128 | +- Targets categories with gaps (below threshold) |
| 129 | +- Probes unresolved disagreements if present |
| 130 | +- Broadcasts to ALL participants on that trace (not per-user) |
| 131 | + |
| 132 | +### Generation Logic |
| 133 | + |
| 134 | +``` |
| 135 | +1. Check coverage: which categories are below threshold? |
| 136 | +2. Check disagreements: any unresolved conflicts? |
| 137 | +3. If disagreements exist and not yet probed → generate disagreement question |
| 138 | +4. Else → generate question targeting lowest-coverage category |
| 139 | +5. Store question at trace level |
| 140 | +6. All participants see new question on next load/refresh |
| 141 | +``` |
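The target-selection part of this logic (steps 1 to 4) might look like the following sketch. The function name and return shape are hypothetical. Treating "lowest coverage" as the lowest `count / threshold` ratio, and returning `None` when nothing is below threshold, are both assumptions; the spec does not say what happens when every category is covered.

```python
from typing import Dict, List, Optional, Set, Tuple


def choose_question_target(
    counts: Dict[str, int],
    thresholds: Dict[str, int],
    disagreement_ids: List[str],
    probed_ids: Set[str],
) -> Optional[Tuple[str, str]]:
    """Return ("disagreement", id), ("category", name), or None if nothing needs probing."""
    # Step 3: a disagreement that has not been probed yet takes priority.
    for disagreement_id in disagreement_ids:
        if disagreement_id not in probed_ids:
            return ("disagreement", disagreement_id)
    # Steps 1 and 4: among categories below threshold, pick the lowest coverage ratio.
    gaps = {
        category: counts.get(category, 0) / threshold
        for category, threshold in thresholds.items()
        if threshold > 0 and counts.get(category, 0) < threshold
    }
    if not gaps:
        return None
    # Ties resolve to the category listed first in `thresholds`.
    return ("category", min(gaps, key=gaps.get))
```

The caller would then build the question prompt for the chosen target and store it at trace level (step 5).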
| 142 | + |
| 143 | +--- |
| 144 | + |
| 145 | +## Data Model |
| 146 | + |
| 147 | +### TraceDiscoveryState |
| 148 | + |
| 149 | +```python |
| 150 | +class TraceDiscoveryState: |
| 151 | + trace_id: str |
| 152 | + workshop_id: str |
| 153 | + |
| 154 | + # Findings by category |
| 155 | + themes: List[ClassifiedFinding] |
| 156 | + edge_cases: List[ClassifiedFinding] |
| 157 | + boundary_conditions: List[ClassifiedFinding] |
| 158 | + failure_modes: List[ClassifiedFinding] |
| 159 | + missing_info: List[ClassifiedFinding] |
| 160 | + |
| 161 | + # Auto-detected disagreements |
| 162 | + disagreements: List[Disagreement] |
| 163 | + |
| 164 | + # Questions (Q1 fixed, Q2+ generated) |
| 165 | + questions: List[DiscoveryQuestion] |
| 166 | + |
| 167 | + # Configurable per-category thresholds |
| 168 | + thresholds: Dict[str, int] # default: 3 per category |
| 169 | +``` |
| 170 | + |
| 171 | +### ClassifiedFinding |
| 172 | + |
| 173 | +```python |
| 174 | +class ClassifiedFinding: |
| 175 | + id: str |
| 176 | + trace_id: str |
| 177 | + user_id: str |
| 178 | + text: str |
| 179 | + category: str # themes | edge_cases | boundary_conditions | failure_modes | missing_info |
| 180 | + question_id: str # Which question this answered |
| 181 | + promoted: bool |
| 182 | + created_at: datetime |
| 183 | +``` |
| 184 | + |
| 185 | +### Disagreement |
| 186 | + |
| 187 | +```python |
| 188 | +class Disagreement: |
| 189 | + id: str |
| 190 | + trace_id: str |
| 191 | + user_ids: List[str] # Participants who disagree |
| 192 | + finding_ids: List[str] # The conflicting findings |
| 193 | + summary: str # LLM-generated description |
| 194 | + created_at: datetime |
| 195 | +``` |
| 196 | + |
| 197 | +### DiscoveryQuestion |
| 198 | + |
| 199 | +```python |
| 200 | +class DiscoveryQuestion: |
| 201 | + id: str # q_1, q_2, etc. |
| 202 | + trace_id: str |
| 203 | + prompt: str |
| 204 | + placeholder: Optional[str] |
| 205 | + target_category: Optional[str] # Category this question targets |
| 206 | + is_fixed: bool # True for Q1 |
| 207 | + created_at: datetime |
| 208 | +``` |
| 209 | + |
| 210 | +### DraftRubricItem |
| 211 | + |
| 212 | +```python |
| 213 | +class DraftRubricItem: |
| 214 | + id: str |
| 215 | + source_finding_id: str |
| 216 | + source_trace_id: str |
| 217 | + workshop_id: str |
| 218 | + text: str |
| 219 | + promoted_by: str # Facilitator user_id |
| 220 | + promoted_at: datetime |
| 221 | +``` |
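To show how a finding could move into the staging area, here is a sketch that builds a `DraftRubricItem` from a finding and sets the source's `promoted` flag. The `ClassifiedFinding` below is trimmed to the fields used here. The function name, the UUID-based `id`, the UTC timestamp, and the rejection of a second promotion are assumptions, not requirements stated by the spec.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class ClassifiedFinding:
    id: str
    trace_id: str
    text: str
    promoted: bool = False


@dataclass
class DraftRubricItem:
    id: str
    source_finding_id: str
    source_trace_id: str
    workshop_id: str
    text: str
    promoted_by: str
    promoted_at: datetime


def promote_finding(
    finding: ClassifiedFinding, workshop_id: str, facilitator_id: str
) -> DraftRubricItem:
    """Copy a finding into the draft rubric and flag the source as promoted."""
    if finding.promoted:
        # Assumed policy: a finding can be promoted only once.
        raise ValueError(f"finding {finding.id} is already promoted")
    finding.promoted = True
    return DraftRubricItem(
        id=str(uuid4()),
        source_finding_id=finding.id,
        source_trace_id=finding.trace_id,
        workshop_id=workshop_id,
        text=finding.text,
        promoted_by=facilitator_id,
        promoted_at=datetime.now(timezone.utc),
    )
```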
| 222 | + |
| 223 | +--- |
| 224 | + |
| 225 | +## API Endpoints |
| 226 | + |
| 227 | +### Participant Endpoints |
| 228 | + |
| 229 | +| Method | Path | Description | |
| 230 | +|--------|------|-------------| |
| 231 | +| GET | `/workshops/{id}/traces/{trace_id}/discovery-questions` | Get questions for trace | |
| 232 | +| POST | `/workshops/{id}/findings` | Submit finding | |
| 233 | +| GET | `/workshops/{id}/discovery-progress` | Get fuzzy global progress | |
| 234 | + |
| 235 | +### Facilitator Endpoints |
| 236 | + |
| 237 | +| Method | Path | Description | |
| 238 | +|--------|------|-------------| |
| 239 | +| GET | `/workshops/{id}/traces/{trace_id}/discovery-state` | Get full structured state | |
| 240 | +| POST | `/workshops/{id}/traces/{trace_id}/generate-question` | Generate and broadcast question | |
| 241 | +| PUT | `/workshops/{id}/traces/{trace_id}/thresholds` | Update thresholds | |
| 242 | +| POST | `/workshops/{id}/findings/{finding_id}/promote` | Promote to draft rubric | |
| 243 | +| GET | `/workshops/{id}/draft-rubric` | Get promoted findings | |
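For client code, the paths in both tables could be kept in one lookup and filled in from parameters, as in this sketch. The route keys and the `build_request` helper are hypothetical; only the methods and path templates come from the tables above. Request and response bodies are not shown because the spec does not define them.

```python
from typing import Dict, Tuple
from urllib.parse import quote

# Method and path template per endpoint, copied from the tables above.
ROUTES: Dict[str, Tuple[str, str]] = {
    "discovery_questions": ("GET", "/workshops/{id}/traces/{trace_id}/discovery-questions"),
    "submit_finding": ("POST", "/workshops/{id}/findings"),
    "discovery_progress": ("GET", "/workshops/{id}/discovery-progress"),
    "discovery_state": ("GET", "/workshops/{id}/traces/{trace_id}/discovery-state"),
    "generate_question": ("POST", "/workshops/{id}/traces/{trace_id}/generate-question"),
    "update_thresholds": ("PUT", "/workshops/{id}/traces/{trace_id}/thresholds"),
    "promote_finding": ("POST", "/workshops/{id}/findings/{finding_id}/promote"),
    "draft_rubric": ("GET", "/workshops/{id}/draft-rubric"),
}


def build_request(name: str, **params: str) -> Tuple[str, str]:
    """Return (method, path) with each path parameter percent-encoded."""
    method, template = ROUTES[name]
    encoded = {key: quote(value, safe="") for key, value in params.items()}
    return method, template.format(**encoded)
```

A missing path parameter raises `KeyError` from `str.format`, which surfaces incomplete calls before any request is sent.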
| 244 | + |
| 245 | +--- |
| 246 | + |
| 247 | +## Success Criteria |
| 248 | + |
| 249 | +1. Findings are classified in real time as participants submit them |
| 250 | +2. Facilitators see per-trace structured view with category breakdown |
| 251 | +3. Facilitators can generate targeted questions that broadcast to all participants |
| 252 | +4. Disagreements are auto-detected and surfaced |
| 253 | +5. Participants see only fuzzy progress (no category bias) |
| 254 | +6. Findings can be promoted to draft rubric staging area |
| 255 | +7. Thresholds are configurable per category per trace |