Commit 67c5899
Add Critic Result Display in TUI (#360)
* Add critic result display in TUI and use latest SDK commit
- Update pyproject.toml to use commit e9a84c5 from PR #1269 (critic model feature)
- Add critic score indicator to action titles in collapsible widgets
- Display full critic evaluation results for ActionEvent and MessageEvent
- Handle FinishAction with critic results by showing the full event.visualize output
The TUI now displays:
1. Critic scores in action/message titles with color coding (green/yellow)
2. Full critic evaluation details including probability breakdown
3. Proper formatting of critic results in both collapsed and expanded views
Co-authored-by: openhands <openhands@all-hands.dev>
* Add critic score diff display from previous action
- Track the last critic score in ConversationVisualizer
- Display diff from previous score when available (e.g., [Critic: 0.85, +0.13])
- Color code diffs: green for improvements, red for declines
- Only show diff if meaningful (≥0.01 change)
This helps users track whether the agent's actions are improving or
declining in quality over the conversation.
Co-authored-by: openhands <openhands@all-hands.dev>
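The diff rule described above (show the change from the previous score, color it by direction, and suppress changes below 0.01) can be sketched roughly as follows. This is only an illustration: `format_critic_label` and `MIN_DIFF` are hypothetical names, not the ones used in `ConversationVisualizer`, and the color is returned as a plain string rather than applied through the TUI's styling.

```python
from typing import Optional, Tuple

# Assumption: the "meaningful change" threshold from the commit message (>= 0.01).
MIN_DIFF = 0.01


def format_critic_label(
    score: float, previous: Optional[float] = None
) -> Tuple[str, Optional[str]]:
    """Return (label, color) for a critic score.

    color is "green" for an improvement, "red" for a decline,
    and None when no diff is shown.
    """
    if previous is None:
        # First scored event: nothing to compare against.
        return f"[Critic: {score:.2f}]", None

    diff = score - previous
    if abs(diff) < MIN_DIFF:
        # Change too small to be meaningful: show the score only.
        return f"[Critic: {score:.2f}]", None

    color = "green" if diff > 0 else "red"
    return f"[Critic: {score:.2f}, {diff:+.2f}]", color
```

Calling `format_critic_label(0.85, 0.72)` produces the `[Critic: 0.85, +0.13]` form quoted in the commit message, paired with `"green"`.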
* Add critic score display for MessageEvent
- Display prominent critic score header for MessageEvent with critic results
- Format: 'Critic Score: 0.85 (+0.13)' at the top of the message
- Include diff from previous score with color coding
- Consistent with ActionEvent critic display
Now both ActionEvent and MessageEvent show critic scores prominently:
- ActionEvent: In collapsible title
- MessageEvent: As a bold header at the top of the message
Co-authored-by: openhands <openhands@all-hands.dev>
* Auto-enable critic for All-Hands LLM proxy
- Add get_default_critic() helper function to auto-configure critic
- Automatically enable critic when using llm-proxy.*.all-hands.dev
- Critic uses {base_url}/vllm endpoint with 'critic' model
- Applied to both load() and create_and_save_from_settings() methods
- Gracefully handles critic initialization failures
This follows the pattern from examples/01_standalone_sdk/34_critic_model_example.py
and enables critic evaluation automatically for All-Hands hosted LLM proxy users.
Co-authored-by: openhands <openhands@all-hands.dev>
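A minimal sketch of the auto-enable rule, assuming the proxy check is a hostname match and the critic endpoint is the base URL with `/vllm` appended. `CriticSettings` and `default_critic_settings` are hypothetical stand-ins for illustration; they are not the SDK's `APIBasedCritic` nor the actual `get_default_critic()` helper, whose signature is not shown in this commit page.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Assumption: "llm-proxy.*.all-hands.dev" is read as a hostname wildcard.
PROXY_HOST = re.compile(r"^llm-proxy\..+\.all-hands\.dev$")


@dataclass(frozen=True)
class CriticSettings:
    """Hypothetical container for the derived critic configuration."""

    server_url: str
    model_name: str


def default_critic_settings(base_url: Optional[str]) -> Optional[CriticSettings]:
    """Return critic settings for All-Hands proxy URLs, otherwise None."""
    if not base_url:
        return None
    host = urlparse(base_url).hostname or ""
    if not PROXY_HOST.match(host):
        return None
    # The critic uses the {base_url}/vllm endpoint with the 'critic' model.
    return CriticSettings(
        server_url=f"{base_url.rstrip('/')}/vllm",
        model_name="critic",
    )
```

Returning `None` for any other base URL keeps the critic strictly opt-in by configuration: callers can attach a critic only when a settings object comes back.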
* Ensure critic is never persisted to agent settings
- Remove critic from Agent constructor in create_and_save_from_settings()
- Add critic on-the-fly after saving agent configuration
- Critic is always derived based on current LLM configuration
- This ensures critic config stays fresh and never becomes stale
The critic is now always computed dynamically in both load() and
create_and_save_from_settings() methods, preventing it from being
saved to the persistent agent settings file.
Co-authored-by: openhands <openhands@all-hands.dev>
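The "never persisted, always derived" approach can be illustrated with a small sketch. The `AgentSettings` dataclass and the `to_persisted` / `from_persisted` helpers are hypothetical and stand in for the real Agent model and the `load()` / `create_and_save_from_settings()` methods; the point is only that the critic field is dropped on save and recomputed from the current LLM configuration on load.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class AgentSettings:
    """Hypothetical stand-in for the persisted agent configuration."""

    model: str
    base_url: str
    critic: Optional[str] = None  # derived at runtime, never written to disk


def to_persisted(agent: AgentSettings) -> str:
    """Serialize the agent without its critic field."""
    data = asdict(agent)
    data.pop("critic", None)
    return json.dumps(data)


def from_persisted(
    raw: str, derive_critic: Callable[[str], Optional[str]]
) -> AgentSettings:
    """Load the agent, then attach a critic derived from the current base URL."""
    agent = AgentSettings(**json.loads(raw))
    return replace(agent, critic=derive_critic(agent.base_url))
```

Because the critic is recomputed on every load, changing the LLM base URL in settings cannot leave a stale critic endpoint behind in the saved file.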
* Fix APIBasedCritic import path
- Update import to use openhands.sdk.critic.impl.api.APIBasedCritic
- The API-based critic is in the impl.api submodule, not directly in critic
- Tested with full integration test - all tests pass
Co-authored-by: openhands <openhands@all-hands.dev>
* Display critic status in UI when enabled
- Add critic notification to splash screen when critic is configured
- Show 'Critic Enabled' message in the update notice area
- Display critic status during agent initialization
- Add debug logging to track critic configuration
This makes it clear to users when the critic is actively evaluating
their agent's actions, improving transparency and user experience.
Co-authored-by: openhands <openhands@all-hands.dev>
* Update SDK to latest commit with improved critic logging
Update to commit ed81a95d which includes better error logging for
critic evaluation failures. This will help identify why critic results
are not appearing in the TUI.
Co-authored-by: openhands <openhands@all-hands.dev>
* refactor: Remove custom critic visualization, rely on event.visualize from upstream
- Remove _last_critic_score tracking and custom critic score display
- Remove critic notification from splash screen
- Remove critic status display from setup.py
- Simplify FinishAction and MessageEvent to use event.visualize directly
- Update snapshots for intentional UI changes
The upstream SDK (event.visualize and CriticResult.visualize) already
provides proper critic score visualization.
Co-authored-by: openhands <openhands@all-hands.dev>
* bump commit
* feat(tui): implement custom critic score visualization with collapsible widget
- Extract raw message content (action.message / llm_message.content) instead of using SDK visualize
- Display message as Markdown for clean, readable text
- Add custom critic score collapsible with Rich-formatted breakdown
- Collapsed: Shows score summary (✅/…
1 parent 72238a8 commit 67c5899
18 files changed
Lines changed: 1292 additions & 33 deletions
File tree
- openhands_cli
- stores
- tui
- content
- modals/settings/components
- utils/critic
- widgets
- tests
- snapshots
- __snapshots__/test_critic_feedback_snapshots
- tui
- modals/settings
- utils
- critic