Commit 67c5899

Add Critic Result Display in TUI (#360)
* Add critic result display in TUI and use latest SDK commit

- Update pyproject.toml to use commit e9a84c5 from PR #1269 (critic model feature)
- Add critic score indicator to action titles in collapsible widgets
- Display full critic evaluation results for ActionEvent and MessageEvent
- Handle FinishAction with critic results by showing full event visualize

The TUI now displays:
1. Critic scores in action/message titles with color coding (green/yellow)
2. Full critic evaluation details including probability breakdown
3. Proper formatting of critic results in both collapsed and expanded views

Co-authored-by: openhands <openhands@all-hands.dev>

* Add critic score diff display from previous action

- Track the last critic score in ConversationVisualizer
- Display diff from previous score when available (e.g., [Critic: 0.85, +0.13])
- Color code diffs: green for improvements, red for declines
- Only show diff if meaningful (≥0.01 change)

This helps users track whether the agent's actions are improving or declining in quality over the conversation.

Co-authored-by: openhands <openhands@all-hands.dev>

* Add critic score display for MessageEvent

- Display prominent critic score header for MessageEvent with critic results
- Format: 'Critic Score: 0.85 (+0.13)' at the top of the message
- Include diff from previous score with color coding
- Consistent with ActionEvent critic display

Now both ActionEvent and MessageEvent show critic scores prominently:
- ActionEvent: In collapsible title
- MessageEvent: As a bold header at the top of the message

Co-authored-by: openhands <openhands@all-hands.dev>

* Auto-enable critic for All-Hands LLM proxy

- Add get_default_critic() helper function to auto-configure critic
- Automatically enable critic when using llm-proxy.*.all-hands.dev
- Critic uses {base_url}/vllm endpoint with 'critic' model
- Applied to both load() and create_and_save_from_settings() methods
- Gracefully handles critic initialization failures

This follows the pattern from examples/01_standalone_sdk/34_critic_model_example.py and enables critic evaluation automatically for All-Hands hosted LLM proxy users.

Co-authored-by: openhands <openhands@all-hands.dev>

* Ensure critic is never persisted to agent settings

- Remove critic from Agent constructor in create_and_save_from_settings()
- Add critic on-the-fly after saving agent configuration
- Critic is always derived based on current LLM configuration
- This ensures critic config stays fresh and never becomes stale

The critic is now always computed dynamically in both load() and create_and_save_from_settings() methods, preventing it from being saved to the persistent agent settings file.

Co-authored-by: openhands <openhands@all-hands.dev>

* Fix APIBasedCritic import path

- Update import to use openhands.sdk.critic.impl.api.APIBasedCritic
- The API-based critic is in the impl.api submodule, not directly in critic
- Tested with full integration test - all tests pass

Co-authored-by: openhands <openhands@all-hands.dev>

* Display critic status in UI when enabled

- Add critic notification to splash screen when critic is configured
- Show 'Critic Enabled' message in the update notice area
- Display critic status during agent initialization
- Add debug logging to track critic configuration

This makes it clear to users when the critic is actively evaluating their agent's actions, improving transparency and user experience.

Co-authored-by: openhands <openhands@all-hands.dev>

* Update SDK to latest commit with improved critic logging

Update to commit ed81a95d which includes better error logging for critic evaluation failures. This will help identify why critic results are not appearing in the TUI.

Co-authored-by: openhands <openhands@all-hands.dev>

* refactor: Remove custom critic visualization, rely on event.visualize from upstream

- Remove _last_critic_score tracking and custom critic score display
- Remove critic notification from splash screen
- Remove critic status display from setup.py
- Simplify FinishAction and MessageEvent to use event.visualize directly
- Update snapshots for intentional UI changes

The upstream SDK (event.visualize and CriticResult.visualize) already provides proper critic score visualization.

Co-authored-by: openhands <openhands@all-hands.dev>

* bump commit

* feat(tui): implement custom critic score visualization with collapsible widget

- Extract raw message content (action.message / llm_message.content) instead of using SDK visualize
- Display message as Markdown for clean, readable text
- Add custom critic score collapsible with Rich-formatted breakdown
- Collapsed: Shows score summary (✅/⚠️ + score value)
- Expanded: Shows detailed sentiment predictions and risk indicators with color coding
- Mount critic collapsible as separate widget after message (no container wrapper)
- Preserves Rich Text formatting throughout (no string conversion that loses colors)

Co-authored-by: openhands <openhands@all-hands.dev>

* feat(critic): add comprehensive taxonomy for critic rubrics

Add taxonomy defining features for critic evaluation and user follow-up prediction:
- General context & task classification (2 features)
- Agent behavioral issues (13 binary features)
- User follow-up patterns (9 features, requires user reply)
- Infrastructure issues (2 binary features)

Features grouped into:
- BEHAVIORAL_ISSUES: Agent issues + User follow-up patterns (22 features)
- ALL_FEATURES: Complete taxonomy (26 features)

Includes helper methods:
- get_feature_names(category): Get features by category
- get_binary_features(), get_classification_features(), get_text_features()
- get_user_reply_dependent_features(): Features requiring user reply
- validate_feature_value(): Validate feature values against taxonomy

Based on production trace analysis for intrinsic evaluation.

Co-authored-by: openhands <openhands@all-hands.dev>

* docs: add critic taxonomy documentation

Add comprehensive documentation for the critic rubrics taxonomy including:
- Overview of all 26 features across 4 categories
- Detailed descriptions of each feature
- Usage examples with Python code
- Feature distribution breakdown
- Implementation notes on user reply dependency and grouping

Co-authored-by: openhands <openhands@all-hands.dev>

* feat(tui): group critic score features by taxonomy categories

Update critic collapsible visualization to use taxonomy categories:
- General Context & Task Classification (cyan) - sentiments, user goals
- Agent Behavioral Issues (red) - agent mistakes and failures
- User Follow-Up Patterns (magenta) - user corrections and concerns
- Infrastructure Issues (yellow) - environment/platform problems

Color coding by severity:
- General context: cyan (high confidence) to dim (low confidence)
- Issues: red bold (≥0.7) → red (≥0.5) → yellow (≥0.3) → dim

Features sorted by probability within each category for better readability.

Co-authored-by: openhands <openhands@all-hands.dev>

* refactor(critic): simplify taxonomy to feature-to-category mapping

Simplify critic taxonomy from complex class structure to simple dictionary:
- Remove ClassVar, Literal types, and helper methods
- Keep only essential FEATURE_CATEGORIES dict mapping features to categories
- Add simple get_category() function
- Update visualizer to use simplified taxonomy
- Remove emoji from critic score title
- Simplify tests to match new structure

Result: 282 lines → 48 lines of taxonomy code

Same functionality with much cleaner implementation.

Co-authored-by: openhands <openhands@all-hands.dev>

* refactor(critic): extract visualization logic to critic_utils.py

Move all critic visualization code from richlog_visualizer.py to a dedicated utility module:
- Create openhands_cli/critic_utils.py with visualization logic
- Extract 180+ lines of critic-specific code
- Functions: create_critic_collapsible(), _build_critic_content(), etc.
- richlog_visualizer.py now simply imports and calls the utility

Benefits:
- Cleaner separation of concerns
- richlog_visualizer.py is more focused and maintainable
- Critic visualization logic can be reused elsewhere
- Easier to test critic rendering in isolation

Co-authored-by: openhands <openhands@all-hands.dev>

* feat(critic): show predicted sentiment in title, simplify score display

Changes to critic visualization:
- Extract predicted sentiment (highest probability) and show in collapsible title
  Example: "Critic Score: 0.8500 | Predicted Sentiment: Neutral (0.77)"
- Remove success/needs improvement interpretation, just show raw score
  Before: "Score: 0.8500 (success)"
  After: "Score: 0.8500"
- Skip sentiment_* features from categorized breakdown (shown in title instead)
- Clean up feature name formatting (no longer need sentiment prefix removal)

Result: More concise, informative title with predicted sentiment at a glance.

Co-authored-by: openhands <openhands@all-hands.dev>

* feat(critic): improve category labels and expand by default

Update critic visualization with clearer category naming:
- "Detected Agent Behavioral Issues" - Issues that already occurred
- "Predicted User Follow-Up Patterns" - What user likely does next
- "Detected Infrastructure Issues" - Infrastructure problems found

Changes:
- Remove "General Context & Task Classification" section (sentiment in title)
- Expand collapsible by default (collapsed=False) for immediate visibility
- Clean up color coding logic (remove unused general category branch)
- Skip general_context features entirely (only sentiment, already in title)

Result: Clearer temporal distinction between detected issues (past) and predicted patterns (future).

Co-authored-by: openhands <openhands@all-hands.dev>

* refactor(critic): reorganize into tui/utils/critic/ package

Move critic-related files into better organized package structure:
- openhands_cli/critic_taxonomy.py → openhands_cli/tui/utils/critic/taxonomy.py
- openhands_cli/critic_utils.py → openhands_cli/tui/utils/critic/visualization.py
- Add openhands_cli/tui/utils/critic/__init__.py with public API exports

Update all imports:
- richlog_visualizer.py: from openhands_cli.tui.utils.critic import create_critic_collapsible
- visualization.py: from openhands_cli.tui.utils.critic.taxonomy import FEATURE_CATEGORIES
- test_critic_taxonomy.py: from openhands_cli.tui.utils.critic import ...

Benefits:
- Better code organization (critic code grouped together)
- Clear namespace hierarchy (tui/utils/critic/*)
- Cleaner imports via __init__.py
- All tests still passing

Co-authored-by: openhands <openhands@all-hands.dev>

* feat(critic): add colored title, remove duplicate score, reduce padding

Enhance critic visualization with better visual presentation:

Title improvements:
- Build Rich Text title with colored score and sentiment
- Score: green (success) or yellow (needs improvement)
- Sentiment: green (Positive), red (Negative), yellow (Neutral)
- Example: "Critic Score: 0.8500 | Predicted Sentiment: Neutral (0.77)"

Content improvements:
- Remove duplicate "Score: 0.8500" line from body (already in title)
- Start directly with categorized feature breakdown
- More compact presentation

Layout improvements:
- Reduce collapsible padding from default to (0, 0, 0, 1) for compact display
- Less vertical whitespace, better information density

Collapsible widget updates:
- Accept str | Text for title parameter (was str only)
- Support Rich Text in both __init__ and update_title methods
- Add Text import for type hints

Result: More visually appealing, color-coded, compact critic display.

Co-authored-by: openhands <openhands@all-hands.dev>

* tui: normalize sentiment probabilities with softmax in critic display

Apply softmax normalization to the three sentiment classes (positive, negative, neutral) before selecting the predicted sentiment. This ensures probabilities sum to 1.0 and form a proper probability distribution.

Co-authored-by: openhands <openhands@all-hands.dev>

* tweak viz

* bump version

* refactor(critic): use SDK taxonomy instead of local implementation

- Remove local taxonomy.py and CRITIC_TAXONOMY.md
- Update visualization.py to use pre-categorized features from SDK metadata
- Update tests to import taxonomy from openhands.sdk.critic
- Update pyproject.toml to use SDK commit with taxonomy support

The SDK now provides categorized_features in critic_result.metadata, ready for visualization in the CLI.

Co-authored-by: openhands <openhands@all-hands.dev>

* critic: add user feedback widget with auto-focus and settings toggle

- Add CriticFeedbackWidget with auto-focus on mount (no click required)
- Add 5 feedback options (0-4): dismiss, overestimation, underestimation, about right, doesn't make sense
- Send feedback to PostHog with critic_score, conversation_id, and event_ids for reproducibility
- Add enable_critic setting (default: true) in CLI settings menu
- Add splash screen notification when critic is active (dimmed text, proper padding)
- Detection based on agent.critic presence (not URL pattern)
- Improved thank you message: simple and clear

Co-authored-by: openhands <openhands@all-hands.dev>

* refactor(critic): use SDK taxonomy instead of local implementation

- Remove local taxonomy.py and CRITIC_TAXONOMY.md
- Update visualization.py to use pre-categorized features from SDK metadata
- Update tests to import taxonomy from openhands.sdk.critic
- Update pyproject.toml to use SDK commit with taxonomy support

The SDK now provides categorized_features in critic_result.metadata, ready for visualization in the CLI.

Co-authored-by: openhands <openhands@all-hands.dev>

* chore: update SDK commit to include 0.2 display threshold

Co-authored-by: openhands <openhands@all-hands.dev>

* style(critic): update visualization to match SDK format

- Add score explanation (0-1, higher is better)
- Use inline features with dot separators
- Use parentheses for probabilities
- Update SDK commit to 1b03e98

Co-authored-by: openhands <openhands@all-hands.dev>

* style(critic): update feedback options wording and order

- [1] Just about right
- [2] Overestimation (agent performs better than predicted)
- [3] Underestimation (agent performs worse than predicted)
- [4] Not applicable
- [0] Dismiss

Co-authored-by: openhands <openhands@all-hands.dev>

* remove color

* reword

* style(critic): simplify feedback options and add top margin

Options simplified to:
- [1] Accurate [2] Too high [3] Too low [4] N/A [0] Dismiss

Added top margin for better visual separation.

Co-authored-by: openhands <openhands@all-hands.dev>

* style(critic): bold question and choice numbers in feedback widget

Co-authored-by: openhands <openhands@all-hands.dev>

* style(critic): show score with 2 decimal places

- Update SDK commit to ec361345
- Update CLI visualization to match

Co-authored-by: openhands <openhands@all-hands.dev>

* update lock

* test: remove test_critic_taxonomy.py that imports non-existent SDK functions

The test file was importing FEATURE_CATEGORIES, categorize_features, and get_category from openhands.sdk.critic, but these functions don't exist in the SDK.

This was causing:
- Unit test collection failures
- Pre-commit pyright type checking failures

Co-authored-by: openhands <openhands@all-hands.dev>

* Apply suggestion from @xingyaoww

* test: fix failing unit tests for critic feature

- Update snapshot tests for visualizer to match new critic display
- Fix test_save_writes_expected_json_format to include enable_critic field
- Fix test_critic_feedback_initial_render to match bold-formatted option numbers

Co-authored-by: openhands <openhands@all-hands.dev>

* Update SDK to latest critic visualization (star rating + likelihood %)

SDK commit: 851144e8dc939c963a64e1b3320f4f2ebbd2228d

New visualization format:
- Critic: agent success likelihood ★★★☆☆ (65.0%)
  Potential Issues: Did Not Follow Instruction (likelihood 35%)

Co-authored-by: openhands <openhands@all-hands.dev>

* Update critic visualization to star rating format, filter CLI sections

SDK commit: b6a4376f3218529b4b7d2727306523489ba13045

Changes:
- Use star rating format: 'Critic: agent success likelihood ★★★☆☆ (65.0%)'
- Show likelihood percentages: 'Did Not Follow Instruction (likelihood 35%)'
- Filter out 'Likely Follow-up' and 'Other' sections for CLI
- Keep only 'Potential Issues' and 'Infrastructure' sections
- Remove yellow color from headers (just bold)

Co-authored-by: openhands <openhands@all-hands.dev>

* Collapse critic widget when no content to display

If there are no Potential Issues or Infrastructure issues to show, the collapsible starts collapsed instead of expanded with empty content.

Co-authored-by: openhands <openhands@all-hands.dev>

* Auto-dismiss feedback widget when user sends new message

When the user ignores the critic feedback prompt and continues sending messages, the feedback widget is automatically removed from the UI.

Co-authored-by: openhands <openhands@all-hands.dev>

* Update uv.lock with latest SDK

Co-authored-by: openhands <openhands@all-hands.dev>

* Apply suggestion from @xingyaoww

* Add PostHog critic inference event, agent model tracking, and clickable buttons

- Send 'critic_inference' event to PostHog when critic result is displayed
- Add agent model name to both critic_inference and critic_feedback events
- Replace text-based feedback options with clickable buttons (while keeping keyboard shortcuts)
- Update splash.py critic notice to mention usage metrics collection
- Add tests for new functionality

Co-authored-by: openhands <openhands@all-hands.dev>

* Fix button text visibility in CriticFeedbackWidget

Add explicit color: $foreground to Button CSS rules to ensure button text is visible against the dark background. The buttons were previously invisible due to missing text color styling.

Co-authored-by: openhands <openhands@all-hands.dev>

* Make CriticFeedbackWidget more compact and remove border

- Remove yellow border around the widget
- Set transparent background
- Reduce padding and margins for a more compact layout
- Remove focus border styling (not needed without visible border)

Co-authored-by: openhands <openhands@all-hands.dev>

* Add breathing room to CriticFeedbackWidget

- Add vertical margin (1 0) to separate from surrounding content
- Add margin-top to button row for spacing from question text

Co-authored-by: openhands <openhands@all-hands.dev>

* Make feedback buttons smaller and more subtle

- Set height to 1 line (single row buttons)
- Remove border, use transparent background
- Reduce min-width and padding
- Add subtle hover effect
- Make dismiss button muted color

Co-authored-by: openhands <openhands@all-hands.dev>

* Fix button text visibility and use consistent width

- Remove height: 1 which was hiding text
- Set fixed width: 14 for all buttons for consistency
- Use subtle dark background instead of transparent
- Add hover effect for better feedback

Co-authored-by: openhands <openhands@all-hands.dev>

* Make all buttons same color and reduce size

- Remove special dismiss button styling (all same color now)
- Set height: 1 and min-height: 1 for compact buttons
- Remove border for cleaner look
- Reduce padding to 0 1
- Reduce width to 12
- Remove text-style bold

Co-authored-by: openhands <openhands@all-hands.dev>

* Fix button text visibility - remove height constraint

Height: 1 was cutting off button text. Removed height/min-height and padding constraints to let buttons render at default size with visible text.

Co-authored-by: openhands <openhands@all-hands.dev>

* Use compact buttons for smaller, single-line feedback widget

- Add compact=True to all feedback buttons for single-line display
- Remove unused .dismiss class styling
- Add snapshot test for visual verification of button styling

Co-authored-by: openhands <openhands@all-hands.dev>

* bump commit

* Fix lint issues: pyright type errors and ruff formatting

- Add hasattr check and type: ignore comments for BaseConversation.agent access
- Apply ruff formatting fixes to feedback.py and test files
- Remove unused import in test_feedback.py

Co-authored-by: openhands <openhands@all-hands.dev>

---------

Co-authored-by: openhands <openhands@all-hands.dev>
1 parent 72238a8 commit 67c5899
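
Note on the softmax step mentioned in the commit message: the visualization code that applies it is not part of the files shown in this diff, so the following is only a minimal sketch of the idea. The function names `softmax` and `predicted_sentiment`, the input dictionary shape, and the sample scores are all assumptions made for illustration, not the CLI's or the SDK's actual API.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize raw per-class scores so that they sum to 1.0."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(scores.values())
    exps = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def predicted_sentiment(scores: dict[str, float]) -> tuple[str, float]:
    """Return the most likely sentiment class and its normalized probability."""
    probs = softmax(scores)
    label = max(probs, key=lambda name: probs[name])
    return label, probs[label]


# Hypothetical raw scores for the three sentiment classes.
raw = {"positive": 0.30, "negative": 0.10, "neutral": 0.77}
label, prob = predicted_sentiment(raw)
print(label, f"{prob:.2f}")
```

Softmax preserves the ordering of the inputs, so the predicted class is the same as a plain argmax over the raw scores; what changes is the value reported next to the label.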

18 files changed

Lines changed: 1292 additions & 33 deletions


openhands_cli/setup.py

Lines changed: 1 addition & 0 deletions
@@ -123,4 +123,5 @@ def setup_conversation(
     conversation.set_confirmation_policy(confirmation_policy)

     console.print(f"✓ Agent initialized with model: {agent.llm.model}", style="green")
+
     return conversation

openhands_cli/stores/agent_store.py

Lines changed: 61 additions & 1 deletion
@@ -2,6 +2,7 @@
 from __future__ import annotations

 import os
+import re
 from typing import Any

 from prompt_toolkit import HTML, print_formatted_text
@@ -16,20 +17,65 @@
     LocalFileStore,
 )
 from openhands.sdk.context import load_project_skills
+from openhands.sdk.critic.base import CriticBase
+from openhands.sdk.critic.impl.api import APIBasedCritic
 from openhands.tools.preset.default import get_default_tools
 from openhands_cli.locations import (
     AGENT_SETTINGS_PATH,
     PERSISTENCE_DIR,
     WORK_DIR,
 )
 from openhands_cli.mcp.mcp_utils import list_enabled_servers
+from openhands_cli.stores.cli_settings import CliSettings
 from openhands_cli.utils import (
     get_llm_metadata,
     get_os_description,
     should_set_litellm_extra_body,
 )


+def get_default_critic(llm: LLM, *, enable_critic: bool = True) -> CriticBase | None:
+    """Auto-configure critic for All-Hands LLM proxy.
+
+    When the LLM base_url matches `llm-proxy.*.all-hands.dev`, returns an
+    APIBasedCritic configured with:
+    - server_url: {base_url}/vllm
+    - api_key: same as LLM
+    - model_name: "critic"
+
+    Returns None if base_url doesn't match, api_key is not set, or enable_critic
+    is False.
+
+    Args:
+        llm: The LLM configuration
+        enable_critic: Whether critic feature is enabled (from settings)
+    """
+    # Check if critic is enabled in settings
+    if not enable_critic:
+        return None
+
+    base_url = llm.base_url
+    api_key = llm.api_key
+    if base_url is None or api_key is None:
+        return None
+
+    # Match: llm-proxy.{env}.all-hands.dev (e.g., staging, prod, eval, app)
+    pattern = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev"
+    if not re.match(pattern, base_url):
+        return None
+
+    try:
+        return APIBasedCritic(
+            server_url=f"{base_url.rstrip('/')}/vllm",
+            api_key=api_key,
+            model_name="critic",
+        )
+    except Exception:
+        # If critic creation fails, silently return None
+        # This allows the CLI to continue working without critic
+        return None
+
+
 DEFAULT_LLM_BASE_URL = "https://llm-proxy.app.all-hands.dev/"

 # Environment variable names for LLM configuration
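
Note on the URL check in get_default_critic(): the sketch below exercises the same regular expression and the same `/vllm` URL construction in isolation, using only the standard library. The helper names `is_all_hands_proxy` and `critic_server_url` are hypothetical and exist only for this illustration; the pattern string and the `rstrip('/')` logic are copied from the hunk above.

```python
import re

# Pattern copied from get_default_critic() in the diff above.
PATTERN = r"^https?://llm-proxy\.[^./]+\.all-hands\.dev"


def is_all_hands_proxy(base_url: str) -> bool:
    """Hypothetical helper: True when base_url matches the proxy pattern."""
    return re.match(PATTERN, base_url) is not None


def critic_server_url(base_url: str) -> str:
    """Hypothetical helper: build the critic endpoint the same way the diff does."""
    return f"{base_url.rstrip('/')}/vllm"


for url in (
    "https://llm-proxy.app.all-hands.dev/",
    "https://llm-proxy.staging.all-hands.dev",
    "https://llm-proxy.all-hands.dev",        # no {env} label
    "https://api.openai.com/v1",
    "https://llm-proxy.app.all-hands.dev.example.com",
):
    print(url, is_all_hands_proxy(url))
```

One observation from running the pattern this way: it is anchored only at the start, so a host that merely begins with `llm-proxy.app.all-hands.dev` (the last URL in the loop) also matches. Whether that matters depends on how base_url is set in practice.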
@@ -249,6 +295,12 @@ def load(self, session_id: str | None = None) -> Agent | None:
             )
             condenser = LLMSummarizingCondenser(llm=condenser_llm)

+            # Auto-configure critic if applicable
+            cli_settings = CliSettings.load()
+            critic = get_default_critic(
+                updated_llm, enable_critic=cli_settings.enable_critic
+            )
+
             # Update tools and context
             agent = agent.model_copy(
                 update={
@@ -259,6 +311,7 @@ def load(self, session_id: str | None = None) -> Agent | None:
                     else {},
                     "agent_context": agent_context,
                     "condenser": condenser,
+                    "critic": critic,
                 }
             )

@@ -320,9 +373,16 @@ def create_and_save_from_settings(
             tools=get_default_tools(enable_browser=False),
             mcp_config={},
             condenser=condenser,
+            # Note: critic is NOT included here - it will be derived on-the-fly
         )

-        # Save the agent configuration
+        # Save the agent configuration (without critic)
         self.save(agent)

+        # Now add critic on-the-fly for the returned agent (not persisted)
+        cli_settings = CliSettings.load()
+        critic = get_default_critic(llm, enable_critic=cli_settings.enable_critic)
+        if critic is not None:
+            agent = agent.model_copy(update={"critic": critic})
+
         return agent
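
Note on the "save first, attach critic afterwards" ordering in create_and_save_from_settings(): the sketch below reproduces that ordering with standard-library stand-ins so it can be run without the SDK. `ToyAgent`, `derive_critic`, the in-memory `store` dict, and the `proxy/` model prefix are all hypothetical; `dataclasses.replace` plays the role that `model_copy(update=...)` plays in the diff.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyAgent:
    """Hypothetical stand-in for the SDK Agent model."""

    model: str
    critic: Optional[str] = None  # runtime-only in this sketch


def derive_critic(model: str, enable_critic: bool) -> Optional[str]:
    """Hypothetical stand-in for get_default_critic()."""
    if enable_critic and model.startswith("proxy/"):
        return "critic"
    return None


def create_and_save(model: str, store: dict, enable_critic: bool) -> ToyAgent:
    agent = ToyAgent(model=model)  # built without a critic
    store["agent.json"] = json.dumps(asdict(agent))  # persisted without a critic
    critic = derive_critic(model, enable_critic)
    if critic is not None:
        # Only the returned copy carries the critic; the saved JSON does not.
        agent = replace(agent, critic=critic)
    return agent


store: dict = {}
agent = create_and_save("proxy/some-model", store, enable_critic=True)
print(agent.critic, json.loads(store["agent.json"])["critic"])
```

Because the critic is recomputed from the current LLM configuration and settings on every load, toggling `enable_critic` or changing the base URL takes effect without rewriting the saved agent file.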

openhands_cli/stores/cli_settings.py

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ class CliSettings(BaseModel):

     default_cells_expanded: bool = True
     auto_open_plan_panel: bool = True
+    enable_critic: bool = True

     @classmethod
     def get_config_path(cls) -> Path:
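
Note on the new `enable_critic` default: because the field is declared with a default of True, a settings file written before this commit (which lacks the key) should still load and end up with the critic enabled. The body of CliSettings.load() is not shown in this diff, so the sketch below only illustrates the idea with a standard-library dataclass; `ToySettings` and `load_settings` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ToySettings:
    """Hypothetical stand-in mirroring the three CliSettings fields in the diff."""

    default_cells_expanded: bool = True
    auto_open_plan_panel: bool = True
    enable_critic: bool = True


def load_settings(raw_json: str) -> ToySettings:
    """Build settings from JSON, letting missing keys fall back to defaults."""
    data = json.loads(raw_json)
    known = {f.name for f in fields(ToySettings)}
    return ToySettings(**{k: v for k, v in data.items() if k in known})


# A settings file saved before enable_critic existed.
old_file = '{"default_cells_expanded": false, "auto_open_plan_panel": true}'
settings = load_settings(old_file)
print(settings)
```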

openhands_cli/tui/content/splash.py

Lines changed: 15 additions & 1 deletion
@@ -39,12 +39,15 @@ def get_openhands_banner() -> str:
     return "\n".join(padded_lines)


-def get_splash_content(conversation_id: str, *, theme: Theme) -> dict:
+def get_splash_content(
+    conversation_id: str, *, theme: Theme, has_critic: bool = False
+) -> dict:
     """Get structured splash screen content for native Textual widgets.

     Args:
         conversation_id: Optional conversation ID to display
         theme: Theme to use for colors
+        has_critic: Whether the agent has a critic configured
     """
     # Use theme colors
     primary_color = theme.primary
@@ -74,6 +77,7 @@ def get_splash_content(conversation_id: str, *, theme: Theme) -> dict:
             ),
         ],
         "update_notice": None,
+        "critic_notice": None,
     }

     # Add update notification if needed
@@ -83,4 +87,14 @@ def get_splash_content(conversation_id: str, *, theme: Theme) -> dict:
             "Run 'uv tool upgrade openhands' to update"
         )

+    # Add critic notification if enabled
+    if has_critic:
+        content["critic_notice"] = (
+            f"\n[{primary_color}]Experimental Critic Feature Enabled[/]\n"
+            "[dim]We've detected you're using the OpenHands LLM provider. "
+            "An experimental critic feature is now active (free) to predict task "
+            "success. We will collect usage metrics and your feedback "
+            "for critic improvement. You can disable this in settings.[/dim]"
+        )
+
     return content

openhands_cli/tui/modals/settings/components/cli_settings_tab.py

Lines changed: 12 additions & 0 deletions
@@ -75,6 +75,16 @@ def compose(self) -> ComposeResult:
                 value=self.cli_settings.auto_open_plan_panel,
             )

+            yield SettingsSwitch(
+                label="Enable Critic (Experimental)",
+                description=(
+                    "When enabled and using OpenHands LLM provider, an experimental "
+                    "critic feature will predict task success and collect feedback. "
+                ),
+                switch_id="enable_critic_switch",
+                value=self.cli_settings.enable_critic,
+            )
+
     def get_cli_settings(self) -> CliSettings:
         """Get the current CLI settings from the form."""
         default_cells_expanded_switch = self.query_one(
@@ -83,8 +93,10 @@ def get_cli_settings(self) -> CliSettings:
         auto_open_plan_panel_switch = self.query_one(
             "#auto_open_plan_panel_switch", Switch
         )
+        enable_critic_switch = self.query_one("#enable_critic_switch", Switch)

         return CliSettings(
             default_cells_expanded=default_cells_expanded_switch.value,
             auto_open_plan_panel=auto_open_plan_panel_switch.value,
+            enable_critic=enable_critic_switch.value,
         )

openhands_cli/tui/textual_app.py

Lines changed: 40 additions & 1 deletion
@@ -180,6 +180,7 @@ def compose(self) -> ComposeResult:
                 )
                 yield Static(id="splash_instructions", classes="splash-instruction")
                 yield Static(id="splash_update_notice", classes="splash-update-notice")
+                yield Static(id="splash_critic_notice", classes="splash-critic-notice")

             # Input area - docked to bottom
             with Container(id="input_area"):
@@ -348,9 +349,24 @@ def _initialize_main_ui(self) -> None:
         if self.is_ui_initialized:
             return

+        # Check if agent has critic configured
+        has_critic = False
+        try:
+            from openhands_cli.stores import AgentStore
+
+            agent_store = AgentStore()
+            agent = agent_store.load()
+            if agent:
+                has_critic = agent.critic is not None
+        except Exception:
+            # If we can't load agent, just continue without critic notice
+            pass
+
         # Get structured splash content
         splash_content = get_splash_content(
-            conversation_id=self.conversation_id.hex, theme=OPENHANDS_THEME
+            conversation_id=self.conversation_id.hex,
+            theme=OPENHANDS_THEME,
+            has_critic=has_critic,
         )

         # Update individual splash widgets
@@ -376,6 +392,14 @@ def _initialize_main_ui(self) -> None:
         else:
             update_notice_widget.display = False

+        # Update critic notice (hide if None)
+        critic_notice_widget = self.query_one("#splash_critic_notice", Static)
+        if splash_content["critic_notice"]:
+            critic_notice_widget.update(splash_content["critic_notice"])
+            critic_notice_widget.display = True
+        else:
+            critic_notice_widget.display = False
+
         # Process any queued inputs
         self._process_queued_inputs()
         self.is_ui_initialized = True
@@ -489,6 +513,9 @@ def _handle_command(self, command: str) -> None:

     async def _handle_user_message(self, user_message: str) -> None:
         """Handle regular user messages with the conversation runner."""
+        # Dismiss any pending critic feedback widgets when user sends a new message
+        self._dismiss_pending_feedback_widgets()
+
         # Check if conversation runner is initialized
         if self.conversation_runner is None:
             self.conversation_runner = self.create_conversation_runner()
@@ -720,6 +747,18 @@ def _handle_feedback_command(self) -> None:
             severity="information",
         )

+    def _dismiss_pending_feedback_widgets(self) -> None:
+        """Remove all pending CriticFeedbackWidget instances from the UI.
+
+        Called when user sends a new message, indicating they chose to
+        ignore the feedback prompt.
+        """
+        from openhands_cli.tui.utils.critic.feedback import CriticFeedbackWidget
+
+        # Find and remove all CriticFeedbackWidget instances
+        for widget in self.main_display.query(CriticFeedbackWidget):
+            widget.remove()
+
     def _handle_new_command(self) -> None:
         """Handle the /new command to start a new conversation."""
         self._conversation_manager.create_new()

openhands_cli/tui/textual_app.tcss

Lines changed: 7 additions & 0 deletions
@@ -106,4 +106,11 @@ Footer {
     background: $background;
     color: $primary;
     margin: 1 0 0 0;
+}
+
+.splash-critic-notice {
+    padding: 0 1;
+    background: $background;
+    color: $foreground;
+    margin: 0;
 }
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+"""Critic visualization utilities."""
+
+from openhands_cli.tui.utils.critic.feedback import send_critic_inference_event
+from openhands_cli.tui.utils.critic.visualization import create_critic_collapsible
+
+
+__all__ = [
+    "create_critic_collapsible",
+    "send_critic_inference_event",
+]