@@ -3,144 +3,147 @@ title: "Changelog"
 description: "Product updates and release notes for HUD SDK and Platform."
 ---

-<Update label="March 16, 2026" description="v0.5.29 – v0.5.33">
+<Update label="May 6, 2026">
+## Models, Tasksets, Templates & Sharing
+
+### Platform
+
+- **Models directory refresh** — `/models` is a single unified list with Private and Trainable filters and a live usage column on every row.
+- **Taskset analytics tab** — dedicated analytics view on tasksets with charts and richer summaries.
+- **Multi-environment taskset selection** — pick multiple environments at once when configuring a taskset run.
+- **Run from suggested tasksets** — kick off an evaluation from a model's suggested-taskset row with the model already locked in.
+- **Templates and workflow orchestration** — templates settings page and a right-click workflow entry point for repeatable runs.
+- **Resource sharing** — invite users or whole teams to traces, jobs, evalsets, models, registry items, and collections with a unified accept flow.
+- **Trace grader info** — evaluation cards on traces show the grader that produced each result.
+</Update>
+
+<Update label="March 16, 2026">
 ## A2A Chat, Citations, GPT-5 & CLI Sync

-- **A2A chat orchestrator** — agent-to-agent communication for multi-agent workflows with input handling and follow-up turns
-- **`hud sync tasks`** — new CLI command to sync task definitions from Python files or directories to the platform
-- **`hud sync env`** — new CLI command replacing `hud link`, syncing local environment configs with collision detection
-- **`hud eval` accepts Python files** — run evaluations directly from `.py` files and directories containing `Task` objects
-- **Chat class** — new `Chat` abstraction in the SDK for managing multi-turn agent conversations
-- **GPT-5 support** — `ResponseAgent` defaults to `gpt-5`, with ToolSearch tool support
-- **Citations** — citation support for Claude, Gemini, and OpenAI responses in chat and agent traces
-- **JPEG compression for screenshots** — reduces token usage for Anthropic computer use with configurable quality
-- **Interactive deploy collision handling** — `hud deploy` now prompts when environment names collide instead of silently overwriting
-- **Configurable bash timeout** — computer tool bash sessions support custom timeout values (previously hardcoded)
+- **A2A chat orchestrator** — agent-to-agent communication for multi-agent workflows with input handling and follow-up turns.
+- **`hud sync tasks`** — sync task definitions from Python files or directories to the platform.
+- **`hud sync env`** — sync local environment configs with collision detection (replaces `hud link`).
+- **`hud eval` accepts Python files** — run evaluations directly from `.py` files and directories containing `Task` objects.
+- **Chat class** — manage multi-turn agent conversations from a single SDK abstraction.
+- **GPT-5 support** — `ResponseAgent` defaults to `gpt-5`, with ToolSearch tool support.
+- **Citations** — citation support for Claude, Gemini, and OpenAI responses in chat and agent traces.

 ### Platform

-- **Click & scroll coordinate overlays** — computer use traces render click coordinates and scroll actions directly on screenshots
-- **Trace-level QA workflows** — run QA workflows across all tasks from the trace table, with screenshot input and per-task status tracking
-- **Evalset environment filtering** — filter results by environment version, with earliest-version-only toggle
-- **EvaluationResult info viewer** — inspect the full `info` field of evaluation results directly in the UI
-- **Individual user spend** — usage page now shows per-user spend alongside team totals
-- **Inline job renaming** — rename jobs directly from the jobs page
-- **Resizable task name column** — longer task slugs visible with a resizable column and higher character limit
-- **Vendor portal** — new vendor-facing site for RFP intake and bid management
-- **Modal integration** — run environments on Modal compute infrastructure
-- **Resources section** — new `/resources` page with published articles
+- **Click & scroll coordinate overlays** — computer use traces render click coordinates and scroll actions directly on screenshots.
+- **Trace-level QA workflows** — run QA workflows across all tasks from the trace table, with screenshot input and per-task status.
+- **Evalset environment filtering** — filter results by environment version, with an earliest-version-only toggle.
+- **EvaluationResult info viewer** — inspect the full `info` field of evaluation results directly in the UI.
+- **Individual user spend** — usage page shows per-user spend alongside team totals.
+- **Inline job renaming** — rename jobs directly from the jobs page.
+- **Modal integration** — run environments on Modal compute infrastructure.
+- **Resources section** — new `/resources` page with published articles.
 </Update>

-<Update label="February 16, 2026" description="v0.5.18 – v0.5.28">
+<Update label="February 16, 2026">
 ## Opus 4.6 Computer Use, Streaming & Deploy Improvements

-- **Opus 4.6 computer tool** — native support for Claude Opus 4.6 computer use with zoom and screenshot gating
-- **Fine-grained tool streaming** — opt-in streaming for individual tool results during agent execution
-- **`hud deploy` build args & secrets** — pass build arguments and secrets to environment container builds
-- **`allowed_tools` in `@env.scenario`** — scope tool access per evaluation scenario via the decorator
-- **Retry logic for MCP errors** — automatic retry with backoff for 5xx errors from `mcp.hud.ai`
-- **Checkpoint configs** — configure checkpoint behavior for long-running evaluations
-- **Subagent instrumentation** — telemetry now captures subagent spans for nested agent workflows
+- **Opus 4.6 computer tool** — native support for Claude Opus 4.6 computer use with zoom and screenshot gating.
+- **Fine-grained tool streaming** — opt-in streaming for individual tool results during agent execution.
+- **`hud deploy` build args & secrets** — pass build arguments and secrets to environment container builds.
+- **`allowed_tools` in `@env.scenario`** — scope tool access per evaluation scenario via the decorator.
+- **Checkpoint configs** — configure checkpoint behavior for long-running evaluations.

 ### Platform

-- **Billing refactor** — auto top-up, redesigned billing page, and per-key pricing for HUD-managed API keys
-- **Trace viewer enhancements** — strip review mode, inline run switching, file attachment display
-- **System prompt in trace viewer** — system prompt visible (collapsed by default) in the trace sidebar
-- **Trace comments** — add and edit comments on individual traces, visible as a dedicated column in taskset view
-- **Training jobs dashboard** — dedicated section for RL training jobs with detail pages
-- **Native binarization toggle** — pass/fail binarization for taskset evaluations, built into the platform
-- **Column ordering** — reorder columns in the taskset table view
-- **Model & environment sorting** — sort taskset results by model, environment, and environment version
+- **Billing refactor** — auto top-up, redesigned billing page, and per-key pricing for HUD-managed API keys.
+- **Trace viewer enhancements** — strip review mode, inline run switching, and file attachment display.
+- **Trace comments** — add and edit comments on individual traces, with a dedicated column in taskset view.
+- **Training jobs dashboard** — dedicated section for RL training jobs with detail pages.
+- **Native binarization toggle** — pass/fail binarization for taskset evaluations, built into the platform.
+- **Column ordering** — reorder columns in the taskset table view.
+- **Model & environment sorting** — sort taskset results by model, environment, and environment version.
 </Update>

-<Update label="January 12, 2026" description="v0.5.5 – v0.5.17">
+<Update label="January 12, 2026">
 ## CLI Refinements & Leaderboard Redesign

-- **Build args for `hud deploy`** — pass custom build arguments to environment container builds
-- **Subagent telemetry** — telemetry instrumentation for subagent spans within nested workflows
-- **Server output validation** — runtime validation of MCP server responses
-- **Wildcard tools** — environments can expose `*` to allow all tools without explicit registration
-- **CLI mode distinction** — `hud build` and `hud analyze` distinguish between HTTP and stdio modes
+- **Build args for `hud deploy`** — pass custom build arguments to environment container builds.
+- **Wildcard tools** — environments can expose `*` to allow all tools without explicit registration.
+- **CLI mode distinction** — `hud build` and `hud analyze` distinguish between HTTP and stdio modes.

 ### Platform

-- **Leaderboard redesign** — redesigned leaderboards with publishing flow, public visibility, and embedding support
-- **Slack bot** — Slack integration for job notifications and external integration provider support
-- **Trace compact view** — compact trace view with column reorder, inline comments, and truncated task names
-- **BYOK API keys** — bring-your-own-key support with `use_hud_key` option for user-managed API keys
-- **Per-key pricing** — individual pricing tiers for HUD-managed API keys
-- **Jobs page improvements** — compact job list view, stats section updates
+- **Leaderboard redesign** — redesigned leaderboards with publishing flow, public visibility, and embedding support.
+- **Slack bot** — Slack integration for job notifications and external integration providers.
+- **Trace compact view** — compact trace view with column reorder, inline comments, and truncated task names.
+- **BYOK API keys** — bring-your-own-key support with a `use_hud_key` option for user-managed API keys.
+- **Per-key pricing** — individual pricing tiers for HUD-managed API keys.
+- **Jobs page improvements** — compact job list view and refreshed stats.
 </Update>

-<Update label="December 17, 2025" description="v0.5.0 – v0.5.4">
+<Update label="December 17, 2025">
 ## v0.5.0: MCP-First Architecture

-- **Environments decoupled** — environment definitions moved to separate repos, enabling independent versioning and community contributions
-- **Unified scenario/tool/prompt/resource handling** — single abstraction layer for MCP servers and client-side tools, with caching and hot-reload
-- **New telemetry** — OpenTelemetry-based instrumentation with trace IDs, subagent spans, and structured logging
-- **Scenario decorator** — `@env.scenario` for defining evaluation scenarios with typed configuration
-- **RL training** — initial support for reinforcement learning training via the CLI
+- **Environments decoupled** — environment definitions moved to separate repos, enabling independent versioning and community contributions.
+- **Unified scenario/tool/prompt/resource handling** — single abstraction layer for MCP servers and client-side tools, with caching and hot-reload.
+- **Telemetry** — trace IDs, subagent spans, and structured logging for agent runs.
+- **Scenario decorator** — `@env.scenario` for defining evaluation scenarios with typed configuration.
+- **RL training** — initial support for reinforcement learning training via the CLI.

 ### Platform

-- **Inference API usage tracking** — track inference API usage on the usage page
-- **HUD-managed API keys** — platform-side API key management with `set api_key` support
+- **Inference API usage tracking** — track inference API usage on the usage page.
+- **HUD-managed API keys** — platform-side API key management with `set api_key` support.
 </Update>

-<Update label="October 1, 2025" description="v0.4.49 – v0.4.74">
+<Update label="October 1, 2025">
 ## Bedrock, Gemini & Expanded Model Support

-- **AWS Bedrock** — `hud-python[bedrock]` extra for running Claude agents via AWS Bedrock
-- **Gemini CUA** — Gemini computer use agent support with checkpoint management
-- **Qwen computer tool** — QwenComputerTool for Qwen-series models
-- **MCP server support** — use HUD environments as MCP servers, integrating with any MCP-compatible client
-- **Telemetry tracing** — structured telemetry for agent runs with trace export
+- **AWS Bedrock** — `hud-python[bedrock]` extra for running Claude agents via AWS Bedrock.
+- **Gemini CUA** — Gemini computer use agent support with checkpoint management.
+- **Qwen computer tool** — QwenComputerTool for Qwen-series models.
+- **MCP server support** — use HUD environments as MCP servers, integrating with any MCP-compatible client.
+- **Telemetry tracing** — structured telemetry for agent runs with trace export.

 ### Platform

-- **Text trace viewer** — view text-only agent traces with dedicated viewer
-- **Leaderboard embeds** — embed leaderboards in external pages
-- **Versioned models** — unified evalsets and leaderboards with versioned model support
-- **Usage tracking & billing** — Stripe integration, subscription management, and usage analytics
+- **Text trace viewer** — view text-only agent traces with a dedicated viewer.
+- **Leaderboard embeds** — embed leaderboards in external pages.
+- **Versioned models** — unified evalsets and leaderboards with versioned model support.
+- **Usage tracking & billing** — usage analytics and subscription management.
 </Update>

-<Update label="August 23, 2025" description="v0.3.0 – v0.4.48">
+<Update label="August 23, 2025">
 ## CLI & Claude Agent

-- **`hud` CLI** — full CLI for the development lifecycle: `init`, `dev`, `build`, `deploy`, `eval`, `analyze`, `debug`
-- **Claude agent with prompt caching** — built-in Claude agent with Anthropic prompt caching for reduced latency and cost
-- **Pre-filtered tools** — agents receive only the tools relevant to their current scenario
-- **User-provided system prompts** — custom system prompts for tasksets and individual tasks
+- **`hud` CLI** — full CLI for the development lifecycle: `init`, `dev`, `build`, `deploy`, `eval`, `analyze`, `debug`.
+- **Claude agent with prompt caching** — built-in Claude agent with reduced latency and cost.
+- **Pre-filtered tools** — agents receive only the tools relevant to their current scenario.
+- **User-provided system prompts** — custom system prompts for tasksets and individual tasks.

 ### Platform

-- **Trace viewer** — full trace exploration UI with step-by-step replay of agent actions and screenshots
-- **Leaderboards & scorecards** — evalset leaderboards with scorecard breakdowns
-- **Jobs & runs display** — view agent runs with step-by-step screenshots and action metadata
-- **Public trace sharing** — publish and share individual traces publicly
+- **Trace viewer** — full trace exploration UI with step-by-step replay of agent actions and screenshots.
+- **Leaderboards & scorecards** — evalset leaderboards with scorecard breakdowns.
+- **Jobs & runs display** — view agent runs with step-by-step screenshots and action metadata.
+- **Public trace sharing** — publish and share individual traces publicly.
 </Update>

-<Update label="April 18, 2025" description="v0.1.5 – v0.2.0">
+<Update label="April 18, 2025">
 ## Environment Controllers & Docker Support

-- **Client-side environment management** — local Docker-based environment execution with copy-to/from support
-- **Claude adapter** — built-in adapter for Anthropic Claude computer use and Operator
-- **Gymnasium wrapper** — `gym.make()` compatibility for RL-style agent training loops
-- **Evaluator framework** — pluggable evaluators with structured logging and result export
+- **Client-side environment management** — local Docker-based environment execution with copy-to/from support.
+- **Claude adapter** — built-in adapter for Anthropic Claude computer use and Operator.
+- **Gymnasium wrapper** — `gym.make()` compatibility for RL-style agent training loops.
+- **Evaluator framework** — pluggable evaluators with structured logging and result export.

 ### Platform

-- **Platform launch** — dashboard at hud.ai with authentication and evalset browsing
-- **API keys management** — create and manage API keys from the dashboard
-- **Profile & team pages** — user profiles with team membership and settings
+- **Platform launch** — dashboard at hud.ai with authentication and evalset browsing.
+- **API keys management** — create and manage API keys from the dashboard.
+- **Profile & team pages** — user profiles with team membership and settings.
 </Update>

-<Update label="March 3, 2025" description="v0.1.0">
+<Update label="March 3, 2025">
 ## Initial Release

-- **Open-source SDK** — `pip install hud-python` for AI agent evaluation and RL environments
-- **Core primitives** — environments, tasks, evaluators, and runs as first-class objects
-- **Computer use actions** — keyboard, mouse, scroll, keyup/keydown, and hold-key actions for desktop environments
-- **Mintlify docs** — documentation site at docs.hud.ai
+- **Open-source SDK** — `pip install hud-python` for AI agent evaluation and RL environments.
+- **Core primitives** — environments, tasks, evaluators, and runs as first-class objects.
+- **Computer use actions** — keyboard, mouse, scroll, keyup/keydown, and hold-key actions for desktop environments.
 </Update>