 # CHANGELOG


+## v0.28.0 (2026-03-03)
+
+### Features
+
+- **agent**: Add closed-loop demo-conditioned controller
+  ([#92](https://github.com/OpenAdaptAI/openadapt-evals/pull/92),
+  [`53d0b22`](https://github.com/OpenAdaptAI/openadapt-evals/commit/53d0b22b5c173ac0d51f241831cf7fa5f48233cb))
+
+  Add VLM-based step verification (plan_verify.py), demo-conditioned controller state machine
+  (demo_controller.py), and plan progress tracking in the CU agent. Enables the agent to verify each
+  step's expected outcome via screenshot, override premature "done" signals, and retry/replan failed
+  steps.
+
+  Key additions:
+  - plan_verify.py: verify_step(), verify_plan_progress(), verify_goal_completion()
+  - demo_controller.py: DemoController state machine with step-by-step execution
+  - claude_computer_use_agent.py: plan parsing, progress injection, done override
+  - CLI --controller flag for both openadapt-evals and run_dc_eval.py
+  - 120 tests (31 plan_verify + 36 demo_controller + 53 agent)
+
+  Validated offline:
+  - Level 1: 91% accuracy on real eval screenshots (10/11 correct)
+  - Level 2: Done-override correctly prevents premature quit
+
+  Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
+
 ## v0.27.1 (2026-03-03)

 ### Bug Fixes
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "openadapt-evals"
-version = "0.27.1"
+version = "0.28.0"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"