Commit 5c50e11

chore: release 0.28.0

Author: semantic-release
Parent: 53d0b22

2 files changed

Lines changed: 26 additions & 1 deletion

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -1,6 +1,31 @@
 # CHANGELOG
 
 
+## v0.28.0 (2026-03-03)
+
+### Features
+
+- **agent**: Add closed-loop demo-conditioned controller
+([#92](https://github.com/OpenAdaptAI/openadapt-evals/pull/92),
+[`53d0b22`](https://github.com/OpenAdaptAI/openadapt-evals/commit/53d0b22b5c173ac0d51f241831cf7fa5f48233cb))
+
+Add VLM-based step verification (plan_verify.py), demo-conditioned controller state machine
+(demo_controller.py), and plan progress tracking in the CU agent. Enables the agent to verify each
+step's expected outcome via screenshot, override premature "done" signals, and retry/replan failed
+steps.
+
+Key additions: - plan_verify.py: verify_step(), verify_plan_progress(), verify_goal_completion() -
+demo_controller.py: DemoController state machine with step-by-step execution -
+claude_computer_use_agent.py: plan parsing, progress injection, done override - CLI --controller
+flag for both openadapt-evals and run_dc_eval.py - 120 tests (31 plan_verify + 36 demo_controller
++ 53 agent)
+
+Validated offline: - Level 1: 91% accuracy on real eval screenshots (10/11 correct) - Level 2:
+Done-override correctly prevents premature quit
+
+Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
+
 ## v0.27.1 (2026-03-03)
 
 ### Bug Fixes
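The changelog entry describes the controller only in outline: execute a plan step, verify its expected outcome from a screenshot, retry or replan on failure, and refuse a premature "done". Below is a minimal sketch of that control flow as an explicit state machine, assuming a linear list of plan steps. It is not the code in demo_controller.py. The names `State`, `run_plan`, `act`, `verify`, and `replan`, the retry and replan limits, and the rule for when a "done" claim is overridden are all hypothetical. `verify` is a plain callback standing in for a VLM-based check such as verify_step() in plan_verify.py.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    EXECUTE = auto()  # perform the current plan step
    VERIFY = auto()   # check the step's expected outcome (a VLM call in a real system)
    REPLAN = auto()   # retries exhausted: rewrite the unfinished part of the plan
    DONE = auto()
    FAILED = auto()


def run_plan(
    steps: list[str],
    act: Callable[[str], bool],
    verify: Callable[[str], bool],
    replan: Callable[[list[str]], list[str]],
    max_retries: int = 1,
    max_replans: int = 1,
) -> tuple[State, list[str]]:
    """Drive a step-by-step plan; return the final state and an event log.

    Hypothetical interface: `act` performs a step and returns True if the agent
    claims the task is done; `verify` stands in for a screenshot check; `replan`
    returns replacement steps for the unfinished remainder of the plan.
    """
    state, i, retries, replans = State.EXECUTE, 0, 0, 0
    log: list[str] = []
    while state not in (State.DONE, State.FAILED):
        if state is State.EXECUTE:
            if i == len(steps):
                state = State.DONE  # every step has passed verification
                continue
            if act(steps[i]) and i < len(steps) - 1:
                # Done override: a "done" claim before the last step is ignored.
                log.append(f"override done at step {i}")
            state = State.VERIFY  # even the last step must still pass verification
        elif state is State.VERIFY:
            if verify(steps[i]):
                log.append(f"ok {steps[i]}")
                i, retries, state = i + 1, 0, State.EXECUTE
            elif retries < max_retries:
                retries += 1
                log.append(f"retry {steps[i]}")
                state = State.EXECUTE
            else:
                state = State.REPLAN
        else:  # State.REPLAN (DONE and FAILED exit the loop)
            if replans == max_replans:
                state = State.FAILED
                continue
            replans, retries = replans + 1, 0
            steps = steps[:i] + replan(steps[i:])  # keep verified steps, replace the rest
            log.append(f"replan at step {i}")
            state = State.EXECUTE
    return state, log


if __name__ == "__main__":
    calls: dict[str, int] = {}

    def verify(step: str) -> bool:
        # The second step only passes on its second check, forcing one retry.
        calls[step] = calls.get(step, 0) + 1
        return step != "toggle dark mode" or calls[step] >= 2

    final, events = run_plan(
        ["open settings", "toggle dark mode", "close settings"],
        act=lambda step: step == "open settings",  # agent claims "done" too early
        verify=verify,
        replan=lambda rest: rest,
    )
    print(final, events)
```

Run as a script, the demo logs one overridden "done" claim and one retry on the second step before ending in `State.DONE`. Keeping the state explicit makes every transition easy to log and unit-test; the actual DemoController may be organized differently.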

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "openadapt-evals"
-version = "0.27.1"
+version = "0.28.0"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
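The version change from 0.27.1 to 0.28.0 is a minor bump, which is what semantic versioning prescribes for a release that adds a feature (the changelog lists the new controller under Features). Below is a minimal sketch of that rule, assuming the conventional-commit mapping in which a `feat` commit triggers a minor bump and a `fix` or `perf` commit a patch bump. `next_version`, `BUMP_FOR_TYPE`, and the header regex are hypothetical and are not semantic-release's own implementation. Major bumps for breaking changes are deliberately left out.

```python
import re

# Assumed mapping from conventional-commit type to semver bump level;
# types not listed here (chore, docs, ...) do not trigger a release.
BUMP_FOR_TYPE = {"feat": "minor", "fix": "patch", "perf": "patch"}
ORDER = {"none": 0, "patch": 1, "minor": 2}

# "type(scope)!: subject"; scope and "!" are optional. Major bumps are out of
# scope here, so a "!" marker is accepted but the commit counts as its plain type.
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?!?:")


def next_version(current: str, commit_headers: list[str]) -> str:
    """Return the next MAJOR.MINOR.PATCH version implied by the commit headers."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = "none"
    for header in commit_headers:
        match = HEADER.match(header)
        if match is None:
            continue  # not a conventional-commit header
        candidate = BUMP_FOR_TYPE.get(match.group("type"), "none")
        if ORDER[candidate] > ORDER[level]:
            level = candidate  # keep the largest bump seen so far
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current  # nothing releasable


print(next_version("0.27.1", ["feat(agent): Add closed-loop demo-conditioned controller"]))  # prints 0.28.0
```

A release commit such as `chore: release 0.28.0` maps to no bump under this rule, so it would not trigger another release by itself.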
