Commit 8983744

chore: release 0.35.2
Author: semantic-release
1 parent 4a28653 commit 8983744

2 files changed

Lines changed: 50 additions & 1 deletion

CHANGELOG.md

Lines changed: 49 additions & 0 deletions
@@ -1,6 +1,55 @@
 # CHANGELOG


+## v0.35.2 (2026-03-08)
+
+### Bug Fixes
+
+- Detect and dismiss Windows lock screen before each task
+([#117](https://github.com/OpenAdaptAI/openadapt-evals/pull/117),
+[`4a28653`](https://github.com/OpenAdaptAI/openadapt-evals/commit/4a2865367ce0f9bd65a61ca8168cf1676b226197))
+
+* feat: add correction flywheel (store, capture, parser, controller hooks)
+
+Implements the correction flywheel MVP:
+
+- correction_store.py: JSON-file-based correction library with save/find (fuzzy string matching via
+SequenceMatcher)/load_all - correction_capture.py: Human correction capture using
+openadapt-capture Recorder (primary) with PIL screenshot fallback - correction_parser.py: VLM call
+to parse before/after screenshots into PlanStep dict (think/action/expect) - demo_controller.py:
+Added correction_store and enable_correction_capture params. On retry exhaustion: check correction
+store -> inject match, or capture human correction -> parse -> store -> advance - cli.py: Added
+--correction-library and --enable-correction-capture flags
+
+The loop: agent fails at step N -> correction store checked -> if match, inject corrected step -> if
+no match and capture enabled, human completes step -> Recorder captures -> VLM parses ->
+correction stored -> next run retrieves it.
+
+17 tests added, all passing. 54 existing demo_controller tests unaffected.
+
+Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+
+* fix: mock _has_recorder in correction capture test
+
+The test was calling the real Recorder which may not have wait_for_ready in the installed version.
+Mock it to use the simple fallback path since this is a unit test.
+
+* fix: detect and dismiss Windows lock screen before each task
+
+Add _dismiss_lock_screen() to run_dc_eval.py that checks for LogonUI.exe process and types the
+password to unlock if the screen is locked. Called from ensure_waa_ready() after each successful
+probe.
+
+This prevents eval failures when the Windows VM has been idle and the lock screen has engaged
+between tasks or between sessions.
+
+* chore: sync beads state
+
+---------
+
+Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
+
 ## v0.35.1 (2026-03-07)

 ### Bug Fixes
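
The correction flywheel described in the v0.35.2 entry (check the store on retry exhaustion, fall back to human capture, then persist the result for the next run) can be sketched roughly as below. This is an illustration only, not the code in correction_store.py or demo_controller.py. The class name `CorrectionStore`, the `threshold` default, the entry layout, and the `resolve_failed_step` helper are assumptions made for the example; only the JSON-file storage, the save/find/load_all operations, and fuzzy matching via `difflib.SequenceMatcher` come from the changelog text.

```python
import json
from difflib import SequenceMatcher
from pathlib import Path


class CorrectionStore:
    """Hypothetical JSON-file-backed correction library (illustrative only)."""

    def __init__(self, path, threshold=0.8):
        self.path = Path(path)
        # Assumed similarity cutoff; the changelog does not state one.
        self.threshold = threshold

    def load_all(self):
        """Return every stored entry, or an empty list if no file exists yet."""
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, step_description, corrected_step):
        """Append one correction and rewrite the whole JSON file."""
        entries = self.load_all()
        entries.append({"description": step_description, "step": corrected_step})
        self.path.write_text(json.dumps(entries, indent=2), encoding="utf-8")

    def find(self, step_description):
        """Return the best fuzzy match at or above the threshold, else None."""
        best_step, best_score = None, 0.0
        for entry in self.load_all():
            score = SequenceMatcher(
                None, step_description.lower(), entry["description"].lower()
            ).ratio()
            if score > best_score:
                best_step, best_score = entry["step"], score
        return best_step if best_score >= self.threshold else None


def resolve_failed_step(store, step_description, capture_correction=None):
    """Called once retries are exhausted for a step.

    Reuses a stored correction when one matches. Otherwise, if a capture
    callable is supplied (standing in for human capture plus VLM parsing),
    stores and returns its result. Returns None when neither path applies.
    """
    match = store.find(step_description)
    if match is not None:
        return match
    if capture_correction is None:
        return None
    corrected = capture_correction(step_description)
    store.save(step_description, corrected)
    return corrected
```

Passing the capture step in as a callable keeps the sketch testable without a recorder or a VLM; a stub that returns a `think`/`action`/`expect` dict is enough to exercise both branches.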

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "openadapt-evals"
-version = "0.35.1"
+version = "0.35.2"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
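
For readers curious about the lock-screen fix named in this release, a check of the kind the changelog describes (look for a LogonUI.exe process, then type the password) might look like the sketch below. It is not the `_dismiss_lock_screen()` implementation from run_dc_eval.py. The `run_command` and `type_text` callables are hypothetical stand-ins for however the eval harness executes commands and sends keystrokes inside the Windows VM, and appending a newline to submit the password is an assumption. The `tasklist` filter and format flags are standard Windows options.

```python
from typing import Callable

# Standard Windows command: list only LogonUI.exe, as CSV, without a header row.
TASKLIST_CMD = 'tasklist /FI "IMAGENAME eq LogonUI.exe" /FO CSV /NH'


def is_lock_screen_active(tasklist_output: str) -> bool:
    """Return True if the tasklist output mentions a LogonUI.exe process.

    When nothing matches the filter, tasklist prints an informational
    message instead of a CSV row, so a substring check is sufficient here.
    """
    return "logonui.exe" in tasklist_output.lower()


def dismiss_lock_screen(
    run_command: Callable[[str], str],
    type_text: Callable[[str], None],
    password: str,
) -> bool:
    """Type the password if the lock screen appears to be showing.

    run_command and type_text are hypothetical hooks into the VM.
    Returns True if an unlock was attempted, False if nothing was typed.
    """
    output = run_command(TASKLIST_CMD)
    if not is_lock_screen_active(output):
        return False
    # Assumption: a trailing newline submits the password field.
    type_text(password + "\n")
    return True
```

LogonUI.exe also backs the initial sign-in screen, so a check like this cannot distinguish "locked" from "not yet logged in"; for an idle eval VM that distinction may not matter.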
