1 | 1 | # CHANGELOG |
2 | 2 |
3 | 3 |
| 4 | +## v0.35.2 (2026-03-08) |
| 5 | + |
| 6 | +### Bug Fixes |
| 7 | + |
| 8 | +- Detect and dismiss Windows lock screen before each task |
| 9 | + ([#117](https://github.com/OpenAdaptAI/openadapt-evals/pull/117), |
| 10 | + [`4a28653`](https://github.com/OpenAdaptAI/openadapt-evals/commit/4a2865367ce0f9bd65a61ca8168cf1676b226197)) |
| 11 | + |
| 12 | +* feat: add correction flywheel (store, capture, parser, controller hooks) |
| 13 | + |
| 14 | +Implements the correction flywheel MVP: |
| 15 | + |
| 16 | +- correction_store.py: JSON-file-based correction library with save/find (fuzzy string matching via |
| 17 | +  SequenceMatcher)/load_all |
| 18 | +- correction_capture.py: Human correction capture using openadapt-capture Recorder (primary) with PIL screenshot fallback |
| 19 | +- correction_parser.py: VLM call to parse before/after screenshots into PlanStep dict (think/action/expect) |
| 20 | +- demo_controller.py: Added correction_store and enable_correction_capture params. On retry exhaustion: check correction |
| 21 | +  store -> inject match, or capture human correction -> parse -> store -> advance |
| 22 | +- cli.py: Added --correction-library and --enable-correction-capture flags |
| 23 | + |
| 24 | +The loop: agent fails at step N -> correction store checked -> if match, inject corrected step -> if |
| 25 | + no match and capture enabled, human completes step -> Recorder captures -> VLM parses -> |
| 26 | + correction stored -> next run retrieves it. |
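
The loop described above can be sketched roughly as follows. This is a minimal illustration only, not the code shipped in `correction_store.py` or `demo_controller.py`: the function names `find_correction` and `next_step`, the `failed_step`/`corrected_step` keys, the `capture_and_parse` callback, and the 0.8 similarity threshold are all assumptions made for the example. Only the use of `difflib.SequenceMatcher` for fuzzy matching comes from the changelog entry.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def find_correction(step: str, corrections: list, threshold: float = 0.8) -> Optional[dict]:
    """Return the stored entry whose 'failed_step' text best matches `step`.

    Hypothetical helper: the key names and threshold are assumptions.
    """
    best, best_score = None, 0.0
    for entry in corrections:
        # ratio() is a similarity score in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, step.lower(), entry["failed_step"].lower()).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else None


def next_step(
    step: str,
    corrections: list,
    capture_enabled: bool,
    capture_and_parse: Callable[[str], dict],
) -> Optional[dict]:
    """Decide what to do once retries for `step` are exhausted."""
    match = find_correction(step, corrections)
    if match is not None:
        # A similar failure was corrected before: inject the stored step.
        return match["corrected_step"]
    if capture_enabled:
        # No match: let a human complete the step, parse it, and store it.
        corrected = capture_and_parse(step)
        corrections.append({"failed_step": step, "corrected_step": corrected})
        return corrected
    return None  # nothing stored and capture disabled
```

In this sketch the first failure at a step triggers a capture, and a later run that fails at a similarly worded step retrieves the stored correction instead of asking the human again.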
| 27 | + |
| 28 | +17 tests added, all passing. 54 existing demo_controller tests unaffected. |
| 29 | + |
| 30 | +Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
| 31 | + |
| 32 | +* fix: mock _has_recorder in correction capture test |
| 33 | + |
| 34 | +The test was calling the real Recorder which may not have wait_for_ready in the installed version. |
| 35 | + Mock it to use the simple fallback path since this is a unit test. |
| 36 | + |
| 37 | +* fix: detect and dismiss Windows lock screen before each task |
| 38 | + |
| 39 | +Add _dismiss_lock_screen() to run_dc_eval.py that checks for LogonUI.exe process and types the |
| 40 | + password to unlock if the screen is locked. Called from ensure_waa_ready() after each successful |
| 41 | + probe. |
| 42 | + |
| 43 | +This prevents eval failures when the Windows VM has been idle and the lock screen has engaged |
| 44 | + between tasks or between sessions. |
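
As a rough illustration of the lock-screen check described above, the sketch below looks for `LogonUI.exe` in the output of the Windows `tasklist` command and types the password only when that process is present. It is not the `_dismiss_lock_screen()` implementation from `run_dc_eval.py`: the `run_command` and `type_text` callbacks stand in for whatever mechanism the eval harness uses to execute commands and send keystrokes inside the VM, and both names are hypothetical.

```python
from typing import Callable

# Standard Windows command that lists only processes named LogonUI.exe.
TASKLIST_CMD = 'tasklist /FI "IMAGENAME eq LogonUI.exe"'


def is_lock_screen_active(tasklist_output: str) -> bool:
    """True if the tasklist output mentions LogonUI.exe.

    When no process matches the filter, tasklist prints an INFO message
    that does not contain the image name.
    """
    return "logonui.exe" in tasklist_output.lower()


def dismiss_lock_screen(
    run_command: Callable[[str], str],  # hypothetical: runs a command in the VM, returns stdout
    type_text: Callable[[str], None],   # hypothetical: sends keystrokes to the VM
    password: str,
) -> bool:
    """Type the password if the lock screen is showing; return True if it was."""
    output = run_command(TASKLIST_CMD)
    if not is_lock_screen_active(output):
        return False
    # Assumption: typing the password followed by Enter is enough to unlock.
    type_text(password + "\n")
    return True
```

Passing the command runner and keystroke sender as parameters keeps the check testable without a running Windows VM.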
| 45 | + |
| 46 | +* chore: sync beads state |
| 47 | + |
| 48 | +--------- |
| 49 | + |
| 50 | +Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
| 51 | + |
| 52 | + |
4 | 53 | ## v0.35.1 (2026-03-07) |
5 | 54 |
6 | 55 | ### Bug Fixes |