|
1 | 1 | # CHANGELOG |
2 | 2 |
|
3 | 3 |
|
| 4 | +## v0.71.3 (2026-03-28) |
| 5 | + |
| 6 | +### Bug Fixes |
| 7 | + |
| 8 | +- Eval infra, forced keyboard override, Outlines constrained decoding |
| 9 | + ([#197](https://github.com/OpenAdaptAI/openadapt-evals/pull/197), |
| 10 | + [`257bc7f`](https://github.com/OpenAdaptAI/openadapt-evals/commit/257bc7f2f3595ed9d350ac8a482efdf805302a05)) |
| 11 | + |
| 12 | +* fix: per-step milestone tracking, forced keyboard override, eval infra |
| 13 | + |
| 14 | +Evaluation infrastructure: |
| 15 | +- Per-step milestone high-water mark: milestones are checked after each step, and once passed they stay passed. |
| 16 | + Fixes transient states (open dialogs) being missed by end-of-episode-only evaluation. |
| 17 | +- evaluate_checks_local() fallback: when the /evaluate endpoint is down, uses the task config's own command/screenshot checks via /execute_windows. |
| 18 | +- iptables retry loop in start_with_evaluate.sh: ensures the port 5050 exemption persists even if the DNAT rule |
| 19 | + is (re)applied later. |
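The per-step high-water mark described above can be sketched as follows. This is a minimal illustration, not the project's code: the `MilestoneTracker` class, the predicate-style checks, and the state dictionaries are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Set

# Hypothetical type: a milestone is a named predicate over an observed state.
MilestoneCheck = Callable[[dict], bool]


class MilestoneTracker:
    """Records every milestone that has passed at least once (high-water mark)."""

    def __init__(self, checks: Dict[str, MilestoneCheck]) -> None:
        self.checks = checks
        self.passed: Set[str] = set()

    def update(self, state: dict) -> None:
        """Call after each step; a passed milestone is never un-passed."""
        for name, check in self.checks.items():
            if name not in self.passed and check(state):
                self.passed.add(name)

    def score(self) -> float:
        """Fraction of milestones passed so far."""
        return len(self.passed) / len(self.checks) if self.checks else 0.0


# Illustrative milestones and states (assumed, not taken from the task config).
checks = {
    "dialog_opened": lambda s: s.get("dialog_open", False),
    "data_cleared": lambda s: s.get("history_count", 1) == 0,
}
tracker = MilestoneTracker(checks)

# The dialog is open only at the second step, so a check that runs
# solely at the end of the episode would miss it.
episode = [
    {"dialog_open": False, "history_count": 12},
    {"dialog_open": True, "history_count": 12},
    {"dialog_open": False, "history_count": 0},
]
for state in episode:
    tracker.update(state)
```

In this sketch the final state alone satisfies only `data_cleared`, while the tracker also retains `dialog_opened` from the intermediate step.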
| 20 | + |
| 21 | +Anti-loop forced override: |
| 22 | +- After 6 consecutive identical actions (planner ignoring warnings), bypasses the planner entirely and emits the first keyboard shortcut from the demo guidance (e.g., Ctrl+Shift+Delete). |
| 23 | + This breaks click loops where the grounder places clicks incorrectly. |
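One way to picture the forced override is the sketch below. Only the threshold of 6 comes from the entry above; the function names, the string form of actions, and the `KEY ...` output format are assumptions made for illustration.

```python
from typing import List

MAX_REPEATS = 6  # threshold stated in the changelog entry


def should_force_override(history: List[str], max_repeats: int = MAX_REPEATS) -> bool:
    """Return True when the last `max_repeats` actions are all identical."""
    if len(history) < max_repeats:
        return False
    tail = history[-max_repeats:]
    return all(action == tail[0] for action in tail)


def next_action(history: List[str], planner_action: str, demo_shortcuts: List[str]) -> str:
    """Pick the next action, bypassing the planner when it is stuck in a loop.

    `demo_shortcuts` stands in for keyboard shortcuts extracted from the demo
    guidance; the "KEY <shortcut>" string is a hypothetical action format.
    """
    if demo_shortcuts and should_force_override(history):
        return f"KEY {demo_shortcuts[0]}"
    return planner_action


stuck = ["CLICK(100, 200)"] * 6
forced = next_action(stuck, "CLICK(100, 200)", ["ctrl+shift+delete"])
```

The sketch falls back to the planner's action when no shortcut is available, which is a design choice of the example rather than something the entry specifies.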
| 24 | + |
| 25 | +Task setup fixes: Chrome popup (registry policies, First Run sentinel, launch flags); single-line |
| 26 | + PowerShell commands (fixes YAML escaping for /execute_windows); redesigned milestones (combined |
| 27 | + settings/dialog check, evidence-based). |
| 28 | + |
| 29 | +Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
| 30 | + |
| 31 | +* fix: remove Alt+F4 from demo, add Outlines constrained decoding |
| 32 | + |
| 33 | +Demo fix: |
| 34 | +- Remove step 4 (Alt+F4 close Chrome) from the clear-browsing-data demo. Alt+F4 on the desktop triggers the Windows Shutdown dialog when Chrome loses focus. |
| 35 | + The task goal is clearing data, not closing Chrome. |
| 36 | +- Updated step 2 description to include the "Delete from this device" button text (newer Chrome versions changed the label). |
| 37 | + |
| 38 | +Constrained decoding (GRPO trainer): |
| 39 | +- Add `constrained_decoding` config flag (default False). |
| 40 | +- When enabled, uses Outlines RegexLogitsProcessor to force model output to match the action format regex (CLICK/TYPE/WAIT/DONE). Eliminates the 5-15% of rollouts wasted on unparseable output. |
| 41 | +- Allows a free-form Thought prefix before the action. |
| 42 | +- DFA compilation is cached after the first call (~2s one-time cost). |
| 43 | +- Graceful fallback if outlines is not installed; added outlines>=0.1.0 to the training optional dependencies. |
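To make the "action format regex with a free-form Thought prefix" concrete, here is a rough sketch using only Python's standard `re` module. The action syntax shown (`CLICK(x, y)`, `TYPE("...")`, `WAIT()`, `DONE()`), the `Thought:`/`Action:` labels, and the helper name are assumptions for illustration; the project's real grammar may differ, and the sketch does not call Outlines. It only shows the kind of pattern a regex-constrained decoder would be given.

```python
import re

# Assumed action syntax, for illustration only.
ACTION = r'(?:CLICK\(\d{1,4}, \d{1,4}\)|TYPE\("[^"\n]*"\)|WAIT\(\)|DONE\(\))'

# An optional single-line free-form thought, followed by exactly one action.
PATTERN = re.compile(rf'(?:Thought: [^\n]*\n)?Action: {ACTION}')


def is_parseable(text: str) -> bool:
    """Return True if the whole output matches the assumed action format."""
    return PATTERN.fullmatch(text) is not None


with_thought = 'Thought: open the settings menu\nAction: CLICK(120, 340)'
bare_action = 'Action: DONE()'
free_text = 'I will click the button now.'
```

Under constrained decoding, outputs like `free_text` cannot be produced at all, because tokens that would leave the pattern are masked out during generation; without it, such outputs have to be detected and discarded after the fact, as `is_parseable` does here.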
| 44 | + |
| 45 | +--------- |
| 46 | + |
| 47 | +Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
| 48 | + |
| 49 | + |
4 | 50 | ## v0.71.2 (2026-03-28) |
5 | 51 |
|
6 | 52 | ### Bug Fixes |
|