# CHANGELOG

## v0.38.0 (2026-03-17)

### Documentation

- Add example Flask server for HttpAgent protocol
  ([#120](https://github.com/OpenAdaptAI/openadapt-evals/pull/120),
  [`0d6fa24`](https://github.com/OpenAdaptAI/openadapt-evals/commit/0d6fa2494005287d0694e722f742fa9725770b71))

Minimal reference implementation showing the POST /act request/response contract. Copy the server and replace the predict() function with your own model.
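
For orientation, a minimal sketch of what such a server can look like. The field names (`observation`, `action`), the port, and the placeholder policy are assumptions for illustration, not the documented contract; the linked PR has the reference version.

```python
# Hypothetical sketch of an HttpAgent-style action server; field names
# and port are assumptions, not the documented contract.
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(observation: dict) -> dict:
    """Stub policy: replace with your model. Returns a single action."""
    # Placeholder: always click the center of the screen.
    return {"type": "click", "x": 0.5, "y": 0.5}


@app.route("/act", methods=["POST"])
def act():
    # The evaluator POSTs the current observation; respond with one action.
    payload = request.get_json(force=True)
    return jsonify({"action": predict(payload.get("observation", {}))})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```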

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add experiment framework design document
  ([#121](https://github.com/OpenAdaptAI/openadapt-evals/pull/121),
  [`f71c81f`](https://github.com/OpenAdaptAI/openadapt-evals/commit/f71c81f49a4a94583ff51d64df34ed858407e12b))

Frames OpenAdapt as a general-purpose computer use framework with multiple experiment tracks (demo-conditioning, LoRA-per-task, GRPO, SFT, API baselines, UI-Venus base model). Covers autoresearch pattern adaptation, wright+autoresearch composition, tiered oracle architecture, multi-objective scoring, mutation surface ordering, and reproducibility requirements.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

### Features

- Add GiGPO anchor state computation to WAADesktopEnv
  ([#122](https://github.com/OpenAdaptAI/openadapt-evals/pull/122),
  [`72aa537`](https://github.com/OpenAdaptAI/openadapt-evals/commit/72aa5370de78eeb8821a39a9a27e6bdc76073978))

* feat: add GiGPO anchor state computation to WAADesktopEnv

Add a compute_anchor_state() function that produces a state key for GiGPO cross-rollout grouping. It uses the a11y tree's SHA-256 hash (primary) with a screenshot MD5 fallback. The state_key is included in the info dict from both reset() and step() so VAGEN/verl can use it for O(1) anchor grouping instead of recomputing perceptual hashes.
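
As a rough sketch of the idea (the argument names and key prefixes below are assumptions; the linked commit has the real implementation):

```python
# Illustrative sketch of GiGPO anchor-state keying; names are assumed,
# not copied from WAADesktopEnv.
import hashlib
from typing import Optional


def compute_anchor_state(
    a11y_tree: Optional[str], screenshot_bytes: Optional[bytes]
) -> Optional[str]:
    """Produce a stable state key for GiGPO cross-rollout grouping."""
    if a11y_tree:
        # Primary: SHA-256 of the serialized accessibility tree.
        return "a11y:" + hashlib.sha256(a11y_tree.encode("utf-8")).hexdigest()
    if screenshot_bytes:
        # Fallback: MD5 of the raw screenshot when no a11y tree is available.
        return "img:" + hashlib.md5(screenshot_bytes).hexdigest()
    return None
```

Rollouts that reach the same state share a key, so the trainer can group them with a dict lookup instead of pairwise perceptual-hash comparisons.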

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify VAGEN vs verl-agent distinction in decision doc

Add a dated addendum (2026-03-16) correcting the earlier conflation of VAGEN and verl-agent as a single project. Key findings: VAGEN-Lite dropped Bi-Level GAE (only vanilla GRPO/PPO remain), GiGPO lives exclusively in verl-agent, which uses its own env_base.py interface (not GymImageEnv), and our train_verl_e2e.py targets the wrong entry point. Outlines a corrected two-phase path: standalone GRPO first, then direct verl-agent integration if per-step credit is needed.

* docs: add comprehensive GRPO training research report

Covers the desktop RL landscape (30+ projects), per-step credit assignment alternatives (HCAPO recommended over GiGPO), scaling architectures (ComputerRL, DART-GUI), and synthetic-environment feasibility (GUI-Genesis). Includes a revised architecture recommendation: standalone GRPO + HCAPO first, then dense rewards + an API-GUI hybrid, then async scaling.

* docs: correct prioritization — validate GRPO before optimizing training math

HCAPO and per-step credit are Phase 3 optimizations, not Phase 1. The bottleneck is rollout success rate (getting non-zero rewards at all), not loss computation. Dense partial-credit rewards and API-GUI hybrid actions directly increase gradient signal and should come first.
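
To make the gradient-signal point concrete, a toy sketch of a dense partial-credit reward (the subgoal names and weights are invented for illustration, not taken from the repo). A rollout that opens the right app but fails the task still earns a non-zero reward, so group-relative advantages stay informative even before any full successes appear.

```python
# Toy dense partial-credit reward; subgoal checks and weights are
# hypothetical, not the repo's actual scoring.
def dense_reward(final_state: dict) -> float:
    """Score completed subgoals instead of a binary 0/1 task reward."""
    checks = [
        (0.3, final_state.get("app_opened", False)),      # right app launched
        (0.3, final_state.get("field_filled", False)),    # form partially done
        (0.4, final_state.get("task_succeeded", False)),  # end-to-end success
    ]
    return sum(weight for weight, passed in checks if passed)
```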

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


## v0.37.0 (2026-03-16)

### Features