Commit 04e23c7

chore: release 0.38.0
Author: semantic-release
1 parent 72aa537 commit 04e23c7

2 files changed

Lines changed: 66 additions & 1 deletion

CHANGELOG.md

Lines changed: 65 additions & 0 deletions
@@ -1,6 +1,71 @@
# CHANGELOG


+## v0.38.0 (2026-03-17)
+
+### Documentation
+
+- Add example Flask server for HttpAgent protocol
+([#120](https://github.com/OpenAdaptAI/openadapt-evals/pull/120),
+[`0d6fa24`](https://github.com/OpenAdaptAI/openadapt-evals/commit/0d6fa2494005287d0694e722f742fa9725770b71))
+
+Minimal reference implementation showing the POST /act request/response contract. Copy-and-replace
+the predict() function with your model.
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+- Add experiment framework design document
+([#121](https://github.com/OpenAdaptAI/openadapt-evals/pull/121),
+[`f71c81f`](https://github.com/OpenAdaptAI/openadapt-evals/commit/f71c81f49a4a94583ff51d64df34ed858407e12b))
+
+Frames OpenAdapt as a general-purpose computer use framework with multiple experiment tracks
+(demo-conditioning, LoRA-per-task, GRPO, SFT, API baselines, UI-Venus base model). Covers
+autoresearch pattern adaptation, wright+autoresearch composition, tiered oracle architecture,
+multi-objective scoring, mutation surface ordering, and reproducibility requirements.
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+### Features
+
+- Add GiGPO anchor state computation to WAADesktopEnv
+([#122](https://github.com/OpenAdaptAI/openadapt-evals/pull/122),
+[`72aa537`](https://github.com/OpenAdaptAI/openadapt-evals/commit/72aa5370de78eeb8821a39a9a27e6bdc76073978))
+
+* feat: add GiGPO anchor state computation to WAADesktopEnv
+
+Add compute_anchor_state() function that produces a state key for GiGPO cross-rollout grouping. Uses
+a11y tree SHA256 hash (primary) with screenshot MD5 fallback. The state_key is included in the
+info dict from both reset() and step() so VAGEN/verl can use it for O(1) anchor grouping instead
+of recomputing perceptual hashes.
+
+Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+* docs: clarify VAGEN vs verl-agent distinction in decision doc
+
+Add dated addendum (2026-03-16) correcting the earlier conflation of VAGEN and verl-agent as a
+single project. Key findings: VAGEN-Lite dropped Bi-Level GAE (only vanilla GRPO/PPO), GiGPO lives
+exclusively in verl-agent which uses its own env_base.py interface (not GymImageEnv), and our
+train_verl_e2e.py targets the wrong entry point. Outlines a corrected two-phase path: standalone
+GRPO first, then direct verl-agent integration if per-step credit is needed.
+
+* docs: add comprehensive GRPO training research report
+
+Covers desktop RL landscape (30+ projects), per-step credit assignment alternatives (HCAPO
+recommended over GiGPO), scaling architectures (ComputerRL, DART-GUI), and synthetic environment
+feasibility (GUI-Genesis). Includes revised architecture recommendation: standalone GRPO + HCAPO
+first, then dense rewards + API-GUI hybrid, then async scaling.
+
+* docs: correct prioritization — validate GRPO before optimizing training math
+
+HCAPO and per-step credit are Phase 3 optimizations, not Phase 1. The bottleneck is rollout success
+rate (getting non-zero rewards), not loss computation. Dense partial-credit rewards and API-GUI
+hybrid actions directly increase gradient signal and should come first.
+
+---------
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+
## v0.37.0 (2026-03-16)

### Features

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "openadapt-evals"
-version = "0.37.0"
+version = "0.38.0"
description = "Evaluation infrastructure for GUI agent benchmarks"
readme = "README.md"
requires-python = ">=3.10"
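
The GiGPO entry in the changelog above describes a state key built from a SHA256 hash of the a11y tree, with an MD5 hash of the screenshot as fallback, so that steps from different rollouts can be grouped by plain dictionary lookup. The following is a minimal sketch of that idea only, not the code from the commit: the function signature, the `a11y:`/`img:` prefixes, the `"unknown"` sentinel, and the `group_by_anchor` helper are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Optional


def compute_anchor_state(a11y_tree: Optional[str], screenshot: Optional[bytes]) -> str:
    """Return a state key for cross-rollout grouping (hypothetical signature).

    Primary: SHA-256 of the serialized a11y tree.
    Fallback: MD5 of the raw screenshot bytes.
    """
    if a11y_tree:
        digest = hashlib.sha256(a11y_tree.encode("utf-8")).hexdigest()
        return "a11y:" + digest  # prefix is an assumption, not from the source
    if screenshot:
        # MD5 is used here as a cheap fingerprint, not for security.
        digest = hashlib.md5(screenshot, usedforsecurity=False).hexdigest()
        return "img:" + digest
    return "unknown"  # sentinel chosen for this sketch


def group_by_anchor(steps):
    """Group (rollout_id, step_idx, state_key) records by state_key.

    Each step costs one dict lookup, so no pairwise hash comparison is needed.
    """
    groups = defaultdict(list)
    for rollout_id, step_idx, state_key in steps:
        groups[state_key].append((rollout_id, step_idx))
    return dict(groups)
```

In an environment wrapper, the key would presumably be stored as `info["state_key"]` in the values returned by `reset()` and `step()`, as the changelog entry states; how the trainer consumes it is outside the scope of this sketch.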
