Commit 4896b65

chore: release 0.24.0

Author: semantic-release
1 parent 97c144b commit 4896b65

2 files changed

Lines changed: 66 additions & 1 deletion

CHANGELOG.md

Lines changed: 65 additions & 0 deletions
@@ -1,6 +1,71 @@
 # CHANGELOG
 
 
+## v0.24.0 (2026-03-03)
+
+### Documentation
+
+- Document AWS SSO as recommended auth method
+  ([#80](https://github.com/OpenAdaptAI/openadapt-evals/pull/80),
+  [`8812e7c`](https://github.com/OpenAdaptAI/openadapt-evals/commit/8812e7c69294d9f80c3cde723fa8838b02cad550))
+
+  - Update README: replace static key instructions with SSO guide including example ~/.aws/config and aws configure sso workflow
+  - Update CLAUDE.md AWS section with SSO note
+  - Update aws_vm.py docstring to include SSO in credential chain
+
+  No code changes needed: boto3's default credential chain already handles SSO transparently.
+
+  Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
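For readers unfamiliar with the SSO setup this entry refers to, here is a minimal sketch of what an SSO-enabled `~/.aws/config` can look like, parsed with Python's standard `configparser`. The README's actual example is not shown in this diff, so the profile name, session name, account ID, role name, and start URL below are all hypothetical placeholders; only the key names follow the AWS CLI v2 config format.

```python
import configparser

# Hypothetical placeholder values; substitute your organization's own.
EXAMPLE_AWS_CONFIG = """\
[profile openadapt-dev]
sso_session = my-sso
sso_account_id = 123456789012
sso_role_name = ExampleRole
region = us-east-1

[sso-session my-sso]
sso_start_url = https://example.awsapps.com/start
sso_region = us-east-1
sso_registration_scopes = sso:account:access
"""


def sso_profiles(config_text: str) -> dict[str, dict[str, str]]:
    """Return {profile_name: settings} for profiles that reference an sso-session."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    profiles = {}
    for section in parser.sections():
        # Named profiles are stored as "[profile <name>]" in ~/.aws/config.
        if section.startswith("profile ") and parser.has_option(section, "sso_session"):
            profiles[section.removeprefix("profile ")] = dict(parser[section])
    return profiles


profiles = sso_profiles(EXAMPLE_AWS_CONFIG)
print(sorted(profiles))
```

With a profile like this in place, running `aws sso login --profile openadapt-dev` and exporting `AWS_PROFILE=openadapt-dev` should be enough for boto3's default credential chain to pick up the SSO credentials, which is consistent with the note above that no code changes were needed.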
+- Update README with recent features from PRs #58-#75
+  ([#82](https://github.com/OpenAdaptAI/openadapt-evals/pull/82),
+  [`840f9ef`](https://github.com/OpenAdaptAI/openadapt-evals/commit/840f9efcdb7561fdad43bf80f6c87e0483443f2d))
+
+  Add coverage for RL training environment, end-to-end eval pipeline, annotation pipeline, 4-layer
+  probe diagnostics, demo recording persistence, review artifacts, coordinate clamping, and
+  multi-cloud VMProvider protocol. Update architecture tree with new modules (rl_env.py, probe.py,
+  annotation.py, vlm.py, vm_provider.py, evaluation/) and scripts directory. Add openadapt-consilium
+  to related projects.
+
+  Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
+### Features
+
+- Add self-contained GRPO training example script
+  ([#81](https://github.com/OpenAdaptAI/openadapt-evals/pull/81),
+  [`97c144b`](https://github.com/OpenAdaptAI/openadapt-evals/commit/97c144bbd346292eaa6c0a8b4ef5d3185868387d))
+
+  * feat: add self-contained GRPO training example script
+
+  250-line example showing the full RL training loop: model loading → rollout collection → GRPO loss →
+  weight update → checkpoint.
+
+  No openadapt-ml dependency: all GRPO math, action parsing, and log-prob computation are inline.
+  Uses RLEnvironment from openadapt-evals.
+
+  Includes --mock flag for testing without a VM.
+
+  Usage:
+    python scripts/train_grpo_example.py --mock --num-steps 3
+    python scripts/train_grpo_example.py --server http://localhost:5001 --task-id <UUID>
+
+  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+
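The "GRPO math" mentioned in this commit message is not shown in the diff. As a rough illustration of the core idea, the sketch below computes group-relative advantages: each rollout's reward is normalized against the mean and standard deviation of its own group. The function name `group_relative_advantages` and the use of the population standard deviation are assumptions made for this example, not details taken from `train_grpo_example.py`.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each rollout's reward against its group's mean and std.

    Hypothetical helper: illustrates the group-relative baseline used in
    GRPO-style training, not the script's actual implementation.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards against division by zero
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts of the same task: two succeeded (reward 1.0), two failed (reward 0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Successful rollouts receive positive advantages and failed ones negative, so no learned value function is needed as a baseline. When every rollout in a group earns the same reward, all advantages are zero and the group contributes no gradient.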
+  * fix: align GRPO training example with openadapt-ml trainer
+
+  - Align SYSTEM_PROMPT with openadapt_ml.datasets.next_action.SYSTEM_PROMPT
+  - Use chat template for prompt construction (not raw string concatenation)
+  - Fix screen height default: 1080 (was 1200)
+  - Fix LoRA target_modules: 4 projections (was 2), matching ml trainer
+  - Fix coordinate fallback: use format_action_as_text with normalized fractions (was using raw pixel coords like x=960)
+  - Add WAIT() handler in parse_action (was falling through to DONE)
+  - Fix TYPE regex to handle escaped quotes and backslashes
+  - Fix loss scaling: divide by (n_valid * num_steps), matching ml trainer
+  - Rename grpo_loss to policy_gradient_loss with honest docstring
+  - Add build_agent_messages and format_action_as_text helper functions
+
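Several of the fixes listed above concern action formatting and parsing. The sketch below shows one way those three behaviors could fit together: normalized-fraction coordinates, an explicit `WAIT()` branch, and a `TYPE` regex that tolerates escaped quotes and backslashes. The action grammar (`CLICK(x=..., y=...)`, `TYPE(text="...")`), the function signatures, the returned dict shape, and the 1920-pixel default width are all assumptions for illustration; only the function names and the 1080 height default come from the commit message.

```python
import re

# Assumed (hypothetical) action grammar: CLICK(x=0.50, y=0.50), TYPE(text="..."), WAIT(), DONE()
CLICK_RE = re.compile(r"CLICK\(\s*x=([0-9]*\.?[0-9]+)\s*,\s*y=([0-9]*\.?[0-9]+)\s*\)")
# The quoted body is a run of ordinary characters or backslash-escaped characters (\" or \\).
TYPE_RE = re.compile(r'TYPE\(\s*text="((?:[^"\\]|\\.)*)"\s*\)')


def format_action_as_text(x_px: int, y_px: int, width: int = 1920, height: int = 1080) -> str:
    """Render a click as normalized fractions of the screen rather than raw pixels."""
    return f"CLICK(x={x_px / width:.2f}, y={y_px / height:.2f})"


def parse_action(text: str) -> dict:
    """Parse model output into an action dict; unrecognized text falls back to DONE."""
    if m := CLICK_RE.search(text):
        return {"type": "click", "x": float(m.group(1)), "y": float(m.group(2))}
    if m := TYPE_RE.search(text):
        # Undo escaping: \" becomes " and \\ becomes \
        return {"type": "type", "text": re.sub(r"\\(.)", r"\1", m.group(1))}
    if "WAIT()" in text:
        # Handled explicitly so a wait is not mistaken for the DONE fallback.
        return {"type": "wait"}
    return {"type": "done"}


print(format_action_as_text(960, 540))
print(parse_action(r'TYPE(text="say \"hi\"")'))
```

Formatting with fractions keeps the text the model sees independent of screen resolution, which is presumably why raw values such as `x=960` were treated as a bug.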
+  ---------
+
+  Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
+
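The loss-scaling fix in this release divides by `(n_valid * num_steps)`. The diff does not define either term, so the sketch below rests on two loudly stated assumptions: `n_valid` is the number of rollouts that produced usable log-probabilities, and `num_steps` is the number of steps per rollout. It is a plain-Python illustration of an advantage-weighted policy gradient loss with that normalization, not the script's actual `policy_gradient_loss`.

```python
from typing import Optional


def policy_gradient_loss(
    log_probs: list[Optional[list[float]]],
    advantages: list[float],
    num_steps: int,
) -> float:
    """Advantage-weighted negative log-likelihood, scaled by n_valid * num_steps.

    Hypothetical signature: log_probs[i] holds the per-step log-probs of
    rollout i, or None when the rollout could not be scored (assumed meaning
    of "invalid").
    """
    valid = [(lp, adv) for lp, adv in zip(log_probs, advantages) if lp is not None]
    n_valid = len(valid)
    if n_valid == 0:
        return 0.0  # nothing to learn from in this batch
    total = sum(-adv * sum(lp) for lp, adv in valid)
    return total / (n_valid * num_steps)


# Three rollouts of two steps each; the second one is invalid and is skipped.
loss = policy_gradient_loss(
    log_probs=[[-1.0, -1.0], None, [-2.0, -2.0]],
    advantages=[1.0, 0.5, -1.0],
    num_steps=2,
)
print(loss)
```

Scaling by both factors makes the loss a per-step average over valid rollouts, so its magnitude does not grow with rollout length or with the number of rollouts that happened to parse.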
 ## v0.23.1 (2026-03-03)
 
 ### Bug Fixes

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "openadapt-evals"
-version = "0.23.1"
+version = "0.24.0"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
