Commit b89a15b

chore: release 0.68.0

Author: semantic-release
1 parent ba049f7, commit b89a15b

2 files changed

Lines changed: 27 additions & 1 deletion

CHANGELOG.md

Lines changed: 26 additions & 0 deletions
@@ -1,6 +1,32 @@
 # CHANGELOG
 
 
+## v0.68.0 (2026-03-23)
+
+### Features
+
+- Add standalone GRPO trainer with WAADirect (no openadapt-ml dependency)
+([#191](https://github.com/OpenAdaptAI/openadapt-evals/pull/191),
+[`ba049f7`](https://github.com/OpenAdaptAI/openadapt-evals/commit/ba049f74f82bb3034465cf4368f4808f22b5f7e9))
+
+Self-contained GRPO training package that eliminates the openadapt-ml dependency for RL training.
+Uses direct HTTP calls to WAA Flask server (WAADirect) instead of the WAALiveAdapter +
+RLEnvironment stack, removing version coupling and adapter indirection.
+
+Package structure (695 LOC total): - config.py: TrainingConfig dataclass - waa_direct.py: WAADirect
+HTTP client (screenshot/click/type/key) - prompt.py: SYSTEM_PROMPT + build_agent_messages +
+parse_vlm_output_to_action - reward.py: compute_group_advantages + evaluate_milestones_screenshot
+- model_loader.py: load_model_and_processor (HF + PEFT) - trainer.py: GRPOTrainer with rollout
+collection + training loop
+
+Key design decisions: - ZERO openadapt-ml imports (self-contained, will migrate later) -
+max_new_tokens=2048 default (100 was catastrophically low) - Multi-format parser (Thought/Action,
+bare DSL, JSON) - Fresh screenshot for evaluation (not cached) - Per-step backward to avoid OOM on
+long trajectories - VLM judge via OpenAI API for milestone evaluation
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+
 ## v0.67.0 (2026-03-23)
 
 ### Features
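
Note on `compute_group_advantages`: the changelog names this function in reward.py but does not show its body. As a rough illustration only, the usual GRPO formulation normalizes each rollout's reward against the mean and standard deviation of its own group. The sketch below assumes that formulation, a population standard deviation, and an `eps` parameter that is not taken from the commit; the real implementation may differ on all three.

```python
from statistics import mean, pstdev


def compute_group_advantages(rewards, eps=1e-8):
    """Sketch of group-relative advantages: (r - group_mean) / (group_std + eps).

    Assumption: this mirrors the common GRPO normalization, not necessarily
    the code in reward.py. `eps` is a hypothetical parameter that guards
    against division by zero when every rollout gets the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of the same task; two reached the milestone.
advantages = compute_group_advantages([1.0, 0.0, 0.0, 1.0])
```

When all rollouts in a group receive the same reward, every advantage is zero and the group contributes no gradient signal.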

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "openadapt-evals"
-version = "0.67.0"
+version = "0.68.0"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
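
Note on the multi-format parser: the v0.68.0 changelog entry lists three accepted output shapes (Thought/Action, bare DSL, JSON) but not their exact syntax. The sketch below shows one way such a fallback chain could work. The function name `parse_action_sketch`, the `name(arg=value, ...)` DSL shape, and the returned dict layout are all assumptions made for illustration; they are not the behavior of `parse_vlm_output_to_action`.

```python
import json
import re

# Hypothetical DSL shape: name(arg=value, ...), e.g. click(x=10, y=20)
_CALL_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)
_ARG_RE = re.compile(r"(\w+)\s*=\s*(\"[^\"]*\"|'[^']*'|[^,]+)")


def _parse_dsl(text):
    """Parse a single DSL call into {"type": name, **args}, or return None."""
    m = _CALL_RE.match(text)
    if not m:
        return None
    args = {}
    for key, raw in _ARG_RE.findall(m.group(2)):
        raw = raw.strip()
        if raw and raw[0] in "\"'":
            args[key] = raw[1:-1]  # quoted string: drop the quotes
        else:
            try:
                args[key] = float(raw) if "." in raw else int(raw)
            except ValueError:
                args[key] = raw  # leave unrecognized values as text
    return {"type": m.group(1), **args}


def parse_action_sketch(output):
    """Try Thought/Action, then JSON, then bare DSL; None if nothing matches."""
    text = output.strip()
    # Thought/Action: keep only what follows the first "Action:" marker.
    if "Action:" in text:
        text = text.split("Action:", 1)[1].strip()
    # JSON: accept only a top-level object.
    if text.startswith("{"):
        try:
            obj = json.loads(text)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
    # Bare DSL call.
    return _parse_dsl(text)


action = parse_action_sketch('Thought: open the menu\nAction: click(x=10, y=20)')
```

Returning `None` for unparseable output lets the caller decide how to treat a malformed step, for example by assigning it zero reward, rather than raising in the middle of a rollout.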
