Commit e5a3dd8

chore: release 0.49.0
semantic-release committed
1 parent ada912d commit e5a3dd8

2 files changed

Lines changed: 50 additions & 1 deletion

CHANGELOG.md

Lines changed: 49 additions & 0 deletions
@@ -1,6 +1,55 @@
 # CHANGELOG
 
 
+## v0.49.0 (2026-03-20)
+
+### Documentation
+
+- Comprehensive README update for planner-grounder, workflow, and training features
+([#158](https://github.com/OpenAdaptAI/openadapt-evals/pull/158),
+[`1cb83b3`](https://github.com/OpenAdaptAI/openadapt-evals/commit/1cb83b308717565984cd903a556f035c1135a170))
+
+Covers ~20 PRs merged since March 17 (#134-#157): PlannerGrounderAgent dual-model architecture,
+TaskConfig YAML custom tasks, 4-pass workflow extraction pipeline, RL training infra (TRL GRPO
+rollout, AReaL workflow, OpenEnv), LocalAdapter + ScrubMiddleware for governed desktop agent,
+correction flywheel, strict mode, and task setup dispatch. Updated architecture tree and key files
+table.
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+### Features
+
+- Add full evaluation runner with resume support and pool integration
+([#160](https://github.com/OpenAdaptAI/openadapt-evals/pull/160),
+[`ada912d`](https://github.com/OpenAdaptAI/openadapt-evals/commit/ada912d8a7c14532cc26f3b8a62ba3b2769e3996))
+
+Implement _run_external_agent in pool.py to support PlannerGrounderAgent and other external agents
+across pool VMs via SSH tunnels. Create run_full_eval.py script for robust unattended WAA
+evaluation runs with incremental JSONL checkpointing, per-task error isolation, exponential
+backoff retry on server drops, --resume to continue interrupted runs, --dry-run mode,
+--save-screenshots, progress display with ETA, and --parallel N for distributed pool execution.
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+- Implement install_apps handler via winget for WAA task setup
+([#159](https://github.com/OpenAdaptAI/openadapt-evals/pull/159),
+[`233295e`](https://github.com/OpenAdaptAI/openadapt-evals/commit/233295e454a310e323eefad6dfcf1656bd4f835c))
+
+Replace the warning-only stub in _config_entry_to_command with a working implementation that
+installs apps using Windows Package Manager (winget).
+
+- Map 16 common app names (chrome, firefox, libreoffice, vlc, vscode, 7zip, notepad++, gimp, obs,
+audacity, paint.net) to winget package IDs - Normalize app names (hyphens/spaces to underscores)
+to handle WAA config inconsistencies (e.g. "libreoffice-calc" vs "libreoffice_calc") - Deduplicate
+installs (e.g. libreoffice_calc + libreoffice_writer both map to
+TheDocumentFoundation.LibreOffice) - For unknown apps, fall back to winget search and install
+first match - Collect failures without crashing — each app install is independent - Use 600s HTTP
+timeout for install_apps (vs 120s default) since winget installs can take several minutes - Accept
+both success (rc=0) and already-installed (rc=-1978335189)
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+
 ## v0.48.5 (2026-03-20)
 
 ### Bug Fixes
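
For readers who want a concrete picture of the `install_apps` behaviour summarised in the v0.49.0 entry above, the following is a minimal sketch, not the project's actual code. The function names, the `WINGET_IDS` table, the `VideoLAN.VLC` entry, and the lower-casing step are assumptions made for illustration; only the LibreOffice package ID, the hyphen/space-to-underscore normalization, the de-duplication, and the two accepted return codes come from the changelog text.

```python
import re

# Illustrative subset only. The LibreOffice ID is named in the changelog;
# the VLC entry and the table layout are assumptions for this sketch.
WINGET_IDS = {
    "libreoffice_calc": "TheDocumentFoundation.LibreOffice",
    "libreoffice_writer": "TheDocumentFoundation.LibreOffice",
    "vlc": "VideoLAN.VLC",
}

# Return codes the changelog says are accepted: success and already-installed.
OK_RETURN_CODES = {0, -1978335189}


def normalize_app_name(name: str) -> str:
    """Lower-case the name and turn runs of hyphens/spaces into one underscore."""
    return re.sub(r"[-\s]+", "_", name.strip().lower())


def resolve_packages(app_names):
    """Return (package_ids, unknown_names), de-duplicated and order-preserving."""
    package_ids, unknown = [], []
    for raw in app_names:
        key = normalize_app_name(raw)
        package_id = WINGET_IDS.get(key)
        if package_id is None:
            # Unknown apps would go to a search-based fallback (not shown here).
            if key not in unknown:
                unknown.append(key)
        elif package_id not in package_ids:
            package_ids.append(package_id)
    return package_ids, unknown


def install_succeeded(return_code: int) -> bool:
    """Treat both a clean install and 'already installed' as success."""
    return return_code in OK_RETURN_CODES
```

Under these assumptions, `resolve_packages(["libreoffice-calc", "LibreOffice Writer"])` yields a single LibreOffice package ID, which is the de-duplication case the changelog mentions.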

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "openadapt-evals"
-version = "0.48.5"
+version = "0.49.0"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
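
The evaluation runner described in the 0.49.0 CHANGELOG entry combines incremental JSONL checkpointing, per-task error isolation, exponential backoff, and `--resume`. The sketch below shows one way those pieces could fit together. It is not taken from `run_full_eval.py`: every name here (`load_completed`, `with_backoff`, `run_all`, the `task_id` field, the retry count and delays) is hypothetical, and treating `ConnectionError` as the "server drop" signal is an assumption.

```python
import json
import time
from pathlib import Path


def load_completed(checkpoint: Path) -> set:
    """Collect task IDs already recorded in the JSONL checkpoint, if any."""
    if not checkpoint.exists():
        return set()
    done = set()
    with checkpoint.open() as fh:
        for line in fh:
            line = line.strip()
            if line:
                done.add(json.loads(line)["task_id"])
    return done


def with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on ConnectionError with delays of 1x, 2x, 4x base_delay."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)


def run_all(task_ids, run_task, checkpoint: Path, resume=False, sleep=time.sleep):
    """Run each task, appending one JSON line per task as soon as it finishes."""
    done = load_completed(checkpoint) if resume else set()
    with checkpoint.open("a" if resume else "w") as fh:
        for task_id in task_ids:
            if task_id in done:
                continue  # already recorded by an earlier, interrupted run
            try:
                result = with_backoff(lambda: run_task(task_id), sleep=sleep)
                record = {"task_id": task_id, "status": "ok", "result": result}
            except Exception as exc:  # isolate failures to the current task
                record = {"task_id": task_id, "status": "error", "error": str(exc)}
            fh.write(json.dumps(record) + "\n")
            fh.flush()  # keep the checkpoint current if the process dies
```

In this sketch a task that ended in an error is still recorded and therefore skipped on resume; a real runner might instead choose to retry failed tasks.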
