Commit c6b1fdd

chore: release 0.70.2
Author: semantic-release
1 parent fa1a9c4, commit c6b1fdd

2 files changed

Lines changed: 25 additions & 1 deletion

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -1,6 +1,30 @@
 # CHANGELOG
 
 
+## v0.70.2 (2026-03-26)
+
+### Bug Fixes
+
+- Switch distillation collection to WAADirect for reliable task setup
+([#194](https://github.com/OpenAdaptAI/openadapt-evals/pull/194),
+[`fa1a9c4`](https://github.com/OpenAdaptAI/openadapt-evals/commit/fa1a9c4a9fc830315b99f655228e03e7aa4cd434))
+
+Replace RLEnvironment + WAALiveAdapter with WAADirect in the distillation data collection script.
+The adapter layer fails on custom YAML task IDs and doesn't reset the environment properly.
+
+Key changes: - Load task configs from --task-dir (YAML/JSON files) via TaskConfig.from_dir() - Use
+WAADirect.setup_task(task_config.to_waa_config()) for environment reset - Use
+WAADirect.screenshot() and execute_action() instead of env.step() - Evaluate via
+evaluate_milestones_screenshot() on fresh post-episode screenshot - Fix Anthropic API call: always
+use max_tokens (not max_completion_tokens) - Add --eval-model flag for milestone VLM evaluation
+model - Add --task-dir as required arg (replaces server-side task discovery)
+
+Kept unchanged: TeacherAgent, PlannerTrajectoryLogger (keep_failed=True), CostTracker, resume
+support, graceful shutdown handling.
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+
 ## v0.70.1 (2026-03-26)
 
 ### Bug Fixes
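
Reviewer note: the v0.70.2 entry above makes `--task-dir` a required argument and adds an `--eval-model` flag. The sketch below shows one way that CLI surface could look with `argparse`. Only the two flag names come from the changelog. The help strings, the `None` default for `--eval-model`, and the `list_task_files` helper are assumptions for illustration. In particular, `list_task_files` is a hypothetical stand-in and not the project's `TaskConfig.from_dir()`.

```python
import argparse
from pathlib import Path

# File suffixes treated as task configs; the changelog only says "YAML/JSON files",
# so the exact set of extensions is an assumption.
TASK_SUFFIXES = {".yaml", ".yml", ".json"}


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser with the two flags named in the changelog."""
    parser = argparse.ArgumentParser(
        description="Distillation data collection (illustrative sketch)"
    )
    # Required: tasks are read from local files rather than discovered server-side.
    parser.add_argument(
        "--task-dir",
        required=True,
        help="Directory containing YAML/JSON task configs",
    )
    # Optional: model used for milestone VLM evaluation. The None default is assumed.
    parser.add_argument(
        "--eval-model",
        default=None,
        help="Model used for milestone evaluation",
    )
    return parser


def list_task_files(task_dir: str) -> list[Path]:
    """Hypothetical helper: return task config files in task_dir, sorted by name."""
    return sorted(
        path
        for path in Path(task_dir).iterdir()
        if path.is_file() and path.suffix.lower() in TASK_SUFFIXES
    )
```

With this sketch, `build_parser().parse_args(["--task-dir", "tasks"])` yields `task_dir == "tasks"`, and omitting `--task-dir` makes `argparse` exit with a usage error, which matches the "required arg" wording in the entry.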

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "openadapt-evals"
-version = "0.70.1"
+version = "0.70.2"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
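
Reviewer note: the changelog entry in this release mentions a fix to the Anthropic API call: always use `max_tokens`, not `max_completion_tokens`. Anthropic's Messages API takes the output cap as `max_tokens`, while OpenAI's Chat Completions API accepts `max_completion_tokens`, so code that serves both providers has to pick the keyword per provider. The sketch below illustrates that idea only. The function names, the provider strings, and the request shape are hypothetical and do not come from the commit, and the commit does not say whether the script also has an OpenAI code path.

```python
from typing import Any


def token_limit_kwargs(provider: str, limit: int) -> dict[str, int]:
    """Return the output-token cap under the keyword the given API expects."""
    if provider == "anthropic":
        # Anthropic's Messages API uses `max_tokens`.
        return {"max_tokens": limit}
    if provider == "openai":
        # OpenAI's Chat Completions API accepts `max_completion_tokens`.
        return {"max_completion_tokens": limit}
    raise ValueError(f"unknown provider: {provider!r}")


def build_request(
    provider: str, model: str, messages: list[dict[str, Any]], limit: int
) -> dict[str, Any]:
    """Assemble request keyword arguments without making any network call."""
    return {
        "model": model,
        "messages": messages,
        **token_limit_kwargs(provider, limit),
    }
```

Keeping the keyword choice in one small function means a request built for Anthropic can never carry `max_completion_tokens`, which is the failure the changelog entry describes.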
