Commit 713cb90

chore: release 0.30.2

Author: semantic-release
1 parent eb9bc3e commit 713cb90

2 files changed

Lines changed: 31 additions & 1 deletion

CHANGELOG.md

Lines changed: 30 additions & 0 deletions
@@ -1,6 +1,36 @@
 # CHANGELOG
 
 
+## v0.30.2 (2026-03-04)
+
+### Bug Fixes
+
+- Condense multilevel demo PLAN from 13 to 5 phases
+([`7a63fa1`](https://github.com/OpenAdaptAI/openadapt-evals/commit/7a63fa134de5a2602341a898d8606e89112f0857))
+
+Research (ShowUI-Aloha) recommends 3-7 high-level phases in the PLAN section. The rule-based
+generator produced 13 granular steps (one per demo action), which defeats the purpose of having an
+abstract plan.
+
+Condensed to 5 phases: create sheet, headers, years, formulas+fill, format as percentage.
+
+Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
+- Prefer multilevel demo files over plain .txt in eval scripts
+([#103](https://github.com/OpenAdaptAI/openadapt-evals/pull/103),
+[`eb9bc3e`](https://github.com/OpenAdaptAI/openadapt-evals/commit/eb9bc3eeff06c58ea359eb2c4e56e76365b5b561))
+
+When both {task_id}_multilevel.txt and {task_id}.txt exist in the demo directory, all demo file
+lookup paths now prefer the multilevel (Option D) format. Falls back to plain .txt, then .json for
+backwards compatibility.
+
+Files changed: - scripts/run_dc_eval.py - scripts/run_eval_pipeline.py -
+openadapt_evals/benchmarks/cli.py (_suite_find_demo) -
+openadapt_evals/benchmarks/comparison_viewer.py
+
+Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
+
 ## v0.30.1 (2026-03-04)
 
 ### Bug Fixes

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "openadapt-evals"
-version = "0.30.1"
+version = "0.30.2"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
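
The second v0.30.2 changelog entry describes a lookup order for demo files: `{task_id}_multilevel.txt` first, then `{task_id}.txt`, then `.json`. A minimal sketch of that fallback is shown below. The function name `find_demo_file`, its signature, and the `{task_id}.json` filename are assumptions made for illustration; the actual code in `scripts/run_dc_eval.py` or `_suite_find_demo` is not shown in this commit and may differ.

```python
from pathlib import Path
from typing import Optional

# Suffixes in order of preference, following the changelog text:
# multilevel (Option D) first, then plain .txt, then .json.
# The exact .json filename pattern is an assumption.
DEMO_SUFFIXES = ("_multilevel.txt", ".txt", ".json")


def find_demo_file(demo_dir: Path, task_id: str) -> Optional[Path]:
    """Hypothetical helper: return the most preferred demo file, or None."""
    for suffix in DEMO_SUFFIXES:
        candidate = demo_dir / f"{task_id}{suffix}"
        if candidate.is_file():
            return candidate
    return None
```

Keeping the preference order in a single tuple means every lookup path can share the same rule, which is the kind of consistency the changelog entry says the fix was after.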
