Commit 0922b0a

chore: release 0.81.3
semantic-release committed
1 parent 3b8c1c2 commit 0922b0a

2 files changed

Lines changed: 24 additions & 1 deletion

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -1,6 +1,29 @@
 # CHANGELOG
 
 
+## v0.81.3 (2026-03-29)
+
+### Bug Fixes
+
+- Try local eval before slow /evaluate endpoint in evaluate_dense
+([#245](https://github.com/OpenAdaptAI/openadapt-evals/pull/245),
+[`3b8c1c2`](https://github.com/OpenAdaptAI/openadapt-evals/commit/3b8c1c2b6317a693fec2e97cf8aa459205f1be4d))
+
+51% of TRL training time wasted on 5050 evaluate timeouts (180s × 3 retries = 9 min per evaluation).
+The local evaluation via evaluate_checks_local takes ~5s.
+
+Fix: when task config has checks defined, try local eval FIRST. Only
+
+fall through to the slow /evaluate endpoint when no local checks exist. This eliminates the 9-minute
+timeout for custom YAML tasks that define their own checks.
+
+Before: evaluate() [9 min] → if 0.0 → local [5s]
+
+After: local [5s] → if no checks → evaluate() [9 min]
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+
 ## v0.81.2 (2026-03-29)
 
 ### Bug Fixes
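
The ordering change described in the v0.81.3 entry above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked pull request: the helper name `evaluate_local_first`, the `"checks"` key, and the two callables standing in for the local check runner and the remote /evaluate call are all hypothetical. The constants reproduce the arithmetic quoted in the entry (180 s × 3 retries = 540 s, i.e. 9 minutes).

```python
from typing import Any, Callable, Mapping, Sequence

# Figures quoted in the changelog entry: 180 s timeout, 3 retries.
REMOTE_TIMEOUT_S = 180
REMOTE_RETRIES = 3
WORST_CASE_REMOTE_S = REMOTE_TIMEOUT_S * REMOTE_RETRIES  # 540 s = 9 min


def evaluate_local_first(
    task_config: Mapping[str, Any],
    local_eval: Callable[[Sequence[Any]], float],
    remote_eval: Callable[[Mapping[str, Any]], float],
) -> float:
    """Hypothetical helper: run local checks if any exist, else use the slow endpoint."""
    # The "checks" key name is an assumption made for this sketch.
    checks = task_config.get("checks")
    if checks:
        # Fast path: the task defines its own checks, so the remote call is skipped.
        return local_eval(checks)
    # Slow path: no local checks are defined, fall through to the remote evaluation.
    return remote_eval(task_config)
```

Passing the two evaluators in as callables keeps the sketch independent of how the real project performs either evaluation; a task config with a non-empty `checks` list never reaches `remote_eval`, which is the behavior the entry describes for custom YAML tasks.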

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "openadapt-evals"
-version = "0.81.2"
+version = "0.81.3"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
