Commit 855dba1

chore: release 0.20.0
Author: semantic-release
1 parent 47a8168 commit 855dba1

2 files changed

Lines changed: 51 additions & 1 deletion

CHANGELOG.md

Lines changed: 50 additions & 0 deletions
@@ -1,6 +1,56 @@
 # CHANGELOG


+## v0.20.0 (2026-03-02)
+
+### Features
+
+- Add smoke-test-aws CLI command with full lifecycle test
+([#72](https://github.com/OpenAdaptAI/openadapt-evals/pull/72),
+[`47a8168`](https://github.com/OpenAdaptAI/openadapt-evals/commit/47a8168b7801080352fa17ba11566dee48c78539))
+
+* feat: add `smoke-test-aws` CLI command with full lifecycle test
+
+Add `oa-vm smoke-test-aws` command that runs incremental verification stages against real AWS
+infrastructure:
+
+Read-only stages (default): 1. AWS credentials (STS get_caller_identity) 2. SSH public key
+(~/.ssh/id_rsa.pub) 3. AMI lookup (latest Ubuntu 22.04 LTS) 4. Instance type availability
+(find_available_size_and_region) 5. VPC infrastructure (ensure_vpc_infrastructure)
+
+Full lifecycle stages (--full): 6. Create VM (m5a.xlarge, $0.17/hr) 7. SSH connectivity
+(wait_for_ssh + hostname) 8. Stop/Start cycle (deallocate -> start -> verify IP refresh) 9.
+Cleanup (delete -> verify terminated)
+
+Also fixes two bugs in AWSVMManager discovered during testing: - deallocate_vm: now waits for
+'stopped' state before returning (previously returned immediately, causing start_vm to fail with
+IncorrectInstanceState) - delete_vm: now waits for 'terminated' state before returning (previously
+returned immediately, so callers couldn't verify termination)
+
+Tested: 9/9 stages passed on real AWS (us-east-1, ~1m42s total).
+
+Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+
+* docs: add screenshots of Windows 11 running on AWS EC2
+
+Screenshots captured from m5.metal instance in us-east-1: - aws-waa-installing.png: Windows 11
+installer at 42% on EC2 - aws-waa-windows-desktop.png: Full Windows 11 desktop with Start menu
+
+Proves the full WAA stack works on AWS: EC2 m5.metal → Docker → QEMU/KVM → Windows 11 with all
+benchmark apps (Notepad, Calculator, Settings, Edge, etc.)
+
+* chore: sync beads state
+
+* docs: add AWS support section with cost analysis to CLAUDE.md
+
+Documents AWS workflow (smoke-test-aws, pool commands with --cloud aws), m5.metal cost breakdown per
+phase, and references the Windows 11 screenshot.
+
+---------
+
+Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+
+
 ## v0.19.2 (2026-03-02)

 ### Bug Fixes
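
The v0.20.0 entry describes `smoke-test-aws` as a sequence of incremental stages, with the resource-creating ones gated behind `--full`. The sketch below shows one way such a staged runner could be structured. It is not the project's implementation: `Stage`, `run_stages`, and the stop-at-first-failure policy are assumptions made for illustration, and the stage callables in a real run would wrap the AWS calls named in the changelog.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One verification step (hypothetical type, not from the repository)."""
    name: str
    check: Callable[[], object]  # returns a detail value, raises on failure
    full_only: bool = False      # True for stages that create billable resources


def run_stages(stages: List[Stage], full: bool = False) -> List[Tuple[str, bool, str]]:
    """Run stages in order and return (name, passed, detail) tuples.

    Assumption: the run stops at the first failure, because later stages
    (for example SSH connectivity) depend on earlier ones (VM creation).
    """
    results: List[Tuple[str, bool, str]] = []
    for stage in stages:
        if stage.full_only and not full:
            continue  # read-only mode skips anything that creates resources
        try:
            detail = stage.check()
        except Exception as exc:
            results.append((stage.name, False, str(exc)))
            break
        results.append((stage.name, True, str(detail)))
    return results
```

A real lifecycle run would likely also want cleanup to execute even after a mid-run failure, so that a created VM is never left running; that is omitted here for brevity.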

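The two `AWSVMManager` fixes in the v0.20.0 entry share one pattern: block until the instance reaches a target state (`stopped` or `terminated`) instead of returning as soon as the request is accepted. The changelog does not say how the wait is implemented, so the following is only a minimal polling sketch. `wait_for_state` and its parameters are hypothetical names, and the state lookup is injected as a callable so the example runs without AWS access.

```python
import time
from typing import Callable


def wait_for_state(
    get_state: Callable[[], str],
    target: str,
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_state() until it returns target, or raise TimeoutError.

    get_state is a stand-in for a lookup such as reading the instance
    state name from an EC2 describe_instances response.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == target:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"instance still {state!r} after {timeout}s, wanted {target!r}"
            )
        sleep(interval)
```

With boto3, the built-in EC2 waiters give the same effect without a hand-written loop, for example `ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])` and the corresponding `instance_terminated` waiter.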
pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "openadapt-evals"
-version = "0.19.2"
+version = "0.20.0"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
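
The bump from 0.19.2 to 0.20.0 is consistent with the conventional-commit rule that a `feat:` commit triggers a minor release, while `docs:` and `chore:` commits trigger none. The sketch below illustrates that rule only. It is a simplified stand-in, not semantic-release's actual logic: `next_version` is a hypothetical helper, and breaking-change handling is deliberately left out.

```python
import re


def next_version(current: str, subjects: list[str]) -> str:
    """Return the next version implied by conventional-commit subjects.

    Simplified: only plain 'type:' and 'type(scope):' prefixes are
    recognised; breaking-change markers are ignored.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    types = set()
    for subject in subjects:
        match = re.match(r"^(\w+)(\([^)]*\))?:", subject)
        if match:
            types.add(match.group(1))
    if "feat" in types:
        return f"{major}.{minor + 1}.0"        # new feature: minor bump
    if types & {"fix", "perf"}:
        return f"{major}.{minor}.{patch + 1}"  # fix or perf: patch bump
    return current                             # docs, chore, etc.: no release


subjects = [
    "feat: add `smoke-test-aws` CLI command with full lifecycle test",
    "docs: add screenshots of Windows 11 running on AWS EC2",
    "chore: sync beads state",
]
print(next_version("0.19.2", subjects))  # prints 0.20.0
```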
