1 | 1 | # CHANGELOG |
2 | 2 |
3 | 3 |
| 4 | +## v0.24.0 (2026-03-03) |
| 5 | + |
| 6 | +### Documentation |
| 7 | + |
| 8 | +- Document AWS SSO as recommended auth method |
| 9 | + ([#80](https://github.com/OpenAdaptAI/openadapt-evals/pull/80), |
| 10 | + [`8812e7c`](https://github.com/OpenAdaptAI/openadapt-evals/commit/8812e7c69294d9f80c3cde723fa8838b02cad550)) |
| 11 | + |
| 12 | +- Update README: replace static key instructions with SSO guide including example ~/.aws/config and aws configure sso workflow |
| 13 | +- Update CLAUDE.md AWS section with SSO note |
| 14 | +- Update aws_vm.py docstring to include SSO in credential chain |
| 15 | + |
| 16 | +No code changes needed — boto3's default credential chain already handles SSO transparently. |
| 17 | + |
| 18 | +Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
| 19 | + |
| 20 | +- Update README with recent features from PRs #58-#75 |
| 21 | + ([#82](https://github.com/OpenAdaptAI/openadapt-evals/pull/82), |
| 22 | + [`840f9ef`](https://github.com/OpenAdaptAI/openadapt-evals/commit/840f9efcdb7561fdad43bf80f6c87e0483443f2d)) |
| 23 | + |
| 24 | +Add coverage for RL training environment, end-to-end eval pipeline, annotation pipeline, 4-layer |
| 25 | + probe diagnostics, demo recording persistence, review artifacts, coordinate clamping, and |
| 26 | + multi-cloud VMProvider protocol. Update architecture tree with new modules (rl_env.py, probe.py, |
| 27 | + annotation.py, vlm.py, vm_provider.py, evaluation/) and scripts directory. Add openadapt-consilium |
| 28 | + to related projects. |
| 29 | + |
| 30 | +Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
| 31 | + |
| 32 | +### Features |
| 33 | + |
| 34 | +- Add self-contained GRPO training example script |
| 35 | + ([#81](https://github.com/OpenAdaptAI/openadapt-evals/pull/81), |
| 36 | + [`97c144b`](https://github.com/OpenAdaptAI/openadapt-evals/commit/97c144bbd346292eaa6c0a8b4ef5d3185868387d)) |
| 37 | + |
| 38 | +* feat: add self-contained GRPO training example script |
| 39 | + |
| 40 | +250-line example showing the full RL training loop: model loading → rollout collection → GRPO loss → |
| 41 | + weight update → checkpoint. |
| 42 | + |
| 43 | +No openadapt-ml dependency — all GRPO math, action parsing, and log-prob computation are inline. |
| 44 | + Uses RLEnvironment from openadapt-evals. |
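The "GRPO math" mentioned above can be illustrated with a minimal sketch, which is not the code from `scripts/train_grpo_example.py`. It assumes group-relative advantages are rewards standardized within one rollout group, and that the loss is a plain policy-gradient sum divided by the number of valid rollouts times the number of steps. The function names, argument shapes, and the `eps` term are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one rollout group (assumed GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout earned the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def policy_gradient_loss(step_log_probs, advantages, num_steps):
    """Average of -advantage * log-prob over valid rollouts and steps.

    step_log_probs holds one list of per-step log-probs per rollout;
    an empty list marks a rollout with no parseable actions (hypothetical convention).
    """
    valid = [(lps, adv) for lps, adv in zip(step_log_probs, advantages) if lps]
    if not valid:
        return 0.0
    total = sum(-adv * sum(lps) for lps, adv in valid)
    return total / (len(valid) * num_steps)


if __name__ == "__main__":
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    log_probs = [[-1.0, -1.0], [-2.0, -2.0], [-0.5, -0.5], []]
    print(policy_gradient_loss(log_probs, advantages, num_steps=2))
```

In a real trainer the log-probs would be tensors with gradients attached; plain floats are used here only so the arithmetic can be checked by hand.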
| 45 | + |
| 46 | +Includes --mock flag for testing without a VM. |
| 47 | + |
| 48 | +Usage: python scripts/train_grpo_example.py --mock --num-steps 3 |
| 49 | + python scripts/train_grpo_example.py --server http://localhost:5001 --task-id <UUID> |
| 50 | + |
| 51 | +Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
| 52 | + |
| 53 | +* fix: align GRPO training example with openadapt-ml trainer |
| 54 | + |
| 55 | +- Align SYSTEM_PROMPT with openadapt_ml.datasets.next_action.SYSTEM_PROMPT - Use chat template for |
| 56 | + prompt construction (not raw string concatenation) - Fix screen height default: 1080 (was 1200) - |
| 57 | + Fix LoRA target_modules: 4 projections (was 2) matching ml trainer - Fix coordinate fallback: use |
| 58 | + format_action_as_text with normalized fractions (was using raw pixel coords like x=960) - Add |
| 59 | + WAIT() handler in parse_action (was falling through to DONE) - Fix TYPE regex to handle escaped |
| 60 | + quotes and backslashes - Fix loss scaling: divide by (n_valid * num_steps) matching ml trainer - |
| 61 | + Rename grpo_loss to policy_gradient_loss with honest docstring - Add build_agent_messages and |
| 62 | + format_action_as_text helper functions |
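Several of the fixes listed above concern action parsing and formatting. The sketch below shows one way those behaviours could look. The action syntax (`CLICK(x=..., y=...)`, `TYPE(text="...")`, `WAIT()`, `DONE()`), the function signatures, and the 1920-pixel default width are assumptions made for illustration. Only the names `parse_action` and `format_action_as_text`, the 1080 height default, and the `x=960` pixel example come from the changelog.

```python
import re

# Assumed action grammar; the real script's syntax may differ.
CLICK_RE = re.compile(r'CLICK\(x=([0-9]*\.?[0-9]+),\s*y=([0-9]*\.?[0-9]+)\)')
# The text body accepts any non-quote, non-backslash char, or a backslash escape pair.
TYPE_RE = re.compile(r'TYPE\(text="((?:[^"\\]|\\.)*)"\)')


def _clamp01(value):
    """Clamp a fraction into the [0, 1] range."""
    return min(max(value, 0.0), 1.0)


def format_action_as_text(x_px, y_px, width=1920, height=1080):
    """Render a click as normalized fractions instead of raw pixel coordinates."""
    return f"CLICK(x={_clamp01(x_px / width):.2f}, y={_clamp01(y_px / height):.2f})"


def parse_action(text):
    """Parse model output into an action dict; unrecognized output falls back to done."""
    match = CLICK_RE.search(text)
    if match:
        return {"type": "click",
                "x": _clamp01(float(match.group(1))),
                "y": _clamp01(float(match.group(2)))}
    match = TYPE_RE.search(text)
    if match:
        # Simplified unescape: drop the backslash and keep the escaped character.
        return {"type": "type", "text": re.sub(r'\\(.)', r'\1', match.group(1))}
    if "WAIT()" in text:
        # Handled explicitly so a wait does not fall through to the done fallback.
        return {"type": "wait"}
    return {"type": "done"}


if __name__ == "__main__":
    print(format_action_as_text(960, 540))
    print(parse_action('TYPE(text="say \\"hi\\"")'))
    print(parse_action("WAIT()"))
```

Under these assumptions a centre-screen fallback click is emitted as fractions of the screen size, so the text the model is trained on stays in the same coordinate space as the text it is asked to produce.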
| 63 | + |
| 64 | +--------- |
| 65 | + |
| 66 | +Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
| 67 | + |
| 68 | + |
4 | 69 | ## v0.23.1 (2026-03-03) |
5 | 70 |
6 | 71 | ### Bug Fixes |