1 | 1 | # CHANGELOG |
2 | 2 |
|
3 | 3 |
|
| 4 | +## v0.18.0 (2026-03-02) |
| 5 | + |
| 6 | +### Features |
| 7 | + |
| 8 | +- Migrate annotation pipeline from openadapt-ml to openadapt-evals |
| 9 | + ([#64](https://github.com/OpenAdaptAI/openadapt-evals/pull/64), |
| 10 | + [`7896051`](https://github.com/OpenAdaptAI/openadapt-evals/commit/7896051e514aedd647faeba0383e4acba9bea5ab)) |
| 11 | + |
| 12 | +* feat: migrate annotation pipeline from openadapt-ml to openadapt-evals |
| 13 | + |
| 14 | +Move annotation data classes, prompts, and utilities into openadapt_evals.annotation and consolidate |
| 15 | + three separate VLM call implementations into a shared openadapt_evals.vlm module. |
| 16 | + |
| 17 | +- New openadapt_evals/vlm.py: unified vlm_call() supporting consilium council, OpenAI, and Anthropic; |
| 18 | + extract_json() for LLM output parsing; image_bytes_from_path() helper |
| 19 | +- New openadapt_evals/annotation.py: AnnotatedStep/AnnotatedDemo data classes, |
| 20 | + ANNOTATION_SYSTEM_PROMPT/ANNOTATION_STEP_PROMPT constants, parse_annotation_response(), |
| 21 | + validate_annotations(), format_annotated_demo() |
| 22 | +- Updated scripts/record_waa_demos.py cmd_annotate_waa() to import from openadapt_evals instead of openadapt_ml |
| 23 | +- Updated scripts/refine_demo.py to use shared vlm_call/extract_json, refactored message builders to prompt+images interface |
| 24 | +- Updated scripts/convert_recording_to_demo.py to use shared vlm_call |
| 25 | +- 16 new tests in tests/test_annotation.py, all existing tests pass |
| 26 | + |
| 27 | +Closes #59 |
| 28 | + |
| 29 | +Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
| 30 | + |
| 31 | +* fix: remove unused import and hoist model resolution in convert_recording_to_demo |
| 32 | + |
| 33 | +- Remove unused `import os` from openadapt_evals/vlm.py |
| 34 | +- Move `resolved_model` computation before the for-loop in convert_vlm() so it's computed once |
| 35 | + instead of redundantly inside each step's try block |
| 36 | + |
| 37 | +* fix: add timeouts, fix temperature regression, remove dead api_key param |
| 38 | + |
| 39 | +- vlm.py: add timeout=120s to OpenAI/Anthropic SDK clients to prevent indefinite hangs |
| 40 | + (old code had explicit timeouts via requests) |
| 41 | +- vlm.py: pass system prompt separately to consilium council_query() instead of concatenating into user prompt |
| 42 | +- refine_demo.py: explicitly pass temperature=1.0 to vlm_call() in holistic and per-step review to match old behavior |
| 43 | + (vlm_call defaults to 0.1, which would be an unintended behavioral change) |
| 44 | +- refine_demo.py: remove dead api_key parameter from run_holistic_review, run_per_step_review, |
| 45 | + refine_recording, and main(); vlm_call() reads API keys from environment via the SDK |
| 46 | + |
| 47 | +--------- |
| 48 | + |
| 49 | +Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
| 50 | + |
| 51 | +### Refactoring |
| 52 | + |
| 53 | +- Deduplicate recording artifacts and use JPEG thumbnails |
| 54 | + ([#65](https://github.com/OpenAdaptAI/openadapt-evals/pull/65), |
| 55 | + [`f60df10`](https://github.com/OpenAdaptAI/openadapt-evals/commit/f60df10a56cea3031e254a4f573a8487dc73b5e3)) |
| 56 | + |
| 57 | +- Remove docs/artifacts/full/ (was a copy of waa_recordings/ PNGs) - Thumbnails now link to |
| 58 | + originals in waa_recordings/ for full-res - Switch thumbnails from PNG to JPEG (1.5 MB vs 3.0 MB |
| 59 | + for same images) - Un-gitignore waa_recordings/ (research data, should be tracked) - Gitignore |
| 60 | + docs/artifacts/full/ instead (regenerable) - Untrack benchmark_results/ (mock test output, already |
| 61 | + gitignored) - Move os import to module level in generate_demo_review.py |
| 62 | + |
| 63 | +Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
| 64 | + |
| 65 | + |
4 | 66 | ## v0.17.1 (2026-03-02) |
5 | 67 |
|
6 | 68 | ### Bug Fixes |