Commit 13ad511

chore: release 0.82.0

Author: semantic-release
1 parent fa26d55 commit 13ad511

2 files changed

Lines changed: 43 additions & 1 deletion

CHANGELOG.md

Lines changed: 42 additions & 0 deletions
@@ -1,6 +1,48 @@
 # CHANGELOG


+## v0.82.0 (2026-03-29)
+
+### Features
+
+- Vlmmodelwrapper — multimodal compatibility layer for TRL
+([#251](https://github.com/OpenAdaptAI/openadapt-evals/pull/251),
+[`fa26d55`](https://github.com/OpenAdaptAI/openadapt-evals/commit/fa26d553d05e6bcae1f41191d52375331888095e))
+
+* feat: VLMModelWrapper — multimodal compatibility layer for TRL
+
+TRL's GRPOTrainer calls model.forward(input_ids=...) during training without pixel_values. VLMs need
+pixel_values to produce meaningful logits. Without them, the model is blind and generates garbage.
+
+VLMModelWrapper caches vision tensors during rollout generation (when we have the images) and
+injects them during TRL's forward pass. This is the standard adapter pattern — 120 lines, no TRL
+internals modified.
+
+- vlm_wrapper.py: VLMModelWrapper with cache_vision_inputs + forward - trl_wrapper.py: wraps model
+before passing to GRPOTrainer - trl_rollout.py: calls cache_vision_inputs before model.generate -
+9 tests covering injection, delegation, cache behavior, warnings
+
+Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+* test: add e2e tests for VLM+TRL pipeline and wrapper integration
+
+5 e2e tests (@pytest.mark.heavy, CPU-only, skipped in CI): - test_generation_sees_pixel_values:
+model not blind during rollout - test_trl_forward_gets_cached_pixel_values: wrapper injects into
+TRL - test_output_format_not_garbage: prompt has DSL format guidance -
+test_no_thinking_tokens_in_template: no <think> in chat template - test_vision_changes_logits:
+pixel_values actually affect logits
+
+2 integration tests (light, runs in CI): - test_wrapper_used_in_train_source: VLMModelWrapper in
+trl_wrapper - test_generate_fn_calls_cache_vision_inputs: cache call in rollout
+
+Each test maps to a bug class from the March 29 session. Together they prevent the entire class of
+multimodal TRL failures before they reach the customer.
+
+---------
+
+Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+
+
 ## v0.81.9 (2026-03-29)

 ### Bug Fixes
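
The cache-then-inject adapter described in the v0.82.0 entry can be sketched roughly as follows. This is a minimal illustration, not the code in `vlm_wrapper.py`: the class name `VisionCachingWrapper`, the `FakeVLM` stand-in, and the exact warning behavior are assumptions made for the example. Only the method names `cache_vision_inputs` and `forward` come from the changelog.

```python
import warnings
from typing import Any, Dict


class VisionCachingWrapper:
    """Hypothetical adapter: remembers vision inputs and re-injects them."""

    def __init__(self, model: Any) -> None:
        self._model = model
        self._vision_cache: Dict[str, Any] = {}

    def cache_vision_inputs(self, **vision_inputs: Any) -> None:
        # Called during rollout generation, while the images are available.
        self._vision_cache = dict(vision_inputs)

    def forward(self, *args: Any, **kwargs: Any) -> Any:
        # Fill in cached tensors only where the caller supplied nothing.
        for key, value in self._vision_cache.items():
            if kwargs.get(key) is None:
                kwargs[key] = value
        if kwargs.get("pixel_values") is None:
            # Assumed behavior: warn rather than fail when the model is "blind".
            warnings.warn(
                "forward() called without pixel_values and nothing cached",
                stacklevel=2,
            )
        return self._model.forward(*args, **kwargs)

    __call__ = forward

    def __getattr__(self, name: str) -> Any:
        # Delegate everything else (generate, config, ...) to the wrapped model.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._model, name)


class FakeVLM:
    """Stand-in model that reports whether it received an image."""

    def forward(self, input_ids=None, pixel_values=None):
        return {"saw_image": pixel_values is not None, "n_tokens": len(input_ids)}

    def generate(self, input_ids=None, **kwargs):
        return list(input_ids)
```

In this sketch a trainer that only passes `input_ids` still reaches the model with `pixel_values` attached, provided `cache_vision_inputs` was called earlier in the rollout. Because the wrapper only delegates, the trainer itself needs no changes.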

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "openadapt-evals"
-version = "0.81.9"
+version = "0.82.0"
 description = "Evaluation infrastructure for GUI agent benchmarks"
 readme = "README.md"
 requires-python = ">=3.10"
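
The `test_vision_changes_logits` check named in the CHANGELOG.md entry asserts that the same `input_ids` produce different logits with and without `pixel_values`. The toy below only shows the shape of that assertion. `ToyVLM` and `logits_differ` are hypothetical names, and the real test presumably runs against an actual VLM rather than a list-based stand-in.

```python
from typing import List, Optional, Sequence


class ToyVLM:
    """Toy stand-in: logits are token ids shifted by the mean pixel value."""

    def forward(
        self,
        input_ids: Sequence[int],
        pixel_values: Optional[Sequence[float]] = None,
    ) -> List[float]:
        shift = sum(pixel_values) / len(pixel_values) if pixel_values else 0.0
        return [float(tok) + shift for tok in input_ids]


def logits_differ(model, input_ids, pixel_values, tol: float = 1e-6) -> bool:
    """Return True if supplying pixel_values changes any logit by more than tol."""
    blind = model.forward(input_ids)
    sighted = model.forward(input_ids, pixel_values=pixel_values)
    return any(abs(a - b) > tol for a, b in zip(blind, sighted))
```

A model that ignores its vision input would make `logits_differ` return `False`, which is the "blind model" failure the changelog describes.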
