Commit e2b6d0f

abrichr and claude committed
feat: vendor GymImageEnv base classes from VAGEN
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4bb7b6f commit e2b6d0f

5 files changed

Lines changed: 317 additions & 19 deletions

docs/verl_agent_decision.md

Lines changed: 47 additions & 1 deletion
@@ -234,7 +234,9 @@ By delegating to verl-agent, we avoid building and maintaining:
 
 3. **Next**: Test end-to-end with verl-agent on a GPU machine. If successful,
    the standalone trainer becomes a reference implementation / fallback, and
-   verl-agent becomes the recommended training path.
+   verl-agent becomes the recommended training path. **Note**: Both backends
+   coexist — see the [Dual Backend Strategy](#dual-backend-strategy) section
+   for the comparison plan and dependency approach.
 
 4. **Future**: If TRL resolves #5120 (multi-turn VLM support), evaluate whether
    to switch. TRL has broader adoption; switching would reduce the dependency
@@ -243,6 +245,50 @@ By delegating to verl-agent, we avoid building and maintaining:
 
 ---
 
+## Dual Backend Strategy
+
+Rather than deprecating the standalone trainer immediately, we maintain both
+backends for comparison:
+
+### Backend 1: Standalone (openadapt-ml)
+
+- **Code**: `openadapt_ml/training/grpo/trainer.py` (~546 lines)
+- **When to use**: Quick experiments, single-GPU, no Ray/vLLM dependency
+- **Limitations**: Episode-level rewards only, no GiGPO, no distributed training
+- **Config**: `GRPOConfig(backend="standalone", ...)`
+
+### Backend 2: verl-agent (openadapt-evals)
+
+- **Code**: `openadapt_evals/adapters/verl_env.py` (~250 lines adapter)
+- **When to use**: Production training, multi-GPU, GiGPO per-step credit
+- **Advantages**: Distributed training, vLLM/sglang, step-level advantages
+- **Config**: `configs/train_waa_vagen.yaml`
+
+### Dependency Strategy
+
+The `GymImageEnv` and `GymBaseEnv` abstract base classes (~150 lines) are
+**vendored** into `openadapt_evals/adapters/_vendored/` to avoid requiring the
+full VAGEN installation. The vendored classes are pure interfaces with only a
+`Pillow` dependency. Import priority:
+
+1. `from vagen.envs.gym_image_env import GymImageEnv` (if VAGEN installed)
+2. `from openadapt_evals.adapters._vendored.gym_image_env import GymImageEnv` (fallback)
+
+The full VAGEN/verl-agent stack (Ray, vLLM, etc.) is only needed when actually
+running distributed training, not for defining or testing environments.
+
+### Comparison Plan
+
+To validate the verl-agent integration provides real value over standalone:
+
+1. Train on the same WAA task with both backends
+2. Compare: final reward, training wall time, GPU memory usage
+3. Specifically measure whether GiGPO's per-step credit improves sample
+   efficiency on long-horizon tasks (15+ steps)
+4. Document results in a comparison report
+
+---
+
 ## References
 
 - [verl-agent](https://github.com/langfengQ/verl-agent) — GiGPO paper implementation
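
Review note: the comparison plan added in this hunk lists three metrics (final reward, wall time, GPU memory) but no format for recording them. The following is a minimal sketch of how the two runs might be tabulated. It is not code from the commit: `BackendRun`, `compare_runs`, and every number below are hypothetical placeholders, and real values would have to come from the actual training runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendRun:
    """Metrics from one training run (hypothetical schema, not from the repo)."""

    backend: str
    final_reward: float
    wall_time_s: float
    peak_gpu_mem_gb: float


def compare_runs(baseline: BackendRun, candidate: BackendRun) -> dict[str, float]:
    """Return candidate-minus-baseline deltas for each metric in the plan."""
    return {
        "final_reward": candidate.final_reward - baseline.final_reward,
        "wall_time_s": candidate.wall_time_s - baseline.wall_time_s,
        "peak_gpu_mem_gb": candidate.peak_gpu_mem_gb - baseline.peak_gpu_mem_gb,
    }


# Placeholder numbers purely for illustration; they are NOT measured results.
standalone = BackendRun("standalone", final_reward=0.5, wall_time_s=1200.0, peak_gpu_mem_gb=20.0)
verl_agent = BackendRun("verl-agent", final_reward=0.75, wall_time_s=900.0, peak_gpu_mem_gb=36.0)
deltas = compare_runs(standalone, verl_agent)
```

Keeping the deltas signed (candidate minus baseline) makes it obvious which direction each metric moved when the results are written into the comparison report.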
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+"""Vendored pure-abstract base classes from VAGEN.
+
+These are copied from https://github.com/mll-lab-nu/VAGEN so that
+openadapt-evals can implement the GymImageEnv protocol without
+requiring the full VAGEN package (and its heavy transitive
+dependencies) to be installed.
+
+Only the abstract interface definitions are vendored here -- no
+concrete implementations or utilities.
+"""
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# Vendored from https://github.com/mll-lab-nu/VAGEN
+# These are pure abstract base classes with no heavy dependencies.
+# Vendored to avoid requiring the full VAGEN installation.
+# Last synced: 2026-03-02
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from typing import Any, Dict, Tuple
+
+
+class GymBaseEnv(ABC):
+    """
+    Abstract async environment API.
+    The handler does not assume any obs/data schema beyond what you return.
+
+    Contract:
+      - reset(seed) -> (obs, info)
+      - step(action_str) -> (obs, reward, done, info)
+    """
+
+    def __init__(self, env_config: Dict[str, Any]):
+        self.config = env_config
+
+    @abstractmethod
+    async def close(self) -> None:
+        """Async teardown."""
+        raise NotImplementedError
+
+    @abstractmethod
+    async def reset(self, seed: int):
+        raise NotImplementedError
+
+    @abstractmethod
+    async def step(self, action_str: str):
+        raise NotImplementedError
+
+    @abstractmethod
+    async def system_prompt(self):
+        raise NotImplementedError
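
Review note: the contract in the `GymBaseEnv` docstring (`reset(seed) -> (obs, info)`, `step(action_str) -> (obs, reward, done, info)`) is easiest to check against a toy implementation. The sketch below is an assumption-laden illustration, not part of the commit: it redefines a local stand-in for `GymBaseEnv` so that it runs without the vendored package, and `CountdownEnv` and `run_episode` are hypothetical names.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple


class GymBaseEnv(ABC):
    """Local stand-in mirroring the vendored interface so the sketch runs alone."""

    def __init__(self, env_config: Dict[str, Any]):
        self.config = env_config

    @abstractmethod
    async def close(self) -> None: ...

    @abstractmethod
    async def reset(self, seed: int): ...

    @abstractmethod
    async def step(self, action_str: str): ...

    @abstractmethod
    async def system_prompt(self): ...


class CountdownEnv(GymBaseEnv):
    """Hypothetical toy env: the episode ends after `max_steps` actions."""

    async def system_prompt(self) -> Dict[str, Any]:
        return {"obs_str": "Reply with any action."}

    async def reset(self, seed: int) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        self.remaining = int(self.config.get("max_steps", 2))
        return {"obs_str": f"{self.remaining} steps left"}, {"seed": seed}

    async def step(self, action_str: str):
        self.remaining -= 1
        done = self.remaining <= 0
        reward = 1.0 if done else 0.0  # Reward only on the final step.
        obs = {"obs_str": f"{self.remaining} steps left"}
        return obs, reward, done, {"success": done}

    async def close(self) -> None:
        self.remaining = 0


async def run_episode(env: GymBaseEnv) -> float:
    """Drive reset/step until done and return the summed reward."""
    await env.reset(seed=0)
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = await env.step("noop")
        total += reward
    await env.close()
    return total


total_reward = asyncio.run(run_episode(CountdownEnv({"max_steps": 3})))
```

Because every method is a coroutine, a synchronous caller needs an event loop (`asyncio.run` here); a training framework would presumably await these methods from its own loop.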
Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
+# Vendored from https://github.com/mll-lab-nu/VAGEN
+# These are pure abstract base classes with no heavy dependencies.
+# Vendored to avoid requiring the full VAGEN installation.
+# Last synced: 2026-03-02
+
+from __future__ import annotations
+
+from abc import abstractmethod
+from typing import Any, Dict, Tuple
+
+from .gym_base_env import GymBaseEnv
+
+
+class GymImageEnv(GymBaseEnv):
+    """
+    GymImageEnv is a base environment class that supports optional
+    **image-based multi-modal observations**, while keeping the same API
+    as GymBaseEnv.
+
+    --------------------------------------------------------------------
+    Observation Protocol
+    --------------------------------------------------------------------
+
+    WITH images
+    -------------------------------
+    If the environment returns images, the observation should follow:
+
+        obs = {
+            "obs_str": "... <image> ...",
+            "multi_modal_input": {
+                "<image>": [PIL.Image.Image, ...]
+            }
+        }
+
+    - Images are stored under obs["multi_modal_input"]["<image>"].
+    - "<image>" in obs_str is a placeholder indicating where each image
+      should appear in the prompt.
+    - The number of "<image>" in obs_str should match the number of
+      images in the list.
+
+    WITHOUT images:
+    ----------------------------------
+    Can simply use:
+
+        obs = {
+            "obs_str": "..."
+        }
+
+    - "multi_modal_input" is optional and may be omitted.
+    - obs_str should NOT contain "<image>" placeholders.
+
+
+    --------------------------------------------------------------------
+    Agent-Loop Rollout
+    --------------------------------------------------------------------
+    - sys      : system prompt (from system_prompt()).
+    - init_obs : observation from reset().
+    - step_obs : observation from step().
+    - res_i    : agent response at step i.
+
+    Concat mode (single growing context):
+        sys + init_obs + res_0 + step_obs_1 + res_1 + ...
+
+    Non-concat mode (step-wise independent contexts):
+        Step 0: sys + init_obs + res_0
+        Step 1: sys + step_obs_1 + res_1
+        Step 2: sys + step_obs_2 + res_2
+
+    --------------------------------------------------------------------
+    Info
+    --------------------------------------------------------------------
+    The `info` dict returned by reset() and step() may include:
+    - success (bool): whether the task/episode is considered
+      successful, this will be used for wandb logging.
+    """
+
+    def __init__(self, env_config: Dict[str, Any]):
+        """
+        Initialize the environment.
+
+        Args:
+            env_config (Dict[str, Any]):
+                Environment configuration. The exact schema is defined by
+                the concrete environment implementation and/or GymBaseEnv.
+
+        Side effects:
+            - Calls GymBaseEnv.__init__(env_config).
+        """
+        super().__init__(env_config)
+
+    @abstractmethod
+    async def close(self) -> None:
+        """
+        Close the environment and release all resources.
+
+        This should clean up anything created by the environment, e.g.:
+        - windows / renderers
+        - subprocesses
+        - file handles
+        - GPU memory / models
+
+        Returns:
+            None
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def system_prompt(self) -> Dict[str, Any]:
+        """
+        Return the system-level prompt/observation for the environment.
+
+        Returns:
+            obs (Dict[str, Any]):
+                A dict representing the system prompt observation.
+
+                If returning images, it must follow:
+
+                    obs = {
+                        "obs_str": "... <image> ...",
+                        "multi_modal_input": {
+                            "<image>": [PIL.Image.Image, ...]
+                        }
+                    }
+
+                If returning no images, it must follow:
+
+                    obs = {
+                        "obs_str": "..."
+                    }
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def reset(self, seed: int) -> Tuple[Dict[str, Any], Dict[str, Any]]:
+        """
+        Reset the environment to the initial state.
+
+        Args:
+            seed (int):
+                Random seed used to initialize the environment
+
+        Returns:
+            obs (Dict[str, Any]):
+                The initial observation after reset.
+
+                If returning images, it must follow:
+
+                    obs = {
+                        "obs_str": "... <image> ...",
+                        "multi_modal_input": {
+                            "<image>": [PIL.Image.Image, ...]
+                        }
+                    }
+
+                If returning no images, it must follow:
+
+                    obs = {
+                        "obs_str": "..."
+                    }
+
+            info (Dict[str, Any]):
+                A dict containing any additional metadata about the reset,
+                e.g. debug information, episode identifiers, etc.
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    async def step(
+        self, action_str: str
+    ) -> Tuple[Dict[str, Any], float, bool, Dict[str, Any]]:
+        """
+        Execute one environment step using an agent-provided action.
+
+        Args:
+            action_str (str):
+                The action produced by the agent, in text form.
+
+        Returns:
+            obs (Dict[str, Any]):
+                The next observation after applying the action.
+
+                If returning images, it must follow:
+
+                    obs = {
+                        "obs_str": "... <image> ...",
+                        "multi_modal_input": {
+                            "<image>": [PIL.Image.Image, ...]
+                        }
+                    }
+
+                If returning no images, it must follow:
+
+                    obs = {
+                        "obs_str": "..."
+                    }
+
+            reward (float):
+                Scalar reward for the current step.
+
+            done (bool):
+                Whether the current episode has terminated after this
+                step.
+
+            info (Dict[str, Any]):
+                Additional step-level metadata.
+
+                Common optional keys:
+                    - success (bool): whether the task/episode is
+                      considered successful, typically used for logging
+                      (e.g. wandb).
+        """
+        raise NotImplementedError
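
Review note: the observation protocol in the `GymImageEnv` docstring states that the number of `"<image>"` placeholders in `obs_str` should match the number of images, but the abstract class does not enforce this. A concrete environment or its tests could check it with something like the sketch below. `validate_obs` and `IMAGE_TOKEN` are hypothetical names and not part of VAGEN or this commit; the check treats images as opaque objects, so it needs no Pillow import.

```python
from typing import Any, Dict

IMAGE_TOKEN = "<image>"


def validate_obs(obs: Dict[str, Any]) -> None:
    """Raise ValueError if obs breaks the placeholder/image-count rule."""
    obs_str = obs["obs_str"]
    # "multi_modal_input" is optional; treat a missing key as "no images".
    images = obs.get("multi_modal_input", {}).get(IMAGE_TOKEN, [])
    n_tokens = obs_str.count(IMAGE_TOKEN)
    if n_tokens != len(images):
        raise ValueError(
            f"{n_tokens} {IMAGE_TOKEN} placeholder(s) but {len(images)} image(s)"
        )


# Text-only and matching image observations both pass silently.
validate_obs({"obs_str": "text only"})
validate_obs({"obs_str": "see <image>", "multi_modal_input": {IMAGE_TOKEN: [object()]}})
```

The same check also covers the text-only rule: an `obs_str` containing a placeholder with no `multi_modal_input` yields a count mismatch and raises.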

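Review note: the "Agent-Loop Rollout" part of the docstring describes two ways of assembling context. The sketch below spells out that difference with plain strings. It is only an illustration under stated assumptions: `build_contexts` is a hypothetical helper, and how VAGEN actually assembles prompts is not shown in this commit.

```python
from typing import List


def build_contexts(
    system: str, observations: List[str], responses: List[str], concat: bool
) -> List[List[str]]:
    """Return the message sequence for each step of a rollout.

    observations[0] is the reset() observation; observations[i] for i >= 1
    is the step() observation that precedes response i.
    """
    contexts: List[List[str]] = []
    history = [system]
    for obs, res in zip(observations, responses):
        if concat:
            # Single growing context: every step sees all earlier turns.
            history = history + [obs, res]
            contexts.append(list(history))
        else:
            # Independent contexts: only the system prompt is repeated.
            contexts.append([system, obs, res])
    return contexts


obs_seq = ["init_obs", "step_obs_1"]
res_seq = ["res_0", "res_1"]
concat_ctx = build_contexts("sys", obs_seq, res_seq, concat=True)
stepwise_ctx = build_contexts("sys", obs_seq, res_seq, concat=False)
```

In concat mode the context grows with every step, while non-concat mode keeps each step's context the same size but drops earlier observations.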
openadapt_evals/adapters/verl_env.py

Lines changed: 8 additions & 18 deletions
@@ -10,8 +10,8 @@
 
 Dependencies:
     - openadapt-evals (always required)
-    - vagen (optional; without it, the class implements the same protocol
-      but doesn't inherit from GymImageEnv)
+    - vagen (optional; a vendored copy of the GymImageEnv ABC is used
+      as fallback when the full vagen package is not installed)
 
 Usage with VAGEN training:
     Register in env_registry.yaml:
@@ -57,13 +57,14 @@
 except ImportError:
     Image = None  # type: ignore[misc, assignment]
 
-# Try importing VAGEN's base class for proper inheritance
+# Import VAGEN's GymImageEnv base class.
+# Prefer the real vagen package; fall back to our vendored copy.
 try:
     from vagen.envs.gym_image_env import GymImageEnv as _GymImageEnvBase
-
-    _HAS_VAGEN = True
 except ImportError:
-    _HAS_VAGEN = False
+    from openadapt_evals.adapters._vendored.gym_image_env import (
+        GymImageEnv as _GymImageEnvBase,
+    )
 
 # --- Action parsing (matches openadapt-ml trainer DSL) ---
 
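
Review note: the hunk above hard-codes a two-level import priority with `try`/`except ImportError`. For comparison, the same "first importable location wins" idea can be written as a small helper. This is a sketch of an alternative, not what the commit does: `first_importable` is a hypothetical name, and the demo call uses standard-library stand-ins so that it runs without VAGEN or openadapt-evals installed.

```python
import importlib
from typing import Any, Sequence


def first_importable(candidates: Sequence[str], attr: str) -> Any:
    """Return `attr` from the first module in `candidates` that imports cleanly."""
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # Try the next, lower-priority location.
        return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of {list(candidates)}")


# Stdlib stand-ins so the sketch runs anywhere. In this repo the candidates
# would be "vagen.envs.gym_image_env" and then
# "openadapt_evals.adapters._vendored.gym_image_env".
resolved = first_importable(["_no_such_package_.mod", "collections"], "OrderedDict")
```

The explicit `try`/`except` in the commit is arguably the better fit for two fixed locations, since the imported name stays visible to static analysis; a helper like this would mainly pay off if more fallback locations were added.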

@@ -171,19 +172,8 @@ def _build_obs_dict(
 
 # --- Main environment class ---
 
-# Dynamically choose base class
-if _HAS_VAGEN:
-    _Base = _GymImageEnvBase
-else:
-
-    class _Base:  # type: ignore[no-redef]
-        """Stub base when VAGEN is not installed."""
-
-        def __init__(self, env_config: dict[str, Any]) -> None:
-            pass
-
 
-class WAADesktopEnv(_Base):
+class WAADesktopEnv(_GymImageEnvBase):
     """VAGEN-compatible environment for WAA desktop automation.
 
     Implements the GymImageEnv protocol (async reset/step/close/system_prompt)
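
Review note: one behavioral consequence of dropping the stub `_Base` is worth calling out. Without VAGEN installed, `WAADesktopEnv` previously inherited from a plain class; it now always inherits from an abstract base, so a subclass that misses any abstract method fails at instantiation. The sketch below demonstrates that standard `abc` behavior with stand-in classes. `Base` and `Incomplete` are hypothetical, and the sketch assumes the upstream VAGEN class is abstract in the same way as the vendored copy.

```python
from abc import ABC, abstractmethod


class Base(ABC):
    """Stand-in for an abstract base such as the vendored GymImageEnv."""

    @abstractmethod
    async def reset(self, seed: int): ...

    @abstractmethod
    async def close(self) -> None: ...


class Incomplete(Base):
    """Implements reset() but forgets close()."""

    async def reset(self, seed: int):
        return {"obs_str": "start"}, {}


try:
    Incomplete()
    error_message = ""
except TypeError as exc:
    # The message names the missing abstract method(s).
    error_message = str(exc)
```

This makes protocol drift visible early: if a later VAGEN sync adds an abstract method, constructing the environment raises `TypeError` instead of failing somewhere inside a training run.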
