Commit 153a67e

abrichr and claude committed
fix: replace EnvironmentManagerBase with VAGEN registry-based env integration
The previous implementation incorrectly assumed verl-agent uses an EnvironmentManagerBase ABC with a hardcoded make_envs() dispatch. Research reveals VAGEN actually uses:

- GymImageEnv protocol (which WAADesktopEnv already implements)
- YAML-based env registry (vagen/configs/env_registry.yaml)
- GymAgentLoop for training-time rollout orchestration

Changes:

- Replace patch_env_manager() with register_waa_env() (YAML registry)
- Add register_in_vagen() and generate_env_spec() helpers to verl_env.py
- Update launch_training() to generate proper VAGEN training config
- Fix Integration Gap section in decision doc (no EnvironmentManagerBase)
- Update training config YAML with architecture diagram
- Add 5 new tests for registration helpers (40 total, all passing)
- Export new helpers from adapters/__init__.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 826d8e8 commit 153a67e

7 files changed

Lines changed: 445 additions & 160 deletions

configs/train_waa_vagen.yaml

Lines changed: 35 additions & 16 deletions
@@ -1,19 +1,35 @@
 # VAGEN training config for WAA desktop automation
 #
 # This trains a VLM (e.g., Qwen2.5-VL-3B) to automate Windows desktop tasks
-# using GRPO/GiGPO via the verl-agent framework.
+# using GRPO/GiGPO via the VAGEN framework (verl-agent).
 #
 # Prerequisites:
 #   1. WAA server running (via SSH tunnel): ssh -L 5001:localhost:5050 azureuser@<VM_IP>
-#   2. VAGEN installed: pip install vagen
-#   3. Register env: add to vagen's env_registry.yaml:
+#   2. VAGEN installed on GPU VM (see scripts/setup_gpu_training.sh)
+#   3. openadapt-evals installed on GPU VM (pip install openadapt-evals)
+#   4. Register env in VAGEN's env_registry.yaml:
 #        WAADesktop: openadapt_evals.adapters.verl_env.WAADesktopEnv
+#      (automated by: scripts/train_verl_e2e.py or oa-vm gpu-train)
+#
+# Architecture:
+#   GPU VM                              CPU VM
+#   ┌──────────────────────┐            ┌──────────────────┐
+#   │ VAGEN / verl         │            │ Docker           │
+#   │  GymAgentLoop        │   HTTP     │  QEMU (Win 11)   │
+#   │  WAADesktopEnv  ─────│───────────>│  WAA Flask API   │
+#   │  GiGPO/GRPO trainer  │            │                  │
+#   │  vLLM inference      │            │                  │
+#   └──────────────────────┘            └──────────────────┘
 #
 # Usage:
-#   python -m vagen.train --config configs/train_waa_vagen.yaml
+#   # Via orchestration script (recommended):
+#   python scripts/train_verl_e2e.py --cloud aws --task-id <UUID>
+#
+#   # Via CLI:
+#   oa-vm gpu-train --cloud aws --task-id <UUID>
 #
 # For mock testing (no VM):
-#   Set server_url to "mock" and use WAAMockAdapter internally
+#   Set server_url to "mock" in env config

 # --- Model ---
 model:
@@ -26,11 +42,14 @@ model:
 # target_modules: [q_proj, k_proj, v_proj, o_proj]

 # --- Environment ---
+# VAGEN loads envs from env_registry.yaml using these specs.
+# WAADesktopEnv implements GymImageEnv (async reset/step/close/system_prompt).
+# Each env instance connects to the WAA server independently via HTTP.
 envs:
   - name: WAADesktop
     n_envs: 8  # Number of parallel environments (= GRPO group size)
     data_source: waa
-    seed: [1, 100, 1]  # [start, end, step] for task selection
+    seed: [1, 100, 1]  # [start, end, step] for deterministic seeding
     max_turns: 15  # Max actions per episode
     response_length_per_turn: 512
     config:
@@ -40,27 +59,27 @@ envs:
       evaluate_at_done: true
       action_type: fractional  # VLM outputs normalized 0-1 coordinates

-# --- Training (GRPO) ---
+# --- Training (GRPO/GiGPO) ---
 algorithm:
   name: grpo  # or "gigpo" for step-level advantages
   kl_coef: 0.0  # No KL penalty (DAPO/Open-Reasoner-Zero style)
-  epsilon: 0.2  # PPO clip range (inactive with single epoch)
-  gamma: 1.0  # No discounting for episodic tasks
+  epsilon: 0.2  # PPO clip range
+  gamma: 1.0  # No discounting for episodic tasks (use 0.95 for gigpo)

 trainer:
   total_epochs: 100
   n_gpus_per_node: 2  # Minimum for VLM training
   micro_batch_size: 4
   gradient_accumulation_steps: 2
+  test_freq: 5  # Evaluate every N epochs
+  experiment_name: grpo_waa_desktop
+  project_name: openadapt-waa-rl
+  logger:
+    - console
+    - wandb

 # --- Rollout ---
 rollout:
   temperature: 0.7
   top_p: 0.95
-  mode: async  # async sglang rollout for throughput
-
-# --- Logging ---
-logging:
-  project: openadapt-waa-rl
-  log_interval: 1
-  save_interval: 10
+  mode: async  # Async sglang rollout for throughput
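
Note on `n_envs: 8  # (= GRPO group size)`: the config ties the number of parallel environments to the GRPO group, because GRPO scores each rollout relative to the other rollouts of the same task. The sketch below illustrates that idea only. It assumes one scalar reward per episode and uses the population standard deviation; the function name and the sample rewards are hypothetical, and the normalization actually used by verl/VAGEN may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each episode reward against its own rollout group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division finite when every rollout got the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical outcome: 8 rollouts of one task, two of which succeeded.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)

# A group where every rollout failed carries no learning signal.
flat = group_relative_advantages([0.0] * 8)
```

With sparse end-of-episode rewards (`evaluate_at_done: true`), a group in which all 8 rollouts fail yields all-zero advantages, which is one reason the group size matters.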

docs/verl_agent_decision.md

Lines changed: 38 additions & 20 deletions
@@ -250,28 +250,46 @@ By delegating to verl-agent, we avoid building and maintaining:

 ---

-## Integration Gap: verl-agent Environment Protocol
+## Integration: VAGEN Environment Registry

 Our `WAADesktopEnv` implements VAGEN's `GymImageEnv` protocol (async
-`reset`/`step`/`close`). However, verl-agent uses a **different, synchronous
-environment protocol** (`EnvironmentManagerBase`) with a **hardcoded dispatch**
-in `agent_system/environments/env_manager.py` — you cannot pass a Python class
-path as `env.env_name`.
-
-To integrate with verl-agent, we need to:
-
-1. **Patch `make_envs()`** — add an `elif "waa" in config.env.env_name.lower()`
-   branch (automated by `scripts/train_verl_e2e.py`)
-2. **Implement `EnvironmentManagerBase`** — wraps our async `WAADesktopEnv` in
-   verl-agent's sync vectorized env interface (`reset`, `step`, `build_text_obs`,
-   `success_evaluator`)
-3. **Prepare parquet data** — verl-agent requires `data.train_files` and
-   `data.val_files` even for env-based training
-4. **Use env-specific config** — `env.waa.server_url` instead of `env.env_kwargs`
-
-The `GymImageEnv` protocol remains our **portable interface**. The verl-agent
-`EnvironmentManagerBase` adapter is a thin sync wrapper around it. If we switch
-to a different framework, only the wrapper changes.
+`reset`/`step`/`close`/`system_prompt`), which is the **native environment
+interface** for VAGEN. No additional adapter is needed.
+
+**Note**: Earlier analysis referenced an `EnvironmentManagerBase` ABC and a
+`make_envs()` dispatch function. These do not exist in the current VAGEN
+codebase. The actual architecture uses:
+
+- `GymBaseEnv` → `GymImageEnv` — the environment ABC (which we implement)
+- `vagen/envs/registry.py` — YAML-based env registry for dispatch
+- `GymAgentLoop` — training-time rollout orchestrator that instantiates envs
+
+Integration steps (automated by `scripts/train_verl_e2e.py`):
+
+1. **Register in VAGEN's env registry** — add `WAADesktop:
+   openadapt_evals.adapters.verl_env.WAADesktopEnv` to
+   `vagen/configs/env_registry.yaml`. This is the only configuration needed.
+2. **Prepare parquet data** — VAGEN's `AgenticDataset` requires train/val
+   parquet files even for env-based training
+3. **Configure training** — provide env spec (server URL, task ID, max turns)
+   via the VAGEN training YAML (see `configs/train_waa_vagen.yaml`)
+
+The `GymImageEnv` protocol is our **portable interface**. If we switch to a
+different framework, only the ~250-line `WAADesktopEnv` adapter changes. The
+environment, evaluation, and infrastructure code remain untouched.
+
+### VAGEN Remote Env Pattern (Optional)
+
+For production deployments where the WAA VM and GPU VM have poor connectivity,
+VAGEN provides a remote env service pattern:
+
+- **Server** (WAA VM): `BaseGymHandler` + `build_gym_service()` → FastAPI
+- **Client** (GPU VM): `GymImageEnvClient` (registered as `RemoteEnv`)
+
+This adds HTTP session management, multipart encoding (JSON + images), and
+automatic retry/failover. Currently unnecessary since `WAADesktopEnv` already
+handles remote connectivity via the WAA Flask API, but documented for future
+scaling to multi-VM env pools.

 ---
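
Note on the async `reset`/`step`/`close`/`system_prompt` interface described in this doc change: the sketch below shows the general shape of such an environment and of a rollout loop driving it. It is not VAGEN's actual `GymImageEnv` definition. `ImageEnvLike`, `CountdownEnv`, `run_episode`, the `obs_str` key, and the argument lists are all hypothetical; only the four method names and the `(obs_dict, reward, done, info)` return shape of `step` come from the diff.

```python
import asyncio
from typing import Any, Protocol


class ImageEnvLike(Protocol):
    """Hypothetical stand-in for an async env interface (not VAGEN's real ABC)."""

    async def system_prompt(self) -> dict[str, Any]: ...
    async def reset(self, seed: int) -> tuple[dict[str, Any], dict[str, Any]]: ...
    async def step(self, action_str: str) -> tuple[dict[str, Any], float, bool, dict[str, Any]]: ...
    async def close(self) -> None: ...


class CountdownEnv:
    """Toy env: the episode ends after `max_steps` actions, reward only at the end."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    async def system_prompt(self) -> dict[str, Any]:
        return {"obs_str": "Reply with one action per turn."}

    async def reset(self, seed: int) -> tuple[dict[str, Any], dict[str, Any]]:
        self.steps = 0
        return {"obs_str": f"task seed={seed}"}, {}

    async def step(self, action_str: str) -> tuple[dict[str, Any], float, bool, dict[str, Any]]:
        self.steps += 1
        done = self.steps >= self.max_steps
        reward = 1.0 if done else 0.0  # sparse reward, similar in spirit to evaluate_at_done
        return {"obs_str": f"after {action_str}"}, reward, done, {"step": self.steps}

    async def close(self) -> None:
        self.steps = 0


async def run_episode(env: ImageEnvLike, seed: int, max_turns: int) -> float:
    """Drive one episode and return the summed reward; always closes the env."""
    total = 0.0
    await env.reset(seed)
    try:
        for turn in range(max_turns):
            _, reward, done, _ = await env.step(f"click-{turn}")
            total += reward
            if done:
                break
    finally:
        await env.close()
    return total


episode_return = asyncio.run(run_episode(CountdownEnv(max_steps=3), seed=1, max_turns=15))
truncated_return = asyncio.run(run_episode(CountdownEnv(max_steps=3), seed=1, max_turns=2))
```

Because the orchestrator only depends on these four coroutines, swapping frameworks would touch the adapter class and not the loop, which is the portability argument the doc makes.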

openadapt_evals/adapters/__init__.py

Lines changed: 7 additions & 1 deletion
@@ -39,7 +39,11 @@
     RLEnvironment,
     RolloutStep,
 )
-from openadapt_evals.adapters.verl_env import WAADesktopEnv
+from openadapt_evals.adapters.verl_env import (
+    WAADesktopEnv,
+    generate_env_spec,
+    register_in_vagen,
+)
 from openadapt_evals.adapters.waa import (
     WAAAdapter,
     WAAConfig,
@@ -72,6 +76,8 @@
     "WAALiveConfig",
     # verl-agent / VAGEN integration
     "WAADesktopEnv",
+    "register_in_vagen",
+    "generate_env_spec",
     # Task ID validation
     "SyntheticTaskError",
     "is_real_waa_task_id",

openadapt_evals/adapters/verl_env.py

Lines changed: 114 additions & 2 deletions
@@ -14,11 +14,15 @@
   as fallback when the full vagen package is not installed)

 Usage with VAGEN training:
-    Register in env_registry.yaml:
+    1. Register the env on the GPU VM:
+        from openadapt_evals.adapters.verl_env import register_in_vagen
+        register_in_vagen("/path/to/vagen/configs/env_registry.yaml")
+
+    2. Or manually add to vagen/configs/env_registry.yaml:
         env_registry:
           WAADesktop: openadapt_evals.adapters.verl_env.WAADesktopEnv

-    Training config:
+    3. Training config (envs section):
         envs:
           - name: WAADesktop
             n_envs: 8
@@ -44,6 +48,7 @@
 import io
 import logging
 import re
+from pathlib import Path
 from typing import Any

 from openadapt_evals.adapters.base import BenchmarkAction, BenchmarkObservation
@@ -354,3 +359,110 @@ async def step(
         )

         return obs_dict, reward, done, info
+
+
+# --- VAGEN registration helpers ---
+
+ENV_REGISTRY_KEY = "WAADesktop"
+ENV_CLASS_PATH = "openadapt_evals.adapters.verl_env.WAADesktopEnv"
+
+
+def register_in_vagen(registry_yaml_path: str | Path | None = None) -> bool:
+    """Register WAADesktopEnv in VAGEN's env_registry.yaml.
+
+    If VAGEN is installed and importable, also registers via the Python API.
+    Otherwise, edits the YAML file directly.
+
+    Args:
+        registry_yaml_path: Path to vagen/configs/env_registry.yaml.
+            If None, attempts to find it from the vagen package location.
+
+    Returns:
+        True if registration succeeded.
+    """
+    # Try programmatic registration first (if vagen is installed)
+    try:
+        from vagen.envs.registry import register_env
+
+        register_env(ENV_REGISTRY_KEY, WAADesktopEnv)
+        logger.info("Registered %s in VAGEN env registry (Python API)", ENV_REGISTRY_KEY)
+        return True
+    except ImportError:
+        pass
+
+    # Fall back to YAML file editing
+    if registry_yaml_path is None:
+        # Try to find it from common locations
+        candidates = [
+            Path.home() / "verl-agent" / "vagen" / "configs" / "env_registry.yaml",
+            Path.home() / "VAGEN" / "vagen" / "configs" / "env_registry.yaml",
+        ]
+        for candidate in candidates:
+            if candidate.exists():
+                registry_yaml_path = candidate
+                break
+
+    if registry_yaml_path is not None:
+        registry_yaml_path = Path(registry_yaml_path)
+        if not registry_yaml_path.exists():
+            logger.warning("Registry file not found: %s", registry_yaml_path)
+            return False
+
+    if registry_yaml_path is None:
+        logger.warning(
+            "Cannot find env_registry.yaml. Register manually by adding:\n"
+            " %s: %s\n"
+            "to vagen/configs/env_registry.yaml",
+            ENV_REGISTRY_KEY,
+            ENV_CLASS_PATH,
+        )
+        return False
+
+    content = registry_yaml_path.read_text()
+
+    if ENV_REGISTRY_KEY in content:
+        logger.info("%s already registered in %s", ENV_REGISTRY_KEY, registry_yaml_path)
+        return True
+
+    # Add entry to the env_registry section
+    entry = f" {ENV_REGISTRY_KEY}: {ENV_CLASS_PATH}\n"
+    if "env_registry:" in content:
+        content = content.replace("env_registry:\n", f"env_registry:\n{entry}", 1)
+    else:
+        content += f"\nenv_registry:\n{entry}"
+
+    registry_yaml_path.write_text(content)
+    logger.info("Registered %s in %s", ENV_REGISTRY_KEY, registry_yaml_path)
+    return True
+
+
+def generate_env_spec(
+    server_url: str = "http://localhost:5001",
+    task_id: str = "REPLACE_WITH_WAA_TASK_UUID",
+    n_envs: int = 8,
+    max_turns: int = 15,
+) -> dict[str, Any]:
+    """Generate a VAGEN EnvSpec dict for WAA desktop training.
+
+    Returns a dict suitable for inclusion in a VAGEN training config YAML
+    under the ``envs`` key.
+
+    Example:
+        spec = generate_env_spec(server_url="http://10.0.0.5:5001", task_id="abc-123")
+        # Write to YAML or pass to AgenticDataset
+    """
+    return {
+        "name": ENV_REGISTRY_KEY,
+        "n_envs": n_envs,
+        "data_source": "waa",
+        "seed": [1, 100, 1],
+        "max_turns": max_turns,
+        "response_length_per_turn": 512,
+        "config": {
+            "server_url": server_url,
+            "task_id": task_id,
+            "max_steps": max_turns,
+            "evaluate_at_done": True,
+            "action_type": "fractional",
+        },
+    }
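
Note on the YAML fallback in `register_in_vagen()`: the sketch below re-implements just the text-level insert in a standalone form so its idempotency can be exercised against a temporary file. It is not the committed function. `add_registry_entry`, the two-space entry indentation, and the `Sokoban` entry are hypothetical, and the sketch tests for `"env_registry:\n"` (the exact string it replaces) where the diff tests for `"env_registry:"`.

```python
import tempfile
from pathlib import Path

KEY = "WAADesktop"
CLASS_PATH = "openadapt_evals.adapters.verl_env.WAADesktopEnv"


def add_registry_entry(path: Path) -> bool:
    """Insert the WAADesktop entry as text; return True only if the file changed."""
    content = path.read_text()
    if KEY in content:  # substring check, as in the diff
        return False
    entry = f"  {KEY}: {CLASS_PATH}\n"  # two-space indent is an assumption
    if "env_registry:\n" in content:
        # Put the new entry directly under the existing section header
        content = content.replace("env_registry:\n", f"env_registry:\n{entry}", 1)
    else:
        # No section yet: append one at the end of the file
        content += f"\nenv_registry:\n{entry}"
    path.write_text(content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp) / "env_registry.yaml"
    # Hypothetical pre-existing registry content
    registry.write_text("env_registry:\n  Sokoban: some.module.SokobanEnv\n")
    first = add_registry_entry(registry)   # inserts the entry
    second = add_registry_entry(registry)  # no-op on the second call
    final_text = registry.read_text()
```

Two things may be worth checking in review. The substring test would also match a commented-out `WAADesktop` line. And if the header line carries trailing whitespace or CRLF line endings, the `replace` on `"env_registry:\n"` would not match even though `"env_registry:"` is present.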

openadapt_evals/benchmarks/vm_cli.py

Lines changed: 9 additions & 12 deletions
@@ -7919,36 +7919,33 @@ def cmd_gpu_train(args):
         print(f"ERROR: Setup failed")
         return 1

-    # Launch training via the E2E script (handles data prep, env patching, config)
+    # Launch training via the E2E script (handles data prep, env registration, config)
     # Import here to avoid circular deps
     import sys
     sys.path.insert(0, str(Path(__file__).parent.parent.parent / "scripts"))
-    from train_verl_e2e import prepare_training_data, patch_env_manager
+    from train_verl_e2e import prepare_training_data, register_waa_env
+
+    print("Registering WAADesktopEnv in VAGEN env registry...")
+    register_waa_env(ip, args.waa_server, args.task_id, username=username)

     print("Preparing training data...")
     prepare_training_data(ip, group_size=8, username=username)

-    print("Patching verl-agent for WAA environment...")
-    patch_env_manager(ip, args.waa_server, args.task_id, username=username)
-
-    # Validated Hydra config (see docs/verl_agent_decision.md)
+    # Hydra overrides for verl training loop + VAGEN env config.
+    # WAADesktopEnv is registered in VAGEN's env_registry.yaml as 'WAADesktop'.
+    # The env connects to WAA server via HTTP (GymImageEnv protocol).
     train_cmd = (
         f"cd ~/verl-agent && "
         f"conda run -n verl-agent python3 -m verl.trainer.main_ppo "
         f"algorithm.adv_estimator={args.algorithm} "
-        f"algorithm.gamma=0.95 "
+        f"algorithm.gamma={'0.95' if args.algorithm == 'gigpo' else '1.0'} "
         f"actor_rollout_ref.model.path={args.model} "
         f"actor_rollout_ref.rollout.name=vllm "
         f"actor_rollout_ref.rollout.tensor_model_parallel_size={args.n_gpus} "
         f"actor_rollout_ref.rollout.gpu_memory_utilization=0.6 "
         f"actor_rollout_ref.rollout.enable_chunked_prefill=False "
         f"actor_rollout_ref.actor.ppo_mini_batch_size=64 "
         f"actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 "
-        f"env.env_name=waa_desktop "
-        f"env.max_steps=15 "
-        f"env.rollout.n=8 "
-        f"env.waa.server_url={args.waa_server} "
-        f"env.waa.task_id={args.task_id} "
         f"data.train_files=$HOME/data/verl-agent/visual/train.parquet "
         f"data.val_files=$HOME/data/verl-agent/visual/test.parquet "
         f"data.train_batch_size=8 "
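
Note on the algorithm-dependent `algorithm.gamma` override: the sketch below shows one way the same selection could be kept in a small helper that builds the overrides as a list and quotes them once. It is an illustration under assumptions, not the code in `vm_cli.py`. `build_overrides` is hypothetical, the model id is a placeholder, and only a subset of the overrides from the diff is reproduced.

```python
import shlex


def build_overrides(algorithm: str, model: str, n_gpus: int) -> list[str]:
    """Return Hydra-style key=value overrides; gamma depends on the algorithm."""
    # Same rule as the diff: 0.95 for gigpo, 1.0 otherwise
    gamma = "0.95" if algorithm == "gigpo" else "1.0"
    return [
        f"algorithm.adv_estimator={algorithm}",
        f"algorithm.gamma={gamma}",
        f"actor_rollout_ref.model.path={model}",
        f"actor_rollout_ref.rollout.tensor_model_parallel_size={n_gpus}",
    ]


base = ["python3", "-m", "verl.trainer.main_ppo"]
model_id = "Qwen/Qwen2.5-VL-3B-Instruct"  # placeholder model id

# shlex.join quotes each argument, so values containing spaces stay intact
grpo_cmd = shlex.join(base + build_overrides("grpo", model_id, 2))
gigpo_cmd = shlex.join(base + build_overrides("gigpo", model_id, 2))
```

Keeping the overrides as a list until the last moment makes the gamma rule testable on its own, independent of the SSH and conda wrapping around the command.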
