Skip to content

Commit c295d19

Browse files
abrichrclaude
andcommitted
fix: correct is_action_valid logic, scroll_direction, stale refs, and DRY violation
Review fixes for the GPU training automation branch: - Fix is_action_valid: was inverted (DONE()→invalid, garbage→valid), now uses regex match on original action string - Fix scroll_direction: SCROLL parsing now populates BenchmarkAction.scroll_direction - Fix stale repo URLs: mll-lab-nu/VAGEN → RAGEN-AI/VAGEN across vendored files and docs - Fix stale branch ref: setup_gpu_training.sh referenced merged spike branch, now uses main - Fix stale repo URL: langfengQ/verl-agent → RAGEN-AI/VAGEN in setup script - Add --recurse-submodules to git clone (verl is a VAGEN submodule) - Remove dead params from register_waa_env() (waa_server, task_id, max_steps) - Deduplicate training command: vm_cli.py now delegates to launch_training() - Update test count in docs: 21 → 40+ - Add 3 new tests for is_action_valid behavior - Add scroll_direction assertion to existing scroll test All 43 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 0d12255 commit c295d19

9 files changed

Lines changed: 70 additions & 67 deletions

File tree

docs/verl_agent_decision.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -138,7 +138,7 @@ loop (~546 lines in `openadapt_ml/training/grpo/trainer.py`).
138138

139139
### D. VAGEN / VAGEN-Lite
140140

141-
**Repository**: [mll-lab-nu/VAGEN](https://github.com/mll-lab-nu/VAGEN)
141+
**Repository**: [RAGEN-AI/VAGEN](https://github.com/RAGEN-AI/VAGEN)
142142

143143
- Built on verl's `agent_loop` abstraction (same ecosystem as verl-agent)
144144
- **Bi-Level GAE** for turn-aware credit assignment
@@ -299,7 +299,7 @@ scaling to multi-VM env pools.
299299
Works, well-tested (56 unit tests + 5 E2E tests). Episode-level rewards only.
300300

301301
2. **Spike complete**: `WAADesktopEnv` adapter in openadapt-evals (PR #84, merged).
302-
21 tests passing. Implements GymImageEnv protocol.
302+
40 tests passing. Implements GymImageEnv protocol.
303303

304304
3. **Next**: Test end-to-end with verl-agent on a GPU machine. If successful,
305305
the standalone trainer becomes a reference implementation / fallback, and
@@ -360,8 +360,8 @@ To validate the verl-agent integration provides real value over standalone:
360360

361361
## References
362362

363-
- [verl-agent](https://github.com/langfengQ/verl-agent)GiGPO paper implementation
364-
- [VAGEN](https://github.com/mll-lab-nu/VAGEN)Multi-turn VLM agent training
363+
- [VAGEN](https://github.com/RAGEN-AI/VAGEN)Multi-turn VLM agent training (GiGPO)
364+
- [verl-agent](https://github.com/langfengQ/verl-agent)GiGPO paper's original codebase
365365
- [verl](https://github.com/verl-project/verl) — Volcano Engine RL for LLMs
366366
- [GiGPO paper](https://arxiv.org/html/2505.10978) — Group-in-Group Policy Optimization
367367
- [VAGEN paper](https://arxiv.org/abs/2510.16907) — World Model Reasoning for VLM Agents

openadapt_evals/adapters/_vendored/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
"""Vendored pure-abstract base classes from VAGEN.
22
3-
These are copied from https://github.com/mll-lab-nu/VAGEN so that
3+
These are copied from https://github.com/RAGEN-AI/VAGEN so that
44
openadapt-evals can implement the GymImageEnv protocol without
55
requiring the full VAGEN package (and its heavy transitive
66
dependencies) to be installed.

openadapt_evals/adapters/_vendored/gym_base_env.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Vendored from https://github.com/mll-lab-nu/VAGEN
1+
# Vendored from https://github.com/RAGEN-AI/VAGEN
22
# These are pure abstract base classes with no heavy dependencies.
33
# Vendored to avoid requiring the full VAGEN installation.
44
# Last synced: 2026-03-02

openadapt_evals/adapters/_vendored/gym_image_env.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Vendored from https://github.com/mll-lab-nu/VAGEN
1+
# Vendored from https://github.com/RAGEN-AI/VAGEN
22
# These are pure abstract base classes with no heavy dependencies.
33
# Vendored to avoid requiring the full VAGEN installation.
44
# Last synced: 2026-03-02

openadapt_evals/adapters/verl_env.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -118,6 +118,7 @@ def _parse_action_str(action_str: str) -> BenchmarkAction:
118118
type="scroll",
119119
x=float(kwargs.get("x", 0.5)),
120120
y=float(kwargs.get("y", 0.5)),
121+
scroll_direction=kwargs.get("direction"),
121122
)
122123
elif cmd == "WAIT":
123124
return BenchmarkAction(type="wait")
@@ -292,7 +293,10 @@ async def step(
292293
# Compute reward
293294
reward = 0.0
294295
info: dict[str, Any] = rollout_step.info
295-
info["is_action_valid"] = action.type != "done" or action_str.strip() == ""
296+
# Action is valid if parsing succeeded. If parsing failed, _parse_action_str
297+
# returns type="done" as a fallback — detect this by checking whether the
298+
# original string actually contained a recognized action command.
299+
info["is_action_valid"] = bool(_ACTION_PATTERN.search(action_str))
296300

297301
if done and self._evaluate_at_done:
298302
try:

openadapt_evals/benchmarks/vm_cli.py

Lines changed: 18 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -7919,58 +7919,33 @@ def cmd_gpu_train(args):
79197919
print(f"ERROR: Setup failed")
79207920
return 1
79217921

7922-
# Launch training via the E2E script (handles data prep, env registration, config)
7923-
# Import here to avoid circular deps
7922+
# Delegate to the E2E script's launch_training() which handles:
7923+
# - env registry registration
7924+
# - training data preparation
7925+
# - training config generation
7926+
# - training launch
7927+
# This avoids duplicating the Hydra overrides in two places.
79247928
import sys
79257929
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "scripts"))
7926-
from train_verl_e2e import prepare_training_data, register_waa_env
7927-
7928-
print("Registering WAADesktopEnv in VAGEN env registry...")
7929-
register_waa_env(ip, args.waa_server, args.task_id, username=username)
7930-
7931-
print("Preparing training data...")
7932-
prepare_training_data(ip, group_size=8, username=username)
7933-
7934-
# Hydra overrides for verl training loop + VAGEN env config.
7935-
# WAADesktopEnv is registered in VAGEN's env_registry.yaml as 'WAADesktop'.
7936-
# The env connects to WAA server via HTTP (GymImageEnv protocol).
7937-
train_cmd = (
7938-
f"cd ~/verl-agent && "
7939-
f"conda run -n verl-agent python3 -m verl.trainer.main_ppo "
7940-
f"algorithm.adv_estimator={args.algorithm} "
7941-
f"algorithm.gamma={'0.95' if args.algorithm == 'gigpo' else '1.0'} "
7942-
f"actor_rollout_ref.model.path={args.model} "
7943-
f"actor_rollout_ref.rollout.name=vllm "
7944-
f"actor_rollout_ref.rollout.tensor_model_parallel_size={args.n_gpus} "
7945-
f"actor_rollout_ref.rollout.gpu_memory_utilization=0.6 "
7946-
f"actor_rollout_ref.rollout.enable_chunked_prefill=False "
7947-
f"actor_rollout_ref.actor.ppo_mini_batch_size=64 "
7948-
f"actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 "
7949-
f"data.train_files=$HOME/data/verl-agent/visual/train.parquet "
7950-
f"data.val_files=$HOME/data/verl-agent/visual/test.parquet "
7951-
f"data.train_batch_size=8 "
7952-
f"data.val_batch_size=128 "
7953-
f"data.max_prompt_length=2048 "
7954-
f"data.max_response_length=512 "
7955-
f"data.return_raw_chat=True "
7956-
f"data.filter_overlong_prompts=True "
7957-
f"trainer.n_gpus_per_node={args.n_gpus} "
7958-
f"trainer.nnodes=1 "
7959-
f"trainer.total_epochs={args.epochs} "
7960-
f"trainer.test_freq=5 "
7961-
f"trainer.experiment_name={args.algorithm}_waa_desktop "
7962-
f"trainer.logger=['console','wandb'] "
7963-
f"trainer.project_name=openadapt-waa-rl"
7964-
)
7930+
from train_verl_e2e import launch_training
79657931

79667932
print(f"Launching {args.algorithm} training on {args.n_gpus} GPU(s)...")
79677933
print(f"Model: {args.model}")
79687934
print(f"WAA server: {args.waa_server}")
79697935
print(f"Task: {args.task_id}")
79707936

79717937
try:
7972-
result = ssh_run(ip, train_cmd, username=username, stream=True)
7973-
return result.returncode
7938+
exit_code = launch_training(
7939+
ip=ip,
7940+
waa_server=args.waa_server,
7941+
task_id=args.task_id,
7942+
algorithm=args.algorithm,
7943+
model=args.model,
7944+
n_gpus=args.n_gpus,
7945+
epochs=args.epochs,
7946+
username=username,
7947+
)
7948+
return exit_code
79747949
finally:
79757950
if args.cleanup and not args.gpu_ip:
79767951
print(f"Deallocating GPU VM '{gpu_vm_name}'...")

scripts/setup_gpu_training.sh

Lines changed: 16 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -23,9 +23,11 @@ set -euo pipefail
2323
CONDA_ENV="verl-agent"
2424
VLLM_VERSION="0.11.0"
2525
FLASH_ATTN_VERSION="2.7.4.post1"
26-
VERL_AGENT_REPO="https://github.com/langfengQ/verl-agent.git"
26+
# VAGEN includes verl as a submodule + the GymImageEnv protocol, env registry,
27+
# and GymAgentLoop that we integrate with.
28+
VERL_AGENT_REPO="https://github.com/RAGEN-AI/VAGEN.git"
2729
OPENADAPT_EVALS_REPO="https://github.com/OpenAdaptAI/openadapt-evals.git"
28-
OPENADAPT_EVALS_BRANCH="spike/verl-agent-integration"
30+
OPENADAPT_EVALS_BRANCH="main"
2931

3032
log() { echo "=== [$(date '+%H:%M:%S')] $*"; }
3133

@@ -69,13 +71,13 @@ pip3 install "flash-attn==$FLASH_ATTN_VERSION" --no-build-isolation --no-cache-d
6971

7072
# --- Clone and install verl-agent ---
7173
if [ ! -d "$HOME/verl-agent" ]; then
72-
log "Cloning verl-agent..."
73-
git clone "$VERL_AGENT_REPO" "$HOME/verl-agent"
74+
log "Cloning VAGEN..."
75+
git clone --recurse-submodules "$VERL_AGENT_REPO" "$HOME/verl-agent"
7476
else
75-
log "verl-agent already cloned, pulling latest..."
76-
cd "$HOME/verl-agent" && git pull
77+
log "VAGEN already cloned, pulling latest..."
78+
cd "$HOME/verl-agent" && git pull && git submodule update --init --recursive
7779
fi
78-
log "Installing verl-agent..."
80+
log "Installing VAGEN..."
7981
cd "$HOME/verl-agent"
8082
pip install -e .
8183

@@ -103,10 +105,10 @@ python -c "from openadapt_evals.adapters.verl_env import WAADesktopEnv; print('W
103105

104106
log "Setup complete! Activate with: conda activate $CONDA_ENV"
105107
log ""
106-
log "To start training:"
107-
log " cd ~/verl-agent"
108-
log " python3 -m verl.trainer.main_ppo \\"
109-
log " algorithm.adv_estimator=gigpo \\"
110-
log " env.env_name=openadapt_evals.adapters.verl_env.WAADesktopEnv \\"
111-
log " trainer.n_gpus_per_node=$GPU_COUNT \\"
112-
log " ..."
108+
log "To start training, use the orchestration script:"
109+
log " python scripts/train_verl_e2e.py --gpu-ip \$(hostname -I | awk '{print \$1}') --task-id <WAA_UUID>"
110+
log ""
111+
log "Or via oa-vm CLI:"
112+
log " oa-vm gpu-train --gpu-ip \$(hostname -I | awk '{print \$1}') --task-id <WAA_UUID>"
113+
log ""
114+
log "GPU count: $GPU_COUNT"

scripts/train_verl_e2e.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -166,7 +166,7 @@ def prepare_training_data(ip: str, group_size: int = 8, username: str = "ubuntu"
166166
raise RuntimeError("Data preparation failed")
167167

168168

169-
def register_waa_env(ip: str, waa_server: str, task_id: str, max_steps: int = 15, username: str = "ubuntu"):
169+
def register_waa_env(ip: str, username: str = "ubuntu"):
170170
"""Register WAADesktopEnv in VAGEN's env registry on the GPU VM.
171171
172172
VAGEN uses a YAML registry (vagen/configs/env_registry.yaml) to dispatch
@@ -354,7 +354,7 @@ def launch_training(
354354
4. Launch training via VAGEN's entry point
355355
"""
356356
# Step 1: Register WAADesktopEnv in VAGEN's env registry
357-
register_waa_env(ip, waa_server, task_id, max_steps=max_turns, username=username)
357+
register_waa_env(ip, username=username)
358358

359359
# Step 2: Prepare parquet data files (required by VAGEN's AgenticDataset)
360360
prepare_training_data(ip, group_size=group_size, username=username)

tests/test_verl_env.py

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,7 @@ def test_done(self):
5959
def test_scroll(self):
6060
action = _parse_action_str('SCROLL(x=0.50, y=0.50, direction="down")')
6161
assert action.type == "scroll"
62+
assert action.scroll_direction == "down"
6263

6364
def test_invalid_returns_done(self):
6465
action = _parse_action_str("random garbage text")
@@ -224,6 +225,27 @@ def test_protocol_has_required_methods(self):
224225
assert callable(env.close)
225226
assert callable(env.system_prompt)
226227

228+
def test_is_action_valid_on_good_action(self):
229+
"""Valid actions (parseable) should have is_action_valid=True."""
230+
env = _make_mock_env()
231+
asyncio.run(env.reset(seed=42))
232+
_, _, _, info = asyncio.run(env.step("CLICK(x=0.5, y=0.5)"))
233+
assert info["is_action_valid"] is True
234+
235+
def test_is_action_valid_on_done(self):
236+
"""Explicit DONE() should have is_action_valid=True."""
237+
env = _make_mock_env()
238+
asyncio.run(env.reset(seed=42))
239+
_, _, _, info = asyncio.run(env.step("DONE()"))
240+
assert info["is_action_valid"] is True
241+
242+
def test_is_action_valid_on_garbage(self):
243+
"""Unparseable actions should have is_action_valid=False."""
244+
env = _make_mock_env()
245+
asyncio.run(env.reset(seed=42))
246+
_, _, _, info = asyncio.run(env.step("random garbage text"))
247+
assert info["is_action_valid"] is False
248+
227249
def test_obs_contains_image_placeholder(self):
228250
"""Test that observations with screenshots include <image> placeholder."""
229251
env = _make_mock_env()

0 commit comments

Comments
 (0)