Commit 4f3cec0

abrichr and claude committed
feat: add VAGEN/verl-agent environment adapter for VLM RL training
WAADesktopEnv implements the GymImageEnv protocol from VAGEN, enabling desktop GUI automation training with verl-agent's multi-turn VLM RL pipeline (GiGPO, GRPO, PPO). The adapter translates between openadapt-evals BenchmarkObservation (PNG bytes + a11y tree) and VAGEN's observation format (obs_str + multi_modal_input with PIL images).

- Async interface (reset/step/close/system_prompt)
- Action DSL parsing (CLICK, TYPE, KEY, SCROLL, WAIT, DONE)
- Fractional coordinate support (0.0-1.0)
- Lazy adapter initialization
- 21 tests passing with mock adapter
- Example VAGEN training config included

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4896b65 commit 4f3cec0

4 files changed

Lines changed: 616 additions & 0 deletions
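The commit message lists an action DSL (CLICK, TYPE, KEY, SCROLL, WAIT, DONE) with fractional coordinates in the 0.0-1.0 range. The adapter source is not shown on this page, so the following is only a minimal sketch of how such parsing might look. The call-style syntax `VERB(args)`, the helper names `parse_action` and `to_pixels`, and the clamping behavior are assumptions made for illustration, not the adapter's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Action verbs named in the commit message.
VERBS = ("CLICK", "TYPE", "KEY", "SCROLL", "WAIT", "DONE")

# Hypothetical syntax: VERB(args), e.g. CLICK(0.5, 0.25) or TYPE("hello").
_ACTION_RE = re.compile(r"^\s*([A-Z]+)\s*\((.*)\)\s*$", re.DOTALL)


@dataclass
class ParsedAction:
    verb: str
    xy: Optional[Tuple[float, float]] = None  # fractional coordinates, if any
    text: Optional[str] = None                # payload for TYPE / KEY


def parse_action(raw: str) -> ParsedAction:
    """Parse one action string; raise ValueError on anything unrecognized."""
    match = _ACTION_RE.match(raw)
    if match is None:
        raise ValueError(f"not an action: {raw!r}")
    verb, args = match.group(1), match.group(2).strip()
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    if verb == "CLICK":
        # Unpacking or float() raises ValueError on malformed coordinates.
        x_str, y_str = args.split(",")
        return ParsedAction(verb, xy=(float(x_str), float(y_str)))
    if verb in ("TYPE", "KEY"):
        return ParsedAction(verb, text=args.strip("\"'"))
    # SCROLL, WAIT and DONE arguments are ignored in this sketch.
    return ParsedAction(verb)


def to_pixels(xy: Tuple[float, float], width: int, height: int) -> Tuple[int, int]:
    """Map fractional (x, y) to pixel indices, clamping to the screen."""
    x = min(max(xy[0], 0.0), 1.0)
    y = min(max(xy[1], 0.0), 1.0)
    return round(x * (width - 1)), round(y * (height - 1))
```

Under these assumptions, `parse_action("CLICK(0.5, 0.25)").xy` yields `(0.5, 0.25)`, and `to_pixels((1.0, 1.0), 1920, 1080)` maps to the bottom-right pixel `(1919, 1079)`. Clamping rather than rejecting out-of-range values is one possible design choice; a stricter parser could raise instead.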

configs/train_waa_vagen.yaml

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+# VAGEN training config for WAA desktop automation
+#
+# This trains a VLM (e.g., Qwen2.5-VL-3B) to automate Windows desktop tasks
+# using GRPO/GiGPO via the verl-agent framework.
+#
+# Prerequisites:
+# 1. WAA server running (via SSH tunnel): ssh -L 5001:localhost:5050 azureuser@<VM_IP>
+# 2. VAGEN installed: pip install vagen
+# 3. Register env: add to vagen's env_registry.yaml:
+# WAADesktop: openadapt_evals.adapters.verl_env.WAADesktopEnv
+#
+# Usage:
+# python -m vagen.train --config configs/train_waa_vagen.yaml
+#
+# For mock testing (no VM):
+# Set server_url to "mock" and use WAAMockAdapter internally
+
+# --- Model ---
+model:
+  name: Qwen/Qwen2.5-VL-3B-Instruct
+  # For larger models with LoRA:
+  # name: Qwen/Qwen2.5-VL-7B-Instruct
+  # lora:
+  #   r: 16
+  #   alpha: 32
+  #   target_modules: [q_proj, k_proj, v_proj, o_proj]
+
+# --- Environment ---
+envs:
+  - name: WAADesktop
+    n_envs: 8  # Number of parallel environments (= GRPO group size)
+    data_source: waa
+    seed: [1, 100, 1]  # [start, end, step] for task selection
+    max_turns: 15  # Max actions per episode
+    response_length_per_turn: 512
+    config:
+      server_url: "http://localhost:5001"
+      task_id: "REPLACE_WITH_WAA_TASK_UUID"
+      max_steps: 15
+      evaluate_at_done: true
+      action_type: fractional  # VLM outputs normalized 0-1 coordinates
+
+# --- Training (GRPO) ---
+algorithm:
+  name: grpo  # or "gigpo" for step-level advantages
+  kl_coef: 0.0  # No KL penalty (DAPO/Open-Reasoner-Zero style)
+  epsilon: 0.2  # PPO clip range (inactive with single epoch)
+  gamma: 1.0  # No discounting for episodic tasks
+
+trainer:
+  total_epochs: 100
+  n_gpus_per_node: 2  # Minimum for VLM training
+  micro_batch_size: 4
+  gradient_accumulation_steps: 2
+
+# --- Rollout ---
+rollout:
+  temperature: 0.7
+  top_p: 0.95
+  mode: async  # async sglang rollout for throughput
+
+# --- Logging ---
+logging:
+  project: openadapt-waa-rl
+  log_interval: 1
+  save_interval: 10
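
The config above ships with a placeholder `task_id` and sets both `max_turns` and `max_steps` to 15. A small pre-flight check can catch a forgotten placeholder before a training run starts. This is a hedged sketch, not part of the commit: the `preflight` helper is hypothetical, it operates on the already-parsed config as a plain dict, and the rule that `max_turns` and `max_steps` should agree is an assumption based only on the two values matching here.

```python
from typing import Any, Dict, List

# Placeholder value taken verbatim from the config above.
PLACEHOLDER_TASK_ID = "REPLACE_WITH_WAA_TASK_UUID"


def preflight(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed training config."""
    problems: List[str] = []
    for env in cfg.get("envs", []):
        name = env.get("name", "<unnamed>")
        inner = env.get("config", {})

        if inner.get("task_id") == PLACEHOLDER_TASK_ID:
            problems.append(f"{name}: task_id is still the placeholder")

        # Assumption: the rollout limit and the adapter limit should agree.
        max_turns = env.get("max_turns")
        max_steps = inner.get("max_steps")
        if max_turns is not None and max_steps is not None and max_turns != max_steps:
            problems.append(f"{name}: max_turns={max_turns} != max_steps={max_steps}")

        # "mock" is the special value the config comments mention for testing.
        url = inner.get("server_url", "")
        if url != "mock" and not url.startswith(("http://", "https://")):
            problems.append(f"{name}: server_url {url!r} is neither 'mock' nor an HTTP URL")
    return problems


# The envs entry from the config above, as the dict a YAML loader would produce.
example = {
    "envs": [
        {
            "name": "WAADesktop",
            "n_envs": 8,
            "max_turns": 15,
            "config": {
                "server_url": "http://localhost:5001",
                "task_id": "REPLACE_WITH_WAA_TASK_UUID",
                "max_steps": 15,
            },
        }
    ]
}
```

Calling `preflight(example)` reports a single problem, the unreplaced `task_id`; an empty list means none of these checks fired.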

openadapt_evals/adapters/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -39,6 +39,7 @@
     RLEnvironment,
     RolloutStep,
 )
+from openadapt_evals.adapters.verl_env import WAADesktopEnv
 from openadapt_evals.adapters.waa import (
     WAAAdapter,
     WAAConfig,
@@ -69,6 +70,8 @@
     "WAAMockAdapter",
     "WAALiveAdapter",
     "WAALiveConfig",
+    # verl-agent / VAGEN integration
+    "WAADesktopEnv",
     # Task ID validation
     "SyntheticTaskError",
     "is_real_waa_task_id",
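
With this export, `WAADesktopEnv` becomes importable from `openadapt_evals.adapters`. The commit message names four async methods (reset/step/close/system_prompt), but their signatures are not shown on this page. The sketch below therefore drives a hypothetical stand-in, `StubDesktopEnv`, to illustrate the shape of an episode loop; the argument lists, the `(obs, reward, done)` return tuple, and the reward rule are all assumptions, and only the `obs_str` and `multi_modal_input` keys come from the commit message.

```python
import asyncio
from typing import Any, Dict, List, Tuple


class StubDesktopEnv:
    """Hypothetical stand-in exposing the four methods the commit message names."""

    def __init__(self, max_steps: int = 3) -> None:
        self._max_steps = max_steps
        self._steps = 0
        self.closed = False

    async def system_prompt(self) -> str:
        return "You control a desktop. Reply with one action per turn."

    async def reset(self, seed: int) -> Dict[str, Any]:
        self._steps = 0
        # Key names follow the observation format in the commit message.
        return {"obs_str": f"task seed={seed}", "multi_modal_input": {}}

    async def step(self, action: str) -> Tuple[Dict[str, Any], float, bool]:
        self._steps += 1
        finished = action.startswith("DONE")
        done = finished or self._steps >= self._max_steps
        reward = 1.0 if finished else 0.0  # assumed reward rule for the stub
        obs = {"obs_str": f"after {action}", "multi_modal_input": {}}
        return obs, reward, done

    async def close(self) -> None:
        self.closed = True


async def run_episode(env: StubDesktopEnv, actions: List[str], seed: int = 1) -> float:
    """Reset, replay a fixed action list until done, and always close the env."""
    total = 0.0
    await env.reset(seed)
    try:
        for action in actions:
            _, reward, done = await env.step(action)
            total += reward
            if done:
                break
    finally:
        await env.close()
    return total
```

An episode can be run with `asyncio.run(run_episode(StubDesktopEnv(), ["WAIT()", "DONE()"]))`. Wrapping the loop in `try`/`finally` means the environment is closed even if a step raises, which matters more for a real adapter holding a server connection than for this stub.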
