Merged
59 changes: 41 additions & 18 deletions configs/train_waa_vagen.yaml
@@ -1,19 +1,38 @@
 # VAGEN training config for WAA desktop automation
 #
 # This trains a VLM (e.g., Qwen2.5-VL-3B) to automate Windows desktop tasks
-# using GRPO/GiGPO via the verl-agent framework.
+# using GRPO/GiGPO via the VAGEN framework (verl-agent).
 #
 # Prerequisites:
-# 1. WAA server running (via SSH tunnel): ssh -L 5001:localhost:5050 azureuser@<VM_IP>
-# 2. VAGEN installed: pip install vagen
-# 3. Register env: add to vagen's env_registry.yaml:
+# 1. WAA server reachable (via SSH tunnel if needed):
+# ssh -N -L 5000:localhost:5000 -L 5001:localhost:5051 azureuser@<VM_IP>
+# Port 5000: WAA Flask API (/screenshot, /execute_windows)
+# Port 5001: evaluate_server (/setup, /evaluate) via socat bridge from container :5050
+# 2. VAGEN installed on GPU VM (see scripts/setup_gpu_training.sh)
+# 3. openadapt-evals installed on GPU VM (pip install openadapt-evals)
+# 4. Register env in VAGEN's env_registry.yaml:
 # WAADesktop: openadapt_evals.adapters.verl_env.WAADesktopEnv
+# (automated by: scripts/train_verl_e2e.py or oa-vm gpu-train)
 #
+# Architecture:
+#   GPU VM                              CPU VM
+#   ┌──────────────────────┐            ┌──────────────────┐
+#   │  VAGEN / verl        │            │  Docker          │
+#   │  GymAgentLoop        │    HTTP    │  QEMU (Win 11)   │
+#   │  WAADesktopEnv  ─────│───────────>│  WAA Flask API   │
+#   │  GiGPO/GRPO trainer  │            │                  │
+#   │  vLLM inference      │            │                  │
+#   └──────────────────────┘            └──────────────────┘
+#
 # Usage:
-# python -m vagen.train --config configs/train_waa_vagen.yaml
+# # Via orchestration script (recommended):
+# python scripts/train_verl_e2e.py --cloud aws --task-id <UUID>
+#
+# # Via CLI:
+# oa-vm gpu-train --cloud aws --task-id <UUID>
 #
 # For mock testing (no VM):
-# Set server_url to "mock" and use WAAMockAdapter internally
+# Set server_url to "mock" in env config

 # --- Model ---
 model:
@@ -26,41 +45,45 @@ model:
   # target_modules: [q_proj, k_proj, v_proj, o_proj]

 # --- Environment ---
+# VAGEN loads envs from env_registry.yaml using these specs.
+# WAADesktopEnv implements GymImageEnv (async reset/step/close/system_prompt).
+# Each env instance connects to the WAA server independently via HTTP.
 envs:
   - name: WAADesktop
     n_envs: 8  # Number of parallel environments (= GRPO group size)
     data_source: waa
-    seed: [1, 100, 1]  # [start, end, step] for task selection
+    seed: [1, 100, 1]  # [start, end, step] for deterministic seeding
     max_turns: 15  # Max actions per episode
     response_length_per_turn: 512
     config:
-      server_url: "http://localhost:5001"
+      server_url: "http://localhost:5000"  # WAA Flask API (screenshots, actions)
+      evaluate_url: "http://localhost:5001"  # evaluate_server (setup, evaluate)
       task_id: "REPLACE_WITH_WAA_TASK_UUID"
       max_steps: 15
       evaluate_at_done: true
       action_type: fractional  # VLM outputs normalized 0-1 coordinates

-# --- Training (GRPO) ---
+# --- Training (GRPO/GiGPO) ---
 algorithm:
   name: grpo  # or "gigpo" for step-level advantages
   kl_coef: 0.0  # No KL penalty (DAPO/Open-Reasoner-Zero style)
-  epsilon: 0.2  # PPO clip range (inactive with single epoch)
-  gamma: 1.0  # No discounting for episodic tasks
+  epsilon: 0.2  # PPO clip range
+  gamma: 1.0  # No discounting for episodic tasks (use 0.95 for gigpo)

 trainer:
   total_epochs: 100
   n_gpus_per_node: 2  # Minimum for VLM training
   micro_batch_size: 4
   gradient_accumulation_steps: 2
+  test_freq: 5  # Evaluate every N epochs
+  experiment_name: grpo_waa_desktop
+  project_name: openadapt-waa-rl
+  logger:
+    - console
+    - wandb

 # --- Rollout ---
 rollout:
   temperature: 0.7
   top_p: 0.95
-  mode: async  # async sglang rollout for throughput
-
-# --- Logging ---
-logging:
-  project: openadapt-waa-rl
-  log_interval: 1
-  save_interval: 10
+  mode: async  # Async sglang rollout for throughput
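
The config comment ties `n_envs: 8` to the GRPO group size, and `kl_coef: 0.0` removes the KL term. As a rough illustration only (not VAGEN's or verl's actual code), the group-relative advantage that GRPO is generally described as using can be sketched as below. The function name, the `eps` value, the use of the population standard deviation, and the reward values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each episode reward against the other rollouts in its group.

    Assumption: one scalar reward per episode (e.g. the task evaluation
    result at the end of the episode) and population std for normalization;
    real implementations may differ in both respects.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one group of 8 rollouts (n_envs: 8); 1.0 = task solved.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Successful episodes get a positive advantage and failed ones a negative advantage. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.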
128 changes: 128 additions & 0 deletions docs/gpu_e2e_validation/README.md
@@ -0,0 +1,128 @@
# GPU E2E Validation Report

**Date**: 2026-03-04
**Status**: VALIDATED
**PR**: [#87](https://github.com/OpenAdaptAI/openadapt-evals/pull/87) (`feat/gpu-training-automation`)
**Author**: OpenAdapt engineering

## Summary

End-to-end validation of the verl-agent/VAGEN training pipeline on AWS
g5.xlarge (NVIDIA A10G, 24 GB VRAM). The full integration chain —
`WAADesktopEnv -> RLEnvironment -> WAALiveAdapter -> WAA Flask API` — was
confirmed working with the GPU VM connecting to an Azure WAA VM
(`waa-pool-00`) via a two-port proxy architecture. Five issues were
discovered and resolved during validation.

## Architecture

```
GPU VM (AWS g5.xlarge)                   WAA VM (Azure waa-pool-00)
+---------------------------+            +---------------------------+
|  verl-agent / VAGEN       |            |  Docker                   |
|    +- WAADesktopEnv       |    HTTP    |    +- QEMU (Windows 11)   |
|    +- RLEnvironment       | ---------> |    +- WAA Flask API       |
|    +- WAALiveAdapter      |   :5000    |    |    /screenshot       |
|                           |   :5051*   |    |    /execute_windows  |
|  PyTorch 2.8.0            |            |    +- evaluate_server     |
|  vLLM 0.11.0              |            |         /setup            |
|  Ray 2.54.0               |            |         /evaluate         |
+---------------------------+            +---------------------------+
  3.236.121.184                            172.173.66.131

* evaluate_server.py listens on port 5050 inside the Docker container.
Docker port forwarding for 5050 is broken by QEMU NET_ADMIN, so a
socat/nsenter UNIX socket bridge exposes it as port 5051 on the VM host.
See architecture.md for details.
```

See [architecture.md](architecture.md) for the proxy chain deep dive.
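
Before starting a run, it can help to confirm that both ports in the diagram accept TCP connections from the GPU VM. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the ports are the VM-host ports from the diagram (through the SSH tunnel in the training config they appear locally as 5000 and 5001 instead).

```python
import socket

# Ports as exposed on the WAA VM host, per the diagram above.
ENDPOINTS = {
    "WAA Flask API": 5000,
    "evaluate_server (socat bridge)": 5051,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(host: str, endpoints: dict = ENDPOINTS) -> dict:
    """Map each endpoint name to whether its port is reachable on host."""
    return {name: port_open(host, port) for name, port in endpoints.items()}


# Example (host is a placeholder for the WAA VM address):
#   for name, ok in check_endpoints("<VM_IP>").items():
#       print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

This checks TCP reachability only. It does not confirm that the Flask API or `evaluate_server` is answering requests correctly.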

## Environment

### GPU VM Specs

| Component | Value |
|-----------------|-------------------------------------------------------------|
| Instance type | g5.xlarge |
| GPU | NVIDIA A10G Tensor Core (24 GB VRAM, Ampere, CC 8.6) |
| vCPU | 4 (AMD EPYC 7R13) |
| Memory | 16 GiB |
| OS | Ubuntu 22.04 LTS |
| AMI | Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.7 (20260222) |
| Region | us-east-1 |

### Software Stack

| Package | Version |
|----------------|----------|
| PyTorch | 2.8.0 |
| vLLM | 0.11.0 |
| Ray | 2.54.0 |
| VAGEN | 26.2.5 |
| Transformers | 5.2.0 |
| CUDA Toolkit | 12.8 |
| cuDNN | 9.10.2 |
| Python | 3.12 |

Full version listing: [artifacts/gpu_vm_stack_versions.txt](artifacts/gpu_vm_stack_versions.txt)
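
Because the PyTorch pin was one of the issues hit during validation, a quick check of installed versions against the table can catch drift early. This is a sketch only: the distribution names (`torch`, `vllm`, `ray`, `transformers`) are assumed to match the pip package names, and `find_mismatches` is a hypothetical helper, not part of the repository.

```python
from importlib import metadata

# Expected versions taken from the Software Stack table.
EXPECTED = {
    "torch": "2.8.0",
    "vllm": "0.11.0",
    "ray": "2.54.0",
    "transformers": "5.2.0",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, installed)} for every package that differs.

    A missing package is reported with installed=None. Local build tags
    such as "+cu128" are ignored when comparing.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except metadata.PackageNotFoundError:
            have = None
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


# Example: print(find_mismatches(EXPECTED) or "stack matches the table")
```

The `get_version` parameter exists so the comparison logic can be exercised without the packages installed.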

## Validation Steps and Results

| # | Test | Artifact Stage | Result |
|---|------------------------------------------|----------------|--------|
| 1 | GPU detected (`nvidia-smi`) | Stage 1 | PASS |
| 2 | Miniconda + conda env creation | Stages 2-3b | PASS (after TOS fix) |
| 3 | V100 -> A10G instance swap | Stage 4 | PASS |
| 4 | vLLM 0.11.0 install + import | Stage 5 | PASS |
| 5 | PyTorch 2.8.0 CUDA available | Stage 6 | PASS |
| 6 | VAGEN install + env registry load | Stage 7* | PASS |
| 7 | Docker port 5050 socat bridge | Stage 7 | PASS |
| 8 | WAADesktopEnv reset + screenshot | Stage 8 | PASS |
| 9 | WAALiveAdapter execute action | Stage 8 | PASS |
| 10 | Full RLEnvironment step loop | Stage 8 | PASS |

\* VAGEN install output also in [artifacts/vagen_registry_output.txt](artifacts/vagen_registry_output.txt).

## Issues Discovered

| # | Issue | Root Cause | Fix Applied |
|---|-------------------------------|-----------------------------------------------------|------------------------------------------------------|
| 1 | Conda TOS error | Miniconda 2025 requires explicit TOS acceptance | `conda tos accept --override-channels --channel ...` |
| 2 | PyTorch version conflict | vLLM 0.11.0 pins `torch==2.8.0`; pip pulled 2.10.0 | `pip install torch==2.8.0 --upgrade` |
| 3 | V100 GPU incompatible | V100 lacks GSP (required by NVIDIA's open kernel modules) | Switched p3.2xlarge (V100) to g5.xlarge (A10G) |
| 4 | Docker port 5050 broken | QEMU `NET_ADMIN` breaks Docker bridge networking | UNIX socket bridge via `nsenter` + `socat` |
| 5 | AMI selection | Multiple DL AMI variants; wrong one wastes setup | Standardized on OSS Nvidia Driver + PyTorch 2.7 AMI |

Details in [artifacts/e2e_test_output.txt](artifacts/e2e_test_output.txt).

## Cost

| Metric | Value |
|---------------------|--------------|
| Instance cost | $1.006/hr |
| Validation runtime | ~30 min |
| Estimated cost | ~$0.50 |
| Auto-shutdown | 30 min post-validation |
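
The estimate is just the hourly rate times the runtime. The short sketch below reproduces it and adds an upper bound that assumes the instance sits idle for the full 30-minute auto-shutdown window; that interpretation of the auto-shutdown row is an assumption.

```python
HOURLY_RATE_USD = 1.006   # Instance cost from the table
VALIDATION_MIN = 30       # Approximate validation runtime
IDLE_MIN = 30             # Assumed: full idle window before auto-shutdown


def instance_cost(rate_per_hour: float, minutes: float) -> float:
    """On-demand cost in USD for the given number of minutes."""
    return rate_per_hour * minutes / 60


validation_cost = instance_cost(HOURLY_RATE_USD, VALIDATION_MIN)
upper_bound = instance_cost(HOURLY_RATE_USD, VALIDATION_MIN + IDLE_MIN)
print(f"validation: ${validation_cost:.2f}, with idle window: ${upper_bound:.2f}")
```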

## Commits (PR #87)

```
f9e5804 feat: add GPU training automation for verl-agent E2E workflow
dda3fb2 fix: correct verl-agent Hydra config paths and document integration gap
dc4f088 fix: replace EnvironmentManagerBase with VAGEN registry-based env integration
dc1f81f fix: correct is_action_valid logic, scroll_direction, stale refs, and DRY violation
308cade fix: resolve lint errors (undefined use_fast, unused imports, f-strings)
e73df70 fix: add evaluate_url support and E2E validation test
17c919b fix: use Deep Learning AMI for GPU instances and fix setup issues
c2555ef docs: add GPU E2E validation report with artifacts
b7efb4f fix: resolve port inconsistencies and add missing context in validation docs
```

## Next Steps

1. Merge PR #87 once CI passes
2. Relax the openadapt-ml PyTorch requirement to `>=2.8.0` (currently `>=2.9.1`, which conflicts with vLLM 0.11.0's `torch==2.8.0` pin)
3. Document UNIX socket bridge in deployment runbook
4. Evaluate spot instances for cost optimization during training runs
5. Run first GRPO/GiGPO training loop on validated stack