OpenAdaptAI · abrichr · Mar 4, 2026 · Mar 3, 2026
diff --git a/configs/train_waa_vagen.yaml b/configs/train_waa_vagen.yaml
@@ -1,19 +1,38 @@
 # VAGEN training config for WAA desktop automation
 #
 # This trains a VLM (e.g., Qwen2.5-VL-3B) to automate Windows desktop tasks
-# using GRPO/GiGPO via the verl-agent framework.
+# using GRPO/GiGPO via the VAGEN framework (verl-agent).
 #
 # Prerequisites:
-#   1. WAA server running (via SSH tunnel): ssh -L 5001:localhost:5050 azureuser@<VM_IP>
-#   2. VAGEN installed: pip install vagen
-#   3. Register env: add to vagen's env_registry.yaml:
+#   1. WAA server reachable (via SSH tunnel if needed):
+#        ssh -N -L 5000:localhost:5000 -L 5001:localhost:5051 azureuser@<VM_IP>
+#      Port 5000: WAA Flask API (/screenshot, /execute_windows)
+#      Port 5001: evaluate_server (/setup, /evaluate); VM host :5051 is a socat bridge to container :5050
+#   2. VAGEN installed on GPU VM (see scripts/setup_gpu_training.sh)
+#   3. openadapt-evals installed on GPU VM (pip install openadapt-evals)
+#   4. Register env in VAGEN's env_registry.yaml:
 #        WAADesktop: openadapt_evals.adapters.verl_env.WAADesktopEnv
+#      (automated by: scripts/train_verl_e2e.py or oa-vm gpu-train)
+#
+# Architecture:
+#   GPU VM                              CPU VM
+#   ┌──────────────────────┐            ┌──────────────────┐
+#   │ VAGEN / verl         │            │ Docker           │
+#   │  GymAgentLoop        │  HTTP      │  QEMU (Win 11)   │
+#   │   WAADesktopEnv ─────│───────────>│   WAA Flask API  │
+#   │  GiGPO/GRPO trainer  │            │                  │
+#   │  vLLM inference      │            │                  │
+#   └──────────────────────┘            └──────────────────┘
 #
 # Usage:
-#   python -m vagen.train --config configs/train_waa_vagen.yaml
+#   # Via orchestration script (recommended):
+#   python scripts/train_verl_e2e.py --cloud aws --task-id <UUID>
+#
+#   # Via CLI:
+#   oa-vm gpu-train --cloud aws --task-id <UUID>
 #
 # For mock testing (no VM):
-#   Set server_url to "mock" and use WAAMockAdapter internally
+#   Set server_url to "mock" in env config
 
 # --- Model ---
 model:
@@ -26,41 +45,45 @@ model:
   #   target_modules: [q_proj, k_proj, v_proj, o_proj]
 
 # --- Environment ---
+# VAGEN loads envs from env_registry.yaml using these specs.
+# WAADesktopEnv implements GymImageEnv (async reset/step/close/system_prompt).
+# Each env instance connects to the WAA server independently via HTTP.
 envs:
   - name: WAADesktop
     n_envs: 8                    # Number of parallel environments (= GRPO group size)
     data_source: waa
-    seed: [1, 100, 1]            # [start, end, step] for task selection
+    seed: [1, 100, 1]            # [start, end, step] for deterministic seeding
     max_turns: 15                # Max actions per episode
     response_length_per_turn: 512
     config:
-      server_url: "http://localhost:5001"
+      server_url: "http://localhost:5000"       # WAA Flask API (screenshots, actions)
+      evaluate_url: "http://localhost:5001"     # evaluate_server (setup, evaluate)
       task_id: "REPLACE_WITH_WAA_TASK_UUID"
       max_steps: 15
       evaluate_at_done: true
       action_type: fractional    # VLM outputs normalized 0-1 coordinates
 
-# --- Training (GRPO) ---
+# --- Training (GRPO/GiGPO) ---
 algorithm:
   name: grpo                     # or "gigpo" for step-level advantages
   kl_coef: 0.0                   # No KL penalty (DAPO/Open-Reasoner-Zero style)
-  epsilon: 0.2                   # PPO clip range (inactive with single epoch)
-  gamma: 1.0                     # No discounting for episodic tasks
+  epsilon: 0.2                   # PPO clip range
+  gamma: 1.0                     # No discounting for episodic tasks (use 0.95 for gigpo)
 
 trainer:
   total_epochs: 100
   n_gpus_per_node: 2             # Minimum for VLM training
   micro_batch_size: 4
   gradient_accumulation_steps: 2
+  test_freq: 5                   # Evaluate every N epochs
+  experiment_name: grpo_waa_desktop
+  project_name: openadapt-waa-rl
+  logger:
+    - console
+    - wandb
 
 # --- Rollout ---
 rollout:
   temperature: 0.7
   top_p: 0.95
-  mode: async                    # async sglang rollout for throughput
-
-# --- Logging ---
-logging:
-  project: openadapt-waa-rl
-  log_interval: 1
-  save_interval: 10
+  mode: async                    # Async sglang rollout for throughput
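
Review note on `n_envs: 8  (= GRPO group size)`: the comment implies that the eight parallel rollouts of one task form a single GRPO group, whose rewards are normalized against each other instead of against a learned value baseline. The sketch below shows the usual group-relative advantage computation under that reading. It is an illustration only: the function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions, and VAGEN's actual implementation may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize episode rewards within one rollout group (GRPO-style).

    Hypothetical helper for illustration; not part of VAGEN or verl.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of n_envs = 8 rollouts on the same task.
# Assumed reward scheme: 1.0 if the final evaluation succeeds, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)

# Successful rollouts get a positive advantage, failed ones a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

With `gamma: 1.0` and `evaluate_at_done: true`, every step of an episode would share that episode's single advantage. If all eight rollouts receive the same reward, the advantages collapse to zero and the group contributes no gradient signal, which is one reason to keep the group size reasonably large.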
diff --git a/docs/gpu_e2e_validation/README.md b/docs/gpu_e2e_validation/README.md
@@ -0,0 +1,128 @@
+# GPU E2E Validation Report
+
+**Date**: 2026-03-04
+**Status**: VALIDATED
+**PR**: [#87](https://github.com/OpenAdaptAI/openadapt-evals/pull/87) (`feat/gpu-training-automation`)
+**Author**: OpenAdapt engineering
+
+## Summary
+
+This report documents end-to-end validation of the verl-agent/VAGEN training
+pipeline on an AWS g5.xlarge (NVIDIA A10G, 24 GB VRAM). The full integration
+chain (`WAADesktopEnv -> RLEnvironment -> WAALiveAdapter -> WAA Flask API`)
+was confirmed working, with the GPU VM reaching an Azure WAA VM
+(`waa-pool-00`) through a two-port proxy architecture. Five issues were
+discovered and resolved during validation.
+
+## Architecture
+
+```
+GPU VM (AWS g5.xlarge)                    WAA VM (Azure waa-pool-00)
++---------------------------+             +---------------------------+
+|  verl-agent / VAGEN       |             |  Docker                   |
+|  +- WAADesktopEnv         |   HTTP      |  +- QEMU (Windows 11)    |
+|  +- RLEnvironment         | ---------> |     +- WAA Flask API      |
+|  +- WAALiveAdapter        |  :5000     |     |  /screenshot        |
+|                           |  :5051*    |     |  /execute_windows   |
+|  PyTorch 2.8.0            |             |     +- evaluate_server   |
+|  vLLM 0.11.0              |             |        /setup            |
+|  Ray 2.54.0               |             |        /evaluate         |
++---------------------------+             +---------------------------+
+      3.236.121.184                            172.173.66.131
+
+* evaluate_server.py listens on port 5050 inside the Docker container.
+  Docker port forwarding for 5050 is broken by QEMU NET_ADMIN, so a
+  socat/nsenter UNIX socket bridge exposes it as port 5051 on the VM host.
+  See architecture.md for details.
+```
+
+See [architecture.md](architecture.md) for the proxy chain deep dive.
+
+## Environment
+
+### GPU VM Specs
+
+| Component       | Value                                                       |
+|-----------------|-------------------------------------------------------------|
+| Instance type   | g5.xlarge                                                   |
+| GPU             | NVIDIA A10G Tensor Core (24 GB VRAM, Ampere, CC 8.6)       |
+| vCPU            | 4 (AMD EPYC 7R32)                                          |
+| Memory          | 16 GiB                                                     |
+| OS              | Ubuntu 22.04 LTS                                           |
+| AMI             | Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.7 (20260222) |
+| Region          | us-east-1                                                   |
+
+### Software Stack
+
+| Package        | Version  |
+|----------------|----------|
+| PyTorch        | 2.8.0    |
+| vLLM           | 0.11.0   |
+| Ray            | 2.54.0   |
+| VAGEN          | 26.2.5   |
+| Transformers   | 5.2.0    |
+| CUDA Toolkit   | 12.8     |
+| cuDNN          | 9.10.2   |
+| Python         | 3.12     |
+
+Full version listing: [artifacts/gpu_vm_stack_versions.txt](artifacts/gpu_vm_stack_versions.txt)
+
+## Validation Steps and Results
+
+| # | Test                                     | Artifact Stage | Result |
+|---|------------------------------------------|----------------|--------|
+| 1 | GPU detected (`nvidia-smi`)              | Stage 1        | PASS   |
+| 2 | Miniconda + conda env creation           | Stages 2-3b    | PASS (after TOS fix) |
+| 3 | V100 -> A10G instance swap               | Stage 4        | PASS   |
+| 4 | vLLM 0.11.0 install + import             | Stage 5        | PASS   |
+| 5 | PyTorch 2.8.0 CUDA available             | Stage 6        | PASS   |
+| 6 | VAGEN install + env registry load        | Stage 7*       | PASS   |
+| 7 | Docker port 5050 socat bridge            | Stage 7        | PASS   |
+| 8 | WAADesktopEnv reset + screenshot         | Stage 8        | PASS   |
+| 9 | WAALiveAdapter execute action            | Stage 8        | PASS   |
+| 10 | Full RLEnvironment step loop            | Stage 8        | PASS   |
+
+\* VAGEN install output is also available in [artifacts/vagen_registry_output.txt](artifacts/vagen_registry_output.txt).
+
+## Issues Discovered
+
+| # | Issue                          | Root Cause                                          | Fix Applied                                          |
+|---|-------------------------------|-----------------------------------------------------|------------------------------------------------------|
+| 1 | Conda TOS error               | Miniconda 2025 requires explicit TOS acceptance     | `conda tos accept --override-channels --channel ...` |
+| 2 | PyTorch version conflict       | vLLM 0.11.0 pins `torch==2.8.0`; pip pulled 2.10.0 | `pip install torch==2.8.0 --upgrade`                 |
+| 3 | V100 GPU incompatible          | V100 (Volta) lacks GSP, which the OSS NVIDIA driver requires | Switched p3.2xlarge (V100) to g5.xlarge (A10G)      |
+| 4 | Docker port 5050 broken        | QEMU `NET_ADMIN` breaks Docker bridge networking    | UNIX socket bridge via `nsenter` + `socat`           |
+| 5 | AMI selection                  | Multiple DL AMI variants; picking the wrong one wastes setup time | Standardized on OSS Nvidia Driver + PyTorch 2.7 AMI |
+
+Details in [artifacts/e2e_test_output.txt](artifacts/e2e_test_output.txt).
+
+## Cost
+
+| Metric              | Value        |
+|---------------------|--------------|
+| Instance cost       | $1.006/hr    |
+| Validation runtime  | ~30 min      |
+| Estimated cost      | ~$0.50       |
+| Auto-shutdown       | 30 min post-validation |
+
+## Commits (PR #87)
+
+```
+f9e5804 feat: add GPU training automation for verl-agent E2E workflow
+dda3fb2 fix: correct verl-agent Hydra config paths and document integration gap
+dc4f088 fix: replace EnvironmentManagerBase with VAGEN registry-based env integration
+dc1f81f fix: correct is_action_valid logic, scroll_direction, stale refs, and DRY violation
+308cade fix: resolve lint errors (undefined use_fast, unused imports, f-strings)
+e73df70 fix: add evaluate_url support and E2E validation test
+17c919b fix: use Deep Learning AMI for GPU instances and fix setup issues
+c2555ef docs: add GPU E2E validation report with artifacts
+b7efb4f fix: resolve port inconsistencies and add missing context in validation docs
+```
+
+## Next Steps
+
+1. Merge PR #87 once CI passes
+2. Lower openadapt-ml PyTorch requirement to `>=2.8.0` (currently `>=2.9.1`, which conflicts with the `torch==2.8.0` pin in vLLM 0.11.0)
+3. Document UNIX socket bridge in deployment runbook
+4. Evaluate spot instances for cost optimization during training runs
+5. Run first GRPO/GiGPO training loop on validated stack
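
Review note on the two-port layout: the config header and the Architecture section describe two forwards, local 5000 to the WAA Flask API and local 5001 to the evaluate_server bridge on VM host port 5051. The sketch below keeps that mapping in one place and derives both the SSH tunnel command and the `server_url` / `evaluate_url` values from it, so the two cannot drift apart. The names `Forward`, `FORWARDS`, `tunnel_command`, and `env_urls` are hypothetical and do not exist in openadapt-evals; the `azureuser` login is taken from the config comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forward:
    """One SSH local forward: localhost:<local_port> -> VM host:<remote_port>."""
    local_port: int
    remote_port: int
    purpose: str


# Port mapping as documented in configs/train_waa_vagen.yaml.
FORWARDS = (
    Forward(5000, 5000, "WAA Flask API (/screenshot, /execute_windows)"),
    Forward(5001, 5051, "evaluate_server (/setup, /evaluate) via socat bridge"),
)


def tunnel_command(vm_ip, user="azureuser", forwards=FORWARDS):
    """Build the argv list for an ssh tunnel that opens every forward."""
    args = ["ssh", "-N"]  # -N: forward ports only, run no remote command
    for fwd in forwards:
        args += ["-L", f"{fwd.local_port}:localhost:{fwd.remote_port}"]
    args.append(f"{user}@{vm_ip}")
    return args


def env_urls(forwards=FORWARDS):
    """Derive the env config URLs from the local ends of the forwards."""
    api, evaluator = forwards
    return {
        "server_url": f"http://localhost:{api.local_port}",
        "evaluate_url": f"http://localhost:{evaluator.local_port}",
    }


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only placeholder address.
    print(" ".join(tunnel_command("203.0.113.10")))
    print(env_urls())
```

Returning an argv list instead of a shell string lets the result be passed straight to `subprocess.run` without quoting concerns.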