fix: correct training entry point, env config, and GPU defaults for VAGEN #101
Merged
Five blockers for running verl-agent training on g5.xlarge:
A) n_gpus default: 2 -> 1 (g5.xlarge has 1 GPU; multi-GPU is for g5.12xlarge)
- train_verl_e2e.py argparse default
- train_waa_vagen.yaml trainer.n_gpus_per_node
- vm_cli.py gpu-train --n-gpus default
B) n_envs: 8 -> 1 (single WAA VM; GRPO group size is rollout.n, not n_envs)
- train_waa_vagen.yaml envs[0].n_envs
C) Training entry point: verl.trainer.main_ppo -> vagen.main_ppo
- VAGEN has its own entry point with Hydra config support
- Added --config-path and --config-name Hydra args
D) Generated config: full training config -> env spec only
- _generate_training_config now emits only the envs section
- Algorithm, trainer, and rollout settings are Hydra overrides on CLI
- data.train_files/val_files point to the env spec YAML
E) Rollout config: added VAGEN-required Hydra overrides
- multi_turn.enable=True for multi-step desktop tasks
- rollout.n={group_size} for GRPO group size
- FSDP param/optimizer offload for single-GPU memory
- gradient checkpointing enabled
- total_training_steps replaces total_epochs (VAGEN uses steps)
- Added evaluate_url to log output
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
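Blockers C and E above amount to a different launch command. The following is a minimal sketch of how such a command might be assembled, not the code in train_verl_e2e.py. The function name, its parameters, and its default values are hypothetical. The `actor_rollout_ref.` and `trainer.` key prefixes are assumptions borrowed from verl's config layout; the commit message only names the short forms (multi_turn.enable, rollout.n, total_training_steps), so the paths VAGEN actually expects may differ.

```python
import shlex


def build_train_command(config_dir, config_name, env_spec_path,
                        group_size=4, n_gpus=1, total_steps=100):
    """Assemble a launch command for vagen.main_ppo (illustrative only).

    All defaults here are placeholders, and the full override paths are
    assumed from verl's config layout rather than taken from this PR.
    """
    overrides = [
        # Blocker D: train/val files both point at the generated env spec YAML.
        f"data.train_files={env_spec_path}",
        f"data.val_files={env_spec_path}",
        # Blocker E: GRPO group size comes from rollout.n, not n_envs.
        f"actor_rollout_ref.rollout.n={group_size}",
        "actor_rollout_ref.rollout.multi_turn.enable=True",
        # Offload and checkpointing to fit a single-GPU memory budget.
        "actor_rollout_ref.actor.fsdp_config.param_offload=True",
        "actor_rollout_ref.actor.fsdp_config.optimizer_offload=True",
        "actor_rollout_ref.model.enable_gradient_checkpointing=True",
        # Blocker A: one GPU per node on g5.xlarge.
        f"trainer.n_gpus_per_node={n_gpus}",
        # Steps rather than epochs.
        f"trainer.total_training_steps={total_steps}",
    ]
    return [
        "python", "-m", "vagen.main_ppo",  # Blocker C: VAGEN entry point
        "--config-path", config_dir,
        "--config-name", config_name,
        *overrides,
    ]


if __name__ == "__main__":
    cmd = build_train_command("/path/to/configs", "my_config", "/tmp/env_spec.yaml")
    print(shlex.join(cmd))
```

Returning the command as a list of arguments keeps it safe to pass to subprocess.run without shell quoting; shlex.join is used only to print it for logging.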
Summary
Fixes 5 blockers that prevent verl-agent training scripts from running on g5.xlarge (single A10G GPU):
- n_gpus default changed from 2 to 1 in train_verl_e2e.py, train_waa_vagen.yaml, and the vm_cli.py gpu-train parser. g5.xlarge has 1 GPU; 4 GPUs is for g5.12xlarge.
- n_envs changed from 8 to 1 in train_waa_vagen.yaml. We have a single WAA VM; GRPO group size is controlled by rollout.n, not n_envs.
- Training entry point changed from verl.trainer.main_ppo to vagen.main_ppo with Hydra --config-path and --config-name args. VAGEN has its own entry point.
- _generate_training_config now emits only the envs section (env spec YAML), not the full training config. Algorithm, trainer, and rollout settings are Hydra overrides on the command line. data.train_files / data.val_files reference the env spec.
- Rollout config now includes the VAGEN-required Hydra overrides: multi_turn.enable=True, rollout.n for GRPO group size, FSDP param/optimizer offload, gradient checkpointing, total_training_steps (replaces total_epochs), save_freq, val_before_train, and evaluate_url logging.

Test plan
- uv run pytest tests/test_verl_env.py -v (all 38 tests pass)
- python scripts/train_verl_e2e.py --gpu-ip <IP> --skip-setup --task-id <UUID>

🤖 Generated with Claude Code
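The _generate_training_config change in the summary (emit only the envs section) might look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's implementation: the function name generate_env_spec, the env name, and the config sub-keys (task_id, evaluate_url) are hypothetical. Only the top-level envs list and its n_envs field come from the description above.

```python
def generate_env_spec(task_id, evaluate_url, n_envs=1):
    """Return an env-spec mapping containing only the `envs` section.

    Algorithm, trainer, and rollout settings are deliberately absent;
    per the PR they are passed as Hydra overrides on the command line.
    """
    return {
        "envs": [
            {
                "name": "waa_desktop",  # hypothetical env name
                "n_envs": n_envs,       # 1: a single WAA VM is available
                "config": {             # hypothetical per-env settings
                    "task_id": task_id,
                    "evaluate_url": evaluate_url,
                },
            }
        ]
    }


if __name__ == "__main__":
    spec = generate_env_spec("example-task", "http://localhost:5000/evaluate")
    print(spec)
```

Serialising the mapping to YAML (for example with a YAML library's safe dump) is left out to keep the sketch free of third-party dependencies.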