Entry points that wire Hydra configs, Isaac Sim, and PPO policies.
- Hydra + W&B setup
  - Hydra loads `cfg/train.yaml` and constructs the Isaac Sim `AppLauncher`.
  - W&B is initialized from `cfg.wandb` (project, mode, tags).
- Create environment and policy
  - `make_env_policy(cfg)` builds:
    - the vectorized Isaac env (Active Adaptation `_Env` subclass),
    - the PPO policy (`ppo` / `ppo_roa` / `ppo_amp`),
    - an optional VecNorm transform.
- Rollout and training loop
  - The env is reset once and a rollout policy is created: `rollout_policy = policy.get_rollout_policy("train")`.
  - A TensorDict buffer `data_buf` of shape `[num_envs, train_every, ...]` is allocated based on a one-step probe.
  - For each iteration:
    - With `ExplorationType.RANDOM`, the rollout policy is applied for `train_every` steps, and `env.step_and_maybe_reset` fills `data_buf` (including `next` fields and `is_init` masks).
    - The critic is run once on `data_buf` and the bootstrapped `next["state_value"]` is computed.
    - `policy.train_op(data_buf)` is called to perform PPO updates (and any adaptation/estimation steps), returning a metrics dict.
    - Episode stats and env performance metrics are aggregated and logged to W&B; checkpoints are written when `should_save(i)` is true.
  - After training, a final checkpoint is saved and `evaluate(...)` runs an eval rollout with `policy.get_rollout_policy("eval")`, logging results to W&B before clean shutdown.
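
The control flow of the rollout and training loop above can be illustrated with a small, dependency-free sketch. Everything here is a hypothetical stand-in, not the repository's code: `ToyEnv`, `ToyPolicy`, the scalar observations, and the `save_interval` rule behind `should_save` are assumptions made for illustration. The toy buffer is a time-major list of steps rather than a TensorDict of shape `[num_envs, train_every, ...]`, and the bootstrapped value is obtained by simply evaluating the toy critic on the next observations.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the vectorized env (one scalar obs per env)."""

    def __init__(self, num_envs, episode_len=5):
        self.num_envs = num_envs
        self.episode_len = episode_len
        self.t = [0] * num_envs

    def reset(self):
        self.t = [0] * self.num_envs
        return [0.0] * self.num_envs

    def step_and_maybe_reset(self, actions):
        """Step all envs; return (transition, obs to act on next).

        Envs that finish an episode are reset, so the second return value
        already holds the fresh initial observation for them.
        """
        transition = {"obs": [], "reward": [], "done": []}
        carry_obs = []
        for i, a in enumerate(actions):
            self.t[i] += 1
            done = self.t[i] >= self.episode_len
            transition["obs"].append(float(self.t[i]))
            transition["reward"].append(-abs(a))
            transition["done"].append(done)
            if done:
                self.t[i] = 0
            carry_obs.append(float(self.t[i]))
        return transition, carry_obs


class ToyPolicy:
    """Hypothetical policy exposing the method names used in the text."""

    def get_rollout_policy(self, mode):
        if mode == "train":  # exploratory actions
            return lambda obs: [random.uniform(-1.0, 1.0) for _ in obs]
        return lambda obs: [0.0 for _ in obs]  # deterministic for "eval"

    def critic(self, obs):
        return [0.1 * o for o in obs]

    def train_op(self, data_buf):
        # A real implementation would run PPO updates here.
        rewards = [r for step in data_buf for r in step["next"]["reward"]]
        return {"mean_reward": sum(rewards) / len(rewards)}


def train(num_envs=4, train_every=8, total_iters=3, save_interval=2, seed=0):
    random.seed(seed)
    env, policy = ToyEnv(num_envs), ToyPolicy()
    rollout_policy = policy.get_rollout_policy("train")
    obs = env.reset()  # the env is reset once, before the loop

    def should_save(i):  # assumed rule: save every `save_interval` iterations
        return (i + 1) % save_interval == 0

    history, checkpoints = [], []
    for i in range(total_iters):
        data_buf = []
        for _ in range(train_every):  # collect `train_every` steps
            action = rollout_policy(obs)
            transition, obs_next = env.step_and_maybe_reset(action)
            data_buf.append({"obs": obs, "action": action, "next": transition})
            obs = obs_next
        for step in data_buf:  # value estimates for current and next states
            step["state_value"] = policy.critic(step["obs"])
            step["next"]["state_value"] = policy.critic(step["next"]["obs"])
        history.append(policy.train_op(data_buf))
        if should_save(i):
            checkpoints.append(i)
    checkpoints.append("final")  # final checkpoint after training
    return history, checkpoints
```

Calling `train()` returns one metrics dict per iteration plus the list of iterations at which a checkpoint would have been written, which makes the ordering of collection, value estimation, update, and saving easy to inspect.
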
- `play.py`: loads a checkpoint (local path or `run:<wandb-run>`) and runs rollouts; can export ONNX when `export_policy=true`.
- `eval.py` / `eval_multiple.py` / `eval_run.py`: batch evaluation helpers; `eval_run.py` can fetch and visualize remote W&B runs.
- `vis/`: MuJoCo visualization utilities (e.g., `mujoco_mocap_viewer.py`, `motion_data_publisher.py`).
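
As a rough illustration of the two checkpoint forms accepted by `play.py` (a local path or `run:<wandb-run>`), the sketch below classifies a `checkpoint_path` string. The helper name `parse_checkpoint_ref` and its return format are hypothetical and not taken from the repository; actually downloading a checkpoint from W&B is deliberately left out.

```python
from pathlib import Path

RUN_PREFIX = "run:"  # prefix that marks a W&B run reference


def parse_checkpoint_ref(ref: str) -> tuple[str, str]:
    """Return ("wandb", run_path) or ("local", path) for a checkpoint reference.

    Hypothetical helper: the real scripts may resolve references differently.
    """
    if ref.startswith(RUN_PREFIX):
        run_path = ref[len(RUN_PREFIX):]
        if not run_path:
            raise ValueError("empty W&B run path after 'run:'")
        return "wandb", run_path
    # Anything else is treated as a filesystem path.
    return "local", str(Path(ref).expanduser())
```

For example, `parse_checkpoint_ref("run:entity/project/abc123")` yields the kind `"wandb"` with the run path `entity/project/abc123`, while a plain string such as `ckpt/final.pt` is classified as `"local"`.
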
# Train (teacher policy)
python scripts/train.py algo=ppo_roa_train task=G1/hdmi/move_suitcase
# Finetune student
python scripts/train.py algo=ppo_roa_finetune task=G1/hdmi/move_suitcase checkpoint_path=run:<teacher-wandb_run_path>
# Evaluate student
python scripts/play.py algo=ppo_roa_finetune task=G1/hdmi/move_suitcase checkpoint_path=run:<student-wandb_run_path>