Hi! First of all, thank you for this repo! I've been training the `Locomotion-Aliengo-Flat` task (and the `Locomotion-Go2-Flat` task) using the `AMP_PPO` algorithm. In general, `AMP_PPO` performs worse than pure `PPO` for me: as I decrease `reward_scale` from 1.0 toward 0.0, the rewards increase. I am using `IsaacLab 2.3.2`, `rsl-rl-lib 3.1.2` and `amp-rsl-rl 1.2.0`.
For instance, I keep the PPO and environment hyperparameters the same as the repo's and add the following discriminator config:
hidden_dims = [128, 128]
empirical_normalization = False
loss_type = "BCEWithLogits"
I get the following results, where performance improves as `reward_scale` decreases, i.e. as the discriminator contributes less. The videos I generated with the `play_amp.py` script also show better behavior at lower `reward_scale`.
rew_scale=0.005:

rew_scale=0.2:

rew_scale=0.5:
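
To make the question concrete, here is a minimal sketch of how I understand `reward_scale` to enter the reward. This is an assumption on my part, not the actual `amp-rsl-rl` code: the function names `style_reward` and `total_reward` are hypothetical, the `-log(1 - sigmoid(logit))` form is just one common choice for a discriminator trained with a BCE-with-logits loss, and the plain sum of task and style reward may not match how the library combines them.

```python
import math


def style_reward(logit: float, reward_scale: float) -> float:
    """Style reward -log(1 - sigmoid(logit)), scaled by reward_scale.

    Assumption: one common reward form for a BCE-with-logits discriminator;
    the library may use a different one.
    """
    # -log(1 - sigmoid(x)) equals softplus(x); this form is numerically stable.
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return reward_scale * softplus


def total_reward(task_reward: float, logit: float, reward_scale: float) -> float:
    """Hypothetical combination: task reward plus the scaled style reward."""
    return task_reward + style_reward(logit, reward_scale)


if __name__ == "__main__":
    for scale in (0.005, 0.2, 0.5):
        # A logit of 0.0 means the discriminator is undecided between
        # policy samples and reference motion.
        print(scale, total_reward(task_reward=1.0, logit=0.0, reward_scale=scale))
```

If the real combination looks like this, logging the task term and the style term separately might show which of the two drives the trend I am seeing.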

Do you have any insights or tips on how to use `AMP_PPO` successfully, from ways to debug it to suggestions for hyperparameters? Thank you so much for reading this and for your work!