DeepMind Control (https://github.com/deepmind/dm_control)
- I could not find any PPO DeepMind Control benchmarks to compare against. This is a first version only and will be updated later.
- Humanoid (Stand, Walk, or Run):
```bash
poetry install -E envpool
poetry run pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
poetry run python runner.py --train --file rl_games/configs/dm_control/humanoid_walk.yaml
```
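If the envpool extra installed correctly, a quick sanity check like the following should create a vectorized dm_control task. This is a hedged sketch: the task id `HumanoidWalk-v1` and the old-gym `reset`/`step` API are assumptions about the installed envpool version; `envpool.list_all_envs()` prints the ids actually available.

```python
import envpool
import numpy as np

# Task id "HumanoidWalk-v1" follows envpool's CamelCase dm_control naming;
# verify against envpool.list_all_envs() if it is not found.
env = envpool.make("HumanoidWalk-v1", env_type="gym", num_envs=4)
obs = env.reset()  # old-gym API: reset() returns observations only

# Step all 4 vectorized envs with zero actions to verify shapes.
action = np.zeros((4,) + env.action_space.shape, dtype=np.float32)
obs, reward, done, info = env.step(action)
print(obs.shape, reward.shape)
```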
- No tuning. I just ran the same configs on a number of envs.
- I used 4000 epochs (~32M steps) for all envs except Humanoid Run, but a few million steps were enough for most of the envs.
- DeepMind used rather unusual rewards and training rules. A simple reward transformation, log(reward + 1), reaches the best scores faster; a sketch of such a wrapper follows below.
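As an illustration, here is a minimal sketch of that transformation as a gym-style reward wrapper. The `LogRewardWrapper` name is mine and is not part of rl_games; it only shows where log(reward + 1) would be applied.

```python
import gym
import numpy as np

class LogRewardWrapper(gym.RewardWrapper):
    """Applies log(reward + 1) to each per-step reward."""

    def reward(self, reward):
        # dm_control per-step rewards are bounded in [0, 1], so
        # reward + 1 >= 1 and the transform is always well defined.
        return np.log1p(reward)
```

Wrapping the env before training, e.g. `env = LogRewardWrapper(env)`, is enough to apply the transformation to every step.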
| Env | Reward |
|---|---|
| Ball In Cup Catch | 938 |
| Cartpole Balance | 988 |
| Cheetah Run | 685 |
| Fish Swim | 600 |
| Hopper Stand | 557 |
| Humanoid Stand | 653 |
| Humanoid Walk | 621 |
| Humanoid Run | 200 |
| Pendulum Swingup | 706 |
| Walker Stand | 907 |
| Walker Walk | 917 |
| Walker Run | 702 |