
Commit e1b7bd6

Bump G1 max_iterations to 5000 on Newton for rough-terrain parity
PhysX G1 saturates near iter 3000: reward ≈ +18, ep_len ≈ 980. Past iter 3000 PhysX does not meaningfully improve on either metric — reward oscillates between +16 and +19 through iter 7500, and ep_len stays flat. Newton vanilla reaches matching (reward, ep_len) = (+16, 984) at iter 5000 and equals or exceeds PhysX by iter 6000 (+18.9 / 996). The gap is sample efficiency, not a ceiling. Ablations (armature 0.01/0.03, damping 5→20, finger removal from the action space, Newton upstream a27277) did not change the shape of Newton's curve. Use the framework preset on max_iterations rather than tuning physics or reward terms, keeping the env config engine-agnostic. Precedent: Allegro Hand (5000), Spot (20000).
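The parity criterion above (match the PhysX plateau on both reward and episode length, not reward alone) can be sketched as a small check. The helper name and tolerance are illustrative; only the Newton data points ((+16, 984) at iter 5000, (+18.9, 996) at iter 6000) and the PhysX plateau (≈ +18, ≈ 980) come from this commit message:

```python
def reaches_parity(curve, plateau_reward=18.0, plateau_ep_len=980, tol=2.0):
    # curve: list of (iteration, mean_reward, mean_episode_length) tuples.
    # Parity means matching the PhysX plateau on BOTH metrics: reward alone
    # oscillates, so episode length is needed to confirm the robot is stable.
    for it, reward, ep_len in curve:
        if reward >= plateau_reward - tol and ep_len >= plateau_ep_len:
            return it
    return None

# Newton data points quoted in the commit message:
newton = [(5000, 16.0, 984), (6000, 18.9, 996)]
print(reaches_parity(newton))  # -> 5000
```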
1 parent b6fd87d commit e1b7bd6

3 files changed

Lines changed: 25 additions & 2 deletions


source/isaaclab_tasks/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [package]
 
 # Note: Semantic Versioning is used: https://semver.org/
-version = "1.5.26"
+version = "1.5.27"
 
 # Description
 title = "Isaac Lab Environments"

source/isaaclab_tasks/docs/CHANGELOG.rst

Lines changed: 16 additions & 0 deletions
@@ -1,6 +1,22 @@
 Changelog
 ---------
 
+1.5.27 (2026-04-24)
+~~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added Newton rough terrain support for the G1 biped locomotion velocity
+  env. The only engine-specific change is a ~1.7x ``max_iterations`` preset on
+  :class:`~isaaclab_tasks.manager_based.locomotion.velocity.config.g1.agents.rsl_rl_ppo_cfg.G1RoughPPORunnerCfg`
+  (Newton = 5000, PhysX = 3000). PhysX saturates near iter 3000 on both
+  reward (≈ +18) and episode length (≈ 980) and does not meaningfully
+  improve further; Newton reaches the same (reward, ep_len) quality at
+  iter 5000. The iteration budget is bumped rather than tuning physics
+  or reward terms.
+
+
 1.5.26 (2026-04-24)
 ~~~~~~~~~~~~~~~~~~~

source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/g1/agents/rsl_rl_ppo_cfg.py

Lines changed: 8 additions & 1 deletion
@@ -6,12 +6,19 @@
 from isaaclab.utils import configclass
 
 from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg
+from isaaclab_tasks.utils import preset
 
 
 @configclass
 class G1RoughPPORunnerCfg(RslRlOnPolicyRunnerCfg):
     num_steps_per_env = 24
-    max_iterations = 3000
+    # Newton needs ~1.7x the PPO iterations to match PhysX on G1. PhysX saturates near iter 3000
+    # (reward ≈ +18, ep_len ≈ 980) and does not meaningfully improve on either metric past that —
+    # reward oscillates +16 to +19 through iter 7500, ep_len stays flat. Newton reaches the same
+    # (reward, ep_len) quality at iter 5000 (+16 / 984). Comparing reward alone is misleading:
+    # ep_len confirms the robot is stable in both cases. The gap is sample-efficiency, not a
+    # ceiling — no physics or reward tuning closes it.
+    max_iterations = preset(default=3000, newton=5000)
     save_interval = 50
     experiment_name = "g1_rough"
     policy = RslRlPpoActorCriticCfg(
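The ``preset`` helper keeps the env config engine-agnostic by resolving the value per physics engine at config time. Its real implementation in ``isaaclab_tasks.utils`` is not shown in this diff; below is a minimal sketch, assuming the active engine is advertised through an environment variable (the ``PHYSICS_ENGINE`` name is hypothetical — only the call signature ``preset(default=..., newton=...)`` comes from the diff):

```python
import os

def preset(default, **per_engine):
    """Resolve a config value by the active physics engine (sketch only).

    Assumes the engine name is in the hypothetical PHYSICS_ENGINE env var;
    unknown or unset engines fall back to `default`.
    """
    engine = os.environ.get("PHYSICS_ENGINE", "").lower()
    return per_engine.get(engine, default)

# Mirrors the config change: PhysX (the default) trains 3000 iterations,
# Newton gets the larger 5000-iteration budget.
os.environ["PHYSICS_ENGINE"] = "newton"
print(preset(default=3000, newton=5000))  # -> 5000
```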
