diff --git a/docs/source/features/hydra.rst b/docs/source/features/hydra.rst index 25b5d4b8bdf1..cce08dc3750c 100644 --- a/docs/source/features/hydra.rst +++ b/docs/source/features/hydra.rst @@ -25,28 +25,28 @@ As a result, training with hydra arguments can be run with the following syntax: .. code-block:: shell - python scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless env.actions.joint_effort.scale=10.0 agent.seed=2024 + ./isaaclab.sh train --library rsl_rl --task=Isaac-Cartpole-v0 --headless env.actions.joint_effort.scale=10.0 agent.seed=2024 .. tab-item:: rl_games :sync: rl_games .. code-block:: shell - python scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless env.actions.joint_effort.scale=10.0 agent.params.seed=2024 + ./isaaclab.sh train --library rl_games --task=Isaac-Cartpole-v0 --headless env.actions.joint_effort.scale=10.0 agent.params.seed=2024 .. tab-item:: skrl :sync: skrl .. code-block:: shell - python scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless env.actions.joint_effort.scale=10.0 agent.seed=2024 + ./isaaclab.sh train --library skrl --task=Isaac-Cartpole-v0 --headless env.actions.joint_effort.scale=10.0 agent.seed=2024 .. tab-item:: sb3 :sync: sb3 .. code-block:: shell - python scripts/reinforcement_learning/sb3/train.py --task=Isaac-Cartpole-v0 --headless env.actions.joint_effort.scale=10.0 agent.seed=2024 + ./isaaclab.sh train --library sb3 --task=Isaac-Cartpole-v0 --headless env.actions.joint_effort.scale=10.0 agent.seed=2024 The above command will run the training script with the task ``Isaac-Cartpole-v0`` in headless mode, and set the ``env.actions.joint_effort.scale`` parameter to 10.0 and the ``agent.seed`` parameter to 2024. @@ -216,7 +216,7 @@ override is given: .. code-block:: bash # Use Newton physics backend - python train.py --task=Isaac-Reach-Franka-v0 env.physics=newton_mjwarp + ./isaaclab.sh train --library rsl_rl --task=Isaac-Reach-Franka-v0 env.physics=newton_mjwarp The ``default`` field can be set to ``None`` to make an optional feature that is disabled unless explicitly selected: @@ -236,10 +236,10 @@ disabled unless explicitly selected: .. code-block:: bash # camera is None -- no camera overhead - python train.py --task=Isaac-Reach-Franka-v0 + ./isaaclab.sh train --library rsl_rl --task=Isaac-Reach-Franka-v0 # activate camera with the "large" preset - python train.py --task=Isaac-Reach-Franka-v0 env.scene.camera=large + ./isaaclab.sh train --library rsl_rl --task=Isaac-Reach-Franka-v0 env.scene.camera=large .. _hydra-backend-solver-presets: @@ -299,10 +299,10 @@ is currently beta. .. code-block:: bash # Select the Kamino solver preset everywhere it is defined - python train.py --task=Isaac-Cartpole-v0 presets=newton_kamino + ./isaaclab.sh train --library rsl_rl --task=Isaac-Cartpole-v0 presets=newton_kamino # Select the Kamino solver preset for a specific physics config path - python train.py --task=Isaac-Cartpole-v0 env.sim.physics=newton_kamino + ./isaaclab.sh train --library rsl_rl --task=Isaac-Cartpole-v0 env.sim.physics=newton_kamino The ``newton_kamino`` preset is currently defined for ``Isaac-Cartpole-Direct-v0``, ``Isaac-Ant-Direct-v0``, ``Isaac-Cartpole-v0``, and ``Isaac-Ant-v0``. Passing @@ -352,7 +352,7 @@ including inside dict-valued fields such as ``actuators``: .. 
code-block:: bash # Select MJWarp preset globally -- sets armature to 0.01 - python train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 presets=newton_mjwarp + ./isaaclab.sh train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 presets=newton_mjwarp Using Presets @@ -362,7 +362,7 @@ Using Presets .. code-block:: bash - python train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 \ + ./isaaclab.sh train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 \ env.events=newton_mjwarp **Global presets** -- apply the same preset name everywhere it exists: @@ -370,21 +370,21 @@ Using Presets .. code-block:: bash # Apply "newton_mjwarp" preset to all configs that define it - python train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 \ + ./isaaclab.sh train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 \ presets=newton_mjwarp **Multiple global presets** -- apply several non-conflicting presets: .. code-block:: bash - python train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 \ + ./isaaclab.sh train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 \ presets=newton_mjwarp,inference **Combined** -- global presets + scalar overrides: .. code-block:: bash - python train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 \ + ./isaaclab.sh train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 \ presets=newton_mjwarp \ env.sim.dt=0.002 @@ -419,10 +419,10 @@ actuator armature is set to ``0.01``. .. code-block:: bash # Default (PhysX events, armature=0.0) - python train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 + ./isaaclab.sh train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 # MJWarp (Newton events, armature=0.01) - python train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 presets=newton_mjwarp + ./isaaclab.sh train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 presets=newton_mjwarp Summary diff --git a/docs/source/features/multi_gpu.rst b/docs/source/features/multi_gpu.rst index 03277d26e6e1..678e5cedeae3 100644 --- a/docs/source/features/multi_gpu.rst +++ b/docs/source/features/multi_gpu.rst @@ -96,14 +96,14 @@ To train with multiple GPUs, use the following command, where ``--nproc_per_node .. code-block:: shell - python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/train.py --library rl_games --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: rsl_rl :sync: rsl_rl .. code-block:: shell - python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/train.py --library rsl_rl --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: skrl :sync: skrl @@ -115,14 +115,14 @@ To train with multiple GPUs, use the following command, where ``--nproc_per_node .. code-block:: shell - python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/train.py --library skrl --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: JAX :sync: jax .. 
code-block:: shell - python -m skrl.utils.distributed.jax --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed --ml_framework jax + python -m skrl.utils.distributed.jax --nnodes=1 --nproc_per_node=2 scripts/reinforcement_learning/train.py --library skrl --task=Isaac-Cartpole-v0 --headless --distributed --ml_framework jax .. _multi-gpu-nccl-troubleshooting: @@ -171,14 +171,14 @@ For the master node, use the following command, where ``--nproc_per_node`` repre .. code-block:: shell - python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/train.py --library rl_games --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: rsl_rl :sync: rsl_rl .. code-block:: shell - python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/train.py --library rsl_rl --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: skrl :sync: skrl @@ -190,14 +190,14 @@ For the master node, use the following command, where ``--nproc_per_node`` repre .. code-block:: shell - python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/train.py --library skrl --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: JAX :sync: jax .. code-block:: shell - python -m skrl.utils.distributed.jax --nproc_per_node=2 --nnodes=2 --node_rank=0 --coordinator_address=ip_of_master_machine:5555 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed --ml_framework jax + python -m skrl.utils.distributed.jax --nproc_per_node=2 --nnodes=2 --node_rank=0 --coordinator_address=ip_of_master_machine:5555 scripts/reinforcement_learning/train.py --library skrl --task=Isaac-Cartpole-v0 --headless --distributed --ml_framework jax Note that the port (``5555``) can be replaced with any other available port. @@ -211,14 +211,14 @@ For non-master nodes, use the following command, replacing ``--node_rank`` with .. code-block:: shell - python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/train.py --library rl_games --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: rsl_rl :sync: rsl_rl ..
code-block:: shell - python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/train.py --library rsl_rl --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: skrl :sync: skrl @@ -230,14 +230,14 @@ For non-master nodes, use the following command, replacing ``--node_rank`` with .. code-block:: shell - python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed + python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master_machine> --master_port=5555 scripts/reinforcement_learning/train.py --library skrl --task=Isaac-Cartpole-v0 --headless --distributed .. tab-item:: JAX :sync: jax .. code-block:: shell - python -m skrl.utils.distributed.jax --nproc_per_node=2 --nnodes=2 --node_rank=1 --coordinator_address=ip_of_master_machine:5555 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed --ml_framework jax + python -m skrl.utils.distributed.jax --nproc_per_node=2 --nnodes=2 --node_rank=1 --coordinator_address=ip_of_master_machine:5555 scripts/reinforcement_learning/train.py --library skrl --task=Isaac-Cartpole-v0 --headless --distributed --ml_framework jax For more details on multi-node training with PyTorch, please visit the `PyTorch documentation `_. diff --git a/docs/source/features/population_based_training.rst b/docs/source/features/population_based_training.rst index 906eaf516437..3b838a56ecde 100644 --- a/docs/source/features/population_based_training.rst +++ b/docs/source/features/population_based_training.rst @@ -114,7 +114,7 @@ Launch *N* workers, where *n* indicates each worker index: .. code-block:: bash # Run this once per worker (n = 0..N-1), all pointing to the same directory/workspace - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py \ + ./isaaclab.sh train --library rl_games \ --seed=<n> \ --task=Isaac-Repose-Cube-Shadow-Direct-v0 \ --num_envs=8192 \ diff --git a/docs/source/features/ray.rst b/docs/source/features/ray.rst index dbf88d682a24..69e37d5b8eaf 100644 --- a/docs/source/features/ray.rst +++ b/docs/source/features/ray.rst @@ -138,7 +138,7 @@ In a different terminal, run the following. --cfg_file scripts/reinforcement_learning/ray/hyperparameter_tuning/vision_cartpole_cfg.py \ --cfg_class CartpoleTheiaJobCfg \ --run_mode local \ - --workflow scripts/reinforcement_learning/rl_games/train.py \ + --workflow scripts/reinforcement_learning/train.py --library rl_games \ --num_workers_per_node diff --git a/docs/source/features/visualization.rst b/docs/source/features/visualization.rst index c68816c0da00..804348f8080d 100644 --- a/docs/source/features/visualization.rst +++ b/docs/source/features/visualization.rst @@ -63,20 +63,20 @@ Launch visualizers from the command line with ``--visualizer`` (or ``--viz`` ali ..
code-block:: bash # Launch all visualizers (comma-delimited list, no spaces) - python scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-v0 --viz kit,newton,rerun + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 --viz kit,newton,rerun # Launch only the Newton visualizer - python scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-v0 --viz newton + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 --viz newton # Launch the Viser web-based visualizer - python scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-v0 --viz viser + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 --viz viser To run in headless mode, omit the ``--viz`` argument: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-v0 + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 .. note:: @@ -491,7 +491,7 @@ the num of environments can be overwritten and decreased using ``--num_envs``: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-v0 --viz rerun --num_envs 512 + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 --viz rerun --num_envs 512 **Rerun Visualizer FPS Control** diff --git a/docs/source/how-to/profile_with_nsys.rst b/docs/source/how-to/profile_with_nsys.rst index 6ef17f55c555..6ad459cd20a8 100644 --- a/docs/source/how-to/profile_with_nsys.rst +++ b/docs/source/how-to/profile_with_nsys.rst @@ -42,7 +42,7 @@ The following command shows how to capture a profile for the ``Isaac-Cartpole-v0 -t nvtx,cuda \ --python-functions-trace=scripts/benchmarks/nsys_trace.json \ -o my_profile \ - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task=Isaac-Cartpole-v0 \ --headless \ --max_iterations=3 diff --git a/docs/source/how-to/record_video.rst b/docs/source/how-to/record_video.rst index 335ba7192557..b780191f3cfb 100644 --- a/docs/source/how-to/record_video.rst +++ b/docs/source/how-to/record_video.rst @@ -21,7 +21,7 @@ Example usage: .. code-block:: shell - python scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --video --video_length 100 --video_interval 500 + ./isaaclab.sh train --library rl_games --task=Isaac-Cartpole-v0 --headless --video --video_length 100 --video_interval 500 The recorded videos will be saved in the same directory as the training checkpoints, under diff --git a/docs/source/migration/migrating_from_isaacgymenvs.rst b/docs/source/migration/migrating_from_isaacgymenvs.rst index db6371c40c9d..8a46a515693a 100644 --- a/docs/source/migration/migrating_from_isaacgymenvs.rst +++ b/docs/source/migration/migrating_from_isaacgymenvs.rst @@ -916,7 +916,7 @@ To launch a training in Isaac Lab, use the command: .. code-block:: bash - python scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-Direct-v0 --headless + ./isaaclab.sh train --library rl_games --task=Isaac-Cartpole-Direct-v0 --headless Launching Inferencing ~~~~~~~~~~~~~~~~~~~~~ @@ -925,7 +925,7 @@ To launch inferencing in Isaac Lab, use the command: .. 
code-block:: bash - python scripts/reinforcement_learning/rl_games/play.py --task=Isaac-Cartpole-Direct-v0 --num_envs=25 --checkpoint=<path/to/checkpoint> + ./isaaclab.sh play --library rl_games --task=Isaac-Cartpole-Direct-v0 --num_envs=25 --checkpoint=<path/to/checkpoint> Additional Resources diff --git a/docs/source/migration/migrating_from_omniisaacgymenvs.rst b/docs/source/migration/migrating_from_omniisaacgymenvs.rst index b3a46f0a518f..021d853e3254 100644 --- a/docs/source/migration/migrating_from_omniisaacgymenvs.rst +++ b/docs/source/migration/migrating_from_omniisaacgymenvs.rst @@ -983,7 +983,7 @@ To launch a training in Isaac Lab, use the command: .. code-block:: bash - python scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-Direct-v0 --headless + ./isaaclab.sh train --library rl_games --task=Isaac-Cartpole-Direct-v0 --headless Launching Inferencing ~~~~~~~~~~~~~~~~~~~~~ To launch inferencing in Isaac Lab, use the command: .. code-block:: bash - python scripts/reinforcement_learning/rl_games/play.py --task=Isaac-Cartpole-Direct-v0 --num_envs=25 --checkpoint=<path/to/checkpoint> + ./isaaclab.sh play --library rl_games --task=Isaac-Cartpole-Direct-v0 --num_envs=25 --checkpoint=<path/to/checkpoint> .. _`OmniIsaacGymEnvs`: https://github.com/isaac-sim/OmniIsaacGymEnvs diff --git a/docs/source/migration/migrating_to_isaaclab_3-0.rst b/docs/source/migration/migrating_to_isaaclab_3-0.rst index be5703ab1261..45b079f78a29 100644 --- a/docs/source/migration/migrating_to_isaaclab_3-0.rst +++ b/docs/source/migration/migrating_to_isaaclab_3-0.rst @@ -444,10 +444,10 @@ Pass ``presets=newton_mjwarp`` (or ``presets=physx``) on the CLI to swap the ent .. code-block:: bash # Run with Newton backend - python train.py task=Isaac-Franka-Cabinet-v0 presets=newton_mjwarp + ./isaaclab.sh train --library rsl_rl --task=Isaac-Franka-Cabinet-v0 presets=newton_mjwarp # Run with default (PhysX) backend - python train.py task=Isaac-Franka-Cabinet-v0 + ./isaaclab.sh train --library rsl_rl --task=Isaac-Franka-Cabinet-v0 Adding Multi-Backend Support to an Environment ----------------------------------------------- diff --git a/docs/source/overview/core-concepts/multi_backend_architecture.rst b/docs/source/overview/core-concepts/multi_backend_architecture.rst index 8fd0e326437f..c9479b990afd 100644 --- a/docs/source/overview/core-concepts/multi_backend_architecture.rst +++ b/docs/source/overview/core-concepts/multi_backend_architecture.rst @@ -161,10 +161,10 @@ Users then select the MJWarp Newton preset at the command line: .. code-block:: bash # Default (PhysX) - python train.py --task Isaac-Cartpole-v0 + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 # MJWarp (Newton backend) - python train.py --task Isaac-Cartpole-v0 presets=newton_mjwarp + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 presets=newton_mjwarp The Physics Manager ------------------- diff --git a/docs/source/overview/core-concepts/sensors/camera.rst b/docs/source/overview/core-concepts/sensors/camera.rst index 29673f2bf465..bf765b4bf1c3 100644 --- a/docs/source/overview/core-concepts/sensors/camera.rst +++ b/docs/source/overview/core-concepts/sensors/camera.rst @@ -149,13 +149,13 @@ The active preset is selected at launch via the ``presets=`` CLI argument: ..
code-block:: bash # Use Newton Warp renderer - python train.py task=Isaac-Cartpole-RGB-Camera-Direct-v0 presets=newton_renderer + ./isaaclab.sh train --library rsl_rl --task=Isaac-Cartpole-RGB-Camera-Direct-v0 presets=newton_renderer # Use OVRTX renderer - python train.py task=Isaac-Cartpole-RGB-Camera-Direct-v0 presets=ovrtx_renderer + ./isaaclab.sh train --library rsl_rl --task=Isaac-Cartpole-RGB-Camera-Direct-v0 presets=ovrtx_renderer # Use default (Isaac RTX) - python train.py task=Isaac-Cartpole-RGB-Camera-Direct-v0 + ./isaaclab.sh train --library rsl_rl --task=Isaac-Cartpole-RGB-Camera-Direct-v0 Accessing camera data @@ -173,7 +173,7 @@ When using the RTX renderer, add ``--enable_cameras`` when launching: .. code-block:: shell - python scripts/reinforcement_learning/rl_games/train.py \ + ./isaaclab.sh train --library rl_games \ --task=Isaac-Cartpole-RGB-Camera-Direct-v0 --headless --enable_cameras diff --git a/docs/source/overview/environments.rst b/docs/source/overview/environments.rst index a28d129f7027..264c7477fa85 100644 --- a/docs/source/overview/environments.rst +++ b/docs/source/overview/environments.rst @@ -1136,8 +1136,8 @@ inferencing, including reading from an already trained checkpoint and disabling - single-camera: append ``presets=single_camera,isaacsim_rtx_renderer`` - dual-camera: append ``presets=duo_camera,isaacsim_rtx_renderer`` - The same ``presets=`` flags must be passed to both the training and - play scripts. There is no separately registered + The same ``presets=`` flags must be passed to both the ``train`` and + ``play`` commands. There is no separately registered ``Isaac-Dexsuite-Kuka-Allegro-Lift-Single-Camera-v0`` environment; all observation-mode variants share the base task name and are selected via the preset system. @@ -1152,8 +1152,8 @@ inferencing, including reading from an already trained checkpoint and disabling - single-camera: append ``presets=single_camera,isaacsim_rtx_renderer`` - dual-camera: append ``presets=duo_camera,isaacsim_rtx_renderer`` - The same ``presets=`` flags must be passed to both the training and - play scripts. + The same ``presets=`` flags must be passed to both the ``train`` and + ``play`` commands. - Isaac-Dexsuite-Kuka-Allegro-Reorient-Play-v0 - Manager Based - **rl_games** (PPO), **rsl_rl** (PPO) diff --git a/docs/source/overview/own-project/template.rst b/docs/source/overview/own-project/template.rst index cb52effde62c..371304f52e21 100644 --- a/docs/source/overview/own-project/template.rst +++ b/docs/source/overview/own-project/template.rst @@ -178,14 +178,14 @@ Here are some general commands to get started with it: .. code-block:: bash - python scripts/reinforcement_learning/<rl_library>/train.py --task=<task_name> + ./isaaclab.sh train --library <rl_library> --task=<task_name> .. tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows .. code-block:: batch - python scripts\reinforcement_learning\<rl_library>\train.py --task=<task_name> + isaaclab.bat train --library <rl_library> --task=<task_name> * Run a task with dummy agents. diff --git a/docs/source/overview/reinforcement-learning/rl_existing_scripts.rst b/docs/source/overview/reinforcement-learning/rl_existing_scripts.rst index 008ebf91c0d8..65ca7db7af64 100644 --- a/docs/source/overview/reinforcement-learning/rl_existing_scripts.rst +++ b/docs/source/overview/reinforcement-learning/rl_existing_scripts.rst @@ -3,18 +3,42 @@ Reinforcement Learning Scripts We provide wrappers to different reinforcement learning libraries. These wrappers convert the data from the environments into the respective library's function arguments and return types.
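+A concrete round trip with the unified entrypoints described below looks like this (a minimal sketch; ``<path/to/checkpoint>`` is a placeholder for the checkpoint file your training run writes to its log directory):
+
+.. code:: bash
+
+   # Train briefly, then replay the resulting policy with a few environments.
+   ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-Direct-v0 --headless --max_iterations 10
+   ./isaaclab.sh play --library rsl_rl --task Isaac-Cartpole-Direct-v0 --num_envs 32 --checkpoint <path/to/checkpoint>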
+The unified reinforcement learning entrypoints can be launched with the Isaac Lab +``train`` and ``play`` commands: + +.. code:: bash + + ./isaaclab.sh train --library <library> --task <task_name> + ./isaaclab.sh play --library <library> --task <task_name> + +These commands are resolved by the Isaac Lab CLI. From a source checkout, ``uv`` can also +create and sync the environment automatically: + +.. code:: bash + + uv run train --help + uv run train --library rsl_rl \ + --task Isaac-Cartpole-Direct-v0 --headless presets=newton_mjwarp + uv run play --library rsl_rl \ + --task <task_name> --checkpoint <path/to/checkpoint> + +The default ``uv run`` environment includes the RSL-RL, tasks, and Newton dependency +stacks. Add extras such as ``ov`` or ``rtx`` only when a workflow needs them. +Isaac Sim Kit workflows, including PhysX, should use the existing full installation +path. When running Python directly from an activated environment, use the full +script path. Newton Backend -------------- -All training and play scripts support the **Newton physics backend** via the ``presets=newton_mjwarp`` +All training and play commands support the **Newton physics backend** via the ``presets=newton_mjwarp`` Hydra override. Appending ``presets=newton_mjwarp`` to any command below switches the physics engine from the default PhysX to Newton: .. code:: bash # Generic pattern — works with any framework and task that supports Newton - ./isaaclab.sh -p scripts/reinforcement_learning/<library>/train.py \ + ./isaaclab.sh train --library <library> \ --task <task_name> --headless presets=newton_mjwarp .. note:: @@ -45,12 +69,12 @@ model) with ``presets=rgb``: .. code:: bash # Train with RGB-only observations - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Repose-Cube-Shadow-Vision-Direct-v0 --headless \ --enable_cameras presets=rgb # Play — must use the same preset to load the matching checkpoint - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py \ + ./isaaclab.sh play --library rsl_rl \ --task Isaac-Repose-Cube-Shadow-Vision-Direct-Play-v0 \ --enable_cameras presets=rgb @@ -92,13 +116,13 @@ RL-Games # install python module (for rl-games) ./isaaclab.sh -i rl_games # run script for training - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --task Isaac-Ant-v0 --headless + ./isaaclab.sh train --library rl_games --task Isaac-Ant-v0 --headless # run script for training with Newton backend - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --task Isaac-Ant-v0 --headless presets=newton_mjwarp + ./isaaclab.sh train --library rl_games --task Isaac-Ant-v0 --headless presets=newton_mjwarp # run script for playing with 32 environments - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/play.py --task Isaac-Ant-v0 --num_envs 32 --checkpoint /PATH/TO/model.pth + ./isaaclab.sh play --library rl_games --task Isaac-Ant-v0 --num_envs 32 --checkpoint /PATH/TO/model.pth # run script for recording video of a trained agent (requires installing `ffmpeg`) - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/play.py --task Isaac-Ant-v0 --headless --video --video_length 200 + ./isaaclab.sh play --library rl_games --task Isaac-Ant-v0 --headless --video --video_length 200 ..
tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows @@ -108,13 +132,13 @@ RL-Games :: install python module (for rl-games) isaaclab.bat -i rl_games :: run script for training - isaaclab.bat -p scripts\reinforcement_learning\rl_games\train.py --task Isaac-Ant-v0 --headless + isaaclab.bat train --library rl_games --task Isaac-Ant-v0 --headless :: run script for training with Newton backend - isaaclab.bat -p scripts\reinforcement_learning\rl_games\train.py --task Isaac-Ant-v0 --headless presets=newton_mjwarp + isaaclab.bat train --library rl_games --task Isaac-Ant-v0 --headless presets=newton_mjwarp :: run script for playing with 32 environments - isaaclab.bat -p scripts\reinforcement_learning\rl_games\play.py --task Isaac-Ant-v0 --num_envs 32 --checkpoint /PATH/TO/model.pth + isaaclab.bat play --library rl_games --task Isaac-Ant-v0 --num_envs 32 --checkpoint /PATH/TO/model.pth :: run script for recording video of a trained agent (requires installing `ffmpeg`) - isaaclab.bat -p scripts\reinforcement_learning\rl_games\play.py --task Isaac-Ant-v0 --headless --video --video_length 200 + isaaclab.bat play --library rl_games --task Isaac-Ant-v0 --headless --video --video_length 200 RSL-RL ------ @@ -133,13 +157,13 @@ RSL-RL # install python module (for rsl-rl) ./isaaclab.sh -i rsl_rl # run script for training - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Reach-Franka-v0 --headless + ./isaaclab.sh train --library rsl_rl --task Isaac-Reach-Franka-v0 --headless # run script for training with Newton backend - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Reach-Franka-v0 --headless presets=newton_mjwarp + ./isaaclab.sh train --library rsl_rl --task Isaac-Reach-Franka-v0 --headless presets=newton_mjwarp # run script for playing with 32 environments - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Reach-Franka-v0 --num_envs 32 --load_run run_folder_name --checkpoint /PATH/TO/model.pt + ./isaaclab.sh play --library rsl_rl --task Isaac-Reach-Franka-v0 --num_envs 32 --load_run run_folder_name --checkpoint /PATH/TO/model.pt # run script for recording video of a trained agent (requires installing `ffmpeg`) - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Reach-Franka-v0 --headless --video --video_length 200 + ./isaaclab.sh play --library rsl_rl --task Isaac-Reach-Franka-v0 --headless --video --video_length 200 .. 
tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows @@ -149,13 +173,13 @@ RSL-RL :: install python module (for rsl-rl) isaaclab.bat -i rsl_rl :: run script for training - isaaclab.bat -p scripts\reinforcement_learning\rsl_rl\train.py --task Isaac-Reach-Franka-v0 --headless + isaaclab.bat train --library rsl_rl --task Isaac-Reach-Franka-v0 --headless :: run script for training with Newton backend - isaaclab.bat -p scripts\reinforcement_learning\rsl_rl\train.py --task Isaac-Reach-Franka-v0 --headless presets=newton_mjwarp + isaaclab.bat train --library rsl_rl --task Isaac-Reach-Franka-v0 --headless presets=newton_mjwarp :: run script for playing with 32 environments - isaaclab.bat -p scripts\reinforcement_learning\rsl_rl\play.py --task Isaac-Reach-Franka-v0 --num_envs 32 --load_run run_folder_name --checkpoint /PATH/TO/model.pt + isaaclab.bat play --library rsl_rl --task Isaac-Reach-Franka-v0 --num_envs 32 --load_run run_folder_name --checkpoint /PATH/TO/model.pt :: run script for recording video of a trained agent (requires installing `ffmpeg`) - isaaclab.bat -p scripts\reinforcement_learning\rsl_rl\play.py --task Isaac-Reach-Franka-v0 --headless --video --video_length 200 + isaaclab.bat play --library rsl_rl --task Isaac-Reach-Franka-v0 --headless --video --video_length 200 - Training and distilling an agent with `RSL-RL `__ on ``Isaac-Velocity-Flat-Anymal-D-v0``: @@ -171,13 +195,13 @@ RSL-RL # install python module (for rsl-rl) ./isaaclab.sh -i rsl_rl # run script for rl training of the teacher agent - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --headless + ./isaaclab.sh train --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --headless # run script for rl training of the teacher agent with Newton backend - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --headless presets=newton_mjwarp + ./isaaclab.sh train --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --headless presets=newton_mjwarp # run script for distilling the teacher agent into a student agent - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --headless --agent rsl_rl_distillation_cfg_entry_point --load_run teacher_run_folder_name + ./isaaclab.sh train --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --headless --agent rsl_rl_distillation_cfg_entry_point --load_run teacher_run_folder_name # run script for playing the student with 64 environments - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Flat-Anymal-D-v0 --num_envs 64 --agent rsl_rl_distillation_cfg_entry_point + ./isaaclab.sh play --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --num_envs 64 --agent rsl_rl_distillation_cfg_entry_point .. 
tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows @@ -187,13 +211,13 @@ RSL-RL :: install python module (for rsl-rl) isaaclab.bat -i rsl_rl :: run script for rl training of the teacher agent - isaaclab.bat -p scripts\reinforcement_learning\rsl_rl\train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --headless + isaaclab.bat train --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --headless :: run script for rl training of the teacher agent with Newton backend - isaaclab.bat -p scripts\reinforcement_learning\rsl_rl\train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --headless presets=newton_mjwarp + isaaclab.bat train --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --headless presets=newton_mjwarp :: run script for distilling the teacher agent into a student agent - isaaclab.bat -p scripts\reinforcement_learning\rsl_rl\train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --headless --agent rsl_rl_distillation_cfg_entry_point --load_run teacher_run_folder_name + isaaclab.bat train --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --headless --agent rsl_rl_distillation_cfg_entry_point --load_run teacher_run_folder_name :: run script for playing the student with 64 environments - isaaclab.bat -p scripts\reinforcement_learning\rsl_rl\play.py --task Isaac-Velocity-Flat-Anymal-D-v0 --num_envs 64 --agent rsl_rl_distillation_cfg_entry_point + isaaclab.bat play --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --num_envs 64 --agent rsl_rl_distillation_cfg_entry_point SKRL ---- @@ -216,13 +240,13 @@ SKRL # install python module (for skrl) ./isaaclab.sh -i skrl # run script for training - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/train.py --task Isaac-Reach-Franka-v0 --headless + ./isaaclab.sh train --library skrl --task Isaac-Reach-Franka-v0 --headless # run script for training with Newton backend - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/train.py --task Isaac-Reach-Franka-v0 --headless presets=newton_mjwarp + ./isaaclab.sh train --library skrl --task Isaac-Reach-Franka-v0 --headless presets=newton_mjwarp # run script for playing with 32 environments - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/play.py --task Isaac-Reach-Franka-v0 --num_envs 32 --checkpoint /PATH/TO/model.pt + ./isaaclab.sh play --library skrl --task Isaac-Reach-Franka-v0 --num_envs 32 --checkpoint /PATH/TO/model.pt # run script for recording video of a trained agent (requires installing `ffmpeg`) - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/play.py --task Isaac-Reach-Franka-v0 --headless --video --video_length 200 + ./isaaclab.sh play --library skrl --task Isaac-Reach-Franka-v0 --headless --video --video_length 200 .. 
tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows @@ -232,13 +256,13 @@ SKRL :: install python module (for skrl) isaaclab.bat -i skrl :: run script for training - isaaclab.bat -p scripts\reinforcement_learning\skrl\train.py --task Isaac-Reach-Franka-v0 --headless + isaaclab.bat train --library skrl --task Isaac-Reach-Franka-v0 --headless :: run script for training with Newton backend - isaaclab.bat -p scripts\reinforcement_learning\skrl\train.py --task Isaac-Reach-Franka-v0 --headless presets=newton_mjwarp + isaaclab.bat train --library skrl --task Isaac-Reach-Franka-v0 --headless presets=newton_mjwarp :: run script for playing with 32 environments - isaaclab.bat -p scripts\reinforcement_learning\skrl\play.py --task Isaac-Reach-Franka-v0 --num_envs 32 --checkpoint /PATH/TO/model.pt + isaaclab.bat play --library skrl --task Isaac-Reach-Franka-v0 --num_envs 32 --checkpoint /PATH/TO/model.pt :: run script for recording video of a trained agent (requires installing `ffmpeg`) - isaaclab.bat -p scripts\reinforcement_learning\skrl\play.py --task Isaac-Reach-Franka-v0 --headless --video --video_length 200 + isaaclab.bat play --library skrl --task Isaac-Reach-Franka-v0 --headless --video --video_length 200 .. tab-item:: JAX @@ -268,13 +292,13 @@ SKRL # install skrl dependencies for JAX ./isaaclab.sh -p -m pip install skrl["jax"] # run script for training - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/train.py --task Isaac-Reach-Franka-v0 --headless --ml_framework jax + ./isaaclab.sh train --library skrl --task Isaac-Reach-Franka-v0 --headless --ml_framework jax # run script for training with Newton backend - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/train.py --task Isaac-Reach-Franka-v0 --headless --ml_framework jax presets=newton_mjwarp + ./isaaclab.sh train --library skrl --task Isaac-Reach-Franka-v0 --headless --ml_framework jax presets=newton_mjwarp # run script for playing with 32 environments - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/play.py --task Isaac-Reach-Franka-v0 --num_envs 32 --ml_framework jax --checkpoint /PATH/TO/model.pt + ./isaaclab.sh play --library skrl --task Isaac-Reach-Franka-v0 --num_envs 32 --ml_framework jax --checkpoint /PATH/TO/model.pt # run script for recording video of a trained agent (requires installing `ffmpeg`) - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/play.py --task Isaac-Reach-Franka-v0 --headless --ml_framework jax --video --video_length 200 + ./isaaclab.sh play --library skrl --task Isaac-Reach-Franka-v0 --headless --ml_framework jax --video --video_length 200 - Training the multi-agent environment ``Isaac-Shadow-Hand-Over-Direct-v0`` with skrl: @@ -289,9 +313,9 @@ SKRL # install python module (for skrl) ./isaaclab.sh -i skrl # run script for training with the MAPPO algorithm (IPPO is also supported) - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/train.py --task Isaac-Shadow-Hand-Over-Direct-v0 --headless --algorithm MAPPO + ./isaaclab.sh train --library skrl --task Isaac-Shadow-Hand-Over-Direct-v0 --headless --algorithm MAPPO # run script for playing with 32 environments with the MAPPO algorithm (IPPO is also supported) - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/play.py --task Isaac-Shadow-Hand-Over-Direct-v0 --num_envs 32 --algorithm MAPPO --checkpoint /PATH/TO/model.pt + ./isaaclab.sh play --library skrl --task Isaac-Shadow-Hand-Over-Direct-v0 --num_envs 32 --algorithm MAPPO --checkpoint /PATH/TO/model.pt .. 
tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows @@ -301,9 +325,9 @@ SKRL :: install python module (for skrl) isaaclab.bat -i skrl :: run script for training with the MAPPO algorithm (IPPO is also supported) - isaaclab.bat -p scripts\reinforcement_learning\skrl\train.py --task Isaac-Shadow-Hand-Over-Direct-v0 --headless --algorithm MAPPO + isaaclab.bat train --library skrl --task Isaac-Shadow-Hand-Over-Direct-v0 --headless --algorithm MAPPO :: run script for playing with 32 environments with the MAPPO algorithm (IPPO is also supported) - isaaclab.bat -p scripts\reinforcement_learning\skrl\play.py --task Isaac-Shadow-Hand-Over-Direct-v0 --num_envs 32 --algorithm MAPPO --checkpoint /PATH/TO/model.pt + isaaclab.bat play --library skrl --task Isaac-Shadow-Hand-Over-Direct-v0 --num_envs 32 --algorithm MAPPO --checkpoint /PATH/TO/model.pt Stable-Baselines3 ----------------- @@ -323,13 +347,13 @@ Stable-Baselines3 # install python module (for stable-baselines3) ./isaaclab.sh -i sb3 # run script for training - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/train.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless + ./isaaclab.sh train --library sb3 --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless # run script for training with Newton backend - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/train.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless presets=newton_mjwarp + ./isaaclab.sh train --library sb3 --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless presets=newton_mjwarp # run script for playing with 32 environments - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --num_envs 32 --checkpoint /PATH/TO/model.zip + ./isaaclab.sh play --library sb3 --task Isaac-Velocity-Flat-Unitree-A1-v0 --num_envs 32 --checkpoint /PATH/TO/model.zip # run script for recording video of a trained agent (requires installing `ffmpeg`) - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless --video --video_length 200 + ./isaaclab.sh play --library sb3 --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless --video --video_length 200 .. 
tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows @@ -339,13 +363,13 @@ Stable-Baselines3 :: install python module (for stable-baselines3) isaaclab.bat -i sb3 :: run script for training - isaaclab.bat -p scripts\reinforcement_learning\sb3\train.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless + isaaclab.bat train --library sb3 --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless :: run script for training with Newton backend - isaaclab.bat -p scripts\reinforcement_learning\sb3\train.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless presets=newton_mjwarp + isaaclab.bat train --library sb3 --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless presets=newton_mjwarp :: run script for playing with 32 environments - isaaclab.bat -p scripts\reinforcement_learning\sb3\play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --num_envs 32 --checkpoint /PATH/TO/model.zip + isaaclab.bat play --library sb3 --task Isaac-Velocity-Flat-Unitree-A1-v0 --num_envs 32 --checkpoint /PATH/TO/model.zip :: run script for recording video of a trained agent (requires installing `ffmpeg`) - isaaclab.bat -p scripts\reinforcement_learning\sb3\play.py --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless --video --video_length 200 + isaaclab.bat play --library sb3 --task Isaac-Velocity-Flat-Unitree-A1-v0 --headless --video --video_length 200 RLinf ----- @@ -375,35 +399,35 @@ large VLA models that don't fit on a single GPU. .. code:: bash # Train with default config (assemble trocar task with GR00T) - ./isaaclab.sh -p scripts/reinforcement_learning/rlinf/train.py + ./isaaclab.sh train --library rlinf # Train with a specific config - ./isaaclab.sh -p scripts/reinforcement_learning/rlinf/train.py \ + ./isaaclab.sh train --library rlinf \ --config_name isaaclab_ppo_gr00t_assemble_trocar # Train with task override and custom settings - ./isaaclab.sh -p scripts/reinforcement_learning/rlinf/train.py \ + ./isaaclab.sh train --library rlinf \ --config_name isaaclab_ppo_gr00t_assemble_trocar \ --task Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0 \ --num_envs 64 --max_epochs 1000 # List available tasks - ./isaaclab.sh -p scripts/reinforcement_learning/rlinf/train.py --list_tasks + ./isaaclab.sh train --library rlinf --list_tasks - Evaluating a trained VLA agent: .. code:: bash # Evaluate a trained checkpoint - ./isaaclab.sh -p scripts/reinforcement_learning/rlinf/play.py \ + ./isaaclab.sh play --library rlinf \ --model_path /path/to/checkpoint # Evaluate with video recording - ./isaaclab.sh -p scripts/reinforcement_learning/rlinf/play.py \ + ./isaaclab.sh play --library rlinf \ --model_path /path/to/checkpoint --video # Evaluate with specific number of environments and episodes - ./isaaclab.sh -p scripts/reinforcement_learning/rlinf/play.py \ + ./isaaclab.sh play --library rlinf \ --model_path /path/to/checkpoint --num_envs 8 --num_episodes 10 diff --git a/docs/source/overview/reinforcement-learning/rl_frameworks.rst b/docs/source/overview/reinforcement-learning/rl_frameworks.rst index 605ee7fd93d7..02376b979e2c 100644 --- a/docs/source/overview/reinforcement-learning/rl_frameworks.rst +++ b/docs/source/overview/reinforcement-learning/rl_frameworks.rst @@ -106,7 +106,7 @@ Training commands (check for the *'Training time: XXX seconds'* line in the term .. 
code:: bash - python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Humanoid-v0 --max_iterations 500 --headless - python scripts/reinforcement_learning/skrl/train.py --task Isaac-Humanoid-v0 --max_iterations 500 --headless - python scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Humanoid-v0 --max_iterations 500 --headless - python scripts/reinforcement_learning/sb3/train.py --task Isaac-Humanoid-v0 --max_iterations 500 --headless + ./isaaclab.sh train --library rl_games --task Isaac-Humanoid-v0 --max_iterations 500 --headless + ./isaaclab.sh train --library skrl --task Isaac-Humanoid-v0 --max_iterations 500 --headless + ./isaaclab.sh train --library rsl_rl --task Isaac-Humanoid-v0 --max_iterations 500 --headless + ./isaaclab.sh train --library sb3 --task Isaac-Humanoid-v0 --max_iterations 500 --headless diff --git a/docs/source/policy_deployment/01_io_descriptors/io_descriptors_101.rst b/docs/source/policy_deployment/01_io_descriptors/io_descriptors_101.rst index d31de818399a..4a63c9b42ded 100644 --- a/docs/source/policy_deployment/01_io_descriptors/io_descriptors_101.rst +++ b/docs/source/policy_deployment/01_io_descriptors/io_descriptors_101.rst @@ -78,10 +78,10 @@ This can be done by setting the ``export_io_descriptors`` flag in the command li .. code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --export_io_descriptors - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --export_io_descriptors - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --export_io_descriptors - ./isaaclab.sh -p scripts/reinforcement_learning/skrl/train.py --task Isaac-Velocity-Flat-Anymal-D-v0 --export_io_descriptors + ./isaaclab.sh train --library rsl_rl --task Isaac-Velocity-Flat-Anymal-D-v0 --export_io_descriptors + ./isaaclab.sh train --library sb3 --task Isaac-Velocity-Flat-Anymal-D-v0 --export_io_descriptors + ./isaaclab.sh train --library rl_games --task Isaac-Velocity-Flat-Anymal-D-v0 --export_io_descriptors + ./isaaclab.sh train --library skrl --task Isaac-Velocity-Flat-Anymal-D-v0 --export_io_descriptors Attaching IO Descriptors to Custom Observation Terms diff --git a/docs/source/policy_deployment/02_gear_assembly/gear_assembly_policy.rst b/docs/source/policy_deployment/02_gear_assembly/gear_assembly_policy.rst index 0021c2da8b09..7fa25eabdf7a 100644 --- a/docs/source/policy_deployment/02_gear_assembly/gear_assembly_policy.rst +++ b/docs/source/policy_deployment/02_gear_assembly/gear_assembly_policy.rst @@ -263,7 +263,7 @@ These friction values were determined through iterative visual comparison: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-UR10e-2F140-v0 \ --headless \ --video --video_length 800 --video_interval 5000 @@ -272,7 +272,7 @@ These friction values were determined through iterative visual comparison: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-Rizon4s-Grav-ROS-Inference-v0 \ --headless \ --video --video_length 800 --video_interval 5000 @@ -650,7 +650,7 @@ First, launch the training with a small number of environments and visualization .. 
code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-UR10e-2F140-ROS-Inference-v0 \ --num_envs 4 \ --visualizer kit @@ -659,7 +659,7 @@ First, launch the training with a small number of environments and visualization .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-UR10e-2F85-ROS-Inference-v0 \ --num_envs 4 \ --visualizer kit @@ -668,7 +668,7 @@ First, launch the training with a small number of environments and visualization .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-Rizon4s-Grav-ROS-Inference-v0 \ --num_envs 4 \ --visualizer kit @@ -697,7 +697,7 @@ Now launch the full training run with more parallel environments in headless mod .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-UR10e-2F140-ROS-Inference-v0 \ --headless \ --num_envs 256 \ @@ -707,7 +707,7 @@ Now launch the full training run with more parallel environments in headless mod .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-UR10e-2F85-ROS-Inference-v0 \ --headless \ --num_envs 256 \ @@ -717,7 +717,7 @@ Now launch the full training run with more parallel environments in headless mod .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-Rizon4s-Grav-ROS-Inference-v0 \ --headless \ --num_envs 256 \ @@ -823,7 +823,7 @@ CUDA Out of Memory .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-UR10e-2F140-v0 \ --headless \ --num_envs 128 # Reduce from 256 to 128, 64, etc. @@ -854,7 +854,7 @@ CUDA Out of Memory .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-GearAssembly-UR10e-2F140-v0 \ --headless \ --num_envs 256 @@ -874,7 +874,7 @@ To use it, run the standard ``play.py`` script: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/play.py \ + ./isaaclab.sh play --library rsl_rl \ --task Isaac-Deploy-GearAssembly-Rizon4s-Grav-Play-v0 \ --num_envs 1 \ --checkpoint <path/to/checkpoint> diff --git a/docs/source/policy_deployment/04_reach/reach_policy.rst b/docs/source/policy_deployment/04_reach/reach_policy.rst index 9317ed6f6c04..e9bff196608a 100644 --- a/docs/source/policy_deployment/04_reach/reach_policy.rst +++ b/docs/source/policy_deployment/04_reach/reach_policy.rst @@ -421,7 +421,7 @@ Before starting full training, launch a quick visualization run to verify the en .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-Reach-UR10e-ROS-Inference-v0 \ --num_envs 4 \ --visualizer kit @@ -430,7 +430,7 @@ Before starting full training, launch a quick visualization run to verify the en ..
code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-Reach-Rizon4s-ROS-Inference-v0 \ --num_envs 4 \ --visualizer kit @@ -465,7 +465,7 @@ Launch full training with many parallel environments in headless mode: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-Reach-UR10e-ROS-Inference-v0 \ --headless \ --num_envs 4096 \ @@ -475,7 +475,7 @@ Launch full training with many parallel environments in headless mode: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Deploy-Reach-Rizon4s-ROS-Inference-v0 \ --headless \ --num_envs 4096 \ @@ -571,7 +571,7 @@ Once training completes, evaluate the policy in the play environment: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/play.py \ + ./isaaclab.sh play --library rsl_rl \ --task Isaac-Deploy-Reach-UR10e-Play-v0 \ --num_envs 50 \ --visualizer kit @@ -580,7 +580,7 @@ Once training completes, evaluate the policy in the play environment: .. code-block:: bash - python scripts/reinforcement_learning/rsl_rl/play.py \ + ./isaaclab.sh play --library rsl_rl \ --task Isaac-Deploy-Reach-Rizon4s-Play-v0 \ --num_envs 50 \ --visualizer kit @@ -596,12 +596,12 @@ To load a specific checkpoint, use these arguments: .. code-block:: bash # Load from a specific run folder - python scripts/reinforcement_learning/rsl_rl/play.py \ + ./isaaclab.sh play --library rsl_rl \ --task Isaac-Deploy-Reach-UR10e-Play-v0 \ --load_run 2025-01-15_14-30-00 # Load a specific checkpoint file - python scripts/reinforcement_learning/rsl_rl/play.py \ + ./isaaclab.sh play --library rsl_rl \ --task Isaac-Deploy-Reach-UR10e-Play-v0 \ --checkpoint /path/to/model_1500.pt diff --git a/docs/source/setup/installation/cloud_installation.rst b/docs/source/setup/installation/cloud_installation.rst index b6d9137680a3..a5f7ba180312 100644 --- a/docs/source/setup/installation/cloud_installation.rst +++ b/docs/source/setup/installation/cloud_installation.rst @@ -153,7 +153,7 @@ To run Isaac Lab commands, open a terminal on the workstation: .. code-block:: bash - ~/IsaacLab/isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task=Isaac-Cartpole-Direct-v0 --headless diff --git a/docs/source/setup/installation/include/pip_extras_note.rst b/docs/source/setup/installation/include/pip_extras_note.rst index 67df28e15089..c5e3bf2a0d4b 100644 --- a/docs/source/setup/installation/include/pip_extras_note.rst +++ b/docs/source/setup/installation/include/pip_extras_note.rst @@ -4,5 +4,5 @@ the bundled training scripts under ``scripts/reinforcement_learning/`` you must install with the ``[all]`` extras (or the per-framework extras ``[skrl]`` / ``[sb3]`` / ``[rsl-rl]``); otherwise commands such - as ``python scripts/reinforcement_learning/skrl/train.py ...`` fail + as ``./isaaclab.sh train --library skrl ...`` fail at import time with ``ModuleNotFoundError: No module named 'skrl'``. diff --git a/docs/source/setup/installation/include/src_verify_isaaclab.rst b/docs/source/setup/installation/include/src_verify_isaaclab.rst index 4488e30cb7ab..7a51edca283b 100644 --- a/docs/source/setup/installation/include/src_verify_isaaclab.rst +++ b/docs/source/setup/installation/include/src_verify_isaaclab.rst @@ -65,14 +65,14 @@ We recommend adding ``--headless`` for faster training. .. 
code:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Ant-v0 --headless + ./isaaclab.sh train --library rsl_rl --task=Isaac-Ant-v0 --headless .. tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows .. code:: batch - isaaclab.bat -p scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Ant-v0 --headless + isaaclab.bat train --library rsl_rl --task=Isaac-Ant-v0 --headless ... Or a robot dog! @@ -84,14 +84,14 @@ We recommend adding ``--headless`` for faster training. .. code:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 --headless + ./isaaclab.sh train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 --headless .. tab-item:: :icon:`fa-brands fa-windows` Windows :sync: windows .. code:: batch - isaaclab.bat -p scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Velocity-Rough-Anymal-C-v0 --headless + isaaclab.bat train --library rsl_rl --task=Isaac-Velocity-Rough-Anymal-C-v0 --headless Isaac Lab provides the tools you'll need to create your own **Tasks** and **Workflows** for whatever your project needs may be. Take a look at our :ref:`how-to` like :ref:`Adding your own learning Library ` or :ref:`Wrapping Environments ` for details. diff --git a/docs/source/setup/installation/kitless_installation.rst b/docs/source/setup/installation/kitless_installation.rst index 506a70eeb0bd..2a7365ff5258 100644 --- a/docs/source/setup/installation/kitless_installation.rst +++ b/docs/source/setup/installation/kitless_installation.rst @@ -23,7 +23,7 @@ With the virtual environment activated, clone the repository and run the kit-les ./isaaclab.sh --install # or ./isaaclab.sh -i # Kickoff training with MJWarp physics and Newton visualizer - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task=Isaac-Cartpole-Direct-v0 \ --num_envs=16 --max_iterations=10 \ presets=newton_mjwarp --visualizer newton diff --git a/docs/source/setup/quick_installation.rst b/docs/source/setup/quick_installation.rst index c377318c5c1d..0489429222b3 100644 --- a/docs/source/setup/quick_installation.rst +++ b/docs/source/setup/quick_installation.rst @@ -3,7 +3,8 @@ Quick Installation ======================= -``./isaaclab.sh -i`` installs everything needed to run with Newton Physics out of the box. +The fastest path from a fresh clone is to let ``uv`` create and sync the environment for +the command you are running: .. code-block:: bash @@ -14,17 +15,21 @@ Quick Installation git clone https://github.com/isaac-sim/IsaacLab.git cd IsaacLab - # Create environment and install - uv venv --python 3.12 --seed env_isaaclab - source env_isaaclab/bin/activate - ./isaaclab.sh -i + # Check the unified training entrypoint + uv run train --help - # Run training (MJWarp on the Newton backend, 16 envs) - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + # Run training with the default RSL-RL + Newton stack + uv run train --library rsl_rl \ --task=Isaac-Cartpole-Direct-v0 \ --num_envs=16 --max_iterations=10 \ presets=newton_mjwarp --visualizer newton +The default ``uv`` environment includes the RSL-RL, tasks, and Newton dependency +stacks. Add extras such as ``ov`` or ``rtx`` only when a workflow needs them. +Isaac Sim Kit workflows, including PhysX, should use the existing full installation +path. The ``./isaaclab.sh -i`` installer remains available for users who prefer an +explicit virtual environment setup. 
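+When a workflow does need the Omniverse stacks, the extras can be supplied per invocation instead of being installed up front (this mirrors the quickstart example; ``ov`` and ``rtx`` are the extras names used throughout these docs):
+
+.. code-block:: bash
+
+   # Pull in the OVRTX/OVPhysX extras only for this run
+   uv run --extra ov --extra rtx train --library rsl_rl \
+       --task Isaac-Cartpole-Direct-v0 --headless presets=newton_mjwarp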
+ Running Tasks ------------------- @@ -34,20 +39,20 @@ The ``presets=`` Hydra override selects the physics backend and renderer at runt .. code-block:: bash # MJWarp (Newton backend, Kit-less) - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Cartpole-Direct-v0 \ --num_envs 4096 \ presets=newton_mjwarp \ --visualizer newton - # PhysX (Kit — requires Isaac Sim) - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + # PhysX (Kit — requires Isaac Sim from the full installation path) + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Cartpole-Direct-v0 \ --num_envs 4096 \ presets=physx # MJWarp with a specific visualizer - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Cartpole-Direct-v0 \ --num_envs 4096 \ presets=newton_mjwarp \ diff --git a/docs/source/setup/quickstart.rst b/docs/source/setup/quickstart.rst index 7b4f7ace2b5e..f29cbc8eed9e 100644 --- a/docs/source/setup/quickstart.rst +++ b/docs/source/setup/quickstart.rst @@ -28,6 +28,28 @@ There are many ways to :ref:`install ` Isaac Lab. Fo **without Isaac Sim** (Newton backend only), see the :ref:`kitless-installation` section of the installation guide — just clone the repo and run ``./isaaclab.sh -i``. +If you are using ``uv`` from a source checkout, you can also let ``uv`` create and sync the +environment directly from the command you want to run: + +.. code-block:: bash + + git clone https://github.com/isaac-sim/IsaacLab.git + cd IsaacLab + + # Default environment, useful for checking entrypoints + uv run train --help + + # Newton backend training without Isaac Sim + uv run train --library rsl_rl \ + --task Isaac-Cartpole-Direct-v0 --headless presets=newton_mjwarp + + # Add OVRTX/OVPhysX extras only when the workflow needs them + uv run --extra ov --extra rtx train --library rsl_rl \ + --task Isaac-Cartpole-Direct-v0 --headless presets=newton_mjwarp + +The default ``uv`` environment includes the RSL-RL, tasks, and Newton dependency stacks. +Isaac Sim Kit workflows, including PhysX, should use the existing full installation path. + For the full pip-based installation (recommended for most users), we use **uv** as the preferred package manager. To begin, create a virtual environment: @@ -176,12 +198,13 @@ To try now, click the button below. To learn more about how to use this project, Launch Training ------------------- -The various backends of Isaac Lab are accessed through their corresponding ``train.py`` and ``play.py`` scripts located in the ``scripts/reinforcement_learning`` directory. -Invoking these scripts will require a **Task Name** and a corresponding **Entry Point** to the gymnasium API. For example +The various reinforcement learning libraries in Isaac Lab are accessed through the unified +``train`` and ``play`` commands. +Invoking these commands requires a **Task Name** and a corresponding **Entry Point** to the Gymnasium API. For example: .. code-block:: bash - python scripts/reinforcement_learning/skrl/train.py --task=Isaac-Ant-v0 + ./isaaclab.sh train --library skrl --task=Isaac-Ant-v0 This will train the MuJoCo ant to "run". You can see the various launch options available to you with the ``--help`` flag. Note specifically the ``--num_envs`` option and the ``--headless`` flag, both of which can be useful when trying to develop and debug a new environment.
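+
+As a quick sketch using the flags just mentioned (the task and values are only examples),
+a small headless debug run looks like:
+
+.. code-block:: bash
+
+   ./isaaclab.sh train --library skrl --task=Isaac-Ant-v0 --num_envs 32 --headless
+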
Options specified at this level automatically overwrite any configuration equivalent that may be defined in the code @@ -195,20 +218,20 @@ Use the ``presets=`` argument to select the physics backend at runtime: .. code-block:: bash # MJWarp (Newton backend, Kit-less) with Newton visualizer - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Cartpole-Direct-v0 \ --num_envs 4096 \ presets=newton_mjwarp \ --visualizer newton # PhysX (Kit) — requires Isaac Sim installed - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Cartpole-Direct-v0 \ --num_envs 4096 \ presets=physx # MJWarp with a specific visualizer - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \ + ./isaaclab.sh train --library rsl_rl \ --task Isaac-Cartpole-Direct-v0 \ --num_envs 4096 \ presets=newton_mjwarp \ diff --git a/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst b/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst index 351189183d40..3a8b8deaf3e1 100644 --- a/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst +++ b/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst @@ -43,7 +43,7 @@ we can finally run training! Let's see what happens! .. code-block:: bash - python scripts/reinforcement_learning/skrl/train.py --task=Template-Isaac-Lab-Tutorial-Direct-v0 + ./isaaclab.sh train --library skrl --task=Template-Isaac-Lab-Tutorial-Direct-v0 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_naive_webp.webp diff --git a/docs/source/tutorials/03_envs/configuring_rl_training.rst b/docs/source/tutorials/03_envs/configuring_rl_training.rst index 2eb2b0b5e763..bccab1b0b7f7 100644 --- a/docs/source/tutorials/03_envs/configuring_rl_training.rst +++ b/docs/source/tutorials/03_envs/configuring_rl_training.rst @@ -87,13 +87,13 @@ entry point. All the scripts in the ``scripts/reinforcement_learning`` directory are configured by default to read the ``_cfg_entry_point`` from the ``kwargs`` dictionary to retrieve the configuration instance. -For instance, the following code block shows how the ``train.py`` script reads the configuration +For instance, the following code block shows how the SB3 training logic reads the configuration instance for the Stable-Baselines3 library: .. dropdown:: Code for train.py with SB3 :icon: code - .. literalinclude:: ../../../../scripts/reinforcement_learning/sb3/train.py + .. literalinclude:: ../../../../scripts/reinforcement_learning/sb3/train_sb3.py :language: python :emphasize-lines: 26-28, 102-103 :linenos: @@ -113,7 +113,7 @@ we can use the ``--agent`` argument to specify the configuration instance to use .. code-block:: bash # standard PPO training - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-v0 --headless \ + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 --headless \ --run_name ppo * Training with the PPO configuration with symmetry augmentation: @@ -121,12 +121,12 @@ we can use the ``--agent`` argument to specify the configuration instance to use .. 
code-block:: bash # PPO training with symmetry augmentation - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-v0 --headless \ + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 --headless \ --agent rsl_rl_with_symmetry_cfg_entry_point \ --run_name ppo_with_symmetry_data_augmentation # you can use hydra to disable symmetry augmentation but enable mirror loss computation - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Cartpole-v0 --headless \ + ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 --headless \ --agent rsl_rl_with_symmetry_cfg_entry_point \ --run_name ppo_without_symmetry_data_augmentation \ agent.algorithm.symmetry_cfg.use_data_augmentation=false diff --git a/docs/source/tutorials/03_envs/create_direct_rl_env.rst b/docs/source/tutorials/03_envs/create_direct_rl_env.rst index 05ef7a336620..643b231d953c 100644 --- a/docs/source/tutorials/03_envs/create_direct_rl_env.rst +++ b/docs/source/tutorials/03_envs/create_direct_rl_env.rst @@ -205,7 +205,7 @@ To run training for the direct workflow Cartpole environment, we can use the fol .. code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-Direct-v0 + ./isaaclab.sh train --library rl_games --task=Isaac-Cartpole-Direct-v0 .. figure:: ../../_static/tutorials/tutorial_create_direct_workflow.jpg :align: center diff --git a/docs/source/tutorials/03_envs/modify_direct_rl_env.rst b/docs/source/tutorials/03_envs/modify_direct_rl_env.rst index a93fa0789cac..a3522c1da19c 100644 --- a/docs/source/tutorials/03_envs/modify_direct_rl_env.rst +++ b/docs/source/tutorials/03_envs/modify_direct_rl_env.rst @@ -111,7 +111,7 @@ After the minor modification has been done, and similar to the previous tutorial .. code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --task Isaac-H1-Direct-v0 --headless + ./isaaclab.sh train --library rl_games --task Isaac-H1-Direct-v0 --headless When the training is finished, we can visualize the result with the following command. To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal @@ -119,7 +119,7 @@ where you started the simulation. .. code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/play.py --task Isaac-H1-Direct-v0 --num_envs 64 --viz kit + ./isaaclab.sh play --library rl_games --task Isaac-H1-Direct-v0 --num_envs 64 --viz kit .. figure:: ../../_static/tutorials/tutorial_modify_direct_rl_env.jpg :align: center diff --git a/docs/source/tutorials/03_envs/policy_inference_in_usd.rst b/docs/source/tutorials/03_envs/policy_inference_in_usd.rst index ddae4511ff30..33ca0b0f0c6e 100644 --- a/docs/source/tutorials/03_envs/policy_inference_in_usd.rst +++ b/docs/source/tutorials/03_envs/policy_inference_in_usd.rst @@ -42,7 +42,7 @@ First, we need to train the ``Isaac-Velocity-Rough-H1-v0`` task by running the f .. code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Isaac-Velocity-Rough-H1-v0 --headless + ./isaaclab.sh train --library rsl_rl --task Isaac-Velocity-Rough-H1-v0 --headless When the training is finished, we can visualize the result with the following command. To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal @@ -50,7 +50,7 @@ where you started the simulation. .. 
code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Isaac-Velocity-Rough-H1-v0 --num_envs 64 --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/POLICY_FILE.pt --viz kit + ./isaaclab.sh play --library rsl_rl --task Isaac-Velocity-Rough-H1-v0 --num_envs 64 --checkpoint logs/rsl_rl/h1_rough/EXPERIMENT_NAME/POLICY_FILE.pt --viz kit After running the play script, the policy will be exported to jit and onnx files under the experiment logs directory. diff --git a/docs/source/tutorials/03_envs/run_rl_training.rst b/docs/source/tutorials/03_envs/run_rl_training.rst index f6d21b64e7a4..ac2ce8d8a1b9 100644 --- a/docs/source/tutorials/03_envs/run_rl_training.rst +++ b/docs/source/tutorials/03_envs/run_rl_training.rst @@ -40,7 +40,7 @@ For this tutorial, we use the training script from `Stable-Baselines3`_ workflow .. dropdown:: Code for train.py :icon: code - .. literalinclude:: ../../../../scripts/reinforcement_learning/sb3/train.py + .. literalinclude:: ../../../../scripts/reinforcement_learning/sb3/train_sb3.py :language: python :emphasize-lines: 57, 66, 68-70, 81, 90-98, 100, 105-113, 115-116, 121-126, 133-136 :linenos: @@ -88,7 +88,7 @@ Rendering can still be active for sensor/camera data capture when enabled by the .. code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 + ./isaaclab.sh train --library sb3 --task Isaac-Cartpole-v0 --num_envs 64 Headless execution with off-screen render @@ -100,7 +100,7 @@ in the workflow and pass ``--video`` to record the agent behavior. .. code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 --video + ./isaaclab.sh train --library sb3 --task Isaac-Cartpole-v0 --num_envs 64 --video The videos are saved to the ``logs/sb3/Isaac-Cartpole-v0//videos/train`` directory. You can open these videos using any video player. @@ -116,7 +116,7 @@ training script as follows: .. code-block:: bash - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 --viz kit + ./isaaclab.sh train --library sb3 --task Isaac-Cartpole-v0 --num_envs 64 --viz kit This will open the Kit visualizer window and you can see the agent training in the environment. However, this can slow down the training process because interactive visual feedback is enabled. As a workaround, you @@ -142,7 +142,7 @@ Once the training is complete, you can visualize the trained agent by executing .. code:: bash # execute from the root directory of the repository - ./isaaclab.sh -p scripts/reinforcement_learning/sb3/play.py --task Isaac-Cartpole-v0 --num_envs 32 --use_last_checkpoint --viz kit + ./isaaclab.sh play --library sb3 --task Isaac-Cartpole-v0 --num_envs 32 --use_last_checkpoint --viz kit The above command will load the latest checkpoint from the ``logs/sb3/Isaac-Cartpole-v0`` directory. You can also specify a specific checkpoint by passing the ``--checkpoint`` flag. diff --git a/pyproject.toml b/pyproject.toml index 513ce684d93f..dd324e22ef97 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -3,6 +3,69 @@ # # SPDX-License-Identifier: BSD-3-Clause +[project] +name = "isaaclab-dev" +version = "0.1.0" +description = "Isaac Lab source checkout development environment." 
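+# NOTE: this [project] is a development meta-package for the source checkout; it is not
+# built or published itself (package = false under [tool.uv] below), and each isaaclab-*
+# dependency resolves to a local editable path via [tool.uv.sources].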
+requires-python = ">=3.12" +dependencies = [ + "isaaclab", + "isaaclab-assets", + "isaaclab-contrib", + "isaaclab-newton[all]", + "isaaclab-ov", + "isaaclab-ovphysx", + "isaaclab-physx[newton]", + "isaaclab-rl[rsl-rl]", + "isaaclab-tasks", + "torch==2.10.0", + "torchaudio==2.10.0", + "torchvision==0.25.0", +] + +[project.optional-dependencies] +assets = [ + "isaaclab-assets", +] +contrib = [ + "isaaclab-contrib", +] +mimic = [ + "isaaclab-mimic", +] +newton = [ + "isaaclab-newton[all]", + "isaaclab-physx[newton]", +] +ov = [ + "isaaclab-ovphysx[ovphysx]", +] +physx = [ + "isaaclab-physx", +] +rl = [ + "isaaclab-rl[rsl-rl]", +] +rl-all = [ + "isaaclab-rl[all]", +] +rtx = [ + "isaaclab-ov[ovrtx]", +] +tasks = [ + "isaaclab-assets", + "isaaclab-contrib", + "isaaclab-ov", + "isaaclab-ovphysx", + "isaaclab-tasks", +] +tasks-experimental = [ + "isaaclab-tasks-experimental", +] +visualizers = [ + "isaaclab-visualizers[all]", +] + [tool.ruff] line-length = 120 target-version = "py310" @@ -147,14 +210,61 @@ markers = [ url = "https://pypi.nvidia.com" explicit = false -# Some Isaac Sim dependencies (e.g. mujoco-usd-converter, tinyobjloader) have -# mismatched versions across pypi.nvidia.com and PyPI. unsafe-best-match lets uv -# resolve the correct version from any index, and prerelease=allow covers packages -# that only publish pre-release wheels (e.g. tinyobjloader==2.0.0rc13). +[[tool.uv.index]] +name = "pytorch-cu128" +url = "https://download.pytorch.org/whl/cu128" +explicit = true + +[[tool.uv.index]] +name = "pytorch-cu130" +url = "https://download.pytorch.org/whl/cu130" +explicit = true + +# Some NVIDIA-hosted dependencies have mismatched versions across pypi.nvidia.com +# and PyPI. unsafe-best-match lets uv resolve the correct version from any index, +# and prerelease=allow covers packages that only publish pre-release wheels. 
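+# The same index strategy is mirrored under [tool.uv.pip] at the end of this file so that
+# `uv pip` invocations resolve packages the same way.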
[tool.uv] index-strategy = "unsafe-best-match" prerelease = "allow" override-dependencies = ["numpy>=2"] +package = false + +[tool.uv.sources] +isaaclab = { path = "source/isaaclab", editable = true } +"isaaclab-assets" = { path = "source/isaaclab_assets", editable = true } +"isaaclab-contrib" = { path = "source/isaaclab_contrib", editable = true } +"isaaclab-experimental" = { path = "source/isaaclab_experimental", editable = true } +"isaaclab-mimic" = { path = "source/isaaclab_mimic", editable = true } +"isaaclab-newton" = { path = "source/isaaclab_newton", editable = true } +"isaaclab-ov" = { path = "source/isaaclab_ov", editable = true } +"isaaclab-ovphysx" = { path = "source/isaaclab_ovphysx", editable = true } +"isaaclab-physx" = { path = "source/isaaclab_physx", editable = true } +"isaaclab-rl" = { path = "source/isaaclab_rl", editable = true } +"isaaclab-tasks" = { path = "source/isaaclab_tasks", editable = true } +"isaaclab-tasks-experimental" = { path = "source/isaaclab_tasks_experimental", editable = true } +"isaaclab-teleop" = { path = "source/isaaclab_teleop", editable = true } +"isaaclab-visualizers" = { path = "source/isaaclab_visualizers", editable = true } +torch = [ + { index = "pytorch-cu128", marker = "sys_platform == 'linux' and platform_machine == 'x86_64'" }, + { index = "pytorch-cu128", marker = "sys_platform == 'linux' and platform_machine == 'AMD64'" }, + { index = "pytorch-cu128", marker = "sys_platform == 'win32'" }, + { index = "pytorch-cu130", marker = "sys_platform == 'linux' and platform_machine == 'aarch64'" }, + { index = "pytorch-cu130", marker = "sys_platform == 'linux' and platform_machine == 'arm64'" }, +] +torchaudio = [ + { index = "pytorch-cu128", marker = "sys_platform == 'linux' and platform_machine == 'x86_64'" }, + { index = "pytorch-cu128", marker = "sys_platform == 'linux' and platform_machine == 'AMD64'" }, + { index = "pytorch-cu128", marker = "sys_platform == 'win32'" }, + { index = "pytorch-cu130", marker = "sys_platform == 'linux' and platform_machine == 'aarch64'" }, + { index = "pytorch-cu130", marker = "sys_platform == 'linux' and platform_machine == 'arm64'" }, +] +torchvision = [ + { index = "pytorch-cu128", marker = "sys_platform == 'linux' and platform_machine == 'x86_64'" }, + { index = "pytorch-cu128", marker = "sys_platform == 'linux' and platform_machine == 'AMD64'" }, + { index = "pytorch-cu128", marker = "sys_platform == 'win32'" }, + { index = "pytorch-cu130", marker = "sys_platform == 'linux' and platform_machine == 'aarch64'" }, + { index = "pytorch-cu130", marker = "sys_platform == 'linux' and platform_machine == 'arm64'" }, +] [tool.uv.pip] index-strategy = "unsafe-best-match" diff --git a/scripts/reinforcement_learning/common.py b/scripts/reinforcement_learning/common.py new file mode 100644 index 000000000000..280ca22dea90 --- /dev/null +++ b/scripts/reinforcement_learning/common.py @@ -0,0 +1,281 @@ +# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""Common utilities for reinforcement learning entrypoints.""" + +from __future__ import annotations + +import argparse +import importlib.util +import logging +import os +import runpy +import sys +from pathlib import Path +from types import ModuleType +from typing import Any + +import gymnasium as gym + +from isaaclab.envs import DirectMARLEnvCfg, ManagerBasedRLEnvCfg +from isaaclab.utils.dict import print_dict +from isaaclab.utils.io import dump_yaml + +from isaaclab_tasks.utils import add_launcher_args + + +def dispatch_library_entrypoint( + argv: list[str] | None, + entrypoints: dict[str, Path], + *, + action: str, + description: str, + library_help: str, + run_as_script: bool = False, +) -> int: + """Dispatch a unified entrypoint to a library-specific implementation. + + Args: + argv: Command-line arguments, excluding the script path. + entrypoints: Mapping from library name to implementation path. + action: Action name used to create a unique module name. + description: Top-level parser description. + library_help: Help text for the ``--library`` argument. + run_as_script: Whether to execute the selected implementation as a script. + + Returns: + Process exit code. + """ + if argv is None: + argv = sys.argv[1:] + + parser = argparse.ArgumentParser(add_help=False) + parser.add_argument("--library", choices=sorted(entrypoints)) + args_cli, library_args = parser.parse_known_args(argv) + + if args_cli.library is None: + help_parser = argparse.ArgumentParser(description=description) + help_parser.add_argument("--library", choices=sorted(entrypoints), required=True, help=library_help) + help_parser.add_argument("args", nargs=argparse.REMAINDER, help="Arguments forwarded to the selected library.") + help_parser.print_help() + return 0 if "-h" in argv or "--help" in argv else 2 + + module_path = entrypoints[args_cli.library] + if run_as_script: + original_argv = sys.argv + original_path = list(sys.path) + try: + sys.argv = [str(module_path)] + library_args + sys.path.insert(0, str(module_path.parent)) + runpy.run_path(str(module_path), run_name="__main__") + finally: + sys.argv = original_argv + sys.path[:] = original_path + return 0 + + module = import_local_module(f"isaaclab_rl_{action}_{args_cli.library}", module_path) + module.run(library_args) + return 0 + + +def add_common_train_args( + parser: argparse.ArgumentParser, + *, + agent_default: str | None, + agent_help: str, + include_agent: bool = True, + include_distributed: bool = True, +) -> None: + """Add common Isaac Lab reinforcement learning training arguments. + + Args: + parser: The parser to add arguments to. + agent_default: Default agent config entry point. + agent_help: Help text for the ``--agent`` argument. + include_agent: Whether to include the ``--agent`` argument. + include_distributed: Whether to include the ``--distributed`` argument. + """ + parser.add_argument("--video", action="store_true", default=False, help="Record videos during training.") + parser.add_argument("--video_length", type=int, default=200, help="Length of the recorded video (in steps).") + parser.add_argument( + "--video_interval", type=int, default=2000, help="Interval between video recordings (in steps)." 
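+        # note: the interval is measured in environment steps; it drives the
+        # RecordVideo step_trigger configured in wrap_record_video() below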
+ ) + parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") + parser.add_argument("--task", type=str, default=None, help="Name of the task.") + if include_agent: + parser.add_argument("--agent", type=str, default=agent_default, help=agent_help) + parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment") + if include_distributed: + parser.add_argument( + "--distributed", action="store_true", default=False, help="Run training with multiple GPUs or nodes." + ) + parser.add_argument("--max_iterations", type=int, default=None, help="RL Policy training iterations.") + parser.add_argument("--export_io_descriptors", action="store_true", default=False, help="Export IO descriptors.") + parser.add_argument( + "--ray-proc-id", + "-rid", + type=int, + default=None, + help="Automatically configured by Ray integration, otherwise None.", + ) + + +def add_isaaclab_launcher_args(parser: argparse.ArgumentParser) -> None: + """Add Isaac Lab simulation launcher arguments to a parser. + + Args: + parser: The parser to add arguments to. + """ + add_launcher_args(parser) + + +def enable_cameras_for_video(args_cli: argparse.Namespace) -> None: + """Enable camera rendering when video recording is requested. + + Args: + args_cli: Parsed command-line arguments. + """ + if getattr(args_cli, "video", False): + args_cli.enable_cameras = True + + +def set_hydra_args(hydra_args: list[str]) -> None: + """Replace ``sys.argv`` with arguments intended for Hydra. + + Args: + hydra_args: Remaining command-line arguments not consumed by argparse. + """ + sys.argv = [sys.argv[0]] + hydra_args + + +def import_local_module(module_name: str, module_path: Path) -> ModuleType: + """Import a module from an explicit file path. + + Args: + module_name: Unique module name to use in ``sys.modules``. + module_path: Path to the Python file to import. + + Returns: + The imported module. + """ + spec = importlib.util.spec_from_file_location(module_name, module_path) + if spec is None or spec.loader is None: + raise ImportError(f"Could not load module {module_name!r} from {module_path}") + module = importlib.util.module_from_spec(spec) + sys.modules[module_name] = module + spec.loader.exec_module(module) + return module + + +def apply_env_overrides(args_cli: argparse.Namespace, env_cfg: Any, *, apply_device: bool = True) -> None: + """Apply common environment overrides from command-line arguments. + + Args: + args_cli: Parsed command-line arguments. + env_cfg: Isaac Lab environment config. + apply_device: Whether to apply the ``--device`` override for non-distributed runs. + """ + if getattr(args_cli, "num_envs", None) is not None: + env_cfg.scene.num_envs = args_cli.num_envs + + if apply_device and not getattr(args_cli, "distributed", False): + device = getattr(args_cli, "device", None) + env_cfg.sim.device = device if device is not None else env_cfg.sim.device + + +def validate_distributed_device(args_cli: argparse.Namespace) -> None: + """Reject unsupported CPU distributed training configuration. + + Args: + args_cli: Parsed command-line arguments. + + Raises: + ValueError: If distributed training is requested with a CPU device. + """ + device = getattr(args_cli, "device", None) + if getattr(args_cli, "distributed", False) and device is not None and "cpu" in device: + raise ValueError( + "Distributed training is not supported when using CPU device. " + "Please use GPU device (e.g., --device cuda) for distributed training." 
+ ) + + +def configure_io_descriptors(env_cfg: Any, args_cli: argparse.Namespace, logger: logging.Logger) -> None: + """Configure IO descriptor export on supported environment configs. + + Args: + env_cfg: Isaac Lab environment config. + args_cli: Parsed command-line arguments. + logger: Logger used for unsupported environment warnings. + """ + if isinstance(env_cfg, ManagerBasedRLEnvCfg): + env_cfg.export_io_descriptors = args_cli.export_io_descriptors + else: + logger.warning( + "IO descriptors are only supported for manager based RL environments. No IO descriptors will be exported." + ) + + +def create_isaaclab_env( + task: str, + env_cfg: Any, + args_cli: argparse.Namespace, + *, + convert_marl_to_single_agent: bool, +): + """Create the Isaac Lab Gymnasium environment. + + Args: + task: Task name to instantiate. + env_cfg: Isaac Lab environment config. + args_cli: Parsed command-line arguments. + convert_marl_to_single_agent: Whether to convert direct MARL environments to single-agent environments. + + Returns: + The created Gymnasium environment. + """ + env = gym.make(task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None) + if convert_marl_to_single_agent and isinstance(env.unwrapped.cfg, DirectMARLEnvCfg): + from isaaclab.envs import multi_agent_to_single_agent + + env = multi_agent_to_single_agent(env) + return env + + +def wrap_record_video(env, log_dir: str, args_cli: argparse.Namespace): + """Wrap an environment with video recording when requested. + + Args: + env: Gymnasium environment to wrap. + log_dir: Training log directory. + args_cli: Parsed command-line arguments. + + Returns: + The original or video-wrapped environment. + """ + if not args_cli.video: + return env + + video_kwargs = { + "video_folder": os.path.join(log_dir, "videos", "train"), + "step_trigger": lambda step: step % args_cli.video_interval == 0, + "video_length": args_cli.video_length, + "disable_logger": True, + } + print("[INFO] Recording videos during training.") + print_dict(video_kwargs, nesting=4) + return gym.wrappers.RecordVideo(env, **video_kwargs) + + +def dump_train_configs(log_dir: str, env_cfg: Any, agent_cfg: Any) -> None: + """Dump training configuration files under a run log directory. + + Args: + log_dir: Training log directory. + env_cfg: Isaac Lab environment config. + agent_cfg: Reinforcement learning agent config. + """ + dump_yaml(os.path.join(log_dir, "params", "env.yaml"), env_cfg) + dump_yaml(os.path.join(log_dir, "params", "agent.yaml"), agent_cfg) diff --git a/scripts/reinforcement_learning/play.py b/scripts/reinforcement_learning/play.py new file mode 100644 index 000000000000..1a12ff61d9dc --- /dev/null +++ b/scripts/reinforcement_learning/play.py @@ -0,0 +1,38 @@ +# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""Unified play entrypoint for Isaac Lab reinforcement learning workflows.""" + +from __future__ import annotations + +from pathlib import Path + +from common import dispatch_library_entrypoint + +SCRIPT_DIR = Path(__file__).resolve().parent + +LIBRARY_ENTRYPOINTS = { + "rl_games": SCRIPT_DIR / "rl_games" / "play_rl_games.py", + "rlinf": SCRIPT_DIR / "rlinf" / "play_rlinf.py", + "rsl_rl": SCRIPT_DIR / "rsl_rl" / "play_rsl_rl.py", + "sb3": SCRIPT_DIR / "sb3" / "play_sb3.py", + "skrl": SCRIPT_DIR / "skrl" / "play_skrl.py", +} + + +def main(argv: list[str] | None = None) -> int: + """Run the selected reinforcement learning play library.""" + return dispatch_library_entrypoint( + argv, + LIBRARY_ENTRYPOINTS, + action="play", + description="Play an RL agent with a selected reinforcement learning library.", + library_help="Training library used by the checkpoint.", + run_as_script=True, + ) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/reinforcement_learning/ray/task_runner.py b/scripts/reinforcement_learning/ray/task_runner.py index 7bb3596fc918..4183d039651e 100644 --- a/scripts/reinforcement_learning/ray/task_runner.py +++ b/scripts/reinforcement_learning/ray/task_runner.py @@ -67,7 +67,7 @@ concurrent: false tasks: - name: "Isaac-Cartpole-v0" - py_args: "-m torch.distributed.run --nnodes=1 --nproc_per_node=2 --rdzv_endpoint=localhost:29501 /workspace/isaaclab/scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --max_iterations 200 --headless --distributed" + py_args: "-m torch.distributed.run --nnodes=1 --nproc_per_node=2 --rdzv_endpoint=localhost:29501 /workspace/isaaclab/scripts/reinforcement_learning/train.py --library rsl_rl --task=Isaac-Cartpole-v0 --max_iterations 200 --headless --distributed" num_gpus: 2 num_cpus: 10 memory: 10737418240 @@ -87,7 +87,7 @@ concurrent: true tasks: - name: "Isaac-Cartpole-v0-multi-node-train-1" - py_args: "-m torch.distributed.run --nproc_per_node=1 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 /workspace/isaaclab/scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed --max_iterations 1000" + py_args: "-m torch.distributed.run --nproc_per_node=1 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 /workspace/isaaclab/scripts/reinforcement_learning/train.py --library rsl_rl --task=Isaac-Cartpole-v0 --headless --distributed --max_iterations 1000" num_gpus: 1 num_cpus: 10 memory: 10*1024*1024*1024 @@ -95,7 +95,7 @@ specific: "hostname" hostname: "xxx" - name: "Isaac-Cartpole-v0-multi-node-train-2" - py_args: "-m torch.distributed.run --nproc_per_node=1 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=x.x.x.x:5555 /workspace/isaaclab/scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed --max_iterations 1000" + py_args: "-m torch.distributed.run --nproc_per_node=1 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=x.x.x.x:5555 /workspace/isaaclab/scripts/reinforcement_learning/train.py --library rsl_rl --task=Isaac-Cartpole-v0 --headless --distributed --max_iterations 1000" num_gpus: 1 num_cpus: 10 memory: 10*1024*1024*1024 diff --git a/scripts/reinforcement_learning/ray/tuner.py b/scripts/reinforcement_learning/ray/tuner.py index 99dc7e8d08f5..01176d703912 100644 --- a/scripts/reinforcement_learning/ray/tuner.py +++ 
b/scripts/reinforcement_learning/ray/tuner.py @@ -65,7 +65,7 @@ DOCKER_PREFIX = "/workspace/isaaclab/" BASE_DIR = os.path.expanduser("~") PYTHON_EXEC = "./isaaclab.sh -p" -WORKFLOW = "scripts/reinforcement_learning/rl_games/train.py" +WORKFLOW = "scripts/reinforcement_learning/train.py --library rl_games" NUM_WORKERS_PER_NODE = 1 # needed for local parallelism PROCESS_RESPONSE_TIMEOUT = 200.0 # seconds to wait before killing the process when it stops responding MAX_LINES_TO_SEARCH_EXPERIMENT_LOGS = 1000 # maximum number of lines to read from the training process logs diff --git a/scripts/reinforcement_learning/ray/util.py b/scripts/reinforcement_learning/ray/util.py index a73ebdf493dc..64ce70553679 100644 --- a/scripts/reinforcement_learning/ray/util.py +++ b/scripts/reinforcement_learning/ray/util.py @@ -60,7 +60,7 @@ def get_latest_scalars(path: str) -> dict: def get_invocation_command_from_cfg( cfg: dict, python_cmd: str = "/workspace/isaaclab/isaaclab.sh -p", - workflow: str = "scripts/reinforcement_learning/rl_games/train.py", + workflow: str = "scripts/reinforcement_learning/train.py --library rl_games", ) -> str: """Generate command with proper Hydra arguments""" runner_args = [] diff --git a/scripts/reinforcement_learning/rl_games/play.py b/scripts/reinforcement_learning/rl_games/play.py index dd61d80da530..3c240b6b7f54 100644 --- a/scripts/reinforcement_learning/rl_games/play.py +++ b/scripts/reinforcement_learning/rl_games/play.py @@ -5,6 +5,16 @@ """Script to play a checkpoint if an RL agent from RL-Games.""" +import warnings + +warnings.warn( + "scripts/reinforcement_learning/rl_games/play.py is deprecated. Use " + "`./isaaclab.sh play --library rl_games --task ` instead. " + "Example: `./isaaclab.sh play --library rl_games --task Isaac-Cartpole-v0`.", + DeprecationWarning, + stacklevel=1, +) + import argparse import contextlib import math diff --git a/scripts/reinforcement_learning/rl_games/play_rl_games.py b/scripts/reinforcement_learning/rl_games/play_rl_games.py new file mode 100644 index 000000000000..c1b42d4dca0e --- /dev/null +++ b/scripts/reinforcement_learning/rl_games/play_rl_games.py @@ -0,0 +1,203 @@ +# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""Script to play a checkpoint of an RL agent from RL-Games.""" + +import argparse +import contextlib +import math +import os +import random +import sys +import time + +import gymnasium as gym +import torch +from rl_games.common import env_configurations, vecenv +from rl_games.common.player import BasePlayer +from rl_games.torch_runner import Runner + +from isaaclab.envs import DirectMARLEnvCfg +from isaaclab.utils.assets import retrieve_file_path +from isaaclab.utils.dict import print_dict + +from isaaclab_rl.rl_games import RlGamesGpuEnv, RlGamesVecEnvWrapper +from isaaclab_rl.utils.pretrained_checkpoint import get_published_pretrained_checkpoint + +import isaaclab_tasks # noqa: F401 +from isaaclab_tasks.utils import add_launcher_args, get_checkpoint_path, launch_simulation, resolve_task_config + +# PLACEHOLDER: Extension template (do not remove this comment) +with contextlib.suppress(ImportError): + import isaaclab_tasks_experimental # noqa: F401 + +# -- argparse ---------------------------------------------------------------- +parser = argparse.ArgumentParser(description="Play a checkpoint of an RL agent from RL-Games.") +parser.add_argument("--video", action="store_true", default=False, help="Record videos during play.") +parser.add_argument("--video_length", type=int, default=200, help="Length of the recorded video (in steps).") +parser.add_argument( + "--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations." +) +parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") +parser.add_argument("--task", type=str, default=None, help="Name of the task.") +parser.add_argument( + "--agent", type=str, default="rl_games_cfg_entry_point", help="Name of the RL agent configuration entry point." +) +parser.add_argument("--checkpoint", type=str, default=None, help="Path to model checkpoint.") +parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment") +parser.add_argument( + "--use_pretrained_checkpoint", + action="store_true", + help="Use the pre-trained checkpoint from Nucleus.", +) +parser.add_argument( + "--use_last_checkpoint", + action="store_true", + help="When no checkpoint provided, use the last saved model. 
Otherwise use the best saved model.", +) +parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.") +add_launcher_args(parser) +args_cli, hydra_args = parser.parse_known_args() + +if args_cli.video: + args_cli.enable_cameras = True + +sys.argv = [sys.argv[0]] + hydra_args + + +def main(): + """Play with RL-Games agent.""" + env_cfg, agent_cfg = resolve_task_config(args_cli.task, args_cli.agent) + with launch_simulation(env_cfg, args_cli): + # grab task name for checkpoint path + task_name = args_cli.task.split(":")[-1] + train_task_name = task_name.replace("-Play", "") + + # override configurations with non-hydra CLI arguments + env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs + env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device + + # randomly sample a seed if seed = -1 + if args_cli.seed == -1: + args_cli.seed = random.randint(0, 10000) + + agent_cfg["params"]["seed"] = args_cli.seed if args_cli.seed is not None else agent_cfg["params"]["seed"] + env_cfg.seed = agent_cfg["params"]["seed"] + + # specify directory for logging experiments + log_root_path = os.path.join("logs", "rl_games", agent_cfg["params"]["config"]["name"]) + log_root_path = os.path.abspath(log_root_path) + print(f"[INFO] Loading experiment from directory: {log_root_path}") + # find checkpoint + if args_cli.use_pretrained_checkpoint: + resume_path = get_published_pretrained_checkpoint("rl_games", train_task_name) + if not resume_path: + print("[INFO] Unfortunately a pre-trained checkpoint is currently unavailable for this task.") + return + elif args_cli.checkpoint is None: + run_dir = agent_cfg["params"]["config"].get("full_experiment_name", ".*") + if args_cli.use_last_checkpoint: + checkpoint_file = ".*" + else: + checkpoint_file = f"{agent_cfg['params']['config']['name']}.pth" + resume_path = get_checkpoint_path(log_root_path, run_dir, checkpoint_file, other_dirs=["nn"]) + else: + resume_path = retrieve_file_path(args_cli.checkpoint) + log_dir = os.path.dirname(os.path.dirname(resume_path)) + + # set the log directory for the environment + env_cfg.log_dir = log_dir + + # wrap around environment for rl-games + rl_device = agent_cfg["params"]["config"]["device"] + clip_obs = agent_cfg["params"]["env"].get("clip_observations", math.inf) + clip_actions = agent_cfg["params"]["env"].get("clip_actions", math.inf) + obs_groups = agent_cfg["params"]["env"].get("obs_groups") + concate_obs_groups = agent_cfg["params"]["env"].get("concate_obs_groups", True) + + # create isaac environment + env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None) + + # convert to single-agent instance if required by the RL algorithm + if isinstance(env.unwrapped.cfg, DirectMARLEnvCfg): + from isaaclab.envs import multi_agent_to_single_agent + + env = multi_agent_to_single_agent(env) + + # wrap for video recording + if args_cli.video: + video_kwargs = { + "video_folder": os.path.join(log_root_path, log_dir, "videos", "play"), + "step_trigger": lambda step: step == 0, + "video_length": args_cli.video_length, + "disable_logger": True, + } + print("[INFO] Recording videos during play.") + print_dict(video_kwargs, nesting=4) + env = gym.wrappers.RecordVideo(env, **video_kwargs) + + # wrap around environment for rl-games + env = RlGamesVecEnvWrapper(env, rl_device, clip_obs, clip_actions, obs_groups, concate_obs_groups) + + # register the environment to rl-games 
registry + vecenv.register( + "IsaacRlgWrapper", + lambda config_name, num_actors, **kwargs: RlGamesGpuEnv(config_name, num_actors, **kwargs), + ) + env_configurations.register("rlgpu", {"vecenv_type": "IsaacRlgWrapper", "env_creator": lambda **kwargs: env}) + + # load previously trained model + agent_cfg["params"]["load_checkpoint"] = True + agent_cfg["params"]["load_path"] = resume_path + print(f"[INFO]: Loading model checkpoint from: {agent_cfg['params']['load_path']}") + + # set number of actors into agent config + agent_cfg["params"]["config"]["num_actors"] = env.unwrapped.num_envs + runner = Runner() + runner.load(agent_cfg) + agent: BasePlayer = runner.create_player() + agent.restore(resume_path) + agent.reset() + + dt = env.unwrapped.step_dt + + # reset environment + obs = env.reset() + if isinstance(obs, dict): + obs = obs["obs"] + timestep = 0 + _ = agent.get_batch_size(obs, 1) + if agent.is_rnn: + agent.init_rnn() + # simulate environment + try: + while True: + start_time = time.time() + with torch.inference_mode(): + obs = agent.obs_to_torch(obs) + actions = agent.get_action(obs, is_deterministic=agent.is_deterministic) + obs, _, dones, _ = env.step(actions) + + if len(dones) > 0: + if agent.is_rnn and agent.states is not None: + for s in agent.states: + s[:, dones, :] = 0.0 + if args_cli.video: + timestep += 1 + if timestep == args_cli.video_length: + break + + sleep_time = dt - (time.time() - start_time) + if args_cli.real_time and sleep_time > 0: + time.sleep(sleep_time) + + # close the simulator + env.close() + except KeyboardInterrupt: + pass + + +if __name__ == "__main__": + main() diff --git a/scripts/reinforcement_learning/rl_games/train.py b/scripts/reinforcement_learning/rl_games/train.py index 697ca06660a3..89aa4f28ae8a 100644 --- a/scripts/reinforcement_learning/rl_games/train.py +++ b/scripts/reinforcement_learning/rl_games/train.py @@ -5,6 +5,16 @@ """Script to train RL agent with RL-Games.""" +import warnings + +warnings.warn( + "scripts/reinforcement_learning/rl_games/train.py is deprecated. Use " + "`./isaaclab.sh train --library rl_games --task ` instead. " + "Example: `./isaaclab.sh train --library rl_games --task Isaac-Cartpole-v0`.", + DeprecationWarning, + stacklevel=1, +) + import argparse import contextlib import logging diff --git a/scripts/reinforcement_learning/rl_games/train_rl_games.py b/scripts/reinforcement_learning/rl_games/train_rl_games.py new file mode 100644 index 000000000000..0cf210c27b2e --- /dev/null +++ b/scripts/reinforcement_learning/rl_games/train_rl_games.py @@ -0,0 +1,197 @@ +# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""RL-Games training logic for the unified reinforcement learning entrypoint.""" + +from __future__ import annotations + +import argparse +import contextlib +import logging +import math +import os +import random +import time +from datetime import datetime +from distutils.util import strtobool + +from common import ( + add_common_train_args, + add_isaaclab_launcher_args, + apply_env_overrides, + configure_io_descriptors, + create_isaaclab_env, + dump_train_configs, + enable_cameras_for_video, + set_hydra_args, + validate_distributed_device, + wrap_record_video, +) + +import isaaclab_tasks # noqa: F401 + +logger = logging.getLogger(__name__) + +# PLACEHOLDER: Extension template (do not remove this comment) +with contextlib.suppress(ImportError): + import isaaclab_tasks_experimental # noqa: F401 + + +def _parse_args(argv: list[str]) -> argparse.Namespace: + """Parse RL-Games training arguments.""" + parser = argparse.ArgumentParser(description="Train an RL agent with RL-Games.") + add_common_train_args( + parser, + agent_default="rl_games_cfg_entry_point", + agent_help="Name of the RL agent configuration entry point.", + ) + parser.add_argument("--checkpoint", type=str, default=None, help="Path to model checkpoint.") + parser.add_argument("--sigma", type=str, default=None, help="The policy's initial standard deviation.") + parser.add_argument("--wandb-project-name", type=str, default=None, help="the wandb's project name") + parser.add_argument("--wandb-entity", type=str, default=None, help="the entity (team) of wandb's project") + parser.add_argument("--wandb-name", type=str, default=None, help="the name of wandb's run") + parser.add_argument( + "--track", + type=lambda x: bool(strtobool(x)), + default=False, + nargs="?", + const=True, + help="if toggled, this experiment will be tracked with Weights and Biases", + ) + add_isaaclab_launcher_args(parser) + args_cli, hydra_args = parser.parse_known_args(argv) + enable_cameras_for_video(args_cli) + set_hydra_args(hydra_args) + return args_cli + + +def run(argv: list[str]) -> None: + """Train an RL-Games agent.""" + from rl_games.common import env_configurations, vecenv + from rl_games.common.algo_observer import IsaacAlgoObserver + from rl_games.torch_runner import Runner + + from isaaclab.envs import DirectMARLEnvCfg + from isaaclab.utils.assets import retrieve_file_path + + from isaaclab_rl.rl_games import MultiObserver, PbtAlgoObserver, RlGamesGpuEnv, RlGamesVecEnvWrapper + + from isaaclab_tasks.utils import launch_simulation, resolve_task_config + + args_cli = _parse_args(argv) + env_cfg, agent_cfg = resolve_task_config(args_cli.task, args_cli.agent) + + with launch_simulation(env_cfg, args_cli): + apply_env_overrides(args_cli, env_cfg) + validate_distributed_device(args_cli) + + if args_cli.seed == -1: + args_cli.seed = random.randint(0, 10000) + + agent_cfg["params"]["seed"] = args_cli.seed if args_cli.seed is not None else agent_cfg["params"]["seed"] + agent_cfg["params"]["config"]["max_epochs"] = ( + args_cli.max_iterations + if args_cli.max_iterations is not None + else agent_cfg["params"]["config"]["max_epochs"] + ) + if args_cli.checkpoint is not None: + resume_path = retrieve_file_path(args_cli.checkpoint) + agent_cfg["params"]["load_checkpoint"] = True + agent_cfg["params"]["load_path"] = resume_path + print(f"[INFO]: Loading model checkpoint from: {agent_cfg['params']['load_path']}") + train_sigma = float(args_cli.sigma) if args_cli.sigma is not None else None + + if 
args_cli.distributed: + agent_cfg["params"]["seed"] += int(os.getenv("RANK", "0")) + agent_cfg["params"]["config"]["device"] = env_cfg.sim.device + agent_cfg["params"]["config"]["device_name"] = env_cfg.sim.device + agent_cfg["params"]["config"]["multi_gpu"] = True + + env_cfg.seed = agent_cfg["params"]["seed"] + + config_name = agent_cfg["params"]["config"]["name"] + log_root_path = os.path.join("logs", "rl_games", config_name) + if "pbt" in agent_cfg and agent_cfg["pbt"]["directory"] != ".": + log_root_path = os.path.join(agent_cfg["pbt"]["directory"], log_root_path) + else: + log_root_path = os.path.abspath(log_root_path) + + print(f"[INFO] Logging experiment in directory: {log_root_path}") + log_dir = agent_cfg["params"]["config"].get( + "full_experiment_name", datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + ) + agent_cfg["params"]["config"]["train_dir"] = log_root_path + agent_cfg["params"]["config"]["full_experiment_name"] = log_dir + wandb_project = config_name if args_cli.wandb_project_name is None else args_cli.wandb_project_name + experiment_name = log_dir if args_cli.wandb_name is None else args_cli.wandb_name + + run_log_dir = os.path.join(log_root_path, log_dir) + dump_train_configs(run_log_dir, env_cfg, agent_cfg) + print(f"Exact experiment name requested from command line: {run_log_dir}") + + rl_device = agent_cfg["params"]["config"]["device"] + clip_obs = agent_cfg["params"]["env"].get("clip_observations", math.inf) + clip_actions = agent_cfg["params"]["env"].get("clip_actions", math.inf) + obs_groups = agent_cfg["params"]["env"].get("obs_groups") + concate_obs_groups = agent_cfg["params"]["env"].get("concate_obs_groups", True) + + configure_io_descriptors(env_cfg, args_cli, logger) + env_cfg.log_dir = run_log_dir + + env = create_isaaclab_env( + args_cli.task, + env_cfg, + args_cli, + convert_marl_to_single_agent=isinstance(env_cfg, DirectMARLEnvCfg), + ) + env = wrap_record_video(env, run_log_dir, args_cli) + + start_time = time.time() + env = RlGamesVecEnvWrapper(env, rl_device, clip_obs, clip_actions, obs_groups, concate_obs_groups) + + vecenv.register( + "IsaacRlgWrapper", + lambda config_name, num_actors, **kwargs: RlGamesGpuEnv(config_name, num_actors, **kwargs), + ) + env_configurations.register("rlgpu", {"vecenv_type": "IsaacRlgWrapper", "env_creator": lambda **kwargs: env}) + + agent_cfg["params"]["config"]["num_actors"] = env.unwrapped.num_envs + + if "pbt" in agent_cfg and agent_cfg["pbt"]["enabled"]: + observers = MultiObserver([IsaacAlgoObserver(), PbtAlgoObserver(agent_cfg, args_cli)]) + runner = Runner(observers) + else: + runner = Runner(IsaacAlgoObserver()) + + runner.load(agent_cfg) + runner.reset() + + global_rank = int(os.getenv("RANK", "0")) + if args_cli.track and global_rank == 0: + if args_cli.wandb_entity is None: + raise ValueError("Weights and Biases entity must be specified for tracking.") + import wandb + + wandb.init( + project=wandb_project, + entity=args_cli.wandb_entity, + name=experiment_name, + sync_tensorboard=True, + monitor_gym=True, + save_code=True, + ) + if not wandb.run.resumed: + wandb.config.update({"env_cfg": env_cfg.to_dict()}) + wandb.config.update({"agent_cfg": agent_cfg}) + + try: + if args_cli.checkpoint is not None: + runner.run({"train": True, "play": False, "sigma": train_sigma, "checkpoint": resume_path}) + else: + runner.run({"train": True, "play": False, "sigma": train_sigma}) + print(f"Training time: {round(time.time() - start_time, 2)} seconds") + env.close() + except KeyboardInterrupt: + pass diff --git 
a/scripts/reinforcement_learning/rlinf/README.md b/scripts/reinforcement_learning/rlinf/README.md index 4ca96ba4fd2c..c975998526ea 100644 --- a/scripts/reinforcement_learning/rlinf/README.md +++ b/scripts/reinforcement_learning/rlinf/README.md @@ -75,32 +75,32 @@ source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/ ```bash # Basic training (uses default config) -python train.py +./isaaclab.sh train --library rlinf # Training with a specific config -python train.py --config_name isaaclab_ppo_gr00t_assemble_trocar +./isaaclab.sh train --library rlinf --config_name isaaclab_ppo_gr00t_assemble_trocar # Training with task override -python train.py --task Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0 +./isaaclab.sh train --library rlinf --task Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0 # Training with custom settings -python train.py --num_envs 64 --max_epochs 1000 +./isaaclab.sh train --library rlinf --num_envs 64 --max_epochs 1000 # List available tasks -python train.py --list_tasks +./isaaclab.sh train --library rlinf --list_tasks ``` ### Evaluation ```bash # Evaluate a trained checkpoint -python play.py --model_path /path/to/checkpoint +./isaaclab.sh play --library rlinf --model_path /path/to/checkpoint # Evaluate with video recording -python play.py --model_path /path/to/checkpoint --video +./isaaclab.sh play --library rlinf --model_path /path/to/checkpoint --video # Evaluate with specific number of environments -python play.py --model_path /path/to/checkpoint --num_envs 8 +./isaaclab.sh play --library rlinf --model_path /path/to/checkpoint --num_envs 8 ``` ## Configuration @@ -282,7 +282,7 @@ The task is registered automatically at runtime via the extension module. Task I ### 4. Run Training ```bash -python train.py --config_path /path/to/your/config/dir \ +./isaaclab.sh train --library rlinf --config_path /path/to/your/config/dir \ --config_name isaaclab_ppo_gr00t_my_task ``` diff --git a/scripts/reinforcement_learning/rlinf/play.py b/scripts/reinforcement_learning/rlinf/play.py index f63e02d3e1f2..ce1aba22ddfc 100644 --- a/scripts/reinforcement_learning/rlinf/play.py +++ b/scripts/reinforcement_learning/rlinf/play.py @@ -10,15 +10,15 @@ Usage: # Evaluate a trained checkpoint (config YAML in the same directory as play.py) - python play.py --config_name isaaclab_ppo_gr00t_assemble_trocar \\ + ./isaaclab.sh play --library rlinf --config_name isaaclab_ppo_gr00t_assemble_trocar \\ --model_path /path/to/checkpoint # Evaluate with config YAML in a custom directory - python play.py --config_path /path/to/config/dir \\ + ./isaaclab.sh play --library rlinf --config_path /path/to/config/dir \\ --config_name isaaclab_ppo_gr00t_assemble_trocar --model_path /path/to/checkpoint # Evaluate with video recording - python play.py --config_name isaaclab_ppo_gr00t_assemble_trocar \\ + ./isaaclab.sh play --library rlinf --config_name isaaclab_ppo_gr00t_assemble_trocar \\ --model_path /path/to/checkpoint --video Note: @@ -26,6 +26,17 @@ are too large to run on a single GPU without FSDP. """ +import warnings + +warnings.warn( + "scripts/reinforcement_learning/rlinf/play.py is deprecated. Use " + "`./isaaclab.sh play --library rlinf --config_name ` instead. 
" + "Example: `./isaaclab.sh play --library rlinf " + "--config_name isaaclab_ppo_gr00t_assemble_trocar`.", + DeprecationWarning, + stacklevel=1, +) + import argparse import logging import os diff --git a/scripts/reinforcement_learning/rlinf/play_rlinf.py b/scripts/reinforcement_learning/rlinf/play_rlinf.py new file mode 100644 index 000000000000..ec5656118c2f --- /dev/null +++ b/scripts/reinforcement_learning/rlinf/play_rlinf.py @@ -0,0 +1,179 @@ +# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. +# +# SPDX-License-Identifier: BSD-3-Clause + +"""Script to evaluate a trained RLinf agent. + +This script runs evaluation using RLinf's distributed infrastructure, +which is required for VLA model inference. + +Usage: + # Evaluate a trained checkpoint (config YAML in the same directory as play.py) + ./isaaclab.sh play --library rlinf --config_name isaaclab_ppo_gr00t_assemble_trocar \\ + --model_path /path/to/checkpoint + + # Evaluate with config YAML in a custom directory + ./isaaclab.sh play --library rlinf --config_path /path/to/config/dir \\ + --config_name isaaclab_ppo_gr00t_assemble_trocar --model_path /path/to/checkpoint + + # Evaluate with video recording + ./isaaclab.sh play --library rlinf --config_name isaaclab_ppo_gr00t_assemble_trocar \\ + --model_path /path/to/checkpoint --video + +Note: + Evaluation requires the full RLinf infrastructure since VLA models + are too large to run on a single GPU without FSDP. +""" + +import argparse +import logging +import os +from datetime import datetime +from pathlib import Path + +SCRIPT_DIR = Path(__file__).parent.absolute() +# required for RLinf to register IsaacLab tasks and converters +os.environ.setdefault("RLINF_EXT_MODULE", "isaaclab_contrib.rl.rlinf.extension") + +# local imports +import cli_args # noqa: E402 # isort: skip + +# add argparse arguments +parser = argparse.ArgumentParser(description="Evaluate a trained RLinf agent.") +parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") +parser.add_argument("--task", type=str, default=None, help="Name of the task (overrides YAML config if set).") +parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment (overrides config if set)") +parser.add_argument( + "--model_path", type=str, default=None, help="Path to the model checkpoint (optional, can be set in config)." +) +parser.add_argument( + "--num_episodes", type=int, default=None, help="Number of evaluation episodes (overrides config if set)." +) +parser.add_argument("--video", action="store_true", default=False, help="Enable video recording.") +cli_args.add_rlinf_args(parser) +args_cli = parser.parse_args() + +# Resolve config path and name from CLI args +if not args_cli.config_name: + parser.error("--config_name is required (e.g. 
--config_name isaaclab_ppo_gr00t_assemble_trocar)") +config_dir = args_cli.config_path or str(SCRIPT_DIR) +config_name = args_cli.config_name +os.environ["RLINF_CONFIG_FILE"] = str(Path(config_dir) / f"{config_name}.yaml") + +# Add config dir to PYTHONPATH so that Ray rollout workers can resolve +# data_config_class references like "gr00t_config:IsaacLabDataConfig" +if config_dir not in os.environ.get("PYTHONPATH", ""): + os.environ["PYTHONPATH"] = config_dir + os.pathsep + os.environ.get("PYTHONPATH", "") + + +"""launch RLinf evaluation.""" +import rlinf # noqa: F401 +import torch.multiprocessing as mp # noqa: E402 +from hydra import compose, initialize_config_dir # noqa: E402 +from hydra.core.global_hydra import GlobalHydra # noqa: E402 +from omegaconf import open_dict # noqa: E402 +from rlinf.config import validate_cfg # noqa: E402 +from rlinf.runners.embodied_eval_runner import EmbodiedEvalRunner # noqa: E402 +from rlinf.scheduler import Cluster # noqa: E402 +from rlinf.utils.placement import HybridComponentPlacement # noqa: E402 +from rlinf.workers.env.env_worker import EnvWorker # noqa: E402 +from rlinf.workers.rollout.hf.huggingface_worker import MultiStepRolloutWorker # noqa: E402 + +logger = logging.getLogger(__name__) + +mp.set_start_method("spawn", force=True) + + +def main(): + """Launch RLinf evaluation.""" + print(f"[INFO] Using config: {config_name}") + print(f"[INFO] Config path: {config_dir}") + + # Initialize Hydra and load config + GlobalHydra.instance().clear() + initialize_config_dir(config_dir=config_dir, version_base="1.1") + cfg = compose(config_name=config_name) + + # Get task_id from config (eval task) + task_id = cfg.env.eval.init_params.id + print(f"[INFO] Task: {task_id}") + + # Setup logging directory + timestamp = datetime.now().strftime("%Y%m%d-%H:%M:%S") + log_dir = SCRIPT_DIR / "logs" / "rlinf" / "eval" / f"{timestamp}-{task_id.replace('/', '_')}" + log_dir.mkdir(parents=True, exist_ok=True) + print(f"[INFO] Logging to: {log_dir}") + + # Apply runtime overrides + with open_dict(cfg): + # Set evaluation mode + cfg.runner.only_eval = True + # Set logging + cfg.runner.logger.log_path = str(log_dir) + + # Override checkpoint if provided via CLI + if args_cli.model_path: + cfg.rollout.model.model_path = args_cli.model_path + + # Enable video saving if requested + if args_cli.video: + cfg.env.eval.video_cfg.save_video = True + cfg.env.eval.video_cfg.video_base_dir = str(log_dir / "videos") + + # Override task if provided via CLI + if args_cli.task: + cfg.env.eval.init_params.id = args_cli.task + cfg.env.train.init_params.id = args_cli.task + + # Apply CLI args + if args_cli.num_envs is not None: + cfg.env.eval.total_num_envs = args_cli.num_envs + if args_cli.seed is not None: + cfg.actor.seed = args_cli.seed + if args_cli.num_episodes is not None: + cfg.algorithm.eval_rollout_epoch = args_cli.num_episodes + + # Validate config + cfg = validate_cfg(cfg) + + # Print config summary + print("\n" + "=" * 60) + print("RLinf Evaluation Configuration") + print("=" * 60) + print(f" Task: {cfg.env.eval.init_params.id}") + print(f" Num envs: {cfg.env.eval.total_num_envs}") + print(f" Model: {cfg.rollout.model.model_path}") + print(f" Videos: {cfg.env.eval.video_cfg.save_video}") + if cfg.env.eval.video_cfg.save_video: + print(f" Video dir: {cfg.env.eval.video_cfg.video_base_dir}") + print(f" Log dir: {log_dir}") + print("=" * 60 + "\n") + + # Create cluster and workers + cluster = Cluster(cluster_cfg=cfg.cluster) + component_placement = HybridComponentPlacement(cfg, 
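+        # the placement maps each worker group ("rollout", "env") onto cluster
+        # resources via the get_strategy() calls below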
cluster)
+
+    # Create rollout worker
+    rollout_placement = component_placement.get_strategy("rollout")
+    rollout_group = MultiStepRolloutWorker.create_group(cfg).launch(
+        cluster, name=cfg.rollout.group_name, placement_strategy=rollout_placement
+    )
+
+    # Create env worker
+    env_placement = component_placement.get_strategy("env")
+    env_group = EnvWorker.create_group(cfg).launch(cluster, name=cfg.env.group_name, placement_strategy=env_placement)
+
+    # Run evaluation
+    runner = EmbodiedEvalRunner(
+        cfg=cfg,
+        rollout=rollout_group,
+        env=env_group,
+    )
+
+    runner.init_workers()
+    runner.run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/reinforcement_learning/rlinf/train.py b/scripts/reinforcement_learning/rlinf/train.py
index e0e79ab2a89d..def2e4261c9d 100644
--- a/scripts/reinforcement_learning/rlinf/train.py
+++ b/scripts/reinforcement_learning/rlinf/train.py
@@ -12,14 +12,14 @@
 Usage:
     # Train an IsaacLab task (config YAML in the same directory as train.py)
-    python train.py --config_name isaaclab_ppo_gr00t_assemble_trocar
+    ./isaaclab.sh train --library rlinf --config_name isaaclab_ppo_gr00t_assemble_trocar

     # Train with config YAML in a custom directory
-    python train.py --config_path /path/to/config/dir \\
+    ./isaaclab.sh train --library rlinf --config_path /path/to/config/dir \\
         --config_name isaaclab_ppo_gr00t_assemble_trocar

     # Train with task override and custom settings
-    python train.py --config_name isaaclab_ppo_gr00t_assemble_trocar \\
+    ./isaaclab.sh train --library rlinf --config_name isaaclab_ppo_gr00t_assemble_trocar \\
         --task Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0 --num_envs 64 --max_epochs 1000

 Note:
@@ -27,6 +27,17 @@
     The model_path should point to a HuggingFace format checkpoint directory.
 """

+import warnings
+
+warnings.warn(
+    "scripts/reinforcement_learning/rlinf/train.py is deprecated. Use "
+    "`./isaaclab.sh train --library rlinf --config_name <config_name>` instead. "
+    "Example: `./isaaclab.sh train --library rlinf "
+    "--config_name isaaclab_ppo_gr00t_assemble_trocar`.",
+    DeprecationWarning,
+    stacklevel=1,
+)
+
 import argparse
 import logging
 import os
diff --git a/scripts/reinforcement_learning/rlinf/train_rlinf.py b/scripts/reinforcement_learning/rlinf/train_rlinf.py
new file mode 100644
index 000000000000..973765cb1bba
--- /dev/null
+++ b/scripts/reinforcement_learning/rlinf/train_rlinf.py
@@ -0,0 +1,177 @@
+# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""RLinf training logic for the unified reinforcement learning entrypoint.""" + +from __future__ import annotations + +import argparse +import logging +import os +from datetime import datetime +from pathlib import Path + +from common import import_local_module + +logger = logging.getLogger(__name__) + +RL_ROOT = Path(__file__).resolve().parents[1] +RLINF_DIR = RL_ROOT / "rlinf" +CLI_ARGS = import_local_module("isaaclab_rlinf_cli_args", RLINF_DIR / "cli_args.py") + + +def _parse_args(argv: list[str]) -> argparse.Namespace: + """Parse RLinf training arguments.""" + parser = argparse.ArgumentParser(description="Train an RL agent with RLinf.") + parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") + parser.add_argument("--task", type=str, default=None, help="Name of the task.") + parser.add_argument( + "--seed", + type=int, + default=None, + help="Seed used for the environment (overrides config if set)", + ) + parser.add_argument("--max_epochs", type=int, default=None, help="RL Policy training iterations.") + parser.add_argument("--list_tasks", action="store_true", default=False, help="List all available tasks and exit.") + parser.add_argument("--model_path", type=str, default=None, help="Path to pretrained model checkpoint (required).") + CLI_ARGS.add_rlinf_args(parser) + args_cli = parser.parse_args(argv) + if not args_cli.list_tasks and not args_cli.config_name: + parser.error("--config_name is required (e.g. --config_name isaaclab_ppo_gr00t_assemble_trocar)") + return args_cli + + +def _list_tasks() -> None: + """List available RLinf tasks.""" + print("\n" + "=" * 60) + print("Available RLinf Tasks") + print("=" * 60) + + print("\n[RLinf Registered Tasks]") + try: + from rlinf.envs.isaaclab import REGISTER_ISAACLAB_ENVS + + for task_id in sorted(REGISTER_ISAACLAB_ENVS.keys()): + print(f" - {task_id}") + except ImportError: + print(" (Could not import RLinf registry)") + + print("\n" + "=" * 60) + + +def run(argv: list[str]) -> None: + """Launch RLinf training.""" + os.environ.setdefault("RLINF_EXT_MODULE", "isaaclab_contrib.rl.rlinf.extension") + args_cli = _parse_args(argv) + + if args_cli.list_tasks: + _list_tasks() + return + + config_dir = args_cli.config_path or str(RLINF_DIR) + config_name = args_cli.config_name + os.environ["RLINF_CONFIG_FILE"] = str(Path(config_dir) / f"{config_name}.yaml") + + if config_dir not in os.environ.get("PYTHONPATH", ""): + os.environ["PYTHONPATH"] = config_dir + os.pathsep + os.environ.get("PYTHONPATH", "") + + import rlinf # noqa: F401 + import torch.multiprocessing as mp + from hydra import compose, initialize_config_dir + from hydra.core.global_hydra import GlobalHydra + from omegaconf import open_dict + from rlinf.config import validate_cfg + from rlinf.runners.embodied_runner import EmbodiedRunner + from rlinf.scheduler import Cluster + from rlinf.utils.placement import HybridComponentPlacement + from rlinf.workers.env.env_worker import EnvWorker + from rlinf.workers.rollout.hf.huggingface_worker import MultiStepRolloutWorker + + mp.set_start_method("spawn", force=True) + + print(f"[INFO] Using config: {config_name}") + print(f"[INFO] Config path: {config_dir}") + + GlobalHydra.instance().clear() + initialize_config_dir(config_dir=config_dir, version_base="1.1") + cfg = compose(config_name=config_name) + + task_id = cfg.env.train.init_params.id + print(f"[INFO] Task: {task_id}") + + timestamp = datetime.now().strftime("%Y%m%d-%H:%M:%S") + log_dir = 
RLINF_DIR / "logs" / "rlinf" / f"{timestamp}-{task_id.replace('/', '_')}"
+    log_dir.mkdir(parents=True, exist_ok=True)
+    print(f"[INFO] Logging to: {log_dir}")
+
+    with open_dict(cfg):
+        cfg.runner.logger.log_path = str(log_dir)
+
+        if args_cli.task:
+            cfg.env.train.init_params.id = args_cli.task
+            cfg.env.eval.init_params.id = args_cli.task
+
+        if args_cli.num_envs is not None:
+            cfg.env.train.total_num_envs = args_cli.num_envs
+            cfg.env.eval.total_num_envs = args_cli.num_envs
+        if args_cli.seed is not None:
+            cfg.actor.seed = args_cli.seed
+        if args_cli.max_epochs is not None:
+            cfg.runner.max_epochs = args_cli.max_epochs
+        if args_cli.model_path is not None:
+            cfg.actor.model.model_path = args_cli.model_path
+            cfg.rollout.model.model_path = args_cli.model_path
+        if args_cli.only_eval:
+            cfg.runner.only_eval = True
+        if args_cli.resume_dir:
+            cfg.runner.resume_dir = args_cli.resume_dir
+
+    cfg = validate_cfg(cfg)
+
+    print("\n" + "=" * 60)
+    print("RLinf Training Configuration")
+    print("=" * 60)
+    print(f"  Task: {cfg.env.train.init_params.id}")
+    print(f"  Num envs: {cfg.env.train.total_num_envs}")
+    print(f"  Max epochs: {cfg.runner.max_epochs}")
+    print(f"  Model: {cfg.actor.model.model_path}")
+    print(f"  Algorithm: {cfg.algorithm.loss_type}")
+    print(f"  Log dir: {log_dir}")
+    print("=" * 60 + "\n")
+
+    cluster = Cluster(cluster_cfg=cfg.cluster)
+    component_placement = HybridComponentPlacement(cfg, cluster)
+
+    actor_placement = component_placement.get_strategy("actor")
+    if cfg.algorithm.loss_type == "embodied_sac":
+        from rlinf.workers.actor.fsdp_sac_policy_worker import EmbodiedSACFSDPPolicy
+
+        actor_worker_cls = EmbodiedSACFSDPPolicy
+    else:
+        from rlinf.workers.actor.fsdp_actor_worker import EmbodiedFSDPActor
+
+        actor_worker_cls = EmbodiedFSDPActor
+
+    actor_group = actor_worker_cls.create_group(cfg).launch(
+        cluster, name=cfg.actor.group_name, placement_strategy=actor_placement
+    )
+
+    rollout_placement = component_placement.get_strategy("rollout")
+    rollout_group = MultiStepRolloutWorker.create_group(cfg).launch(
+        cluster, name=cfg.rollout.group_name, placement_strategy=rollout_placement
+    )
+
+    env_placement = component_placement.get_strategy("env")
+    env_group = EnvWorker.create_group(cfg).launch(cluster, name=cfg.env.group_name, placement_strategy=env_placement)
+
+    runner = EmbodiedRunner(
+        cfg=cfg,
+        actor=actor_group,
+        rollout=rollout_group,
+        env=env_group,
+    )
+
+    runner.init_workers()
+    runner.run()
diff --git a/scripts/reinforcement_learning/rsl_rl/play.py b/scripts/reinforcement_learning/rsl_rl/play.py
index 224ff1e5493c..464a0d282319 100644
--- a/scripts/reinforcement_learning/rsl_rl/play.py
+++ b/scripts/reinforcement_learning/rsl_rl/play.py
@@ -5,6 +5,16 @@

 """Script to play a checkpoint of an RL agent from RSL-RL."""

+import warnings
+
+warnings.warn(
+    "scripts/reinforcement_learning/rsl_rl/play.py is deprecated. Use "
+    "`./isaaclab.sh play --library rsl_rl --task <task_name>` instead. "
+    "Example: `./isaaclab.sh play --library rsl_rl --task Isaac-Cartpole-v0`.",
+    DeprecationWarning,
+    stacklevel=1,
+)
+
 import argparse
 import contextlib
 import importlib.metadata as metadata
diff --git a/scripts/reinforcement_learning/rsl_rl/play_rsl_rl.py b/scripts/reinforcement_learning/rsl_rl/play_rsl_rl.py
new file mode 100644
index 000000000000..42ba11beda82
--- /dev/null
+++ b/scripts/reinforcement_learning/rsl_rl/play_rsl_rl.py
@@ -0,0 +1,229 @@
+# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""Script to play a checkpoint of an RL agent from RSL-RL."""
+
+import argparse
+import contextlib
+import importlib.metadata as metadata
+import os
+import sys
+import time
+
+import gymnasium as gym
+import torch
+from packaging import version
+from rsl_rl.runners import DistillationRunner, OnPolicyRunner
+
+from isaaclab.envs import DirectMARLEnvCfg, DirectRLEnvCfg, ManagerBasedRLEnvCfg
+from isaaclab.utils.assets import retrieve_file_path
+from isaaclab.utils.dict import print_dict
+from isaaclab.utils.string import list_intersection, string_to_callable
+
+from isaaclab_rl.rsl_rl import (
+    RslRlBaseRunnerCfg,
+    RslRlVecEnvWrapper,
+    export_policy_as_jit,
+    export_policy_as_onnx,
+    handle_deprecated_rsl_rl_cfg,
+)
+from isaaclab_rl.utils.pretrained_checkpoint import get_published_pretrained_checkpoint
+
+import isaaclab_tasks  # noqa: F401
+from isaaclab_tasks.utils import add_launcher_args, get_checkpoint_path, launch_simulation
+from isaaclab_tasks.utils.hydra import hydra_task_config
+
+# local imports
+import cli_args  # isort: skip
+
+# PLACEHOLDER: Extension template (do not remove this comment)
+with contextlib.suppress(ImportError):
+    import isaaclab_tasks_experimental  # noqa: F401
+
+# -- argparse ----------------------------------------------------------------
+parser = argparse.ArgumentParser(description="Play a checkpoint of an RL agent from RSL-RL.")
+parser.add_argument("--video", action="store_true", default=False, help="Record videos during play.")
+parser.add_argument("--video_length", type=int, default=200, help="Length of the recorded video (in steps).")
+parser.add_argument(
+    "--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations."
+)
+parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
+parser.add_argument("--task", type=str, default=None, help="Name of the task.")
+parser.add_argument(
+    "--agent", type=str, default="rsl_rl_cfg_entry_point", help="Name of the RL agent configuration entry point."
+)
+parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment")
+parser.add_argument(
+    "--use_pretrained_checkpoint",
+    action="store_true",
+    help="Use the pre-trained checkpoint from Nucleus.",
+)
+parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.")
+parser.add_argument("--external_callback", default=None, help="Fully qualified path to an externally defined callback.")
+cli_args.add_rsl_rl_args(parser)
+add_launcher_args(parser)
+args_cli, remaining_args = parser.parse_known_args()
+
+if args_cli.video:
+    args_cli.enable_cameras = True
+
+
+# Call an external callback if requested. This gives external code an opportunity to register the environments.
+# The function is expected to return a list of arguments that were not consumed by the callback.
+remaining_args_env_registration = None
+if args_cli.external_callback:
+    external_callback_function = string_to_callable(args_cli.external_callback, separator=".")
+    remaining_args_env_registration = external_callback_function()
+
+# clear out sys.argv for Hydra
+# The remaining arguments are the arguments that were not consumed by both this script's
+# argparser and (optionally) the external callback function.
+remaining_args = list_intersection(remaining_args, remaining_args_env_registration) +sys.argv = [sys.argv[0]] + remaining_args + +# Check for installed RSL-RL version +installed_version = metadata.version("rsl-rl-lib") + + +@hydra_task_config(args_cli.task, args_cli.agent) +def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agent_cfg: RslRlBaseRunnerCfg): + """Play with RSL-RL agent.""" + with launch_simulation(env_cfg, args_cli): + # grab task name for checkpoint path + task_name = args_cli.task.split(":")[-1] + train_task_name = task_name.replace("-Play", "") + + # override configurations with non-hydra CLI arguments + agent_cfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli) + env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs + + # handle deprecated configurations + agent_cfg = handle_deprecated_rsl_rl_cfg(agent_cfg, installed_version) + + # set the environment seed + # note: certain randomizations occur in the environment initialization so we set the seed here + env_cfg.seed = agent_cfg.seed + env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device + + # specify directory for logging experiments + log_root_path = os.path.join("logs", "rsl_rl", agent_cfg.experiment_name) + log_root_path = os.path.abspath(log_root_path) + print(f"[INFO] Loading experiment from directory: {log_root_path}") + if args_cli.use_pretrained_checkpoint: + resume_path = get_published_pretrained_checkpoint("rsl_rl", train_task_name) + if not resume_path: + print("[INFO] Unfortunately a pre-trained checkpoint is currently unavailable for this task.") + return + elif args_cli.checkpoint: + resume_path = retrieve_file_path(args_cli.checkpoint) + else: + resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint) + + log_dir = os.path.dirname(resume_path) + + # set the log directory for the environment + env_cfg.log_dir = log_dir + + # create isaac environment + env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None) + + # convert to single-agent instance if required by the RL algorithm + if isinstance(env.unwrapped.cfg, DirectMARLEnvCfg): + from isaaclab.envs import multi_agent_to_single_agent + + env = multi_agent_to_single_agent(env) + + # wrap for video recording + if args_cli.video: + video_kwargs = { + "video_folder": os.path.join(log_dir, "videos", "play"), + "step_trigger": lambda step: step == 0, + "video_length": args_cli.video_length, + "disable_logger": True, + } + print("[INFO] Recording videos during play.") + print_dict(video_kwargs, nesting=4) + env = gym.wrappers.RecordVideo(env, **video_kwargs) + + # wrap around environment for rsl-rl + env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions) + + print(f"[INFO]: Loading model checkpoint from: {resume_path}") + # load previously trained model + if agent_cfg.class_name == "OnPolicyRunner": + runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=None, device=agent_cfg.device) + elif agent_cfg.class_name == "DistillationRunner": + runner = DistillationRunner(env, agent_cfg.to_dict(), log_dir=None, device=agent_cfg.device) + else: + raise ValueError(f"Unsupported runner class: {agent_cfg.class_name}") + runner.load(resume_path) + + # obtain the trained policy for inference + policy = runner.get_inference_policy(device=env.unwrapped.device) + + # export the trained policy to JIT and ONNX formats + export_model_dir = 
os.path.join(os.path.dirname(resume_path), "exported")
+
+        if version.parse(installed_version) >= version.parse("4.0.0"):
+            # use the new export functions for rsl-rl >= 4.0.0
+            runner.export_policy_to_jit(path=export_model_dir, filename="policy.pt")
+            runner.export_policy_to_onnx(path=export_model_dir, filename="policy.onnx")
+            policy_nn = None  # Not needed for rsl-rl >= 4.0.0
+        else:
+            # extract the neural network for rsl-rl < 4.0.0
+            if version.parse(installed_version) >= version.parse("2.3.0"):
+                policy_nn = runner.alg.policy
+            else:
+                policy_nn = runner.alg.actor_critic
+
+            # extract the normalizer
+            if hasattr(policy_nn, "actor_obs_normalizer"):
+                normalizer = policy_nn.actor_obs_normalizer
+            elif hasattr(policy_nn, "student_obs_normalizer"):
+                normalizer = policy_nn.student_obs_normalizer
+            else:
+                normalizer = None
+
+            # export to JIT and ONNX
+            export_policy_as_jit(policy_nn, normalizer=normalizer, path=export_model_dir, filename="policy.pt")
+            export_policy_as_onnx(policy_nn, normalizer=normalizer, path=export_model_dir, filename="policy.onnx")
+
+        dt = env.unwrapped.step_dt
+
+        # reset environment
+        obs = env.get_observations()
+        timestep = 0
+        # simulate environment
+        try:
+            while True:
+                start_time = time.time()
+                # run everything in inference mode
+                with torch.inference_mode():
+                    # agent stepping
+                    actions = policy(obs)
+                    # env stepping
+                    obs, _, dones, _ = env.step(actions)
+                    # reset recurrent states for episodes that have terminated
+                    if version.parse(installed_version) >= version.parse("4.0.0"):
+                        policy.reset(dones)
+                    else:
+                        policy_nn.reset(dones)
+                if args_cli.video:
+                    timestep += 1
+                    if timestep == args_cli.video_length:
+                        break
+
+                sleep_time = dt - (time.time() - start_time)
+                if args_cli.real_time and sleep_time > 0:
+                    time.sleep(sleep_time)
+
+            # close the simulator
+            env.close()
+        except KeyboardInterrupt:
+            pass
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/reinforcement_learning/rsl_rl/train.py b/scripts/reinforcement_learning/rsl_rl/train.py
index eefc13a8aa2c..2324e7377769 100644
--- a/scripts/reinforcement_learning/rsl_rl/train.py
+++ b/scripts/reinforcement_learning/rsl_rl/train.py
@@ -5,6 +5,16 @@

 """Script to train an RL agent with RSL-RL."""

+import warnings
+
+warnings.warn(
+    "scripts/reinforcement_learning/rsl_rl/train.py is deprecated. Use "
+    "`./isaaclab.sh train --library rsl_rl --task <task_name>` instead. "
+    "Example: `./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0`.",
+    DeprecationWarning,
+    stacklevel=1,
+)
+
 import argparse
 import contextlib
 import importlib.metadata as metadata
diff --git a/scripts/reinforcement_learning/rsl_rl/train_rsl_rl.py b/scripts/reinforcement_learning/rsl_rl/train_rsl_rl.py
new file mode 100644
index 000000000000..4dbf147160da
--- /dev/null
+++ b/scripts/reinforcement_learning/rsl_rl/train_rsl_rl.py
@@ -0,0 +1,179 @@
+# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""RSL-RL training logic for the unified reinforcement learning entrypoint.""" + +from __future__ import annotations + +import argparse +import contextlib +import importlib.metadata as metadata +import logging +import os +import platform +import time +from datetime import datetime +from pathlib import Path + +from common import ( + add_common_train_args, + add_isaaclab_launcher_args, + apply_env_overrides, + configure_io_descriptors, + create_isaaclab_env, + dump_train_configs, + enable_cameras_for_video, + import_local_module, + set_hydra_args, + validate_distributed_device, + wrap_record_video, +) +from packaging import version + +import isaaclab_tasks # noqa: F401 + +logger = logging.getLogger(__name__) + +RSL_RL_VERSION = "3.0.1" +RL_ROOT = Path(__file__).resolve().parents[1] +CLI_ARGS = import_local_module("isaaclab_rsl_rl_cli_args", RL_ROOT / "rsl_rl" / "cli_args.py") + +# PLACEHOLDER: Extension template (do not remove this comment) +with contextlib.suppress(ImportError): + import isaaclab_tasks_experimental # noqa: F401 + + +def _check_rsl_rl_version() -> str: + """Check that the installed RSL-RL version is supported.""" + installed_version = metadata.version("rsl-rl-lib") + if version.parse(installed_version) < version.parse(RSL_RL_VERSION): + if platform.system() == "Windows": + cmd = [r".\isaaclab.bat", "-p", "-m", "pip", "install", f"rsl-rl-lib=={RSL_RL_VERSION}"] + else: + cmd = ["./isaaclab.sh", "-p", "-m", "pip", "install", f"rsl-rl-lib=={RSL_RL_VERSION}"] + print( + f"Please install the correct version of RSL-RL.\nExisting version is: '{installed_version}'" + f" and required version is: '{RSL_RL_VERSION}'.\nTo install the correct version, run:" + f"\n\n\t{' '.join(cmd)}\n" + ) + raise SystemExit(1) + return installed_version + + +def _parse_args(argv: list[str]) -> argparse.Namespace: + """Parse RSL-RL training arguments.""" + from isaaclab.utils.string import list_intersection, string_to_callable + + parser = argparse.ArgumentParser(description="Train an RL agent with RSL-RL.") + add_common_train_args( + parser, + agent_default="rsl_rl_cfg_entry_point", + agent_help="Name of the RL agent configuration entry point.", + ) + parser.add_argument( + "--external_callback", + default=None, + help="Fully qualified path to an externally defined callback.", + ) + CLI_ARGS.add_rsl_rl_args(parser) + add_isaaclab_launcher_args(parser) + args_cli, remaining_args = parser.parse_known_args(argv) + enable_cameras_for_video(args_cli) + + remaining_args_env_registration = None + if args_cli.external_callback: + external_callback_function = string_to_callable(args_cli.external_callback, separator=".") + remaining_args_env_registration = external_callback_function() + + set_hydra_args(list_intersection(remaining_args, remaining_args_env_registration)) + return args_cli + + +def run(argv: list[str]) -> None: + """Train an RSL-RL agent.""" + import torch + from rsl_rl.runners import DistillationRunner, OnPolicyRunner + + from isaaclab.envs import DirectMARLEnvCfg + + from isaaclab_rl.rsl_rl import RslRlVecEnvWrapper, handle_deprecated_rsl_rl_cfg + + from isaaclab_tasks.utils import get_checkpoint_path, launch_simulation, resolve_task_config + + torch.backends.cuda.matmul.allow_tf32 = True + torch.backends.cudnn.allow_tf32 = True + torch.backends.cudnn.deterministic = False + torch.backends.cudnn.benchmark = False + + args_cli = _parse_args(argv) + installed_version = _check_rsl_rl_version() + env_cfg, agent_cfg = resolve_task_config(args_cli.task, 
args_cli.agent)
+
+    with launch_simulation(env_cfg, args_cli):
+        agent_cfg = CLI_ARGS.update_rsl_rl_cfg(agent_cfg, args_cli)
+        apply_env_overrides(args_cli, env_cfg)
+        agent_cfg.max_iterations = (
+            args_cli.max_iterations if args_cli.max_iterations is not None else agent_cfg.max_iterations
+        )
+
+        agent_cfg = handle_deprecated_rsl_rl_cfg(agent_cfg, installed_version)
+
+        env_cfg.seed = agent_cfg.seed
+        validate_distributed_device(args_cli)
+
+        if args_cli.distributed:
+            global_rank = int(os.getenv("RANK", "0"))
+            agent_cfg.device = env_cfg.sim.device
+
+            seed = agent_cfg.seed + global_rank
+            env_cfg.seed = seed
+            agent_cfg.seed = seed
+
+        log_root_path = os.path.abspath(os.path.join("logs", "rsl_rl", agent_cfg.experiment_name))
+        print(f"[INFO] Logging experiment in directory: {log_root_path}")
+        log_dir = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
+        print(f"Exact experiment name requested from command line: {log_dir}")
+        if agent_cfg.run_name:
+            log_dir += f"_{agent_cfg.run_name}"
+        log_dir = os.path.join(log_root_path, log_dir)
+
+        configure_io_descriptors(env_cfg, args_cli, logger)
+        env_cfg.log_dir = log_dir
+
+        env = create_isaaclab_env(
+            args_cli.task,
+            env_cfg,
+            args_cli,
+            convert_marl_to_single_agent=isinstance(env_cfg, DirectMARLEnvCfg),
+        )
+
+        if agent_cfg.resume or agent_cfg.algorithm.class_name == "Distillation":
+            resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint)
+
+        env = wrap_record_video(env, log_dir, args_cli)
+
+        start_time = time.time()
+        env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions)
+
+        if agent_cfg.class_name == "OnPolicyRunner":
+            runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=log_dir, device=agent_cfg.device)
+        elif agent_cfg.class_name == "DistillationRunner":
+            runner = DistillationRunner(env, agent_cfg.to_dict(), log_dir=log_dir, device=agent_cfg.device)
+        else:
+            raise ValueError(f"Unsupported runner class: {agent_cfg.class_name}")
+
+        runner.add_git_repo_to_log(__file__)
+        if agent_cfg.resume or agent_cfg.algorithm.class_name == "Distillation":
+            print(f"[INFO]: Loading model checkpoint from: {resume_path}")
+            runner.load(resume_path)
+
+        dump_train_configs(log_dir, env_cfg, agent_cfg)
+
+        try:
+            runner.learn(num_learning_iterations=agent_cfg.max_iterations, init_at_random_ep_len=True)
+            print(f"Training time: {round(time.time() - start_time, 2)} seconds")
+            env.close()
+        except KeyboardInterrupt:
+            pass
diff --git a/scripts/reinforcement_learning/sb3/play.py b/scripts/reinforcement_learning/sb3/play.py
index a1f8757a1e8c..69ca75d3ccac 100644
--- a/scripts/reinforcement_learning/sb3/play.py
+++ b/scripts/reinforcement_learning/sb3/play.py
@@ -5,6 +5,16 @@

 """Script to play a checkpoint of an RL agent from Stable-Baselines3."""

+import warnings
+
+warnings.warn(
+    "scripts/reinforcement_learning/sb3/play.py is deprecated. Use "
+    "`./isaaclab.sh play --library sb3 --task <task_name>` instead. "
+    "Example: `./isaaclab.sh play --library sb3 --task Isaac-Cartpole-v0`.",
+    DeprecationWarning,
+    stacklevel=1,
+)
+
 import argparse
 import contextlib
 import os
diff --git a/scripts/reinforcement_learning/sb3/play_sb3.py b/scripts/reinforcement_learning/sb3/play_sb3.py
new file mode 100644
index 000000000000..e345c3c62880
--- /dev/null
+++ b/scripts/reinforcement_learning/sb3/play_sb3.py
@@ -0,0 +1,188 @@
+# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""Script to play a checkpoint of an RL agent from Stable-Baselines3.""" + +import argparse +import contextlib +import os +import random +import sys +import time +from pathlib import Path + +import gymnasium as gym +import torch +from stable_baselines3 import PPO +from stable_baselines3.common.vec_env import VecNormalize + +from isaaclab.envs import DirectMARLEnvCfg +from isaaclab.utils.dict import print_dict + +from isaaclab_rl.sb3 import Sb3VecEnvWrapper, process_sb3_cfg +from isaaclab_rl.utils.pretrained_checkpoint import get_published_pretrained_checkpoint + +import isaaclab_tasks # noqa: F401 +from isaaclab_tasks.utils import add_launcher_args, get_checkpoint_path, launch_simulation, resolve_task_config + +# PLACEHOLDER: Extension template (do not remove this comment) +with contextlib.suppress(ImportError): + import isaaclab_tasks_experimental # noqa: F401 + +# -- argparse ---------------------------------------------------------------- +parser = argparse.ArgumentParser(description="Play a checkpoint of an RL agent from Stable-Baselines3.") +parser.add_argument("--video", action="store_true", default=False, help="Record videos during play.") +parser.add_argument("--video_length", type=int, default=200, help="Length of the recorded video (in steps).") +parser.add_argument( + "--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations." +) +parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") +parser.add_argument("--task", type=str, default=None, help="Name of the task.") +parser.add_argument( + "--agent", type=str, default="sb3_cfg_entry_point", help="Name of the RL agent configuration entry point." +) +parser.add_argument("--checkpoint", type=str, default=None, help="Path to model checkpoint.") +parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment") +parser.add_argument( + "--use_pretrained_checkpoint", + action="store_true", + help="Use the pre-trained checkpoint from Nucleus.", +) +parser.add_argument( + "--use_last_checkpoint", + action="store_true", + help="When no checkpoint provided, use the last saved model. 
Otherwise use the best saved model.", +) +parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.") +parser.add_argument( + "--keep_all_info", + action="store_true", + default=False, + help="Use a slower SB3 wrapper but keep all the extra training info.", +) +add_launcher_args(parser) +args_cli, hydra_args = parser.parse_known_args() + +if args_cli.video: + args_cli.enable_cameras = True + +sys.argv = [sys.argv[0]] + hydra_args + + +def main(): + """Play with stable-baselines agent.""" + env_cfg, agent_cfg = resolve_task_config(args_cli.task, args_cli.agent) + with launch_simulation(env_cfg, args_cli): + # grab task name for checkpoint path + task_name = args_cli.task.split(":")[-1] + train_task_name = task_name.replace("-Play", "") + # randomly sample a seed if seed = -1 + if args_cli.seed == -1: + args_cli.seed = random.randint(0, 10000) + + # override configurations with non-hydra CLI arguments + env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs + agent_cfg["seed"] = args_cli.seed if args_cli.seed is not None else agent_cfg["seed"] + env_cfg.seed = agent_cfg["seed"] + env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device + + # directory for logging into + log_root_path = os.path.join("logs", "sb3", train_task_name) + log_root_path = os.path.abspath(log_root_path) + # checkpoint and log_dir stuff + if args_cli.use_pretrained_checkpoint: + checkpoint_path = get_published_pretrained_checkpoint("sb3", train_task_name) + if not checkpoint_path: + print("[INFO] Unfortunately a pre-trained checkpoint is currently unavailable for this task.") + return + elif args_cli.checkpoint is None: + if args_cli.use_last_checkpoint: + checkpoint = "model_.*.zip" + else: + checkpoint = "model.zip" + checkpoint_path = get_checkpoint_path(log_root_path, ".*", checkpoint, sort_alpha=False) + else: + checkpoint_path = args_cli.checkpoint + log_dir = os.path.dirname(checkpoint_path) + + # set the log directory for the environment + env_cfg.log_dir = log_dir + + # create isaac environment + env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None) + + # post-process agent configuration + agent_cfg = process_sb3_cfg(agent_cfg, env.unwrapped.num_envs) + + # convert to single-agent instance if required by the RL algorithm + if isinstance(env.unwrapped.cfg, DirectMARLEnvCfg): + from isaaclab.envs import multi_agent_to_single_agent + + env = multi_agent_to_single_agent(env) + + # wrap for video recording + if args_cli.video: + video_kwargs = { + "video_folder": os.path.join(log_dir, "videos", "play"), + "step_trigger": lambda step: step == 0, + "video_length": args_cli.video_length, + "disable_logger": True, + } + print("[INFO] Recording videos during play.") + print_dict(video_kwargs, nesting=4) + env = gym.wrappers.RecordVideo(env, **video_kwargs) + # wrap around environment for stable baselines + env = Sb3VecEnvWrapper(env, fast_variant=not args_cli.keep_all_info) + + vec_norm_path = checkpoint_path.replace("/model", "/model_vecnormalize").replace(".zip", ".pkl") + vec_norm_path = Path(vec_norm_path) + + # normalize environment (if needed) + if vec_norm_path.exists(): + print(f"Loading saved normalization: {vec_norm_path}") + env = VecNormalize.load(vec_norm_path, env) + env.training = False + env.norm_reward = False + elif "normalize_input" in agent_cfg: + env = VecNormalize( + env, + training=True, + norm_obs="normalize_input" in 
agent_cfg and agent_cfg.pop("normalize_input"),
+                clip_obs="clip_obs" in agent_cfg and agent_cfg.pop("clip_obs"),
+            )
+
+        # create agent from stable baselines
+        print(f"Loading checkpoint from: {checkpoint_path}")
+        agent = PPO.load(checkpoint_path, env, print_system_info=True)
+
+        dt = env.unwrapped.step_dt
+
+        # reset environment
+        obs = env.reset()
+        timestep = 0
+        # simulate environment
+        try:
+            while True:
+                start_time = time.time()
+                with torch.inference_mode():
+                    actions, _ = agent.predict(obs, deterministic=True)
+                    obs, _, _, _ = env.step(actions)
+                if args_cli.video:
+                    timestep += 1
+                    if timestep == args_cli.video_length:
+                        break
+
+                sleep_time = dt - (time.time() - start_time)
+                if args_cli.real_time and sleep_time > 0:
+                    time.sleep(sleep_time)
+
+            # close the simulator
+            env.close()
+        except KeyboardInterrupt:
+            pass
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/reinforcement_learning/sb3/train.py b/scripts/reinforcement_learning/sb3/train.py
index 7bf757ef5483..449e04e2dc72 100644
--- a/scripts/reinforcement_learning/sb3/train.py
+++ b/scripts/reinforcement_learning/sb3/train.py
@@ -6,6 +6,16 @@

 """Script to train an RL agent with Stable Baselines3."""

+import warnings
+
+warnings.warn(
+    "scripts/reinforcement_learning/sb3/train.py is deprecated. Use "
+    "`./isaaclab.sh train --library sb3 --task <task_name>` instead. "
+    "Example: `./isaaclab.sh train --library sb3 --task Isaac-Cartpole-v0`.",
+    DeprecationWarning,
+    stacklevel=1,
+)
+
 import argparse
 import contextlib
 import logging
diff --git a/scripts/reinforcement_learning/sb3/train_sb3.py b/scripts/reinforcement_learning/sb3/train_sb3.py
new file mode 100644
index 000000000000..1f7a45c118b2
--- /dev/null
+++ b/scripts/reinforcement_learning/sb3/train_sb3.py
@@ -0,0 +1,176 @@
+# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""Stable-Baselines3 training logic for the unified reinforcement learning entrypoint.""" + +from __future__ import annotations + +import argparse +import contextlib +import logging +import os +import random +import signal +import sys +import time +from datetime import datetime +from pathlib import Path + +from common import ( + add_common_train_args, + add_isaaclab_launcher_args, + apply_env_overrides, + configure_io_descriptors, + create_isaaclab_env, + dump_train_configs, + enable_cameras_for_video, + set_hydra_args, + wrap_record_video, +) + +import isaaclab_tasks # noqa: F401 + +logger = logging.getLogger(__name__) + +# PLACEHOLDER: Extension template (do not remove this comment) +with contextlib.suppress(ImportError): + import isaaclab_tasks_experimental # noqa: F401 + + +def _cleanup_pbar(*args): + """Stop training and clean up rich progress bars on Ctrl+C.""" + import gc + + tqdm_objects = [obj for obj in gc.get_objects() if "tqdm" in type(obj).__name__] + for tqdm_object in tqdm_objects: + if "tqdm_rich" in type(tqdm_object).__name__: + tqdm_object.close() + raise KeyboardInterrupt + + +def _parse_args(argv: list[str]) -> argparse.Namespace: + """Parse Stable-Baselines3 training arguments.""" + parser = argparse.ArgumentParser(description="Train an RL agent with Stable-Baselines3.") + add_common_train_args( + parser, + agent_default="sb3_cfg_entry_point", + agent_help="Name of the RL agent configuration entry point.", + include_distributed=False, + ) + parser.add_argument("--log_interval", type=int, default=100_000, help="Log data every n timesteps.") + parser.add_argument("--checkpoint", type=str, default=None, help="Continue the training from checkpoint.") + parser.add_argument( + "--keep_all_info", + action="store_true", + default=False, + help="Use a slower SB3 wrapper but keep all the extra training info.", + ) + add_isaaclab_launcher_args(parser) + args_cli, hydra_args = parser.parse_known_args(argv) + enable_cameras_for_video(args_cli) + set_hydra_args(hydra_args) + return args_cli + + +def run(argv: list[str]) -> None: + """Train a Stable-Baselines3 agent.""" + import numpy as np + from stable_baselines3 import PPO + from stable_baselines3.common.callbacks import CheckpointCallback, LogEveryNTimesteps + from stable_baselines3.common.vec_env import VecNormalize + + from isaaclab.envs import DirectMARLEnvCfg + + from isaaclab_rl.sb3 import Sb3VecEnvWrapper, process_sb3_cfg + + from isaaclab_tasks.utils import launch_simulation, resolve_task_config + + signal.signal(signal.SIGINT, _cleanup_pbar) + + args_cli = _parse_args(argv) + env_cfg, agent_cfg = resolve_task_config(args_cli.task, args_cli.agent) + + with launch_simulation(env_cfg, args_cli): + if args_cli.seed == -1: + args_cli.seed = random.randint(0, 10000) + + apply_env_overrides(args_cli, env_cfg) + agent_cfg["seed"] = args_cli.seed if args_cli.seed is not None else agent_cfg["seed"] + if args_cli.max_iterations is not None: + agent_cfg["n_timesteps"] = args_cli.max_iterations * agent_cfg["n_steps"] * env_cfg.scene.num_envs + + env_cfg.seed = agent_cfg["seed"] + + run_info = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + log_root_path = os.path.abspath(os.path.join("logs", "sb3", args_cli.task)) + print(f"[INFO] Logging experiment in directory: {log_root_path}") + print(f"Exact experiment name requested from command line: {run_info}") + log_dir = os.path.join(log_root_path, run_info) + dump_train_configs(log_dir, env_cfg, agent_cfg) + + command = " ".join(sys.orig_argv) 
+        (Path(log_dir) / "command.txt").write_text(command)
+
+        agent_cfg = process_sb3_cfg(agent_cfg, env_cfg.scene.num_envs)
+        policy_arch = agent_cfg.pop("policy")
+        n_timesteps = agent_cfg.pop("n_timesteps")
+
+        configure_io_descriptors(env_cfg, args_cli, logger)
+        env_cfg.log_dir = log_dir
+
+        env = create_isaaclab_env(
+            args_cli.task,
+            env_cfg,
+            args_cli,
+            convert_marl_to_single_agent=isinstance(env_cfg, DirectMARLEnvCfg),
+        )
+        env = wrap_record_video(env, log_dir, args_cli)
+
+        start_time = time.time()
+        env = Sb3VecEnvWrapper(env, fast_variant=not args_cli.keep_all_info)
+
+        norm_keys = {"normalize_input", "normalize_value", "clip_obs"}
+        norm_args = {}
+        for key in norm_keys:
+            if key in agent_cfg:
+                norm_args[key] = agent_cfg.pop(key)
+
+        if norm_args and norm_args.get("normalize_input"):
+            print(f"Normalizing input, {norm_args=}")
+            env = VecNormalize(
+                env,
+                training=True,
+                norm_obs=norm_args["normalize_input"],
+                norm_reward=norm_args.get("normalize_value", False),
+                clip_obs=norm_args.get("clip_obs", 100.0),
+                gamma=agent_cfg["gamma"],
+                clip_reward=np.inf,
+            )
+
+        agent = PPO(policy_arch, env, verbose=1, tensorboard_log=log_dir, **agent_cfg)
+        if args_cli.checkpoint is not None:
+            agent = agent.load(args_cli.checkpoint, env, print_system_info=True)
+
+        checkpoint_callback = CheckpointCallback(save_freq=1000, save_path=log_dir, name_prefix="model", verbose=2)
+        callbacks = [checkpoint_callback, LogEveryNTimesteps(n_steps=args_cli.log_interval)]
+
+        with contextlib.suppress(KeyboardInterrupt):
+            agent.learn(
+                total_timesteps=n_timesteps,
+                callback=callbacks,
+                progress_bar=True,
+                log_interval=None,
+            )
+
+        agent.save(os.path.join(log_dir, "model"))
+        print("Saving to:")
+        print(os.path.join(log_dir, "model.zip"))
+
+        if isinstance(env, VecNormalize):
+            print("Saving normalization")
+            env.save(os.path.join(log_dir, "model_vecnormalize.pkl"))
+
+        print(f"Training time: {round(time.time() - start_time, 2)} seconds")
+        env.close()
diff --git a/scripts/reinforcement_learning/skrl/play.py b/scripts/reinforcement_learning/skrl/play.py
index 8663d0561941..7e3707ae471d 100644
--- a/scripts/reinforcement_learning/skrl/play.py
+++ b/scripts/reinforcement_learning/skrl/play.py
@@ -10,6 +10,16 @@
 a more user-friendly way.
 """

+import warnings
+
+warnings.warn(
+    "scripts/reinforcement_learning/skrl/play.py is deprecated. Use "
+    "`./isaaclab.sh play --library skrl --task <task_name>` instead. "
+    "Example: `./isaaclab.sh play --library skrl --task Isaac-Cartpole-v0`.",
+    DeprecationWarning,
+    stacklevel=1,
+)
+
 import argparse
 import contextlib
 import os
diff --git a/scripts/reinforcement_learning/skrl/play_skrl.py b/scripts/reinforcement_learning/skrl/play_skrl.py
new file mode 100644
index 000000000000..46f79599c13f
--- /dev/null
+++ b/scripts/reinforcement_learning/skrl/play_skrl.py
@@ -0,0 +1,229 @@
+# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""
+Script to play a checkpoint of an RL agent from skrl.
+
+Visit the skrl documentation (https://skrl.readthedocs.io) to see the examples structured in
+a more user-friendly way.
+""" + +import argparse +import contextlib +import os +import random +import sys +import time + +import gymnasium as gym +import skrl +import torch +from packaging import version + +from isaaclab.envs import DirectMARLEnvCfg +from isaaclab.utils.dict import print_dict + +from isaaclab_rl.utils.pretrained_checkpoint import get_published_pretrained_checkpoint + +import isaaclab_tasks # noqa: F401 +from isaaclab_tasks.utils import add_launcher_args, get_checkpoint_path, launch_simulation, resolve_task_config + +# PLACEHOLDER: Extension template (do not remove this comment) +with contextlib.suppress(ImportError): + import isaaclab_tasks_experimental # noqa: F401 + +SKRL_VERSION = "2.0.0" + +# -- argparse ---------------------------------------------------------------- +parser = argparse.ArgumentParser(description="Play a checkpoint of an RL agent from skrl.") +parser.add_argument("--video", action="store_true", default=False, help="Record videos during play.") +parser.add_argument("--video_length", type=int, default=200, help="Length of the recorded video (in steps).") +parser.add_argument( + "--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations." +) +parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") +parser.add_argument("--task", type=str, default=None, help="Name of the task.") +parser.add_argument( + "--agent", + type=str, + default=None, + help=( + "Name of the RL agent configuration entry point. Defaults to None, in which case the argument " + "--algorithm is used to determine the default agent configuration entry point." + ), +) +parser.add_argument("--checkpoint", type=str, default=None, help="Path to model checkpoint.") +parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment") +parser.add_argument( + "--use_pretrained_checkpoint", + action="store_true", + help="Use the pre-trained checkpoint from Nucleus.", +) +parser.add_argument( + "--ml_framework", + type=str, + default="torch", + choices=["torch", "jax"], + help="The ML framework used for training the skrl agent.", +) +parser.add_argument( + "--algorithm", + type=str, + default="PPO", + choices=["AMP", "PPO", "IPPO", "MAPPO"], + help="The RL algorithm used for training the skrl agent.", +) +parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.") +add_launcher_args(parser) +args_cli, hydra_args = parser.parse_known_args() + +if args_cli.video: + args_cli.enable_cameras = True + +sys.argv = [sys.argv[0]] + hydra_args + +# -- check skrl version ------------------------------------------------------ +if version.parse(skrl.__version__) < version.parse(SKRL_VERSION): + skrl.logger.error( + f"Unsupported skrl version: {skrl.__version__}. 
" + f"Install supported version using 'pip install skrl>={SKRL_VERSION}'" + ) + exit() + +# config shortcuts +if args_cli.agent is None: + algorithm = args_cli.algorithm.lower() + agent_cfg_entry_point = "skrl_cfg_entry_point" if algorithm in ["ppo"] else f"skrl_{algorithm}_cfg_entry_point" +else: + agent_cfg_entry_point = args_cli.agent + algorithm = agent_cfg_entry_point.split("_cfg")[0].split("skrl_")[-1].lower() + + +def main(): + """Play with skrl agent.""" + env_cfg, experiment_cfg = resolve_task_config(args_cli.task, agent_cfg_entry_point) + with launch_simulation(env_cfg, args_cli): + if args_cli.ml_framework.startswith("torch"): + from skrl.utils.runner.torch import Runner + elif args_cli.ml_framework.startswith("jax"): + from skrl.utils.runner.jax import Runner + + from isaaclab_rl.skrl import SkrlVecEnvWrapper + + # grab task name for checkpoint path + task_name = args_cli.task.split(":")[-1] + train_task_name = task_name.replace("-Play", "") + + # override configurations with non-hydra CLI arguments + env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs + env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device + + # configure the ML framework into the global skrl variable + if args_cli.ml_framework.startswith("jax"): + skrl.config.jax.backend = "jax" if args_cli.ml_framework == "jax" else "numpy" + + # randomly sample a seed if seed = -1 + if args_cli.seed == -1: + args_cli.seed = random.randint(0, 10000) + + # set the agent and environment seed from command line + experiment_cfg["seed"] = args_cli.seed if args_cli.seed is not None else experiment_cfg["seed"] + env_cfg.seed = experiment_cfg["seed"] + + # specify directory for logging experiments (load checkpoint) + log_root_path = os.path.join("logs", "skrl", experiment_cfg["agent"]["experiment"]["directory"]) + log_root_path = os.path.abspath(log_root_path) + print(f"[INFO] Loading experiment from directory: {log_root_path}") + # get checkpoint path + if args_cli.use_pretrained_checkpoint: + resume_path = get_published_pretrained_checkpoint("skrl", train_task_name) + if not resume_path: + print("[INFO] Unfortunately a pre-trained checkpoint is currently unavailable for this task.") + return + elif args_cli.checkpoint: + resume_path = os.path.abspath(args_cli.checkpoint) + else: + resume_path = get_checkpoint_path( + log_root_path, run_dir=f".*_{algorithm}_{args_cli.ml_framework}", other_dirs=["checkpoints"] + ) + log_dir = os.path.dirname(os.path.dirname(resume_path)) + + # set the log directory for the environment + env_cfg.log_dir = log_dir + + # create isaac environment + env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None) + + # convert to single-agent instance if required by the RL algorithm + if isinstance(env.unwrapped.cfg, DirectMARLEnvCfg) and algorithm in ["ppo"]: + from isaaclab.envs import multi_agent_to_single_agent + + env = multi_agent_to_single_agent(env) + + # get environment (step) dt for real-time evaluation + try: + dt = env.step_dt + except AttributeError: + dt = env.unwrapped.step_dt + + # wrap for video recording + if args_cli.video: + video_kwargs = { + "video_folder": os.path.join(log_dir, "videos", "play"), + "step_trigger": lambda step: step == 0, + "video_length": args_cli.video_length, + "disable_logger": True, + } + print("[INFO] Recording videos during play.") + print_dict(video_kwargs, nesting=4) + env = gym.wrappers.RecordVideo(env, **video_kwargs) + + # wrap around 
environment for skrl
+        env = SkrlVecEnvWrapper(env, ml_framework=args_cli.ml_framework)
+
+        # configure and instantiate the skrl runner
+        experiment_cfg["trainer"]["close_environment_at_exit"] = False
+        experiment_cfg["agent"]["experiment"]["write_interval"] = 0
+        experiment_cfg["agent"]["experiment"]["checkpoint_interval"] = 0
+        runner = Runner(env, experiment_cfg)
+
+        print(f"[INFO] Loading model checkpoint from: {resume_path}")
+        runner.agent.load(resume_path)
+        runner.agent.enable_training_mode(False, apply_to_models=True)
+
+        # reset environment
+        obs, _ = env.reset()
+        states = env.state()
+        timestep = 0
+        # simulate environment
+        try:
+            while True:
+                start_time = time.time()
+
+                with torch.inference_mode():
+                    outputs = runner.agent.act(obs, states, timestep=0, timesteps=0)
+                    if hasattr(env, "possible_agents"):
+                        actions = {a: outputs[-1][a].get("mean_actions", outputs[0][a]) for a in env.possible_agents}
+                    else:
+                        actions = outputs[-1].get("mean_actions", outputs[0])
+                    obs, _, _, _, _ = env.step(actions)
+                    states = env.state()
+                if args_cli.video:
+                    timestep += 1
+                    if timestep == args_cli.video_length:
+                        break
+
+                sleep_time = dt - (time.time() - start_time)
+                if args_cli.real_time and sleep_time > 0:
+                    time.sleep(sleep_time)
+
+            # close the simulator
+            env.close()
+        except KeyboardInterrupt:
+            pass
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/reinforcement_learning/skrl/train.py b/scripts/reinforcement_learning/skrl/train.py
index 535403e5a105..4ccbf494f5fc 100644
--- a/scripts/reinforcement_learning/skrl/train.py
+++ b/scripts/reinforcement_learning/skrl/train.py
@@ -10,6 +10,16 @@
 a more user-friendly way.
 """

+import warnings
+
+warnings.warn(
+    "scripts/reinforcement_learning/skrl/train.py is deprecated. Use "
+    "`./isaaclab.sh train --library skrl --task <task_name>` instead. "
+    "Example: `./isaaclab.sh train --library skrl --task Isaac-Cartpole-v0`.",
+    DeprecationWarning,
+    stacklevel=1,
+)
+
 import argparse
 import contextlib
 import logging
diff --git a/scripts/reinforcement_learning/skrl/train_skrl.py b/scripts/reinforcement_learning/skrl/train_skrl.py
new file mode 100644
index 000000000000..392564cc48fe
--- /dev/null
+++ b/scripts/reinforcement_learning/skrl/train_skrl.py
@@ -0,0 +1,180 @@
+# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""skrl training logic for the unified reinforcement learning entrypoint.""" + +from __future__ import annotations + +import argparse +import contextlib +import logging +import os +import random +import time +from datetime import datetime + +from common import ( + add_common_train_args, + add_isaaclab_launcher_args, + apply_env_overrides, + configure_io_descriptors, + create_isaaclab_env, + dump_train_configs, + enable_cameras_for_video, + set_hydra_args, + validate_distributed_device, + wrap_record_video, +) +from packaging import version + +import isaaclab_tasks # noqa: F401 + +logger = logging.getLogger(__name__) + +SKRL_VERSION = "2.0.0" + +# PLACEHOLDER: Extension template (do not remove this comment) +with contextlib.suppress(ImportError): + import isaaclab_tasks_experimental # noqa: F401 + + +def _parse_args(argv: list[str]) -> argparse.Namespace: + """Parse skrl training arguments.""" + parser = argparse.ArgumentParser(description="Train an RL agent with skrl.") + add_common_train_args( + parser, + agent_default=None, + agent_help=( + "Name of the RL agent configuration entry point. Defaults to None, in which case the argument " + "--algorithm is used to determine the default agent configuration entry point." + ), + ) + parser.add_argument("--checkpoint", type=str, default=None, help="Path to model checkpoint to resume training.") + parser.add_argument( + "--ml_framework", + type=str, + default="torch", + choices=["torch", "jax"], + help="The ML framework used for training the skrl agent.", + ) + parser.add_argument( + "--algorithm", + type=str, + default="PPO", + choices=["AMP", "PPO", "IPPO", "MAPPO"], + help="The RL algorithm used for training the skrl agent.", + ) + add_isaaclab_launcher_args(parser) + args_cli, hydra_args = parser.parse_known_args(argv) + enable_cameras_for_video(args_cli) + set_hydra_args(hydra_args) + return args_cli + + +def _resolve_agent_entry_point(args_cli: argparse.Namespace) -> tuple[str, str]: + """Resolve the skrl agent entry point and algorithm from CLI arguments.""" + if args_cli.agent is None: + algorithm = args_cli.algorithm.lower() + agent_cfg_entry_point = "skrl_cfg_entry_point" if algorithm in ["ppo"] else f"skrl_{algorithm}_cfg_entry_point" + else: + agent_cfg_entry_point = args_cli.agent + algorithm = agent_cfg_entry_point.split("_cfg")[0].split("skrl_")[-1].lower() + return agent_cfg_entry_point, algorithm + + +def run(argv: list[str]) -> None: + """Train a skrl agent.""" + import skrl + + from isaaclab.envs import DirectMARLEnvCfg + from isaaclab.utils.assets import retrieve_file_path + + from isaaclab_rl.skrl import SkrlVecEnvWrapper + + from isaaclab_tasks.utils import launch_simulation, resolve_task_config + + args_cli = _parse_args(argv) + + if version.parse(skrl.__version__) < version.parse(SKRL_VERSION): + skrl.logger.error( + f"Unsupported skrl version: {skrl.__version__}. 
" + f"Install supported version using 'pip install skrl>={SKRL_VERSION}'" + ) + raise SystemExit(1) + + agent_cfg_entry_point, algorithm = _resolve_agent_entry_point(args_cli) + env_cfg, agent_cfg = resolve_task_config(args_cli.task, agent_cfg_entry_point) + + with launch_simulation(env_cfg, args_cli): + if args_cli.ml_framework.startswith("torch"): + from skrl.utils.runner.torch import Runner + elif args_cli.ml_framework.startswith("jax"): + from skrl.utils.runner.jax import Runner + + apply_env_overrides(args_cli, env_cfg) + validate_distributed_device(args_cli) + + if args_cli.distributed: + global_rank = int(os.getenv("RANK", "0")) + + if args_cli.max_iterations: + agent_cfg["trainer"]["timesteps"] = args_cli.max_iterations * agent_cfg["agent"]["rollouts"] + agent_cfg["trainer"]["close_environment_at_exit"] = False + + if args_cli.ml_framework.startswith("jax"): + skrl.config.jax.backend = "jax" if args_cli.ml_framework == "jax" else "numpy" + + if args_cli.seed == -1: + args_cli.seed = random.randint(0, 10000) + + agent_cfg["seed"] = args_cli.seed if args_cli.seed is not None else agent_cfg["seed"] + if args_cli.distributed: + agent_cfg["seed"] = agent_cfg["seed"] + global_rank + env_cfg.seed = agent_cfg["seed"] + + log_root_path = os.path.abspath(os.path.join("logs", "skrl", agent_cfg["agent"]["experiment"]["directory"])) + print(f"[INFO] Logging experiment in directory: {log_root_path}") + log_dir = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + f"_{algorithm}_{args_cli.ml_framework}" + print(f"Exact experiment name requested from command line: {log_dir}") + if agent_cfg["agent"]["experiment"]["experiment_name"]: + log_dir += f"_{agent_cfg['agent']['experiment']['experiment_name']}" + agent_cfg["agent"]["experiment"]["directory"] = log_root_path + agent_cfg["agent"]["experiment"]["experiment_name"] = log_dir + log_dir = os.path.join(log_root_path, log_dir) + + dump_train_configs(log_dir, env_cfg, agent_cfg) + + resume_path = retrieve_file_path(args_cli.checkpoint) if args_cli.checkpoint else None + + configure_io_descriptors(env_cfg, args_cli, logger) + env_cfg.log_dir = log_dir + + env = create_isaaclab_env( + args_cli.task, + env_cfg, + args_cli, + convert_marl_to_single_agent=isinstance(env_cfg, DirectMARLEnvCfg) and algorithm in ["ppo"], + ) + env = wrap_record_video(env, log_dir, args_cli) + + start_time = time.time() + env = SkrlVecEnvWrapper(env, ml_framework=args_cli.ml_framework) + runner = Runner(env, agent_cfg) + + if resume_path: + print(f"[INFO] Loading model checkpoint from: {resume_path}") + runner.agent.load(resume_path) + + try: + runner.run() + print(f"Training time: {round(time.time() - start_time, 2)} seconds") + + total_timesteps = agent_cfg["trainer"]["timesteps"] + os.makedirs(os.path.join(log_dir, "checkpoints"), exist_ok=True) + runner.agent.write_checkpoint(timestep=total_timesteps, timesteps=total_timesteps) + print(f"[INFO] Saved final agent checkpoint to: {log_dir}/checkpoints") + env.close() + except KeyboardInterrupt: + pass diff --git a/scripts/reinforcement_learning/train.py b/scripts/reinforcement_learning/train.py new file mode 100644 index 000000000000..99313867c2a2 --- /dev/null +++ b/scripts/reinforcement_learning/train.py @@ -0,0 +1,37 @@ +# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md). +# All rights reserved. 
+# +# SPDX-License-Identifier: BSD-3-Clause + +"""Unified training entrypoint for Isaac Lab reinforcement learning workflows.""" + +from __future__ import annotations + +from pathlib import Path + +from common import dispatch_library_entrypoint + +SCRIPT_DIR = Path(__file__).resolve().parent + +LIBRARY_ENTRYPOINTS = { + "rl_games": SCRIPT_DIR / "rl_games" / "train_rl_games.py", + "rlinf": SCRIPT_DIR / "rlinf" / "train_rlinf.py", + "rsl_rl": SCRIPT_DIR / "rsl_rl" / "train_rsl_rl.py", + "sb3": SCRIPT_DIR / "sb3" / "train_sb3.py", + "skrl": SCRIPT_DIR / "skrl" / "train_skrl.py", +} + + +def main(argv: list[str] | None = None) -> int: + """Run the selected reinforcement learning training library.""" + return dispatch_library_entrypoint( + argv, + LIBRARY_ENTRYPOINTS, + action="train", + description="Train an RL agent with a selected reinforcement learning library.", + library_help="Training library to use.", + ) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/tools/train_and_publish_checkpoints.py b/scripts/tools/train_and_publish_checkpoints.py index 13c55fa6fb55..f0969300e445 100644 --- a/scripts/tools/train_and_publish_checkpoints.py +++ b/scripts/tools/train_and_publish_checkpoints.py @@ -138,7 +138,9 @@ from isaaclab_rl.utils.pretrained_checkpoint import ( WORKFLOW_EXPERIMENT_NAME_VARIABLE, WORKFLOW_PLAYER, + WORKFLOW_PLAYER_ARGS, WORKFLOW_TRAINER, + WORKFLOW_TRAINER_ARGS, WORKFLOWS, get_log_root_path, get_pretrained_checkpoint_path, @@ -183,6 +185,7 @@ def train_job(workflow, task_name, headless=False, force=False, num_envs=None): cmd = [ sys.executable, WORKFLOW_TRAINER[workflow], + *WORKFLOW_TRAINER_ARGS[workflow], "--task", task_name, "--enable_cameras", @@ -236,6 +239,7 @@ def review_pretrained_checkpoint(workflow, task_name, force_review=False, num_en cmd = [ sys.executable, WORKFLOW_PLAYER[workflow], + *WORKFLOW_PLAYER_ARGS[workflow], "--task", task_name, "--checkpoint", diff --git a/source/isaaclab/changelog.d/rl-script-shorthand.rst b/source/isaaclab/changelog.d/rl-script-shorthand.rst new file mode 100644 index 000000000000..bcc5c13f7b5b --- /dev/null +++ b/source/isaaclab/changelog.d/rl-script-shorthand.rst @@ -0,0 +1,8 @@ +Added +^^^^^ + +* Added Isaac Lab CLI ``train`` and ``play`` aliases for launching the unified + reinforcement learning scripts. +* Added ``uv run train`` and ``uv run play`` source checkout entrypoints with + default RSL-RL, tasks, and Newton dependencies, plus optional dependency extras + for OVRTX and OVPhysX workflows. 
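For reference, the unified ``scripts/reinforcement_learning/train.py`` entrypoint above exposes ``main(argv)`` with an explicit argument list, so it can also be driven programmatically. A minimal sketch, assuming it runs from the repository root with the scripts directory on ``sys.path``; the library and task names below are illustrative only::

    # hypothetical driver for the unified entrypoint (not part of this change)
    import sys

    # make `train` and its sibling `common` module importable
    sys.path.insert(0, "scripts/reinforcement_learning")
    from train import main

    # equivalent to: ./isaaclab.sh train --library rsl_rl --task Isaac-Cartpole-v0 --headless
    raise SystemExit(main(["--library", "rsl_rl", "--task", "Isaac-Cartpole-v0", "--headless"]))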
diff --git a/source/isaaclab/isaaclab/cli/__init__.py b/source/isaaclab/isaaclab/cli/__init__.py index 22cfa1e51725..f7426fd972ba 100644 --- a/source/isaaclab/isaaclab/cli/__init__.py +++ b/source/isaaclab/isaaclab/cli/__init__.py @@ -4,6 +4,8 @@ # SPDX-License-Identifier: BSD-3-Clause import argparse +import sys +from pathlib import Path from .commands.envs import command_setup_conda, command_setup_uv from .commands.format import command_format @@ -17,17 +19,46 @@ command_vscode_settings, ) from .utils import ( + ISAACLAB_ROOT, is_windows, run_python_command, ) +def train(args: list[str] | None = None) -> None: + """Run the unified reinforcement learning training script.""" + if args is None: + args = sys.argv[1:] + run_python_command(ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "train.py", args, check=True) + + +def play(args: list[str] | None = None) -> None: + """Run the unified reinforcement learning play script.""" + if args is None: + args = sys.argv[1:] + run_python_command(ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "play.py", args, check=True) + + def cli() -> None: """Parse CLI arguments and run the requested command.""" + if len(sys.argv) > 1 and sys.argv[1] == "train": + train(sys.argv[2:]) + return + if len(sys.argv) > 1 and sys.argv[1] == "play": + play(sys.argv[2:]) + return + + executable_name = Path(sys.argv[0]).name + default_prog = "isaaclab.bat" if is_windows() else "isaaclab.sh" parser = argparse.ArgumentParser( description="Isaac Lab CLI", - prog="isaaclab" + (".bat" if is_windows() else ".sh"), + prog=executable_name if executable_name != "__main__.py" else default_prog, formatter_class=argparse.RawTextHelpFormatter, + epilog=( + "commands:\n" + " train Run scripts/reinforcement_learning/train.py\n" + " play Run scripts/reinforcement_learning/play.py" + ), ) _submodules_str = ", ".join(sorted(VALID_ISAACLAB_SUBMODULES)) @@ -42,7 +73,8 @@ def cli() -> None: "Accepts a comma-separated list of submodule names, one of the RL frameworks, or a special value.\n" "\n" f"* Isaac Lab submodules: {_submodules_str}\n" - " Any submodule accepts an editable selector, e.g. visualizers[all|kit|newton|rerun|viser], rl[rsl_rl|skrl].\n" + " Any submodule accepts an editable selector, e.g.\n" + " visualizers[all|kit|newton|rerun|viser], rl[rsl_rl|skrl].\n" "\n" f"* RL frameworks: {_frameworks_str}\n" " Passing an RL framework name installs all Isaac Lab submodules + that framework.\n" @@ -65,7 +97,10 @@ def cli() -> None: "-p", "--python", nargs=argparse.REMAINDER, - help="Run the python executable provided by Isaac Sim or virtual environment (if active).", + help=( + "Run the python executable provided by Isaac Sim or virtual environment (if active).\n" + "For reinforcement learning workflows, prefer the direct `train` and `play` commands." + ), ) parser.add_argument( "-s", diff --git a/source/isaaclab/isaaclab/cli/utils.py b/source/isaaclab/isaaclab/cli/utils.py index 611a1c6e8101..c7de00d9656f 100644 --- a/source/isaaclab/isaaclab/cli/utils.py +++ b/source/isaaclab/isaaclab/cli/utils.py @@ -17,6 +17,12 @@ # Default path to look for Isaac Sim is _isaac_sim symlink. DEFAULT_ISAAC_SIM_PATH = ISAACLAB_ROOT / "_isaac_sim" +# Short script names supported by ``isaaclab -p``. +_PYTHON_SCRIPT_ALIASES = { + "train.py": ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "train.py", + "play.py": ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "play.py", +} + # ANSI colors. 
 _ANSI_COLOR_RESET = "\033[0m"
 _ANSI_COLOR_INFO = "\033[36m"  # cyan
@@ -239,25 +245,42 @@ def run_command(
         sys.exit(130)
 
 
+def _is_virtualenv_python(python_exe: str | Path) -> bool:
+    """Check whether a Python executable belongs to a virtual environment.
+
+    Args:
+        python_exe: Python executable path.
+
+    Returns:
+        True when the executable is inside a Python virtual environment.
+    """
+    python_path = Path(python_exe)
+    return (python_path.parent.parent / "pyvenv.cfg").is_file()
+
+
 def get_pip_command(python_exe: str | None = None) -> list[str]:
     """Return the base pip command tokens for the current environment.
 
     When ``uv`` is available and a virtual environment is active, returns
-    ``["uv", "pip"]``. Otherwise returns ``[python_exe, "-m", "pip"]``
-    so that the target interpreter's own pip is used (e.g. Isaac Sim's
-    bundled ``python.sh``).
+    ``["uv", "pip"]``. When the target Python belongs to a virtual
+    environment, ``UV_PYTHON`` is set so ``uv pip`` installs into that
+    environment even if the process itself is not activated. Otherwise returns
+    ``[python_exe, "-m", "pip"]`` so that the target interpreter's own pip is
+    used (e.g. Isaac Sim's bundled ``python.sh``).
 
     Args:
         python_exe: Python executable path. Resolved via
             :func:`extract_python_exe` when ``None``.
     """
-    in_venv = bool(os.environ.get("VIRTUAL_ENV") or os.environ.get("CONDA_PREFIX") or (sys.prefix != sys.base_prefix))
-    if shutil.which("uv") and in_venv:
-        return ["uv", "pip"]
-
     if python_exe is None:
         python_exe = extract_python_exe()
 
+    in_venv = bool(os.environ.get("VIRTUAL_ENV") or os.environ.get("CONDA_PREFIX") or (sys.prefix != sys.base_prefix))
+    if shutil.which("uv") and (in_venv or _is_virtualenv_python(python_exe)):
+        # Pin uv to the target interpreter so installs land in its environment.
+        os.environ["UV_PYTHON"] = str(python_exe)
+        return ["uv", "pip"]
+
     return [python_exe, "-m", "pip"]
 
 
@@ -300,17 +322,22 @@ def extract_python_exe() -> str:
     else:
         print_debug("extract_python_exe(): No CONDA_PREFIX found.")
 
-    # Try the default Isaac Lab uv venv (env_isaaclab/) in the repo root.
+    # Try the current interpreter when already inside a virtual environment.
+    if (not python_exe or not Path(python_exe).exists()) and sys.prefix != sys.base_prefix:
+        python_exe = Path(sys.executable)
+        print_debug(f"extract_python_exe(): Using active virtual environment python: {python_exe}")
+
+    # Try repo-local virtual environments.
     if not python_exe or not Path(python_exe).exists():
-        default_venv = ISAACLAB_ROOT / "env_isaaclab"
-        if default_venv.is_dir():
+        for default_venv in (ISAACLAB_ROOT / "env_isaaclab", ISAACLAB_ROOT / ".venv"):
             if is_windows():
                 candidate = default_venv / "Scripts" / "python.exe"
             else:
                 candidate = default_venv / "bin" / "python"
             if candidate.exists():
-                print_debug(f"extract_python_exe(): Found default venv python: {candidate}")
+                print_debug(f"extract_python_exe(): Found repo-local venv python: {candidate}")
                 python_exe = candidate
+                break
 
     # Try kit python.
if not python_exe or not Path(python_exe).exists(): @@ -525,6 +552,8 @@ def run_python_command( if is_module: cmd.append("-m") + else: + script_or_module = _PYTHON_SCRIPT_ALIASES.get(str(script_or_module), script_or_module) cmd.append(str(script_or_module)) cmd.extend(args) diff --git a/source/isaaclab/setup.py b/source/isaaclab/setup.py index cfced25a24aa..a6d07d29068c 100644 --- a/source/isaaclab/setup.py +++ b/source/isaaclab/setup.py @@ -135,6 +135,13 @@ python_requires=">=3.12", install_requires=INSTALL_REQUIRES, extras_require=EXTRAS_REQUIRE, + entry_points={ + "console_scripts": [ + "isaaclab=isaaclab.cli:cli", + "play=isaaclab.cli:play", + "train=isaaclab.cli:train", + ], + }, dependency_links=PYTORCH_INDEX_URL, packages=["isaaclab"], classifiers=[ diff --git a/source/isaaclab/test/cli/test_install.py b/source/isaaclab/test/cli/test_install.py index 13494cec696d..5593b982d002 100644 --- a/source/isaaclab/test/cli/test_install.py +++ b/source/isaaclab/test/cli/test_install.py @@ -13,11 +13,14 @@ import pytest +from isaaclab.cli import cli, play, train from isaaclab.cli.utils import ( + ISAACLAB_ROOT, determine_python_version, extract_isaacsim_path, extract_python_exe, get_pip_command, + run_python_command, ) @@ -67,6 +70,27 @@ def test_returns_uv_pip_in_venv_with_uv(self, tmp_path): result = get_pip_command(python_exe=fake_python) assert result == ["uv", "pip"] + def test_returns_uv_pip_for_target_venv_without_activation(self, tmp_path): + """When the target Python is in a venv, use uv pip even if the venv is not activated.""" + venv_python = _python_in_venv(tmp_path / ".venv") + venv_python.parent.mkdir(parents=True, exist_ok=True) + venv_python.touch() + (tmp_path / ".venv" / "pyvenv.cfg").touch() + + env = os.environ.copy() + env.pop("CONDA_PREFIX", None) + env.pop("VIRTUAL_ENV", None) + env.pop("UV_PYTHON", None) + with ( + mock.patch.dict(os.environ, env, clear=True), + mock.patch("isaaclab.cli.utils.shutil.which", return_value="/usr/bin/uv"), + mock.patch.object(sys, "prefix", "/usr"), + mock.patch.object(sys, "base_prefix", "/usr"), + ): + result = get_pip_command(python_exe=str(venv_python)) + assert result == ["uv", "pip"] + assert os.environ["UV_PYTHON"] == str(venv_python) + def test_returns_python_pip_without_uv(self, tmp_path): """When uv is not installed, always return python -m pip.""" fake_python = str(tmp_path / "python") @@ -93,6 +117,131 @@ def test_returns_python_pip_in_conda_without_uv(self, tmp_path): assert result == [fake_python, "-m", "pip"] +# --------------------------------------------------------------------------- +# run_python_command +# --------------------------------------------------------------------------- + + +class TestRunPythonCommand: + """Tests for :func:`run_python_command`.""" + + def test_resolves_train_shorthand(self): + """Should resolve train.py to the unified reinforcement learning training script.""" + with ( + mock.patch("isaaclab.cli.utils.extract_python_exe", return_value="/usr/bin/python"), + mock.patch( + "isaaclab.cli.utils.run_command", + return_value=subprocess.CompletedProcess(args=[], returncode=0), + ) as run_command_mock, + ): + run_python_command("train.py", ["--help"]) + + command = run_command_mock.call_args.args[0] + assert command[:2] == [ + "/usr/bin/python", + str(ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "train.py"), + ] + assert command[2:] == ["--help"] + + def test_resolves_play_shorthand(self): + """Should resolve play.py to the unified reinforcement learning play script.""" + with ( + 
mock.patch("isaaclab.cli.utils.extract_python_exe", return_value="/usr/bin/python"), + mock.patch( + "isaaclab.cli.utils.run_command", + return_value=subprocess.CompletedProcess(args=[], returncode=0), + ) as run_command_mock, + ): + run_python_command("play.py", ["--help"]) + + command = run_command_mock.call_args.args[0] + assert command[:2] == [ + "/usr/bin/python", + str(ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "play.py"), + ] + assert command[2:] == ["--help"] + + def test_keeps_explicit_relative_script_path(self): + """Should only resolve bare shorthand names.""" + with ( + mock.patch("isaaclab.cli.utils.extract_python_exe", return_value="/usr/bin/python"), + mock.patch( + "isaaclab.cli.utils.run_command", + return_value=subprocess.CompletedProcess(args=[], returncode=0), + ) as run_command_mock, + ): + run_python_command("./train.py", ["--help"]) + + command = run_command_mock.call_args.args[0] + assert command[:2] == ["/usr/bin/python", "./train.py"] + assert command[2:] == ["--help"] + + +# --------------------------------------------------------------------------- +# cli +# --------------------------------------------------------------------------- + + +class TestCli: + """Tests for the Isaac Lab CLI.""" + + def test_train_command_runs_unified_train_script(self): + """Should dispatch the train command to the unified reinforcement learning training script.""" + with ( + mock.patch.object(sys, "argv", ["isaaclab.sh", "train", "--library", "rsl_rl"]), + mock.patch("isaaclab.cli.run_python_command") as run_python_command_mock, + ): + cli() + + run_python_command_mock.assert_called_once_with( + ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "train.py", + ["--library", "rsl_rl"], + check=True, + ) + + def test_play_command_runs_unified_play_script(self): + """Should dispatch the play command to the unified reinforcement learning play script.""" + with ( + mock.patch.object(sys, "argv", ["isaaclab.sh", "play", "--library", "rsl_rl"]), + mock.patch("isaaclab.cli.run_python_command") as run_python_command_mock, + ): + cli() + + run_python_command_mock.assert_called_once_with( + ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "play.py", + ["--library", "rsl_rl"], + check=True, + ) + + def test_train_console_script_runs_unified_train_script(self): + """Should dispatch the train console script to the unified reinforcement learning training script.""" + with ( + mock.patch.object(sys, "argv", ["train", "--library", "rsl_rl"]), + mock.patch("isaaclab.cli.run_python_command") as run_python_command_mock, + ): + train() + + run_python_command_mock.assert_called_once_with( + ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "train.py", + ["--library", "rsl_rl"], + check=True, + ) + + def test_play_console_script_runs_unified_play_script(self): + """Should dispatch the play console script to the unified reinforcement learning play script.""" + with ( + mock.patch.object(sys, "argv", ["play", "--library", "rsl_rl"]), + mock.patch("isaaclab.cli.run_python_command") as run_python_command_mock, + ): + play() + + run_python_command_mock.assert_called_once_with( + ISAACLAB_ROOT / "scripts" / "reinforcement_learning" / "play.py", + ["--library", "rsl_rl"], + check=True, + ) + + # --------------------------------------------------------------------------- # extract_python_exe # --------------------------------------------------------------------------- @@ -124,6 +273,42 @@ def test_uses_conda_prefix_when_no_venv(self, tmp_path): result = extract_python_exe() assert 
Path(result) == conda_python + def test_uses_current_python_when_running_in_virtualenv_without_env_var(self, tmp_path): + """Should return the current interpreter when the process is already inside a virtual environment.""" + venv_python = _python_in_venv(tmp_path) + venv_python.parent.mkdir(parents=True, exist_ok=True) + venv_python.touch() + + env = os.environ.copy() + env.pop("CONDA_PREFIX", None) + env.pop("VIRTUAL_ENV", None) + with ( + mock.patch.dict(os.environ, env, clear=True), + mock.patch.object(sys, "prefix", str(tmp_path)), + mock.patch.object(sys, "base_prefix", "/usr"), + mock.patch.object(sys, "executable", str(venv_python)), + ): + result = extract_python_exe() + assert Path(result) == venv_python + + def test_uses_repo_dot_venv_when_no_environment_is_active(self, tmp_path): + """Should return the repo-local .venv Python before falling back to system Python.""" + venv_python = _python_in_venv(tmp_path / ".venv") + venv_python.parent.mkdir(parents=True, exist_ok=True) + venv_python.touch() + + env = os.environ.copy() + env.pop("CONDA_PREFIX", None) + env.pop("VIRTUAL_ENV", None) + with ( + mock.patch.dict(os.environ, env, clear=True), + mock.patch("isaaclab.cli.utils.ISAACLAB_ROOT", tmp_path), + mock.patch.object(sys, "prefix", "/usr"), + mock.patch.object(sys, "base_prefix", "/usr"), + ): + result = extract_python_exe() + assert Path(result) == venv_python + # --------------------------------------------------------------------------- # extract_isaacsim_path diff --git a/source/isaaclab/test/install_ci/test_isaaclabx_uv_training.py b/source/isaaclab/test/install_ci/test_isaaclabx_uv_training.py index 9bd2bfa40f32..ead19044399b 100644 --- a/source/isaaclab/test/install_ci/test_isaaclabx_uv_training.py +++ b/source/isaaclab/test/install_ci/test_isaaclabx_uv_training.py @@ -43,8 +43,9 @@ def test_install_and_train_cartpole(self, isaaclab_root): result = self.run_in_uv_env( [ str(self.cli_script), - "-p", - "scripts/reinforcement_learning/rsl_rl/train.py", + "train", + "--library", + "rsl_rl", "--task", "Isaac-Cartpole-Direct-v0", "--num_envs", diff --git a/source/isaaclab_rl/changelog.d/unified-rl-train.rst b/source/isaaclab_rl/changelog.d/unified-rl-train.rst new file mode 100644 index 000000000000..fb6dd0ffa1bb --- /dev/null +++ b/source/isaaclab_rl/changelog.d/unified-rl-train.rst @@ -0,0 +1,5 @@ +Added +^^^^^ + +* Added unified ``scripts/reinforcement_learning/train.py`` and ``scripts/reinforcement_learning/play.py`` + entrypoints for RL workflows. 
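The unified entrypoints delegate to ``common.dispatch_library_entrypoint``, which is imported but not shown in this diff. A minimal sketch of what such a dispatcher could look like, assuming it consumes only ``--library`` and forwards all remaining arguments to the selected per-library script (names and behavior here are illustrative, not the actual implementation):

.. code-block:: python

    # Hypothetical sketch -- the real common.dispatch_library_entrypoint may differ.
    from __future__ import annotations

    import argparse
    import runpy
    import sys
    from pathlib import Path


    def dispatch_library_entrypoint(
        argv: list[str] | None,
        entrypoints: dict[str, Path],
        *,
        action: str,
        description: str,
        library_help: str,
    ) -> int:
        """Consume --library and run the selected script with the remaining arguments."""
        parser = argparse.ArgumentParser(prog=action, description=description, add_help=False)
        parser.add_argument("--library", required=True, choices=sorted(entrypoints), help=library_help)
        args, remaining = parser.parse_known_args(argv)
        script = entrypoints[args.library]
        # Run the per-library script as if it were invoked directly.
        sys.argv = [str(script), *remaining]
        runpy.run_path(str(script), run_name="__main__")
        return 0

Under this assumed design, ``add_help=False`` lets ``--help`` fall through to the selected script, so per-library flags stay discoverable.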
diff --git a/source/isaaclab_rl/isaaclab_rl/utils/pretrained_checkpoint.py b/source/isaaclab_rl/isaaclab_rl/utils/pretrained_checkpoint.py index 3e26d8c4d4f0..a5928c13df4e 100644 --- a/source/isaaclab_rl/isaaclab_rl/utils/pretrained_checkpoint.py +++ b/source/isaaclab_rl/isaaclab_rl/utils/pretrained_checkpoint.py @@ -19,11 +19,17 @@ WORKFLOWS = ["rl_games", "rsl_rl", "sb3", "skrl"] """The supported workflows for pre-trained checkpoints""" -WORKFLOW_TRAINER = {w: f"scripts/reinforcement_learning/{w}/train.py" for w in WORKFLOWS} -"""A dict mapping workflow to their training program path""" +WORKFLOW_TRAINER = {w: "scripts/reinforcement_learning/train.py" for w in WORKFLOWS} +"""A dict mapping workflow to the unified training program path""" -WORKFLOW_PLAYER = {w: f"scripts/reinforcement_learning/{w}/play.py" for w in WORKFLOWS} -"""A dict mapping workflow to their play program path""" +WORKFLOW_TRAINER_ARGS = {w: ["--library", w] for w in WORKFLOWS} +"""A dict mapping workflow to arguments required by the unified training program""" + +WORKFLOW_PLAYER = {w: "scripts/reinforcement_learning/play.py" for w in WORKFLOWS} +"""A dict mapping workflow to the unified play program path""" + +WORKFLOW_PLAYER_ARGS = {w: ["--library", w] for w in WORKFLOWS} +"""A dict mapping workflow to arguments required by the unified play program""" WORKFLOW_PRETRAINED_CHECKPOINT_FILENAMES = { "rl_games": "checkpoint.pth", diff --git a/source/isaaclab_tasks/changelog.d/unified-rl-entrypoints.rst b/source/isaaclab_tasks/changelog.d/unified-rl-entrypoints.rst new file mode 100644 index 000000000000..38fbd1166f9f --- /dev/null +++ b/source/isaaclab_tasks/changelog.d/unified-rl-entrypoints.rst @@ -0,0 +1,5 @@ +Changed +^^^^^^^ + +* Changed task automation helpers to use the unified reinforcement learning + train and play entrypoints. 
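To see how the updated workflow tables compose, here is how a caller such as ``scripts/tools/train_and_publish_checkpoints.py`` splices them into a subprocess command (the task name is a placeholder):

.. code-block:: python

    import sys

    from isaaclab_rl.utils.pretrained_checkpoint import WORKFLOW_TRAINER, WORKFLOW_TRAINER_ARGS

    workflow = "skrl"
    cmd = [
        sys.executable,
        WORKFLOW_TRAINER[workflow],  # "scripts/reinforcement_learning/train.py"
        *WORKFLOW_TRAINER_ARGS[workflow],  # "--library", "skrl"
        "--task",
        "Isaac-Cartpole-v0",
    ]

Every workflow now shares one trainer path, so only the ``--library`` argument differs between entries.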
diff --git a/source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_disassembly_w_id.py b/source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_disassembly_w_id.py index 9c7c6b2e712e..bf1588d11cce 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_disassembly_w_id.py +++ b/source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_disassembly_w_id.py @@ -69,7 +69,8 @@ def main(): command.extend( [ "-p", - "scripts/reinforcement_learning/rl_games/train.py", + "scripts/reinforcement_learning/train.py", + "--library=rl_games", "--task=Isaac-AutoMate-Disassembly-Direct-v0", f"--num_envs={args.num_envs}", f"--seed={args.seed}", diff --git a/source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py b/source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py index 0411f8008f68..3a3f6ae84d48 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py +++ b/source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py @@ -74,7 +74,8 @@ def main(): if args.train: command.extend( [ - "scripts/reinforcement_learning/rl_games/train.py", + "scripts/reinforcement_learning/train.py", + "--library=rl_games", "--task=Isaac-AutoMate-Assembly-Direct-v0", f"--seed={args.seed}", f"--max_iterations={args.max_iterations}", @@ -83,7 +84,13 @@ def main(): else: if not args.checkpoint: raise ValueError("No checkpoint provided for evaluation.") - command.extend(["scripts/reinforcement_learning/rl_games/play.py", "--task=Isaac-AutoMate-Assembly-Direct-v0"]) + command.extend( + [ + "scripts/reinforcement_learning/play.py", + "--library=rl_games", + "--task=Isaac-AutoMate-Assembly-Direct-v0", + ] + ) command.append(f"--num_envs={args.num_envs}") diff --git a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/ros_inference_env_cfg.py b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/ros_inference_env_cfg.py index 504a3ccda288..925120850f16 100644 --- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/ros_inference_env_cfg.py +++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/deploy/gear_assembly/config/rizon_4s/ros_inference_env_cfg.py @@ -124,7 +124,7 @@ class Rizon4sGearAssemblyEnvCfg_PLAY(Rizon4sGearAssemblyROSInferenceEnvCfg): To debug a specific real-world scenario, edit the constants below to match the physical setup, then run:: - python scripts/reinforcement_learning/rsl_rl/play.py \\ + ./isaaclab.sh play --library rsl_rl \\ --task Isaac-Deploy-GearAssembly-Rizon4s-Grav-Play-v0 \\ --num_envs 1 --checkpoint diff --git a/source/isaaclab_tasks/test/benchmarking/test_environments_training.py b/source/isaaclab_tasks/test/benchmarking/test_environments_training.py index 988cc0336789..7db6f98be699 100644 --- a/source/isaaclab_tasks/test/benchmarking/test_environments_training.py +++ b/source/isaaclab_tasks/test/benchmarking/test_environments_training.py @@ -20,7 +20,11 @@ import gymnasium as gym import pytest -from isaaclab_rl.utils.pretrained_checkpoint import WORKFLOW_EXPERIMENT_NAME_VARIABLE, WORKFLOW_TRAINER +from isaaclab_rl.utils.pretrained_checkpoint import ( + WORKFLOW_EXPERIMENT_NAME_VARIABLE, + WORKFLOW_TRAINER, + WORKFLOW_TRAINER_ARGS, +) def setup_environment(): @@ -42,6 +46,7 @@ def train_job(workflow, task, env_config, num_gpus): cmd = [ sys.executable, WORKFLOW_TRAINER[workflow], + *WORKFLOW_TRAINER_ARGS[workflow], "--task", task, "--enable_cameras", diff --git 
a/tools/run_train_envs.py b/tools/run_train_envs.py
index efc85c0265ba..a1a8f5a86cc9 100644
--- a/tools/run_train_envs.py
+++ b/tools/run_train_envs.py
@@ -6,7 +6,7 @@
 """
-This scripts run training with different RL libraries over a subset of the environments.
+This script runs training with different RL libraries over a subset of the environments.
 
-It calls the script ``scripts/reinforcement_learning/${args.lib_name}/train.py`` with the appropriate arguments.
+It calls the script ``scripts/reinforcement_learning/train.py`` with the appropriate arguments.
 
 Each training run has the corresponding "commit tag" appended to the run name, which allows
 comparing different training logs of the same environments.
@@ -64,7 +64,9 @@ def main(args: argparse.Namespace):
             [
                 f"{ISAACLAB_PATH}/isaaclab.sh",
                 "-p",
-                f"{ISAACLAB_PATH}/scripts/reinforcement_learning/{args.lib_name}/train.py",
+                f"{ISAACLAB_PATH}/scripts/reinforcement_learning/train.py",
+                "--library",
+                args.lib_name,
                 "--task",
                 env_name,
                 "--headless",
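Finally, a usage sketch of the revised ``get_pip_command`` from ``source/isaaclab/isaaclab/cli/utils.py``; the paths are placeholders:

.. code-block:: python

    from isaaclab.cli.utils import get_pip_command

    # With uv on PATH and /repo/.venv containing a pyvenv.cfg, this returns
    # ["uv", "pip"] and exports UV_PYTHON so installs target that interpreter,
    # even when the venv is not activated in the current shell.
    cmd = get_pip_command(python_exe="/repo/.venv/bin/python")

    # Without uv, or for a non-venv interpreter such as Isaac Sim's bundled
    # python.sh, the interpreter's own pip is used instead:
    #   [python_exe, "-m", "pip"]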