Commit 2388178
Updates docs for using nurec background in locomanipulation sdg (#5301)
Updates docs for using nurec background in locomanipulation sdg

## Type of change

- Documentation update

## Checklist

- [ ] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Co-authored-by: Kelly Guo <kellyg@nvidia.com>
1 parent b582dab commit 2388178

3 files changed

Lines changed: 116 additions & 35 deletions


docs/source/overview/imitation-learning/humanoids_imitation.rst

Lines changed: 111 additions & 32 deletions
@@ -107,7 +107,7 @@ You can replay the collected demonstrations by running the following command:
       --dataset_file ./datasets/dataset_gr1.hdf5
 
 .. note::
-   Non-determinism may be observed during replay as physics in IsaacLab are not determimnistically reproducible when using ``env.reset``.
+   Non-determinism may be observed during replay as physics in IsaacLab are not deterministically reproducible when using ``env.reset``.
 
 
 Annotate the demonstrations
@@ -405,7 +405,7 @@ The robot picks up an object at the initial location (point A) and places it at
 AGILE is an officially supported humanoid control training pipeline that leverages the manager based environment in Isaac Lab. It will also be
 seamlessly integrated with other evaluation and deployment tools across Isaac products. This allows teams to rely on a single, maintained stack
 covering all necessary infrastructure and tooling for policy training, with easy export to real-world deployment. The AGILE repository contains
-updated pre-trained policies with separate upper and lower body policies for flexibtility. They have been verified in the real world and can be
+updated pre-trained policies with separate upper and lower body policies for flexibility. They have been verified in the real world and can be
 directly deployed. Users can also train their own locomotion or whole-body control policies using the AGILE framework.
 
 .. _generate-the-manipulation-dataset:
@@ -531,6 +531,8 @@ Visualize the trained policy performance:
 * Behavior Cloning (BC) policy success is typically 75-85% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 2000 epochs (default), depending on demonstration quality. Training takes approximately 40 minutes on a RTX ADA 6000.
 * **Recommendation:** Train for 2000 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 1000th and 2000th epochs** to select the best-performing policy. Testing various epochs is essential for finding optimal performance.
 
+.. _generate-the-dataset-with-manipulation-and-point-to-point-navigation:
+
 Generate the dataset with manipulation and point-to-point navigation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -580,7 +582,7 @@ To generate the locomanipulation dataset, use the following command:
 The key parameters for locomanipulation dataset generation are:
 
 * ``--lift_step 60``: Number of steps for the lifting phase of the manipulation task. This should mark the point immediately after the robot has grasped the object.
-* ``--navigate_step 130``: Number of steps for the navigation phase between locations. This should make the point where the robot has lifted the object and is ready to walk.
+* ``--navigate_step 130``: Number of steps for the navigation phase between locations. This should mark the point where the robot has lifted the object and is ready to walk.
 * ``--output_file``: Name of the output dataset file
 
 .. note::
@@ -600,6 +602,8 @@ This process creates a dataset where the robot performs the manipulation task at
 The data generated from this locomanipulation pipeline can also be used to finetune an imitation learning policy using GR00T N1.5.
 The following steps describe how to install GR00T, convert the dataset to LeRobot format, finetune the policy, and run rollouts in Isaac Lab.
 
+.. _finetune-groot-n15-for-locomanipulation:
+
 Finetune GR00T N1.5 policy for locomanipulation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -699,37 +703,99 @@ Optional arguments include ``--randomize_placement`` and ``--policy_quat_format
 The policy shown above uses the camera image, hand poses, hand joint positions, object pose, and base goal pose as inputs.
 The output of the model is the target base velocity, hand poses, and hand joint positions for the next several timesteps.
 
-Use NuRec Background in Locomanipulation SDG
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Integrating 3D Gaussian Splatting into SDG
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-**Prerequisites:** Generate a manipulation dataset or download a pre-recorded annotated dataset from :ref:`Generate the manipulation dataset <generate-the-manipulation-dataset>`.
+This section extends
+:ref:`locomanipulation SDG <generate-the-dataset-with-manipulation-and-point-to-point-navigation>`
+by replacing the synthetic background with a 3D Gaussian Splatting (NuRec) scene. As in the
+base pipeline, the workflow takes a manipulation dataset as input and produces a combined
+navigation and manipulation dataset as an HDF5 file — but here the robot navigates and
+manipulates objects inside a neurally-rendered environment, and an ego-centric camera
+captures the result, producing more realistic training data than a purely synthetic scene.
+NVIDIA Isaac Sim renders 3DGS models stored as USD assets; see
+`Neural Volume Rendering <https://docs.isaacsim.omniverse.nvidia.com/6.0.0/assets/usd_assets_nurec.html>`__
+for details.
 
-The `NuRec assets <https://docs.isaacsim.omniverse.nvidia.com/5.1.0/assets/usd_assets_nurec.html#neural-volume-rendering>`__
-are neural volumes reconstructed from real-world captures. When integrated into the locomanipulation SDG workflow, these
-assets allow you to generate synthetic data in photorealistic environments that mirror real-world.
+.. note::
 
-Custom NuRec Asset Requirements
-"""""""""""""""""""""""""""""""
+   This section focuses on data generation with a 3DGS background. To train a policy on the
+   generated data, see :ref:`Finetune GR00T N1.5 policy for locomanipulation <finetune-groot-n15-for-locomanipulation>`.
 
-To load a custom USD asset, ensure it meets the following specifications:
+.. note::
 
-- Neural Rendering: Include neural reconstruction for rendering.
-- Navigation: Include a pre-computed occupancy map for path planning and navigation. You can use the `Occupancy Map Generator <https://docs.isaacsim.omniverse.nvidia.com/6.0.0/digital_twin/ext_isaacsim_asset_generator_occupancy_map.html>`_ to generate the occupancy map.
-- Orientation: Transform the asset so that the ground aligns with the z=0 plane.
-- Collision Mesh (optional): If a collision mesh is included, set it to invisible.
+   The locomanipulation SDG pipeline currently runs a single environment. Parallel environment
+   support is not yet available for this workflow.
 
-Using Pre-constructed Assets
-""""""""""""""""""""""""""""
+Setup: downloading example assets
+"""""""""""""""""""""""""""""""""
+
+We provide a sample asset, ``hand_hold-voyager-babyboom``, on
+`Hugging Face <https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-NuRec/tree/main>`__.
 
-Pre-constructed assets are available via the `PhysicalAI Robotics NuRec <https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-NuRec>`__
-dataset. Some of them are captured from a humanoid-viewpoint to match the camera view of the humanoid robot.
+Log in to Hugging Face:
 
-For example, when using the asset ``hand_hold-voyager-babyboom``, the relevant files are:
+.. code:: bash
 
-- `stage.usdz <https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-NuRec/resolve/main/hand_hold-voyager-babyboom/stage.usdz>`__: a USDZ archive that bundles 3D Gaussian splatting (``volume.nurec``), a collision mesh (``mesh.usd``), etc.
-- `occupancy_map.yaml <https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-NuRec/resolve/main/hand_hold-voyager-babyboom/occupancy_map.yaml>`__ and `occupancy_map.png <https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-NuRec/resolve/main/hand_hold-voyager-babyboom/occupancy_map.png>`__: occupancy map for path planning and navigation.
+   hf auth login --token <your_huggingface_access_token>
 
-Download the files and place them under ``<PATH_TO_USD_ASSET>``, then run the following command to generate a new dataset with background:
+Download the required USDZ stage files and occupancy maps:
+
+.. code:: bash
+
+   hf download nvidia/PhysicalAI-Robotics-NuRec \
+      hand_hold-voyager-babyboom/stage_volume.usdz \
+      hand_hold-voyager-babyboom/stage_particle.usdz \
+      hand_hold-voyager-babyboom/occupancy_map.png \
+      hand_hold-voyager-babyboom/occupancy_map.yaml \
+      --repo-type dataset \
+      --local-dir <PATH_TO_USD_ASSET>
+
+The sample includes both a volume-based USD (``stage_volume.usdz``) and a particle-field USD
+(``stage_particle.usdz``). Either can be used as the background asset.
+
+Asset requirements
+""""""""""""""""""
+
+If you are using custom 3D Gaussian assets, ensure they meet these specifications to be
+compatible with the SDG pipeline:
+
+- The scene has sufficient free space (e.g. 5m x 5m) for asset placement and robot navigation.
+- The ground surface is aligned with the z=0 plane, as the pipeline assumes this elevation for
+  object placement.
+- An occupancy map is required for path planning.
+
+  - If your scene was reconstructed using the `Stereo Workflow <https://docs.nvidia.com/nurec/robotics/neural_reconstruction_stereo.html>`__,
+    the occupancy map is generated via ``nvblox``.
+  - If your background includes a mesh, use the `Occupancy Map Generator <https://docs.isaacsim.omniverse.nvidia.com/6.0.0/digital_twin/ext_isaacsim_asset_generator_occupancy_map.html>`__
+    to create a map via physical simulation.
+
+Generating the dataset
+""""""""""""""""""""""
+
+Before proceeding, ensure you have generated a manipulation dataset or downloaded the sample
+dataset provided in the
+:ref:`Generate the manipulation dataset <generate-the-manipulation-dataset>` section.
+
+Once you have gathered:
+
+- A manipulation dataset
+- A background USD asset
+- A matched occupancy map
+
+you can run the generation command. At runtime, the script adds a ground plane at ``z=0`` to
+the scene. It then proceeds through four stages:
+
+1. **Pick**: The robot picks up an object at the start location by replaying the manipulation
+   trajectory. ``--lift_step`` marks the end of this stage (immediately after grasp).
+2. **Navigate**: The robot travels to the target location using occupancy-map path planning and
+   its locomotion policy. ``--navigate_step`` marks the end of this stage (when the robot is in
+   place to release the object).
+3. **Place**: The robot places the object at the target location, completing the trajectory.
+4. **Record**: Joint states, poses, and the ego-centric video are saved to the HDF5 file
+   specified by ``--output_file``.
+
+Run the generation command:
 
 .. code:: bash
 
@@ -741,36 +807,49 @@ Download the files and place them under ``<PATH_TO_USD_ASSET>``, then run the fo
      --num_runs 1 \
      --lift_step 60 \
      --navigate_step 130 \
-     --output_file <DATASET_FOLDER>/generated_dataset_g1_locomanipulation_sdg_with_background.hdf5 \
+     --output_file <DATASET_FOLDER>/generated_dataset_g1_locomanipulation_sdg_gaussian_background.hdf5 \
      --enable_cameras \
      --visualizer kit \
-     --background_usd_path <PATH_TO_USD_ASSET>/stage.usdz \
+     --background_usd_path <PATH_TO_USD_ASSET>/stage_particle.usdz \
      --background_occupancy_yaml_file <PATH_TO_USD_ASSET>/occupancy_map.yaml \
      --randomize_placement \
      --high_res_video
 
 The key parameters are:
 
-- ``--background_usd_path``: Path to the NuRec USD asset.
+- ``--background_usd_path``: Path to the 3D Gaussian background USD asset.
 - ``--background_occupancy_yaml_file``: Path to the occupancy map file.
-- ``--high_res_video``: Generate a higher resolution video (540x960) for the ego-centric camera view.
-- ``--sensor_camera_view``: Optionally set the Sim GUI viewport to the ``robot_pov_cam`` sensor view.
+- ``--high_res_video``: Capture the ego-centric camera at 960×540 instead of the default
+  256×160.
 
-On successful task completion, an HDF5 dataset is generated containing camera observations. You can convert
-the ego-centric camera view to MP4.
+When the run completes successfully, an HDF5 dataset is generated containing camera
+observations. You can convert the ego-centric camera view to MP4:
 
 .. code:: bash
 
    ./isaaclab.sh -p scripts/tools/hdf5_to_mp4.py \
-     --input_file <DATASET_FOLDER>/generated_dataset_g1_locomanipulation_sdg_with_background.hdf5 \
+     --input_file <DATASET_FOLDER>/generated_dataset_g1_locomanipulation_sdg_gaussian_background.hdf5 \
      --output_dir <DATASET_FOLDER>/ \
      --input_keys robot_pov_cam \
      --video_width 960 \
      --video_height 540
 
+Set ``--video_width`` and ``--video_height`` to match the resolution captured during
+generation: 960×540 with ``--high_res_video``, or 256×160 without it.
+
 To play the generated MP4 video on Ubuntu, install the following multimedia packages:
 
 .. code:: bash
 
    sudo apt update
   sudo apt install libavcodec-extra gstreamer1.0-libav gstreamer1.0-plugins-ugly
+
+
+.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/locomanipulation_sdg_gaussian_background_2x.webp
+   :width: 100%
+   :align: center
+   :alt: locomanipulation SDG with a 3D Gaussian background
+   :figclass: align-center
+
+   The figure above shows recorded ego-centric camera views in the 3D Gaussian background
+   when the robot replays the pick and place trajectory and navigates to the target location.
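
Before converting to MP4, it can help to list what the generated HDF5 file actually contains. The snippet below is a minimal sketch, assuming ``h5py`` is installed and the filename matches the ``--output_file`` used in the documented command; the exact group hierarchy (for example, where the ``robot_pov_cam`` frames live) is not specified by this commit and may differ.

import h5py

# Assumed to match the --output_file used in the generation command above.
dataset_path = "generated_dataset_g1_locomanipulation_sdg_gaussian_background.hdf5"

def describe(name, item):
    # Print every dataset with its shape and dtype, e.g. the ego-centric camera frames.
    if isinstance(item, h5py.Dataset):
        print(f"{name}: shape={item.shape}, dtype={item.dtype}")

with h5py.File(dataset_path, "r") as f:
    f.visititems(describe)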

scripts/imitation_learning/locomanipulation_sdg/generate_data.py

Lines changed: 2 additions & 2 deletions
@@ -331,8 +331,8 @@ def project_robot_state_into_env(env: LocomanipulationSDGEnv, input_episode_data
     object = env.scene["object"]
     current_object_pose = torch.cat(
         [
-            torch.as_tensor(object.data.root_pos_w[0:1], device=env.device, dtype=torch.float32),
-            torch.as_tensor(object.data.root_quat_w[0:1], device=env.device, dtype=torch.float32),
+            torch.as_tensor(object.data.root_pos_w.torch[0:1], device=env.device, dtype=torch.float32),
+            torch.as_tensor(object.data.root_quat_w.torch[0:1], device=env.device, dtype=torch.float32),
         ],
         dim=-1,
     ) # (1, 7)
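
The concatenation in this hunk builds a single (1, 7) object pose from position and quaternion buffers. A standalone sketch of the same shape logic, using plain torch tensors as hypothetical stand-ins for ``root_pos_w`` and ``root_quat_w``:

import torch

# Hypothetical stand-ins for the (num_instances, 3) position and
# (num_instances, 4) quaternion buffers exposed by the asset data container.
pos_w = torch.zeros(4, 3)
quat_w = torch.tensor([[1.0, 0.0, 0.0, 0.0]]).repeat(4, 1)

# Slice the first instance and concatenate into one (1, 7) pose: [x, y, z, qw, qx, qy, qz].
current_object_pose = torch.cat([pos_w[0:1], quat_w[0:1]], dim=-1)
print(current_object_pose.shape)  # torch.Size([1, 7])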

source/isaaclab_mimic/isaaclab_mimic/locomanipulation_sdg/scene_utils.py

Lines changed: 3 additions & 1 deletion
@@ -106,7 +106,9 @@ def _get_xform_view(self) -> FrameView:
         xform_prim = self.scene[self.entity_name]
         if xform_prim.count == 0:
             # The view was created before environment cloning; rebuild it now that prims exist.
-            xform_prim = FrameView(xform_prim._prim_path, device=xform_prim.device)
+            # FabricFrameView composes UsdFrameView; the template prim_path lives on the inner USD view.
+            inner = getattr(xform_prim, "_usd_view", xform_prim)
+            xform_prim = FrameView(inner._prim_path, device=xform_prim.device)
         self.scene.extras[self.entity_name] = xform_prim
         return xform_prim
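
The fix uses a defensive attribute lookup: if the view wraps an inner USD view, read the template prim path from it, otherwise fall back to the object itself. A minimal sketch of that pattern with hypothetical classes (not the actual FrameView API):

class _UsdView:
    def __init__(self, prim_path: str):
        self._prim_path = prim_path

class _FabricView:
    """Composes an inner USD view instead of exposing _prim_path directly."""
    def __init__(self, prim_path: str):
        self._usd_view = _UsdView(prim_path)

def template_prim_path(view) -> str:
    # Prefer the inner USD view when present; otherwise assume the object itself holds _prim_path.
    inner = getattr(view, "_usd_view", view)
    return inner._prim_path

print(template_prim_path(_UsdView("/World/envs/env_0/Object")))
print(template_prim_path(_FabricView("/World/envs/env_0/Object")))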
