Feat/frame view enable mgpu #5514

Draft
pv-nvidia wants to merge 5 commits into isaac-sim:develop from pv-nvidia:feat/frame-view-enable-mgpu
Conversation

@pv-nvidia

Description

Removes the cuda:0-only restriction in FabricFrameView. USDRT SelectPrims now accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g., cuda:1) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU.

The change itself is small (drops the device guard in __init__, the assertion in _initialize_fabric, and the _fabric_supported_devices allowlist). The bulk of the diff is multi-GPU test coverage — three cuda:1-parameterized tests guarded by a new multi_gpu pytest marker, plus a dedicated CI job on the multi-GPU runner so regressions show up on PRs that touch FabricFrameView.

The skip-vs-fail logic in _skip_if_unavailable is intentional:

  • On a developer workstation with a single GPU, cuda:1 tests pytest.skip with a warning so local runs stay green.
  • In CI (GITHUB_ACTIONS=true), a missing cuda:1 becomes pytest.fail so a misconfigured runner is caught immediately rather than silently green-lighting every PR.
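The two bullets above reduce to a small decision function. The sketch below is illustrative only: the actual `_skip_if_unavailable` calls `pytest.skip()`/`pytest.fail()` directly and reads `torch.cuda.device_count()` and the `GITHUB_ACTIONS` environment variable itself, so the function name and return values here are assumptions.

```python
def gpu_test_action(device: str, available_gpus: int, in_ci: bool) -> str:
    """Return 'run', 'skip', or 'fail' for a requested test device.

    Illustrative stand-in for the skip-vs-fail logic; the real helper
    raises via pytest instead of returning a string.
    """
    if not device.startswith("cuda"):
        return "run"  # non-CUDA tests are never gated on GPU availability
    index = int(device.split(":")[1]) if ":" in device else 0
    if index < available_gpus:
        return "run"
    # Requested GPU index is missing:
    # CI -> fail loudly (misconfigured runner); local -> stay green.
    return "fail" if in_ci else "skip"
```

On a single-GPU workstation `gpu_test_action("cuda:1", 1, False)` yields `"skip"`, while the same situation with `in_ci=True` yields `"fail"`.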

Stacked on: #5380. Merge that one first; this PR contains only the multi-GPU enablement on top of it.

Type of change

  • New feature (non-breaking change which adds functionality)

cuda:0 continues to work exactly as before; cuda:1+ now also works instead of silently falling back to USD. No public API surface changed.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Note on the changelog item: this PR uses a fragment file at source/isaaclab_physx/changelog.d/feat-frame-view-enable-mgpu.rst per the new fragment-based changelog system (#5434). CHANGELOG.rst and config/extension.toml are intentionally not edited directly — the nightly CI workflow compiles them from fragments.

Test plan

Three new tests, all marked @pytest.mark.multi_gpu and parameterized with ["cuda:1"]:

  • test_fabric_cuda1_world_pose_roundtrip — set_world_poses followed by get_world_poses returns the same values on a non-primary CUDA device.
  • test_fabric_cuda1_no_usd_writeback — Fabric writes on cuda:1 do not write back to USD (atol=0.0 — equality, not approximate).
  • test_fabric_cuda1_scales_roundtrip — covers the set_scales write path on cuda:1, since both Fabric write paths now run on self._device.

A new CI job, test-fabric-multi-gpu, runs in .github/workflows/test-multi-gpu.yaml on the existing [self-hosted, linux, x64, gpu, multi-gpu] runner. The job pre-flights with ./isaaclab.sh -p -c "import torch; print(torch.cuda.device_count())" and fails loudly with ::error:: if the runner regresses to a single GPU. Triggered automatically on PRs that touch source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py or its test file.
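A preflight step of roughly this shape would implement the check described above. This is a hypothetical sketch: the step name and workflow layout are assumptions, and only the `./isaaclab.sh` command comes from the PR description.

```yaml
# Hypothetical GPU-count preflight for the test-fabric-multi-gpu job.
- name: Verify multi-GPU runner
  run: |
    count="$(./isaaclab.sh -p -c 'import torch; print(torch.cuda.device_count())')"
    if [ "${count}" -lt 2 ]; then
      echo "::error::multi-gpu runner reports ${count} GPU(s); expected >= 2"
      exit 1
    fi
```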

To verify locally on a multi-GPU machine:

./isaaclab.sh -p -m pytest -m multi_gpu \
    source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v

To verify the cuda:0 path is unchanged:

./isaaclab.sh -p -m pytest -m "not multi_gpu" \
    source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v

@github-actions github-actions Bot added the isaac-lab (Related to Isaac Lab team) and infrastructure labels May 6, 2026
@pv-nvidia pv-nvidia marked this pull request as draft May 6, 2026 12:32
@pv-nvidia pv-nvidia self-assigned this May 6, 2026

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR lifts the cuda:0-only restriction in FabricFrameView by removing the _fabric_supported_devices allowlist and the constructor device guard, allowing Fabric GPU acceleration to run on any CUDA device index. It also refactors the two write paths (set_world_poses and set_scales) into a single _compose_fabric_transform helper so PrepareForReuse is invoked exactly once per logical update, and hardens the topology-change guard from a strippable assert to a RuntimeError.

  • Core fix: _fabric_supported_devices tuple and its guard removed; SelectPrims is now called with device=self._device regardless of GPU index, unblocking distributed training where each rank is pinned to a non-primary GPU.
  • Refactor: _compose_fabric_transform consolidates position/orientation/scale writes into one kernel launch and one PrepareForReuse call, resolving the double-invocation risk identified in a prior review.
  • Test coverage: Three new @pytest.mark.multi_gpu tests for cuda:1 with a CI-aware skip-vs-fail mechanism; a new test-fabric-multi-gpu CI job on the [multi-gpu] runner tied to path triggers on fabric_frame_view.py and its test file.
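The assert-to-RuntimeError hardening mentioned above can be illustrated with a minimal stand-in. The class and attribute names below are assumptions made for the example; the real check lives inside _rebuild_fabric_arrays.

```python
class FabricTopologyGuard:
    """Minimal stand-in for the topology check in _rebuild_fabric_arrays."""

    def __init__(self, expected_count: int):
        self._count = expected_count

    def check(self, prim_paths: list) -> None:
        # A bare `assert` is stripped under `python -O`; raising
        # RuntimeError keeps the invariant enforced in optimized runs.
        if len(prim_paths) != self._count:
            raise RuntimeError(
                f"Fabric view topology changed: expected {self._count} "
                f"prims, found {len(prim_paths)}"
            )
```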

Confidence Score: 5/5

Safe to merge — the core change is a targeted removal of an outdated device allowlist with a well-scoped refactor and adequate test coverage.

The change is small and focused: removing the _fabric_supported_devices guard and forwarding self._device directly to SelectPrims is low-risk. The _compose_fabric_transform consolidation correctly reduces PrepareForReuse calls to one per write. Three new parameterized tests cover the critical cuda:1 paths. The only findings are a stale docstring and a missing CI preflight step that is already covered at the test level.

.github/workflows/test-multi-gpu.yaml — the new test-fabric-multi-gpu job could benefit from an explicit GPU-count preflight step for consistency with the sibling test-multi-gpu job.

Important Files Changed

Filename Overview
source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py Removes _fabric_supported_devices allowlist and drops the cuda:0-only guard; refactors set_world_poses/set_scales into a shared _compose_fabric_transform that invokes PrepareForReuse exactly once per write; replaces bare assert with RuntimeError in _rebuild_fabric_arrays. One stale docstring on the device parameter.
source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py Adds three @pytest.mark.multi_gpu tests for cuda:1 (world pose roundtrip, no-USD-writeback, scales roundtrip). Updates _skip_if_unavailable to handle indexed CUDA devices and promote skips to failures in CI. Clean and well-structured.
.github/workflows/test-multi-gpu.yaml Adds test-fabric-multi-gpu job and path triggers for fabric_frame_view.py and its test file. New job is missing the GPU-count pre-flight check that the sibling test-multi-gpu job uses, contrary to the PR description.
pyproject.toml Registers new multi_gpu pytest marker with a descriptive label. Straightforward configuration change.
source/isaaclab_physx/changelog.d/feat-frame-view-enable-mgpu.rst New changelog fragment covering the multi-GPU enablement, the PrepareForReuse-once fix, and the assert→RuntimeError hardening. Accurate and complete.
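The marker registration in pyproject.toml presumably takes the standard pytest form; the exact description string below is an assumption.

```toml
[tool.pytest.ini_options]
markers = [
    "multi_gpu: tests that require two or more CUDA devices",
]
```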

Sequence Diagram

sequenceDiagram
    participant Caller
    participant FFV as FabricFrameView
    participant CFT as _compose_fabric_transform
    participant Warp as Warp Kernel
    participant Fabric as USDRT Fabric

    Caller->>FFV: set_world_poses(positions, orientations)
    FFV->>CFT: _compose_fabric_transform(positions, orientations)
    CFT->>FFV: _initialize_fabric() [if first call]
    FFV-->>Fabric: SelectPrims(device=self._device)
    CFT->>FFV: _prepare_for_reuse()
    FFV-->>Fabric: PrepareForReuse() [once]
    CFT->>Warp: launch compose_fabric_transformation_matrix
    Warp-->>Fabric: write worldMatrix
    CFT->>Fabric: update_world_xforms()

    Caller->>FFV: set_scales(scales)
    FFV->>CFT: _compose_fabric_transform(scales)
    CFT->>FFV: _prepare_for_reuse()
    FFV-->>Fabric: PrepareForReuse() [once]
    CFT->>Warp: launch compose_fabric_transformation_matrix
    Warp-->>Fabric: write worldMatrix

    Caller->>FFV: get_world_poses()
    FFV->>FFV: _sync_fabric_from_usd_once() [if not synced]
    FFV->>CFT: _compose_fabric_transform(pos+ori+scales)
    Note over CFT,Fabric: Single PrepareForReuse for full sync
    FFV->>Warp: launch decompose_fabric_transformation_matrix
    Warp-->>FFV: positions, orientations


@pv-nvidia pv-nvidia force-pushed the feat/frame-view-enable-mgpu branch 3 times, most recently from ce0aaa0 to a6cd73e Compare May 6, 2026 17:10
- Allow FabricFrameView to run on cuda:N for any N; USDRT SelectPrims
  no longer needs cuda:0.
- Refactor the Fabric write path into a single _compose_fabric_transform
  helper shared by set_world_poses, set_scales, and the initial
  USD->Fabric sync, collapsing the sync to one kernel launch with one
  PrepareForReuse.
- Replace the topology-invariant assert with RuntimeError so it survives
  python -O.
- Add multi_gpu pytest marker plus cuda:1 unit-test coverage for both
  Fabric write paths, and run them in the existing test-multi-gpu CI
  job (one extra step, no new job).
@pv-nvidia pv-nvidia force-pushed the feat/frame-view-enable-mgpu branch from a6cd73e to 2c619fe Compare May 7, 2026 08:44
pv-nvidia added 4 commits May 7, 2026 11:15
The standard pytest invocation in CI runs the fabric test file without
filtering on the ``multi_gpu`` marker, so the ``cuda:1`` tests get
scheduled on every runner including the single-GPU ones.  Previously
``_skip_if_unavailable`` hard-failed via ``pytest.fail`` whenever
``GITHUB_ACTIONS=true`` and the requested device was missing, on the
theory that this would catch a misconfigured multi-GPU runner.  In
practice it just broke the standard CI: the dedicated
``test-fabric-multi-gpu`` workflow already pre-flights
``torch.cuda.device_count() >= 2`` before invoking pytest, so a
genuinely misconfigured multi-GPU runner is already caught there.

Always skip rather than fail when the requested ``cuda:N`` index isn't
available.  Drop the now-unused ``import os``.
Kit's CLI parser reads sys.argv directly at startup and segfaults on
pytest flags that collide with its own short options.  Running

    pytest -m multi_gpu source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py

crashes during collection because Kit sees ``-m multi_gpu`` and exits
with ``Ill formed parameter: -m`` followed by SIGSEGV (exit code 245)
inside ``simulation_app._start_app``.

Strip sys.argv to argv[0] before instantiating AppLauncher.  The test
file takes no CLI arguments of its own, mirroring the broader pattern
used by ``test_tiled_camera_env.py`` which assigns
``sys.argv[1:] = args_cli.unittest_args`` after argparse.
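The workaround described in this commit amounts to a one-liner placed before AppLauncher is constructed. The sketch below names a helper for clarity; the helper name is hypothetical, and the essential constraint is that it runs before Kit reads sys.argv.

```python
import sys

def strip_cli_args() -> None:
    """Drop everything but argv[0] so Kit's CLI parser never sees
    pytest flags such as `-m multi_gpu`, which crash its startup."""
    sys.argv[:] = sys.argv[:1]

# Must run before AppLauncher(...) is instantiated; any CLI parser
# added above this point would re-expose the flags to Kit.
strip_cli_args()
```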
wp.to_torch on a ProxyArray is deprecated in favor of the .torch
accessor.  Switch the three call sites that consume the ProxyArray
returned by get_world_poses; leave get_scales call sites alone since
that method still returns a raw wp.array (no .torch accessor).

@isaaclab-review-bot isaaclab-review-bot Bot left a comment


Review Summary

Clean, well-motivated PR that removes an artificial device restriction. The refactoring of set_world_poses and set_scales into a single _compose_fabric_transform helper is a nice improvement — it ensures PrepareForReuse is only called once per logical update regardless of which components are being set.

Strengths

  • Single kernel launch for combined updates — _sync_fabric_from_usd_once now does pos+orient+scale in one pass instead of two separate calls
  • assert → RuntimeError — Topology guard won't be optimized away with python -O
  • Thorough multi-GPU test coverage with proper skip/fail semantics depending on CI vs local
  • CI workflow integration — dedicated test-fabric-multi-gpu job triggered by relevant file changes
  • sys.argv stripping — prevents Kit segfault from pytest flags

Minor Notes

See inline comments.

dummy3 = wp.zeros((0, 3), dtype=wp.float32, device=self._device)
dummy4 = wp.zeros((0, 4), dtype=wp.float32, device=self._device)
scales_wp = _to_float32_2d(scales)
empty3 = wp.zeros((0, 3), dtype=wp.float32, device=self._device)

💡 Good consolidation: Merging the old set_world_poses Fabric path and set_scales Fabric path into _compose_fabric_transform is a clean refactor. The "empty array means skip" convention with the kernel is elegant.

One subtle point: _sync_fabric_from_usd_once now also calls _compose_fabric_transform (which calls _prepare_for_reuse). Previously it called set_world_poses + set_scales which would invoke _prepare_for_reuse twice. The consolidation to one call is strictly better — worth noting in the changelog (which you did ✓).

if not device.startswith("cuda"):
return
if not torch.cuda.is_available():
pytest.skip("CUDA not available")

🟡 Skip-vs-fail comment conflicts with implementation: The code comment says "Always skip rather than fail" and explains why. But the PR description says:

In CI (GITHUB_ACTIONS=true), a missing cuda:1 becomes pytest.fail

The actual code here always skips. Was the CI-fail behavior removed during development? The PR description should be updated to match the final implementation if it changed.

@@ -17,6 +17,12 @@


💡 sys.argv stripping: This is a pragmatic workaround for Kit's argv parsing. Consider adding a note that this must remain before AppLauncher — if someone reorders imports or adds a CLI parser above, the protection breaks.

Alternatively, AppLauncher itself could strip unknown flags before passing to Kit (upstream fix), but that's outside this PR's scope.
