Feat/frame view enable mgpu #5514
Conversation
Greptile Summary

This PR lifts the `cuda:0`-only device restriction in `FabricFrameView`, enabling Fabric acceleration on any CUDA device.
Confidence Score: 5/5

Safe to merge — the core change is a targeted removal of an outdated device allowlist with a well-scoped refactor and adequate test coverage. The change is small and focused: removing the `_fabric_supported_devices` allowlist and the related device guards.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant FFV as FabricFrameView
    participant CFT as _compose_fabric_transform
    participant Warp as Warp Kernel
    participant Fabric as USDRT Fabric
    Caller->>FFV: set_world_poses(positions, orientations)
    FFV->>CFT: _compose_fabric_transform(positions, orientations)
    CFT->>FFV: _initialize_fabric() [if first call]
    FFV-->>Fabric: SelectPrims(device=self._device)
    CFT->>FFV: _prepare_for_reuse()
    FFV-->>Fabric: PrepareForReuse() [once]
    CFT->>Warp: launch compose_fabric_transformation_matrix
    Warp-->>Fabric: write worldMatrix
    CFT->>Fabric: update_world_xforms()
    Caller->>FFV: set_scales(scales)
    FFV->>CFT: _compose_fabric_transform(scales)
    CFT->>FFV: _prepare_for_reuse()
    FFV-->>Fabric: PrepareForReuse() [once]
    CFT->>Warp: launch compose_fabric_transformation_matrix
    Warp-->>Fabric: write worldMatrix
    Caller->>FFV: get_world_poses()
    FFV->>FFV: _sync_fabric_from_usd_once() [if not synced]
    FFV->>CFT: _compose_fabric_transform(pos+ori+scales)
    Note over CFT,Fabric: Single PrepareForReuse for full sync
    FFV->>Warp: launch decompose_fabric_transformation_matrix
    Warp-->>FFV: positions, orientations
```
Reviews (2): Last reviewed commit: "Enable FabricFrameView on non-primary GP..."
Force-push: ce0aaa0 to a6cd73e
- Allow FabricFrameView to run on cuda:N for any N; USDRT SelectPrims no longer needs cuda:0.
- Refactor the Fabric write path into a single _compose_fabric_transform helper shared by set_world_poses, set_scales, and the initial USD->Fabric sync, collapsing the sync to one kernel launch with one PrepareForReuse.
- Replace the topology-invariant assert with RuntimeError so it survives python -O.
- Add multi_gpu pytest marker plus cuda:1 unit-test coverage for both Fabric write paths, and run them in the existing test-multi-gpu CI job (one extra step, no new job).
Force-push: a6cd73e to 2c619fe
The standard pytest invocation in CI runs the fabric test file without filtering on the ``multi_gpu`` marker, so the ``cuda:1`` tests get scheduled on every runner including the single-GPU ones. Previously ``_skip_if_unavailable`` hard-failed via ``pytest.fail`` whenever ``GITHUB_ACTIONS=true`` and the requested device was missing, on the theory that this would catch a misconfigured multi-GPU runner. In practice it just broke the standard CI: the dedicated ``test-fabric-multi-gpu`` workflow already pre-flights ``torch.cuda.device_count() >= 2`` before invoking pytest, so a genuinely misconfigured multi-GPU runner is already caught there. Always skip rather than fail when the requested ``cuda:N`` index isn't available. Drop the now-unused ``import os``.
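The always-skip policy described in this commit can be sketched as a small pure helper. A minimal sketch, assuming a helper of this shape exists; the function name, signature, and message text are illustrative, not the PR's actual code:

```python
import re

def skip_reason(device: str, cuda_available: bool, device_count: int):
    """Return a pytest-style skip reason, or None if the test may run.

    Sketch of the always-skip policy: a missing ``cuda:N`` index always
    skips, never fails, regardless of whether we are running in CI.
    """
    if not device.startswith("cuda"):
        return None  # CPU-parameterized tests always run
    if not cuda_available:
        return "CUDA not available"
    match = re.match(r"cuda:(\d+)$", device)
    index = int(match.group(1)) if match else 0
    if index >= device_count:
        return f"{device} not available ({device_count} CUDA device(s) present)"
    return None
```

In the real test file the returned reason would be passed to `pytest.skip`; a genuinely misconfigured multi-GPU runner is still caught by the dedicated workflow's own pre-flight, as the commit message notes.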
Kit's CLI parser reads sys.argv directly at startup and segfaults on
pytest flags that collide with its own short options. Running
pytest -m multi_gpu source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py
crashes during collection because Kit sees ``-m multi_gpu`` and exits
with ``Ill formed parameter: -m`` followed by SIGSEGV (exit code 245)
inside ``simulation_app._start_app``.
Strip sys.argv to argv[0] before instantiating AppLauncher. The test
file takes no CLI arguments of its own, mirroring the broader pattern
used by ``test_tiled_camera_env.py`` which assigns
``sys.argv[1:] = args_cli.unittest_args`` after argparse.
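The stripping itself is plain Python. A minimal sketch of the pattern (the helper name is hypothetical; only the `sys.argv` handling is shown, since `AppLauncher` is an Isaac Lab class):

```python
import sys

def strip_cli_args(argv):
    """Split argv into (kept, stripped): keep only argv[0] so a parser
    that reads sys.argv at startup never sees pytest flags such as
    ``-m multi_gpu``. Helper name is illustrative, not from the PR."""
    return argv[:1], argv[1:]

# In the test file this must run before AppLauncher is instantiated:
sys.argv, _stripped = strip_cli_args(sys.argv)
```

If the test file later needed its own arguments, the stripped list could be re-parsed separately, mirroring the `sys.argv[1:] = args_cli.unittest_args` pattern mentioned above.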
wp.to_torch on a ProxyArray is deprecated in favor of the .torch accessor. Switch the three call sites that consume the ProxyArray returned by get_world_poses; leave get_scales call sites alone since that method still returns a raw wp.array (no .torch accessor).
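The shape of that migration (a free-function conversion replaced by an accessor property) can be mocked without Warp or torch. `ProxyArray`'s real `.torch` accessor is taken from the note above; everything else here is a stand-in:

```python
class FakeProxyArray:
    """Hypothetical stand-in for the ProxyArray described above."""

    def __init__(self, data):
        self._data = list(data)

    @property
    def torch(self):
        # The real accessor returns a torch tensor view of the array;
        # this stand-in returns the raw data to keep the sketch runnable.
        return self._data

poses = FakeProxyArray([1.0, 2.0, 3.0])
# before (deprecated): wp.to_torch(poses)
# after:
tensor = poses.torch
```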
Review Summary
Clean, well-motivated PR that removes an artificial device restriction. The refactoring of set_world_poses and set_scales into a single _compose_fabric_transform helper is a nice improvement — it ensures PrepareForReuse is only called once per logical update regardless of which components are being set.
Strengths
- Single kernel launch for combined updates — `_sync_fabric_from_usd_once` now does pos+orient+scale in one pass instead of two separate calls
- assert → RuntimeError — topology guard won't be optimized away with `python -O`
- Thorough multi-GPU test coverage with proper skip/fail semantics depending on CI vs local
- CI workflow integration — dedicated `test-fabric-multi-gpu` job triggered by relevant file changes
- sys.argv stripping — prevents Kit segfault from pytest flags
Minor Notes
See inline comments.
```python
dummy3 = wp.zeros((0, 3), dtype=wp.float32, device=self._device)
dummy4 = wp.zeros((0, 4), dtype=wp.float32, device=self._device)
scales_wp = _to_float32_2d(scales)
empty3 = wp.zeros((0, 3), dtype=wp.float32, device=self._device)
```
💡 Good consolidation: Merging the old set_world_poses Fabric path and set_scales Fabric path into _compose_fabric_transform is a clean refactor. The "empty array means skip" convention with the kernel is elegant.
One subtle point: _sync_fabric_from_usd_once now also calls _compose_fabric_transform (which calls _prepare_for_reuse). Previously it called set_world_poses + set_scales which would invoke _prepare_for_reuse twice. The consolidation to one call is strictly better — worth noting in the changelog (which you did ✓).
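The "empty array means skip" convention can be illustrated with a dependency-free sketch. In the PR, `_compose_fabric_transform` applies this inside a single Warp kernel launch; the function and field names below are hypothetical:

```python
def compose_components(current, positions, orientations, scales):
    """Update only the components passed with nonzero length; an empty
    array means 'leave this component unchanged'. Sketch only -- the
    real code composes worldMatrix entries in a Warp kernel with one
    PrepareForReuse per logical update."""
    out = dict(current)
    if positions:
        out["pos"] = positions
    if orientations:
        out["quat"] = orientations
    if scales:
        out["scale"] = scales
    return out

state = {
    "pos": [[0.0, 0.0, 0.0]],
    "quat": [[1.0, 0.0, 0.0, 0.0]],
    "scale": [[1.0, 1.0, 1.0]],
}
# A scales-only update: empty position/orientation arrays are skipped.
state = compose_components(state, [], [], [[2.0, 2.0, 2.0]])
```

A full sync simply passes all three components, which is why the consolidation collapses it to one kernel launch.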
```python
if not device.startswith("cuda"):
    return
if not torch.cuda.is_available():
    pytest.skip("CUDA not available")
```
🟡 Skip-vs-fail comment conflicts with implementation: The code comment says "Always skip rather than fail" and explains why. But the PR description says:
> In CI (`GITHUB_ACTIONS=true`), a missing `cuda:1` becomes `pytest.fail`
The actual code here always skips. Was the CI-fail behavior removed during development? The PR description should be updated to match the final implementation if it changed.
@@ -17,6 +17,12 @@
💡 sys.argv stripping: This is a pragmatic workaround for Kit's argv parsing. Consider adding a note that this must remain before AppLauncher — if someone reorders imports or adds a CLI parser above, the protection breaks.
Alternatively, AppLauncher itself could strip unknown flags before passing to Kit (upstream fix), but that's outside this PR's scope.
Description
Removes the `cuda:0`-only restriction in `FabricFrameView`. USDRT `SelectPrims` now accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g., `cuda:1`) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU.

The change itself is small (drops the device guard in `__init__`, the assertion in `_initialize_fabric`, and the `_fabric_supported_devices` allowlist). The bulk of the diff is multi-GPU test coverage — three `cuda:1`-parameterized tests guarded by a new `multi_gpu` pytest marker, plus a dedicated CI job on the multi-GPU runner so regressions show up on PRs that touch `FabricFrameView`.

The skip-vs-fail logic in `_skip_if_unavailable` is intentional: locally, `cuda:1` tests `pytest.skip` with a warning so local runs stay green; in CI (`GITHUB_ACTIONS=true`), a missing `cuda:1` becomes `pytest.fail` so a misconfigured runner is caught immediately rather than silently green-lighting every PR.

Stacked on: #5380. Merge that one first; this PR contains only the multi-GPU enablement on top of it.
Type of change
`cuda:0` continues to work exactly as before; `cuda:1`+ now also works instead of silently falling back to USD. No public API surface changed.

Checklist
- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- I have added my name to `CONTRIBUTORS.md` or my name already exists there
Three new tests, all marked `@pytest.mark.multi_gpu` and parameterized with `["cuda:1"]`:

- `test_fabric_cuda1_world_pose_roundtrip` — `set_world_poses` → `get_world_poses` returns the same values on a non-primary CUDA device.
- `test_fabric_cuda1_no_usd_writeback` — Fabric writes on `cuda:1` do not write back to USD (`atol=0.0` — equality, not approximate).
- `test_fabric_cuda1_scales_roundtrip` — covers the `set_scales` write path on `cuda:1`, since both Fabric write paths now run on `self._device`.

A new CI job, `test-fabric-multi-gpu`, runs in `.github/workflows/test-multi-gpu.yaml` on the existing `[self-hosted, linux, x64, gpu, multi-gpu]` runner. The job pre-flights with `./isaaclab.sh -p -c "import torch; print(torch.cuda.device_count())"` and fails loudly with `::error::` if the runner regresses to a single GPU. Triggered automatically on PRs that touch `source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py` or its test file.

To verify locally on a multi-GPU machine:

```shell
./isaaclab.sh -p -m pytest -m multi_gpu \
  source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v
```

To verify the cuda:0 path is unchanged:

```shell
./isaaclab.sh -p -m pytest -m "not multi_gpu" \
  source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v
```
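The pre-flight step in the CI job can be sketched in shell. In the real workflow the count comes from `torch.cuda.device_count()`; here it is passed in as a parameter so the logic is exercisable without a GPU, and the function name is hypothetical:

```shell
#!/bin/sh
# Emit a GitHub Actions ::error:: annotation and fail if the runner
# reports fewer than 2 GPUs; the count is a parameter in this sketch.
check_gpu_count() {
  count="$1"
  if [ "$count" -lt 2 ]; then
    echo "::error::multi-gpu runner reports only $count GPU(s)"
    return 1
  fi
  echo "pre-flight ok: $count GPUs"
}

check_gpu_count "${GPU_COUNT:-2}"
```

A failing pre-flight stops the job before pytest runs, which is why the test file itself can always skip rather than fail on a missing device.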