Feat/frame view enable mgpu #5514
base: develop
Changes from all commits: 2c619fe, a35a934, 0da3c0a, 72d56c1, b3b770d
@@ -0,0 +1,27 @@
Changed
^^^^^^^

* Combined the initial USD→Fabric sync in
  :class:`~isaaclab_physx.sim.views.FabricFrameView` into a single Fabric
  write so ``PrepareForReuse`` is invoked exactly once per logical update
  (positions, orientations, and scales are composed in one kernel launch).
  This avoids the possibility of a second non-idempotent
  ``PrepareForReuse`` call masking a topology-change signal that should
  have triggered a fabricarray rebuild.
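Editor's sketch (not part of the changelog file): the "one kernel launch" claim above could look roughly like the Warp kernel below. The kernel name, the row-vector matrix convention, and the use of `wp.quat_to_matrix` are assumptions for illustration, not code from this PR.

```python
import warp as wp

@wp.kernel
def _compose_world_matrices(  # hypothetical name
    positions: wp.array(dtype=wp.vec3),
    orientations: wp.array(dtype=wp.quat),
    scales: wp.array(dtype=wp.vec3),
    out: wp.array(dtype=wp.mat44),
):
    i = wp.tid()
    r = wp.quat_to_matrix(orientations[i])  # 3x3 rotation from the quaternion
    s = scales[i]
    p = positions[i]
    # Compose scale, rotation, and translation into one row-vector-convention
    # matrix (translation in the last row, as USD/Fabric store it). The real
    # code may need a transpose depending on quat_to_matrix's convention.
    out[i] = wp.mat44(
        s[0] * r[0, 0], s[0] * r[0, 1], s[0] * r[0, 2], 0.0,
        s[1] * r[1, 0], s[1] * r[1, 1], s[1] * r[1, 2], 0.0,
        s[2] * r[2, 0], s[2] * r[2, 1], s[2] * r[2, 2], 0.0,
        p[0], p[1], p[2], 1.0,
    )
```

Because all three channels land in one launch, the surrounding code needs only a single `PrepareForReuse` before the write, which is exactly the property this entry describes.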
Fixed
^^^^^

* Fixed :class:`~isaaclab_physx.sim.views.FabricFrameView` falling back to
  the slow USD path on every CUDA device other than ``cuda:0``. USDRT
  ``SelectPrims`` now accepts any CUDA device index, so Fabric acceleration
  runs on the simulation device the view was constructed with (e.g.
  ``cuda:1``). This unblocks distributed training where each rank is
  pinned to a non-primary GPU.
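Editor's sketch: the routing this fix implies could be expressed as the helper below. The helper name and fallback policy are illustrative guesses, not the PR's code.

```python
import torch

def _fabric_device_for(device: str) -> str | None:
    """Hypothetical helper: return the CUDA device string to hand to USDRT
    ``SelectPrims``, or ``None`` to fall back to the USD path."""
    if not device.startswith("cuda"):
        return None  # CPU views keep the USD path
    idx = int(device.split(":")[1]) if ":" in device else 0
    if idx >= torch.cuda.device_count():
        return None  # requested index does not exist on this host
    return f"cuda:{idx}"  # previously this was effectively pinned to cuda:0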
* Fixed the topology-change invariant guard in
  :class:`~isaaclab_physx.sim.views.FabricFrameView` not surviving
  ``python -O``. The check now raises :class:`RuntimeError` instead of
  using ``assert``, so the prim-count mismatch between view and Fabric is
  reported at every optimisation level rather than silently producing
  wrong poses or out-of-bounds kernel indices.
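Editor's sketch: the distinction matters because `python -O` strips `assert` statements entirely. A minimal version of the surviving guard (function name and message illustrative, not the PR's exact code):

```python
def _check_prim_count(view_count: int, fabric_count: int) -> None:
    # Unlike ``assert``, an explicit raise survives ``python -O`` / ``-OO``.
    if view_count != fabric_count:
        raise RuntimeError(
            f"FabricFrameView prim-count mismatch: view has {view_count} prims "
            f"but Fabric selected {fabric_count}; the fabricarray likely needs "
            "a rebuild after a topology change."
        )
```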
@@ -17,6 +17,12 @@
from isaaclab.app import AppLauncher

# Kit reads ``sys.argv`` directly during startup and segfaults on pytest flags
# (e.g. ``-m multi_gpu``) that collide with its own short options. Strip
# everything but ``argv[0]`` before booting the app — the test file takes no
# CLI arguments of its own.
sys.argv = sys.argv[:1]

simulation_app = AppLauncher(headless=True).app

import pytest  # noqa: E402
@@ -44,8 +50,19 @@ def test_setup_teardown():
def _skip_if_unavailable(device: str):
    if not device.startswith("cuda"):
        return
    if not torch.cuda.is_available():
        pytest.skip("CUDA not available")
💬 Review comment:

🟡 Skip-vs-fail comment conflicts with implementation: the code comment below says "Always skip rather than fail" and explains why, but the PR description describes a CI-fail behavior. The actual code here always skips. Was the CI-fail behavior removed during development? If so, the PR description should be updated to match the final implementation.
    idx = int(device.split(":")[1]) if ":" in device else 0
    n = torch.cuda.device_count()
    if idx >= n:
        # Always skip rather than fail: the dedicated multi-GPU workflow does its own
        # pre-flight ``torch.cuda.device_count() >= 2`` check before invoking pytest, so
        # a misconfigured multi-GPU runner is already caught there. Failing here would
        # only break the standard single-GPU CI runners that legitimately can't run
        # ``cuda:1+`` tests.
        pytest.skip(f"{device} not available (device_count={n}) — multi-GPU test skipped")
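Editor's sketch: the pre-flight check the code comment refers to is outside this diff; it could be as small as the following (script name and wording assumed):

```python
# run_multi_gpu_tests.py (hypothetical): executed by the dedicated multi-GPU
# workflow before pytest, so a misconfigured runner fails loudly up front.
import sys

import torch

if torch.cuda.device_count() < 2:
    sys.exit(f"multi-GPU workflow needs >= 2 CUDA devices, found {torch.cuda.device_count()}")
```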
# ------------------------------------------------------------------

@@ -169,7 +186,7 @@ def test_fabric_set_world_does_not_write_back_to_usd(device, view_factory):
    # Verify Fabric has the new position
    fab_pos, _ = view.get_world_poses()
    pos_torch = fab_pos.torch
    assert torch.allclose(pos_torch, torch.tensor([[99.0, 99.0, 99.0]], device=device), atol=0.1), (
        f"Fabric should have new position, got {pos_torch}"
    )
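Editor's sketch: this hunk swaps `wp.to_torch(fab_pos)` for `fab_pos.torch`, which suggests `get_world_poses` now returns a wrapper exposing a torch view of the underlying Warp buffer. A hypothetical shape for such a wrapper (the actual return type is not shown in the diff):

```python
import torch
import warp as wp

class _PoseArray:
    """Hypothetical wrapper around a Warp array returned by the view."""

    def __init__(self, data: wp.array):
        self.data = data

    @property
    def torch(self) -> torch.Tensor:
        # Zero-copy: the returned tensor aliases the Warp array's memory
        # on the same device.
        return wp.to_torch(self.data)
```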
@@ -230,6 +247,83 @@ def force_topology_changed():
    # Read back — proves the rebuilt _view_to_fabric and _fabric_world_matrices
    # are still consistent.
    ret_pos, _ = view.get_world_poses()
    pos_torch = ret_pos.torch
    expected = torch.tensor([[4.0, 5.0, 6.0], [4.0, 5.0, 6.0]], device=device)
    assert torch.allclose(pos_torch, expected, atol=1e-7), f"Read after rebuild failed on {device}: {pos_torch}"
# ------------------------------------------------------------------
# Multi-GPU tests (cuda:1) — skipped automatically on single-GPU workstations
# ------------------------------------------------------------------
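Editor's sketch: for `pytest -m multi_gpu` (or `-m "not multi_gpu"`) selection to work without an unknown-mark warning, the marker has to be registered somewhere. A conftest.py sketch, assuming it is not already declared in pytest.ini or pyproject.toml:

```python
# conftest.py (sketch): register the custom marker used by the tests below.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "multi_gpu: tests that require two or more CUDA devices"
    )
```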
@pytest.mark.multi_gpu
@pytest.mark.parametrize("device", ["cuda:1"])
def test_fabric_cuda1_world_pose_roundtrip(device, view_factory):
    """set_world_poses -> get_world_poses roundtrip works on cuda:1.

    Verifies that FabricFrameView operates correctly on a non-primary CUDA
    device without falling back to the USD path.
    """
    bundle = view_factory(2, device)
    view = bundle.view

    new_pos = wp.zeros((2, 3), dtype=wp.float32, device=device)
    wp.launch(kernel=_fill_position, dim=2, inputs=[new_pos, 10.0, 20.0, 30.0], device=device)
    view.set_world_poses(positions=new_pos)

    ret_pos, _ = view.get_world_poses()
    pos_torch = ret_pos.torch
    expected = torch.tensor([[10.0, 20.0, 30.0], [10.0, 20.0, 30.0]], device=device)
    assert torch.allclose(pos_torch, expected, atol=1e-7), f"Roundtrip failed on {device}: {pos_torch}"
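Editor's sketch: `_fill_position` is defined earlier in the test file, outside the hunks shown here. A minimal definition consistent with its call sites (an `(n, 3)` float32 output plus three scalar components) might be:

```python
import warp as wp

@wp.kernel
def _fill_position(out: wp.array2d(dtype=wp.float32), x: float, y: float, z: float):
    # One thread per row: write the same (x, y, z) into every row of the
    # (n, 3) buffer allocated by the caller.
    i = wp.tid()
    out[i, 0] = x
    out[i, 1] = y
    out[i, 2] = z
```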
@pytest.mark.multi_gpu
@pytest.mark.parametrize("device", ["cuda:1"])
def test_fabric_cuda1_no_usd_writeback(device, view_factory):
    """set_world_poses on cuda:1 does not write back to USD.

    Mirrors test_fabric_set_world_does_not_write_back_to_usd for the cuda:1
    device to confirm the no-writeback invariant holds across GPU indices.
    """
    bundle = view_factory(1, device)
    view = bundle.view

    stage = sim_utils.get_current_stage()
    prim = stage.GetPrimAtPath(view.prim_paths[0])
    xform_cache = UsdGeom.XformCache()
    t_before = xform_cache.GetLocalToWorldTransform(prim).ExtractTranslation()
    orig_usd_pos = torch.tensor([float(t_before[0]), float(t_before[1]), float(t_before[2])])

    new_pos = wp.zeros((1, 3), dtype=wp.float32, device=device)
    wp.launch(kernel=_fill_position, dim=1, inputs=[new_pos, 99.0, 99.0, 99.0], device=device)
    view.set_world_poses(positions=new_pos)

    # USD must not have moved at all — equality, not approximate.
    t_after = UsdGeom.XformCache().GetLocalToWorldTransform(prim).ExtractTranslation()
    usd_pos_after = torch.tensor([float(t_after[0]), float(t_after[1]), float(t_after[2])])
    assert torch.allclose(usd_pos_after, orig_usd_pos, atol=0.0), (
        f"USD wrote back on {device}: expected {orig_usd_pos}, got {usd_pos_after}"
    )
@pytest.mark.multi_gpu
@pytest.mark.parametrize("device", ["cuda:1"])
def test_fabric_cuda1_scales_roundtrip(device, view_factory):
    """set_scales -> get_scales roundtrip works on cuda:1.

    Both write paths (``set_world_poses`` and ``set_scales``) call
    ``_prepare_for_reuse`` and launch on ``self._device``; this test covers
    the scales path on the non-primary CUDA device.
    """
    bundle = view_factory(2, device)
    view = bundle.view

    new_scales = wp.zeros((2, 3), dtype=wp.float32, device=device)
    wp.launch(kernel=_fill_position, dim=2, inputs=[new_scales, 2.0, 3.0, 4.0], device=device)
    view.set_scales(new_scales)

    ret_scales = view.get_scales()
    scales_torch = wp.to_torch(ret_scales)
    expected = torch.tensor([[2.0, 3.0, 4.0], [2.0, 3.0, 4.0]], device=device)
    assert torch.allclose(scales_torch, expected, atol=1e-7), f"Scales roundtrip failed on {device}: {scales_torch}"
💬 Review comment:

💡 Good consolidation: merging the old `set_world_poses` Fabric path and `set_scales` Fabric path into `_compose_fabric_transform` is a clean refactor. The "empty array means skip" convention with the kernel is elegant.

One subtle point: `_sync_fabric_from_usd_once` now also calls `_compose_fabric_transform` (which calls `_prepare_for_reuse`). Previously it called `set_world_poses` + `set_scales`, which would invoke `_prepare_for_reuse` twice. The consolidation to one call is strictly better — worth noting in the changelog (which you did ✓).
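Editor's sketch: reading between the lines of this comment, the consolidated write path could look roughly like the function below. The wrapper body and kernel name are guesses assembled from names mentioned in this thread, not the PR's code.

```python
import warp as wp

def compose_fabric_transform(view, positions, orientations, scales):
    """Hypothetical shape of ``_compose_fabric_transform``.

    Callers pass a zero-length array for any channel they do not want to
    write ("empty array means skip"); the kernel checks ``arr.shape[0]``
    per channel and leaves that part of the world matrix untouched.
    """
    view._prepare_for_reuse()  # invoked exactly once per logical update
    wp.launch(
        kernel=_compose_kernel,  # assumed kernel honoring the empty-skip rule
        dim=view.count,
        inputs=[positions, orientations, scales, view._fabric_world_matrices],
        device=view._device,
    )
```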