Implement Fabric-aware get/set_local_poses via indexedfabricarray#2
Open
pv-nvidia wants to merge 9 commits intofix/fabric-prepare-for-reusefrom
Open
Implement Fabric-aware get/set_local_poses via indexedfabricarray#2pv-nvidia wants to merge 9 commits intofix/fabric-prepare-for-reusefrom
pv-nvidia wants to merge 9 commits intofix/fabric-prepare-for-reusefrom
Conversation
Compute local poses from Fabric world matrices instead of falling back to stale USD data. This fixes the inconsistency where set_world_poses() modified Fabric worldMatrix but get_local_poses() still read from USD, returning stale values (Issue isaac-sim#5). How it works: - get_local_poses: reads child world pose from Fabric, parent world pose from USD, computes local = inv(parent) * child - set_local_poses: reads parent world from USD, computes child world = parent * local, writes to Fabric via set_world_poses Added quaternion math helpers (_quat_mul, _quat_conjugate, _quat_rotate) for the parent/child transform composition. Test changes: - Remove xfail from test_set_world_updates_local (now passes) Addresses Piotr's Issue isaac-sim#5 (localMatrix). Depends on: fix/fabric-prepare-for-reuse (PR isaac-sim#5380)
Adds two new tests to improve code coverage of FabricFrameView: - test_set_local_via_fabric_path: Exercises the Fabric-native set_local_poses code path (parent_world * local → child_world) by first triggering Fabric init via get_world_poses, then setting local poses and verifying world = parent + local. - test_get_scales_fabric_path: Exercises the Fabric-native get_scales path, verifying default scale (1,1,1) is returned. Coverage of fabric_frame_view.py: 76% → 85%.
* Set _fabric_usd_sync_done = True after the USD-fallback set_local_poses branch updates Fabric world matrices via update_world_xforms(). Without this, the next get_world_poses re-runs _sync_fabric_from_usd_once even though Fabric is already in sync with USD, causing redundant kernel launches (greptile P2). * Re-allocate cached output buffers (_fabric_positions_buf, _fabric_orientations_buf, _fabric_scales_buf) and refresh their ProxyArray wrappers in _rebuild_fabric_arrays. The fabricarrays are rebuilt on a topology change but the cached buffers were left pointing at the prior allocations, which could become stale alongside the rebuild (greptile P1). * Drop redundant local 'import torch' calls inside _get_parent_world_poses, _quat_mul, and _quat_rotate. torch is already imported at the top of the module. * Replace fragile self._usd_view._prims[0].GetStage() with sim_utils.get_current_stage() in _get_parent_world_poses. * Tighten return type hints from bare 'tuple' to 'tuple[torch.Tensor, torch.Tensor]' for _get_parent_world_poses, _compose_parent_local, and _invert_parent_compose.
Documents the new get/set_local_poses Fabric path: get_local_poses no longer returns stale USD values after Fabric world-pose writes, and set_local_poses now composes child_world = parent_world * local through Fabric instead of routing the write to USD.
`wp.to_torch(<ProxyArray>)` is deprecated in favor of the `.torch` accessor. Switch the new Fabric-aware get_local_poses code path and test_set_local_via_fabric_path to the modern API. The pre-existing wp.to_torch usage in test_fabric_set_world_does_not_write_back_to_usd (line 155) is left alone as it lives in a previously-merged PR.
The shared _run_pose_benchmarks helper previously only timed get/set world poses. Add three more cases so the script covers the full read/write surface of FabricFrameView: * Interleaved set->get on world poses (the realistic camera-update pattern). * get_local_poses (now that FabricFrameView computes local from Fabric world matrices instead of falling back to stale USD). * set_local_poses (now that FabricFrameView composes child_world = parent_world * local through Fabric). Also tolerate the ProxyArray return contract introduced in 0.5.23+: view.get_world_poses() can return either wp.array (older callers) or ProxyArray (current). Read .warp / .torch when present.
Adopts the prototype design from bareya/pbarejko/camera-update for FabricFrameView's transform handling, and replaces the previous software composition of local poses (which forced a Python USD parent loop and made get_local_poses 3x slower than USD). Architecture changes: * Three persistent ``PrimSelection`` instances (trans_ro, world_rw, local_rw), differing only in per-attribute access mode. Replaces the single selection that required a custom ``isaaclab:view_index`` attribute on every prim. * Path-based view -> fabric index computation: the integer mapping is derived once from ``selection.GetPaths()`` and stored as a Warp array. No prim attributes are added to the stage. * All transform reads and writes go through ``wp.indexedfabricarray``, so kernels just dereference ``ifa[view_index]`` without a separate mapping argument. The two new indexed kernels live next to the existing ones in ``isaaclab.utils.warp.fabric``. * Stage-level ``IFabricHierarchy`` cache and dirty-stage set: multiple FabricFrameView instances on the same stage share one hierarchy. Local-pose path: * ``set_local_poses`` writes ``omni:fabric:localMatrix`` directly through Fabric and marks the stage dirty. The next ``get_world_poses`` fires a Warp kernel that recomputes ``child_world = parent_world * child_local`` (we cannot rely on ``IFabricHierarchy.update_world_xforms`` -- in practice it re-reads USD's authored xformOps and overwrites the Fabric matrices we just authored). * Symmetrically, ``set_world_poses`` runs a kernel that recomputes ``child_local = inv(parent_world) * child_world`` so subsequent ``get_local_poses`` calls return consistent values. * Initial Fabric population reads through the internal ``UsdFrameView`` (children) and ``UsdGeom.XformCache`` (parents) and writes via the same compose kernel set used at runtime. ``SetWorldXformFromUsd`` / ``SetLocalXformFromUsd`` are no-ops on author-then-query stages, so we do this manually. Performance impact (1024 prims, 50 iterations, A6000): Operation USD (ms) Fabric (ms) Speedup Get World Poses 13.56 0.067 201x Set World Poses 29.97 0.123 244x Interleaved Set->Get 43.78 0.159 276x Get Local Poses 6.15 0.064 96x (was 0.31x SLOWER) Set Local Poses 8.61 0.053 163x (was 0.42x SLOWER) The two new indexed kernels added to ``isaaclab.utils.warp.fabric``: * ``decompose_indexed_fabric_transforms`` -- like the existing ``decompose_fabric_transformation_matrix_to_warp_arrays`` but takes a ``wp.indexedfabricarray`` so the view->fabric mapping is baked in. * ``compose_indexed_fabric_transforms`` -- mirror. * ``update_indexed_local_matrix_from_world`` -- new: recomputes ``child_local = inv(parent_world) * child_world`` per child. * ``update_indexed_world_matrix_from_local`` -- new: mirror. Tests updated: * ``test_fabric_rebuild_after_topology_change`` no longer monkey-patches the removed ``_prepare_for_reuse`` / ``_rebuild_fabric_arrays`` helpers; it now exercises ``_compute_fabric_indices`` and ``_build_indexed_array`` directly to simulate a rebuild. * ``test_prepare_for_reuse_detects_topology_change`` verifies all three persistent selections respond to ``PrepareForReuse`` (instead of the removed single ``_fabric_selection``). All 41 tests in ``test_views_xform_prim_fabric.py`` pass.
`update_indexed_local_matrix_from_world` and `update_indexed_world_matrix_from_local` were transposing both inputs into math convention, doing the multiply/inverse, and transposing the result back to storage convention. Under the transpose identity `(A * B)^T = B^T * A^T` (and `inv(A^T) = inv(A)^T`) this reduces to a single direct multiplication on the stored matrices: storage: local = world * inv(parent) (was: T(inv(T(parent)) * T(world))) storage: world = local * parent (was: T(T(parent) * T(local))) Identical result, four fewer transposes per thread, and the math reads the same as the docstring without scaffolding. All 41 fabric tests still pass.
This branch adds four new public Warp kernels to ``isaaclab.utils.warp.fabric`` (the indexed compose/decompose pair plus the two local↔world propagation kernels) but had no fragment for the ``isaaclab`` package. CI's "Verify changelog fragments" check fails the PR without one — every ``source/<pkg>/`` touched needs its own fragment.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Replaces isaac-sim#5381
Adopts the prototype design from bareya/pbarejko/camera-update: all transform reads and writes go through
wp.indexedfabricarray, the view→fabric mapping is path-based (no custom prim attributes), andFabricFrameViewreads/writesomni:fabric:localMatrixdirectly.Previously
get_local_posesdelegated to USD even when Fabric held the source-of-truth world matrices, so afterset_world_poses()the nextget_local_poses()returned stale values — the reasontest_set_world_updates_localwas markedxfail(strict=True). An earlier attempt fixed this by composinglocal = inv(parent_world) * child_worldin torch using a Python-level USD parent loop, which was correct but 3× slower than USD. This PR replaces that with two Warp kernels operating on Fabric matrices via a parent indexed fabric array, eliminating the Python bottleneck.Implementation summary
PrimSelectioninstances (trans_ro,world_rw,local_rw) differing only in per-attribute access mode.selection.GetPaths(); noisaaclab:view_index:HASHprim attributes written anywhere.wp.indexedfabricarrayeverywhere: kernels dereferenceifa[view_index]instead of taking a separate mapping argument.IFabricHierarchycache + dirty-stages set: multipleFabricFrameViewinstances on the same stage share one hierarchy lookup.set_local_posesmarks the stage dirty; the next world read fireschild_world = parent * child_localper child. Mirror kernel runs afterset_world_posesto recomputechild_local = inv(parent) * child_world.isaaclab.utils.warp.fabric:decompose_indexed_fabric_transformsandcompose_indexed_fabric_transforms, plus propagation kernelsupdate_indexed_local_matrix_from_worldandupdate_indexed_world_matrix_from_local.The shared contract test
test_set_world_updates_localno longer needs thexfailoverride and passes for bothcpuandcuda:0. Three additional tests cover the new paths:test_set_local_via_fabric_path,test_get_scales_fabric_path,test_prepare_for_reuse_detects_topology_change. Local run of the full fabric test suite: 41 passed, 0 failed.Benchmark results
1024 prims, 50 iterations, NVIDIA RTX A6000.
Reproduce with: