Skip to content

Implement Fabric-aware get/set_local_poses via indexedfabricarray#2

Open
pv-nvidia wants to merge 9 commits intofix/fabric-prepare-for-reusefrom
pv/5381-rebased
Open

Implement Fabric-aware get/set_local_poses via indexedfabricarray#2
pv-nvidia wants to merge 9 commits intofix/fabric-prepare-for-reusefrom
pv/5381-rebased

Conversation

@pv-nvidia
Copy link
Copy Markdown
Owner

@pv-nvidia pv-nvidia commented May 5, 2026

Description

Replaces isaac-sim#5381

Adopts the prototype design from bareya/pbarejko/camera-update: all transform reads and writes go through wp.indexedfabricarray, the view→fabric mapping is path-based (no custom prim attributes), and FabricFrameView reads/writes omni:fabric:localMatrix directly.

Previously get_local_poses delegated to USD even when Fabric held the source-of-truth world matrices, so after set_world_poses() the next get_local_poses() returned stale values — the reason test_set_world_updates_local was marked xfail(strict=True). An earlier attempt fixed this by composing local = inv(parent_world) * child_world in torch using a Python-level USD parent loop, which was correct but 3× slower than USD. This PR replaces that with two Warp kernels operating on Fabric matrices via a parent indexed fabric array, eliminating the Python bottleneck.

Implementation summary

  • Three persistent PrimSelection instances (trans_ro, world_rw, local_rw) differing only in per-attribute access mode.
  • Path-based view→fabric index mapping computed from selection.GetPaths(); no isaaclab:view_index:HASH prim attributes written anywhere.
  • wp.indexedfabricarray everywhere: kernels dereference ifa[view_index] instead of taking a separate mapping argument.
  • Stage-level IFabricHierarchy cache + dirty-stages set: multiple FabricFrameView instances on the same stage share one hierarchy lookup.
  • Symmetric local↔world propagation via Warp kernels: set_local_poses marks the stage dirty; the next world read fires child_world = parent * child_local per child. Mirror kernel runs after set_world_poses to recompute child_local = inv(parent) * child_world.
  • Two new public Warp kernels in isaaclab.utils.warp.fabric: decompose_indexed_fabric_transforms and compose_indexed_fabric_transforms, plus propagation kernels update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local.

The shared contract test test_set_world_updates_local no longer needs the xfail override and passes for both cpu and cuda:0. Three additional tests cover the new paths: test_set_local_via_fabric_path, test_get_scales_fabric_path, test_prepare_for_reuse_detects_topology_change. Local run of the full fabric test suite: 41 passed, 0 failed.

Benchmark results

1024 prims, 50 iterations, NVIDIA RTX A6000.

Operation USD (ms) Fabric (ms) Speedup
Get World Poses 13.5 0.067 201×
Set World Poses 30.0 0.123 244×
Interleaved Set→Get 43.8 0.159 276×
Get Local Poses 6.1 0.064 96×
Set Local Poses 8.6 0.053 163×

Reproduce with:

./isaaclab.sh -p scripts/benchmarks/benchmark_view_comparison.py \
  --num_envs 1024 --num_iterations 50 --backends usd fabric \
  --device cuda:0 --headless

Depends on: pv-nvidia:fix/fabric-prepare-for-reuse (PR #<TBD>) — please merge that one first.

Fixes #5

Type of change

• Bug fix (non-breaking change which fixes an issue)
• New feature (non-breaking change which adds functionality)

Checklist

• [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
• [x] I have run the [pre-commit checks](https://pre-commit.com/) with ./isaaclab.sh --format
• [x] I have made corresponding changes to the documentation
• [x] My changes generate no new warnings
• [x] I have added tests that prove my fix is effective or that my feature works

• [x] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
• [x] I have added my name to the CONTRIBUTORS.md or my name already exists there

Per the fragment-based changelog system (#5434), this PR ships fragments at source/isaaclab_physx/changelog.d/fabric-frame-view-local-poses.rst and source/isaaclab/changelog.d/pv-5381-rebased.rst instead of editing CHANGELOG.rst and config/extension.toml directly — the nightly CI workflow compiles those from the fragments.

pv-nvidia added 5 commits May 4, 2026 22:12
Compute local poses from Fabric world matrices instead of falling
back to stale USD data.  This fixes the inconsistency where
set_world_poses() modified Fabric worldMatrix but get_local_poses()
still read from USD, returning stale values (Issue isaac-sim#5).

How it works:
- get_local_poses: reads child world pose from Fabric, parent world
  pose from USD, computes local = inv(parent) * child
- set_local_poses: reads parent world from USD, computes child world
  = parent * local, writes to Fabric via set_world_poses

Added quaternion math helpers (_quat_mul, _quat_conjugate,
_quat_rotate) for the parent/child transform composition.

Test changes:
- Remove xfail from test_set_world_updates_local (now passes)

Addresses Piotr's Issue isaac-sim#5 (localMatrix).

Depends on: fix/fabric-prepare-for-reuse (PR isaac-sim#5380)
Adds two new tests to improve code coverage of FabricFrameView:

- test_set_local_via_fabric_path: Exercises the Fabric-native
  set_local_poses code path (parent_world * local → child_world)
  by first triggering Fabric init via get_world_poses, then
  setting local poses and verifying world = parent + local.

- test_get_scales_fabric_path: Exercises the Fabric-native
  get_scales path, verifying default scale (1,1,1) is returned.

Coverage of fabric_frame_view.py: 76% → 85%.
* Set _fabric_usd_sync_done = True after the USD-fallback set_local_poses
  branch updates Fabric world matrices via update_world_xforms(). Without
  this, the next get_world_poses re-runs _sync_fabric_from_usd_once even
  though Fabric is already in sync with USD, causing redundant kernel
  launches (greptile P2).
* Re-allocate cached output buffers (_fabric_positions_buf,
  _fabric_orientations_buf, _fabric_scales_buf) and refresh their
  ProxyArray wrappers in _rebuild_fabric_arrays. The fabricarrays are
  rebuilt on a topology change but the cached buffers were left pointing
  at the prior allocations, which could become stale alongside the
  rebuild (greptile P1).
* Drop redundant local 'import torch' calls inside _get_parent_world_poses,
  _quat_mul, and _quat_rotate. torch is already imported at the top of
  the module.
* Replace fragile self._usd_view._prims[0].GetStage() with
  sim_utils.get_current_stage() in _get_parent_world_poses.
* Tighten return type hints from bare 'tuple' to
  'tuple[torch.Tensor, torch.Tensor]' for _get_parent_world_poses,
  _compose_parent_local, and _invert_parent_compose.
Documents the new get/set_local_poses Fabric path: get_local_poses no
longer returns stale USD values after Fabric world-pose writes, and
set_local_poses now composes child_world = parent_world * local through
Fabric instead of routing the write to USD.
`wp.to_torch(<ProxyArray>)` is deprecated in favor of the `.torch`
accessor.  Switch the new Fabric-aware get_local_poses code path and
test_set_local_via_fabric_path to the modern API.  The pre-existing
wp.to_torch usage in test_fabric_set_world_does_not_write_back_to_usd
(line 155) is left alone as it lives in a previously-merged PR.
pv-nvidia added 4 commits May 5, 2026 11:30
The shared _run_pose_benchmarks helper previously only timed get/set
world poses. Add three more cases so the script covers the full
read/write surface of FabricFrameView:

* Interleaved set->get on world poses (the realistic camera-update
  pattern).
* get_local_poses (now that FabricFrameView computes local from Fabric
  world matrices instead of falling back to stale USD).
* set_local_poses (now that FabricFrameView composes child_world =
  parent_world * local through Fabric).

Also tolerate the ProxyArray return contract introduced in 0.5.23+:
view.get_world_poses() can return either wp.array (older callers) or
ProxyArray (current). Read .warp / .torch when present.
Adopts the prototype design from bareya/pbarejko/camera-update for
FabricFrameView's transform handling, and replaces the previous
software composition of local poses (which forced a Python USD parent
loop and made get_local_poses 3x slower than USD).

Architecture changes:

* Three persistent ``PrimSelection`` instances (trans_ro, world_rw,
  local_rw), differing only in per-attribute access mode.  Replaces
  the single selection that required a custom ``isaaclab:view_index``
  attribute on every prim.
* Path-based view -> fabric index computation: the integer mapping is
  derived once from ``selection.GetPaths()`` and stored as a Warp
  array.  No prim attributes are added to the stage.
* All transform reads and writes go through ``wp.indexedfabricarray``,
  so kernels just dereference ``ifa[view_index]`` without a separate
  mapping argument.  The two new indexed kernels live next to the
  existing ones in ``isaaclab.utils.warp.fabric``.
* Stage-level ``IFabricHierarchy`` cache and dirty-stage set: multiple
  FabricFrameView instances on the same stage share one hierarchy.

Local-pose path:

* ``set_local_poses`` writes ``omni:fabric:localMatrix`` directly
  through Fabric and marks the stage dirty.  The next ``get_world_poses``
  fires a Warp kernel that recomputes ``child_world = parent_world *
  child_local`` (we cannot rely on ``IFabricHierarchy.update_world_xforms``
  -- in practice it re-reads USD's authored xformOps and overwrites the
  Fabric matrices we just authored).
* Symmetrically, ``set_world_poses`` runs a kernel that recomputes
  ``child_local = inv(parent_world) * child_world`` so subsequent
  ``get_local_poses`` calls return consistent values.
* Initial Fabric population reads through the internal ``UsdFrameView``
  (children) and ``UsdGeom.XformCache`` (parents) and writes via the
  same compose kernel set used at runtime.  ``SetWorldXformFromUsd`` /
  ``SetLocalXformFromUsd`` are no-ops on author-then-query stages, so
  we do this manually.

Performance impact (1024 prims, 50 iterations, A6000):

  Operation             USD (ms)   Fabric (ms)   Speedup
  Get World Poses        13.56       0.067        201x
  Set World Poses        29.97       0.123        244x
  Interleaved Set->Get   43.78       0.159        276x
  Get Local Poses         6.15       0.064         96x  (was 0.31x SLOWER)
  Set Local Poses         8.61       0.053        163x  (was 0.42x SLOWER)

The two new indexed kernels added to ``isaaclab.utils.warp.fabric``:

* ``decompose_indexed_fabric_transforms``  -- like the existing
  ``decompose_fabric_transformation_matrix_to_warp_arrays`` but takes a
  ``wp.indexedfabricarray`` so the view->fabric mapping is baked in.
* ``compose_indexed_fabric_transforms``  -- mirror.
* ``update_indexed_local_matrix_from_world``  -- new: recomputes
  ``child_local = inv(parent_world) * child_world`` per child.
* ``update_indexed_world_matrix_from_local``  -- new: mirror.

Tests updated:

* ``test_fabric_rebuild_after_topology_change`` no longer monkey-patches
  the removed ``_prepare_for_reuse`` / ``_rebuild_fabric_arrays`` helpers;
  it now exercises ``_compute_fabric_indices`` and ``_build_indexed_array``
  directly to simulate a rebuild.
* ``test_prepare_for_reuse_detects_topology_change`` verifies all three
  persistent selections respond to ``PrepareForReuse`` (instead of the
  removed single ``_fabric_selection``).

All 41 tests in ``test_views_xform_prim_fabric.py`` pass.
`update_indexed_local_matrix_from_world` and
`update_indexed_world_matrix_from_local` were transposing both inputs
into math convention, doing the multiply/inverse, and transposing the
result back to storage convention.  Under the transpose identity
`(A * B)^T = B^T * A^T` (and `inv(A^T) = inv(A)^T`) this reduces to a
single direct multiplication on the stored matrices:

  storage:  local = world * inv(parent)        (was: T(inv(T(parent)) * T(world)))
  storage:  world = local * parent             (was: T(T(parent) * T(local)))

Identical result, four fewer transposes per thread, and the math reads
the same as the docstring without scaffolding.  All 41 fabric tests
still pass.
This branch adds four new public Warp kernels to
``isaaclab.utils.warp.fabric`` (the indexed compose/decompose pair plus
the two local↔world propagation kernels) but had no fragment for the
``isaaclab`` package.  CI's "Verify changelog fragments" check fails the
PR without one — every ``source/<pkg>/`` touched needs its own fragment.
@pv-nvidia pv-nvidia changed the title Pv/5381 rebased Implement Fabric-aware get/set_local_poses via indexedfabricarray May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant