Skip to content

perf: Implement Fabric-aware get/set_local_poses on FabricFrameView#5516

Open
pv-nvidia wants to merge 12 commits into
isaac-sim:developfrom
pv-nvidia:pv/5381-rebased
Open

perf: Implement Fabric-aware get/set_local_poses on FabricFrameView#5516
pv-nvidia wants to merge 12 commits into
isaac-sim:developfrom
pv-nvidia:pv/5381-rebased

Conversation

@pv-nvidia
Copy link
Copy Markdown

@pv-nvidia pv-nvidia commented May 6, 2026

Description

FabricFrameView.set_world_poses updated Fabric's omni:fabric:worldMatrix, but get_local_poses still delegated to USD — so after a world write the next local read returned stale values until USD was re-synced. Symmetrically, set_local_poses went through USD and Fabric never saw the write until the next sync. The shared contract test test_set_world_updates_local was marked xfail(strict=True) to acknowledge this.

This PR replaces the USD-fallback local-pose path with a fully Fabric-native implementation. World and local matrices are read and written directly on omni:fabric:worldMatrix / omni:fabric:localMatrix via wp.indexedfabricarray; consistency in both directions is maintained by Warp kernels.

The design rests on three pieces: three persistent PrimSelection instances differing only in per-attribute access mode (trans_ro, world_rw, local_rw); path-based view→fabric index mapping (no custom prim attributes written); and wp.indexedfabricarray for every (selection × attribute) pair, which bakes the view→fabric mapping into the array itself so kernels read element i of the view directly instead of taking a separate index-mapping argument. World↔local consistency is maintained entirely on the GPU through two Warp kernels — no Python-side parent walks, no per-call USD reads. The result is faster than the USD baseline by two orders of magnitude on every measured operation (see Benchmark results below).

Implementation summary

  • Three persistent PrimSelection instancestrans_ro (read both matrices), world_rw (write world, read local), local_rw (read world, write local). Each selection caches its own view→fabric index mapping built from selection.GetPaths(); selections do not guarantee a shared path ordering, so the mappings are kept independent.
  • Path-based view→fabric index mapping — no isaaclab:view_index:HASH attributes written anywhere on the prims. Fabric groups prims into buckets by their attribute schema; writing a per-view index attribute would split each touched bucket into a "with-attr" and "without-attr" half and proliferate further with each additional view on the stage. Keeping the mapping out of Fabric leaves the bucket layout untouched, which keeps SelectPrims enumeration and wp.fabricarray reads more contiguous.
  • wp.indexedfabricarray everywhere — bakes the mapping into the array itself, removing a per-launch indirection.
  • Per-view dirty flag for world↔local consistency:
    • After set_local_poses the view is marked dirty; the next world read fires a Warp kernel that recomputes child_world = parent_world * child_local for this view's children. Tracking is per-view (not per-stage) so multiple FabricFrameView instances on the same stage cannot clear each other's dirty flag.
    • After set_world_poses / set_scales, a mirror Warp kernel recomputes child_local = inv(parent_world) * child_world so subsequent get_local_poses returns consistent values.
  • Stage-level IFabricHierarchy cache keyed by (stage_id, fabric_id) so a recycled stage_id paired with a new Fabric attachment never returns a stale handle. FabricFrameView.clear_static_caches() is the explicit teardown hook.
  • Four new public Warp kernels in isaaclab.utils.warp.fabric: decompose_indexed_fabric_transforms, compose_indexed_fabric_transforms, update_indexed_local_matrix_from_world, update_indexed_world_matrix_from_local.

Transpose-storage convention

Fabric stores 4×4 transforms in column-transposed form. The two sync kernels apply the identity (A·B)ᵀ = Bᵀ·Aᵀ (and inv(A)ᵀ = inv(Aᵀ)) to compute the equivalent multiply on the stored matrices without explicit transposes:

  • math local = inv(parent) · world → stored local = world · inv(parent)
  • math world = parent · local → stored world = local · parent

Two new regression tests pin this convention; see Tests below.

Benchmark results

1024 prims, 50 iterations, NVIDIA RTX A6000.

Operation USD (ms) Fabric (ms) Speedup
Get World Poses 13.4 0.057 238×
Set World Poses 31.3 0.139 226×
Interleaved Set→Get 44.6 0.172 259×
Get Local Poses 6.1 0.039 158×
Set Local Poses 8.5 0.054 159×

Reproduce with:

./isaaclab.sh -p scripts/benchmarks/benchmark_view_comparison.py \
  --num_envs 1024 --num_iterations 50 --backends usd fabric \
  --device cuda:0 --headless

Tests

Local run of source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py on cpu and cuda:0: 45 passed, 0 failed.

  • Removed the xfail(strict=True) on the shared test_set_world_updates_local — it now passes.
  • test_set_local_via_fabric_pathset_local_poses writes omni:fabric:localMatrix; the deferred world = parent * local recompute lands the right world pose.
  • test_get_scales_fabric_path — exercises the Fabric-native get_scales.
  • test_prepare_for_reuse_detects_topology_change — confirms PrimSelection.PrepareForReuse is wired up and returns the expected topology-change signal.
  • test_fabric_rebuild_after_topology_change — simulates a topology change and confirms the indexed arrays rebuild and continue producing correct results.
  • Transpose-convention verification (test_set_local_then_get_world_with_rotated_parent, test_set_world_then_get_local_with_rotated_parent). The standard fixture's parents are translation-only, so the rotation block is identity and equal to its own transpose — a wrong transpose convention in either sync kernel would silently pass every other test. These two tests place the parent at a non-trivial rotation (+90° around Z) and verify the round-trip in both directions. Locally confirmed that flipping the multiply order in either of the two sync kernels makes the matching test fail.

Iterations during review

  • Stale localMatrix after set_scales — added the missing _sync_local_from_world call so a write through scale also propagates the new diagonal back into the local matrix.
  • Shared _fabric_indices across selections — each selection now caches its own indices array; _build_indexed_array takes them as an explicit parameter, removing the silent-corruption hazard the shared field invited.
  • Stage-keyed dirty flag — replaced with a per-view bool. With multiple views on a stage, view B reading worlds no longer clears the flag that view A's set_local_poses had set (which would have left A's worlds silently stale).
  • _static_hierarchy_cache lifecycle — cache key now (stage_id, fabric_id); added FabricFrameView.clear_static_caches() and call it from the test teardown fixture so handles do not accumulate across the suite.
  • Redundant PrepareForReuse in syncs — both _sync_local_from_world and _sync_world_from_local_if_dirty now refresh trans_sel_ro exactly once per call, then read _world_ifa_ro, _local_ifa_ro and _parent_world_ifa_ro directly from the fields, avoiding the duplicated USDRT round-trip.
  • Docstring vs reality on PrepareForReuse — the class docstring now describes the actual behaviour: every accessor calls it; it is cheap and idempotent in the steady state.
  • Clearer error for root-level prims_compute_parent_fabric_indices now raises a dedicated error when a prim has no parent path, rather than bubbling up the generic "not found in selection" message.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Per the fragment-based changelog system (#5434), this PR ships fragments at source/isaaclab/changelog.d/fix-fabric-local-matrix.rst and source/isaaclab_physx/changelog.d/fix-fabric-local-matrix.rst instead of editing CHANGELOG.rst and config/extension.toml directly — the nightly CI workflow compiles those from the fragments.

@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label May 6, 2026
@pv-nvidia pv-nvidia self-assigned this May 6, 2026
@pv-nvidia pv-nvidia marked this pull request as draft May 6, 2026 12:41
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR implements a fully Fabric-native get_local_poses / set_local_poses path on FabricFrameView, replacing the previous USD-fallback that caused stale reads after Fabric world-pose writes. It introduces four new Warp kernels for indexed Fabric matrix I/O and world↔local consistency, three persistent PrimSelection instances with per-selection index caches, a per-view dirty flag, and a (stage_id, fabric_id)-keyed IFabricHierarchy cache with an explicit clear_static_caches() teardown hook.

  • All previously flagged issues from review threads are addressed: per-selection fabric-index caches eliminate the shared-index corruption hazard; per-view _world_dirty flag prevents cross-view dirty-flag leakage; _static_hierarchy_cache key is (stage_id, fabric_id); clear_static_caches() is added and wired into test teardown.
  • New Fabric-native local-pose path: set_local_poses writes omni:fabric:localMatrix directly and marks the view dirty; the next get_world_poses fires update_indexed_world_matrix_from_local; symmetrically set_world_poses fires update_indexed_local_matrix_from_world so subsequent get_local_poses is consistent.
  • _sync_fabric_from_usd_initial seeds parent world matrices with hard-coded unit scale and child local matrices without scale, so prims with non-unit scale will produce incorrect world/local matrices after the first sync-kernel invocation.

Confidence Score: 3/5

Safe to merge for stages with translation-only parent prims; stages where any parent or child prim carries a non-unit scale will produce incorrect world/local matrices after the first set_local_poses → get_world_poses round-trip.

The initial Fabric seed in _sync_fabric_from_usd_initial calls Orthonormalize() on every parent transform and then writes the parent world matrix with a hard-coded (1,1,1) scale, silently discarding any real parent scale. The child local matrix is seeded without scale for the same reason. Both sync kernels read the parent world matrix on every world↔local recomputation, so a wrong parent scale corrupts every child pose update after the first dirty-flag sync.

_sync_fabric_from_usd_initial in fabric_frame_view.py — the parent-world-matrix and child-local-matrix seeding blocks.

Important Files Changed

Filename Overview
source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py Core implementation of Fabric-native local-pose path; all previously flagged issues addressed, but _sync_fabric_from_usd_initial seeds parent world matrices with unit scale and child local matrices without scale, causing incorrect sync for prims with non-unit scale.
source/isaaclab/isaaclab/utils/warp/fabric.py Four new Warp kernels for indexed Fabric matrix decompose/compose and world↔local sync; transpose-convention is correct and well-documented.
source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py xfail removed, six new tests added including transpose-convention regression tests with a rotated parent; clear_static_caches() wired into autouse teardown fixture.
scripts/benchmarks/benchmark_view_comparison.py Added interleaved set→get and local-pose benchmarks; ProxyArray compatibility handled defensively; straightforward extension.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant FFV as FabricFrameView
    participant WK as Warp Kernels
    participant FA as Fabric (GPU)
    participant USD as USD/UsdGeom

    Note over FFV: _initialize_fabric()
    FFV->>USD: get_world_poses / get_local_poses / get_scales
    FFV->>USD: XformCache (parents)
    FFV->>FA: compose_indexed_fabric_transforms (world+local, children)
    FFV->>FA: compose_indexed_fabric_transforms (world, parents)

    Note over FFV: set_world_poses()
    C->>FFV: set_world_poses(positions, orientations)
    FFV->>WK: compose_indexed_fabric_transforms worldMatrix[rw]
    WK->>FA: write worldMatrix
    FFV->>WK: update_indexed_local_matrix_from_world
    WK->>FA: read worldMatrix[ro], parentWorld[ro]
    WK->>FA: write localMatrix[rw]

    Note over FFV: set_local_poses()
    C->>FFV: set_local_poses(translations, orientations)
    FFV->>WK: compose_indexed_fabric_transforms localMatrix[rw]
    WK->>FA: write localMatrix
    FFV->>FFV: "_world_dirty=True"

    Note over FFV: get_world_poses() dirty path
    C->>FFV: get_world_poses()
    FFV->>WK: update_indexed_world_matrix_from_local
    WK->>FA: read localMatrix[ro], parentWorld[ro]
    WK->>FA: write worldMatrix[rw]
    FFV->>FFV: "_world_dirty=False"
    FFV->>WK: decompose_indexed_fabric_transforms
    WK-->>C: (positions, orientations)
Loading

Reviews (7): Last reviewed commit: "Document isaaclab.utils.warp.fabric kern..." | Re-trigger Greptile

Comment thread source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py
Comment thread source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py
Comment thread source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py
Comment thread source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py Outdated
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Fabric-aware get/set_local_poses + IK/OSC Integration

Updated review after new commits (5b4f91ebb778db)

New Changes in This Push

1. Task-space dynamics accessors (Major Feature)

  • Added body_link_jacobian_w, body_com_jacobian_w, mass_matrix, gravity_compensation_forces to ArticulationData
  • New shift_jacobian_com_to_origin Warp kernel converts PhysX COM-referenced Jacobians to link-origin
  • Proper cache invalidation in write paths (write_root_*_to_sim_index, write_joint_*_to_sim_index)
  • Full DoF axis preserved (6 floating-base columns for floating-base assets)

2. FabricFrameView Major Rework

  • Three persistent PrimSelection instances (trans_ro, world_rw, local_rw)
  • Path-based view→fabric index mapping via wp.indexedfabricarray
  • Proper parent-indexed arrays for world↔local transforms
  • Per-view dirty tracking (not per-stage) to avoid cross-view corruption
  • Class-level _static_hierarchy_cache for IFabricHierarchy handles
  • clear_static_caches() method for test cleanup

3. Extensive Test Coverage

  • Shape contract tests for Jacobian/mass matrix on fixed & floating base
  • test_get_jacobians_link_origin_contract: J·q̇ = body velocities ✅
  • test_get_mass_matrix_symmetry_pd: square, symmetric, positive-definite ✅
  • test_get_gravity_compensation_forces_static_equilibrium: τ_gc holds arm static ✅
  • IK tracking accuracy test with DifferentialIKController ✅
  • OSC tracking accuracy tests with gravity compensation ✅
  • Multi-view per-view dirty isolation test ✅

4. Newton Dependency Update

  • newton @ git+...@v1.2.0rc2newton==1.2.0rc3 (packaged release)

5. Dexsuite Config Updates

  • Switched from primitive shapes (CuboidCfg, SphereCfg) to mesh-based (MeshCuboidCfg, MeshSphereCfg)
  • Consolidated physics presets into base PhysicsCfg class
  • Removed PhysX hardcoding from IK/OSC env configs (now inherit parent preset)

Code Quality Notes

COM→origin Jacobian shift - The kernel correctly applies v_origin = v_com - ω × (R · com_offset_b) per DoF column

Cache invalidation - All write paths properly invalidate _body_com_jacobian_w, _mass_matrix, _gravity_compensation_forces

ProxyArray pattern - Lazy init pattern matches existing body_link_pose_w, joint_pos etc.

Floating-base DoF layout - Matches industry convention (Pinocchio, Drake, MuJoCo): 6 base-DoF columns prepended

FabricFrameView - Clean separation of indexed arrays per selection, proper sync kernels

Previous Fixes (still valid)

  • Parent world scale via Gf.Transform.GetScale()
  • wp.select() for branchless GPU execution ✅
  • test_initial_seed_with_scaled_parent regression test ✅

Summary

This is a substantial feature addition that:

  1. Enables proper IK/OSC control on PhysX backend with correct Jacobian reference frames
  2. Fixes latent correctness bug where COM-referenced Jacobians were used with link-frame poses
  3. Adds comprehensive test coverage for cross-backend parity
  4. Cleans up physics preset inheritance

Approval recommended.


Update (e104e0e): Refactored if/else branching to wp.where() in Warp kernels (compose_fabric_transformation_matrix_from_warp_arrays, compose_indexed_fabric_transforms). This is a clean branchless pattern for GPU execution — good style improvement. No new concerns.

@pv-nvidia pv-nvidia marked this pull request as ready for review May 11, 2026 14:27
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 11, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

Comment thread source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py
@pv-nvidia pv-nvidia added the enhancement New feature or request label May 12, 2026
Comment thread source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py Outdated
@pv-nvidia pv-nvidia changed the title perf: Implement Fabric-aware get/set_local_poses via indexedfabricarray perf: Implement Fabric-aware get/set_local_poses on FabricFrameView May 12, 2026
@pv-nvidia pv-nvidia added the bug Something isn't working label May 12, 2026
Comment thread source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 12, 2026
Comment on lines +808 to +812
parent_unit_scale = wp.array(
[[1.0, 1.0, 1.0]] * len(unique_parent_paths),
dtype=wp.float32,
device=self._device,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Parent world scale silently forced to (1,1,1) during initial seed

tf.Orthonormalize() strips scale from the USD local-to-world transform, and the subsequent compose call uses a hard-coded parent_unit_scale = [[1.0, 1.0, 1.0]]. Any parent prim with a non-unit world scale will have the wrong matrix in Fabric from the start. Both sync kernels (update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local) read this parent matrix for every world↔local recomputation, so the error propagates into every child world and local pose after that.

Suggested change
parent_unit_scale = wp.array(
[[1.0, 1.0, 1.0]] * len(unique_parent_paths),
dtype=wp.float32,
device=self._device,
)
parent_scale_rows: list[list[float]] = []
for path in unique_parent_paths:
prim = usd_stage.GetPrimAtPath(path)
full_tf = UsdGeom.XformCache(Usd.TimeCode.Default()).GetLocalToWorldTransform(prim)
s = full_tf.ExtractScaleFactors()
parent_scale_rows.append([float(s[0]), float(s[1]), float(s[2])])
parent_unit_scale = wp.array(
parent_scale_rows,
dtype=wp.float32,
device=self._device,
)

Comment on lines +772 to +787
# Compose into child localMatrix.
wp.launch(
kernel=fabric_utils.compose_indexed_fabric_transforms,
dim=self.count,
inputs=[
self._local_ifa_rw,
_to_float32_2d(local_pos_ta.warp),
_to_float32_2d(local_ori_ta.warp),
self._fabric_empty_2d_array_sentinel,
False,
False,
False,
self._view_indices,
],
device=self._device,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Child local-matrix scale not seeded from USD on freshly authored stages

When SetLocalXformFromUsd() is a no-op (freshly authored stage, not yet rendered), the child's localMatrix in Fabric starts at identity with scale (1,1,1). The compose call passes _fabric_empty_2d_array_sentinel for the scale slot, leaving the existing (identity) scale in the matrix. If a child prim has non-unit local scale, the first set_local_poses → get_world_poses round-trip calls update_indexed_world_matrix_from_local with the wrong local scale, producing an incorrect world-space scale. The fix is to pass scales_wp (already fetched above from _usd_view.get_scales()).

Suggested change
# Compose into child localMatrix.
wp.launch(
kernel=fabric_utils.compose_indexed_fabric_transforms,
dim=self.count,
inputs=[
self._local_ifa_rw,
_to_float32_2d(local_pos_ta.warp),
_to_float32_2d(local_ori_ta.warp),
self._fabric_empty_2d_array_sentinel,
False,
False,
False,
self._view_indices,
],
device=self._device,
)
# Compose into child localMatrix (include scale so that
# _sync_world_from_local_if_dirty produces the right world-space scale).
wp.launch(
kernel=fabric_utils.compose_indexed_fabric_transforms,
dim=self.count,
inputs=[
self._local_ifa_rw,
_to_float32_2d(local_pos_ta.warp),
_to_float32_2d(local_ori_ta.warp),
_to_float32_2d(scales_wp),
False,
False,
False,
self._view_indices,
],
device=self._device,
)

Comment thread scripts/benchmarks/benchmark_view_comparison.py
pv-nvidia added 5 commits May 13, 2026 21:57
Replaces the earlier Python-based parent-loop implementation (which was
correct but ~3× slower than USD for local poses) with a fully GPU-side
Fabric path that follows the bareya/pbarejko/camera-update prototype:

* Three persistent ``PrimSelection`` instances differing only in
  per-attribute access mode — one each for {trans_ro, world_rw, local_rw}.
* Path-based view → fabric index mapping computed once from
  ``selection.GetPaths()`` and stored as a Warp ``int32`` array.  No
  custom prim attributes are written to the stage.
* All transform reads and writes go through ``wp.indexedfabricarray``,
  so the kernels just dereference ``ifa[view_index]`` instead of taking
  a separate mapping argument.
* Stage-level ``IFabricHierarchy`` cache and dirty-stages set so multiple
  ``FabricFrameView`` instances on the same stage share state.

World ↔ local consistency is preserved through Warp kernels that run on
the affected write paths:

* ``set_local_poses`` writes ``omni:fabric:localMatrix`` directly via the
  compose kernel, then a second kernel recomputes child worldMatrix from
  ``parent_world * child_local`` so the next ``get_world_poses`` read is
  consistent.  ``IFabricHierarchy.update_world_xforms()`` is *not* used
  for this — in practice it re-reads USD's authored xformOps and would
  overwrite the matrices we just authored.
* ``set_world_poses`` mirrors the above, recomputing
  ``child_local = inv(parent_world) * child_world`` after the write.

Two new public Warp kernels in ``isaaclab.utils.warp.fabric``:

* ``decompose_indexed_fabric_transforms`` /
  ``compose_indexed_fabric_transforms`` — indexed-array variants of the
  existing decompose/compose kernels.
* ``update_indexed_local_matrix_from_world`` /
  ``update_indexed_world_matrix_from_local`` — propagate one direction
  using a parent indexed fabric array.  Implemented directly in storage
  convention (``local = world * inv(parent)``,
  ``world = local * parent``) — the four transposes the math-convention
  form would imply cancel out under ``(A·B)^T = B^T·A^T``.

Benchmark (1024 prims, 50 iterations, RTX A6000):

  Operation             USD (ms)   Fabric (ms)   Speedup
  Get World Poses        12.33      0.044         282×
  Set World Poses        27.98      0.117         240×
  Interleaved Set→Get    41.34      0.160         258×
  Get Local Poses         6.04      0.037         162×
  Set Local Poses         8.54      0.053         162×

Local-pose ops went from ~3× slower than USD (in the earlier
torch-based parent-loop implementation) to ~160× faster, with the
new Fabric-side localMatrix authoring keeping ``test_set_world_updates_local``
passing without an xfail override.

Tests: 41 passed in the Fabric backend's contract + new coverage for
the Fabric-native ``set_local_poses``, ``get_scales``, and topology
rebuild paths.

Drops the no-longer-needed ``cd571d482`` Python-loop attempt.
Three review fixes on the indexedfabricarray refactor:

* set_scales wrote the new scale into worldMatrix but never refreshed
  localMatrix, so a subsequent get_local_poses returned the stale
  scale.  Call _sync_local_from_world after the world write, matching
  set_world_poses.

* The view->fabric mapping was stored in a single shared
  _fabric_indices field that each accessor overwrote from its own
  selection's GetPaths() ordering.  Selections do not guarantee a
  shared path ordering, so this was brittle and hard to reason about.
  Cache the mapping per selection (_trans_ro_fabric_indices,
  _world_rw_fabric_indices, _local_rw_fabric_indices) and pass it
  explicitly to _build_indexed_array.  The three trans_ro accessors
  now share a _rebuild_trans_ro_arrays helper.

* Update two test comments that referenced the removed
  _fabric_usd_sync_done attribute to point at the lazy
  _initialize_fabric() call instead.
The chained set_world_poses -> get_world_poses round-trip in
test_fabric_rebuild_after_topology_change goes through Warp's float32
SRT compose/decompose, which accumulates a few ULP of drift.

At the test's position magnitudes (~4-6), one float32 ULP is
~4.77e-7, so the prior atol of 1e-7 demanded sub-ULP agreement and
was sensitive to GPU/codegen variation -- it passed locally on the
A6000 but flaked in CI.

1e-5 corresponds to roughly 20 ULP at those magnitudes: tight enough
to catch any real bug (a wrong index or stale read would be at least
~1e-3 off given the test setup) and consistent with the shared
contract harness in frame_view_contract_utils.py, which already
documents and uses ATOL = 1e-5 for compose/decompose-through-float32
checks.
pv-nvidia added 6 commits May 13, 2026 21:57
The previous _static_dirty_stages set was keyed by stage_id, but
_sync_world_from_local_if_dirty only recomputes worldMatrix for the
*calling view's* children before clearing the flag.  With two views
on the same stage, view B's world-read would clear the flag set by
view A's set_local_poses, leaving A's worlds silently stale.

Replace the class-level set with a per-instance bool.  Each view now
tracks its own dirty state, which matches the actual scope of the
recompute kernel and removes a mutable ClassVar.

Also:
- Raise a clearer error when _compute_parent_fabric_indices is asked
  to look up the parent of a root-level prim (rsplit produces ""),
  instead of bubbling up the generic "not found in selection" message.
- Document on the remaining _static_hierarchy_cache that it is not
  thread-safe by design (Isaac Lab's loop is single-threaded; adding a
  lock would negate the per-stage caching benefit).
- Update the module docstring to reflect the per-view dirty model and
  drop the stale reference to IFabricHierarchy.update_world_xforms.
Three review fixes plus the missing coverage for the transpose
storage convention.

* Hierarchy cache eviction: cache key is now (stage_id, fabric_id) so a
  recycled stage_id paired with a new Fabric attachment never returns a
  stale handle.  Added FabricFrameView.clear_static_caches() classmethod
  for explicit teardown and wired it into the test fixture so cached
  handles do not accumulate across the suite.

* PrepareForReuse no longer fires twice per sync.  Both _sync_local_from_world
  and _sync_world_from_local_if_dirty refresh trans_sel_ro exactly once,
  then read _world_ifa_ro / _local_ifa_ro / _parent_world_ifa_ro directly
  from the fields instead of going through the accessors.

* Class docstring rewritten to describe the actual PrepareForReuse policy
  (every accessor calls it; idempotent and cheap in the steady state).
  The prior wording claimed reads avoided PrepareForReuse, which has not
  been true since the indexedfabricarray rewrite.

* New regression tests for the transpose storage convention.  The standard
  fixture parents are translation-only, so the rotation block is identity
  and equal to its transpose - which means a wrong transpose convention
  would still pass every existing test.  Two new tests place a parent
  rotated 90 degrees around Z and verify the world<->local round-trip:
  - test_set_local_then_get_world_with_rotated_parent exercises
    update_indexed_world_matrix_from_local
  - test_set_world_then_get_local_with_rotated_parent exercises
    update_indexed_local_matrix_from_world
  Confirmed locally that flipping the multiply order in either kernel
  makes the matching test fail.

45 tests pass on cpu and cuda:0.
FabricFrameView is referenced by fully-qualified name in the migration
guide and in this PR's changelog fragment, but no RST file documented
the module - so the Sphinx :class: and :meth: cross-refs were not
resolvable.  Add a thin automodule page mirroring the sibling pages
under docs/source/api/lab_physx/ and register it in the API index
toctree.  This also picks up the new clear_static_caches() classmethod
automatically via :members:.
The submodule was not surfaced anywhere in the Sphinx tree, so :func:
cross-references to its kernels (added in the changelog fragment for
this branch and used by FabricFrameView throughout) did not resolve.

Add a Warp Fabric kernels subsection to isaaclab.utils.rst that
automodule's the submodule, and add __all__ to fabric.py so the
generated page lists only the eight public kernels - the type aliases
(FabricArrayMat44d, ArrayUInt32, ...) and the re-imported `wp` /
`TYPE_CHECKING` / `Any` symbols stay out of the rendered docs.

The page covers both the pre-existing kernels
(compose/decompose_fabric_transformation_matrix_*_warp_arrays,
set_view_to_fabric_array, arange_k) and the four kernels added on
this branch.
_sync_fabric_from_usd_initial had two scale-related bugs in the
USD->Fabric seed path that produced silently wrong matrices whenever a
parent or child had a non-unit scale.  Both kernels that recompute
world<->local consistency read those seeded matrices, so the error
propagated.

* Parent worldMatrix was composed with a hardcoded (1, 1, 1) scale.
  Orthonormalize() strips scale from the local-to-world transform, so
  we now extract the scale via Gf.Transform.GetScale() *before*
  orthonormalizing and pass it through to the compose kernel.

* Child localMatrix was composed with the empty-array sentinel for the
  scale slot, leaving the kernel-side scale at the identity default.
  We now pass the locally-authored scale (already fetched via
  _usd_view.get_scales()) so the matrix carries the right scale.

* Child worldMatrix is still composed from get_world_poses() position
  and orientation plus the child's local scale, which is wrong when a
  parent has non-unit world scale.  Instead of fixing the seed by hand
  (would require per-child world-scale lookups), mark the view dirty at
  the end of the seed.  The very next world read fires
  _sync_world_from_local_if_dirty, which computes
  child_world = parent_world * child_local on the GPU - and with both
  matrices now correctly scaled, the multiply produces the right
  world-space scale automatically.

Added test_initial_seed_with_scaled_parent regression test: parent
world scale (2, 1, 1), child local scale (3, 1, 1).  Locally verified
the test fails when either fix is reverted in isolation.
While adding a regression test for the per-view world-dirty flag, I
discovered the IFabricHierarchy cache silently misses on every view
because ``Stage.GetFabricId()`` returns a fresh ``FabricId`` wrapper
on every call, with no value equality between wrappers for the same
underlying Fabric.  The cache stored (stage_id, wrapper) tuples, so
two views on the same stage produced two distinct cache keys and
re-fetched the hierarchy on each init.  The bug was harmless in
practice -- USDRT's ``get_fabric_hierarchy`` itself returns a
process-wide singleton per Fabric stage, so both views happened to
end up with the same handle anyway -- but the cache wasn't doing the
work it was advertised to do.

Fix: key the cache on ``(stage_id, fabric_id.id)`` where ``.id`` is
the stable ``int`` underneath the wrapper.

The new test exercises the multi-view-per-stage scenario:

* two FabricFrameView instances on disjoint child prims under
  different parent sub-trees of one stage
* writes on view A must not dirty view B
* world reads on view B must not clear view A's dirty flag
* both views share one cached IFabricHierarchy and the cache has
  exactly one entry after both inits (this assertion is what
  surfaced the FabricId-wrapper bug)
* symmetric pass: writes on B must not affect A's post-read state

Verified locally that the test fails with both the wrapper-keyed
cache (cache size > 1) and with a synthetic stage-shared dirty flag
(cross-view stomp).

Module-level coverage of fabric_frame_view.py with this test added:
85% line / 78% branch.  Remaining uncovered code is the USD-fallback
delegations (Fabric disabled), defensive RuntimeError raises, and
topology-rebuild branches inside the accessors.

49 tests pass on cpu and cuda:0.
… kernels

Replace if/else broadcast branching with wp.where() for branchless
predicated selection. wp.select was deprecated in Warp 1.7 and removed
in 1.10; wp.where has the more intuitive (cond, true, false) order.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant