Split RayCaster into backend-specific implementations#5510
Conversation
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR refactors RayCaster and RayCasterCamera into backend-agnostic base classes (BaseRayCaster, BaseRayCasterCamera) with PhysX and Newton-specific implementations. The core motivation is to fix a staleness regression from #5179 where sensors parented under rigid bodies returned their spawn-time pose forever during headless training. The PhysX backend now tracks the parent rigid body directly via RigidObjectView instead of going through FabricFrameView. The implementation is architecturally sound and follows the established pattern used by Pva, FrameTransformer, and ContactSensor.
Architecture Impact
Cross-module impact is significant but well-contained:
RayCasterandRayCasterCamerabecomeFactoryBaseshims dispatching to backend-specific implementationsMultiMeshRayCasterandMultiMeshRayCasterCamerare-parent ontoBaseRayCaster/BaseRayCasterCamerabut retainFrameView-backed tracking (documented as xfail)- New packages:
isaaclab_physx.sensors.ray_casterandisaaclab_newton.sensors.ray_caster - All existing user-facing APIs remain unchanged; cfg surface is preserved
What calls the changed code:
- Any RL environment using height-scan observations via
RayCaster - Camera sensors using
RayCasterCamerafor fast depth rendering - Task configs referencing
RayCasterCfgorRayCasterCameraCfg
Implementation Verdict
Minor fixes needed — The core architecture is correct and the bug fix is validated by comprehensive tests. However, there are a few issues that should be addressed.
Test Coverage
Strong overall. The PR adds:
test_sensor_pose_tracks_falling_rigid_parentcovering all four sensor variants with xfail for MultiMesh- Alignment mode tests (
world,base,yaw) - Drift resampling test
- Integration tests for ArticulationView/RigidBodyView paths
Minor gap: No explicit test for the Newton backend's static-parent path (_is_static_parent=True), though the PhysX static path is implicitly tested.
CI Status
No CI checks available yet — cannot verify test pass status.
Findings
🔴 Critical: source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py:162 — Missing re-registration of world-origin site in _invalidate_body_tracker
The _invalidate_body_tracker method only re-registers sites via self._register_body_tracker() for non-static parents. However, _register_body_tracker registers both _world_origin_label and _site_label. On STOP→reinit, the docstring claims sites survive close/reinit, but NewtonManager.close() clears _cl_pending_sites. If _site_args_cached is True but the site labels are stale, the subsequent _initialize_body_tracker will fail when looking up labels in site_map.
def _invalidate_body_tracker(self) -> None:
self._newton_transforms = None
self._sensor_index = None
if not self._is_static_parent:
self._register_body_tracker() # Only re-registers if not staticThe logic seems correct for the rigid-body branch, but verify that _site_args_cached=True doesn't prevent re-registration when sites were actually cleared.
🟡 Warning: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster.py:366-368 — _zero_data_buffers calls .zero_() on warp arrays via attributes
The method calls .zero_() on self._data._pos_w, self._data._quat_w, and self._data._ray_hits_w. These are wp.array objects, and while wp.array has a zero_() method, this pattern differs from the rest of the codebase which typically uses wp.launch with fill kernels for masked zeroing. If the physics event fires mid-step, this could race with kernel writes.
def _zero_data_buffers(self) -> None:
self._data._pos_w.zero_()
self._data._quat_w.zero_()
self._data._ray_hits_w.zero_()🟡 Warning: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster_camera.py:152-153 — reset clones poses before combine_frame_transforms but indices may be tensor
pos_w, quat_w = self._get_sensor_world_poses(env_ids)
pos_w, quat_w = math_utils.combine_frame_transforms(
pos_w.clone(), quat_w.clone(), self._offset_pos[env_ids], self._offset_quat[env_ids]
)When env_ids is a tensor, self._offset_pos[env_ids] and self._offset_quat[env_ids] return views. This is fine, but the .clone() calls suggest concern about in-place modification. The combine function doesn't modify inputs in-place, so the clones are defensive but unnecessary overhead.
🟡 Warning: source/isaaclab_physx/isaaclab_physx/sensors/ray_caster/ray_caster.py:93-96 — Offset composition mutates warp-backed torch views in-place
if self._fixed_pos_b is not None:
offset_pos = wp.to_torch(self._offset_pos_wp)
offset_quat = wp.to_torch(self._offset_quat_wp)
cur_pos, cur_quat = offset_pos.clone(), offset_quat.clone()
offset_pos[:] = self._fixed_pos_b + math_utils.quat_apply(self._fixed_quat_b, cur_pos)
offset_quat[:] = math_utils.quat_mul(self._fixed_quat_b, cur_quat)This correctly clones before reading to avoid self-aliasing, but the composition formula new_offset = T_body_to_xform * existing_offset assumes the existing offset is identity for the base RayCaster case. The camera path writes cfg.offset into these buffers in _initialize_rays_impl before _initialize_body_tracker runs. Verify that the camera's _initialize_rays_impl runs before _initialize_body_tracker in the MRO chain — it does, per line 135-137 in base_ray_caster.py.
🔵 Improvement: source/isaaclab/isaaclab/sensors/ray_caster/multi_mesh_ray_caster.py:352-358 — Duplicate buffer allocation pattern
self._ray_distance_w = wp.zeros((self._num_envs, self.num_rays), dtype=wp.float32, device=self._device)This is allocated in MultiMeshRayCaster._initialize_rays_impl after calling super()._initialize_rays_impl(). The base class already allocates _dummy_ray_distance but with size (1,1). Consider documenting why MultiMesh needs the full-sized buffer (atomic_min across meshes) vs the dummy in the base.
🔵 Improvement: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster_camera.py:509-525 — Deprecated functions still present
The deprecated _compute_view_world_poses and _compute_camera_world_poses methods emit warnings but remain. Consider adding a # TODO: Remove in v3.0 comment with the deprecation version for tracking.
🔵 Improvement: source/isaaclab/test/sensors/test_ray_caster_sensor.py:377-378 — Test uses hardcoded ground-truth for PhysX body view
body_z = _parent_body_z_ground_truth(sensor)
if body_z is not None:
assert abs(z_after - body_z) < 1e-4, ...This tight assertion (1e-4) is appropriate for PhysX but the function returns None for Newton. Consider adding an equivalent ground-truth check for Newton using SensorFrameTransform.transforms if available.
Greptile SummaryThis PR refactors
Confidence Score: 5/5The core bug fix (PhysX RigidBodyView replacing FabricFrameView for pose tracking) is cleanly isolated to the new mixin; the cloner refactor is additive and the fallback homogeneous path is unchanged in behavior. The backend split is architecturally sound: the mixin contract is narrow (_initialize_pose_tracking + _get_view_transforms_wp), both backends implement it consistently, and the existing warp kernel pipeline is untouched. The two flagged items are quality observations rather than functional defects. No correctness regressions were found in the primary single-mesh PhysX or Newton sensor paths. source/isaaclab/isaaclab/sensors/ray_caster/_target_tracker_utils.py (never imported — intended shared logic that wasn't wired up) and source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py (_update_mesh_transforms per-step allocations). Important Files Changed
Sequence DiagramsequenceDiagram
participant Scene as InteractiveScene
participant Base as BaseRayCaster
participant Mixin as PhysX/Newton Mixin
participant Kernel as Warp Kernel
Note over Scene: Construction
Scene->>Scene: _build_clone_plan_from_cfg()
Scene->>Base: __init__(cfg)
Base->>Base: _resolve_and_spawn("raycaster")
Note over Mixin: Newton only: __init__
Mixin->>Mixin: _register_sites_for_expr(prim_path)
Note over Base: physics-ready callback
Base->>Mixin: _initialize_pose_tracking()
Mixin-->>Base: "sets _view=self, _view_count, _offset_pos_wp"
Base->>Base: _initialize_warp_meshes()
Base->>Base: _initialize_rays_impl()
Note over Base: per-step update
Base->>Mixin: _get_view_transforms_wp()
Mixin-->>Base: wp.array[transformf] (body or site poses)
Base->>Kernel: update_ray_caster_kernel(transforms, _offset_pos_wp, ...)
Kernel-->>Base: pos_w, quat_w, ray_starts_w, ray_dirs_w
Base->>Kernel: raycast_mesh_masked_kernel(...)
Kernel-->>Base: ray_hits_w
Reviews (2): Last reviewed commit: "multi raycaster work" | Re-trigger Greptile |
| # apply a single offset to the body pose. Skipped when ancestor==prim | ||
| # (no fixed offset to compose). ``wp.to_torch`` returns zero-copy alias | ||
| # views; slice-writes update warp memory the kernel reads. | ||
| if self._fixed_pos_b is not None: | ||
| offset_pos = wp.to_torch(self._offset_pos_wp) | ||
| offset_quat = wp.to_torch(self._offset_quat_wp) | ||
| cur_pos, cur_quat = offset_pos.clone(), offset_quat.clone() | ||
| offset_pos[:] = self._fixed_pos_b + math_utils.quat_apply(self._fixed_quat_b, cur_pos) | ||
| offset_quat[:] = math_utils.quat_mul(self._fixed_quat_b, cur_quat) | ||
|
|
||
| def _get_sensor_transforms_wp(self) -> wp.array: |
There was a problem hiding this comment.
Offset baking corrupts camera
reset() / set_world_poses()
_offset_pos_wp / _offset_quat_wp are warp-owned memory, and _offset_pos / _offset_quat (created in BaseRayCasterCamera._initialize_rays_impl) are zero-copy torch aliases of the same memory. The baking on lines 127–128 overwrites those warp arrays with T_body_camera so the raycasting kernel sees a single composed offset — correct. However, BaseRayCasterCamera.reset() reads self._offset_pos[env_ids] expecting T_xform_camera (the config-level camera offset), so it computes T_world_xform * T_body_camera instead of T_world_xform * T_xform_camera = T_world_camera, storing a wrong pose in _data.pos_w / quat_w_world.
set_world_poses() is even more fragile: it writes a new T_xform_camera back into _offset_pos (and therefore into _offset_pos_wp), silently overwriting the baked T_body_camera, so the kernel subsequently applies body_pose * T_xform_camera without the body-to-xform offset — a wrong ray origin for every subsequent step.
This regression is specific to RayCasterCamera (PhysX) when the sensor xform is parented below a rigid body (ancestor != prim). The fix is to preserve T_xform_camera in a separate, non-shared buffer so reset() and set_world_poses() apply it correctly, while only _offset_pos_wp/_offset_quat_wp (kernel path) hold the baked T_body_camera.
There was a problem hiding this comment.
Good catch — this is a real correctness bug (and my regression tests did not exercise it because they use cfg.offset = identity and a sensor Xform with identity USD-local translation, so T_body_camera == cfg.offset).
Fixed in 32d9109 with option (a): drop the bake entirely and compose body_pose * T_body_to_xform per step via a tiny wp.kernel into a pre-allocated _sensor_transforms_wp buffer. _offset_pos_wp / _offset_quat_wp (and their torch aliases on the camera path) stay holding cfg.offset, so reset() and set_world_poses() see the right value.
Added two regression tests in test_ray_caster_sensor.py:
test_camera_offset_buffer_survives_body_tracker_initasserts_offset_posstill equalscfg.offsetafter_initialize_body_trackerunder a rigid parent. (Direct invariant — would have caught the original bake.)test_camera_set_world_poses_under_rigid_parentexercises the user-visible failure: kinematic rigid parent atz=5, sensor Xform at(0, 0.1, 0)USD-local,set_world_poses(positions=[(1, 2, 6)]), then assertdata.pos_w[0] ≈ (1, 2, 6)within 1 cm. Pre-fix path would land at(1, 1.9, 6)(off by they=0.1body-to-Xform offset).
Per-step cost is one extra wp.launch of a transform-multiply kernel ((num_envs,) threads). Negligible vs the existing ray-cast launch.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Follow-up Review
The new commits appear to be minor (diff shows no substantive changes to the code logic). Reviewing against previous findings:
-
Previous Critical (Newton site re-registration): Still present at the same location. The
_invalidate_body_trackerlogic remains unchanged — the concern about_site_args_cached=Truepotentially preventing re-registration when sites were cleared byNewtonManager.close()still applies. However, looking more carefully at the code flow:_register_body_trackerchecksif not self._site_args_cachedbefore the USD walk, but it always callscl_register_sitefor non-static parents (lines 89-91). So re-registration does happen. This previous concern was overstated — the code is correct. -
Previous Warnings: All remain unchanged but are minor/defensive-coding issues, not bugs.
-
Previous Improvements: Documentation suggestions still applicable but non-blocking.
CI is still pending — recommend waiting for CI to pass before merging.
Verdict
Ship it once CI passes. The architecture is sound, tests are comprehensive, and the bug fix is well-validated. No new issues introduced.
|
Thanks for the review! Triage on each item: 🔴 Newton 🟡 🟡 Unnecessary 🟡 PhysX offset composition / camera ordering — already validated in the same comment ("does, per line 135-137"). No action. 🔵 Multi-mesh 🔵 Deprecated method TODO version comments — push back. 🔵 Newton ground-truth in regression test — push back. A real Newton ground-truth would need to read So: 1 fix landed, 6 push-backs above. Happy to reconsider any of them — let me know if I'm misreading something. |
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Follow-up Review
The new commits show no substantive changes to the code logic — the diff appears identical to what was previously reviewed. The previous bot review correctly identified that the Newton site re-registration concern was overstated (the code does re-register sites correctly).
CI is still pending — all test jobs show ⏳ status. Recommend waiting for CI to pass before merging.
Verdict
Ship it once CI passes. No new issues introduced.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Follow-up Review
No code changes since the previous review — the diff is identical. The previous review correctly identified this as ready to ship.
CI Status Update: Most jobs are still pending (⏳), but isaaclab_ov and isaaclab_assets have passed (✅). The core isaaclab test jobs that would exercise the ray-caster changes are still running.
Verdict
Ship it once CI completes. No new concerns.
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Follow-up Review
No code changes since the previous review — the diff is identical to the last review cycle.
CI Status Update: More jobs have completed. Key results:
- ✅
isaaclab_ov,isaaclab_assets,isaaclab_teleoppassed - ✅
pre-commit,Build Wheel,Check changelog fragmentspassed - ⏳ Core test jobs (
isaaclab (core),isaaclab_physx,isaaclab_newton,isaaclab_tasks) still pending
Verdict
Ship it once the pending CI jobs (especially isaaclab (core) and backend-specific tests) complete successfully. No new concerns.
155b06e to
7a17c3d
Compare
14d3e5b to
e746b7b
Compare
|
@greptile-apps review again |
e746b7b to
ef4f340
Compare
3385e58 to
ca933c6
Compare
There was a problem hiding this comment.
Code Review Summary
This PR refactors RayCaster and RayCasterCamera (plus MultiMesh variants) into a backend-agnostic base with per-backend implementations for PhysX and Newton. The primary goal is fixing the staleness regression from #5179 where sensors parented under articulations/rigid bodies returned spawn-time poses.
✅ Strengths
-
Clean architecture: Base classes contain backend-agnostic logic while PhysX/Newton mixins handle only pose tracking. Consistent with the split pattern used by
Pva,FrameTransformer, andContactSensor. -
Proper factory dispatch:
FactoryBaseshims correctly dispatch to backend implementations. -
Critical bug fix: Directly addresses the staleness issue where sensors returned spawn-time poses forever during headless training.
-
Good test coverage: New tests validate physics body parent motion tracking and heterogeneous ClonePlan consumption.
🔍 Observations
-
Rectangular mesh table padding (Acceptable): Heterogeneous scenes pad shorter environments with a dummy mesh at
(1e9, 1e9, 1e9)to maintain kernel ABI compatibility. The PR notes a follow-up is planned for a new kernel. -
Newton site registration timing: Sites are registered in
__init__before cloning occurs, which is correct and matches existing patterns. -
MultiMesh staleness persists for FrameView-backed targets (Known limitation): Tracked as
xfailwith a planned follow-up.
📋 Minor Suggestions
-
Changelog fragment CI failure: The
Check changelog fragmentscheck is failing - may need a formatting adjustment. -
Type hints in mixins: Some methods use
self: Anydue to mixin patterns. Consider adding a brief docstring explaining the expected MRO chain for maintainability. -
PhysX body view cleanup:
_physx_body_viewis created but not explicitly released in__del__. This is likely fine since views are cleaned up with simulation, but worth noting.
✅ Verdict
Well-structured refactoring that correctly addresses the staleness regression while maintaining API compatibility. The architecture follows established patterns and includes proper testing for the new behavior.
Files reviewed
base_ray_caster.py,base_ray_caster_camera.py,base_multi_mesh_ray_caster.py,base_multi_mesh_ray_caster_camera.pyisaaclab_physx/sensors/ray_caster/*.pyisaaclab_newton/sensors/ray_caster/*.py- Test files:
test_ray_caster_sensor.py,test_ray_caster_integration.py,test_multi_mesh_ray_caster_camera.py - Changelog fragments
StafaH
left a comment
There was a problem hiding this comment.
Nice PR! Left some comments.
Overall there's alot of switching between torch and warp ops, which could be cleaned up to use warp only. Also we could use ProxyArrays in all the base classes to clean things up even more. Nothing critical for this PR though, can merge it in and clean up later
| self._num_meshes_per_env: dict[str, int] = {} | ||
|
|
||
| self._raycast_targets_cfg: list[MultiMeshRayCasterCfg.RaycastTargetCfg] = [] | ||
| for target in self.cfg.mesh_prim_paths: |
There was a problem hiding this comment.
Why not merge the two target_cfg loops?
| mesh_orientations: list[list[tuple[float, float, float, float]]] = [] | ||
|
|
||
| # Pack all targets into the rectangular table used by the Warp kernel. | ||
| for env_id in range(self._num_envs): |
There was a problem hiding this comment.
This can be vectorized with numpy. Also seems the dummy record is only used to extend the list. Can we not allocate the array in advance?
Even better it seems you could populate it with a warp kernel, since the end result is a warp kernel below?
| ) | ||
| self._mesh_positions_w_torch = wp.to_torch(self._mesh_positions_w) | ||
| self._mesh_orientations_w_torch = wp.to_torch(self._mesh_orientations_w) | ||
| self._mesh_positions_w_torch[:] = torch.tensor(mesh_positions, dtype=torch.float32, device=self.device) |
| self._mesh_positions_w_torch[:] = torch.tensor(mesh_positions, dtype=torch.float32, device=self.device) | ||
| self._mesh_orientations_w_torch[:] = torch.tensor(mesh_orientations, dtype=torch.float32, device=self.device) | ||
|
|
||
| def _build_mesh_records(self, target_cfg: _RaycastTargetCfg, plan: ClonePlan | None, dummy_mesh_id: int | None): |
There was a problem hiding this comment.
Why use an alias for _RaycastTargetCfg?
|
|
||
| # Prefer ClonePlan data for env-scoped targets; destination USD prims may not exist. | ||
| if plan is not None: | ||
| target_path = re.sub(r"env_\.\*", "env_0", target_cfg.prim_expr) |
There was a problem hiding this comment.
Can we not use target_cfg.prim_expr.replace here?
| meshes: ClassVar[dict[tuple[str, str], wp.Mesh]] = {} | ||
| """A dictionary to store warp meshes for raycasting, shared across all instances. | ||
|
|
||
| The keys are ``(prim_path, device)`` tuples and values are the corresponding warp Mesh objects. |
There was a problem hiding this comment.
Why are warp meshes being created on both devices? Should it not just be on the device currently set globally?
|
|
||
| # read prims to ray-cast | ||
| for mesh_prim_path in self.cfg.mesh_prim_paths: | ||
| mesh_key = (mesh_prim_path, self._device) |
There was a problem hiding this comment.
Same as previous comment, seems _device will not change? Then we don't need to key with tuple
|
|
||
| def _initialize_rays_impl(self): | ||
| # Create all indices buffer | ||
| self._ALL_INDICES = torch.arange(self._view.count, device=self._device, dtype=torch.long) |
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| _MeshRecord = tuple[int, tuple[float, float, float], tuple[float, float, float, float]] |
There was a problem hiding this comment.
In alot of places this tuple exists for vec3 and vec4/quat, maybe we can alias the type somewhere so it's a bit easier to read. If we convert more of the code to warp we can use wp.vec3 and wp.quat which would be alot cleaner
| def __str__(self) -> str: | ||
| """Returns: A string containing information about the instance.""" | ||
| return ( | ||
| f"Ray-caster @ '{self.cfg.prim_path}': \n" |
There was a problem hiding this comment.
BaseRayCaster has the same str overload, can we just call that one?
be77c7c to
7b43350
Compare
Description
Refactors
RayCasterandRayCasterCamerainto a backend-agnostic base + per-backend implementations underisaaclab_physx.sensors.ray_casterandisaaclab_newton.sensors.ray_caster, mirroring the iconic split pattern already used byPva,FrameTransformer, andContactSensor.The PhysX backend tracks the parent rigid body directly via
RigidObjectViewinstead of going throughFabricFrameView, which fixes the staleness regression from #5179: sensors parented under an articulation / rigid body were returning their spawn-time pose forever during headless training, silently freezing height-scan observations in rough-terrain locomotion (and any similar IMU / camera path that read throughFrameView).The Newton backend uses the site-based pattern from
Pva/FrameTransformer: walk USD to the rigid-body ancestor, register a body-attached site viaNewtonManager.cl_register_site, and read per-step transforms off aSensorFrameTransformagainst a shared world-origin reference. Static parents bypass the site machinery (a singlebody=-1global site can't represent per-env world origins) and serve a cached per-envwp.transformfarray.MultiMeshRayCaster/MultiMeshRayCasterCamerare-parent onto the new base but keep theirFrameView-backed body tracker, so the staleness behavior persists there. Tracked asxfailintest_ray_caster_sensor.py— extending the backend split to MultiMesh is a follow-up.The cfg surface,
class_typestrings, and runtime semantics are unchanged for callers; existing user code does not need to migrate.Fixes #5476 (the
FabricFrameViewcontract regression-test PR — the bug it documents is fixed for the single-mesh path here).Type of change
Screenshots
N/A — backend refactor, no UI changes.
Checklist