Skip to content

Split RayCaster into backend-specific implementations#5510

Open
ooctipus wants to merge 6 commits into
isaac-sim:developfrom
ooctipus:octi/raycaster-backend-split
Open

Split RayCaster into backend-specific implementations#5510
ooctipus wants to merge 6 commits into
isaac-sim:developfrom
ooctipus:octi/raycaster-backend-split

Conversation

@ooctipus
Copy link
Copy Markdown
Collaborator

@ooctipus ooctipus commented May 6, 2026

Description

Refactors RayCaster and RayCasterCamera into a backend-agnostic base + per-backend implementations under isaaclab_physx.sensors.ray_caster and isaaclab_newton.sensors.ray_caster, mirroring the iconic split pattern already used by Pva, FrameTransformer, and ContactSensor.

The PhysX backend tracks the parent rigid body directly via RigidObjectView instead of going through FabricFrameView, which fixes the staleness regression from #5179: sensors parented under an articulation / rigid body were returning their spawn-time pose forever during headless training, silently freezing height-scan observations in rough-terrain locomotion (and any similar IMU / camera path that read through FrameView).

The Newton backend uses the site-based pattern from Pva / FrameTransformer: walk USD to the rigid-body ancestor, register a body-attached site via NewtonManager.cl_register_site, and read per-step transforms off a SensorFrameTransform against a shared world-origin reference. Static parents bypass the site machinery (a single body=-1 global site can't represent per-env world origins) and serve a cached per-env wp.transformf array.

MultiMeshRayCaster / MultiMeshRayCasterCamera re-parent onto the new base but keep their FrameView-backed body tracker, so the staleness behavior persists there. Tracked as xfail in test_ray_caster_sensor.py — extending the backend split to MultiMesh is a follow-up.

The cfg surface, class_type strings, and runtime semantics are unchanged for callers; existing user code does not need to migrate.

Fixes #5476 (the FabricFrameView contract regression-test PR — the bug it documents is fixed for the single-mesh path here).

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Screenshots

N/A — backend refactor, no UI changes.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the `pre-commit` checks with `./isaaclab.sh --format`
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under `source//changelog.d/` for every touched package (do not edit `CHANGELOG.rst` or bump `extension.toml` — CI handles that)
  • I have added my name to the `CONTRIBUTORS.md` or my name already exists there

@ooctipus ooctipus requested a review from pascal-roth as a code owner May 6, 2026 05:00
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label May 6, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR refactors RayCaster and RayCasterCamera into backend-agnostic base classes (BaseRayCaster, BaseRayCasterCamera) with PhysX and Newton-specific implementations. The core motivation is to fix a staleness regression from #5179 where sensors parented under rigid bodies returned their spawn-time pose forever during headless training. The PhysX backend now tracks the parent rigid body directly via RigidObjectView instead of going through FabricFrameView. The implementation is architecturally sound and follows the established pattern used by Pva, FrameTransformer, and ContactSensor.

Architecture Impact

Cross-module impact is significant but well-contained:

  • RayCaster and RayCasterCamera become FactoryBase shims dispatching to backend-specific implementations
  • MultiMeshRayCaster and MultiMeshRayCasterCamera re-parent onto BaseRayCaster/BaseRayCasterCamera but retain FrameView-backed tracking (documented as xfail)
  • New packages: isaaclab_physx.sensors.ray_caster and isaaclab_newton.sensors.ray_caster
  • All existing user-facing APIs remain unchanged; cfg surface is preserved

What calls the changed code:

  • Any RL environment using height-scan observations via RayCaster
  • Camera sensors using RayCasterCamera for fast depth rendering
  • Task configs referencing RayCasterCfg or RayCasterCameraCfg

Implementation Verdict

Minor fixes needed — The core architecture is correct and the bug fix is validated by comprehensive tests. However, there are a few issues that should be addressed.

Test Coverage

Strong overall. The PR adds:

  • test_sensor_pose_tracks_falling_rigid_parent covering all four sensor variants with xfail for MultiMesh
  • Alignment mode tests (world, base, yaw)
  • Drift resampling test
  • Integration tests for ArticulationView/RigidBodyView paths

Minor gap: No explicit test for the Newton backend's static-parent path (_is_static_parent=True), though the PhysX static path is implicitly tested.

CI Status

No CI checks available yet — cannot verify test pass status.

Findings

🔴 Critical: source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py:162 — Missing re-registration of world-origin site in _invalidate_body_tracker

The _invalidate_body_tracker method only re-registers sites via self._register_body_tracker() for non-static parents. However, _register_body_tracker registers both _world_origin_label and _site_label. On STOP→reinit, the docstring claims sites survive close/reinit, but NewtonManager.close() clears _cl_pending_sites. If _site_args_cached is True but the site labels are stale, the subsequent _initialize_body_tracker will fail when looking up labels in site_map.

def _invalidate_body_tracker(self) -> None:
    self._newton_transforms = None
    self._sensor_index = None
    if not self._is_static_parent:
        self._register_body_tracker()  # Only re-registers if not static

The logic seems correct for the rigid-body branch, but verify that _site_args_cached=True doesn't prevent re-registration when sites were actually cleared.

🟡 Warning: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster.py:366-368 — _zero_data_buffers calls .zero_() on warp arrays via attributes

The method calls .zero_() on self._data._pos_w, self._data._quat_w, and self._data._ray_hits_w. These are wp.array objects, and while wp.array has a zero_() method, this pattern differs from the rest of the codebase which typically uses wp.launch with fill kernels for masked zeroing. If the physics event fires mid-step, this could race with kernel writes.

def _zero_data_buffers(self) -> None:
    self._data._pos_w.zero_()
    self._data._quat_w.zero_()
    self._data._ray_hits_w.zero_()

🟡 Warning: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster_camera.py:152-153 — reset clones poses before combine_frame_transforms but indices may be tensor

pos_w, quat_w = self._get_sensor_world_poses(env_ids)
pos_w, quat_w = math_utils.combine_frame_transforms(
    pos_w.clone(), quat_w.clone(), self._offset_pos[env_ids], self._offset_quat[env_ids]
)

When env_ids is a tensor, self._offset_pos[env_ids] and self._offset_quat[env_ids] return views. This is fine, but the .clone() calls suggest concern about in-place modification. The combine function doesn't modify inputs in-place, so the clones are defensive but unnecessary overhead.

🟡 Warning: source/isaaclab_physx/isaaclab_physx/sensors/ray_caster/ray_caster.py:93-96 — Offset composition mutates warp-backed torch views in-place

if self._fixed_pos_b is not None:
    offset_pos = wp.to_torch(self._offset_pos_wp)
    offset_quat = wp.to_torch(self._offset_quat_wp)
    cur_pos, cur_quat = offset_pos.clone(), offset_quat.clone()
    offset_pos[:] = self._fixed_pos_b + math_utils.quat_apply(self._fixed_quat_b, cur_pos)
    offset_quat[:] = math_utils.quat_mul(self._fixed_quat_b, cur_quat)

This correctly clones before reading to avoid self-aliasing, but the composition formula new_offset = T_body_to_xform * existing_offset assumes the existing offset is identity for the base RayCaster case. The camera path writes cfg.offset into these buffers in _initialize_rays_impl before _initialize_body_tracker runs. Verify that the camera's _initialize_rays_impl runs before _initialize_body_tracker in the MRO chain — it does, per line 135-137 in base_ray_caster.py.

🔵 Improvement: source/isaaclab/isaaclab/sensors/ray_caster/multi_mesh_ray_caster.py:352-358 — Duplicate buffer allocation pattern

self._ray_distance_w = wp.zeros((self._num_envs, self.num_rays), dtype=wp.float32, device=self._device)

This is allocated in MultiMeshRayCaster._initialize_rays_impl after calling super()._initialize_rays_impl(). The base class already allocates _dummy_ray_distance but with size (1,1). Consider documenting why MultiMesh needs the full-sized buffer (atomic_min across meshes) vs the dummy in the base.

🔵 Improvement: source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster_camera.py:509-525 — Deprecated functions still present

The deprecated _compute_view_world_poses and _compute_camera_world_poses methods emit warnings but remain. Consider adding a # TODO: Remove in v3.0 comment with the deprecation version for tracking.

🔵 Improvement: source/isaaclab/test/sensors/test_ray_caster_sensor.py:377-378 — Test uses hardcoded ground-truth for PhysX body view

body_z = _parent_body_z_ground_truth(sensor)
if body_z is not None:
    assert abs(z_after - body_z) < 1e-4, ...

This tight assertion (1e-4) is appropriate for PhysX but the function returns None for Newton. Consider adding an equivalent ground-truth check for Newton using SensorFrameTransform.transforms if available.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR refactors RayCaster and RayCasterCamera from a single FabricFrameView-backed implementation into a backend-agnostic base class plus per-backend mixins (_PhysXRayCasterMixin / _NewtonRayCasterMixin). The PhysX path now drives pose tracking directly through a RigidBodyView, eliminating the stale-pose regression introduced in #5179. A parallel scene-level refactor replaces TemplateCloneCfg / template-root USD prims with a cfg-driven flat ClonePlan, enabling heterogeneous environment layouts without a separate template subtree.

  • BaseRayCaster / BaseRayCasterCamera extract all backend-neutral ray logic; backend mixins supply _initialize_pose_tracking and _get_view_transforms_wp. MultiMeshRayCaster* re-parents onto the new base but retains the legacy FrameView-backed tracker (marked xfail in tests).
  • InteractiveScene._build_clone_plan_from_cfg now constructs the flat ClonePlan from declared asset configs before _add_entities_from_cfg, supporting both homogeneous and heterogeneous scenes through a single code path.

Confidence Score: 5/5

The core bug fix (PhysX RigidBodyView replacing FabricFrameView for pose tracking) is cleanly isolated to the new mixin; the cloner refactor is additive and the fallback homogeneous path is unchanged in behavior.

The backend split is architecturally sound: the mixin contract is narrow (_initialize_pose_tracking + _get_view_transforms_wp), both backends implement it consistently, and the existing warp kernel pipeline is untouched. The two flagged items are quality observations rather than functional defects. No correctness regressions were found in the primary single-mesh PhysX or Newton sensor paths.

source/isaaclab/isaaclab/sensors/ray_caster/_target_tracker_utils.py (never imported — intended shared logic that wasn't wired up) and source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py (_update_mesh_transforms per-step allocations).

Important Files Changed

Filename Overview
source/isaaclab_physx/isaaclab_physx/sensors/ray_caster/ray_caster.py New PhysX backend: _PhysXRayCasterMixin implements _initialize_pose_tracking via RigidBodyView (fixing the FabricFrameView staleness bug) and _get_view_transforms_wp returning live body poses. Static-parent path caches poses once at init.
source/isaaclab_newton/isaaclab_newton/sensors/ray_caster/ray_caster.py New Newton backend: _NewtonRayCasterMixin registers Newton sites at construction time, resolves them at init, and reads per-step poses via a Warp kernel. Per-step warp allocations in _update_mesh_transforms are a performance concern.
source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster.py New backend-agnostic base class extracted from old RayCaster; defines _initialize_pose_tracking as abstract, all concrete ray logic and warp kernel launches remain here.
source/isaaclab/isaaclab/sensors/ray_caster/base_ray_caster_camera.py New backend-agnostic camera base class; _initialize_rays_impl allocates _offset_pos_wp/_offset_quat_wp from cfg.offset (replacing whatever the backend mixin set), set_world_poses and reset read from these offset buffers.
source/isaaclab/isaaclab/sensors/ray_caster/_target_tracker_utils.py New shared utility module for multi-mesh target prototype walking; however, all three exported functions are dead code — the module is never imported by any backend implementation.
source/isaaclab/isaaclab/sensors/ray_caster/base_multi_mesh_ray_caster.py Extracted from old MultiMeshRayCaster; ClonePlan-aware mesh table initialization and _update_mesh_transforms base are unchanged in behavior but now dispatch to backend-specific _create_tracked_target_view.
source/isaaclab/isaaclab/scene/interactive_scene.py Cloner refactored from TemplateCloneCfg to CloneCfg; _build_clone_plan_from_cfg now drives heterogeneous scene variant assignment; template_root Xform define removed.
source/isaaclab/isaaclab/cloner/clone_plan.py ClonePlan restructured from per-group (dest_template + prototype_paths) to flat (sources + destinations + clone_mask); eq=False added and the 'each column sums to exactly one' invariant note was removed from the docstring.

Sequence Diagram

sequenceDiagram
    participant Scene as InteractiveScene
    participant Base as BaseRayCaster
    participant Mixin as PhysX/Newton Mixin
    participant Kernel as Warp Kernel

    Note over Scene: Construction
    Scene->>Scene: _build_clone_plan_from_cfg()
    Scene->>Base: __init__(cfg)
    Base->>Base: _resolve_and_spawn("raycaster")

    Note over Mixin: Newton only: __init__
    Mixin->>Mixin: _register_sites_for_expr(prim_path)

    Note over Base: physics-ready callback
    Base->>Mixin: _initialize_pose_tracking()
    Mixin-->>Base: "sets _view=self, _view_count, _offset_pos_wp"

    Base->>Base: _initialize_warp_meshes()
    Base->>Base: _initialize_rays_impl()

    Note over Base: per-step update
    Base->>Mixin: _get_view_transforms_wp()
    Mixin-->>Base: wp.array[transformf] (body or site poses)
    Base->>Kernel: update_ray_caster_kernel(transforms, _offset_pos_wp, ...)
    Kernel-->>Base: pos_w, quat_w, ray_starts_w, ray_dirs_w
    Base->>Kernel: raycast_mesh_masked_kernel(...)
    Kernel-->>Base: ray_hits_w
Loading

Reviews (2): Last reviewed commit: "multi raycaster work" | Re-trigger Greptile

Comment on lines +118 to +128
# apply a single offset to the body pose. Skipped when ancestor==prim
# (no fixed offset to compose). ``wp.to_torch`` returns zero-copy alias
# views; slice-writes update warp memory the kernel reads.
if self._fixed_pos_b is not None:
offset_pos = wp.to_torch(self._offset_pos_wp)
offset_quat = wp.to_torch(self._offset_quat_wp)
cur_pos, cur_quat = offset_pos.clone(), offset_quat.clone()
offset_pos[:] = self._fixed_pos_b + math_utils.quat_apply(self._fixed_quat_b, cur_pos)
offset_quat[:] = math_utils.quat_mul(self._fixed_quat_b, cur_quat)

def _get_sensor_transforms_wp(self) -> wp.array:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Offset baking corrupts camera reset() / set_world_poses()

_offset_pos_wp / _offset_quat_wp are warp-owned memory, and _offset_pos / _offset_quat (created in BaseRayCasterCamera._initialize_rays_impl) are zero-copy torch aliases of the same memory. The baking on lines 127–128 overwrites those warp arrays with T_body_camera so the raycasting kernel sees a single composed offset — correct. However, BaseRayCasterCamera.reset() reads self._offset_pos[env_ids] expecting T_xform_camera (the config-level camera offset), so it computes T_world_xform * T_body_camera instead of T_world_xform * T_xform_camera = T_world_camera, storing a wrong pose in _data.pos_w / quat_w_world.

set_world_poses() is even more fragile: it writes a new T_xform_camera back into _offset_pos (and therefore into _offset_pos_wp), silently overwriting the baked T_body_camera, so the kernel subsequently applies body_pose * T_xform_camera without the body-to-xform offset — a wrong ray origin for every subsequent step.

This regression is specific to RayCasterCamera (PhysX) when the sensor xform is parented below a rigid body (ancestor != prim). The fix is to preserve T_xform_camera in a separate, non-shared buffer so reset() and set_world_poses() apply it correctly, while only _offset_pos_wp/_offset_quat_wp (kernel path) hold the baked T_body_camera.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — this is a real correctness bug (and my regression tests did not exercise it because they use cfg.offset = identity and a sensor Xform with identity USD-local translation, so T_body_camera == cfg.offset).

Fixed in 32d9109 with option (a): drop the bake entirely and compose body_pose * T_body_to_xform per step via a tiny wp.kernel into a pre-allocated _sensor_transforms_wp buffer. _offset_pos_wp / _offset_quat_wp (and their torch aliases on the camera path) stay holding cfg.offset, so reset() and set_world_poses() see the right value.

Added two regression tests in test_ray_caster_sensor.py:

  • test_camera_offset_buffer_survives_body_tracker_init asserts _offset_pos still equals cfg.offset after _initialize_body_tracker under a rigid parent. (Direct invariant — would have caught the original bake.)
  • test_camera_set_world_poses_under_rigid_parent exercises the user-visible failure: kinematic rigid parent at z=5, sensor Xform at (0, 0.1, 0) USD-local, set_world_poses(positions=[(1, 2, 6)]), then assert data.pos_w[0] ≈ (1, 2, 6) within 1 cm. Pre-fix path would land at (1, 1.9, 6) (off by the y=0.1 body-to-Xform offset).

Per-step cost is one extra wp.launch of a transform-multiply kernel ((num_envs,) threads). Negligible vs the existing ray-cast launch.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Follow-up Review

The new commits appear to be minor (diff shows no substantive changes to the code logic). Reviewing against previous findings:

  1. Previous Critical (Newton site re-registration): Still present at the same location. The _invalidate_body_tracker logic remains unchanged — the concern about _site_args_cached=True potentially preventing re-registration when sites were cleared by NewtonManager.close() still applies. However, looking more carefully at the code flow: _register_body_tracker checks if not self._site_args_cached before the USD walk, but it always calls cl_register_site for non-static parents (lines 89-91). So re-registration does happen. This previous concern was overstated — the code is correct.

  2. Previous Warnings: All remain unchanged but are minor/defensive-coding issues, not bugs.

  3. Previous Improvements: Documentation suggestions still applicable but non-blocking.

CI is still pending — recommend waiting for CI to pass before merging.

Verdict

Ship it once CI passes. The architecture is sound, tests are comprehensive, and the bug fix is well-validated. No new issues introduced.

@ooctipus
Copy link
Copy Markdown
Collaborator Author

ooctipus commented May 6, 2026

Thanks for the review! Triage on each item:

🔴 Newton _invalidate_body_tracker — push back. The cached flag (_site_args_cached=True) only short-circuits the USD walk — the cached (body_pattern, site_xform) pair. The two cl_register_site calls always run on the rigid-body branch (lines 104–105 in ray_caster.py), so both the world-origin label and the source label get re-issued every invalidate cycle. Mirrors the same pattern in isaaclab_newton.sensors.frame_transformer.FrameTransformer._invalidate_initialize_callback.

🟡 _zero_data_buffers .zero_() vs wp.launch / mid-step race — push back. The .zero_() pattern is the established one for this lifecycle event: see isaaclab_newton.sensors.pva.Pva._invalidate_initialize_callback (lines 240–252) which uses the identical for buf in [...]: buf.zero_() loop. Race-with-kernel-writes isn't a concern: PhysicsEvent.STOP is dispatched synchronously from SimulationContext.stop() after the sim loop has exited — there's no concurrent kernel running at that point.

🟡 Unnecessary .clone() in BaseRayCasterCamera.reset — punt. Correct observation but pre-existing — those clones predate this PR (verbatim from the original RayCasterCamera). Removing them is a separate cleanup; perf-irrelevant since reset / set_world_poses are cold paths.

🟡 PhysX offset composition / camera ordering — already validated in the same comment ("does, per line 135-137"). No action.

🔵 Multi-mesh _ray_distance_w documentation — addressed in ebcc2b9. Comment now contrasts the full-size atomic_min target with the base's (1, 1) placeholder dummy.

🔵 Deprecated method TODO version comments — push back. _compute_view_world_poses / _compute_camera_world_poses are pre-existing methods I just renamed-through (the rename was _get_view_world_poses_get_sensor_world_poses upstream of these). Their existing .. deprecated:: v2.3.1 Sphinx markers carry the version metadata. Adding # TODO: Remove in v3.0 would be a guess at a target the project hasn't publicly committed to.

🔵 Newton ground-truth in regression test — push back. A real Newton ground-truth would need to read state.body_q[body_idx] after resolving the parent body's index from NewtonManager._model.body_label — that's nontrivial test plumbing. Comparing against sensor._newton_transforms would be tautological (it is the sensor pose post-composition, not an independent body pose). The change-detection assertion (drop > 1 cm) already pins the bug class on Newton; the tight ground-truth on PhysX is the bonus that comes for free because _body_view is the right shape there.

So: 1 fix landed, 6 push-backs above. Happy to reconsider any of them — let me know if I'm misreading something.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Follow-up Review

The new commits show no substantive changes to the code logic — the diff appears identical to what was previously reviewed. The previous bot review correctly identified that the Newton site re-registration concern was overstated (the code does re-register sites correctly).

CI is still pending — all test jobs show ⏳ status. Recommend waiting for CI to pass before merging.

Verdict

Ship it once CI passes. No new issues introduced.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Follow-up Review

No code changes since the previous review — the diff is identical. The previous review correctly identified this as ready to ship.

CI Status Update: Most jobs are still pending (⏳), but isaaclab_ov and isaaclab_assets have passed (✅). The core isaaclab test jobs that would exercise the ray-caster changes are still running.

Verdict

Ship it once CI completes. No new concerns.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Follow-up Review

No code changes since the previous review — the diff is identical to the last review cycle.

CI Status Update: More jobs have completed. Key results:

  • isaaclab_ov, isaaclab_assets, isaaclab_teleop passed
  • pre-commit, Build Wheel, Check changelog fragments passed
  • ⏳ Core test jobs (isaaclab (core), isaaclab_physx, isaaclab_newton, isaaclab_tasks) still pending

Verdict

Ship it once the pending CI jobs (especially isaaclab (core) and backend-specific tests) complete successfully. No new concerns.

@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch from 155b06e to 7a17c3d Compare May 7, 2026 06:14
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 7, 2026
@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch 3 times, most recently from 14d3e5b to e746b7b Compare May 7, 2026 22:15
@kellyguo11 kellyguo11 moved this to In review in Isaac Lab May 7, 2026
@ooctipus
Copy link
Copy Markdown
Collaborator Author

ooctipus commented May 8, 2026

@greptile-apps review again

@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch from e746b7b to ef4f340 Compare May 8, 2026 02:27
@kellyguo11 kellyguo11 assigned StafaH and unassigned AntoineRichard May 11, 2026
@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch from 3385e58 to ca933c6 Compare May 11, 2026 19:05
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR refactors RayCaster and RayCasterCamera (plus MultiMesh variants) into a backend-agnostic base with per-backend implementations for PhysX and Newton. The primary goal is fixing the staleness regression from #5179 where sensors parented under articulations/rigid bodies returned spawn-time poses.

✅ Strengths

  1. Clean architecture: Base classes contain backend-agnostic logic while PhysX/Newton mixins handle only pose tracking. Consistent with the split pattern used by Pva, FrameTransformer, and ContactSensor.

  2. Proper factory dispatch: FactoryBase shims correctly dispatch to backend implementations.

  3. Critical bug fix: Directly addresses the staleness issue where sensors returned spawn-time poses forever during headless training.

  4. Good test coverage: New tests validate physics body parent motion tracking and heterogeneous ClonePlan consumption.

🔍 Observations

  1. Rectangular mesh table padding (Acceptable): Heterogeneous scenes pad shorter environments with a dummy mesh at (1e9, 1e9, 1e9) to maintain kernel ABI compatibility. The PR notes a follow-up is planned for a new kernel.

  2. Newton site registration timing: Sites are registered in __init__ before cloning occurs, which is correct and matches existing patterns.

  3. MultiMesh staleness persists for FrameView-backed targets (Known limitation): Tracked as xfail with a planned follow-up.

📋 Minor Suggestions

  1. Changelog fragment CI failure: The Check changelog fragments check is failing - may need a formatting adjustment.

  2. Type hints in mixins: Some methods use self: Any due to mixin patterns. Consider adding a brief docstring explaining the expected MRO chain for maintainability.

  3. PhysX body view cleanup: _physx_body_view is created but not explicitly released in __del__. This is likely fine since views are cleaned up with simulation, but worth noting.

✅ Verdict

Well-structured refactoring that correctly addresses the staleness regression while maintaining API compatibility. The architecture follows established patterns and includes proper testing for the new behavior.


Files reviewed
  • base_ray_caster.py, base_ray_caster_camera.py, base_multi_mesh_ray_caster.py, base_multi_mesh_ray_caster_camera.py
  • isaaclab_physx/sensors/ray_caster/*.py
  • isaaclab_newton/sensors/ray_caster/*.py
  • Test files: test_ray_caster_sensor.py, test_ray_caster_integration.py, test_multi_mesh_ray_caster_camera.py
  • Changelog fragments

Copy link
Copy Markdown

@StafaH StafaH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR! Left some comments.

Overall there's alot of switching between torch and warp ops, which could be cleaned up to use warp only. Also we could use ProxyArrays in all the base classes to clean things up even more. Nothing critical for this PR though, can merge it in and clean up later

self._num_meshes_per_env: dict[str, int] = {}

self._raycast_targets_cfg: list[MultiMeshRayCasterCfg.RaycastTargetCfg] = []
for target in self.cfg.mesh_prim_paths:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not merge the two target_cfg loops?

mesh_orientations: list[list[tuple[float, float, float, float]]] = []

# Pack all targets into the rectangular table used by the Warp kernel.
for env_id in range(self._num_envs):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be vectorized with numpy. Also seems the dummy record is only used to extend the list. Can we not allocate the array in advance?

Even better it seems you could populate it with a warp kernel, since the end result is a warp kernel below?

)
self._mesh_positions_w_torch = wp.to_torch(self._mesh_positions_w)
self._mesh_orientations_w_torch = wp.to_torch(self._mesh_orientations_w)
self._mesh_positions_w_torch[:] = torch.tensor(mesh_positions, dtype=torch.float32, device=self.device)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use proxy array here?

self._mesh_positions_w_torch[:] = torch.tensor(mesh_positions, dtype=torch.float32, device=self.device)
self._mesh_orientations_w_torch[:] = torch.tensor(mesh_orientations, dtype=torch.float32, device=self.device)

def _build_mesh_records(self, target_cfg: _RaycastTargetCfg, plan: ClonePlan | None, dummy_mesh_id: int | None):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use an alias for _RaycastTargetCfg?


# Prefer ClonePlan data for env-scoped targets; destination USD prims may not exist.
if plan is not None:
target_path = re.sub(r"env_\.\*", "env_0", target_cfg.prim_expr)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not use target_cfg.prim_expr.replace here?

meshes: ClassVar[dict[tuple[str, str], wp.Mesh]] = {}
"""A dictionary to store warp meshes for raycasting, shared across all instances.

The keys are ``(prim_path, device)`` tuples and values are the corresponding warp Mesh objects.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are warp meshes being created on both devices? Should it not just be on the device currently set globally?


# read prims to ray-cast
for mesh_prim_path in self.cfg.mesh_prim_paths:
mesh_key = (mesh_prim_path, self._device)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous comment, seems _device will not change? Then we don't need to key with tuple


def _initialize_rays_impl(self):
# Create all indices buffer
self._ALL_INDICES = torch.arange(self._view.count, device=self._device, dtype=torch.long)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this does not need to be caps


logger = logging.getLogger(__name__)

_MeshRecord = tuple[int, tuple[float, float, float], tuple[float, float, float, float]]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In alot of places this tuple exists for vec3 and vec4/quat, maybe we can alias the type somewhere so it's a bit easier to read. If we convert more of the code to warp we can use wp.vec3 and wp.quat which would be alot cleaner

def __str__(self) -> str:
"""Returns: A string containing information about the instance."""
return (
f"Ray-caster @ '{self.cfg.prim_path}': \n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseRayCaster has the same str overload, can we just call that one?

@ooctipus ooctipus force-pushed the octi/raycaster-backend-split branch from be77c7c to 7b43350 Compare May 12, 2026 03:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

5 participants