Commit 0bd7aec

Updated Camera to use Warp Arrays instead of Torch
1 parent f267f96

49 files changed

Lines changed: 846 additions & 556 deletions

Large commits have some content hidden by default; only a subset of the 49 changed files is shown below.

docs/source/how-to/save_camera_output.rst

Lines changed: 10 additions & 5 deletions
@@ -58,14 +58,19 @@ PyTorch operations which allows faster computation.

 .. code-block:: python

+    import warp as wp
     from isaaclab.utils.math import transform_points, unproject_depth

-    # Pointcloud in world frame
-    points_3d_cam = unproject_depth(
-        camera.data.output["distance_to_image_plane"], camera.data.intrinsic_matrices
-    )
+    # Camera ``data.output`` and pose fields are ``wp.array`` values; lift them to torch
+    # tensors before invoking the torch-based math utilities below.
+    depth = wp.to_torch(camera.data.output["distance_to_image_plane"])
+    intrinsics = wp.to_torch(camera.data.intrinsic_matrices)
+    pos_w = wp.to_torch(camera.data.pos_w)
+    quat_w_ros = wp.to_torch(camera.data.quat_w_ros)

-    points_3d_world = transform_points(points_3d_cam, camera.data.pos_w, camera.data.quat_w_ros)
+    # Pointcloud in world frame
+    points_3d_cam = unproject_depth(depth, intrinsics)
+    points_3d_world = transform_points(points_3d_cam, pos_w, quat_w_ros)

 Alternately, we can use the :meth:`isaaclab.sensors.camera.utils.create_pointcloud_from_depth` function
 to create a point cloud from the depth image and transform it to the world frame.
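For reference, a minimal sketch of that alternate path, assuming a single updated `camera` sensor (index 0). The keyword names mirror the run_usd_camera.py hunk later in this commit; since the commit does not show the helper accepting ``wp.array`` directly, inputs are lifted to torch first:

    import warp as wp

    from isaaclab.sensors.camera.utils import create_pointcloud_from_depth

    # lift wp.array camera fields to torch, then build the world-frame cloud
    points_3d_world = create_pointcloud_from_depth(
        intrinsic_matrix=wp.to_torch(camera.data.intrinsic_matrices)[0],
        depth=wp.to_torch(camera.data.output["distance_to_image_plane"])[0],
        position=wp.to_torch(camera.data.pos_w)[0],
        orientation=wp.to_torch(camera.data.quat_w_ros)[0],
    )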

docs/source/overview/core-concepts/sensors/camera.rst

Lines changed: 20 additions & 14 deletions
@@ -162,8 +162,13 @@ Accessing camera data

 .. code-block:: python

+    import warp as wp
+
     tiled_camera = Camera(cfg.tiled_camera)
-    data = tiled_camera.data.output["rgb"]  # shape: (num_cameras, H, W, 3), torch.uint8
+    # ``data.output`` entries are ``wp.array`` values (e.g. ``wp.uint8``); use
+    # :func:`warp.to_torch` when Torch tensor operations are required.
+    data_wp = tiled_camera.data.output["rgb"]  # shape: (num_cameras, H, W, 3), wp.uint8
+    data = wp.to_torch(data_wp)  # zero-copy torch.uint8 view

 The returned data has shape ``(num_cameras, height, width, num_channels)``, ready to use directly
 as an observation in RL training.
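For RL use, a minimal sketch of turning that buffer into a normalized observation, assuming ``tiled_camera`` from the snippet above:

    import torch
    import warp as wp

    # zero-copy torch.uint8 view, shape (num_cameras, H, W, 3)
    rgb = wp.to_torch(tiled_camera.data.output["rgb"])
    obs = rgb.to(torch.float32) / 255.0  # float32 observation in [0, 1]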
@@ -207,11 +212,12 @@ RGB and RGBA
    :figwidth: 100%
    :alt: A scene captured in RGB

-``rgb`` returns a 3-channel RGB image of type ``torch.uint8``, shape ``(B, H, W, 3)``.
+``rgb`` returns a 3-channel RGB image of type ``wp.uint8``, shape ``(B, H, W, 3)``.

-``rgba`` returns a 4-channel RGBA image of type ``torch.uint8``, shape ``(B, H, W, 4)``.
+``rgba`` returns a 4-channel RGBA image of type ``wp.uint8``, shape ``(B, H, W, 4)``.

-To convert to ``torch.float32``, divide by 255.0.
+Use :func:`warp.to_torch` to obtain a zero-copy ``torch.uint8`` view; divide by ``255.0``
+to convert to ``torch.float32``.

 Depth and Distances
 ~~~~~~~~~~~~~~~~~~~
@@ -222,10 +228,10 @@ Depth and Distances
    :alt: A scene captured as depth

 ``distance_to_camera`` returns a single-channel depth image with distance to the camera optical
-center, shape ``(B, H, W, 1)``, type ``torch.float32``.
+center, shape ``(B, H, W, 1)``, type ``wp.float32``.

 ``distance_to_image_plane`` returns distances of 3D points from the camera plane along the Z-axis,
-shape ``(B, H, W, 1)``, type ``torch.float32``.
+shape ``(B, H, W, 1)``, type ``wp.float32``.

 ``depth`` is an alias for ``distance_to_image_plane``.

@@ -238,14 +244,14 @@ Normals
    :alt: A scene captured with surface normals

 ``normals`` returns local surface normal vectors at each pixel, shape ``(B, H, W, 3)`` containing
-``(x, y, z)``, type ``torch.float32``.
+``(x, y, z)``, type ``wp.float32``.

 Motion Vectors
 ~~~~~~~~~~~~~~

 ``motion_vectors`` returns per-pixel motion vectors in image space between frames.
 Shape ``(B, H, W, 2)``: ``x`` is horizontal motion (positive = left), ``y`` is vertical motion
-(positive = up). Type ``torch.float32``.
+(positive = up). Type ``wp.float32``.

 Semantic Segmentation
 ~~~~~~~~~~~~~~~~~~~~~
@@ -259,8 +265,8 @@ Semantic Segmentation
 An ``info`` dictionary is available via ``tiled_camera.data.info['semantic_segmentation']``.

 - If ``colorize_semantic_segmentation=True``: 4-channel RGBA image, shape ``(B, H, W, 4)``,
-  type ``torch.uint8``. The ``idToLabels`` dict maps color to semantic label.
-- If ``colorize_semantic_segmentation=False``: shape ``(B, H, W, 1)``, type ``torch.int32``,
+  type ``wp.uint8``. The ``idToLabels`` dict maps color to semantic label.
+- If ``colorize_semantic_segmentation=False``: shape ``(B, H, W, 1)``, type ``wp.int32``,
   containing semantic IDs. The ``idToLabels`` dict maps ID to label.

 Instance ID Segmentation
@@ -274,9 +280,9 @@ Instance ID Segmentation
 ``instance_id_segmentation_fast`` outputs per-pixel instance IDs, unique per USD prim path.
 An ``info`` dictionary is available via ``tiled_camera.data.info['instance_id_segmentation_fast']``.

-- If ``colorize_instance_id_segmentation=True``: shape ``(B, H, W, 4)``, type ``torch.uint8``.
+- If ``colorize_instance_id_segmentation=True``: shape ``(B, H, W, 4)``, type ``wp.uint8``.
   The ``idToLabels`` dict maps color to USD prim path.
-- If ``colorize_instance_id_segmentation=False``: shape ``(B, H, W, 1)``, type ``torch.int32``.
+- If ``colorize_instance_id_segmentation=False``: shape ``(B, H, W, 1)``, type ``wp.int32``.
   The ``idToLabels`` dict maps instance ID to USD prim path.

 Instance Segmentation
@@ -292,8 +298,8 @@ to the lowest level with semantic labels (unlike ``instance_id_segmentation_fast``, which
 goes to the leaf prim).
 An ``info`` dictionary is available via ``tiled_camera.data.info['instance_segmentation_fast']``.

-- If ``colorize_instance_segmentation=True``: shape ``(B, H, W, 4)``, type ``torch.uint8``.
-- If ``colorize_instance_segmentation=False``: shape ``(B, H, W, 1)``, type ``torch.int32``.
+- If ``colorize_instance_segmentation=True``: shape ``(B, H, W, 4)``, type ``wp.uint8``.
+- If ``colorize_instance_segmentation=False``: shape ``(B, H, W, 1)``, type ``wp.int32``.

 The ``idToLabels`` dict maps color to USD prim path. The ``idToSemantics`` dict maps color to
 semantic label.
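For reference, a minimal sketch of decoding the non-colorized output with the ``idToLabels`` dict; the exact nesting of ``idToLabels`` inside the ``info`` entry is an assumption based on the docs above:

    import warp as wp

    # with colorize_semantic_segmentation=False: (B, H, W, 1), int32 semantic IDs
    seg = wp.to_torch(tiled_camera.data.output["semantic_segmentation"])
    id_to_labels = tiled_camera.data.info["semantic_segmentation"]["idToLabels"]
    for semantic_id, label in id_to_labels.items():
        mask = seg[..., 0] == int(semantic_id)  # boolean mask of pixels carrying this label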

scripts/benchmarks/benchmark_cameras.py

Lines changed: 10 additions & 8 deletions
@@ -256,6 +256,7 @@
 import numpy as np
 import psutil
 import torch
+import warp as wp

 import isaaclab.sim as sim_utils
 from isaaclab.assets import RigidObject, RigidObjectCfg
@@ -635,7 +636,7 @@ def run_simulator(
     # Set camera world poses
     for camera_list in camera_lists:
         for camera in camera_list:
-            num_cameras = camera.data.intrinsic_matrices.size(0)
+            num_cameras = camera.data.intrinsic_matrices.shape[0]
            positions = torch.tensor([[2.5, 2.5, 2.5]], device=sim.device).repeat(num_cameras, 1)
            targets = torch.tensor([[0.0, 0.0, 0.0]], device=sim.device).repeat(num_cameras, 1)
            camera.set_world_poses_from_view(positions, targets)
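The ``.size(0)`` to ``.shape[0]`` change above is needed because ``wp.array`` exposes ``shape`` as a plain tuple and has no torch-style ``size(dim)`` method; a minimal sketch:

    import warp as wp

    a = wp.zeros((4, 3, 3), dtype=wp.float32, device="cpu")
    num_cameras = a.shape[0]  # tuple indexing works on wp.array; a.size(0) would raise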
@@ -675,23 +676,24 @@ def run_simulator(
         # Only update the camera if it hasn't been updated as part of scene_entities.update ...
         camera.update(dt=sim.get_physics_dt())

+        # camera outputs and intrinsics are wp.array; lift to torch for math + collection
+        intrinsics_torch = wp.to_torch(camera.data.intrinsic_matrices)
+
         for data_type in data_types:
             data_label = f"{label}_{cam_idx}_{data_type}"
+            output_torch = wp.to_torch(camera.data.output[data_type])

             if depth_predicate(data_type):  # is a depth image, want to create cloud
-                depth = camera.data.output[data_type]
+                depth = output_torch
                 depth_images[data_label + "_raw"] = depth
                 if perspective_depth_predicate(data_type) and convert_depth_to_camera_to_image_plane:
-                    depth = orthogonalize_perspective_depth(
-                        camera.data.output[data_type], camera.data.intrinsic_matrices
-                    )
+                    depth = orthogonalize_perspective_depth(output_torch, intrinsics_torch)
                     depth_images[data_label + "_undistorted"] = depth

-                pointcloud = unproject_depth(depth=depth, intrinsics=camera.data.intrinsic_matrices)
+                pointcloud = unproject_depth(depth=depth, intrinsics=intrinsics_torch)
                 clouds[data_label] = pointcloud
             else:  # rgb image, just save it
-                image = camera.data.output[data_type]
-                images[data_label] = image
+                images[data_label] = output_torch

     # End timing for the step
     step_end_time = time.time()

scripts/demos/sensors/cameras.py

Lines changed: 15 additions & 8 deletions
@@ -44,6 +44,7 @@
 import matplotlib.pyplot as plt
 import numpy as np
 import torch
+import warp as wp

 import isaaclab.sim as sim_utils
 from isaaclab.assets import ArticulationCfg, AssetBaseCfg
@@ -239,8 +240,15 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):
         # save every 10th image (for visualization purposes only)
         # note: saving images will slow down the simulation
         if count % 10 == 0:
+            # camera outputs are wp.array; lift to torch for image saving / matplotlib
+            camera_rgb = wp.to_torch(scene["camera"].data.output["rgb"])
+            tiled_rgb = wp.to_torch(scene["tiled_camera"].data.output["rgb"])
+            camera_depth = wp.to_torch(scene["camera"].data.output["distance_to_image_plane"])
+            tiled_depth = wp.to_torch(scene["tiled_camera"].data.output["distance_to_image_plane"])
+            raycast_depth = wp.to_torch(scene["raycast_camera"].data.output["distance_to_image_plane"])
+
             # compare generated RGB images across different cameras
-            rgb_images = [scene["camera"].data.output["rgb"][0, ..., :3], scene["tiled_camera"].data.output["rgb"][0]]
+            rgb_images = [camera_rgb[0, ..., :3], tiled_rgb[0]]
             save_images_grid(
                 rgb_images,
                 subtitles=["Camera"],
@@ -250,9 +258,9 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):

             # compare generated Depth images across different cameras
             depth_images = [
-                scene["camera"].data.output["distance_to_image_plane"][0],
-                scene["tiled_camera"].data.output["distance_to_image_plane"][0, ..., 0],
-                scene["raycast_camera"].data.output["distance_to_image_plane"][0],
+                camera_depth[0],
+                tiled_depth[0, ..., 0],
+                raycast_depth[0],
             ]
             save_images_grid(
                 depth_images,
@@ -263,16 +271,15 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):
             )

             # save all tiled RGB images
-            tiled_images = scene["tiled_camera"].data.output["rgb"]
             save_images_grid(
-                tiled_images,
-                subtitles=[f"Cam{i}" for i in range(tiled_images.shape[0])],
+                tiled_rgb,
+                subtitles=[f"Cam{i}" for i in range(tiled_rgb.shape[0])],
                 title="Tiled RGB Image",
                 filename=os.path.join(output_dir, "tiled_rgb", f"{count:04d}.jpg"),
             )

             # save all camera RGB images
-            cam_images = scene["camera"].data.output["rgb"][..., :3]
+            cam_images = camera_rgb[..., :3]
             save_images_grid(
                 cam_images,
                 subtitles=[f"Cam{i}" for i in range(cam_images.shape[0])],

scripts/tutorials/04_sensors/run_ray_caster_camera.py

Lines changed: 7 additions & 6 deletions
@@ -38,6 +38,7 @@
 import os

 import torch
+import warp as wp

 import omni.replicator.core as rep

@@ -144,17 +145,17 @@ def run_simulator(sim: sim_utils.SimulationContext, scene_entities: dict):
         rep_output["trigger_outputs"] = {"on_time": camera.frame[camera_index]}
         rep_writer.write(rep_output)

-        # Pointcloud in world frame
-        points_3d_cam = unproject_depth(
-            camera.data.output["distance_to_image_plane"], camera.data.intrinsic_matrices
-        )
+        # Pointcloud in world frame; convert wp.array camera outputs to torch for math ops
+        depth_torch = wp.to_torch(camera.data.output["distance_to_image_plane"])
+        intrinsics_torch = wp.to_torch(camera.data.intrinsic_matrices)
+        points_3d_cam = unproject_depth(depth_torch, intrinsics_torch)

         # Check methods are valid
         im_height, im_width = camera.image_shape
         # -- project points to (u, v, d)
-        reproj_points = project_points(points_3d_cam, camera.data.intrinsic_matrices)
+        reproj_points = project_points(points_3d_cam, intrinsics_torch)
         reproj_depths = reproj_points[..., -1].view(-1, im_width, im_height).transpose_(1, 2)
-        sim_depths = camera.data.output["distance_to_image_plane"].squeeze(-1)
+        sim_depths = depth_torch.squeeze(-1)
         torch.testing.assert_close(reproj_depths, sim_depths)

scripts/tutorials/04_sensors/run_usd_camera.py

Lines changed: 6 additions & 5 deletions
@@ -65,6 +65,7 @@

 import numpy as np
 import torch
+import warp as wp
 from isaaclab_physx.renderers import IsaacRtxRendererCfg

 import omni.replicator.core as rep
@@ -254,12 +255,12 @@ def run_simulator(sim: sim_utils.SimulationContext, scene_entities: dict):
             and args_cli.draw
             and "distance_to_image_plane" in camera.data.output.keys()
         ):
-            # Derive pointcloud from camera at camera_index
+            # Derive pointcloud from camera at camera_index; lift wp.array fields to torch
             pointcloud = create_pointcloud_from_depth(
-                intrinsic_matrix=camera.data.intrinsic_matrices[camera_index],
-                depth=camera.data.output["distance_to_image_plane"][camera_index],
-                position=camera.data.pos_w[camera_index],
-                orientation=camera.data.quat_w_ros[camera_index],
+                intrinsic_matrix=wp.to_torch(camera.data.intrinsic_matrices)[camera_index],
+                depth=wp.to_torch(camera.data.output["distance_to_image_plane"])[camera_index],
+                position=wp.to_torch(camera.data.pos_w)[camera_index],
+                orientation=wp.to_torch(camera.data.quat_w_ros)[camera_index],
                 device=sim.device,
             )

source/isaaclab/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "4.6.22"
+version = "4.6.24"

 # Description
 title = "Isaac Lab framework for Robot Learning"

source/isaaclab/docs/CHANGELOG.rst

Lines changed: 61 additions & 0 deletions
@@ -1,6 +1,67 @@
 Changelog
 ---------

+4.6.24 (2026-04-30)
+~~~~~~~~~~~~~~~~~~~
+
+Fixed
+^^^^^
+
+* Fixed :func:`~isaaclab.envs.mdp.observations.image` (and the equivalent
+  ``image()`` helper in the Franka stack ``stack_ik_rel_blueprint`` config) so
+  that Torch tensor operations are applied via :func:`warp.to_torch` rather than
+  invoked directly on the new ``wp.array`` camera outputs.
+* Fixed downstream consumers (camera tutorials, ``demos/sensors/cameras.py``,
+  ``benchmarks/benchmark_cameras.py``, dexsuite ``vision_camera`` observation,
+  visualizer integration test, ``save_camera_output`` how-to and the camera
+  overview docs) to lift ``wp.array`` camera fields to torch tensors via
+  :func:`warp.to_torch` before performing Torch operations.
+* Fixed the camera and ray-caster camera test suites to use ``wp.array`` dtypes
+  (``wp.uint8``, ``wp.float32``, ``wp.int32``) and :func:`warp.to_torch` views
+  in assertions on ``CameraData.output``, ``CameraData.intrinsic_matrices``,
+  ``CameraData.pos_w``/``quat_w_*`` and ``Camera.frame``.
+* Fixed :class:`~isaaclab.renderers.NewtonWarpRenderer` to populate the ``rgb``
+  output buffer when both ``rgb`` and ``rgba`` are requested, restoring the
+  legacy "rgb mirrors rgba" behavior that broke when ``rgb`` and ``rgba``
+  became independent ``wp.array`` allocations.
+
+Changed
+^^^^^^^
+
+* Tightened :class:`~isaaclab.renderers.RenderBufferSpec` ``dtype`` annotation
+  from ``Any`` to ``type`` to document that all renderers must publish Warp
+  scalar dtype classes (e.g. ``warp.uint8``).
+* Removed the transitional ``torch.dtype → wp.dtype`` shim in
+  :meth:`~isaaclab.sensors.camera.CameraData.allocate` now that all in-tree
+  renderers publish ``wp`` dtypes via :class:`~isaaclab.renderers.RenderBufferSpec`.
+* Documented the transitional Torch input fallback on
+  :func:`~isaaclab.utils.math.convert_camera_frame_orientation_convention` and
+  consolidated the redundant ``wp ↔ torch`` round-trips in
+  :meth:`~isaaclab.sensors.camera.Camera.set_world_poses` and
+  :meth:`~isaaclab.sensors.camera.Camera.set_world_poses_from_view`.
+
+
+4.6.23 (2026-04-30)
+~~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Changed :class:`~isaaclab.sensors.camera.CameraData` array fields and
+  :attr:`~isaaclab.sensors.camera.CameraData.output` buffers to expose
+  ``wp.array`` values instead of :class:`torch.Tensor` values. Use
+  :func:`warp.to_torch` where Torch tensor operations are required.
+* Changed :class:`~isaaclab.sensors.camera.Camera` pose, intrinsic, and frame
+  array APIs to accept or return ``wp.array`` values instead of
+  :class:`torch.Tensor` values. Existing Torch inputs are still accepted during
+  the transition; prefer ``wp.array`` at public call sites.
+* Changed :class:`~isaaclab.renderers.BaseRenderer` output and camera-update
+  APIs to exchange ``wp.array`` buffers with camera sensors.
+* Changed :func:`~isaaclab.utils.math.convert_camera_frame_orientation_convention`
+  to accept and return ``wp.array`` quaternion arrays. Use :func:`warp.to_torch`
+  where Torch tensor operations are required.
+
+
 4.6.22 (2026-04-27)
 ~~~~~~~~~~~~~~~~~~~

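Since the changelog leans on the zero-copy behavior of :func:`warp.to_torch`, a minimal sketch of the sharing semantics; it assumes only that the view is zero-copy as documented above:

    import warp as wp

    a = wp.zeros((2, 2), dtype=wp.float32, device="cpu")
    t = wp.to_torch(a)  # zero-copy view: shares memory with the warp buffer
    t += 1.0            # mutates `a` as well
    safe = wp.to_torch(a).clone()  # copy first when a camera buffer must be modified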
source/isaaclab/isaaclab/envs/mdp/observations.py

Lines changed: 4 additions & 3 deletions
@@ -14,6 +14,7 @@
 from typing import TYPE_CHECKING

 import torch
+import warp as wp

 import isaaclab.utils.math as math_utils
 from isaaclab.managers import SceneEntityCfg
@@ -402,12 +403,12 @@ def image(
     # extract the used quantities (to enable type-hinting)
     sensor: Camera | RayCasterCamera = env.scene.sensors[sensor_cfg.name]

-    # obtain the input image
-    images = sensor.data.output[data_type]
+    # obtain the input image; camera outputs are wp.array, lift to torch for tensor ops
+    images = wp.to_torch(sensor.data.output[data_type])

     # depth image conversion
     if (data_type == "distance_to_camera") and convert_perspective_to_orthogonal:
-        images = math_utils.orthogonalize_perspective_depth(images, sensor.data.intrinsic_matrices)
+        images = math_utils.orthogonalize_perspective_depth(images, wp.to_torch(sensor.data.intrinsic_matrices))

     # rgb/depth/normals image normalization
     if normalize: