Commit 4cb8977

docs: clarify volumetric array axis order is WHD (X,Y,Z), not HWD (#8337)
MONAI's NIfTI/ITK readers return ndarrays whose axis 0 is columns/Width (X), axis 1 is rows/Height (Y), and axis 2 is Depth/slices (Z): see `ITKReader._get_array_data` in `monai/data/image_reader.py`, which applies `.T` to the ITK-native (Z,Y,X) array to produce (X,Y,Z), and nibabel, which already serves (X,Y,Z). Several docstrings and one error message described this layout as `HWD` / `CHWD` / `NCHWD`, which swaps the spatial meanings of H and W and is misleading to users writing custom loaders or comparing array shapes against DICOM/NIfTI metadata.

Update the user-facing docstrings to use `WHD` / `CWHD` / `NCWHD` and add a brief note to each describing the per-axis meaning, so the convention is unambiguous when read in isolation:

- monai/data/image_writer.py: `H, HW, HWD` → `W, WH, WHD` in the `resample_if_needed` docstring (the canonical NIfTI-semantics description in the writer hierarchy).
- monai/visualize/utils.py: `matshow3d` volume shape doc.
- monai/visualize/img2tensorboard.py: docstrings plus the `AssertionError` message in `_image3_animated_gif`. The TensorBoard API arguments `dataformats="HW"/"CHW"/"NCHWT"` and the user-visible GIF tag suffix `f"{tag}_HWD"` are intentionally left unchanged; those are TensorBoard-side conventions / public log keys.
- monai/handlers/tensorboard_handlers.py: `frame_dim` docstring on `TensorBoardImageHandler`.
- monai/apps/vista3d/inferer.py: `point_based_window_inferer` input shape doc.

This change is documentation-only; no runtime behavior is affected.

Out of scope for this commit (tracked separately):

- `monai/apps/deepedit/transforms.py` uses the `CHWD` label in 15 places that are tightly coupled to click-coordinate construction (`[g[0], g[-2], g[-1], sid]`). Renaming the label without verifying the click semantics would create a doc/code mismatch and may mask a real coordinate-order bug; this needs a separate code-level review.
- `monai/apps/deepgrow/transforms.py` uses `CDHW` (depth-first), which is a different convention question from the H/W swap addressed here.
- Tensor-axis docstrings in losses/metrics/inferers (NCHW, NCHWD, BCDHW, BC[HWD], etc.) follow the established PyTorch placeholder convention and are dimension-agnostic; not touched.
- Detection box-mode enums (`XYZWHD`, `CCCWHD`) and anchor flatten counts (`HWA`, `HWDA`) define H/W/D explicitly within their own subsystems and are internally consistent; not touched.

Signed-off-by: Soumya Snigdha Kundu <soumya_snigdha.kundu@kcl.ac.uk>
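The axis-order claim in the message can be sanity-checked with a small NumPy sketch. This mirrors, but is not, the actual `ITKReader._get_array_data` code, and the shapes are illustrative:

```python
import numpy as np

# ITK-style native layout: slices first, i.e. (Z, Y, X) = (Depth, rows/Height, cols/Width).
itk_native = np.zeros((30, 64, 48))  # 30 slices, 64 rows, 48 columns

# A plain transpose reverses all axes, giving the (X, Y, Z) = Width, Height, Depth
# order that this commit documents as `WHD` / `CWHD` / `NCWHD`.
whd = itk_native.T
print(whd.shape)  # (48, 64, 30)
```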
1 parent 5a2d0a7 commit 4cb8977

5 files changed

Lines changed: 42 additions & 27 deletions

monai/apps/vista3d/inferer.py

Lines changed: 6 additions & 4 deletions
@@ -45,14 +45,16 @@ def point_based_window_inferer(
     patch inference and average output stitching, and finally returns the segmented mask.
 
     Args:
-        inputs: [1CHWD], input image to be processed.
+        inputs: [1CWHD], input image to be processed (axis 0 is columns/Width,
+            axis 1 is rows/Height, axis 2 is Depth, matching arrays returned by
+            MONAI's NIfTI/ITK readers).
         roi_size: the spatial window size for inferences.
             When its components have None or non-positives, the corresponding inputs dimension will be used.
             if the components of the `roi_size` are non-positive values, the transform will use the
             corresponding components of img size. For example, `roi_size=(32, -1)` will be adapted
             to `(32, 64)` if the second spatial dimension size of img is `64`.
         sw_batch_size: the batch size to run window slices.
-        predictor: the model. For vista3D, the output is [B, 1, H, W, D] which needs to be transposed to [1, B, H, W, D].
+        predictor: the model. For vista3D, the output is [B, 1, W, H, D] which needs to be transposed to [1, B, W, H, D].
             Add transpose=True in kwargs for vista3d.
         point_coords: [B, N, 3]. Point coordinates for B foreground objects, each has N points.
         point_labels: [B, N]. Point labels. 0/1 means negative/positive points for regular supported or zero-shot classes.
@@ -61,13 +63,13 @@ def point_based_window_inferer(
         prompt_class: [B]. The same as class_vector representing the point class and inform point head about
             supported class or zeroshot, not used for automatic segmentation. If None, point head is default
             to supported class segmentation.
-        prev_mask: [1, B, H, W, D]. The value is before sigmoid. An optional tensor of previously segmented masks.
+        prev_mask: [1, B, W, H, D]. The value is before sigmoid. An optional tensor of previously segmented masks.
         point_start: only use points starting from this number. All points before this number is used to generate
             prev_mask. This is used to avoid re-calculating the points in previous iterations if given prev_mask.
         center_only: for each point, only crop the patch centered at this point. If false, crop 3 patches for each point.
         margin: if center_only is false, this value is the distance between point to the patch boundary.
     Returns:
-        stitched_output: [1, B, H, W, D]. The value is before sigmoid.
+        stitched_output: [1, B, W, H, D]. The value is before sigmoid.
         Notice: The function only supports SINGLE OBJECT INFERENCE with B=1.
     """
     if not point_coords.shape[0] == 1:
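The `[B, 1, W, H, D]` to `[1, B, W, H, D]` transpose mentioned in the `predictor` doc above can be sketched with NumPy. Shapes are illustrative; the real inferer operates on torch tensors:

```python
import numpy as np

# Per the docstring: vista3d's model returns [B, 1, W, H, D]; the inferer
# stitches in [1, B, W, H, D], hence transpose=True swaps the first two axes.
b, w, h, d = 1, 48, 64, 30
model_out = np.zeros((b, 1, w, h, d))
stitched_layout = np.swapaxes(model_out, 0, 1)
print(stitched_layout.shape)  # (1, 1, 48, 64, 30), i.e. [1, B, W, H, D]
```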

monai/data/image_writer.py

Lines changed: 7 additions & 4 deletions
@@ -226,8 +226,11 @@ def resample_if_needed(
         transformation computed from ``affine`` and ``target_affine``.
 
         This function assumes the NIfTI dimension notations. Spatially it
-        supports up to three dimensions, that is, H, HW, HWD for 1D, 2D, 3D
-        respectively. When saving multiple time steps or multiple channels,
+        supports up to three dimensions, that is, ``W``, ``WH``, ``WHD`` for
+        1D, 2D, 3D respectively (equivalently ``X``, ``XY``, ``XYZ``; axis 0
+        is columns/Width, axis 1 is rows/Height, axis 2 is Depth/slices,
+        matching the array order returned by NIfTI/ITK readers). When saving
+        multiple time steps or multiple channels,
         time and/or modality axes should be appended after the first three
         dimensions. For example, shape of 2D eight-class segmentation
         probabilities to be saved could be `(64, 64, 1, 8)`. Also, data in
@@ -303,8 +306,8 @@ def convert_to_channel_last(
             ``None`` indicates no channel dimension, a new axis will be appended as the channel dimension.
             a sequence of integers indicates multiple non-spatial dimensions.
         squeeze_end_dims: if ``True``, any trailing singleton dimensions will be removed (after the channel
-            has been moved to the end). So if input is `(H,W,D,C)` and C==1, then it will be saved as `(H,W,D)`.
-            If D is also 1, it will be saved as `(H,W)`. If ``False``, image will always be saved as `(H,W,D,C)`.
+            has been moved to the end). So if input is `(W,H,D,C)` and C==1, then it will be saved as `(W,H,D)`.
+            If D is also 1, it will be saved as `(W,H)`. If ``False``, image will always be saved as `(W,H,D,C)`.
         spatial_ndim: modifying the spatial dims if needed, so that output to have at least
             this number of spatial dims. If ``None``, the output will have the same number of
             spatial dimensions as the input.
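A minimal NumPy sketch of the `squeeze_end_dims` behavior documented above. This is not the MONAI implementation, just the documented effect of dropping trailing singleton axes after the channel has been moved last:

```python
import numpy as np

def squeeze_end_dims(arr: np.ndarray) -> np.ndarray:
    """Drop trailing singleton dims, as the convert_to_channel_last docstring describes."""
    while arr.ndim > 1 and arr.shape[-1] == 1:
        arr = arr[..., 0]
    return arr

print(squeeze_end_dims(np.zeros((48, 64, 30, 1))).shape)  # (48, 64, 30): (W,H,D,C) with C==1
print(squeeze_end_dims(np.zeros((48, 64, 1, 1))).shape)   # (48, 64): D==1 is dropped too
```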

monai/handlers/tensorboard_handlers.py

Lines changed: 3 additions & 3 deletions
@@ -288,8 +288,8 @@ def _default_iteration_writer(self, engine: Engine, writer: SummaryWriter | Summ
 class TensorBoardImageHandler(TensorBoardHandler):
     """
     TensorBoardImageHandler is an Ignite Event handler that can visualize images, labels and outputs as 2D/3D images.
-    2D output (shape in Batch, channel, H, W) will be shown as simple image using the first element in the batch,
-    for 3D to ND output (shape in Batch, channel, H, W, D) input, each of ``self.max_channels`` number of images'
+    2D output (shape in Batch, channel, W, H) will be shown as simple image using the first element in the batch,
+    for 3D to ND output (shape in Batch, channel, W, H, D) input, each of ``self.max_channels`` number of images'
     last three dimensions will be shown as animated GIF along the last axis (typically Depth).
     And if writer is from TensorBoardX, data has 3 channels and `max_channels=3`, will plot as RGB video.
 
@@ -350,7 +350,7 @@ def __init__(
             index: plot which element in a data batch, default is the first element.
             max_channels: number of channels to plot.
             frame_dim: if plotting 3D image as GIF, specify the dimension used as frames,
-                expect input data shape as `NCHWD`, default to `-3` (the first spatial dim)
+                expect input data shape as `NCWHD`, default to `-3` (the first spatial dim)
             max_frames: if plot 3D RGB image as video in TensorBoardX, set the FPS to `max_frames`.
         """
         super().__init__(summary_writer=summary_writer, log_dir=log_dir)

monai/visualize/img2tensorboard.py

Lines changed: 15 additions & 10 deletions
@@ -50,14 +50,16 @@ def _image3_animated_gif(
 
     Args:
         tag: Data identifier
-        image: 3D image tensors expected to be in `HWD` format
+        image: 3D image tensors expected to be in `WHD` format (axis 0 is
+            columns/Width, axis 1 is rows/Height, axis 2 is Depth, matching
+            arrays returned by MONAI's NIfTI/ITK readers).
         writer: the tensorboard writer to plot image
-        frame_dim: the dimension used as frames for GIF image, expect data shape as `HWD`, default to `0`.
+        frame_dim: the dimension used as frames for GIF image, expect data shape as `WHD`, default to `0`.
         scale_factor: amount to multiply values by. if the image data is between 0 and 1, using 255 for this value will
             scale it to displayable range
     """
     if len(image.shape) != 3:
-        raise AssertionError("3D image tensors expected to be in `HWD` format, len(image.shape) != 3")
+        raise AssertionError("3D image tensors expected to be in `WHD` format, len(image.shape) != 3")
 
     image_np, *_ = convert_data_type(image, output_type=np.ndarray)
     ims = [(i * scale_factor).astype(np.uint8, copy=False) for i in np.moveaxis(image_np, frame_dim, 0)]
@@ -85,14 +87,15 @@ def make_animated_gif_summary(
     frame_dim: int = -3,
     scale_factor: float = 1.0,
 ) -> Summary:
-    """Creates an animated gif out of an image tensor in 'CHWD' format and returns Summary.
+    """Creates an animated gif out of an image tensor in 'CWHD' format and returns Summary.
 
     Args:
         tag: Data identifier
-        image: The image, expected to be in `CHWD` format
+        image: The image, expected to be in `CWHD` format (channel-first; spatial axes are
+            Width, Height, Depth, matching arrays returned by MONAI's NIfTI/ITK readers).
         writer: the tensorboard writer to plot image
         max_out: maximum number of image channels to animate through
-        frame_dim: the dimension used as frames for GIF image, expect input data shape as `CHWD`,
+        frame_dim: the dimension used as frames for GIF image, expect input data shape as `CWHD`,
             default to `-3` (the first spatial dim)
         scale_factor: amount to multiply values by.
             if the image data is between 0 and 1, using 255 for this value will scale it to displayable range
@@ -122,14 +125,16 @@ def add_animated_gif(
     scale_factor: float = 1.0,
     global_step: int | None = None,
 ) -> None:
-    """Creates an animated gif out of an image tensor in 'CHWD' format and writes it with SummaryWriter.
+    """Creates an animated gif out of an image tensor in 'CWHD' format and writes it with SummaryWriter.
 
     Args:
         writer: Tensorboard SummaryWriter to write to
         tag: Data identifier
-        image_tensor: tensor for the image to add, expected to be in `CHWD` format
+        image_tensor: tensor for the image to add, expected to be in `CWHD` format (channel-first;
+            spatial axes are Width, Height, Depth, matching arrays returned by MONAI's
+            NIfTI/ITK readers).
         max_out: maximum number of image channels to animate through
-        frame_dim: the dimension used as frames for GIF image, expect input data shape as `CHWD`,
+        frame_dim: the dimension used as frames for GIF image, expect input data shape as `CWHD`,
             default to `-3` (the first spatial dim)
         scale_factor: amount to multiply values by. If the image data is between 0 and 1, using 255 for this value will
             scale it to displayable range
@@ -168,7 +173,7 @@ def plot_2d_or_3d_image(
         index: plot which element in the input data batch, default is the first element.
         max_channels: number of channels to plot.
         frame_dim: if plotting 3D image as GIF, specify the dimension used as frames,
-            expect input data shape as `NCHWD`, default to `-3` (the first spatial dim)
+            expect input data shape as `NCWHD`, default to `-3` (the first spatial dim)
         max_frames: if plot 3D RGB image as video in TensorBoardX, set the FPS to `max_frames`.
         tag: tag of the plotted image on TensorBoard.
     """

monai/visualize/utils.py

Lines changed: 11 additions & 6 deletions
@@ -53,19 +53,24 @@ def matshow3d(
     Create a 3D volume figure as a grid of images.
 
     Args:
-        volume: 3D volume to display. data shape can be `BCHWD`, `CHWD` or `HWD`.
-            Higher dimensional arrays will be reshaped into (-1, H, W, [C]), `C` depends on `channel_dim` arg.
-            A list of channel-first (C, H[, W, D]) arrays can also be passed in,
+        volume: 3D volume to display. data shape can be `BCWHD`, `CWHD` or `WHD`
+            (axis 0 is columns/Width, axis 1 is rows/Height, axis 2 is Depth, matching
+            arrays returned by MONAI's NIfTI/ITK readers).
+            Higher dimensional arrays will be reshaped into (-1, spatial0, spatial1, [C]),
+            `C` depends on `channel_dim` arg.
+            A list of channel-first (C, W[, H, D]) arrays can also be passed in,
             in which case they will be displayed as a padded and stacked volume.
         fig: matplotlib figure or Axes to use. If None, a new figure will be created.
         title: title of the figure.
         figsize: size of the figure.
         frames_per_row: number of frames to display in each row. If None, sqrt(firstdim) will be used.
         frame_dim: for higher dimensional arrays, which dimension from (`-1`, `-2`, `-3`) is moved to
-            the `-3` dimension. dim and reshape to (-1, H, W) shape to construct frames, default to `-3`.
+            the `-3` dimension. dim and reshape to (-1, spatial0, spatial1) shape to construct frames,
+            default to `-3`.
         channel_dim: if not None, explicitly specify the channel dimension to be transposed to the
-            last dimensionas shape (-1, H, W, C). this can be used to plot RGB color image.
-            if None, the channel dimension will be flattened with `frame_dim` and `batch_dim` as shape (-1, H, W).
+            last dimensionas shape (-1, spatial0, spatial1, C). this can be used to plot RGB color image.
+            if None, the channel dimension will be flattened with `frame_dim` and `batch_dim` as
+            shape (-1, spatial0, spatial1).
             note that it can only support 3D input image. default is None.
         vmin: `vmin` for the matplotlib `imshow`.
         vmax: `vmax` for the matplotlib `imshow`.
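The reshape described in the `volume` doc above can be sketched as follows. This is illustrative only; the real `matshow3d` additionally handles `frame_dim` moves, padding, and `channel_dim`:

```python
import numpy as np

# BCWHD input as the matshow3d docstring describes: batch 2, 1 channel,
# spatial shape (48, 64, 30).
vol = np.zeros((2, 1, 48, 64, 30))

# With the default frame_dim=-3 the first spatial axis supplies the frames:
# flatten every axis before the last two into a single frame axis.
frames = vol.reshape(-1, *vol.shape[-2:])
print(frames.shape)  # (96, 64, 30): 2*1*48 frames of shape (64, 30)
```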
