Skip to content

Commit 1624d13

Browse files
authored
Widen read_vrt buffer to fit all selected band dtypes (#1701)
* Widen read_vrt output buffer to fit all selected band dtypes read_vrt allocated the multi-band output buffer from selected_bands[0].dtype only. Each band's source array was then assigned with result[..., band_idx] = src_arr[...], which silently casts the source to the narrow buffer dtype. A Float32 band 1 after a Byte band 0 returned uint8 with the float values truncated; a Byte band with <ScaleRatio>0.5</ScaleRatio> returned uint8 with the post-scale fractional part lost. Compute the effective per-band dtype (declared dtype, or float64 when any source has scale or offset, matching the existing promotion at _vrt.py L562-565) and take np.result_type across all selected bands before allocating the buffer. The single-band branch follows the same logic so a single-band scaled VRT also widens. All-integer VRTs without scaling stay integer, so memory is not blown up for the common case. Fixes #1696 * Address Copilot review on PR #1701: drop line-number cites, guard empty VRT Three issues raised in review: * `_vrt.py` allocation comment cited specific line numbers ("see L562-565") for the ComplexSource scaling block. Line numbers drift; replace with a named reference to the `# Apply ComplexSource scaling` block. * The same comment claimed mixes "widen to float64". `np.result_type` may also produce `float32` (e.g. NumPy 2.x on `uint8 + float32`) or `complex128` when complex bands are present. Reword to describe the common-dtype rule and list the typical outcomes. * `test_vrt_multiband_dtype_1696.py` module docstring and one test docstring cited `_vrt.py` L327-334 / L562-565. Replace with named references (`parse_vrt` ComplexSource branch, `# Apply ComplexSource scaling` block) that survive future edits. * A VRT with zero `<VRTRasterBand>` elements made `np.result_type(*[])` raise the generic "at least one array or dtype is required" error. Add an explicit `if not selected_bands` guard that raises a clear ValueError, plus a regression test asserting the new message.
1 parent de75800 commit 1624d13

2 files changed

Lines changed: 477 additions & 7 deletions

File tree

xrspatial/geotiff/_vrt.py

Lines changed: 46 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -522,15 +522,54 @@ def read_vrt(vrt_path: str, *, window=None,
522522
else:
523523
selected_bands = vrt.bands
524524

525-
# Allocate output
525+
# Allocate output.
526+
#
527+
# The output buffer dtype must be wide enough to hold every selected
528+
# band losslessly. A naive ``selected_bands[0].dtype`` would silently
529+
# truncate any wider band's values when the per-band placement at the
530+
# ``result[...] = src_arr[...]`` line below casts to the buffer dtype.
531+
# Two sources of widening matter:
532+
#
533+
# * Heterogeneous declared dtypes across bands (e.g. ``Byte`` + ``Float32``).
534+
# * ``ComplexSource`` ``ScaleRatio`` / ``ScaleOffset`` promote source
535+
# values to ``float64`` before placement (see the ``# Apply
536+
# ComplexSource scaling`` block later in this function); the
537+
# destination has to be float-typed too, otherwise the fractional
538+
# part is lost.
539+
#
540+
# ``np.result_type`` produces the common dtype that holds every
541+
# contributing dtype without loss. For an all-integer VRT it stays
542+
# integer; for mixes it widens to the common dtype (typically
543+
# ``float64`` when integer and floating-point bands mix, but it can
544+
# be a narrower float -- e.g. ``float32`` under NumPy 2.x for
545+
# ``uint8`` + ``float32`` -- or ``complex128`` when complex bands
546+
# are present). See issue #1696.
547+
#
548+
# Guard against a malformed VRT with zero ``<VRTRasterBand>``
549+
# elements: ``np.result_type()`` with no args raises a generic
550+
# "at least one array or dtype is required" message that gives the
551+
# caller no hint about the underlying cause.
552+
if not selected_bands:
553+
raise ValueError(
554+
"VRT has no <VRTRasterBand> elements; cannot determine "
555+
"output dtype"
556+
)
557+
effective_dtypes = []
558+
for vrt_band in selected_bands:
559+
eff = vrt_band.dtype
560+
for src in vrt_band.sources:
561+
scaled = src.scale is not None and src.scale != 1.0
562+
offset = src.offset is not None and src.offset != 0.0
563+
if scaled or offset:
564+
eff = np.dtype(np.float64)
565+
break
566+
effective_dtypes.append(eff)
567+
dtype = np.result_type(*effective_dtypes)
568+
fill = np.nan if dtype.kind in ('f', 'c') else 0
526569
if len(selected_bands) == 1:
527-
dtype = selected_bands[0].dtype
528-
result = np.full((out_h, out_w), np.nan if dtype.kind == 'f' else 0,
529-
dtype=dtype)
570+
result = np.full((out_h, out_w), fill, dtype=dtype)
530571
else:
531-
dtype = selected_bands[0].dtype
532-
result = np.full((out_h, out_w, len(selected_bands)),
533-
np.nan if dtype.kind == 'f' else 0, dtype=dtype)
572+
result = np.full((out_h, out_w, len(selected_bands)), fill, dtype=dtype)
534573

535574
for band_idx, vrt_band in enumerate(selected_bands):
536575
nodata = vrt_band.nodata

0 commit comments

Comments
 (0)