Skip to content

Commit c80d0f3

Browse files
brendancolCopilot
andauthored
Fix read_vrt(band=N) using band 0's nodata sentinel (#1598) (#1602)
* Fix read_vrt(band=N) using band 0's nodata sentinel (#1598) read_vrt sourced attrs['nodata'] and the integer-with-nodata float64 promotion mask from vrt.bands[0].nodata regardless of which band was selected. A multi-band VRT with per-band <NoDataValue> tags would silently return band N's pixels with band 0's sentinel reported in attrs and band N's actual sentinel left as literal integers in the array (no NaN, no float64 promotion). Use vrt.bands[band].nodata when a band is selected, fall back to band 0 when no band kwarg was given (preserves the prior multi-band "first band wins" contract). Tests cover band=0 sanity, the broken band=1 case, and the multi-band fallback. * Address PR #1602 review: validate band upstream in _vrt.read_vrt - Move the ``band`` range/type check into the internal ``_vrt.read_vrt`` reader where ``len(vrt.bands)`` is already in scope. Reject negative indices, non-int values, and out-of-range positive values with a clear ``ValueError`` that names the available band count. Previously these would either silently return a different band (Python negative indexing), raise an opaque ``IndexError`` deep in the read path, or raise ``TypeError`` on the list index. - Drop the now-redundant post-parse guard in the public ``read_vrt`` and update the comment to reflect that ``band`` is validated upstream. - Add three regression tests covering the negative, out-of-range, and non-int rejection paths. The original bug from #1598 (band=N reads pairing with band 0's nodata sentinel) was masked by these silent acceptances on the negative-band edge: ``read_vrt(path, band=-1)`` previously returned the last band's data with ``attrs['nodata']`` unset entirely. * Merge origin/main into branch (resolve merge) Agent-Logs-Url: https://github.com/xarray-contrib/xarray-spatial/sessions/5e4ca757-8a1b-4021-aa2e-a9e7eab80367 Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>
1 parent 950317e commit c80d0f3

4 files changed

Lines changed: 169 additions & 1 deletion

File tree

.claude/sweep-metadata-state.csv

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
11
module,last_inspected,issue,severity_max,categories_found,notes
2+
geotiff,2026-05-11,1598,MEDIUM,4,"read_vrt(band=N) used vrt.bands[0].nodata for attrs and integer-promotion mask, mis-masking band N reads with per-band sentinels (#1598)."
23
geotiff,2026-05-11,1597,HIGH,4;5,read_geotiff_dask dropped nodata mask when sentinel only hit non-first chunks (per-chunk dtype divergence; #1597).
34
reproject,2026-05-10,1572;1573,HIGH,1;3;4,geoid_height_raster dropped input attrs and used dims[-2:] for 3D inputs (#1572). reproject/merge ignored nodatavals (rasterio convention) when rioxarray absent (#1573). Fixed in same branch.

xrspatial/geotiff/__init__.py

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2769,9 +2769,18 @@ def read_vrt(source: str, *, dtype=None, window=None,
27692769
attrs['crs_wkt'] = vrt.crs_wkt
27702770
if vrt.raster_type == 'point':
27712771
attrs['raster_type'] = 'point'
2772+
# When a specific band is selected, source its nodata from that
2773+
# band's <NoDataValue> instead of band 0's. Otherwise multi-band
2774+
# VRTs with per-band sentinels would mis-mask the read: attrs would
2775+
# advertise band 0's sentinel, the integer-promotion block below
2776+
# would mask against band 0's sentinel, and band N's actual nodata
2777+
# pixels would survive as literal integers. See issue #1598.
2778+
# ``band`` has already been validated by ``_vrt.read_vrt`` as
2779+
# 0 <= band < len(vrt.bands), so a simple lookup is safe here.
27722780
nodata = None
27732781
if vrt.bands:
2774-
nodata = vrt.bands[0].nodata
2782+
band_idx_for_nodata = band if band is not None else 0
2783+
nodata = vrt.bands[band_idx_for_nodata].nodata
27752784
if nodata is not None:
27762785
attrs['nodata'] = nodata
27772786

xrspatial/geotiff/_vrt.py

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -242,6 +242,23 @@ def read_vrt(vrt_path: str, *, window=None,
242242
vrt_dir = os.path.dirname(os.path.abspath(vrt_path))
243243
vrt = parse_vrt(xml_str, vrt_dir)
244244

245+
# Validate ``band`` against the parsed band count. Python list
246+
# indexing would silently accept negative values (``vrt.bands[-1]``
247+
# returns the last band) and raise an opaque ``IndexError`` for
248+
# out-of-range positive values, neither of which is the contract the
249+
# public API documents (``band`` is a 0-based positive index). Reject
250+
# both up front with a clear ``ValueError`` so callers do not get
251+
# band-N data paired with band-0's nodata sentinel or a downstream
252+
# IndexError from deep in the read path.
253+
if band is not None:
254+
if not isinstance(band, (int, np.integer)) or isinstance(band, bool):
255+
raise ValueError(
256+
f"band must be a non-negative int, got {band!r}")
257+
if band < 0 or band >= len(vrt.bands):
258+
raise ValueError(
259+
f"band index {band} out of range for VRT with "
260+
f"{len(vrt.bands)} band(s)")
261+
245262
if window is not None:
246263
r0, c0, r1, c1 = window
247264
r0 = max(0, r0)
Lines changed: 141 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,141 @@
1+
"""Regression tests for issue #1598.
2+
3+
``read_vrt(path, band=N)`` used to always source the nodata sentinel
4+
from ``vrt.bands[0]`` rather than the requested band, so a multi-band
5+
VRT with per-band ``<NoDataValue>`` would mis-mask reads of any band
6+
other than band 0:
7+
8+
* ``attrs['nodata']`` advertised band 0's sentinel (wrong).
9+
* The integer-to-float64 promotion ran against band 0's sentinel, so
10+
band N's actual nodata pixels stayed as literal integers.
11+
* The returned dtype was integer when it should have been float64.
12+
13+
The fix uses ``vrt.bands[band].nodata`` when a band is selected.
14+
"""
15+
from __future__ import annotations
16+
17+
import numpy as np
18+
import pytest
19+
20+
from xrspatial.geotiff import read_vrt
21+
from xrspatial.geotiff._writer import write
22+
23+
24+
def _write_two_band_per_band_nodata_vrt(tmp_path):
25+
"""Two single-band uint16 sources, each with a distinct nodata
26+
sentinel, exposed as bands 1 and 2 of a hand-rolled VRT.
27+
"""
28+
band0 = np.array([[1, 2], [3, 65535]], dtype=np.uint16)
29+
band1 = np.array([[7, 8], [9, 65000]], dtype=np.uint16)
30+
p0 = str(tmp_path / 'vrt_band0_1598.tif')
31+
p1 = str(tmp_path / 'vrt_band1_1598.tif')
32+
write(band0, p0, nodata=65535, compression='none', tiled=False)
33+
write(band1, p1, nodata=65000, compression='none', tiled=False)
34+
35+
vrt_path = str(tmp_path / 'two_band_per_band_nodata_1598.vrt')
36+
vrt_xml = f"""<VRTDataset rasterXSize="2" rasterYSize="2">
37+
<GeoTransform>0.0, 1.0, 0.0, 0.0, 0.0, -1.0</GeoTransform>
38+
<VRTRasterBand dataType="UInt16" band="1">
39+
<NoDataValue>65535</NoDataValue>
40+
<SimpleSource>
41+
<SourceFilename relativeToVRT="0">{p0}</SourceFilename>
42+
<SourceBand>1</SourceBand>
43+
<SrcRect xOff="0" yOff="0" xSize="2" ySize="2"/>
44+
<DstRect xOff="0" yOff="0" xSize="2" ySize="2"/>
45+
</SimpleSource>
46+
</VRTRasterBand>
47+
<VRTRasterBand dataType="UInt16" band="2">
48+
<NoDataValue>65000</NoDataValue>
49+
<SimpleSource>
50+
<SourceFilename relativeToVRT="0">{p1}</SourceFilename>
51+
<SourceBand>1</SourceBand>
52+
<SrcRect xOff="0" yOff="0" xSize="2" ySize="2"/>
53+
<DstRect xOff="0" yOff="0" xSize="2" ySize="2"/>
54+
</SimpleSource>
55+
</VRTRasterBand>
56+
</VRTDataset>"""
57+
with open(vrt_path, 'w') as f:
58+
f.write(vrt_xml)
59+
return vrt_path
60+
61+
62+
def test_read_vrt_band0_uses_band0_nodata(tmp_path):
63+
"""Sanity check the band-0 selection still works after the fix.
64+
65+
Confirms the refactor did not flip the index.
66+
"""
67+
vrt_path = _write_two_band_per_band_nodata_vrt(tmp_path)
68+
r = read_vrt(vrt_path, band=0)
69+
assert r.dtype == np.float64
70+
assert r.attrs.get('nodata') == 65535.0
71+
assert np.isnan(r.values[1, 1])
72+
assert r.values[0, 0] == 1
73+
74+
75+
def test_read_vrt_band1_uses_band1_nodata(tmp_path):
76+
"""The previously-broken case: band=1 must use band 1's sentinel.
77+
78+
Before the fix this returned dtype=uint16 with values=[[7,8],
79+
[9,65000]] and attrs['nodata']=65535.
80+
"""
81+
vrt_path = _write_two_band_per_band_nodata_vrt(tmp_path)
82+
r = read_vrt(vrt_path, band=1)
83+
assert r.dtype == np.float64, (
84+
"band=1 read kept uint16 dtype; per-band nodata regression."
85+
)
86+
assert r.attrs.get('nodata') == 65000.0, (
87+
f"attrs['nodata'] was {r.attrs.get('nodata')}, "
88+
f"expected 65000 from band 1's <NoDataValue>."
89+
)
90+
assert np.isnan(r.values[1, 1]), (
91+
"band 1's sentinel pixel was not NaN-masked; "
92+
"promotion ran against the wrong sentinel."
93+
)
94+
assert r.values[0, 0] == 7
95+
assert r.values[1, 0] == 9
96+
97+
98+
def test_read_vrt_no_band_keeps_band0_nodata_attr(tmp_path):
99+
"""Unselected reads still surface band 0's sentinel.
100+
101+
Multi-band VRTs with mixed sentinels return all bands stacked, and
102+
the canonical attr cannot encode per-band values; advertising
103+
band 0's sentinel matches the prior behavior and the documented
104+
"first band wins" contract for multi-band reads.
105+
"""
106+
vrt_path = _write_two_band_per_band_nodata_vrt(tmp_path)
107+
r = read_vrt(vrt_path)
108+
assert r.attrs.get('nodata') == 65535.0
109+
110+
111+
def test_read_vrt_negative_band_raises(tmp_path):
112+
"""Negative band indices used to be silently accepted via Python
113+
list indexing (``vrt.bands[-1]`` returned the last band) while the
114+
public reader's nodata lookup rejected them, producing band-N data
115+
with no nodata sentinel. They are now a clear ValueError up front.
116+
"""
117+
vrt_path = _write_two_band_per_band_nodata_vrt(tmp_path)
118+
with pytest.raises(ValueError, match="band"):
119+
read_vrt(vrt_path, band=-1)
120+
121+
122+
def test_read_vrt_out_of_range_band_raises(tmp_path):
123+
"""Out-of-range band indices used to raise IndexError from deep in
124+
the read path. They are now a ValueError that names the available
125+
band count.
126+
"""
127+
vrt_path = _write_two_band_per_band_nodata_vrt(tmp_path)
128+
with pytest.raises(ValueError, match="out of range"):
129+
read_vrt(vrt_path, band=5)
130+
131+
132+
def test_read_vrt_non_integer_band_raises(tmp_path):
133+
"""A non-int ``band`` would previously have raised TypeError on the
134+
list index. ValueError here matches the rest of the input
135+
validation surface.
136+
"""
137+
vrt_path = _write_two_band_per_band_nodata_vrt(tmp_path)
138+
with pytest.raises(ValueError, match="band"):
139+
read_vrt(vrt_path, band="1")
140+
with pytest.raises(ValueError, match="band"):
141+
read_vrt(vrt_path, band=True)

0 commit comments

Comments
 (0)