Skip to content

Commit de75800

Browse files
authored
Validate band index in _read_cog_http for backend parity (#1700)
* Validate band index in _read_cog_http for backend parity The HTTP eager read path accepted band= but never validated the index. band=-1 silently returned the last channel via numpy negative indexing on the post-decode slice, band>=samples_per_pixel leaked a raw numpy IndexError, and band=N (N != 0) on a single-band HTTP COG was dropped on the floor because the post-decode slice was gated on arr.ndim == 3 and samples_per_pixel > 1. Mirror the validator added to read_to_array in PR #1673 inside _read_cog_http, after the orientation and window guards and before the tile fetch. source.close() runs before raising since the HTTP source holds a network handle. Update the NOTE comment in read_to_array (L2031-2034 before this change) that flagged the gap; the HTTP branch is now part of the "all backends agree" set. Add xrspatial/geotiff/tests/test_http_band_validation_1695.py with 11 tests covering negative band, out-of-range band, single-band nonzero band, single-band band=0, band=None, and cross-path parity with the local eager read on the IndexError messages. Fixes #1695 * Clarify HTTP source.close() comment in band validator The previous comment claimed source.close() runs before raising because the HTTP source holds a network handle, but _HTTPSource.close() is currently a no-op. The urllib3 PoolManager is shared module-level, not per-source. Reword to state that source.close() is called for symmetry with the success-path teardown so future resource-holding sources work correctly.
1 parent ed8b22f commit de75800

2 files changed

Lines changed: 358 additions & 6 deletions

File tree

xrspatial/geotiff/_reader.py

Lines changed: 26 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1627,6 +1627,30 @@ def _read_cog_http(url: str, overview_level: int | None = None,
16271627
f"window={window} is outside the source extent "
16281628
f"({ifd.height}x{ifd.width}) or has non-positive size.")
16291629

1630+
# Validate ``band`` against the selected IFD's sample count before
1631+
# the tile fetch. Without this, ``band=-1`` silently picks the last
1632+
# channel via numpy negative indexing and ``band>=samples_per_pixel``
1633+
# leaks a raw numpy ``IndexError``; on a single-band file ``band=N``
1634+
# (N != 0) is dropped on the floor because the post-decode slice
1635+
# below is gated on ``arr.ndim == 3 and samples_per_pixel > 1``.
1636+
# Mirrors the local-path validator in ``read_to_array`` so all
1637+
# backends agree on the contract: 0-based non-negative index only.
1638+
# ``source.close()`` is called for symmetry with the success-path
1639+
# teardown below; it is a no-op on ``_HTTPSource`` today (the
1640+
# urllib3 ``PoolManager`` is shared module-level, not per-source)
1641+
# but a future resource-holding source will need it. See issue #1695.
1642+
if band is not None:
1643+
if ifd.samples_per_pixel <= 1:
1644+
if band != 0:
1645+
source.close()
1646+
raise IndexError(
1647+
f"band={band} requested on a single-band file.")
1648+
elif not 0 <= band < ifd.samples_per_pixel:
1649+
source.close()
1650+
raise IndexError(
1651+
f"band={band} out of range for "
1652+
f"{ifd.samples_per_pixel}-band file.")
1653+
16301654
arr = _fetch_decode_cog_http_tiles(
16311655
source, header, ifd, max_pixels=max_pixels, window=window)
16321656
source.close()
@@ -2024,14 +2048,10 @@ def read_to_array(source, *, window=None, overview_level: int | None = None,
20242048
# via numpy negative indexing and ``band>=samples_per_pixel``
20252049
# leaks a raw numpy ``IndexError`` with the internal slice
20262050
# shape. Mirrors the dask path's pre-flight validator (see
2027-
# ``read_geotiff_dask`` in ``__init__.py``) and the GPU path
2051+
# ``read_geotiff_dask`` in ``__init__.py``), the GPU path, and
2052+
# the HTTP path (``_read_cog_http`` above, as of issue #1695)
20282053
# so all backends agree on the contract: 0-based non-negative
20292054
# index only. See issue #1673.
2030-
#
2031-
# NOTE: the HTTP branch (``_read_cog_http`` above) currently
2032-
# accepts ``band`` but never slices on it; issue #1669 will
2033-
# add the slice. When that lands, mirror this same check
2034-
# there so HTTP rejects the same inputs.
20352055
ifd_samples = ifd.samples_per_pixel
20362056
if band is not None:
20372057
if ifd_samples <= 1:
Lines changed: 332 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,332 @@
1+
"""Regression tests for issue #1695.
2+
3+
``_read_cog_http`` accepted ``band`` but did not validate the index, so
4+
HTTP reads diverged from local, dask, GPU, and VRT reads on bad input:
5+
6+
* ``band=-1`` silently returned the last channel via numpy negative
7+
indexing on the post-decode slice (L1638).
8+
* ``band=N`` with ``N >= samples_per_pixel`` leaked a raw numpy
9+
``IndexError`` whose message exposed the internal slice shape.
10+
* ``band=N`` (N != 0) on a single-band HTTP COG was dropped because the
11+
post-decode slice was gated on
12+
``arr.ndim == 3 and samples_per_pixel > 1``; the call returned the
13+
full single-band raster as if ``band`` had been ``None``.
14+
15+
PR #1673 already pinned the local eager and dask paths to the contract
16+
"0-based non-negative index only". The in-file NOTE at the time of
17+
issue #1695 (``_reader.py:2031-2034``) flagged that the HTTP branch had
18+
not been mirrored. These tests pin the HTTP branch to the same contract
19+
and confirm cross-path parity on the error messages.
20+
21+
Each test FAILS before the fix and PASSES after.
22+
"""
23+
from __future__ import annotations
24+
25+
import http.server
26+
import socketserver
27+
import threading
28+
29+
import numpy as np
30+
import pytest
31+
32+
from xrspatial.geotiff import open_geotiff
33+
from xrspatial.geotiff._reader import _read_cog_http, read_to_array
34+
from xrspatial.geotiff._writer import write
35+
36+
37+
# ---------------------------------------------------------------------------
38+
# Loopback HTTP server with Range support
39+
# ---------------------------------------------------------------------------
40+
#
41+
# Mirrors the helper from ``test_http_window_band_planar_1669.py`` so the
42+
# fixtures stay self-contained without depending on ``tifffile`` or a
43+
# live network. Each server holds one payload and shuts down at test
44+
# teardown.
45+
46+
class _RangeHandler(http.server.BaseHTTPRequestHandler):
47+
payload: bytes = b''
48+
49+
def do_GET(self): # noqa: N802
50+
rng = self.headers.get('Range')
51+
if rng and rng.startswith('bytes='):
52+
spec = rng[len('bytes='):]
53+
start_s, _, end_s = spec.partition('-')
54+
start = int(start_s)
55+
end = int(end_s) if end_s else len(self.payload) - 1
56+
chunk = self.payload[start:end + 1]
57+
self.send_response(206)
58+
self.send_header('Content-Type', 'application/octet-stream')
59+
self.send_header(
60+
'Content-Range',
61+
f'bytes {start}-{start + len(chunk) - 1}/{len(self.payload)}',
62+
)
63+
self.send_header('Content-Length', str(len(chunk)))
64+
self.end_headers()
65+
self.wfile.write(chunk)
66+
return
67+
self.send_response(200)
68+
self.send_header('Content-Type', 'application/octet-stream')
69+
self.send_header('Content-Length', str(len(self.payload)))
70+
self.end_headers()
71+
self.wfile.write(self.payload)
72+
73+
def log_message(self, *_args, **_kwargs):
74+
pass
75+
76+
77+
def _serve(payload: bytes):
78+
handler_cls = type(
79+
'RangeHandler1695', (_RangeHandler,), {'payload': payload}
80+
)
81+
httpd = socketserver.TCPServer(('127.0.0.1', 0), handler_cls)
82+
port = httpd.server_address[1]
83+
thread = threading.Thread(target=httpd.serve_forever, daemon=True)
84+
thread.start()
85+
return f'http://127.0.0.1:{port}/cog.tif', httpd, thread
86+
87+
88+
def _stop(httpd):
89+
httpd.shutdown()
90+
httpd.server_close()
91+
92+
93+
@pytest.fixture(autouse=True)
94+
def _allow_loopback(monkeypatch):
95+
"""The HTTP source rejects loopback by default after #1664."""
96+
monkeypatch.setenv('XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS', '1')
97+
98+
99+
# ---------------------------------------------------------------------------
100+
# Fixtures
101+
# ---------------------------------------------------------------------------
102+
103+
@pytest.fixture
104+
def multi_band_cog(tmp_path):
105+
"""3-band tiled chunky (planar=1) COG. The writer emits planar=1 by
106+
default for ``(H, W, bands)`` input. Returns ``(path, payload, arr)``
107+
where ``payload`` is the on-disk bytes ready to serve.
108+
"""
109+
h, w, bands = 32, 48, 3
110+
rng = np.random.RandomState(1695)
111+
arr = rng.randint(0, 200, size=(h, w, bands)).astype(np.uint8)
112+
path = str(tmp_path / 'tmp_1695_multi.tif')
113+
write(arr, path, compression='deflate', tiled=True, tile_size=16,
114+
cog=True)
115+
with open(path, 'rb') as f:
116+
payload = f.read()
117+
return path, payload, arr
118+
119+
120+
@pytest.fixture
121+
def single_band_cog(tmp_path):
122+
"""64x64 single-band float32 tiled COG."""
123+
arr = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
124+
path = str(tmp_path / 'tmp_1695_single.tif')
125+
write(arr, path, compression='deflate', tiled=True, tile_size=16,
126+
cog=True)
127+
with open(path, 'rb') as f:
128+
payload = f.read()
129+
return path, payload, arr
130+
131+
132+
# ---------------------------------------------------------------------------
133+
# Negative band index on multi-band HTTP read
134+
# ---------------------------------------------------------------------------
135+
136+
def test_http_negative_band_rejected(multi_band_cog):
137+
"""``band=-1`` on a multi-band HTTP COG raises ``IndexError`` instead
138+
of silently selecting the last channel.
139+
140+
Before the fix, ``arr[:, :, -1]`` returned the trailing band without
141+
any error. The local path raises
142+
``IndexError("band=-1 out of range for 3-band file.")`` via #1673.
143+
"""
144+
_path, payload, _arr = multi_band_cog
145+
url, httpd, _ = _serve(payload)
146+
try:
147+
with pytest.raises(IndexError, match=r"band=-1 out of range"):
148+
read_to_array(url, band=-1)
149+
finally:
150+
_stop(httpd)
151+
152+
153+
def test_http_negative_band_rejected_via_low_level(multi_band_cog):
154+
"""The low-level ``_read_cog_http`` rejects ``band=-1`` too, not just
155+
the ``read_to_array`` wrapper. Catches any future caller that bypasses
156+
the wrapper.
157+
"""
158+
_path, payload, _arr = multi_band_cog
159+
url, httpd, _ = _serve(payload)
160+
try:
161+
with pytest.raises(IndexError, match=r"band=-1 out of range"):
162+
_read_cog_http(url, band=-1)
163+
finally:
164+
_stop(httpd)
165+
166+
167+
# ---------------------------------------------------------------------------
168+
# Out-of-range band index on multi-band HTTP read
169+
# ---------------------------------------------------------------------------
170+
171+
def test_http_band_equal_to_samples_rejected(multi_band_cog):
172+
"""``band=samples_per_pixel`` (off-by-one) raises the typed error
173+
instead of leaking the raw numpy axis-2-out-of-bounds message.
174+
"""
175+
_path, payload, _arr = multi_band_cog
176+
url, httpd, _ = _serve(payload)
177+
try:
178+
# File has 3 bands; valid indices are 0, 1, 2.
179+
with pytest.raises(IndexError, match=r"band=3 out of range"):
180+
read_to_array(url, band=3)
181+
finally:
182+
_stop(httpd)
183+
184+
185+
def test_http_band_far_above_samples_rejected(multi_band_cog):
186+
"""A wildly out-of-range band index also raises the typed error."""
187+
_path, payload, _arr = multi_band_cog
188+
url, httpd, _ = _serve(payload)
189+
try:
190+
with pytest.raises(IndexError, match=r"band=42 out of range"):
191+
read_to_array(url, band=42)
192+
finally:
193+
_stop(httpd)
194+
195+
196+
# ---------------------------------------------------------------------------
197+
# Single-band HTTP read
198+
# ---------------------------------------------------------------------------
199+
200+
def test_http_nonzero_band_on_single_band_rejected(single_band_cog):
201+
"""``band=1`` on a single-band HTTP COG raises ``IndexError`` instead
202+
of silently returning the full raster.
203+
204+
Before the fix, the post-decode slice at L1660 was gated on
205+
``arr.ndim == 3 and samples_per_pixel > 1`` so ``band=1`` on a 2D
206+
single-band array was dropped on the floor. The local path raises
207+
``IndexError("band=1 requested on a single-band file.")`` via #1673.
208+
"""
209+
_path, payload, _arr = single_band_cog
210+
url, httpd, _ = _serve(payload)
211+
try:
212+
with pytest.raises(IndexError,
213+
match=r"band=1 requested on a single-band file"):
214+
read_to_array(url, band=1)
215+
finally:
216+
_stop(httpd)
217+
218+
219+
def test_http_band_zero_on_single_band_still_works(single_band_cog):
220+
"""``band=0`` on a single-band HTTP COG succeeds.
221+
222+
Negative case: the guard rejects nonzero indices but must not
223+
over-reject the only valid index on a single-band file. Mirrors the
224+
local-path validator's ``if band != 0`` branch.
225+
"""
226+
_path, payload, expected = single_band_cog
227+
url, httpd, _ = _serve(payload)
228+
try:
229+
arr, _ = read_to_array(url, band=0)
230+
np.testing.assert_array_equal(arr, expected)
231+
finally:
232+
_stop(httpd)
233+
234+
235+
# ---------------------------------------------------------------------------
236+
# band=None preserves multi-band behaviour (regression)
237+
# ---------------------------------------------------------------------------
238+
239+
def test_http_band_none_returns_all_bands(multi_band_cog):
240+
"""``band=None`` on a multi-band HTTP COG returns the full
241+
``(H, W, bands)`` array unchanged. Regression for the validator: a
242+
typo that promoted ``None`` to an integer comparison would break
243+
every multi-band HTTP read.
244+
"""
245+
_path, payload, expected = multi_band_cog
246+
url, httpd, _ = _serve(payload)
247+
try:
248+
arr, _ = read_to_array(url)
249+
assert arr.shape == expected.shape
250+
np.testing.assert_array_equal(arr, expected)
251+
finally:
252+
_stop(httpd)
253+
254+
255+
# ---------------------------------------------------------------------------
256+
# Cross-path parity with local-path eager read
257+
# ---------------------------------------------------------------------------
258+
259+
def test_local_and_http_negative_band_parity(multi_band_cog):
260+
"""The local eager path and the HTTP path raise the same
261+
``IndexError`` class with the same diagnostic substring on
262+
``band=-1``. This is the parity guard #1673 set up for local vs dask
263+
vs GPU; the HTTP branch joins after #1695.
264+
"""
265+
path, payload, _arr = multi_band_cog
266+
url, httpd, _ = _serve(payload)
267+
try:
268+
with pytest.raises(IndexError) as local_exc:
269+
read_to_array(path, band=-1)
270+
with pytest.raises(IndexError) as http_exc:
271+
read_to_array(url, band=-1)
272+
assert "band=-1 out of range" in str(local_exc.value)
273+
assert "band=-1 out of range" in str(http_exc.value)
274+
# Same wording, not just same substring.
275+
assert str(local_exc.value) == str(http_exc.value)
276+
finally:
277+
_stop(httpd)
278+
279+
280+
def test_local_and_http_band_equal_to_samples_parity(multi_band_cog):
281+
"""Local and HTTP agree on the off-by-one rejection message."""
282+
path, payload, _arr = multi_band_cog
283+
url, httpd, _ = _serve(payload)
284+
try:
285+
with pytest.raises(IndexError) as local_exc:
286+
read_to_array(path, band=3)
287+
with pytest.raises(IndexError) as http_exc:
288+
read_to_array(url, band=3)
289+
assert "band=3 out of range" in str(local_exc.value)
290+
assert "band=3 out of range" in str(http_exc.value)
291+
assert str(local_exc.value) == str(http_exc.value)
292+
finally:
293+
_stop(httpd)
294+
295+
296+
def test_local_and_http_single_band_nonzero_parity(single_band_cog):
297+
"""Local and HTTP agree on the single-band nonzero rejection
298+
message. Before the fix, the local path raised
299+
``"band=1 requested on a single-band file."`` and the HTTP path
300+
returned the full single-band raster without erroring at all.
301+
"""
302+
path, payload, _arr = single_band_cog
303+
url, httpd, _ = _serve(payload)
304+
try:
305+
with pytest.raises(IndexError) as local_exc:
306+
read_to_array(path, band=1)
307+
with pytest.raises(IndexError) as http_exc:
308+
read_to_array(url, band=1)
309+
assert "single-band file" in str(local_exc.value)
310+
assert "single-band file" in str(http_exc.value)
311+
assert str(local_exc.value) == str(http_exc.value)
312+
finally:
313+
_stop(httpd)
314+
315+
316+
# ---------------------------------------------------------------------------
317+
# open_geotiff wrapper passes the rejection through (smoke test)
318+
# ---------------------------------------------------------------------------
319+
320+
def test_open_geotiff_http_negative_band_rejected(multi_band_cog):
321+
"""The public ``open_geotiff`` wrapper also rejects ``band=-1`` on
322+
HTTP, not just the low-level ``read_to_array``. Users hit the
323+
wrapper, so a regression there would be invisible to the low-level
324+
test.
325+
"""
326+
_path, payload, _arr = multi_band_cog
327+
url, httpd, _ = _serve(payload)
328+
try:
329+
with pytest.raises(IndexError, match=r"band=-1 out of range"):
330+
open_geotiff(url, band=-1)
331+
finally:
332+
_stop(httpd)

0 commit comments

Comments
 (0)