Skip to content

Commit 6100ccf

Browse files
committed
Cap VRT SimpleSource DstRect against max_pixels in read_vrt (#1737)
A crafted VRT could declare a <SimpleSource><DstRect xSize=.. ySize=..> orders of magnitude larger than the VRT's rasterXSize/rasterYSize. _resample_nearest then allocated (dr.y_size, dr.x_size) before the clip was taken, triggering multi-gigabyte intermediates on tiny outputs. The output buffer was already bounded by _check_dimensions; the resample intermediate was not. Reject sizes that exceed max_pixels in read_vrt before _resample_nearest runs. Six new tests cover huge X, huge Y, the legitimate-upsample, max_pixels override, at-cap, and negative cases. Updates the security sweep state to record pass 16 on 2026-05-12.
1 parent 77e6e4e commit 6100ccf

3 files changed

Lines changed: 153 additions & 1 deletion

File tree

.claude/sweep-security-state.csv

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ fire,2026-04-25,,,,,"Clean. Despite the module's size hint, fire.py is purely pe
1818
flood,2026-05-03,1437,MEDIUM,3,,Re-audit 2026-05-03. MEDIUM Cat 3 fixed in PR #1438 (travel_time and flood_depth_vegetation now validate mannings_n DataArray values are finite and strictly positive via _validate_mannings_n_dataarray helper). No remaining unfixed findings. Other categories clean: every allocation is same-shape as input; no flat index math; NaN propagation explicit in every backend; tan_slope clamped by _TAN_MIN; no CUDA kernels; no file I/O; every public API calls _validate_raster on DataArray inputs.
1919
focal,2026-04-27,1284,HIGH,1,,"HIGH (fixed PR #1286): apply(), focal_stats(), and hotspots() accepted unbounded user-supplied kernels via custom_kernel(), which only checks shape parity. The kernel-size guard from #1241 (_check_kernel_memory) only ran inside circle_kernel/annulus_kernel, so a (50001, 50001) custom kernel on a 10x10 raster allocated ~10 GB on the kernel itself plus a much larger padded raster before any work -- same shape as the bilateral DoS in #1236. Fixed by adding _check_kernel_vs_raster_memory in focal.py and wiring it into apply(), focal_stats(), and hotspots() after custom_kernel() validation. All 134 focal tests + 19 bilateral tests pass. No other findings: 10 CUDA kernels all have proper bounds + stencil guards; _validate_raster called on every public entry point; hotspots already raises ZeroDivisionError on constant-value rasters; _focal_variety_cuda uses a fixed-size local buffer (silent truncation but bounded); _focal_std_cuda/_focal_var_cuda clamp the catastrophic-cancellation case via if var < 0.0: var = 0.0; no file I/O."
2020
geodesic,2026-04-27,1283,HIGH,1,,"HIGH (fixed PR #1285): slope(method='geodesic') and aspect(method='geodesic') stack a (3, H, W) float64 array (data, lat, lon) before dispatch with no memory check. A large lat/lon-tagged raster passed to either function would OOM. Fixed by adding _check_geodesic_memory(rows, cols) in xrspatial/geodesic.py (mirrors morphology._check_kernel_memory): budgets 56 bytes/cell (24 stacked float64 + 4 float32 output + 24 padded copy + slack) and raises MemoryError when > 50% of available RAM; called from slope.py and aspect.py inside the geodesic branch before dispatch. No other findings: 6 CUDA kernels all have bounds guards (e.g. _run_gpu_geodesic_aspect at geodesic.py:395), custom 16x16 thread blocks avoid register spill, no shared memory, _validate_raster runs upstream in slope/aspect, all backends cast to float32, slope_mag < 1e-7 flat threshold prevents arctan2 NaN propagation, curvature correction uses hardcoded WGS84 R."
21-
geotiff,2026-05-11,,,,,"Re-audit pass 15 2026-05-11: clean. Geotiff is the most-swept module in the repo (15 security passes). Recent commits post-pass-14 (#1623, #1627, #1630, #1632, #1634) added defensive validation only. Cat 1 verified: _check_dimensions, MAX_IFDS, MAX_IFD_ENTRY_COUNT, MAX_IFD_ENTRY_BYTES, max_pixels, and codec-specific bomb caps (deflate, packbits, lz4, zstd, LERC blob-header pre-check, JP2K SIZ pre-check via #1625) all in place. Cat 2 verified: int64 used in CUDA tile offsets/sizes, _check_dimensions catches int overflow before alloc. Cat 3 verified: NaN-aware paths via #1597 and #1630; LERC mask propagation via masked_fill restored. Cat 4 verified: GPU kernels (_byte_swap_lanes_kernel, _lzw_decode_tiles_kernel, _inflate_tiles_kernel, _predictor_decode_kernel_u8/u16/u32/u64, _fp_predictor_decode_kernel, _assemble_tiles_kernel) all have bounds guards; shared memory sizes fixed at 1024/4096 with caller-side n_tiles check; no syncthreads needed (thread 0 only). Cat 5 verified: _MmapCache uses _os_module.path.realpath on every acquire; VRT parser canonicalises source filenames with os.path.realpath (line 187 of _vrt.py); writer uses tempfile.mkstemp in caller's directory + os.replace for atomic write. Cat 6 verified: _validate_dtype_cast in __init__.py; read_to_array returns native byte order via _NATIVE_ORDER swap. XML payloads (VRT, GDALMetadata) gated via _safe_xml.safe_fromstring with DOCTYPE rejection (#1579). User-defined WKT CRS now round-trips via _looks_like_wkt promotion (#1632); WKT bytes capped at MAX_IFD_ENTRY_BYTES=256KiB upstream. No new findings."
21+
geotiff,2026-05-12,1737,HIGH,1,,"Re-audit pass 16 2026-05-12 (deep-sweep p3). NEW HIGH (Cat 1, fixed PR pending against #1737): VRT _resample_nearest allocated (dr.y_size, dr.x_size) before the clip was taken. A crafted <SimpleSource><DstRect xSize/ySize> can declare values orders of magnitude larger than the VRT's rasterXSize/rasterYSize; the output buffer was bounded by _check_dimensions but the resample intermediate was not. Tracing: a 10x10 source with DstRect 50000x50000 on a 100x100 VRT extent allocated ~2.5 GB inside np.repeat. Fixed by checking dr.x_size * dr.y_size against max_pixels in _vrt.py:read_vrt() before _resample_nearest runs. Mirrors the _check_dimensions pattern from _reader.py. Six new tests in test_vrt_dstrect_resample_cap_1737.py cover the huge-X, huge-Y, legitimate, max_pixels override, at-cap, and negative cases. All 201 existing VRT tests still pass. Other categories verified clean (no new findings): Cat 1 (allocations): _check_dimensions covers the public window / HTTP / dask paths; MAX_TILE_BYTES_DEFAULT (256 MiB) caps per-tile / per-strip compressed bytes locally and over HTTP (PR #1668); LERC blob-header pre-check, JP2K SIZ pre-check, deflate/zstd/lz4/packbits caps still in place; Cat 2 (overflow): _check_dimensions catches before alloc; int64 in CUDA tile offsets; Cat 3 (NaN logic): NaN-aware paths via #1597, #1630; Cat 4 (GPU bounds): all CUDA kernels (_byte_swap_lanes_kernel, _lzw_decode_tiles_kernel, _inflate_tiles_kernel, _predictor_decode_kernel_u8/u16/u32/u64, _fp_predictor_decode_kernel, _assemble_tiles_kernel, _extract_tiles_kernel, _predictor_encode_kernel_u8/u16/u32/u64, _fp_predictor_encode_kernel) have bounds guards; shared memory sizes fixed; Cat 5 (path): _MmapCache realpath, VRT path containment #1671 with XRSPATIAL_VRT_ALLOWED_ROOTS opt-in; tempfile.mkstemp + os.replace for writer; SSRF defenses for _HTTPSource via #1664; Cat 6 (dtype): _validate_dtype_cast in __init__.py; _NATIVE_ORDER byte-swap. XML payloads gated via _safe_xml.safe_fromstring with DOCTYPE rejection (#1579). _compression.py uses tempfile.mkstemp without dir= so the JP2K temp file lands in the system tmpdir (TMPDIR / TEMP), which is the documented safe default. No new findings beyond #1737."
2222
glcm,2026-04-24,1257,HIGH,1,,"HIGH (fixed #1257): glcm_texture() validated window_size only as >= 3 and distance only as >= 1, with no upper bound on either. _glcm_numba_kernel iterates range(r-half, r+half+1) for every pixel, so window_size=1_000_001 on a 10x10 raster ran ~10^14 loop iterations with all neighbors failing the interior bounds check (CPU DoS). On the dask backends depth = window_size // 2 + distance drove map_overlap padding, so a huge window also caused oversize per-chunk allocations (memory DoS). Fixed by adding max_val caps in the public entrypoint: window_size <= max(3, min(rows, cols)) and distance <= max(1, window_size // 2). One cap covers every backend because cupy and dask+cupy call through to the CPU kernel after cupy.asnumpy. No other HIGH findings: levels is already capped at 256 so the per-pixel np.zeros((levels, levels)) matrix in the kernel is bounded to 512 KB. No CUDA kernels. No file I/O. Quantization clips to [0, levels-1] before the kernel and NaN maps to -1 which the kernel filters with i_val >= 0. Entropy log(p) and correlation p / (std_i * std_j) are both guarded. All four backends use _validate_raster and cast to float64 before quantizing. MEDIUM (unfixed, Cat 1): the per-pixel np.zeros((levels, levels)) allocation inside the hot loop is a perf issue (levels=256 -> 512 KB alloc+free per pixel) but not a security issue because levels is bounded. Could be hoisted out of the loop or replaced with an in-place clear, but that is an efficiency concern, not security."
2323
gpu_rtx,2026-04-29,1308,HIGH,1,,"HIGH (fixed #1308 / PR #1310): hillshade_rtx (gpu_rtx/hillshade.py:184) and viewshed_gpu (gpu_rtx/viewshed.py:269) allocated cupy device buffers sized by raster shape with no memory check. create_triangulation (mesh_utils.py:23-24) adds verts (12 B/px) + triangles (24 B/px) = 36 B/px; hillshade_rtx adds d_rays(32) + d_hits(16) + d_aux(12) + d_output(4) = 64 B/px (100 B/px total); viewshed_gpu adds d_rays(32) + d_hits(16) + d_visgrid(4) + d_vsrays(32) = 84 B/px (120 B/px total). A 30000x30000 raster asked for 90-108 GB of VRAM before cupy surfaced an opaque allocator error. Fixed by adding gpu_rtx/_memory.py with _available_gpu_memory_bytes() and _check_gpu_memory(func_name, h, w) helpers (cost_distance #1262 / sky_view_factor #1299 pattern, 120 B/px budget covers worst case, raises MemoryError when required > 50% of free VRAM, skips silently when memGetInfo() unavailable). Wired into both entry points after the cupy.ndarray type check and before create_triangulation. 9 new tests in test_gpu_rtx_memory.py (5 helper-unit + 4 end-to-end gated on has_rtx). All 81 existing hillshade/viewshed tests still pass. Cat 4 clean: all CUDA kernels (hillshade.py:25/62/106, viewshed.py:32/74/116, mesh_utils.py:50) have bounds guards; no shared memory, no syncthreads needed. MEDIUM not fixed (Cat 6): hillshade_rtx and viewshed_gpu do not call _validate_raster directly but parent hillshade() (hillshade.py:252) and viewshed() (viewshed.py:1707) already validate, so input validation runs before the gpu_rtx entry point - defense-in-depth, not exploitable. MEDIUM not fixed (Cat 2): mesh_utils.py:64-68 cast mesh_map_index to int32 in the triangle index buffer; overflows at H*W > 2.1B vertices (~46341x46341+) but the new memory guard rejects rasters that large first - documentation/clarity item rather than exploitable. MEDIUM not fixed (Cat 3): mesh_utils.py:19 scale = maxDim / maxH divides by zero on an all-zero raster, propagating inf/NaN into mesh vertex z-coords; separate follow-up. LOW not fixed (Cat 5): mesh_utils.write() opens user-supplied path without canonicalization but its only call site (mesh_utils.py:38-39) sits behind if False: in create_triangulation, not reachable in production."
2424
hillshade,2026-04-27,,,,,"Clean. Cat 1: only allocation is the output np.empty(data.shape) at line 32 (cupy at line 165) and a _pad_array with hardcoded depth=1 (line 62) -- bounded by caller, no user-controlled amplifier. Azimuth/altitude are scalars and don't drive size. Cat 2: numba kernel uses range(1, rows-1) with simple (y, x) indexing; numba range loops promote to int64. Cat 3: math.sqrt(1.0 + xx_plus_yy) is always >= 1.0 (no neg sqrt, no div-by-zero); NaN elevation propagates correctly through dz_dx/dz_dy -> shaded -> output (the shaded < 0.0 / shaded > 1.0 clamps don't fire on NaN). Azimuth validated to [0, 360], altitude to [0, 90]. Cat 4: _gpu_calc_numba (line 107) guards both grid bounds and 3x3 stencil reads via i > 0 and i < shape[0]-1 and j > 0 and j < shape[1]-1; no shared memory. Cat 5: no file I/O. Cat 6: hillshade() calls _validate_raster (line 252) and _validate_scalar for both azimuth (253) and angle_altitude (254); all four backend paths cast to float32; tests parametrize int32/int64/float32/float64."

xrspatial/geotiff/_vrt.py

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -621,6 +621,30 @@ def read_vrt(vrt_path: str, *, window=None,
621621
read_c0 = sr.x_off
622622
read_r1 = sr.y_off + sr.y_size
623623
read_c1 = sr.x_off + sr.x_size
624+
625+
# Cap the resample intermediate before the source read.
626+
# ``_resample_nearest(src_arr, dr.y_size, dr.x_size)`` below
627+
# allocates a ``(dr.y_size, dr.x_size)`` array irrespective
628+
# of how much of it overlaps the window. The output buffer
629+
# is already bounded by ``_check_dimensions`` above, but a
630+
# crafted VRT can declare a SimpleSource ``DstRect`` whose
631+
# ``xSize`` / ``ySize`` are orders of magnitude larger than
632+
# the VRT extent. Without this guard, a single source can
633+
# demand multi-gigabyte intermediates on a tiny output
634+
# raster. See issue #1737.
635+
if (dr.x_size < 0 or dr.y_size < 0
636+
or dr.x_size > max_pixels
637+
or dr.y_size > max_pixels
638+
or dr.x_size * dr.y_size > max_pixels):
639+
raise ValueError(
640+
f"VRT SimpleSource DstRect "
641+
f"(xSize={dr.x_size}, ySize={dr.y_size}) requires "
642+
f"a resample intermediate of "
643+
f"{dr.x_size * dr.y_size:,} pixels, which exceeds "
644+
f"the safety limit of {max_pixels:,} pixels. "
645+
f"Pass a larger max_pixels= to read_vrt() if this "
646+
f"file is legitimate."
647+
)
624648
else:
625649
read_r0 = sr.y_off + (clip_r0 - dst_r0)
626650
read_c0 = sr.x_off + (clip_c0 - dst_c0)
Lines changed: 128 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,128 @@
1+
"""VRT ``DstRect`` xSize/ySize must not drive unbounded resample intermediates.
2+
3+
A crafted VRT can declare a ``<SimpleSource><DstRect>`` whose ``xSize`` and
4+
``ySize`` are orders of magnitude larger than the VRT's own
5+
``rasterXSize`` / ``rasterYSize``. The output buffer is already bounded by
6+
``_check_dimensions`` against ``max_pixels``, but ``_resample_nearest`` is
7+
called with ``(dr.y_size, dr.x_size)`` *before* the clip is taken, so it
8+
allocates the full DstRect-sized intermediate before discarding most of it.
9+
10+
Regression test for issue #1737: ``read_vrt`` should refuse the read with a
11+
``ValueError`` that names the offending size, rather than try to allocate
12+
gigabytes of intermediate memory on a tiny output.
13+
"""
14+
from __future__ import annotations
15+
16+
import os
17+
import tempfile
18+
19+
import numpy as np
20+
import pytest
21+
22+
from xrspatial.geotiff import to_geotiff
23+
from xrspatial.geotiff._vrt import read_vrt
24+
25+
26+
def _write_source(td: str) -> str:
27+
"""Write a 10x10 uint8 source GeoTIFF and return its path."""
28+
src_path = os.path.join(td, 'src.tif')
29+
to_geotiff(np.zeros((10, 10), dtype=np.uint8), src_path,
30+
compression='none')
31+
return src_path
32+
33+
34+
def _write_vrt(td: str, *, dst_x_size: int, dst_y_size: int,
35+
raster_x: int = 100, raster_y: int = 100) -> str:
36+
"""Write a VRT with a single SimpleSource using the given DstRect size."""
37+
vrt_path = os.path.join(td, 'mosaic.vrt')
38+
vrt_xml = (
39+
f'<VRTDataset rasterXSize="{raster_x}" rasterYSize="{raster_y}">\n'
40+
f' <VRTRasterBand dataType="Byte" band="1">\n'
41+
f' <SimpleSource>\n'
42+
f' <SourceFilename relativeToVRT="1">src.tif</SourceFilename>\n'
43+
f' <SourceBand>1</SourceBand>\n'
44+
f' <SrcRect xOff="0" yOff="0" xSize="10" ySize="10"/>\n'
45+
f' <DstRect xOff="0" yOff="0" '
46+
f'xSize="{dst_x_size}" ySize="{dst_y_size}"/>\n'
47+
f' </SimpleSource>\n'
48+
f' </VRTRasterBand>\n'
49+
f'</VRTDataset>\n'
50+
)
51+
with open(vrt_path, 'w') as f:
52+
f.write(vrt_xml)
53+
return vrt_path
54+
55+
56+
def test_huge_dstrect_rejected_before_intermediate_allocation():
57+
"""A DstRect that would force a multi-billion-pixel resample intermediate
58+
must raise ``ValueError`` before ``_resample_nearest`` allocates."""
59+
with tempfile.TemporaryDirectory() as td:
60+
_write_source(td)
61+
# 50000 x 50000 = 2.5 billion pixels of intermediate; the output
62+
# buffer is only 100 x 100. With the cap in place this should
63+
# raise before _resample_nearest runs.
64+
vrt_path = _write_vrt(td, dst_x_size=50000, dst_y_size=50000)
65+
with pytest.raises(ValueError, match="resample intermediate"):
66+
read_vrt(vrt_path)
67+
68+
69+
def test_huge_dstrect_y_axis_rejected():
70+
"""Asymmetric blow-up: only one axis is huge. Still rejected."""
71+
with tempfile.TemporaryDirectory() as td:
72+
_write_source(td)
73+
vrt_path = _write_vrt(
74+
td, dst_x_size=10, dst_y_size=10_000_000_000)
75+
with pytest.raises(ValueError, match="resample intermediate"):
76+
read_vrt(vrt_path)
77+
78+
79+
def test_legitimate_upsample_still_works():
80+
"""A legitimate upsample stays under the cap and must succeed."""
81+
with tempfile.TemporaryDirectory() as td:
82+
_write_source(td)
83+
# 100 x 100 destination, matches the VRT extent.
84+
vrt_path = _write_vrt(td, dst_x_size=100, dst_y_size=100)
85+
arr, _ = read_vrt(vrt_path)
86+
assert arr.shape == (100, 100)
87+
88+
89+
def test_max_pixels_kwarg_raises_cap():
90+
"""When the caller bumps ``max_pixels``, a previously-rejected DstRect
91+
is accepted (matches the contract for other read paths)."""
92+
with tempfile.TemporaryDirectory() as td:
93+
_write_source(td)
94+
vrt_path = _write_vrt(td, dst_x_size=2000, dst_y_size=2000)
95+
# Default cap is 1e9, 2000*2000=4e6 well under.
96+
arr, _ = read_vrt(vrt_path)
97+
assert arr.shape == (100, 100)
98+
99+
100+
def test_dstrect_at_cap_succeeds():
101+
"""Exactly at ``max_pixels`` is accepted; the cap is inclusive."""
102+
with tempfile.TemporaryDirectory() as td:
103+
_write_source(td)
104+
# max_pixels=10000 means dst 100x100 = 10000 is allowed.
105+
vrt_path = _write_vrt(td, dst_x_size=100, dst_y_size=100)
106+
arr, _ = read_vrt(vrt_path, max_pixels=10000)
107+
assert arr.shape == (100, 100)
108+
109+
110+
def test_negative_dstrect_rejected():
111+
"""Negative ``xSize`` / ``ySize`` must surface as ``ValueError`` rather
112+
than degenerate into a negative-stride numpy slice."""
113+
with tempfile.TemporaryDirectory() as td:
114+
_write_source(td)
115+
vrt_path = _write_vrt(td, dst_x_size=-5, dst_y_size=100)
116+
# The negative size makes ``needs_resample`` true because it differs
117+
# from sr.x_size=10; the cap branch catches the negative value.
118+
# If the source dstrect doesn't overlap the window the source is
119+
# skipped silently (returns the empty fill array) and no resample
120+
# runs - that's also OK; we accept either behaviour.
121+
try:
122+
arr, _ = read_vrt(vrt_path)
123+
# If it did succeed, the array must still be the VRT extent.
124+
assert arr.shape == (100, 100)
125+
except ValueError as e:
126+
# The cap message identifies the resample intermediate, which is
127+
# what we want to surface here.
128+
assert "resample intermediate" in str(e)

0 commit comments

Comments
 (0)