Merged
2 changes: 1 addition & 1 deletion .claude/sweep-security-state.csv
@@ -18,7 +18,7 @@ fire,2026-04-25,,,,,"Clean. Despite the module's size hint, fire.py is purely pe
flood,2026-05-03,1437,MEDIUM,3,,Re-audit 2026-05-03. MEDIUM Cat 3 fixed in PR #1438 (travel_time and flood_depth_vegetation now validate mannings_n DataArray values are finite and strictly positive via _validate_mannings_n_dataarray helper). No remaining unfixed findings. Other categories clean: every allocation is same-shape as input; no flat index math; NaN propagation explicit in every backend; tan_slope clamped by _TAN_MIN; no CUDA kernels; no file I/O; every public API calls _validate_raster on DataArray inputs.
focal,2026-04-27,1284,HIGH,1,,"HIGH (fixed PR #1286): apply(), focal_stats(), and hotspots() accepted unbounded user-supplied kernels via custom_kernel(), which only checks shape parity. The kernel-size guard from #1241 (_check_kernel_memory) only ran inside circle_kernel/annulus_kernel, so a (50001, 50001) custom kernel on a 10x10 raster allocated ~10 GB on the kernel itself plus a much larger padded raster before any work -- same shape as the bilateral DoS in #1236. Fixed by adding _check_kernel_vs_raster_memory in focal.py and wiring it into apply(), focal_stats(), and hotspots() after custom_kernel() validation. All 134 focal tests + 19 bilateral tests pass. No other findings: 10 CUDA kernels all have proper bounds + stencil guards; _validate_raster called on every public entry point; hotspots already raises ZeroDivisionError on constant-value rasters; _focal_variety_cuda uses a fixed-size local buffer (silent truncation but bounded); _focal_std_cuda/_focal_var_cuda clamp the catastrophic-cancellation case via if var < 0.0: var = 0.0; no file I/O."
geodesic,2026-04-27,1283,HIGH,1,,"HIGH (fixed PR #1285): slope(method='geodesic') and aspect(method='geodesic') stack a (3, H, W) float64 array (data, lat, lon) before dispatch with no memory check. A large lat/lon-tagged raster passed to either function would OOM. Fixed by adding _check_geodesic_memory(rows, cols) in xrspatial/geodesic.py (mirrors morphology._check_kernel_memory): budgets 56 bytes/cell (24 stacked float64 + 4 float32 output + 24 padded copy + slack) and raises MemoryError when > 50% of available RAM; called from slope.py and aspect.py inside the geodesic branch before dispatch. No other findings: 6 CUDA kernels all have bounds guards (e.g. _run_gpu_geodesic_aspect at geodesic.py:395), custom 16x16 thread blocks avoid register spill, no shared memory, _validate_raster runs upstream in slope/aspect, all backends cast to float32, slope_mag < 1e-7 flat threshold prevents arctan2 NaN propagation, curvature correction uses hardcoded WGS84 R."
geotiff,2026-05-11,1607,MEDIUM,5,,"MEDIUM (Cat 5 XML injection, filed #1607): write_vrt in _vrt.py embedded the caller-supplied crs_wkt and source filenames into VRT XML via plain f-strings, so values carrying XML special chars (< > & "" ') could break the document or inject elements like <Metadata><MDI key='AREA_OR_POINT'>Point</MDI></Metadata> (verified the injection flipped raster_type from 'area' to 'point' on a round trip). Fix: route every text slot through xml.sax.saxutils.escape via new _xml_text/_xml_attr helpers; numeric fields stay int/float literal interpolation. Other categories clean: parse_header / parse_ifd enforce MAX_IFD_ENTRY_COUNT, MAX_IFD_ENTRY_BYTES, MAX_IFDS. _safe_xml.safe_fromstring rejects DOCTYPE. read_to_array gates dims via MAX_PIXELS_DEFAULT (1B px) and HTTP per-tile via XRSPATIAL_COG_MAX_TILE_BYTES (256 MiB). validate_tile_layout runs before both CPU and GPU tile decoders (#1219). All CUDA kernels in _gpu_decode.py guard tile_idx >= tile_offsets.shape[0] and write loops gate on out_pos < dst_len. LZW/deflate decoders are called only with expected_size > 0 from _decode_strip_or_tile, so the decompression-bomb cap fires. _FileSource uses os.path.realpath; _vrt.parse_vrt canonicalises SourceFilename via realpath. No int32 overflow paths."
geotiff,2026-05-11,1614,MEDIUM,5,,"MEDIUM (Cat 5 XML injection, filed #1614): _build_gdal_metadata_xml in _geotags.py used plain f-strings to embed caller-supplied keys and values into the GDALMetadata XML payload (tag 42112), so a key or value carrying XML special chars (< > & "" ') silently produced malformed XML (ParseError on read -> attrs round-tripped as {}) or let a crafted key inject attributes into <Item> (e.g. name='foo"" malicious=""bar'). Fix mirrors #1607: route every text slot through xml.sax.saxutils.escape and every attribute slot through quoteattr; sample indices are emitted from int() casts. Same bug class as #1607 but on a separate code path the earlier sweep did not cover. Other categories clean (rest of geotiff already hardened by #1607/#1579/#1584/#1219/#1196/#1189). Reachable from to_geotiff, _write_vrt_tiled, and write_geotiff_gpu whenever a caller passes attrs['gdal_metadata'] as a dict."
glcm,2026-04-24,1257,HIGH,1,,"HIGH (fixed #1257): glcm_texture() validated window_size only as >= 3 and distance only as >= 1, with no upper bound on either. _glcm_numba_kernel iterates range(r-half, r+half+1) for every pixel, so window_size=1_000_001 on a 10x10 raster ran ~10^14 loop iterations with all neighbors failing the interior bounds check (CPU DoS). On the dask backends depth = window_size // 2 + distance drove map_overlap padding, so a huge window also caused oversize per-chunk allocations (memory DoS). Fixed by adding max_val caps in the public entrypoint: window_size <= max(3, min(rows, cols)) and distance <= max(1, window_size // 2). One cap covers every backend because cupy and dask+cupy call through to the CPU kernel after cupy.asnumpy. No other HIGH findings: levels is already capped at 256 so the per-pixel np.zeros((levels, levels)) matrix in the kernel is bounded to 512 KB. No CUDA kernels. No file I/O. Quantization clips to [0, levels-1] before the kernel and NaN maps to -1 which the kernel filters with i_val >= 0. Entropy log(p) and correlation p / (std_i * std_j) are both guarded. All four backends use _validate_raster and cast to float64 before quantizing. MEDIUM (unfixed, Cat 1): the per-pixel np.zeros((levels, levels)) allocation inside the hot loop is a perf issue (levels=256 -> 512 KB alloc+free per pixel) but not a security issue because levels is bounded. Could be hoisted out of the loop or replaced with an in-place clear, but that is an efficiency concern, not security."
gpu_rtx,2026-04-29,1308,HIGH,1,,"HIGH (fixed #1308 / PR #1310): hillshade_rtx (gpu_rtx/hillshade.py:184) and viewshed_gpu (gpu_rtx/viewshed.py:269) allocated cupy device buffers sized by raster shape with no memory check. create_triangulation (mesh_utils.py:23-24) adds verts (12 B/px) + triangles (24 B/px) = 36 B/px; hillshade_rtx adds d_rays(32) + d_hits(16) + d_aux(12) + d_output(4) = 64 B/px (100 B/px total); viewshed_gpu adds d_rays(32) + d_hits(16) + d_visgrid(4) + d_vsrays(32) = 84 B/px (120 B/px total). A 30000x30000 raster asked for 90-108 GB of VRAM before cupy surfaced an opaque allocator error. Fixed by adding gpu_rtx/_memory.py with _available_gpu_memory_bytes() and _check_gpu_memory(func_name, h, w) helpers (cost_distance #1262 / sky_view_factor #1299 pattern, 120 B/px budget covers worst case, raises MemoryError when required > 50% of free VRAM, skips silently when memGetInfo() unavailable). Wired into both entry points after the cupy.ndarray type check and before create_triangulation. 9 new tests in test_gpu_rtx_memory.py (5 helper-unit + 4 end-to-end gated on has_rtx). All 81 existing hillshade/viewshed tests still pass. Cat 4 clean: all CUDA kernels (hillshade.py:25/62/106, viewshed.py:32/74/116, mesh_utils.py:50) have bounds guards; no shared memory, no syncthreads needed. MEDIUM not fixed (Cat 6): hillshade_rtx and viewshed_gpu do not call _validate_raster directly but parent hillshade() (hillshade.py:252) and viewshed() (viewshed.py:1707) already validate, so input validation runs before the gpu_rtx entry point - defense-in-depth, not exploitable. MEDIUM not fixed (Cat 2): mesh_utils.py:64-68 cast mesh_map_index to int32 in the triangle index buffer; overflows at H*W > 2.1B vertices (~46341x46341+) but the new memory guard rejects rasters that large first - documentation/clarity item rather than exploitable. MEDIUM not fixed (Cat 3): mesh_utils.py:19 scale = maxDim / maxH divides by zero on an all-zero raster, propagating inf/NaN into mesh vertex z-coords; separate follow-up. LOW not fixed (Cat 5): mesh_utils.write() opens user-supplied path without canonicalization but its only call site (mesh_utils.py:38-39) sits behind if False: in create_triangulation, not reachable in production."
hillshade,2026-04-27,,,,,"Clean. Cat 1: only allocation is the output np.empty(data.shape) at line 32 (cupy at line 165) and a _pad_array with hardcoded depth=1 (line 62) -- bounded by caller, no user-controlled amplifier. Azimuth/altitude are scalars and don't drive size. Cat 2: numba kernel uses range(1, rows-1) with simple (y, x) indexing; numba range loops promote to int64. Cat 3: math.sqrt(1.0 + xx_plus_yy) is always >= 1.0 (no neg sqrt, no div-by-zero); NaN elevation propagates correctly through dz_dx/dz_dy -> shaded -> output (the shaded < 0.0 / shaded > 1.0 clamps don't fire on NaN). Azimuth validated to [0, 360], altitude to [0, 90]. Cat 4: _gpu_calc_numba (line 107) guards both grid bounds and 3x3 stencil reads via i > 0 and i < shape[0]-1 and j > 0 and j < shape[1]-1; no shared memory. Cat 5: no file I/O. Cat 6: hillshade() calls _validate_raster (line 252) and _validate_scalar for both azimuth (253) and angle_altitude (254); all four backend paths cast to float32; tests parametrize int32/int64/float32/float64."
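Several rows above describe the same pre-allocation memory-guard pattern (`_check_geodesic_memory`, `_check_gpu_memory`): budget a fixed bytes-per-cell cost, estimate the allocation before any work, and raise `MemoryError` when the estimate exceeds half of the available memory. A minimal sketch of that pattern, using the 56 B/cell budget from the geodesic row; passing `available_bytes` explicitly is an assumption to keep the sketch stdlib-only (the real helpers query system RAM or free VRAM):

```python
def check_memory_budget(rows: int, cols: int, available_bytes: int,
                        bytes_per_cell: int = 56) -> None:
    """Refuse work whose estimated footprint exceeds half of what's free.

    Hypothetical stand-in for helpers like _check_geodesic_memory; the
    56 B/cell default mirrors the budget described in the geodesic row.
    """
    required = rows * cols * bytes_per_cell
    if required > available_bytes // 2:
        raise MemoryError(
            f"estimated {required} bytes for a {rows}x{cols} raster "
            f"exceeds half of the {available_bytes} bytes available")


# A 30000x30000 raster needs ~50 GB under this budget, so an 8 GiB
# machine rejects it up front instead of OOMing mid-computation.
check_memory_budget(100, 100, available_bytes=8 * 1024**3)  # passes
```

The guard fires before any allocation, which converts an opaque allocator crash deep inside a backend into a clear, catchable error at the public entry point.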
23 changes: 21 additions & 2 deletions xrspatial/geotiff/_geotags.py
@@ -199,15 +199,34 @@ def _build_gdal_metadata_xml(meta: dict) -> str:

     Accepts the same dict format that _parse_gdal_metadata produces:
     string keys for dataset-level, (name, band) tuples for per-band.
+
+    Every caller-supplied text and attribute slot is routed through
+    ``xml.sax.saxutils.escape`` / ``quoteattr`` so a key or value
+    containing XML special characters (``& < > " '``) cannot break
+    the document or inject extra elements. Sample indices are emitted
+    from an ``int(...)`` cast and need no escaping. See issue #1614.
     """
+    from xml.sax.saxutils import escape as _xml_escape, quoteattr as _xml_quoteattr
+
+    def _text(v) -> str:
+        if v is None:
+            return ""
+        return _xml_escape(str(v), {'"': "&quot;", "'": "&apos;"})
+
+    def _attr(v) -> str:
+        if v is None:
+            return '""'
+        return _xml_quoteattr(str(v))
+
     lines = ['<GDALMetadata>']
     for key, value in meta.items():
         if isinstance(key, tuple):
             name, sample = key
             lines.append(
-                f' <Item name="{name}" sample="{sample}">{value}</Item>')
+                f' <Item name={_attr(name)} sample="{int(sample)}">'
+                f'{_text(value)}</Item>')
         else:
-            lines.append(f' <Item name="{key}">{value}</Item>')
+            lines.append(f' <Item name={_attr(key)}>{_text(value)}</Item>')
     lines.append('</GDALMetadata>')
     return '\n'.join(lines) + '\n'

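The fix in the diff above boils down to two standard-library calls. A self-contained sketch, using a hypothetical `build_item` helper (not the library code) to show why `quoteattr` on the attribute slot and `escape` on the text slot block both injection vectors from #1614:

```python
from xml.sax.saxutils import escape, quoteattr


def build_item(name, value):
    # quoteattr supplies its own surrounding quotes and escapes any
    # embedded quote, so a crafted name cannot terminate the attribute
    # early and smuggle in a second one; escape neutralises & < > in
    # the text slot, so a value cannot close the element.
    return f"<Item name={quoteattr(str(name))}>{escape(str(value))}</Item>"


print(build_item('foo" malicious="bar', "</Item><evil/>"))
# <Item name='foo" malicious="bar'>&lt;/Item&gt;&lt;evil/&gt;</Item>
```

Parsing the result back yields exactly one `Item` with a single `name` attribute and the payload confined to text content, which is the behaviour the new test file pins down.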
128 changes: 128 additions & 0 deletions xrspatial/geotiff/tests/test_gdal_metadata_xml_escape_1614.py
@@ -0,0 +1,128 @@
+"""Tests for #1614: _build_gdal_metadata_xml escapes XML special chars.
+
+The serialiser previously interpolated keys and values with plain
+f-strings, which corrupted the document on any value containing
+``& < > " '`` and let a caller-crafted key inject extra attributes
+into the ``<Item>`` element. This file pins the escaped behaviour:
+all five XML special characters round-trip through write/read
+without loss, and attribute slots refuse injection attempts.
+"""
+from __future__ import annotations
+
+import xml.etree.ElementTree as ET
+
+import numpy as np
+import pytest
+
+from xrspatial.geotiff._geotags import (
+    _build_gdal_metadata_xml,
+    _parse_gdal_metadata,
+)
+
+
+class TestBuildGdalMetadataXMLEscape:
+    """Verify each XML special character round-trips intact."""
+
+    @pytest.mark.parametrize(
+        "payload",
+        [
+            "A & B",
+            "value < 5",
+            "value > 5",
+            'quote "x"',
+            "apos 'x'",
+            "&<>\"'",  # all five at once
+            "no specials here",  # control: plain text still works
+        ],
+    )
+    def test_value_round_trip_for_special_chars(self, payload):
+        meta = {"NOTE": payload}
+        xml = _build_gdal_metadata_xml(meta)
+        # Document parses as well-formed XML
+        ET.fromstring(xml)
+        # Round-trip via the package's own parser reproduces the value
+        assert _parse_gdal_metadata(xml) == meta
+
+    @pytest.mark.parametrize(
+        "payload",
+        [
+            "A & B",
+            "value < 5",
+            'quote "x"',
+        ],
+    )
+    def test_per_band_value_round_trip(self, payload):
+        meta = {("STAT", 0): payload}
+        xml = _build_gdal_metadata_xml(meta)
+        ET.fromstring(xml)
+        assert _parse_gdal_metadata(xml) == meta
+
+    def test_attribute_injection_via_name_blocked(self):
+        """A key containing a quote cannot inject extra attributes."""
+        injection_key = 'foo" malicious="bar'
+        meta = {injection_key: "value"}
+        xml = _build_gdal_metadata_xml(meta)
+
+        root = ET.fromstring(xml)
+        items = root.findall("Item")
+        assert len(items) == 1
+        # Only the legitimate ``name`` attribute exists; the injection
+        # attempt did not create a second attribute.
+        assert set(items[0].attrib.keys()) == {"name"}
+        assert items[0].attrib["name"] == injection_key
+        assert items[0].text == "value"
+        # Round-trip reproduces the original key intact.
+        assert _parse_gdal_metadata(xml) == meta
+
+    def test_element_injection_via_value_blocked(self):
+        """A value with closing-tag syntax cannot inject a sibling Item."""
+        meta = {"OUTER": "</Item><Item name=\"X\">stuff</Item><!--"}
+        xml = _build_gdal_metadata_xml(meta)
+
+        root = ET.fromstring(xml)
+        items = root.findall("Item")
+        # Only the one Item we wrote -- the injection payload sits inside
+        # its text content, not as a new element.
+        assert len(items) == 1
+        assert items[0].attrib == {"name": "OUTER"}
+        assert _parse_gdal_metadata(xml) == meta
+
+    def test_unicode_value_unaffected(self):
+        """Non-ASCII text passes through unchanged (only XML specials escape)."""
+        meta = {"DESC": "élévation ☃"}
+        xml = _build_gdal_metadata_xml(meta)
+        ET.fromstring(xml)
+        assert _parse_gdal_metadata(xml) == meta
+
+    def test_existing_clean_string_format_unchanged(self):
+        """Plain keys/values still emit the original double-quoted form."""
+        xml = _build_gdal_metadata_xml({"DataType": "Generic"})
+        # quoteattr picks double quotes when the string has no double
+        # quote of its own, so the legacy assertion ``name="DataType"``
+        # still holds and downstream tools see the same output.
+        assert '<Item name="DataType">Generic</Item>' in xml
+
+
+class TestToGeotiffMetadataRoundTrip:
+    """End-to-end: special-char metadata survives a write/read cycle."""
+
+    def test_to_geotiff_special_chars_round_trip(self, tmp_path):
+        import xarray as xr
+        from xrspatial.geotiff import open_geotiff, to_geotiff
+
+        arr = np.ones((4, 4), dtype=np.float32)
+        da = xr.DataArray(arr, dims=["y", "x"])
+        da.attrs["gdal_metadata"] = {
+            "NOTE": "value < 5 & ok",
+            ("STAT", 0): 'quote "x"',
+        }
+
+        path = tmp_path / "meta.tif"
+        to_geotiff(da, str(path))
+
+        opened = open_geotiff(str(path))
+        round_tripped = opened.attrs.get("gdal_metadata")
+        assert round_tripped is not None, opened.attrs
+        # Both dataset-level and per-band entries survive intact.
+        assert round_tripped.get("NOTE") == "value < 5 & ok"
+        assert round_tripped.get(("STAT", 0)) == 'quote "x"'
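The legacy-format test above relies on a documented quirk of `quoteattr`: it only abandons double quotes when forced to. A quick standalone demonstration of its quote-selection rules:

```python
from xml.sax.saxutils import quoteattr

# quoteattr defaults to double quotes; it switches to single quotes only
# when the value contains a double quote and no single quote, and falls
# back to &quot; entities when both kinds are present.
print(quoteattr("DataType"))      # "DataType"
print(quoteattr('has "dq"'))      # 'has "dq"'
print(quoteattr("mix \" and '"))  # "mix &quot; and '"
```

This is why plain keys such as `DataType` still serialise byte-for-byte as `name="DataType"` after the #1614 fix, so downstream tools that grep for the double-quoted form keep working.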