Skip to content

Commit edec07c

Browse files
authored
rasterize: propagate like.attrs/coords and emit _FillValue (#2018) (#2024)
* rasterize: propagate like.attrs/coords and emit _FillValue (#2018) When the caller passed ``like=template``, ``rasterize()`` dropped all of the template's ``attrs`` (including ``crs``, ``res``, ``transform``, ``nodatavals``) and rebuilt the output coords via ``np.linspace`` from re-derived bounds. The output then no longer ``xr.align``-ed with the template, and chained pipelines like ``slope(rasterize(gdf, like=elevation))`` silently saw a no-CRS, no-res raster and recomputed cell-size from coords -- the same class of bug as the #1407 sky_view_factor cellsize issue. Three fixes, all at the same site (line 2227, where the output ``xr.DataArray`` is built): 1. ``_extract_grid_from_like`` now also returns the input ``x`` and ``y`` coords and ``attrs``. ``rasterize()`` copies ``like.attrs`` onto the output. 2. When the resolved output grid matches ``like`` (same width, height, and bounds), the output reuses ``like.coords['x']`` and ``like.coords['y']`` directly so the result is bit-identical to the template and ``xr.align`` keeps working. 3. The ``fill`` value is now emitted as ``attrs['_FillValue']`` and ``attrs['nodatavals']`` when ``fill`` is not NaN. Downstream tools (``to_geotiff``, custom masks) can identify nodata pixels. All four backends (numpy, cupy, dask+numpy, dask+cupy) route through the same final ``xr.DataArray`` constructor, so the fix is in one place and behaves identically across backends. Adds ``TestMetadataPropagation`` to ``test_rasterize.py`` with 9 cases covering attrs propagation, bit-identical coord reuse, fill-value emission, isolation from the template's attrs dict, and parity across all four backends. Closes #2018. * rasterize: address review suggestions on metadata propagation Strip inherited nodata / _FillValue / nodatavals from like.attrs before emitting a fresh triplet keyed off the actual fill, so stale sentinels from a prior round-trip can't outlive the new fill. Detect numpy-typed NaN via float() + np.isnan() so np.float32(np.nan) is treated as NaN. Propagate non-dim coords (e.g. rioxarray's spatial_ref). Replace the float-equal bounds check in the coord-reuse predicate with a check on "caller didn't override bounds/resolution". Adds tests for inherited nodata, numpy-scalar NaN, spatial_ref propagation, and a geotiff round-trip pinning the user-visible #2018 fix.
1 parent cd95746 commit edec07c

3 files changed

Lines changed: 296 additions & 9 deletions

File tree

.claude/sweep-metadata-state.csv

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
11
module,last_inspected,issue,severity_max,categories_found,notes
22
geotiff,2026-05-15,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 worktree, branch deep-sweep-metadata-geotiff-2026-05-15). 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity reverified after the #1813 modular refactor: full reads, windowed reads, multi-band, band=N selection, no-georef integer pixel coords, crs/crs_wkt/transform/nodata/x_resolution/y_resolution/resolution_unit/image_description/gdal_metadata all agree across backends. DataArray .name and dims agree (y, x for 2D; y, x, band for 3D). NEW HIGH finding #1909: GDS chunked GPU path (_read_geotiff_gpu_chunked_gds) declared the dask graph dtype as float64 when source had an in-range integer nodata sentinel, matching the CPU dask path's #1597 contract, but the per-chunk _chunk_task did not cast its returned cupy array to declared_dtype -- chunks with no sentinel hit returned the raw uint16/int16 source dtype, producing a silent declared/actual dtype mismatch. Fix mirrors the #1597 + #1624 CPU dask pattern: compute declared_dtype before defining _chunk_task, cast inside the task only when arr.dtype != declared_dtype to skip the no-op astype(copy=True). 6 regression tests added in test_chunked_gpu_declared_dtype_1909.py covering declared vs computed parity, CPU/GPU dask declared-dtype agreement, eager paths preserve source dtype, no-nodata round-trip, explicit dtype= kwarg, and sentinel-hit float64 promotion. Pre-existing test failures in test_predictor2_big_endian_gpu_1517.py and test_size_param_validation_gpu_vrt_1776.py exist on main (read_to_array AttributeError after #1813 refactor, tile_size=4 rejected by stricter _validate_tile_size_arg) and are unrelated to this audit."
3+
rasterize,2026-05-17,2018,HIGH,1;2;4,"rasterize() drops like.attrs, rebuilds like.coords via linspace (not bit-identical), and never emits _FillValue/nodatavals even when fill is non-NaN. Cat 1 HIGH: chained pipelines like slope(rasterize(gdf, like=elevation)) silently lose crs/res/transform. Cat 2 MEDIUM: linspace round-trip from re-derived bounds breaks xr.align with like. Cat 4 MEDIUM: rasterize(..., fill=-9999, dtype=int32) emits no _FillValue. All 4 backends share the same final return so the fix is one place. Fixed in deep-sweep-metadata-rasterize-2026-05-17-01 (worktree agent-ab7a9aee97c1e4cdf): _extract_grid_from_like now returns coords/attrs; rasterize() reuses like.coords directly when grid matches, copies like.attrs, and emits _FillValue + nodatavals when fill is not NaN. 9 new tests in TestMetadataPropagation cover attrs propagation, bit-identical coord reuse, fill-value emission, isolation from template attrs, and parity across numpy/cupy/dask+numpy/dask+cupy backends. Full test suite (193 passing) clean."
34
reproject,2026-05-10,1572;1573,HIGH,1;3;4,geoid_height_raster dropped input attrs and used dims[-2:] for 3D inputs (#1572). reproject/merge ignored nodatavals (rasterio convention) when rioxarray absent (#1573). Fixed in same branch.

xrspatial/rasterize.py

Lines changed: 76 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1979,7 +1979,13 @@ def _parse_input(geometries, column=None, columns=None):
19791979

19801980

19811981
def _extract_grid_from_like(like):
1982-
"""Extract width, height, bounds, dtype from a template DataArray."""
1982+
"""Extract width, height, bounds, dtype from a template DataArray.
1983+
1984+
Also returns the input coords and attrs so the caller can propagate
1985+
them onto the output raster. Reusing ``like.coords`` directly (rather
1986+
than rebuilding with ``linspace``) keeps the output bit-identical to
1987+
the template and so preserves ``xr.align`` compatibility.
1988+
"""
19831989
if not isinstance(like, xr.DataArray):
19841990
raise TypeError("'like' must be an xr.DataArray")
19851991
if like.ndim != 2 or 'y' not in like.dims or 'x' not in like.dims:
@@ -2007,7 +2013,17 @@ def _extract_grid_from_like(like):
20072013
ymin = float(np.min(y)) - py / 2
20082014
ymax = float(np.max(y)) + py / 2
20092015

2010-
return width, height, (xmin, ymin, xmax, ymax), dt
2016+
# Carry through any non-dim coords (e.g. rioxarray's ``spatial_ref``
2017+
# CRS coord). The y/x dim coords are returned separately because the
2018+
# caller decides whether to reuse them (bit-identical grid) or build
2019+
# fresh ones (resized grid).
2020+
extra_coords = {
2021+
k: v for k, v in like.coords.items() if k not in ('x', 'y')
2022+
}
2023+
2024+
return (width, height, (xmin, ymin, xmax, ymax), dt,
2025+
like.coords['x'], like.coords['y'],
2026+
extra_coords, dict(like.attrs))
20112027

20122028

20132029
# ---------------------------------------------------------------------------
@@ -2170,8 +2186,13 @@ def rasterize(
21702186

21712187
# Extract defaults from template raster
21722188
like_width = like_height = like_bounds = like_dtype = None
2189+
like_x_coord = like_y_coord = None
2190+
like_extra_coords = {}
2191+
like_attrs = None
2192+
bounds_explicit = bounds is not None
21732193
if like is not None:
2174-
like_width, like_height, like_bounds, like_dtype = \
2194+
(like_width, like_height, like_bounds, like_dtype,
2195+
like_x_coord, like_y_coord, like_extra_coords, like_attrs) = \
21752196
_extract_grid_from_like(like)
21762197

21772198
# Parse input geometries
@@ -2269,15 +2290,61 @@ def rasterize(
22692290
final_height, final_width, fill, final_dtype,
22702291
all_touched, merge_fn)
22712292

2272-
# Build coordinates
2273-
px = (xmax - xmin) / final_width
2274-
py = (ymax - ymin) / final_height
2275-
x_coords = np.linspace(xmin + px / 2, xmax - px / 2, final_width)
2276-
y_coords = np.linspace(ymax - py / 2, ymin + py / 2, final_height)
2293+
# Build coordinates. When the caller didn't override the grid (no
2294+
# explicit width/height/bounds/resolution that resizes the output),
2295+
# reuse like.coords directly so the output is bit-identical to the
2296+
# template and xr.align keeps working. Float-equal bound comparison
2297+
# would be fragile, so key the reuse on size + "bounds weren't
2298+
# overridden" instead.
2299+
reuse_like_coords = (
2300+
like_x_coord is not None
2301+
and like_x_coord.sizes['x'] == final_width
2302+
and like_y_coord.sizes['y'] == final_height
2303+
and not bounds_explicit
2304+
and resolution is None
2305+
)
2306+
if reuse_like_coords:
2307+
x_coords = like_x_coord
2308+
y_coords = like_y_coord
2309+
else:
2310+
px = (xmax - xmin) / final_width
2311+
py = (ymax - ymin) / final_height
2312+
x_coords = np.linspace(xmin + px / 2, xmax - px / 2, final_width)
2313+
y_coords = np.linspace(ymax - py / 2, ymin + py / 2, final_height)
2314+
2315+
# Build attrs. Start from like.attrs if given so chained spatial
2316+
# pipelines (slope(rasterize(gdf, like=elevation))) see the same res,
2317+
# crs, transform, etc. as the template. Strip any inherited nodata
2318+
# keys first -- the template's old fill value almost certainly
2319+
# disagrees with this call's fill, and the geotiff writer's
2320+
# _resolve_nodata_attr would otherwise tag pixels with a stale
2321+
# sentinel. Then re-emit a consistent triplet keyed off the actual
2322+
# fill: nodata (xrspatial's primary key), _FillValue (CF), and
2323+
# nodatavals (rioxarray's per-band tuple).
2324+
out_attrs = like_attrs if like_attrs is not None else {}
2325+
for k in ('nodata', '_FillValue', 'nodatavals'):
2326+
out_attrs.pop(k, None)
2327+
try:
2328+
fill_as_float = float(fill)
2329+
fill_is_nan = np.isnan(fill_as_float)
2330+
except (TypeError, ValueError):
2331+
fill_is_nan = False
2332+
if not fill_is_nan:
2333+
out_attrs['nodata'] = fill
2334+
out_attrs['_FillValue'] = fill
2335+
out_attrs['nodatavals'] = (fill,)
2336+
2337+
# Combine y/x dim coords with any non-dim coords carried from the
2338+
# template (e.g. rioxarray's spatial_ref CRS coord).
2339+
out_coords = {'y': y_coords, 'x': x_coords}
2340+
for k, v in like_extra_coords.items():
2341+
if k not in out_coords:
2342+
out_coords[k] = v
22772343

22782344
return xr.DataArray(
22792345
out,
22802346
name=name,
22812347
dims=['y', 'x'],
2282-
coords={'y': y_coords, 'x': x_coords},
2348+
coords=out_coords,
2349+
attrs=out_attrs,
22832350
)

xrspatial/tests/test_rasterize.py

Lines changed: 219 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1592,3 +1592,222 @@ def test_rasterize_public_path_still_works_after_int64_change(self):
15921592
)
15931593
# The polygon covers a 6x6 inner block; just check it burned in.
15941594
assert np.any(result.values == 3.0)
1595+
1596+
1597+
# ---------------------------------------------------------------------------
1598+
# Metadata propagation (attrs from `like`, coord reuse, _FillValue)
1599+
# ---------------------------------------------------------------------------
1600+
1601+
def _make_like(width=10, height=10, attrs=None, dtype=np.float64):
1602+
"""Build a 2D template DataArray with georeferenced coords and attrs."""
1603+
x = np.linspace(0.5, width - 0.5, width)
1604+
y = np.linspace(height - 0.5, 0.5, height)
1605+
data = np.zeros((height, width), dtype=dtype)
1606+
return xr.DataArray(
1607+
data, dims=['y', 'x'],
1608+
coords={'y': y, 'x': x},
1609+
attrs=dict(attrs or {}),
1610+
)
1611+
1612+
1613+
class TestMetadataPropagation:
1614+
"""Verify `like.attrs` propagates, `like.coords` are reused, and the
1615+
fill value lands in `_FillValue` / `nodatavals`.
1616+
"""
1617+
1618+
def test_like_propagates_attrs(self):
1619+
attrs = {
1620+
'crs': 'EPSG:32610',
1621+
'transform': (1.0, 0.0, 0.0, 0.0, -1.0, 10.0),
1622+
'res': (1.0, 1.0),
1623+
}
1624+
like = _make_like(attrs=attrs)
1625+
result = rasterize(
1626+
[(box(2, 2, 8, 8), 1.0)], like=like, fill=0,
1627+
)
1628+
assert result.attrs.get('crs') == 'EPSG:32610'
1629+
assert result.attrs.get('transform') == \
1630+
(1.0, 0.0, 0.0, 0.0, -1.0, 10.0)
1631+
assert result.attrs.get('res') == (1.0, 1.0)
1632+
1633+
def test_like_preserves_coords_bit_identical(self):
1634+
like = _make_like()
1635+
result = rasterize(
1636+
[(box(2, 2, 8, 8), 1.0)], like=like, fill=0,
1637+
)
1638+
# Output reuses like.coords exactly so xr.align keeps working.
1639+
np.testing.assert_array_equal(
1640+
result.coords['x'].values, like.coords['x'].values)
1641+
np.testing.assert_array_equal(
1642+
result.coords['y'].values, like.coords['y'].values)
1643+
1644+
def test_like_attrs_isolated_from_template(self):
1645+
"""Mutating output attrs must not mutate the template's attrs."""
1646+
attrs = {'crs': 'EPSG:32610'}
1647+
like = _make_like(attrs=attrs)
1648+
result = rasterize(
1649+
[(box(2, 2, 8, 8), 1.0)], like=like, fill=0,
1650+
)
1651+
result.attrs['crs'] = 'EPSG:4326'
1652+
assert like.attrs['crs'] == 'EPSG:32610'
1653+
1654+
def test_fill_value_recorded_when_not_nan(self):
1655+
result = rasterize(
1656+
[(box(2, 2, 8, 8), 1.0)],
1657+
width=10, height=10, bounds=(0, 0, 10, 10),
1658+
fill=-9999, dtype=np.int32,
1659+
)
1660+
# Emit the full triplet: xrspatial's primary (nodata), the CF
1661+
# alias (_FillValue), and rioxarray's per-band tuple (nodatavals).
1662+
# All three must agree with fill so downstream consumers all
1663+
# resolve to the same sentinel regardless of which key they read.
1664+
assert result.attrs.get('nodata') == -9999
1665+
assert result.attrs.get('_FillValue') == -9999
1666+
assert result.attrs.get('nodatavals') == (-9999,)
1667+
# Sentinel round-trips cleanly when the array is cast to its
1668+
# declared dtype -- pins #1973-style dtype mismatch regressions.
1669+
assert np.array(-9999, dtype=result.dtype) == \
1670+
np.array(result.attrs['_FillValue'], dtype=result.dtype)
1671+
1672+
def test_fill_value_omitted_for_nan(self):
1673+
"""Default fill=NaN should not pollute attrs with nodata keys."""
1674+
result = rasterize(
1675+
[(box(2, 2, 8, 8), 1.0)],
1676+
width=10, height=10, bounds=(0, 0, 10, 10),
1677+
)
1678+
assert 'nodata' not in result.attrs
1679+
assert '_FillValue' not in result.attrs
1680+
assert 'nodatavals' not in result.attrs
1681+
1682+
def test_no_like_no_attrs_pollution(self):
1683+
"""Without `like` and with NaN fill, attrs must stay empty."""
1684+
result = rasterize(
1685+
[(box(2, 2, 8, 8), 1.0)],
1686+
width=10, height=10, bounds=(0, 0, 10, 10),
1687+
)
1688+
assert result.attrs == {}
1689+
1690+
@skip_no_dask
1691+
def test_like_attrs_propagated_dask(self):
1692+
like = _make_like(attrs={'crs': 'EPSG:32610'})
1693+
result = rasterize(
1694+
[(box(2, 2, 8, 8), 1.0)], like=like, fill=0, chunks=5,
1695+
)
1696+
# The dask backend routes through the same final xr.DataArray
1697+
# constructor, so attrs / coords / _FillValue behave identically.
1698+
assert result.attrs.get('crs') == 'EPSG:32610'
1699+
assert result.attrs.get('_FillValue') == 0
1700+
np.testing.assert_array_equal(
1701+
result.coords['x'].values, like.coords['x'].values)
1702+
1703+
@skip_no_cuda
1704+
def test_like_attrs_propagated_cupy(self):
1705+
like = _make_like(attrs={'crs': 'EPSG:32610'})
1706+
result = rasterize(
1707+
[(box(2, 2, 8, 8), 1.0)], like=like, fill=0, use_cuda=True,
1708+
)
1709+
assert result.attrs.get('crs') == 'EPSG:32610'
1710+
assert result.attrs.get('_FillValue') == 0
1711+
1712+
@skip_no_cuda
1713+
@skip_no_dask
1714+
def test_like_attrs_propagated_dask_cupy(self):
1715+
like = _make_like(attrs={'crs': 'EPSG:32610'})
1716+
result = rasterize(
1717+
[(box(2, 2, 8, 8), 1.0)],
1718+
like=like, fill=0, use_cuda=True, chunks=5,
1719+
)
1720+
assert result.attrs.get('crs') == 'EPSG:32610'
1721+
assert result.attrs.get('_FillValue') == 0
1722+
1723+
def test_like_stale_nodata_keys_replaced_with_fill(self):
1724+
"""Inherited nodata keys from like must not outlive the new fill.
1725+
1726+
The geotiff writer's _resolve_nodata_attr checks ``nodata`` →
1727+
``nodatavals`` → ``_FillValue``. If a previous round-trip left
1728+
any of them on the template, they have to be replaced (or
1729+
cleared) so the writer doesn't tag pixels with a stale sentinel
1730+
that disagrees with the actual fill in the new array.
1731+
"""
1732+
like = _make_like(attrs={
1733+
'nodata': -9999,
1734+
'_FillValue': -9999,
1735+
'nodatavals': (-9999,),
1736+
'crs': 'EPSG:32610',
1737+
})
1738+
# Case 1: explicit fill replaces all three keys.
1739+
result = rasterize(
1740+
[(box(2, 2, 8, 8), 1.0)], like=like, fill=0,
1741+
)
1742+
assert result.attrs.get('nodata') == 0
1743+
assert result.attrs.get('_FillValue') == 0
1744+
assert result.attrs.get('nodatavals') == (0,)
1745+
assert result.attrs.get('crs') == 'EPSG:32610'
1746+
# Case 2: NaN fill strips inherited nodata keys outright -- the
1747+
# actual unwritten pixels are NaN, not -9999, so advertising
1748+
# -9999 would lie to downstream tools.
1749+
result = rasterize(
1750+
[(box(2, 2, 8, 8), 1.0)], like=like,
1751+
)
1752+
assert 'nodata' not in result.attrs
1753+
assert '_FillValue' not in result.attrs
1754+
assert 'nodatavals' not in result.attrs
1755+
assert result.attrs.get('crs') == 'EPSG:32610'
1756+
1757+
def test_numpy_float_nan_fill_treated_as_nan(self):
1758+
"""Numpy scalar NaN must be detected as NaN, not emitted as fill.
1759+
1760+
``isinstance(np.float32(np.nan), float)`` is False, so a naive
1761+
``isinstance(fill, float) and np.isnan(fill)`` check would let a
1762+
numpy-typed NaN slip through and land in ``_FillValue``.
1763+
"""
1764+
result = rasterize(
1765+
[(box(2, 2, 8, 8), 1.0)],
1766+
width=10, height=10, bounds=(0, 0, 10, 10),
1767+
fill=np.float32(np.nan),
1768+
)
1769+
assert 'nodata' not in result.attrs
1770+
assert '_FillValue' not in result.attrs
1771+
assert 'nodatavals' not in result.attrs
1772+
1773+
def test_like_non_dim_coords_propagated(self):
1774+
"""Non-dim coords on like (e.g. rioxarray's spatial_ref) carry over."""
1775+
like = _make_like()
1776+
# rioxarray attaches the CRS as a scalar non-dim coord
1777+
# called ``spatial_ref``. rasterize must propagate it.
1778+
like = like.assign_coords(spatial_ref=0)
1779+
like['spatial_ref'].attrs['crs_wkt'] = (
1780+
'GEOGCS["WGS 84",DATUM["WGS_1984"]]'
1781+
)
1782+
result = rasterize(
1783+
[(box(2, 2, 8, 8), 1.0)], like=like, fill=0,
1784+
)
1785+
assert 'spatial_ref' in result.coords
1786+
assert (result.coords['spatial_ref'].attrs.get('crs_wkt')
1787+
== 'GEOGCS["WGS 84",DATUM["WGS_1984"]]')
1788+
1789+
def test_geotiff_round_trip_preserves_fill(self, tmp_path):
1790+
"""rasterize → to_geotiff → read → nodata matches.
1791+
1792+
The user-visible motivation for #2018: a rasterized output
1793+
written to GeoTIFF must round-trip the fill sentinel so masks
1794+
and downstream readers identify the right nodata pixels.
1795+
"""
1796+
pytest.importorskip('tifffile')
1797+
from xrspatial.geotiff import to_geotiff, open_geotiff
1798+
1799+
result = rasterize(
1800+
[(box(2, 2, 8, 8), 1.0)],
1801+
width=10, height=10, bounds=(0, 0, 10, 10),
1802+
fill=-9999.0, dtype=np.float32,
1803+
)
1804+
path = str(tmp_path / 'rasterize_fill_round_trip.tif')
1805+
to_geotiff(result, path)
1806+
read_back = open_geotiff(path)
1807+
# GeoTIFF stores nodata as a scalar; either ``nodata`` or
1808+
# ``_FillValue`` should land on the read-back DataArray and
1809+
# match the fill we supplied.
1810+
nodata = (read_back.attrs.get('nodata')
1811+
or read_back.attrs.get('_FillValue'))
1812+
assert nodata is not None
1813+
assert float(nodata) == -9999.0

0 commit comments

Comments
 (0)