Skip to content

Commit 865de20

Browse files
Copilotbrendancol
andauthored
Merge origin/main into branch (resolve merge)
Agent-Logs-Url: https://github.com/xarray-contrib/xarray-spatial/sessions/5e4ca757-8a1b-4021-aa2e-a9e7eab80367 Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>
1 parent 6a999af commit 865de20

7 files changed

Lines changed: 414 additions & 8 deletions

.claude/sweep-accuracy-state.csv

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

.claude/sweep-metadata-state.csv

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
11
module,last_inspected,issue,severity_max,categories_found,notes
22
geotiff,2026-05-11,1598,MEDIUM,4,"read_vrt(band=N) used vrt.bands[0].nodata for attrs and integer-promotion mask, mis-masking band N reads with per-band sentinels (#1598)."
3+
geotiff,2026-05-11,1597,HIGH,4;5,read_geotiff_dask dropped nodata mask when sentinel only hit non-first chunks (per-chunk dtype divergence; #1597).
34
reproject,2026-05-10,1572;1573,HIGH,1;3;4,geoid_height_raster dropped input attrs and used dims[-2:] for 3D inputs (#1572). reproject/merge ignored nodatavals (rasterio convention) when rioxarray absent (#1573). Fixed in same branch.

CONTRIBUTING.md

Lines changed: 40 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
1-
# Contributing to Xarray-Spatial
1+
# Contributing to xarray-spatial
22

3-
As stated in [Xarray Spatial code of conduct](https://github.com/xarray-contrib/xarray-spatial/blob/main/CODE_OF_CONDUCT.md), a primary goal of Xarray Spatial is to be inclusive to the largest number of contributors. However, we do have some requests for how contributions should be made. Please read these guidelines before contributing to have a most positive experience with Xarray Spatial.
3+
As stated in the [xarray-spatial code of conduct](https://github.com/xarray-contrib/xarray-spatial/blob/main/CODE_OF_CONDUCT.md), a primary goal of xarray-spatial is to be inclusive to the largest number of contributors. However, we do have some requests for how contributions should be made. Please read these guidelines before contributing to have a most positive experience with xarray-spatial.
44

55
### Getting Started
66

@@ -28,7 +28,7 @@ In order to avoid duplication of effort, it's always a good idea to comment on a
2828

2929
1. Make sure that there is a corresponding issue for your change first. If there isn't yet, create one.
3030

31-
2. Create a fork of the Xarray Spatial repository on GitHub (this is only done before *first*) contribution.
31+
2. Create a fork of the xarray-spatial repository on GitHub (this is only done before *first*) contribution.
3232

3333
3. Create a branch off the `main` branch with a meaningful name. Preferably include issue number and a few keywords, so that we will have a rough idea what the branch refers to, without looking up the issue.
3434

@@ -43,6 +43,43 @@ In order to avoid duplication of effort, it's always a good idea to comment on a
4343
8. We will review your PR as time permits. Reviewers may comment on your contributions, ask you questions regarding the implementation or request changes. If changes are requested, push new commits to the existing branch. Do *NOT* rebase, amend, or cherry-pick published commits. Any of those actions will make us start the review from scratch. If you need updates from `main`, just merge it into your branch.
4444

4545

46+
### AI-Assisted Contribution and Review
47+
48+
xarray-spatial welcomes responsible AI-assisted development and review when it helps contributors move faster, improve quality, and reduce maintainer burden. The project has limited dedicated development capacity, so contributors are encouraged to use AI tools for testing, debugging, feature work, documentation, deployment, and release preparation.
49+
50+
Attribution and responsibility belong to human contributors. Do not add AI tools, coding agents, or AI companies as commit authors, co-authors, reviewers, or attribution lines in commits, pull requests, changelogs, or release notes.
51+
52+
Please avoid attribution such as:
53+
54+
Co-authored-by: Claude <...>
55+
Co-authored-by: ChatGPT <...>
56+
Generated with Claude Code
57+
Generated by Copilot
58+
Reviewed-by: <AI tool>
59+
60+
If AI-assisted code introduces a bug, security issue, regression, or incorrect behavior, the responsible party is the human contributor who submitted and reviewed the change. AI tools may assist the work, but they do not own the work.
61+
62+
#### Review Expectations
63+
64+
Before submitting a pull request, contributors should review and use the relevant commands in:
65+
66+
.claude/commands/
67+
68+
These command markdown files reflect current project expectations for performance, accuracy, security, testing, maintainability, deployment, and release readiness. New contributors should read through them to understand the project's quality standards.
69+
70+
Contributors who do not use Claude Code can adapt the command markdown files into prompts or checklists for their preferred AI tools and review workflows. The important part is the quality review they represent, not the specific tool used to run them.
71+
72+
Where applicable, run the relevant sweep or review commands before requesting maintainer review. If a command flags an issue, either address it or clearly explain why it is acceptable for the current change.
73+
74+
Copilot code review may run on pull requests as automated review assistance. Its feedback is useful, but it does not replace human responsibility or maintainer judgment.
75+
76+
#### Current Priority
77+
78+
The next sprint is focused on stabilizing xarray-spatial for its first major release. Contributors should prioritize bug fixes, reliability, tests, release blockers, and performance, accuracy, or security issues. New features should avoid adding significant release risk unless they are essential.
79+
80+
The preferred contribution style is fast, careful, and human-accountable: use AI aggressively to improve velocity and quality, but keep ownership and attribution with the people contributing to the project.
81+
82+
4683
### Attribution
4784

4885
Portions of text derived from [Bokeh CONTRIBUTING file]: (https://github.com/bokeh/bokeh/blob/branch-3.4/.github/CONTRIBUTING.md)

xrspatial/geotiff/__init__.py

Lines changed: 38 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1641,13 +1641,23 @@ def read_geotiff_dask(source: str, *, dtype=None, chunks: int | tuple = 512,
16411641
# Translate window-relative chunk coords back to file-relative
16421642
# coords for ``read_to_array``. ``win_r0`` / ``win_c0`` are 0
16431643
# when no window was requested.
1644+
# Always thread ``target_dtype`` so each delayed chunk lands
1645+
# in the same dtype that the dask array declared. Without
1646+
# this, an integer raster with an in-range nodata sentinel
1647+
# would have ``effective_dtype=float64`` declared on the
1648+
# dask graph but per-chunk arrays only promoted when a
1649+
# chunk happened to contain a sentinel pixel. Dask
1650+
# concatenation then preallocates from the first chunk's
1651+
# actual dtype (uint16), silently casting later float64
1652+
# chunks back to int and converting their NaNs to 0. See
1653+
# issue #1597.
16441654
block = da.from_delayed(
16451655
_delayed_read_window(source,
16461656
r0 + win_r0, c0 + win_c0,
16471657
r1 + win_r0, c1 + win_c0,
16481658
overview_level, nodata,
16491659
band_arg,
1650-
target_dtype=target_dtype if dtype is not None else None,
1660+
target_dtype=target_dtype,
16511661
http_meta_key=http_meta_key,
16521662
max_pixels=max_pixels),
16531663
shape=block_shape,
@@ -2583,6 +2593,33 @@ def write_geotiff_gpu(data, path: str, *,
25832593
samples = arr.shape[2] if arr.ndim == 3 else 1
25842594
np_dtype = np.dtype(str(arr.dtype)) # cupy dtype -> numpy dtype
25852595

2596+
# Mirror the CPU writer's NaN-to-sentinel substitution (issue #1599).
2597+
# Without this step the GPU writer emits raw NaN bytes interleaved
2598+
# with valid data even when ``nodata=<finite>`` is supplied; the
2599+
# GDAL_NODATA tag still advertises the sentinel but external readers
2600+
# (rasterio / GDAL / QGIS) mask only on the sentinel value and
2601+
# therefore see the NaN pixels as valid data. The CPU writer does
2602+
# the equivalent rewrite at ``to_geotiff`` (lines around
2603+
# ``arr.copy(); arr[nan_mask] = arr.dtype.type(nodata)``); both
2604+
# paths must produce byte-equivalent files for the same input.
2605+
# We always copy before the in-place sentinel write. Some upstream
2606+
# branches above already produce a fresh buffer (``cupy.asarray``
2607+
# from numpy/dask, ``ascontiguousarray`` from the band-first
2608+
# moveaxis); others (a CuPy-backed DataArray taking the no-moveaxis
2609+
# path, or a plain CuPy positional ``data``) hand ``arr`` back as
2610+
# the caller's buffer. Rather than tracking provenance across that
2611+
# branch tree, copy unconditionally when we are about to mutate --
2612+
# the cost is one GPU array allocation, only on the NaN-present
2613+
# path, and it guarantees the CPU writer's defensive-copy semantics
2614+
# in every case.
2615+
if (nodata is not None
2616+
and np_dtype.kind == 'f'
2617+
and not np.isnan(float(nodata))):
2618+
nan_mask = cupy.isnan(arr)
2619+
if bool(nan_mask.any()):
2620+
arr = arr.copy()
2621+
arr[nan_mask] = np_dtype.type(nodata)
2622+
25862623
comp_tag = _compression_tag(compression)
25872624
pred_val = normalize_predictor(predictor, np_dtype, comp_tag)
25882625

Lines changed: 116 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,116 @@
1+
"""Regression tests for issue #1597.
2+
3+
``read_geotiff_dask`` on an integer raster with an in-range nodata
4+
sentinel used to silently lose the mask when the sentinel only appeared
5+
in non-first chunks. Per-chunk dtype divergence (uint16 vs float64)
6+
caused dask concatenation to preallocate from the first chunk's actual
7+
dtype, casting float64 chunks back to int and converting NaN to 0.
8+
9+
The fix threads the resolved ``target_dtype`` (the dask graph's
10+
declared dtype) unconditionally through ``_delayed_read_window`` so
11+
every chunk lands as float64 regardless of whether its mask hit.
12+
"""
13+
from __future__ import annotations
14+
15+
import numpy as np
16+
import pytest
17+
18+
from xrspatial.geotiff import open_geotiff
19+
from xrspatial.geotiff._writer import write
20+
21+
22+
@pytest.fixture
23+
def uint16_with_sentinel_only_in_corner(tmp_path):
24+
"""Write a uint16 8x8 TIFF whose nodata sentinel is in the
25+
bottom-right 2x2 quadrant. With ``chunks=4`` the top-left chunk
26+
never sees a sentinel and used to keep its uint16 dtype.
27+
"""
28+
arr = np.arange(64, dtype=np.uint16).reshape(8, 8) + 1
29+
arr[6:8, 6:8] = 65535
30+
path = str(tmp_path / 'uint16_corner_sentinel_1597.tif')
31+
write(arr, path, nodata=65535, compression='none', tiled=False)
32+
return path, arr
33+
34+
35+
def test_eager_promotes_to_float64_and_masks(uint16_with_sentinel_only_in_corner):
36+
"""Baseline: the eager path produces float64 with 4 NaNs."""
37+
path, _ = uint16_with_sentinel_only_in_corner
38+
eager = open_geotiff(path)
39+
assert eager.dtype == np.float64
40+
assert np.isnan(eager.values).sum() == 4
41+
assert np.isnan(eager.values[6:8, 6:8]).all()
42+
43+
44+
def test_dask_chunks_4_matches_eager(uint16_with_sentinel_only_in_corner):
45+
"""The dask compute result matches the eager path bit-for-bit.
46+
47+
Before the fix this returned a uint16 array with 0s where the
48+
sentinel had been, because dask coerced the late-arriving float64
49+
chunk back to uint16 at concat time.
50+
"""
51+
path, _ = uint16_with_sentinel_only_in_corner
52+
eager = open_geotiff(path)
53+
dk = open_geotiff(path, chunks=4)
54+
assert dk.dtype == np.float64
55+
computed = dk.compute()
56+
assert computed.dtype == np.float64
57+
np.testing.assert_array_equal(np.isnan(computed.values),
58+
np.isnan(eager.values))
59+
finite = ~np.isnan(eager.values)
60+
np.testing.assert_array_equal(computed.values[finite],
61+
eager.values[finite])
62+
63+
64+
def test_dask_chunks_2_per_chunk_dtype_uniform(
65+
uint16_with_sentinel_only_in_corner):
66+
"""Every dask chunk returns float64 regardless of mask hit.
67+
68+
Iterates the delayed blocks and asserts each one computes to
69+
float64; the regression had the first chunk's actual data come back
70+
as uint16 because the mask never matched there.
71+
"""
72+
path, _ = uint16_with_sentinel_only_in_corner
73+
dk = open_geotiff(path, chunks=2)
74+
blocks = dk.data.to_delayed().flatten()
75+
for i, block in enumerate(blocks):
76+
chunk = block.compute()
77+
assert chunk.dtype == np.float64, (
78+
f"chunk {i} computed as {chunk.dtype}, expected float64; "
79+
f"per-chunk dtype divergence is the #1597 regression."
80+
)
81+
82+
83+
def test_dask_keeps_dtype_for_out_of_range_sentinel(tmp_path):
84+
"""Out-of-range sentinels (uint16 + nodata=-9999) stay uint16.
85+
86+
The fix should not regress #1581: when the sentinel cannot match
87+
any pixel, no float64 promotion is needed and the dask path keeps
88+
the file's native dtype.
89+
"""
90+
arr = np.array([[1, 2, 3, 4]] * 4, dtype=np.uint16)
91+
path = str(tmp_path / 'uint16_out_of_range_1597.tif')
92+
write(arr, path, nodata=-9999, compression='none', tiled=False)
93+
94+
dk = open_geotiff(path, chunks=2)
95+
assert dk.dtype == np.uint16
96+
result = dk.compute()
97+
assert result.dtype == np.uint16
98+
np.testing.assert_array_equal(result.values, arr)
99+
100+
101+
def test_dask_float_input_with_sentinel_in_one_chunk(tmp_path):
102+
"""Float rasters with sentinel in non-first chunk also stay float.
103+
104+
The float path doesn't promote dtype, but it does in-place NaN
105+
substitution. Verify the substitution holds for chunks with and
106+
without the sentinel.
107+
"""
108+
arr = np.arange(64, dtype=np.float32).reshape(8, 8) + 1
109+
arr[6:8, 6:8] = -9999.0
110+
path = str(tmp_path / 'float_corner_sentinel_1597.tif')
111+
write(arr, path, nodata=-9999, compression='none', tiled=False)
112+
113+
eager = open_geotiff(path)
114+
dk = open_geotiff(path, chunks=4).compute()
115+
np.testing.assert_array_equal(np.isnan(dk.values),
116+
np.isnan(eager.values))

0 commit comments

Comments
 (0)