Fix read_geotiff_dask losing nodata mask in non-first chunks (#1597)#1601
Merged
Merged
Conversation
For an integer raster paired with an in-range nodata sentinel, read_geotiff_dask declares the output dtype as float64 but per-chunk _delayed_read_window only promoted a chunk to float64 when that chunk's mask actually hit. With the sentinel in a non-first chunk the top-left chunk stayed uint16, and dask concatenation preallocated the result from the first chunk's dtype -- silently casting later float64 chunks back to uint16 and converting their NaNs to 0. Always thread the resolved target_dtype through _delayed_read_window so every chunk lands as float64. Out-of-range sentinels still keep the file's native dtype because effective_dtype already short-circuits that case earlier in read_geotiff_dask. Tests cover the sentinel-in-corner regression, per-chunk dtype uniformity, the out-of-range #1581 path, and float-raster parity.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes a correctness issue in the GeoTIFF dask reader where nodata masking could be silently lost when an in-range integer nodata sentinel only appears in non-first chunks, causing dask concatenation to coerce later chunks and convert NaN to 0.
Changes:
- Thread the resolved
target_dtypethrough each delayed chunk read to keep per-chunk dtypes consistent with the dask array’s declared dtype. - Add a regression test suite covering the non-first-chunk sentinel case, per-chunk dtype uniformity, out-of-range sentinel behavior, and float-raster parity.
- Update sweep metadata to track issue #1597 inspection state.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
xrspatial/geotiff/__init__.py |
Ensures dask-delayed chunk reads receive the resolved target_dtype to prevent per-chunk dtype divergence that breaks nodata masking. |
xrspatial/geotiff/tests/test_dask_int_nodata_chunks_1597.py |
Adds regression coverage for the dtype/nodata issue when the sentinel is only present in non-first chunks. |
.claude/sweep-metadata-state.csv |
Updates internal sweep metadata to reference issue #1597. |
Comments suppressed due to low confidence (1)
xrspatial/geotiff/init.py:1665
- Passing
target_dtypeunconditionally means_delayed_read_windowwill now always runarr.astype(target_dtype)for every chunk (becausetarget_dtypeis never None). Sinceastypecopies even when the dtype is already correct, this can add a full extra copy per chunk for the common case where no dtype promotion is needed (e.g. no nodata masking, float rasters, or integer rasters without an in-range sentinel). Consider updating_delayed_read_windowto only cast whenarr.dtype != target_dtype(or useastype(..., copy=False)), so #1597 is fixed without introducing a per-chunk copy/perf regression.
block = da.from_delayed(
_delayed_read_window(source,
r0 + win_r0, c0 + win_c0,
r1 + win_r0, c1 + win_c0,
overview_level, nodata,
band_arg,
target_dtype=target_dtype,
http_meta_key=http_meta_key,
max_pixels=max_pixels),
shape=block_shape,
dtype=target_dtype,
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This was referenced May 11, 2026
brendancol
added a commit
that referenced
this pull request
May 11, 2026
* Skip no-op astype in _delayed_read_window (#1624) After #1601 widened the call site to always pass target_dtype, every dask chunk ran arr.astype(target_dtype) even when arr.dtype already matched. numpy.astype defaults to copy=True and so allocated a same- dtype chunk-sized buffer and memcpy on every read. Gate the cast on a real dtype mismatch; the #1597 mask path still promotes uint to float64 inline so every chunk lands in the dask-declared dtype. Regression test in test_dask_no_op_astype_1624.py wraps read_to_array to capture an ndarray subclass that records astype calls, then asserts no same-dtype call survives the per-chunk return path. * Address review: drop stub test, exercise uint16 fixture array Copilot review on #1626 flagged two unused-variable issues. Fixes: - Delete test_float32_chunks_avoid_redundant_copy. It declared same_dtype_casts and wrapped _delayed_read_window without recording anything, so it only asserted output dtype -- which does not validate the #1624 regression. test_astype_skipped_when_dtypes_match already covers the same surface properly via an ndarray subclass that tracks astype calls; verified it fails on the pre-fix code with "Same-dtype astype still runs per chunk". - In test_uint16_mask_path_still_promotes, exercise the previously- unused arr fixture: assert that pixels equal to the 65535 sentinel become NaN and every other pixel matches arr.astype(float64). Anchors the test to fixture values so regressions in the mask branch surface here, not just as dtype drift.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1597.
Summary
read_geotiff_daskdeclared the dask array asfloat64but only promoted individual chunks tofloat64when the chunk actually contained a sentinel pixel. Chunks without a sentinel stayed at the file's integer dtype.RuntimeWarningabout an invalid cast).target_dtypeunconditionally through_delayed_read_windowkeeps every chunk's dtype consistent with the dask graph. Out-of-range sentinels still keep the file's native dtype becauseeffective_dtypeshort-circuits that case earlier inread_geotiff_dask.Test plan
test_dask_int_nodata_chunks_1597.pycovering the sentinel-in-corner case, per-chunk dtype uniformity, the out-of-range Reading uint TIFF with negative GDAL_NODATA sentinel raises OverflowError #1581 path, and float-raster parity.test_nodata_out_of_range_1581,test_attrs_parity_1548,test_vrt_int_nodata_1564,test_metadata_round_trip_1484,test_dtype_read, plus 657 non-GPU geotiff tests.