Describe the bug
`_delayed_read_window` in `xrspatial/geotiff/__init__.py:1871-1872` calls `arr.astype(target_dtype)` on every chunk. When `target_dtype` already equals `arr.dtype` (the common case for float source rasters), `numpy.ndarray.astype` still allocates a new buffer and copies the data into it, because its `copy` parameter defaults to `True`.
PR #1601 widened the call site to always pass `target_dtype`, so that the dtype declared on the dask graph and the dtype of each computed chunk agree. That fixed #1597 for integer rasters with an in-range nodata sentinel: the dask graph declared float64, but only the chunks that hit the sentinel were actually promoted, so concatenation cast later chunks back to int and clobbered NaN with 0. The fix is correct, but the now-unconditional cast allocates a same-dtype copy for every chunk of every read.
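The copy behavior can be confirmed with plain NumPy, independent of xrspatial. This minimal sketch shows that `astype` to the array's own dtype returns a fresh buffer under the default `copy=True`, while both `copy=False` and an explicit dtype check hand back the original array:

```python
import numpy as np

# A float32 chunk, standing in for one window of a float32 GeoTIFF.
chunk = np.random.rand(256, 256).astype(np.float32)
target_dtype = np.dtype("float32")

# Default copy=True: a new buffer is allocated and filled even though
# the dtype does not change.
same_dtype_copy = chunk.astype(target_dtype)
print(same_dtype_copy is chunk)                  # False
print(np.shares_memory(same_dtype_copy, chunk))  # False

# copy=False: NumPy returns the input itself when no conversion is needed.
no_copy = chunk.astype(target_dtype, copy=False)
print(no_copy is chunk)                          # True

# Explicit dtype guard: the cast only happens when the dtypes differ.
guarded = chunk if chunk.dtype == target_dtype else chunk.astype(target_dtype)
print(guarded is chunk)                          # True
```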
Reproduction
```python
import os
import tempfile

import numpy as np
import xarray as xr

import xrspatial.geotiff as gt
from xrspatial.geotiff import to_geotiff, read_geotiff_dask

H, W = 1024, 1024
data = np.random.rand(H, W).astype(np.float32)
arr_in = xr.DataArray(data, dims=['y', 'x'],
                      coords={'y': np.arange(H), 'x': np.arange(W)})
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, 'probe.tif')
to_geotiff(arr_in, path, compression='none')

# Wrap the private reader to record the target_dtype passed for each chunk.
orig = gt._delayed_read_window
trace = []

def patched(*args, **kwargs):
    trace.append(kwargs.get('target_dtype'))
    return orig(*args, **kwargs)

gt._delayed_read_window = patched
try:
    read_geotiff_dask(path, chunks=256)
finally:
    gt._delayed_read_window = orig  # restore the unpatched function

# Every delayed call carries target_dtype=float32 even though no cast is needed.
assert trace, "_delayed_read_window was never called"
assert all(t == np.dtype('float32') for t in trace if t is not None)
```
Expected behavior
Skip the cast when `target_dtype == arr.dtype`. A 30 TB float32 read with 100 MB chunks shouldn't pay an extra 100 MB allocation and memcpy per task for a no-op cast.
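To put those figures in perspective, a quick back-of-the-envelope calculation, assuming decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes): the read splits into 300,000 tasks, and because each task copies its whole chunk once more, the no-op cast pushes roughly another 30 TB through memory over the full read.

```python
# Cost of the same-dtype copy for the read described above.
# Assumption: decimal units, 1 TB = 10**12 bytes and 1 MB = 10**6 bytes.
total_bytes = 30 * 10**12   # 30 TB float32 raster
chunk_bytes = 100 * 10**6   # 100 MB per chunk, i.e. per dask task

n_tasks = total_bytes // chunk_bytes
extra_bytes_copied = n_tasks * chunk_bytes  # one redundant copy per task

print(n_tasks)                      # 300000
print(extra_bytes_copied / 10**12)  # 30.0
```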
Fix
```python
# Cast only when the chunk's dtype differs from the dtype declared on the graph.
if target_dtype is not None and arr.dtype != target_dtype:
    arr = arr.astype(target_dtype)
```
The #1597 fix still holds. The integer-mask branch above already promotes `arr` to float64 when the sentinel is present, so for those chunks `target_dtype == float64` and `arr.dtype == float64`, and the cast is skipped. When `target_dtype` differs from `arr.dtype` (a caller-supplied dtype, or `effective_dtype=float64` over an unmasked int chunk), the `astype` runs as before, so every chunk still comes out with the declared dtype.
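A minimal NumPy-only sketch of why the guard preserves the #1597 invariant. `cast_chunk` and `mask_sentinel` are hypothetical stand-ins written for illustration, not xrspatial functions, and the sentinel handling is simplified. The three cases mirror the paragraph above: a float chunk that already matches, an int chunk promoted by the mask branch, and an unmasked int chunk that still needs the cast.

```python
import numpy as np

def cast_chunk(arr, target_dtype):
    """Hypothetical stand-in for the tail of _delayed_read_window:
    cast only when the chunk dtype differs from the declared dtype."""
    if target_dtype is not None and arr.dtype != target_dtype:
        arr = arr.astype(target_dtype)
    return arr

def mask_sentinel(arr, nodata):
    """Hypothetical stand-in for the integer-mask branch: promote to
    float64 and replace the sentinel with NaN, only if the sentinel occurs."""
    hit = arr == nodata
    if hit.any():
        arr = arr.astype(np.float64)
        arr[hit] = np.nan
    return arr

nodata = -9999

# 1. Float32 source with a float32 declared dtype: returned as-is, no copy.
f = np.zeros((4, 4), dtype=np.float32)
assert cast_chunk(f, np.dtype("float32")) is f

# 2. Int chunk that hits the sentinel: already float64 after masking,
#    so the guard skips the cast.
declared = np.dtype("float64")
a = np.array([[1, nodata], [3, 4]], dtype=np.int32)
masked = mask_sentinel(a, nodata)
out = cast_chunk(masked, declared)
assert out is masked and out.dtype == declared and np.isnan(out[0, 1])

# 3. Int chunk without the sentinel: still cast to the declared float64.
b = np.array([[1, 2], [3, 4]], dtype=np.int32)
out = cast_chunk(mask_sentinel(b, nodata), declared)
assert out.dtype == declared
```

For plain NumPy arrays, `arr.astype(target_dtype, copy=False)` should behave the same way; the explicit comparison simply makes the intent visible at the call site.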
Context
Found during the 2026-05-11 geotiff performance sweep.