Commit a82e7d0

Dask+CuPy reproject: single-pass GPU instead of per-chunk (#1045)
For dask+cupy inputs, eagerly compute the source to GPU memory and run the in-memory CuPy reproject in a single pass. This avoids the per-chunk overhead of 64+ dask.delayed calls, each creating a pyproj Transformer and launching small CUDA kernels.

Before: 958ms (64 delayed chunks, 512x512 each)
After: 43ms (single CuPy pass, pixel-exact same output)
Speedup: 22x

The output is a plain CuPy array. For truly out-of-core GPU data that doesn't fit in GPU memory, the old dask.delayed path remains available by passing the data as dask+numpy.
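
Because the new path materializes the whole source on the GPU, it can help to estimate the footprint before relying on it. The sketch below is illustrative only: `source_nbytes` is a hypothetical helper, and the 8x8 chunk grid and float32 dtype are assumptions (the message states only 64 chunks of 512x512). It also rechecks the quoted speedup from the two timings.

```python
import math


def source_nbytes(shape, itemsize):
    """Bytes needed to hold the whole source array in memory at once."""
    return math.prod(shape) * itemsize


# 64 chunks of 512x512, assumed here to form an 8x8 grid of float32 pixels.
shape = (8 * 512, 8 * 512)
nbytes = source_nbytes(shape, itemsize=4)
print(f"source footprint: {nbytes / 2**20:.0f} MiB")

# Speedup implied by the timings in the commit message (958 ms vs 43 ms).
speedup = 958 / 43
print(f"speedup: {speedup:.1f}x")
```

Under those assumptions the benchmark source is small relative to typical GPU memory, which is the regime where the eager path pays off.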
1 parent 897c7b9 commit a82e7d0

1 file changed: +22 -2 lines changed

xrspatial/reproject/__init__.py

Lines changed: 22 additions & 2 deletions
@@ -443,13 +443,33 @@ def reproject(
     src_wkt = src_crs.to_wkt()
     tgt_wkt = tgt_crs.to_wkt()
 
-    if is_dask:
+    if is_dask and is_cupy:
+        # Dask+CuPy: eagerly compute source to GPU, then single-pass
+        # CuPy reproject. This avoids per-chunk overhead (pyproj init,
+        # small CUDA kernel launches, dask scheduler) that makes chunked
+        # GPU reproject ~28x slower than a single pass. The output is
+        # returned as a plain CuPy array; caller can .rechunk() if needed.
+        import cupy as _cp
+        eager_data = raster.data.compute()
+        if not isinstance(eager_data, _cp.ndarray):
+            eager_data = _cp.asarray(eager_data)
+        eager_da = xr.DataArray(
+            eager_data, dims=raster.dims,
+            coords=raster.coords, attrs=raster.attrs,
+        )
+        result_data = _reproject_inmemory_cupy(
+            eager_da, src_bounds, src_shape, y_desc,
+            src_wkt, tgt_wkt,
+            out_bounds, out_shape,
+            resampling, nd, transform_precision,
+        )
+    elif is_dask:
         result_data = _reproject_dask(
             raster, src_bounds, src_shape, y_desc,
             src_wkt, tgt_wkt,
             out_bounds, out_shape,
             resampling, nd, transform_precision,
-            chunk_size, is_cupy,
+            chunk_size, False,
         )
     elif is_cupy:
         result_data = _reproject_inmemory_cupy(
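
The branch order in this hunk matters: the combined `is_dask and is_cupy` test has to come before the plain `is_dask` test, otherwise dask+cupy inputs would still take the chunked path. A minimal sketch of that dispatch follows; `select_reproject_path` is a hypothetical helper written for illustration, not part of xrspatial, and the final NumPy fallback is an assumption because the hunk is cut off before that branch.

```python
def select_reproject_path(is_dask: bool, is_cupy: bool) -> str:
    """Hypothetical helper mirroring the branch order shown in the diff."""
    if is_dask and is_cupy:
        # New in this commit: compute to GPU, then one in-memory CuPy pass.
        return "eager single-pass CuPy"
    elif is_dask:
        # Only dask+numpy reaches this branch now, which is why the diff
        # passes a literal False where is_cupy used to be forwarded.
        return "chunked dask (NumPy)"
    elif is_cupy:
        return "in-memory CuPy"
    # Assumed fallback; this branch is not visible in the hunk.
    return "in-memory NumPy"


for flags in [(True, True), (True, False), (False, True), (False, False)]:
    print(flags, "->", select_reproject_path(*flags))
```

Since the second branch can no longer see a CuPy-backed array, replacing `is_cupy` with `False` in the `_reproject_dask` call does not change behavior for dask+numpy inputs.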
