Commit a82e7d0
Dask+CuPy reproject: single-pass GPU instead of per-chunk (#1045)
For dask+cupy inputs, eagerly compute the source to GPU memory and
run the in-memory CuPy reproject in a single pass. This avoids the
per-chunk overhead of 64+ dask.delayed calls, each creating a pyproj
Transformer and launching small CUDA kernels.
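
The dispatch described above can be sketched roughly as follows. This is a
minimal illustration, not the code from this commit: `is_dask_cupy`,
`reproject_dispatch`, `reproject_in_memory` and `reproject_chunked` are
hypothetical names, and the check assumes the input is a dask array whose
`_meta` attribute is an empty array of the chunk type.

```python
def is_dask_cupy(data):
    """Return True if `data` looks like a dask array with CuPy chunks.

    Assumption: dask arrays expose `_meta`, a zero-size array of the
    chunk type, so the module of its type reveals the backend.
    """
    meta = getattr(data, "_meta", None)
    if meta is None or not hasattr(data, "compute"):
        return False
    return type(meta).__module__.split(".")[0] == "cupy"


def reproject_dispatch(data, reproject_in_memory, reproject_chunked):
    """Choose the single-pass or the per-chunk path (hypothetical helper).

    `reproject_in_memory` and `reproject_chunked` are placeholder callables
    standing in for the in-memory CuPy routine and the dask.delayed path.
    """
    if is_dask_cupy(data):
        # Eagerly materialise all chunks in GPU memory. The result is a
        # plain CuPy array, so the output is no longer lazy.
        gpu_array = data.compute()
        return reproject_in_memory(gpu_array)
    # Every other input keeps the existing per-chunk path.
    return reproject_chunked(data)
```

The trade-off is that the whole source must fit in GPU memory, which is why
inputs that are not dask+cupy still go through the chunked path in this sketch.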
Before: 958ms (64 delayed chunks, 512x512 each)
After: 43ms (single CuPy pass, pixel-exact same output)
Speedup: 22x
The output is a plain CuPy array. For truly out-of-core GPU data
that doesn't fit in GPU memory, the old dask.delayed path remains
available by passing the data as dask+numpy.
1 parent 897c7b9 commit a82e7d0
1 file changed: +22 -2 lines
[Diff body not preserved; original line 446 was replaced by new lines 446-466, and original line 452 by new line 472.]