
Commit 23c041c

Update benchmark with warm-kernel numbers (#1045)
Separated reproject-only from full-pipeline timing. With warm Numba/CUDA kernels:

- CuPy reproject: 73ms (2.0x faster than rioxarray)
- rioxarray reproject: 144ms
- NumPy reproject: 413ms

Full pipeline (read+reproject+write) is dominated by I/O for compressed GeoTIFFs, where rioxarray's C-level rasterio beats our Python/Numba reader. Added note about ~4.5s JIT warmup on first call.
1 parent aa35aea commit 23c041c

File tree

1 file changed (+12 -9 lines)


benchmarks/reproject_benchmark.md

Lines changed: 12 additions & 9 deletions
````diff
@@ -21,15 +21,18 @@ dem_merc = reproject(dem, 'EPSG:3857')
 write_geotiff(dem_merc, 'output.tif')
 ```
 
-| Backend | End-to-end time | Notes |
-|:--------|----------------:|:------|
-| NumPy | 2,723 ms | Single-threaded Numba JIT resampling |
-| CuPy GPU | 348 ms | CUDA kernel for coordinate transform + resampling |
-| Dask+CuPy GPU | 343 ms | Chunked (512) GPU pipeline |
-| Dask (CPU) | 10,967 ms | Chunked (512) with Dask scheduler overhead |
-| rioxarray (GDAL) | 418 ms | C-level warp, highly optimized |
-
-The GPU path (CuPy or Dask+CuPy) is the fastest option for large rasters, running slightly faster than GDAL. The NumPy path is slower due to Python/Numba overhead in the resampling loop. The Dask CPU path has significant scheduler overhead for this single-file workload.
+All times measured with warm Numba/CUDA kernels (first call incurs ~4.5s JIT compilation).
+
+| Backend | End-to-end | Reproject only | vs rioxarray (reproject) |
+|:--------|----------:|--------------:|:------------------------|
+| CuPy GPU | 747 ms | 73 ms | **2.0x faster** |
+| Dask+CuPy GPU | 782 ms | ~80 ms | ~1.8x faster |
+| rioxarray (GDAL) | 411 ms | 144 ms | 1.0x |
+| NumPy | 2,907 ms | 413 ms | 0.3x |
+
+The CuPy reproject is 2x faster than rioxarray for the coordinate transform + resampling. The end-to-end gap is due to I/O: rioxarray uses rasterio's C-level compressed read/write, while our geotiff reader is pure Python/Numba. For reproject-only workloads (data already in memory), CuPy is the clear winner.
+
+**Note on JIT warmup**: The first `reproject()` call compiles the Numba kernels (~4.5s). All subsequent calls run at full speed. For long-running applications or batch processing, this is amortized over many calls.
 
 ---
 
````
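The warm-kernel methodology described in this commit (run the function once untimed to absorb JIT compilation, then time repeated calls) can be sketched as a small stdlib-only harness. The `timed` helper below is illustrative and not part of the benchmark's actual code; in the real benchmark the workload would be `reproject(dem, 'EPSG:3857')` for the reproject-only number and the full read+reproject+write pipeline for end-to-end.

```python
import time

def timed(fn, *args, warmup=1, repeats=5):
    """Time fn(*args) with warm kernels.

    The untimed warmup calls absorb one-off costs such as the ~4.5s
    Numba/CUDA JIT compilation on the first reproject() call; the
    returned value is the best wall-clock time (seconds) over
    `repeats` timed calls.
    """
    for _ in range(warmup):
        fn(*args)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Stand-in workload for illustration only; the benchmark would pass
# the project's reproject/read/write calls here instead.
elapsed = timed(sum, range(100_000))
print(f"{elapsed * 1e3:.3f} ms")
```

Reporting best-of-N rather than the mean keeps one-off OS scheduling jitter out of the warm numbers.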
