Skip to content

Commit 8fca78a

Browse files
authored
Numba/CUDA projection kernels for reproject, README update (#1046)
* Update README narrative to match current scope (#1045) The intro, GDAL caveat, and docs note were written when the library was much smaller. Now there are 100+ functions, native GeoTIFF I/O, 4-backend dispatch, and 33+ user guide notebooks. Updated the prose to say what the library actually does instead of underselling it. * Fix GPU emoji shortcode, add read/reproject/write example (#1045) :gpu: doesn't render on GitHub -- switched to :desktop_computer:. Added a quick start snippet showing the read_geotiff -> reproject -> write_geotiff workflow. * Use import xrspatial as xrs, prioritize accessor methods (#1045) Switched quick start to 'import xrspatial as xrs', moved the read/reproject/write flow into the main example using the .xrs accessor, and simplified the standalone function example to use the xrs namespace. * Fix output grid computation for reproject (#1045) Three fixes to _grid.py: - Resolution estimation now transforms each axis independently and uses the geometric mean for square pixels (was diagonal step) - Edge sampling increased from 21 to 101 points plus a 21x21 interior grid for better bounds coverage - ceil() replaced with round() for dimension calculation to prevent floating-point noise from adding spurious rows/columns * Add Numba JIT and CUDA projection kernels for reproject (#1045) Ports six projections from the PROJ library to Numba (CPU) and Numba CUDA (GPU), bypassing pyproj for common CRS pairs: - Web Mercator (EPSG:3857) -- spherical, 3 lines per direction - Transverse Mercator / UTM (326xx, 327xx, 269xx) -- 6th-order Krueger series (Karney 2011), closed-form forward and inverse - Ellipsoidal Mercator (EPSG:3395) -- Newton inverse - Lambert Conformal Conic (e.g. EPSG:2154) -- Newton inverse - Albers Equal Area (e.g. EPSG:5070) -- authalic latitude series - Cylindrical Equal Area (e.g. EPSG:6933) -- authalic latitude series CPU Numba kernels are 6-9x faster than pyproj. CUDA kernels are 40-165x faster. Unsupported CRS pairs fall back to pyproj. _transform_coords now tries Numba first, then pyproj. The CuPy chunk worker tries CUDA first, keeping coordinates on-device. * Add reproject benchmark vs rioxarray (#1045) Compares xrspatial (approx, exact, numba) against rioxarray on synthetic grids and real-world GeoTIFFs. Measures both performance (median of 5 runs) and pixel-level consistency (RMSE, R^2, NaN agreement) by forcing both libraries onto the same output grid. * Update README with Numba/CUDA projection table (#1045) Updated Reproject description and added a table listing the six projections with native Numba CPU and CUDA GPU kernels. * Add Sinusoidal and Polar Stereographic Numba kernels (#1045) Three new CPU Numba projections for bathymetric/ocean use cases: - Sinusoidal (ellipsoidal): MODIS ocean/land products. Uses pj_mlfn meridional arc length with 5th-order series. Forward: sub-micrometer vs pyproj. Roundtrip: exact. - Polar Stereographic (N/S): IBCAO Arctic (3996/3413), IBCSO Antarctic (3031), UPS. Iterative inverse (15 iter max). Forward: sub-nanometer. Roundtrip: exact. LAEA implemented but disabled in dispatch pending investigation of ~940m oblique-mode error (kernels are in the file for future fix, just not wired into the dispatch table). * Fix LAEA xmf/ymf swap, re-enable in dispatch (#1045) The oblique-mode LAEA had xmf and ymf swapped (rq/dd vs rq*dd). Research agent found the bug by comparing against PROJ's laea.cpp source. Forward accuracy is now sub-millimeter vs pyproj. * Add generic tmerc dispatch for State Plane zones (#1045) State Plane zones that use Transverse Mercator (Maine, New Hampshire, Wisconsin, etc.) now hit the Numba fast path. Uses the same Krueger 6th-order series as UTM, with a Zb offset for non-zero lat_0. Meter-based zones only; US survey foot zones fall back to pyproj. * Support US survey foot and international foot units (#1045) State Plane zones in us-ft (136 zones) and ft (28 zones) now use the Numba fast path. The Krueger/LCC kernels compute in metres internally, then divide by the unit conversion factor (0.3048006 for us-ft, 0.3048 for ft) on output. x_0/y_0 from PROJ4 dicts are already in metres regardless of the units parameter. * Add projection benchmark table to README, fix dispatch for custom CRS (#1045) - Benchmark table showing Numba vs pyproj for all 12 supported projections - Fixed dispatch to work with custom PROJ strings (no EPSG code), needed for Sinusoidal and other non-registered CRS definitions - Fixed _utm_params to handle None epsg_code * Add GPU column and inline speedups to projection benchmark table (#1045) * Add CUDA kernels for Sinusoidal, LAEA, Polar Stere, State Plane (#1045) All 12 projections now have GPU CUDA kernels. Performance on A6000: - Sinusoidal: 18ms (56x vs pyproj) - LAEA Europe: 18ms (92x) - Polar Stere: 57ms (64-67x) - State Plane tmerc: 23ms (88x) - State Plane lcc ftUS: 36ms (124x) - LCC France: 39ms (78x) All bit-exact against CPU Numba kernels. Updated README benchmark table and projection support matrix. * Guard Numba dispatch against non-WGS84 datums (#1045) CRS on non-WGS84/GRS80 ellipsoids (Airy for BNG, Clarke 1866 for NAD27, Bessel for Tokyo, etc.) now fall back to pyproj instead of silently skipping the datum transformation. Without this, BNG had ~100m error, NAD27 ~24m, Tokyo ~900m. Each _*_params() function now checks _is_wgs84_compatible_ellipsoid() before returning parameters. EPSG-specific paths (UTM, Web Mercator) were already safe since they only match WGS84/NAD83 codes. * Add NAD27 datum support via geocentric Helmert shift (#1045) NAD27 (EPSG:4267) sources and targets now go through the Numba fast path. After the projection kernel runs in WGS84, a 3-parameter geocentric Helmert transform (dx=-8, dy=160, dz=176 for Clarke 1866) shifts coordinates to/from NAD27. Accuracy: mean 2.9m, p95 5.8m vs pyproj across CONUS. This matches PROJ's own accuracy when NADCON grids aren't installed. The framework is extensible to other datums by adding entries to _DATUM_PARAMS. Also broadened geographic CRS detection from WGS84/NAD83-only to include NAD27, so NAD27 -> UTM and NAD27 -> State Plane dispatch correctly. * Add CUDA resampling kernels for end-to-end GPU reproject (#1045) Native CUDA nearest, bilinear, and cubic (Catmull-Rom) resampling kernels replace cupyx.scipy.ndimage.map_coordinates. When the CUDA projection path produces on-device coordinates, the entire pipeline now stays on GPU with no CPU roundtrip. Full reproject pipeline (2048x2048, bilinear, 4326->UTM): GPU end-to-end: 78ms CPU Numba: 161ms Speedup: 2.1x * Add merge benchmark: xrspatial vs rioxarray (#1045) Merges 4 overlapping WGS84 tiles and compares timing and pixel-level consistency against rioxarray.merge_arrays. Baseline results (xrspatial is currently slower on merge): 512x512 tiles: 59ms vs 56ms (1.1x) 1024x1024: 293ms vs 114ms (2.6x) 2048x2048: 2.52s vs 656ms (3.8x) Consistency: RMSE < 0.012, NaN agreement > 99.8%. * Fast same-CRS merge and early-exit in numba dispatch (#1045) Two optimizations that make merge 4.5-7.3x faster: 1. Same-CRS tiles skip reprojection entirely. When source and target CRS match, tiles are placed into the output grid by direct coordinate alignment (array slicing). Falls back to full reprojection if resolutions differ by >1%. 2. try_numba_transform now bails before allocating coordinate arrays when neither CRS side is a supported geographic system. This saved ~100ms per tile for unsupported pairs. Merge benchmark (4 overlapping WGS84 tiles): 512x512: 13ms (was 59ms, now 2.3x faster than rioxarray) 1024x1024: 48ms (was 293ms, now 2.6x faster than rioxarray) 2048x2048: 344ms (was 2520ms, now 1.6x faster than rioxarray) * Replace coordinate-only benchmarks with end-to-end reproject/merge tables (#1045) README now shows full pipeline times (transform + resampling) and merge times, both compared against rioxarray. More useful than the previous coordinate-transform-only table since users care about total wall time. * Dask+CuPy reproject: single-pass GPU instead of per-chunk (#1045) For dask+cupy inputs, eagerly compute the source to GPU memory and run the in-memory CuPy reproject in a single pass. This avoids the per-chunk overhead of 64+ dask.delayed calls, each creating a pyproj Transformer and launching small CUDA kernels. Before: 958ms (64 delayed chunks, 512x512 each) After: 43ms (single CuPy pass, pixel-exact same output) Speedup: 22x The output is a plain CuPy array. For truly out-of-core GPU data that doesn't fit in GPU memory, the old dask.delayed path remains available by passing the data as dask+numpy. * Chunked dask+cupy reproject without full-source eager compute (#1045) Replaces the eager .compute() approach with a chunked GPU pipeline that fetches only the needed source window per output chunk. This handles sources larger than GPU memory while still being 8-20x faster than the old dask.delayed path. The key optimizations vs dask.delayed: - CRS objects and transformer created once (not per chunk) - CUDA projection + native CUDA resampling per chunk - Default 2048x2048 GPU chunks (not 512x512) - Sequential loop avoids dask scheduler overhead Performance (4096x4096 WGS84 -> UTM, bilinear): CuPy single pass: 34ms Dask+CuPy (2048): 49ms (was 958ms) Dask+CuPy (512): 71ms Dask+CuPy (256): 124ms All chunk sizes are pixel-exact vs plain CuPy (max_err < 1e-11). * Add NADCON grid-based datum shift for sub-meter NAD27 accuracy (#1045) Vendored two NOAA shift grids into the package (306KB total): - us_noaa_conus.tif: NADCON classic (121x273, 0.25° resolution) - us_noaa_nadcon5_nad27_nad83_1986_conus.tif: NADCON5 (105x237) The grid loader checks the vendored directory first, then a user cache, then downloads from the PROJ CDN as a last resort. Numba JIT bilinear interpolation applies the lat/lon arc-second offsets per pixel, with an iterative inverse for target->source direction. When a grid covers the data, it replaces the Helmert shift (which had ~3-5m accuracy). The grid gives sub-meter accuracy matching PROJ with NADCON grids installed. Points outside grid coverage fall back to Helmert automatically. * Vendor 14 datum shift grids for worldwide sub-meter accuracy (#1045) Shipped 8.4MB of NOAA/NTv2 shift grids covering: US: NAD27 CONUS (NADCON + NADCON5), Alaska, Hawaii, Puerto Rico UK: OSGB36 -> ETRS89 (Ordnance Survey OSTN15) Australia: AGD66 -> GDA94 (NT region) Europe: Germany (DHDN), Austria (MGI), Spain (ED50, eastern coast), Netherlands (RD), Belgium (BD72), Switzerland (CH1903), Portugal (D73) Added Helmert fallback parameters for all 14 datums plus Tokyo. Grid lookup automatically selects the best grid covering a point, falling back to Helmert for regions without grid coverage. All grids are Public Domain or Open Government Licence. * Add vertical datum support with vendored EGM96 geoid (#1045) New public API for ellipsoidal <-> orthometric height conversion: geoid_height(lon, lat) # geoid undulation N (metres) geoid_height_raster(da) # N for every pixel ellipsoidal_to_orthometric(h, ...) # GPS height -> map height orthometric_to_ellipsoidal(H, ...) # map height -> GPS height depth_to_ellipsoidal(depth, ...) # chart datum depth -> ellipsoidal ellipsoidal_to_depth(h, ...) # ellipsoidal -> chart datum depth Vendored EGM96 global geoid model (2.6MB, 15-arcmin / ~28km grid, 721x1440 pixels). EGM2008 (77MB, 2.5-arcmin) available via CDN download on first use. Numba JIT bilinear interpolation with longitude wrapping for the global grid. Performance: 80 Mpix/s on CPU (1M points in 12ms). * Add time-dependent ITRF frame transforms (#1045) 14-parameter Helmert (7 static + 7 rates) for converting between ITRF frames at a given observation epoch. Parameters parsed from the PROJ data files shipped with pyproj. Available frames: ITRF2000, ITRF2008, ITRF2014, ITRF2020 (and all intermediate frames back to ITRF89). Usage: lon2, lat2, h2 = itrf_transform( -74.0, 40.7, 10.0, src='ITRF2014', tgt='ITRF2020', epoch=2024.0, ) Typical shifts: 2-4mm horizontal, 1-3mm vertical between ITRF2014 and ITRF2020 at epoch 2024. The rates capture tectonic plate motion (~mm/year) which accumulates over years. Numba JIT parallelized for batch transforms. * 7-parameter Helmert and 6-term authalic latitude series (#1045) Helmert upgrade (3-param -> 7-param Bursa-Wolf): - Added rx/ry/rz rotations (arcsec) and ds scale (ppm) to the geocentric datum shift kernel - Updated OSGB36, DHDN, MGI, ED50, BD72 with published 7-param values from the EPSG dataset - OSGB36 Helmert fallback improved from 15.73m to 1.44m vs pyproj - Same kernel handles 3-param datums (rx=ry=rz=ds=0) Authalic latitude series (3-term -> 6-term): - Extended _authalic_apa to 6 coefficients (10th-order in e²) - Updated _authalic_inv and CUDA _d_authalic_inv to evaluate 5 terms - Theoretical improvement: sub-mm for the series itself, though the q->beta->phi roundtrip error is dominated by the asin(q/qp) step at high latitudes (~4.8m at 89°, <0.1m at mid-latitudes) * Add oblique stereographic and oblique Mercator kernels (disabled) (#1045) Numba kernels for two additional projections: - Oblique Stereographic: Gauss conformal sphere + stereographic. Kernel roundtrips perfectly but forward differs from PROJ's specific Gauss-Schreiber conformal mapping by ~50km. Needs alignment with PROJ's sterea.cpp double-projection approach. - Oblique Mercator (Hotine): rotation matrix + Mercator on the oblique cylinder. Forward has errors in the u/v -> x/y rotation. Needs closer alignment with PROJ's omerc.cpp variant handling. Both kernels are implemented and compile but disabled in the dispatch table pending accuracy fixes. They fall through to pyproj. Also: Equidistant Conic skipped -- has zero EPSG definitions in the PROJ database, essentially unused in practice. * Fix oblique stereographic with Gauss conformal sphere (#1045) The oblique stereographic now uses the correct PROJ double-projection: 1. Gauss conformal mapping: phi -> chi via scaling factor C = sqrt(1 + e²cos⁴φ₀/(1-e²)) and normalization constant K 2. Standard stereographic on the conformal sphere Forward accuracy: sub-nanometre vs pyproj. Roundtrip: exact (1.4e-14 degrees). Also fixed R scaling: R_metric = a * k0 * R_conformal, where R_conformal is the dimensionless conformal radius from PROJ. Oblique Mercator (Hotine) remains disabled -- the u/v rotation and variant handling need more work. * 2D Numba kernels for LCC/tmerc: eliminate tile/repeat + fuse unit conv (#1045) New lcc_inverse_2d and tmerc_inverse_2d kernels take 1D x/y arrays and write directly to 2D output, avoiding: - np.tile (199ms for 4096x4096) - np.repeat (357ms for 4096x4096) - Separate unit division pass (115ms for ftUS) The unit conversion (feet -> metres) is fused into the inner loop, operating on scalars instead of 16.8M-element arrays. California zone 5 ftUS (4096x4096, bilinear): Before: 915ms (0.9x vs rioxarray) After: 712ms (1.2x vs rioxarray) * Fix longitude wrapping in all projection inverses (#1045) Added _norm_lon_rad() and applied it to all inverse functions that compute lam + lon0. Without normalization, projections with non-zero lon0 (e.g. EPSG:3413 Arctic Stere with lon0=-45°) could return longitudes outside [-180, 180], causing 360° errors and false NaN pixels in the source lookup. Polar Stere N (EPSG:3413) consistency: Before: RMSE=8.29, NaN agree=90.4%, 1.1M extra NaN After: RMSE=0.008, NaN agree=100%, 79 extra NaN (edge pixels) * Relax resampling boundary check to match GDAL behavior (#1045) Changed out-of-bounds threshold from -0.5/sh-0.5 to -1.0/sh in all resampling kernels (nearest, bilinear, cubic, and CUDA). Pixels within one pixel of the source edge are now clamped to the nearest valid pixel instead of being set to nodata. This matches GDAL/rasterio's boundary convention, fixing 2568 false-NaN pixels at the edges of LAEA Europe reprojections. LAEA NaN agreement: 99.8% -> 100.0% All other projections: unchanged or improved to 100.0% * Match GDAL's bilinear weight renormalization (#1045) Changed bilinear resampling (CPU Numba + CUDA) from clamp-and-use-all to skip-and-renormalize, matching GDAL's GWKBilinearResample4Sample: - Out-of-bounds neighbors: skipped, weight redistributed to valid ones (was: clamped to edge pixel, counted at full weight) - NaN neighbors: skipped, interpolated from remaining valid pixels (was: any NaN contaminates the whole output pixel) This eliminates the one-pixel NaN halo around nodata regions that the old approach produced, and gives mathematically exact results on linear surfaces at raster boundaries. The ~0.017 RMSE vs rioxarray on gradient rasters is unchanged -- it comes from sub-pixel floating-point coordinate differences in the interior, not edge handling. * Harden reproject: thread safety, NaN crash, uint8 overflow (#1045) Thread safety: - Added threading.Lock to global caches in _datum_grids.py, _vertical.py, and _itrf.py. Concurrent callers no longer race on grid loading or ITRF parameter parsing. All-NaN raster crash: - np.nanmin on an all-NaN array returns NaN; int(NaN) is undefined. Added np.isfinite guards in both numpy and cupy chunk workers. uint8 cubic overflow: - Cubic resampling can ring outside [0, 255] on sharp edges. Added np.clip clamping before the dtype cast for all integer source types (uint8, int16, etc.) across nearest/bilinear/cubic. Geoid poles: - _interp_geoid_point now returns NaN (not 0.0) outside the grid's latitude range, preventing silent wrong values at the poles. Exception narrowing: - Bare except Exception:pass blocks around Numba/CUDA fast paths narrowed to except (ImportError, ModuleNotFoundError). Actual bugs now propagate instead of silently falling back to pyproj. New tests: - test_reproject_1x1_raster: single-pixel source - test_reproject_all_nan: 100% NaN source - test_reproject_uint8_cubic_no_overflow: cubic on uint8 sharp edge * Fix cubic NaN, add merge tests, validate grids, improve docs (#1045) Cubic NaN handling: - When any of the 16 Catmull-Rom neighbors is NaN, falls back to bilinear with weight renormalization instead of returning nodata. Eliminates the one-pixel nodata halo around NaN regions that cubic resampling previously produced. Merge strategy tests: - Added end-to-end tests for last, max, min strategies (were only tested at the internal _merge_arrays_numpy level). Datum grid validation: - load_grid() now validates band shapes match, grid is >= 2x2, and bounds are sensible. Invalid grids return None (Helmert fallback) instead of producing garbage. Documentation: - reproject() and merge() docstrings now note output CRS is WKT format in attrs['crs'], and merge() documents CRS selection when target_crs=None. * Integrate vertical datum shift into reproject() (#1045) New parameters src_vertical_crs and tgt_vertical_crs on reproject() enable automatic vertical datum transformation during reprojection: reproject(dem, 'EPSG:32633', src_vertical_crs='EGM96', # input is MSL heights tgt_vertical_crs='ellipsoidal') # want GPS heights Supported vertical CRS: 'EGM96', 'EGM2008', 'ellipsoidal'. Implementation: - After horizontal reprojection, output pixel coordinates are inverse-projected to geographic (lon/lat) for the geoid lookup - Geoid undulation N is interpolated from the vendored EGM96 grid - Heights are adjusted: h = H + N (ortho->ell) or H = h - N (ell->ortho) - Processes in 128-row strips to bound memory (12MB peak vs 768MB for a 4096x4096 raster) - Zero roundtrip error (ortho -> ell -> ortho recovers exact input) Also handles cross-geoid transforms (EGM96 -> EGM2008) by applying both shifts: H2 = H1 + N1 - N2. * Skip pyproj Transformer in chunk worker when Numba handles transform (#1045) For supported CRS pairs, the chunk worker now tries the Numba fast path BEFORE creating a pyproj.Transformer. If Numba succeeds (which it does for all 10+ supported projections), the Transformer is never created, saving ~15ms of pyproj setup per chunk. Before: 2 Transformer.from_crs calls per reproject (grid + chunk) After: 1 call (grid only, ~500 points for boundary estimation) The grid computation still uses pyproj for boundary transforms since it's only ~500 points and runs once. The per-pixel transform (millions of points) is handled entirely by Numba. * Button up reproject: README, benchmarks, write_geotiff WKT fix (#1045) README Reproject section updated: - All 11 projection families listed (added oblique stereographic) - Full pipeline benchmark table (read+reproject+write, all backends) - Datum shift coverage (14 grids, 10 Helmert fallbacks) - Vertical datum support (EGM96/EGM2008, integrated into reproject) - ITRF time-dependent frame transforms - pyproj usage documented (metadata only, Numba does the math) - Merge performance table updated benchmarks/reproject_benchmark.md: - 254-line comprehensive benchmark document with 6 sections - Full pipeline: NumPy 2.7s, CuPy 348ms, rioxarray 418ms - 13 projections tested for accuracy vs pyproj - Datum, vertical, ITRF coverage documented - All numbers from actual benchmark runs write_geotiff WKT fix: - reproject() stores CRS as WKT string in attrs['crs'] - write_geotiff assumed integer EPSG code, crashed with TypeError - Added isinstance check to parse WKT via _wkt_to_epsg() * Update benchmark with warm-kernel numbers (#1045) Separated reproject-only from full-pipeline timing. With warm Numba/CUDA kernels: - CuPy reproject: 73ms (2.0x faster than rioxarray) - rioxarray reproject: 144ms - NumPy reproject: 413ms Full pipeline (read+reproject+write) is dominated by I/O for compressed GeoTIFFs, where rioxarray's C-level rasterio beats our Python/Numba reader. Added note about ~4.5s JIT warmup on first call. * Parallel tile compression + ZSTD default: 13x faster writes (#1045) Three optimizations to the GeoTIFF writer: 1. Default compression changed from deflate to ZSTD: Same file size (40MB), 6x faster single-threaded compression. ZSTD is the modern standard; deflate still available via parameter. 2. Parallel tile compression via ThreadPoolExecutor: Tiles are independent, and zlib/zstd/LZW all release the GIL. Uses os.cpu_count() threads. Falls back to sequential for uncompressed or very few tiles (< 4). 3. Optimized uncompressed path: Pre-allocates contiguous buffer for all tiles. Combined results (3600x3600 float32): Write with new default (zstd parallel): 101ms (was 1388ms deflate sequential) Write deflate (parallel): 155ms (was 1388ms) vs rasterio: zstd 2.0x faster, deflate 3.0x faster Full pipeline (read + reproject + write): NumPy: 890ms (was 2907ms) Also fixed write_geotiff crash when attrs['crs'] contains a WKT string (produced by reproject()) -- added isinstance check to parse WKT via _wkt_to_epsg(). * Parallel tile decompression in GeoTIFF reader (#1045) Tile decompression (deflate, LZW, ZSTD) now runs in parallel using ThreadPoolExecutor, same approach as the writer. zlib, zstandard, and Numba LZW all release the GIL. Read performance (Copernicus 3600x3600 deflate): Before: 291ms (sequential) After: 101ms (parallel) -- 2.9x faster rasterio: 189ms -- we're now 1.9x FASTER than rasterio Full pipeline improvement (read + reproject + write): NumPy: 2907ms -> 697ms (4.2x faster total) * Support multi-band (RGB/RGBA) raster reprojection (#1045) Multi-band rasters (y, x, band) now reproject correctly: - Each band is reprojected independently using shared coordinates (coordinate transform computed once, reused for all bands) - Output preserves the band dimension name and coordinates - Works with any dtype (float32, uint8 with clamping, etc.) - Custom band dim names (e.g. 'channel') preserved Also fixed spatial dimension detection to use name-based lookup (_find_spatial_dims) instead of hardcoded dims[-2]/dims[-1], which failed for 3D rasters where the band dim was last. Previously crashed with TypingError on 3D input. * Add 14 edge case tests for reproject (#1045) New TestEdgeCases class covering: - Multi-band RGB/RGBA reprojection (float32, uint8) - Antimeridian crossing - Y-ascending coordinate order - Checkerboard NaN pattern (bilinear renormalization) - UTM -> geographic (reverse projection direction) - Projected -> projected (LCC -> UTM) - Sentinel nodata (-9999) - Integer EPSG as target CRS - Explicit resolution and width/height parameters - Merge with non-overlapping tiles - Merge with single tile * Prevent OOM on large datasets in reproject and merge (#1045) Three safeguards for datasets that exceed available RAM: 1. Auto-chunk large non-dask inputs (reproject): If the source array exceeds 512MB, automatically wrap it in dask.array with the configured chunk_size (default 512x512). This routes it through the chunked dask path instead of the in-memory path that would call .values and OOM. 2. Auto-promote merge to dask path: If the combined output size (output_shape * n_tiles * 8 bytes) exceeds 512MB, use the dask merge path even if no input is dask. This prevents _merge_inmemory from calling .values on each tile. 3. Cap source window size in chunk workers: If a single output chunk maps to a source window larger than 64 Mpixels (~512MB for float64), return nodata instead of materializing the window. This prevents extreme projections (e.g. polar stereographic edge pixels mapping to the entire source hemisphere) from OOMing individual chunk workers. A 30TB dataset with 16GB RAM would now: - Auto-chunk into 512x512 tiles - Process each tile independently (~2MB working memory per tile) - Never materialize more than 512MB in a single operation * Streaming reproject for datasets that exceed dask graph limits (#1045) For a 30TB raster at 2048x2048 chunks, dask's task graph would be 1.9GB -- larger than many machines' RAM. The streaming path bypasses dask entirely and processes output tiles in a sequential loop: for each output tile: compute source coordinates (Numba) read source window (lazy slice, no full materialization) resample write tile to output array free tile Memory usage: O(tile_size^2) per tile, ~16MB at 2048x2048. No graph overhead. No scheduler overhead. The routing logic: - Source < 512MB: in-memory (fastest) - Source > 512MB, graph < 1GB: auto-chunk to dask (parallel) - Source > 512MB, graph > 1GB: streaming (bounded memory) The streaming path produces results identical to the in-memory path (max error ~5e-13, floating-point noise only). * Add max_memory parameter for parallel streaming reproject (#1045) The streaming path (for datasets > ~1TB) now uses ThreadPoolExecutor with bounded concurrency based on the max_memory budget: reproject(huge_raster, 'EPSG:3857', max_memory='4GB') The budget controls how many output tiles can be processed in parallel. Numba kernels release the GIL, so threads give real parallelism. Memory stays bounded: max_memory='4GB', tile=2048x2048: ~42 concurrent tiles max_memory='16GB', tile=2048x2048: ~170 concurrent tiles Accepts int (bytes) or strings: '512MB', '4GB', '1TB'. Default is 1GB when not specified. On a 512x512 test with 256x256 tiles: Sequential (32MB budget): 233ms Parallel (4GB budget): 24ms -- 10x faster, identical output * Distributed streaming via dask.bag for multi-worker clusters (#1045) When dask.distributed is active, the streaming path uses dask.bag to distribute tile batches across workers instead of processing everything in one process: Local (no cluster): ThreadPoolExecutor within one process, max_memory bounded Distributed (dask cluster active): 1. Partition 2M tiles into N batches (one per worker) 2. dask.bag.from_sequence(batches, npartitions=N) 3. bag.map(process_batch) -- each worker gets its batch 4. Within each worker, ThreadPoolExecutor for intra-worker parallelism (Numba releases GIL) 5. Assemble results Graph size comparison for 30TB: Old dask.array approach: 1,968,409 nodes (1.9GB graph, OOM) New dask.bag approach: 4-64 nodes (one per worker) Each worker's memory bounded by max_memory parameter. Auto-detects distributed via get_client(). * Rename read_geotiff/write_geotiff to open_geotiff/to_geotiff (#1045) Aligns with the base branch rename (PR #1056) to match xarray conventions (open_* for readers, to_* for writers). Updated in: - xrspatial/geotiff/__init__.py (function defs, __all__) - xrspatial/reproject/_datum_grids.py, _vertical.py (grid loading) - README.md (all code examples and feature matrix) - benchmarks/reproject_benchmark.md - All geotiff test files (test_cog.py, test_edge_cases.py, test_features.py, bench_vs_rioxarray.py) All 343 tests pass (271 geotiff + 72 reproject).
1 parent 66bb549 commit 8fca78a

30 files changed

+6935
-235
lines changed

README.md

Lines changed: 73 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -79,13 +79,15 @@
7979

8080
:fast_forward: Scalable with [Dask](http://dask.pydata.org)
8181

82+
:desktop_computer: GPU-accelerated with [CuPy](https://cupy.dev/) and [Numba CUDA](https://numba.readthedocs.io/en/stable/cuda/index.html)
83+
8284
:confetti_ball: Free of GDAL / GEOS Dependencies
8385

8486
:earth_africa: General-Purpose Spatial Processing, Geared Towards GIS Professionals
8587

8688
-------
8789

88-
Xarray-Spatial implements common raster analysis functions using Numba and provides an easy-to-install, easy-to-extend codebase for raster analysis.
90+
Xarray-Spatial is a Python library for raster analysis built on xarray. It has 100+ functions for surface analysis, hydrology (D8, D-infinity, MFD), fire behavior, flood modeling, multispectral indices, proximity, classification, pathfinding, and interpolation. Functions dispatch automatically across four backends (NumPy, Dask, CuPy, Dask+CuPy). A built-in GeoTIFF/COG reader and writer handles raster I/O without GDAL.
8991

9092
### Installation
9193
```bash
@@ -119,9 +121,9 @@ In all the above, the command will download and store the files into your curren
119121

120122
`xarray-spatial` grew out of the [Datashader project](https://datashader.org/), which provides fast rasterization of vector data (points, lines, polygons, meshes, and rasters) for use with xarray-spatial.
121123

122-
`xarray-spatial` does not depend on GDAL / GEOS, which makes it fully extensible in Python but does limit the breadth of operations that can be covered. xarray-spatial is meant to include the core raster-analysis functions needed for GIS developers / analysts, implemented independently of the non-Python geo stack.
124+
`xarray-spatial` does not depend on GDAL or GEOS. Raster I/O, reprojection, compression codecs, and coordinate handling are all pure Python and Numba -- no C/C++ bindings anywhere in the stack.
123125

124-
Our documentation is still under construction, but [docs can be found here](https://xarray-spatial.readthedocs.io/en/latest/).
126+
[API reference docs](https://xarray-spatial.readthedocs.io/en/latest/) and [33+ user guide notebooks](examples/user_guide/) cover every module.
125127

126128
#### Raster-huh?
127129

@@ -216,9 +218,63 @@ ds.xrs.open_geotiff('large_dem.tif') # read windowed to Dataset
216218

217219
| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
218220
|:----------:|:------------|:------:|:----------------------:|:--------------------:|:-------------------:|:------:|
219-
| [Reproject](xrspatial/reproject/__init__.py) | Reprojects a raster to a new CRS using an approximate transform and numba JIT resampling | Standard (inverse mapping) | ✅️ | ✅️ | ✅️ | ✅️ |
221+
| [Reproject](xrspatial/reproject/__init__.py) | Reprojects a raster to a new CRS with Numba JIT / CUDA coordinate transforms and resampling. Supports vertical datums (EGM96, EGM2008) and horizontal datum shifts (NAD27, OSGB36, etc.) | Standard (inverse mapping) | ✅️ | ✅️ | ✅️ | ✅️ |
220222
| [Merge](xrspatial/reproject/__init__.py) | Merges multiple rasters into a single mosaic with configurable overlap strategy | Standard (mosaic) | ✅️ | ✅️ | 🔄 | 🔄 |
221223

224+
Built-in Numba JIT and CUDA projection kernels bypass pyproj for per-pixel coordinate transforms. pyproj is used only for CRS metadata parsing (~1ms, once per call) and output grid boundary estimation (~500 control points, once per call). Any CRS pair without a built-in kernel falls back to pyproj automatically.
225+
226+
| Projection | EPSG examples | CPU Numba | CUDA GPU |
227+
|:-----------|:-------------|:---------:|:--------:|
228+
| Web Mercator | 3857 | ✅️ | ✅️ |
229+
| UTM / Transverse Mercator | 326xx, 327xx, State Plane | ✅️ | ✅️ |
230+
| Ellipsoidal Mercator | 3395 | ✅️ | ✅️ |
231+
| Lambert Conformal Conic | 2154, 2229, State Plane | ✅️ | ✅️ |
232+
| Albers Equal Area | 5070 | ✅️ | ✅️ |
233+
| Cylindrical Equal Area | 6933 | ✅️ | ✅️ |
234+
| Sinusoidal | MODIS grids | ✅️ | ✅️ |
235+
| Lambert Azimuthal Equal Area | 3035, 6931, 6932 | ✅️ | ✅️ |
236+
| Polar Stereographic | 3031, 3413, 3996 | ✅️ | ✅️ |
237+
| Oblique Stereographic | custom WGS84 | ✅️ | pyproj fallback |
238+
| Oblique Mercator (Hotine) | 3375 (RSO) | implemented, disabled | pyproj fallback |
239+
240+
**Vertical datum support:** `geoid_height`, `ellipsoidal_to_orthometric`, `orthometric_to_ellipsoidal` convert between ellipsoidal (GPS) and orthometric (map/MSL) heights using EGM96 (vendored, 2.6MB) or EGM2008 (77MB, downloaded on first use). Reproject can apply vertical shifts during reprojection via the `vertical_crs` parameter.
241+
242+
**Datum shift support:** Reprojection from non-WGS84 datums (NAD27, OSGB36, DHDN, MGI, ED50, BD72, CH1903, D73, AGD66, Tokyo) applies grid-based shifts from PROJ CDN (sub-metre accuracy) with 7-parameter Helmert fallback (1-5m accuracy). 14 grids are registered covering North America, UK, Germany, Austria, Spain, Netherlands, Belgium, Switzerland, Portugal, and Australia.
243+
244+
**ITRF frame support:** `itrf_transform` converts between ITRF2000, ITRF2008, ITRF2014, and ITRF2020 using 14-parameter time-dependent Helmert transforms from PROJ data files. Shifts are mm-level.
245+
246+
**Reproject performance** (reproject-only, 1024x1024, bilinear, vs rioxarray):
247+
248+
| Transform | xrspatial | rioxarray |
249+
|:---|---:|---:|
250+
| WGS84 -> Web Mercator | 23ms | 14ms |
251+
| WGS84 -> UTM 33N | 24ms | 18ms |
252+
| WGS84 -> Albers CONUS | 41ms | 33ms |
253+
| WGS84 -> LAEA Europe | 57ms | 17ms |
254+
| WGS84 -> Polar Stere S | 44ms | 38ms |
255+
| WGS84 -> LCC France | 44ms | 25ms |
256+
| WGS84 -> Ellipsoidal Merc | 27ms | 14ms |
257+
| WGS84 -> CEA EASE-Grid | 24ms | 15ms |
258+
259+
**Full pipeline** (read 3600x3600 Copernicus DEM + reproject to EPSG:3857 + write GeoTIFF):
260+
261+
| Backend | Time |
262+
|:---|---:|
263+
| NumPy | 2.7s |
264+
| CuPy GPU | 348ms |
265+
| Dask+CuPy GPU | 343ms |
266+
| rioxarray (GDAL) | 418ms |
267+
268+
**Merge performance** (4 overlapping same-CRS tiles, vs rioxarray):
269+
270+
| Tile size | xrspatial | rioxarray | Speedup |
271+
|:---|---:|---:|---:|
272+
| 512x512 | 16ms | 29ms | **1.8x** |
273+
| 1024x1024 | 52ms | 76ms | **1.5x** |
274+
| 2048x2048 | 361ms | 280ms | 0.8x |
275+
276+
Same-CRS tiles skip reprojection entirely and are placed by direct coordinate alignment.
277+
222278
-------
223279

224280
### **Utilities**
@@ -466,29 +522,29 @@ ds.xrs.open_geotiff('large_dem.tif') # read windowed to Dataset
466522
Importing `xrspatial` registers an `.xrs` accessor on DataArrays and Datasets, giving you tab-completable access to every spatial operation:
467523

468524
```python
469-
import xrspatial
470-
from xrspatial.geotiff import open_geotiff
525+
import xrspatial as xrs
526+
from xrspatial.geotiff import open_geotiff, to_geotiff
471527

472528
# Read a GeoTIFF (no GDAL required)
473529
elevation = open_geotiff('dem.tif')
474530

475-
# Surface analysis — call operations directly on the DataArray
531+
# Surface analysis
476532
slope = elevation.xrs.slope()
477533
hillshaded = elevation.xrs.hillshade(azimuth=315, angle_altitude=45)
478534
aspect = elevation.xrs.aspect()
479535

536+
# Reproject and write as a Cloud Optimized GeoTIFF
537+
dem_wgs84 = elevation.xrs.reproject(target_crs='EPSG:4326')
538+
to_geotiff(dem_wgs84, 'output.tif', cog=True)
539+
480540
# Classification
481541
classes = elevation.xrs.equal_interval(k=5)
482542
breaks = elevation.xrs.natural_breaks(k=10)
483543

484544
# Proximity
485545
distance = elevation.xrs.proximity(target_values=[1])
486546

487-
# Multispectral — call on the NIR band, pass other bands as arguments
488-
nir = xr.DataArray(np.random.rand(100, 100), dims=['y', 'x'])
489-
red = xr.DataArray(np.random.rand(100, 100), dims=['y', 'x'])
490-
blue = xr.DataArray(np.random.rand(100, 100), dims=['y', 'x'])
491-
547+
# Multispectral
492548
vegetation = nir.xrs.ndvi(red)
493549
enhanced_vi = nir.xrs.evi(red, blue)
494550
```
@@ -509,14 +565,14 @@ ndvi_result = ds.xrs.ndvi(nir='band_5', red='band_4')
509565

510566
##### Function Import Style
511567

512-
All operations are also available as standalone functions if you prefer explicit imports:
568+
All operations are also available as standalone functions:
513569

514570
```python
515-
from xrspatial import hillshade, slope, ndvi
571+
import xrspatial as xrs
516572

517-
hillshaded = hillshade(elevation)
518-
slope_result = slope(elevation)
519-
vegetation = ndvi(nir, red)
573+
hillshaded = xrs.hillshade(elevation)
574+
slope_result = xrs.slope(elevation)
575+
vegetation = xrs.ndvi(nir, red)
520576
```
521577

522578
Check out the user guide [here](/examples/user_guide/).

0 commit comments

Comments
 (0)