Commit c70cae4: Add lightweight GeoTIFF/COG reader and writer (#1035)
* Add lightweight GeoTIFF/COG reader and writer
Reads and writes GeoTIFF and Cloud Optimized GeoTIFF files using only
numpy, numba, xarray, and the standard library. No GDAL required.
What it does:
- Parses TIFF/BigTIFF headers and IFDs via struct
- Reads GeoTIFF tags: CRS (EPSG), affine transforms, GeoKeys,
PixelIsArea vs PixelIsPoint
- Deflate (zlib), LZW (Numba JIT), horizontal and floating-point
predictor codecs
- Strip and tiled layouts, windowed reads
- COG writer with overview generation and IFD-first layout
- HTTP range-request reader for remote COGs
- mmap for local file access (zero-copy)
- Nodata sentinel masking to NaN on read
- Metadata round-trip: CRS, transform, nodata, raster type, pixels
Read performance vs rioxarray/GDAL:
- 5-7x faster on uncompressed data
- On par for compressed COGs
- 100% pixel-exact across all tested formats
Write performance:
- On par for uncompressed
- 2-4x slower for compressed (zlib is C-native in GDAL)
Tested against Landsat 8, Copernicus DEM, USGS 1-arc-second, and
USGS 1-meter DEMs. 154 tests cover codecs, header parsing, geo
metadata, round-trips across 8 dtypes x 3 compressions, edge cases
(corrupt files, NaN/Inf, extreme shapes, PixelIsPoint), and the
public API.
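The header-parsing step described above (TIFF/BigTIFF via struct) can be sketched in a few lines. This is an illustrative sketch, not the library's actual function: field layout follows the TIFF 6.0 and BigTIFF specifications, and `parse_tiff_header` is a hypothetical name.

```python
import struct

def parse_tiff_header(buf: bytes):
    """Return (byte_order, is_bigtiff, first_ifd_offset) for a TIFF buffer."""
    byte_order = {b'II': '<', b'MM': '>'}[buf[:2]]     # little vs big endian
    magic, = struct.unpack(byte_order + 'H', buf[2:4])
    if magic == 42:                                    # classic TIFF
        first_ifd, = struct.unpack(byte_order + 'I', buf[4:8])
        return byte_order, False, first_ifd
    if magic == 43:                                    # BigTIFF: 8-byte offsets
        first_ifd, = struct.unpack(byte_order + 'Q', buf[8:16])
        return byte_order, True, first_ifd
    raise ValueError('not a TIFF file')
```

The first IFD offset then seeds the tag walk that extracts geo metadata.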
* Add multi-band write, integer nodata, PackBits, dask reads, BigTIFF write
Six features filling the main gaps for real-world use:
1. Multi-band write: 3D arrays (height, width, bands) now write as
multi-band GeoTIFFs with correct BitsPerSample, SampleFormat, and
PhotometricInterpretation (RGB for 3+ bands). Overviews work for
multi-band too. read_geotiff returns all bands by default (band=None)
with a 'band' dimension.
2. Integer nodata masking: uint8/uint16/int16 arrays with nodata values
are promoted to float64 and masked with NaN on read, matching
rioxarray behavior. Previously only float arrays were masked.
3. PackBits compression (tag 32773): simple RLE codec, both read and
write. Common in older TIFF files.
4. JPEG decompression (tag 7): read support via Pillow for
JPEG-compressed tiles/strips. Import is optional and lazy.
5. BigTIFF write: auto-detects when output exceeds ~4GB and switches
to BigTIFF format (16-byte header, 20-byte IFD entries, 8-byte
offsets). Prevents silent offset overflow corruption on large files.
6. Dask lazy reads: read_geotiff_dask() returns a dask-backed
DataArray using windowed reads per chunk. Works for single-band
and multi-band files with nodata masking per chunk.
178 tests passing.
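The PackBits scheme from feature 3 is simple enough to sketch in full. This is a minimal reference version, assuming nothing about the shipped codec's buffering; the RLE rules themselves are fixed by the TIFF 6.0 spec (header byte n: 0..127 copies n+1 literal bytes, -1..-127 repeats the next byte 1-n times, -128 is a no-op).

```python
def packbits_decode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        n = data[i] - 256 if data[i] > 127 else data[i]  # signed header byte
        i += 1
        if n >= 0:                       # literal run of n+1 bytes
            out += data[i:i + n + 1]
            i += n + 1
        elif n != -128:                  # repeat next byte (1 - n) times
            out += data[i:i + 1] * (1 - n)
            i += 1
    return bytes(out)

def packbits_encode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 128 and data[i + run] == data[i]:
            run += 1
        if run > 1:                      # replicate run: header 257 - run
            out += bytes([257 - run, data[i]])
            i += run
        else:                            # gather a literal run (max 128 bytes)
            j = i + 1
            while j < len(data) and (j + 1 >= len(data) or data[j] != data[j + 1]):
                j += 1
                if j - i == 128:
                    break
            out += bytes([j - i - 1]) + data[i:j]
            i = j
    return bytes(out)
```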
* Skip unneeded strips in windowed reads
Strip-based windowed reads now only decompress strips that overlap the
requested row range. Previously, all strips were decompressed into a
full image buffer and then sliced.
For a 4096x512 deflate file with 256-row strips, reading a 10x10
window from the top-left goes from 31 ms to 1.9 ms (16x). On a
100,000-row file the savings scale linearly with the number of
strips skipped.
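The strip-selection math behind this optimization can be sketched as follows; the function name and signature are illustrative, not the reader's internal API.

```python
def overlapping_strips(row_start, row_stop, rows_per_strip, image_height):
    """Indices of strips intersecting rows [row_start, row_stop)."""
    n_strips = -(-image_height // rows_per_strip)          # ceiling division
    first = row_start // rows_per_strip
    last = min((row_stop - 1) // rows_per_strip, n_strips - 1)
    return range(first, last + 1)
```

For the 4096x512 example with 256-row strips, a 10-row window from the top touches only strip 0 of 16, which is where the 16x speedup comes from.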
* Add ZSTD compression support (tag 50000)
Read and write Zstandard-compressed GeoTIFFs using the zstandard
package (lazy import, clear error if missing).
On a 2048x2048 float32 raster, ZSTD vs deflate:
- Write: 39 ms vs 420 ms (10.7x faster)
- Read: 14 ms vs 66 ms (4.7x faster)
- Size: 15.5 MB vs 15.5 MB (comparable)
9 new tests covering codec round-trips, stripped/tiled layouts,
uint16, predictor, multi-band, and the public API.
* Handle planar configuration (separate band planes) on read
TIFF PlanarConfiguration=2 stores each band as a separate set of
strips or tiles (RRR...GGG...BBB) instead of interleaved
(RGBRGB...). The reader now detects this from the IFD and iterates
band-by-band through the strip/tile offset array, placing each
single-band chunk into the correct slice of the output.
Both strip and tile layouts are handled. Windowed reads and single-
band selection work correctly with planar files.
6 new tests: planar strips (RGB, 2-band), planar tiles, windowed
read, band selection, and public API.
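The band-by-band assembly step can be sketched with numpy; this is an illustrative reduction of the idea, assuming each plane arrives already decompressed.

```python
import numpy as np

def assemble_planar(planes, height, width):
    """Interleave PlanarConfiguration=2 band planes (RRR..GGG..BBB -> RGBRGB)."""
    out = np.empty((height, width, len(planes)), dtype=planes[0].dtype)
    for band, plane in enumerate(planes):
        out[:, :, band] = plane.reshape(height, width)   # one slice per band
    return out
```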
* Handle sub-byte bit depths: 1-bit, 2-bit, 4-bit, 12-bit
Adds read support for non-byte-aligned pixel data, common in bilevel
masks (1-bit), palette images (4-bit), and medical/scientific sensors
(12-bit).
Changes:
- _dtypes.py: Map 1/2/4-bit to uint8 and 12-bit to uint16
- _compression.py: Add unpack_bits() for MSB-first bit unpacking
at 1, 2, 4, and 12 bits per sample
- _reader.py: Add _decode_strip_or_tile() helper that handles the
full decompress -> predictor -> unpack -> reshape pipeline,
detecting sub-byte depths automatically. Both strip and tile
readers refactored to use it.
7 new tests: 1-bit bilevel, 1-bit non-byte-aligned width, 4-bit,
4-bit odd width, 12-bit, and direct codec tests.
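The MSB-first unpacking for the byte-aligned cases (1/2/4-bit) can be sketched in pure Python; the shipped unpack_bits() is the vectorized equivalent, and 12-bit (which straddles byte boundaries) is omitted here for brevity.

```python
def unpack_bits(data: bytes, bits: int, count: int):
    """Unpack MSB-first sub-byte samples (bits in {1, 2, 4}) from packed bytes."""
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    out = []
    for byte in data:
        for slot in range(per_byte - 1, -1, -1):   # MSB-first within each byte
            out.append((byte >> (slot * bits)) & mask)
            if len(out) == count:                  # drop row-padding bits
                return out
    return out
```

The `count` cutoff matters for non-byte-aligned widths, where the last byte of each row carries padding bits.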
* Add palette/indexed-color TIFF support with automatic colormap
Reads TIFF files with Photometric=3 (Palette) and ColorMap tag (320).
The TIFF color table (uint16 R/G/B arrays) is converted to a
matplotlib ListedColormap and stored in da.attrs['cmap'].
New plot_geotiff() convenience function uses the embedded colormap
with BoundaryNorm so that integer class indices map to the correct
palette colors when plotted. Works out of the box:
da = read_geotiff('landcover.tif')
plot_geotiff(da) # colors match the TIFF's palette
Also stores raw RGBA tuples in attrs['colormap_rgba'] for custom use.
Supports both 8-bit (256-color) and 4-bit (16-color) palettes.
5 new tests: 8-bit palette read, 4-bit palette, colormap object
verification, plot_geotiff smoke test, and non-palette attr check.
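The color-table conversion is a simple rescale: TIFF stores each channel as uint16 (0..65535), while matplotlib's ListedColormap wants 0..1 floats. A dependency-free sketch of that step (the matplotlib wrapping itself is left out):

```python
def colormap_to_rgba(reds, greens, blues):
    """Convert TIFF ColorMap uint16 channels to 0..1 RGBA tuples."""
    return [(r / 65535.0, g / 65535.0, b / 65535.0, 1.0)
            for r, g, b in zip(reds, greens, blues)]
```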
* Move palette plot to da.xrs.plot() accessor
The .xrs accessor (registered on all DataArrays by xrspatial) now has
a plot() method that checks for an embedded TIFF colormap in attrs.
If present, it applies BoundaryNorm with the ListedColormap so that
integer class indices map to the correct palette colors.
da = read_geotiff('landcover.tif')
da.xrs.plot() # palette colors applied automatically
For non-palette DataArrays, falls through to the standard da.plot().
The old plot_geotiff() function is kept as a thin wrapper.
* Thread-safe reads via reference-counted mmap cache
Multiple threads reading the same file now share a single read-only
mmap instead of each opening their own. A module-level _MmapCache
protected by a threading.Lock manages reference counts per file path.
The mmap is closed when the last reader releases it.
Read-only mmap slicing (which is what the strip/tile readers do) is
thread-safe at the OS level -- no seek or file position involved.
Tested with 16 concurrent threads reading different windows from the
same deflate+tiled file, and a stress test of 400 reads across 8
threads. Zero errors, cache drains properly.
For dask lazy reads, this means all chunk-read tasks for the same
file share one mmap instead of opening/closing the file per chunk.
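The reference-counting pattern described above, reduced to a sketch: one shared resource per key, created on first acquire and closed on last release. The real cache holds mmap objects keyed by file path; here the factory is injected so the pattern stands alone, and the class name is illustrative.

```python
import threading

class RefCountedCache:
    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._entries = {}               # key -> [resource, refcount]

    def acquire(self, key):
        with self._lock:
            if key not in self._entries:
                self._entries[key] = [self._factory(key), 0]
            entry = self._entries[key]
            entry[1] += 1
            return entry[0]

    def release(self, key):
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            if entry[1] == 0:            # last reader closes the resource
                entry[0].close()
                del self._entries[key]
```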
* Atomic writes via temp file + os.replace
Writes now go to a temporary file in the same directory, then
os.replace() atomically swaps it over the target path. This gives:
- No interleaved output when multiple threads write the same path
- Readers never see a half-written file
- No corrupt file left behind if the process crashes mid-write
- Temp file cleaned up on any exception
os.replace is atomic on POSIX (a single rename syscall) and
effectively atomic on Windows (MoveFileEx with replace-existing).
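A minimal sketch of the pattern, assuming a bytes payload (the real writer streams strips/tiles rather than holding the whole file in memory):

```python
import os
import tempfile

def atomic_write(path, data: bytes):
    """Write to a temp file in the target directory, then rename over path."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.replace(tmp, path)            # readers see old file or new, never half
    except BaseException:
        os.unlink(tmp)                   # clean up the temp file on any failure
        raise
```

The temp file must live in the same directory as the target: os.replace cannot be atomic across filesystems.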
* Add overview resampling options: nearest, min, max, median, mode, cubic
_make_overview() now accepts a method parameter instead of hardcoding
2x2 block averaging. Available methods:
- mean (default): nanmean of each 2x2 block, same as before
- nearest: top-left pixel of each block (no interpolation)
- min/max: nanmin/nanmax of each block
- median: nanmedian of each block
- mode: most frequent value per block (for classified rasters)
- cubic: scipy.ndimage.zoom with order=3 (requires scipy)
All methods work on both 2D and 3D (multi-band) arrays. Exposed via
overview_resampling= parameter on write() and write_geotiff().
12 new tests covering each method, NaN handling, multi-band, COG
round-trips with nearest and mode, the public API, and error on
invalid method names.
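The 2x2 block reduction at the core of overview generation can be sketched with a reshape trick; this illustrative version covers three of the methods (the real _make_overview adds the NaN-aware variants, mode, and cubic).

```python
import numpy as np

def downsample_2x2(arr, method='mean'):
    """Reduce a 2D array by 2x in each dimension using 2x2 blocks."""
    h, w = arr.shape[0] // 2 * 2, arr.shape[1] // 2 * 2   # trim odd edges
    blocks = arr[:h, :w].reshape(h // 2, 2, w // 2, 2)
    if method == 'mean':
        return blocks.mean(axis=(1, 3))
    if method == 'max':
        return blocks.max(axis=(1, 3))
    if method == 'nearest':
        return blocks[:, 0, :, 0]        # top-left pixel of each block
    raise ValueError(method)
```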
* Read and write resolution/DPI tags (282, 283, 296)
Adds support for TIFF resolution metadata used in print and
cartographic workflows:
- Tag 282 (XResolution): pixels per unit, stored as RATIONAL
- Tag 283 (YResolution): pixels per unit, stored as RATIONAL
- Tag 296 (ResolutionUnit): 1=none, 2=inch, 3=centimeter
Read: resolution values are stored in DataArray attrs as
x_resolution, y_resolution (float), and resolution_unit (string:
'none', 'inch', or 'centimeter').
Write: accepted as keyword args on write() and write_geotiff(),
or extracted automatically from DataArray attrs. Written as
RATIONAL tags (numerator/denominator pairs).
Also adds RATIONAL type serialization to the writer's tag encoder.
5 new tests: DPI round-trip, centimeter unit, no-resolution check,
DataArray attrs preservation, unit='none'.
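RATIONAL serialization reduces to picking a (numerator, denominator) pair of uint32s for a float. One way to do it, shown here as a sketch (the writer's actual encoder may choose the pair differently), is via the stdlib Fraction:

```python
from fractions import Fraction

def to_rational(value: float):
    """Approximate a float as a (numerator, denominator) pair for a RATIONAL tag."""
    frac = Fraction(value).limit_denominator(10**6)   # keep both parts in uint32 range
    return frac.numerator, frac.denominator
```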
* Expose full GeoKey metadata: CRS names, units, datum, ellipsoid, vertical CRS
GeoInfo and DataArray attrs now include all commonly-used GeoKeys
parsed from the GeoKeyDirectory:
- crs_name: full CRS name (e.g. "NAD83 / UTM zone 18N")
- geog_citation: geographic CRS name (e.g. "WGS 84", "NAD83")
- datum_code: EPSG geodetic datum code
- angular_units / angular_units_code: e.g. "degree" (9102)
- linear_units / linear_units_code: e.g. "metre" (9001)
- semi_major_axis, inv_flattening: ellipsoid parameters
- projection_code: EPSG projection method code
- vertical_crs, vertical_citation, vertical_units: for compound CRS
EPSG unit codes are resolved to human-readable names via lookup
tables (ANGULAR_UNITS, LINEAR_UNITS). Raw geokeys dict is still
available for anything not covered by a named field.
6 new tests covering geographic and projected CRS extraction,
real-file verification, no-CRS baseline, and unit lookups.
* Reuse HTTP connections via urllib3 pool for COG range requests
_HTTPSource now uses a module-level urllib3.PoolManager that reuses
TCP connections (and TLS sessions) across range requests to the same
host. For a COG with 64 tiles, this eliminates 63 TCP handshakes.
On localhost: 1.7x faster for 16 range requests. Over real networks
the benefit is much larger since each TLS handshake adds 50-200ms.
Falls back to stdlib urllib.request if urllib3 is not installed.
The pool is created lazily on first use with retry support
(2 retries, 0.1s backoff).
* Add WKT/PROJ CRS support via pyproj
CRS can now be specified as WKT strings, PROJ strings, or EPSG
integers. pyproj (lazy import) resolves between them:
Read side:
- crs_wkt attr is populated by resolving EPSG -> WKT via pyproj
- Falls back gracefully if pyproj is not installed (EPSG still works)
Write side:
- crs= parameter on write_geotiff accepts int (EPSG), WKT string,
or PROJ string. String inputs are resolved to EPSG via
pyproj.CRS.from_user_input().to_epsg().
- DataArray with crs_wkt attr (no integer crs) is also handled:
the WKT is resolved to EPSG for the GeoKeyDirectory.
This means files with user-defined CRS no longer lose their spatial
reference when round-tripped, as long as pyproj can resolve the
WKT/PROJ string to an EPSG code.
5 new tests: WKT from EPSG, write with WKT string, write with PROJ
string, crs_wkt attr round-trip, and no-CRS baseline.
* Preserve GDALMetadata XML (tag 42112) through read/write
Band names, statistics, and other GDAL-specific metadata stored in
the GDALMetadata XML tag (42112) are now read, exposed, and written
back.
Read: the XML is parsed into a dict stored in attrs['gdal_metadata'].
Dataset-level items use string keys ('DataType'), per-band items use
(name, band_int) tuple keys (('STATISTICS_MAXIMUM', 0)). The raw XML
is also available in attrs['gdal_metadata_xml'].
Write: accepts gdal_metadata_xml on write(), or extracts from
DataArray attrs on write_geotiff(). If attrs has a gdal_metadata
dict but no raw XML, it's re-serialized automatically.
Round-trip verified on the USGS 1-meter DEM which has statistics,
layer type, and data type metadata -- all items survive intact.
7 new tests: XML parsing, dict serialization, file round-trip,
DataArray attrs preservation, no-metadata baseline, real-file read,
and real-file round-trip.
* Preserve arbitrary TIFF tags through read/write round-trip
Any IFD tag that the writer doesn't explicitly manage (Software,
DateTime, ImageDescription, Copyright, custom private tags, etc.)
is now collected on read, stored in attrs['extra_tags'], and
re-emitted on write.
Read: extract_geo_info collects (tag_id, type_id, count, value)
tuples for all tags not in the _MANAGED_TAGS set (structural tags
that the writer builds from scratch: dimensions, compression,
offsets, geo tags, etc.). Stored in attrs['extra_tags'].
Write: extra_tags are appended to the IFD, skipping any tag_id
that was already written to avoid duplicates. The tag values are
serialized using the same type-aware encoder as built-in tags.
Tested with a hand-crafted TIFF containing Software (305) and
DateTime (306) tags. Both survive read -> write -> read intact.
3 new tests: read detection, round-trip preservation, and
no-extra-tags baseline.
* Fix BigTIFF auto-detection and add bigtiff= parameter
The auto-detection now estimates total file size (header + IFDs +
overflow + pixel data) instead of only checking compressed pixel
data size, and compares against UINT32_MAX (4,294,967,295) instead
of a hardcoded 3.9 GB threshold.
Also adds a bigtiff= parameter to write() and write_geotiff():
- bigtiff=None (default): auto-detect based on estimated file size
- bigtiff=True: force BigTIFF even for small files
- bigtiff=False: force classic TIFF (user's responsibility if >4GB)
3 new tests: force BigTIFF via public API, small file stays classic,
force classic via bigtiff=False.
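The decision rule is small enough to sketch; the size model here is deliberately simplified, since the writer also counts IFD entries and overflow areas in its estimate.

```python
UINT32_MAX = 4_294_967_295

def needs_bigtiff(estimated_size: int, bigtiff=None) -> bool:
    """True when the output must use BigTIFF's 8-byte offsets."""
    if bigtiff is not None:              # explicit user override wins
        return bigtiff
    return estimated_size > UINT32_MAX   # classic TIFF offsets are uint32
```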
* Handle big-endian pixel data correctly on read
Big-endian TIFFs (byte order marker 'MM') now byte-swap pixel data
to native order after decompression. Previously, the reader did
.view(dtype) with a native-order dtype, producing garbage values
for multi-byte types (uint16, int32, float32, float64).
Fix: _decode_strip_or_tile uses dtype.newbyteorder(file_byte_order)
for the view, then .astype(native_dtype) if a swap is needed.
Single-byte types (uint8) need no swap. The COG HTTP reader path
has the same fix.
Also fixed the test conftest: make_minimal_tiff(big_endian=True) now
actually writes pixel bytes in big-endian order.
7 new tests: float32, uint16, int32, float64, uint8 (no swap),
windowed read, and public API -- all with big-endian TIFFs.
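The fix reduces to two numpy calls, sketched here with an illustrative wrapper name: view the raw bytes with the file's byte order, then copy to native order only when they differ.

```python
import numpy as np

def decode_pixels(raw: bytes, dtype, file_order: str):
    """View raw pixel bytes with the file's byte order, return native-order data."""
    file_dtype = np.dtype(dtype).newbyteorder(file_order)   # '>' for 'MM' files
    arr = np.frombuffer(raw, dtype=file_dtype)
    return arr.astype(arr.dtype.newbyteorder('='))          # swap iff non-native
```

Without the newbyteorder view, a big-endian uint16 of 1 reads as 256 on little-endian hosts, which is exactly the garbage the bug produced.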
* Add cloud storage support via fsspec (S3, GCS, Azure)
Read and write GeoTIFFs to/from cloud storage using fsspec as the
filesystem abstraction layer. Any URI with a :// scheme (that isn't
http/https) is routed through fsspec, which delegates to the
appropriate backend:
- s3://bucket/key.tif (requires s3fs)
- gs://bucket/key.tif (requires gcsfs)
- az://container/blob.tif (requires adlfs)
- abfs://container/blob.tif (requires adlfs)
- Any other fsspec-supported scheme (memory://, ftp://, etc.)
Read: _CloudSource uses fsspec.core.url_to_fs() then fs.open()
for full reads and range reads. Falls through to the same TIFF
parsing pipeline as local files.
Write: _write_bytes detects fsspec URIs and writes via fs.open()
instead of the local atomic-rename path (which doesn't apply to
cloud storage).
If fsspec or the backend library isn't installed, a clear ImportError
is raised with install instructions.
5 new tests using fsspec's memory:// filesystem for integration
testing without real cloud credentials.
* Add VRT (Virtual Raster Table) reader
Reads GDAL .vrt files by parsing the XML and assembling pixel data
from the referenced source GeoTIFFs using windowed reads.
Supported VRT features:
- SimpleSource: direct pixel copy with source/destination rects
- ComplexSource: scaling (ScaleRatio) and offset (ScaleOffset)
- Source nodata masking
- Multiple bands
- GeoTransform and SRS/CRS propagation
- Relative and absolute source file paths
- Windowed reads (only fetches overlapping source regions)
Usage:
da = read_geotiff('mosaic.vrt') # auto-detected by extension
da = read_vrt('mosaic.vrt') # explicit function
da = read_vrt('mosaic.vrt', window=(0, 0, 100, 100)) # windowed
read_geotiff auto-detects .vrt files and routes them through the
VRT reader. The DataArray gets coordinates from the VRT's
GeoTransform and CRS from the SRS tag.
New module: _vrt.py with parse_vrt() and read_vrt() functions.
8 new tests: single tile, 2x1 mosaic, 2x2 mosaic, windowed read,
CRS propagation, nodata, read_vrt API, and XML parser unit test.
* Fix 8 remaining gaps for production readiness
1. Band-first DataArray (CRITICAL): write_geotiff now detects
(band, y, x) dimension order and transposes to (y, x, band).
Prevents silent data corruption from rasterio-style arrays.
2. HTTP COG sub-byte support (CRITICAL): the COG HTTP reader now
routes through _decode_strip_or_tile like the local readers,
so 1-bit/4-bit/12-bit COGs over HTTP work correctly.
3. Dask VRT support (USEFUL): read_geotiff_dask detects .vrt files
and reads eagerly then chunks, since VRT windowed reads need
the virtual dataset's source layout.
4. VRT writer (USEFUL): write_vrt() generates a VRT XML file from
multiple source GeoTIFFs, computing the mosaic layout from their
geo transforms. Supports relative paths and CRS/nodata.
5. ExtraSamples tag (USEFUL): RGBA writes now include tag 338 with
value 2 (unassociated alpha). Multi-band with >3 bands also
gets ExtraSamples for bands beyond RGB.
6. MinIsWhite (USEFUL): photometric=0 (MinIsWhite) single-band
files are now inverted on read so 0=black, 255=white. Integer
values are inverted via max-value, floats via negation.
7. Post-write validation (POLISH): after writing, the header bytes
are parsed to verify the output is a valid TIFF. Emits a
warning if the header is corrupt.
8. Float16/bool auto-promotion (POLISH): float16 arrays are
promoted to float32, bool arrays to uint8, instead of raising
ValueError.
275 tests passing. 7 new tests for the fixes plus updated edge
case tests.
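The band-first guard from fix 1 can be sketched as follows. The real check inspects the DataArray's dimension names; this standalone version uses a shape heuristic instead (band count smaller than both spatial dimensions), which is an assumption of the sketch, not the shipped logic.

```python
import numpy as np

def to_band_last(arr):
    """Transpose rasterio-style (band, y, x) arrays to (y, x, band)."""
    if arr.ndim == 3 and arr.shape[0] < min(arr.shape[1], arr.shape[2]):
        return np.transpose(arr, (1, 2, 0))
    return arr
```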
* Replace rioxarray with xrspatial.geotiff in examples
Removes the rioxarray dependency from all example notebooks:
- multispectral.ipynb: rioxarray.open_rasterio -> read_geotiff
- classification-methods.ipynb: same
- viewshed_gpu.ipynb: same
- 25_GLCM_Texture.ipynb: rioxarray.open_rasterio COG read ->
read_geotiff with window= and band= parameters. Also removes
GDAL-specific env vars (AWS_NO_SIGN_REQUEST, etc.) since our
reader doesn't use GDAL.
Also updates reproject/_crs_utils.py to check attrs['crs'] and
attrs['crs_wkt'] (xrspatial.geotiff convention) before falling
back to .rio.crs (rioxarray). This means DataArrays from
read_geotiff work directly with xrspatial.reproject without
needing rioxarray installed.
The rioxarray fallback is kept in _crs_utils.py for backwards
compatibility with users who pass rioxarray-decorated DataArrays.
* Add matplotlib and zstandard as core dependencies
Both are now required (not optional):
- matplotlib: needed for palette colormap (ListedColormap) and
da.xrs.plot() with palette TIFFs
- zstandard: needed for ZSTD compression (tag 50000), increasingly
common in modern COGs
This fixes the CI failures where these packages weren't installed.
* Add GPU-accelerated TIFF reader via Numba CUDA
read_geotiff_gpu() decodes tiled GeoTIFFs on the GPU and returns a
CuPy-backed DataArray that stays on device memory. No CPU->GPU
transfer needed for downstream xrspatial GPU operations (slope,
aspect, hillshade, etc.).
CUDA kernels implemented:
- LZW decode: one thread block per tile, LZW table in shared memory
(20KB per block, fast on-chip SRAM)
- Predictor decode (pred=2): one thread per row, horizontal cumsum
- Float predictor (pred=3): one thread per row, byte-lane undiff +
un-transpose
- Tile assembly: one thread per pixel, copies from decompressed
tile buffer to output image
Supports LZW and uncompressed tiled TIFFs. Falls back to CPU for
unsupported compression types or stripped files.
100% pixel-exact match with CPU reader on all tested files (USGS
LZW+pred3 3612x3612, synthetic LZW tiled).
Performance: GPU LZW is comparable to CPU (~330ms vs 270ms for
3612x3612) because LZW is inherently sequential per-stream. The
value is in keeping data on GPU for end-to-end pipelines without
CPU->GPU transfer overhead. Future work: CUDA inflate (deflate)
kernel would unlock the parallel decompression win since deflate
tiles are much more common in COGs.
* Add CUDA inflate (deflate decompression) kernel
Implements RFC 1951 deflate decompression as a Numba @cuda.jit kernel
for GPU-accelerated TIFF tile decoding. One thread block per tile,
all tiles decompress in parallel.
Supports all three deflate block types:
- BTYPE=0: stored (no compression)
- BTYPE=1: fixed Huffman codes
- BTYPE=2: dynamic Huffman codes (most common in real files)
Uses a two-level Huffman decode:
- Fast path: 10-bit shared-memory lookup table (1024 entries)
- Slow path: overflow array scan for codes > 10 bits (up to 15)
Fixes the infinite loop bug where 14-bit lit/len codes exceeded
the original 10-bit table size.
Tested: 100% pixel-exact match on Copernicus deflate+pred3 COG
(3600x3600, 16 tiles) vs CPU zlib.
Performance: GPU inflate is ~20x slower than CPU zlib for this file
size (16 tiles). Deflate is inherently sequential per-stream, so
each thread block runs a long serial loop while most SMs sit idle.
The value is keeping data on GPU for end-to-end pipelines. For
files with hundreds of tiles, the parallelism would help more.
* Add nvCOMP batch decompression fast path for GPU reads
gpu_decode_tiles() now tries kvikio.nvcomp.DeflateManager for batch
deflate decompression before falling back to the Numba CUDA inflate
kernel. nvCOMP is NVIDIA's optimized batched compression library
that decompresses all tiles in a single GPU API call.
Fallback chain for GPU decompression:
1. nvCOMP via kvikio (if installed) -- optimized CUDA kernels
2. Numba @cuda.jit inflate kernel -- pure Python/Numba implementation
3. CPU zlib fallback -- if GPU decode raises any error
kvikio is an optional dependency (pip install kvikio-cu12 or
conda install -c rapidsai kvikio). When not installed, the Numba
kernels are used transparently.
* Fix nvCOMP ctypes binding: ZSTD batch decompress working
Fixed the nvCOMP C API ctypes binding to pass opts structs by value
using proper ctypes.Structure subclasses. The previous byte-array
approach caused the struct to be misinterpreted by nvCOMP.
Working: nvCOMP ZSTD batch decompress (nvcompBatchedZstdDecompressAsync)
- 100% pixel-exact match on all tested files
- 1.5x end-to-end speedup on 8192x8192 ZSTD with 1024 tiles
(GPU pipeline: 404ms vs CPU+transfer: 620ms)
Not working on Ampere: nvCOMP deflate returns nvcompErrorNotSupported
(status 11). Deflate GPU decompression requires Ada Lovelace or
newer GPU with HW decompression engine. Falls back to the Numba
CUDA inflate kernel on Ampere.
nvCOMP is auto-detected by searching for libnvcomp.so in
CONDA_PREFIX and sibling conda environments. When found, ZSTD
tiles are batch-decompressed in a single GPU API call.
* Add KvikIO GDS (GPUDirect Storage) path for GPU reads
When kvikio is installed, read_geotiff_gpu() can read compressed tile
bytes directly from NVMe SSD to GPU VRAM via GPUDirect Storage,
bypassing CPU memory entirely:
Normal: SSD -> CPU (mmap) -> cupy.asarray (CPU->GPU copy)
With GDS: SSD -> GPU VRAM (direct DMA, no CPU involved)
The full pipeline for a ZSTD COG with GDS + nvCOMP:
SSD --(GDS)--> GPU compressed tiles --(nvCOMP)--> GPU decompressed
--> GPU predictor decode --> GPU tile assembly --> CuPy DataArray
Fallback chain in read_geotiff_gpu:
1. KvikIO GDS file read + nvCOMP batch decompress (fastest)
2. CPU mmap tile extract + nvCOMP batch decompress
3. CPU mmap tile extract + Numba CUDA kernels
4. CPU read_to_array + cupy.asarray transfer (slowest)
Also adds:
- gpu_decode_tiles_from_file(): accepts file path + offsets
instead of pre-extracted bytes, enabling the GDS path
- _try_nvcomp_from_device_bufs(): nvCOMP on tiles already in GPU
memory (from GDS), avoiding a device-to-host round-trip
- _apply_predictor_and_assemble(): shared GPU post-processing
used by both GDS and mmap paths
KvikIO is optional: conda install -c rapidsai kvikio
GDS requires: NVMe SSD + NVIDIA kernel module (nvidia-fs)
* Fix KvikIO GDS error handling and ZSTD GPU fallback
- GDS tile read: added sync + verification after each pread to catch
partial reads and CUDA errors early. Catches exception and tries
to reset CUDA state before falling back.
- gpu_decode_tiles: unsupported GPU codecs (ZSTD without nvCOMP, etc.)
now decompress on CPU then transfer to GPU instead of raising
ValueError. This keeps the predictor + assembly on GPU.
- Fixes cudaErrorIllegalAddress from kvikio version mismatch
(26.02 C lib vs 26.06 Python bindings) by catching the error
gracefully instead of poisoning the GPU state.
* Fix nvCOMP deflate: use CUDA backend (backend=2) instead of DEFAULT
nvCOMP deflate decompression now works on all CUDA GPUs by using
backend=2 (CUDA software implementation) instead of backend=0
(DEFAULT, which tries hardware decompression first and fails on
pre-Ada GPUs).
Benchmarks (read + slope, A6000 GPU, nvCOMP via libnvcomp.so):
Deflate:
8192x8192 (1024 tiles): GPU 769ms vs CPU 1364ms = 1.8x
16384x16384 (4096 tiles): GPU 2417ms vs CPU 5788ms = 2.4x
ZSTD:
8192x8192 (1024 tiles): GPU 349ms vs CPU 404ms = 1.2x
16384x16384 (4096 tiles): GPU 1325ms vs CPU 2087ms = 1.6x
Both codecs decompress entirely on GPU via nvCOMP batch API.
No CPU decompression fallback needed when nvCOMP is available.
100% pixel-exact match verified.
* Update README with GeoTIFF I/O feature matrix and GPU benchmarks
Adds a GeoTIFF / COG I/O section to the feature matrix covering:
- read_geotiff, write_geotiff, read_geotiff_gpu, VRT, open_cog
- Compression codecs (deflate, LZW, ZSTD, PackBits, JPEG)
- GPU decompression via nvCOMP (2.4x speedup at 16K x 16K)
- Cloud storage, GDS, metadata preservation, sub-byte support
- Overview resampling modes
Updates Quick Start to use read_geotiff instead of synthetic data.
Updates Notes on GDAL to reflect native reader capabilities.
Updates Dependencies to list core and optional packages.
* Reorder README feature matrix by GIS workflow frequency
Sections now go from most-used to least-used in a typical workflow:
1. GeoTIFF / COG I/O (read your data first)
2. Surface (slope, aspect, hillshade -- the basics)
3. Hydrology (flow direction, accumulation, watersheds)
4. Flood (downstream from hydrology)
5. Multispectral (satellite imagery)
6. Classification (binning results)
7. Focal (neighborhood analysis)
8. Proximity (distance)
9. Zonal (zonal stats)
10. Reproject / Merge
11. Interpolation
12. Morphological
13. Fire
14. Raster / Vector Conversion
15. Utilities
16. Multivariate, Pathfinding, Diffusion, Dasymetric (specialized)
Previously alphabetical, which put Classification first and buried
Surface and Hydrology in the middle.
* Move Reproject to #2 and Utilities to #3 in feature matrix
* Add GPU-accelerated GeoTIFF write via nvCOMP batch compress
write_geotiff_gpu() compresses tiles on the GPU and writes a valid
GeoTIFF. The CuPy array stays on device throughout -- only the
compressed bytes transfer to CPU for file assembly.
GPU pipeline:
CuPy array → tile extraction (CUDA kernel) → predictor encode
(CUDA kernel) → nvCOMP batch compress → CPU file assembly
CUDA kernels added:
- _extract_tiles_kernel: image → per-tile buffers (1 thread/pixel)
- _predictor_encode_kernel: horizontal differencing (1 thread/row)
- _fp_predictor_encode_kernel: float predictor (1 thread/row)
- _nvcomp_batch_compress: deflate + ZSTD via nvCOMP C API
Deflate write performance (tiled 256, A6000):
2048x2048: GPU 135ms vs CPU 424ms = 3.1x faster
4096x4096: GPU 302ms vs CPU 1678ms = 5.6x faster
8192x8192: GPU 1114ms vs CPU 6837ms = 6.1x faster
GPU deflate is also 1.5-1.8x faster than rioxarray/GDAL at 4K+.
All round-trips verified pixel-exact (deflate, ZSTD, uncompressed).
* Update README benchmarks and enable all backend write support
README:
- Updated feature matrix: write_geotiff now shows Dask ✅ and
CuPy 🔄 (fallback). Added write_geotiff_gpu and read_geotiff_dask
rows. Updated VRT to show Dask support.
- Added comprehensive benchmark tables for read (real-world + synthetic)
and write (CPU vs GPU vs rioxarray) across all sizes and codecs.
- 100% consistency verified across all tested files.
Backend support for write_geotiff:
- NumPy: direct write (existing)
- Dask DataArray: .compute() then write (existing, now documented)
- CuPy raw array: .get() to numpy then write (new)
- CuPy DataArray: .data.get() then write (new)
- Dask+CuPy: .compute().get() then write (new)
- Python list: np.asarray() then write (existing)
For GPU-native compression (no CPU transfer), use write_geotiff_gpu.
* Enable Dask+CuPy for GPU read and write
read_geotiff_gpu:
- New chunks= parameter returns a Dask+CuPy DataArray
- read_geotiff_gpu('dem.tif', chunks=512) decompresses on GPU
then chunks the result for out-of-core GPU pipelines
write_geotiff_gpu:
- Accepts Dask+CuPy DataArrays (.compute() then compress on GPU)
- Accepts Dask+NumPy DataArrays (.compute() then transfer to GPU)
- Accepts raw CuPy, numpy, or list inputs
All 7 input combinations verified:
read_geotiff_gpu -> CuPy DataArray (existing)
read_geotiff_gpu(chunks=N) -> Dask+CuPy DataArray (new)
write_geotiff_gpu(cupy_array) (existing)
write_geotiff_gpu(cupy_DataArray) (existing)
write_geotiff_gpu(dask_cupy_DataArray) (new)
write_geotiff_gpu(numpy_array) (auto-transfer)
write_geotiff_gpu(dask_numpy_DataArray) (auto-compute+transfer)
Also fixed write_geotiff CuPy fallback for raw arrays and
Dask+CuPy DataArrays (compute then .get() to numpy).
* Unified API: read_geotiff/write_geotiff auto-dispatch CPU/GPU/Dask
read_geotiff and write_geotiff now dispatch to the correct backend
automatically:
read_geotiff('dem.tif') # NumPy (default)
read_geotiff('dem.tif', gpu=True) # CuPy via nvCOMP
read_geotiff('dem.tif', chunks=512) # Dask lazy
read_geotiff('dem.tif', gpu=True, chunks=512) # Dask+CuPy
write_geotiff(numpy_arr, 'out.tif') # CPU write
write_geotiff(cupy_arr, 'out.tif') # auto-detects CuPy -> GPU write
write_geotiff(data, 'out.tif', gpu=True) # force GPU write
Auto-detection: write_geotiff checks isinstance(data, cupy.ndarray)
to decide whether to use GPU compression. Falls back to CPU if
cupy is not installed or nvCOMP fails.
read_vrt also supports gpu= and chunks= parameters for all four
backend combinations.
Users no longer need to call read_geotiff_gpu/write_geotiff_gpu
directly -- the main functions handle everything.
* Update README: unified API with all 5 backends in feature matrix
* Pass chunks= and gpu= through open_cog to read_geotiff
* Deprecate open_cog -- read_geotiff handles all sources
read_geotiff already accepts HTTP URLs, cloud URIs (s3://, gs://,
az://), local files, and VRT files. open_cog is now a thin
deprecated wrapper. Users just use read_geotiff for everything:
read_geotiff('https://example.com/cog.tif')
read_geotiff('s3://bucket/cog.tif')
read_geotiff('/local/dem.tif')
read_geotiff('mosaic.vrt')
All with gpu=, chunks=, window=, band= options.
Removed open_cog from the README feature matrix.
* Simplify public API to 3 functions
The public API is now:
read_geotiff(source, ...) # read anything: file, URL, cloud, VRT
write_geotiff(data, path, ...) # write any backend
write_vrt(path, sources) # generate VRT mosaic XML
read_geotiff auto-detects:
- .vrt extension -> VRT reader
- http:// / https:// -> COG range-request reader
- s3:// / gs:// / az:// -> cloud via fsspec
- gpu=True -> nvCOMP GPU decompression
- chunks=N -> Dask lazy windowed reads
Removed from __all__: open_cog (deprecated wrapper),
read_vrt (called internally), read_geotiff_dask (use chunks=),
read_geotiff_gpu / write_geotiff_gpu (use gpu=True).
All these functions still exist for backwards compatibility but
are no longer the recommended entry points.

Parent: 54f299a
File tree (28 files changed: +11176, -225 lines):
- docs/source/user_guide
- examples
- user_guide
- xrspatial
- geotiff
- tests
- reproject