Commit 26b6404

Fix nvCOMP deflate: use CUDA backend (backend=2) instead of DEFAULT
nvCOMP deflate decompression now works on all CUDA GPUs by using backend=2 (CUDA software implementation) instead of backend=0 (DEFAULT, which tries hardware decompression first and fails on pre-Ada GPUs).

Benchmarks (read + slope, A6000 GPU, nvCOMP via libnvcomp.so):

Deflate:
  8192x8192 (1024 tiles): GPU 769ms vs CPU 1364ms = 1.8x
  16384x16384 (4096 tiles): GPU 2417ms vs CPU 5788ms = 2.4x

ZSTD:
  8192x8192 (1024 tiles): GPU 349ms vs CPU 404ms = 1.2x
  16384x16384 (4096 tiles): GPU 1325ms vs CPU 2087ms = 1.6x

Both codecs decompress entirely on GPU via nvCOMP batch API. No CPU decompression fallback needed when nvCOMP is available. 100% pixel-exact match verified.
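For readers unfamiliar with how the backend is passed through ctypes, here is a minimal sketch of an options structure like the one this commit touches. The field names and the backend values 0, 1 and 2 come from the commit message and diff. The constant names, the class name `DeflateDecompOpts`, the helper `make_deflate_opts`, and the C field types are assumptions for illustration and are not taken from the nvCOMP headers.

```python
import ctypes

# Backend values as described in the commit message and code comment.
# The constant names are hypothetical.
BACKEND_DEFAULT = 0  # tries hardware decompression first
BACKEND_HW = 1       # hardware decompression
BACKEND_CUDA = 2     # CUDA software implementation


class DeflateDecompOpts(ctypes.Structure):
    # Assumed layout: two C ints followed by 56 reserved bytes.
    # Field names mirror the constructor call in the diff; the types are a guess.
    _fields_ = [
        ("backend", ctypes.c_int),
        ("sort_before_hw_decompress", ctypes.c_int),
        ("reserved", ctypes.c_char * 56),
    ]


def make_deflate_opts(backend=BACKEND_CUDA):
    """Build zero-initialised options with the requested backend."""
    return DeflateDecompOpts(
        backend=backend,
        sort_before_hw_decompress=0,
        reserved=b"\x00" * 56,
    )


opts = make_deflate_opts()
print(opts.backend, ctypes.sizeof(opts))
```

Defaulting the helper to the CUDA backend reflects the choice made in this commit: the software path is the one reported to work on every CUDA GPU.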
1 parent 339581f commit 26b6404

1 file changed

+2
-1
lines changed

xrspatial/geotiff/_gpu_decode.py

Lines changed: 2 additions & 1 deletion
@@ -821,7 +821,8 @@ class _NvcompDeflateDecompOpts(ctypes.Structure):
         raw_tiles = [t[2:-4] if len(t) > 6 else t for t in compressed_tiles]
         get_temp_fn = 'nvcompBatchedDeflateDecompressGetTempSizeAsync'
         decomp_fn = 'nvcompBatchedDeflateDecompressAsync'
-        opts = _NvcompDeflateDecompOpts(backend=0, sort_before_hw_decompress=0,
+        # backend=2 (CUDA) works on all GPUs; backend=1 (HW) needs Ada/Hopper
+        opts = _NvcompDeflateDecompOpts(backend=2, sort_before_hw_decompress=0,
                                         reserved=b'\x00' * 56)
     elif compression == 50000:  # ZSTD
         raw_tiles = list(compressed_tiles)  # no header stripping
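
The deflate branch in this hunk slices each tile with `t[2:-4]` before handing it to nvCOMP. A plausible reading is that the tiles are zlib-wrapped (2-byte header, 4-byte Adler-32 trailer) while the batch decompressor expects raw deflate. The sketch below checks that reading on the CPU with the standard `zlib` module only. The helper name `strip_zlib_wrapper` is hypothetical and no nvCOMP call is involved.

```python
import zlib


def strip_zlib_wrapper(tile: bytes) -> bytes:
    """Drop the 2-byte zlib header and 4-byte Adler-32 trailer.

    Mirrors the slice used in the diff; tiles of 6 bytes or fewer are
    passed through unchanged.
    """
    return tile[2:-4] if len(tile) > 6 else tile


payload = b"geotiff tile bytes " * 64
wrapped = zlib.compress(payload)          # zlib container around deflate data
raw = strip_zlib_wrapper(wrapped)         # raw deflate stream

# wbits=-15 tells zlib to expect a raw deflate stream with no wrapper.
restored = zlib.decompress(raw, wbits=-15)
print(restored == payload)
```

The ZSTD branch skips this step because, as the inline comment says, those tiles need no header stripping.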

0 commit comments