Fix nvCOMP deflate: use CUDA backend (backend=2) instead of DEFAULT

brendancol · brendancol · commit 26b640491791 · 2026-03-20T11:15:14.000-07:00

nvCOMP deflate decompression now works on all CUDA GPUs. The options
struct now passes backend=2 (the CUDA software implementation) instead
of backend=0 (DEFAULT), which tries hardware decompression first and
fails on pre-Ada GPUs.
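
The backend choice can be sketched with a small ctypes struct. This is only an illustration: the class name `DeflateDecompOpts`, the helper `make_deflate_opts`, and the field types (two C ints followed by 56 reserved bytes) are assumptions inferred from the constructor call in the diff, not a copy of the project's definition or of the nvCOMP headers. The backend numbers 0, 1 and 2 are the ones named in this commit.

```python
import ctypes

# Backend values as described in this commit message and code comment.
BACKEND_DEFAULT = 0   # tries hardware decompression first
BACKEND_HARDWARE = 1  # hardware decompression only
BACKEND_CUDA = 2      # CUDA software implementation


class DeflateDecompOpts(ctypes.Structure):
    """Hypothetical mirror of the options struct used in the diff.

    The field layout is an assumption: two C ints plus 56 reserved bytes.
    """
    _fields_ = [
        ("backend", ctypes.c_int),
        ("sort_before_hw_decompress", ctypes.c_int),
        ("reserved", ctypes.c_char * 56),
    ]


def make_deflate_opts(backend=BACKEND_CUDA):
    """Build the options struct, defaulting to the CUDA software backend."""
    return DeflateDecompOpts(
        backend=backend,
        sort_before_hw_decompress=0,
        reserved=b"\x00" * 56,
    )


opts = make_deflate_opts()
print(opts.backend, ctypes.sizeof(opts))
```

Under the assumed layout the struct occupies 64 bytes, and the zeroed reserved field leaves every other option at its default.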

Benchmarks (read + slope, A6000 GPU, nvCOMP via libnvcomp.so):

Deflate:
  8192x8192  (1024 tiles): GPU  769ms vs CPU 1364ms = 1.8x
  16384x16384 (4096 tiles): GPU 2417ms vs CPU 5788ms = 2.4x

ZSTD:
  8192x8192  (1024 tiles): GPU  349ms vs CPU  404ms = 1.2x
  16384x16384 (4096 tiles): GPU 1325ms vs CPU 2087ms = 1.6x
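
The speedup column is CPU time divided by GPU time, rounded to one decimal place. A short check that recomputes it from the timings listed above (the dictionary layout is just for illustration):

```python
# (gpu_ms, cpu_ms) pairs copied from the benchmark lines above.
timings_ms = {
    ("Deflate", "8192x8192"): (769, 1364),
    ("Deflate", "16384x16384"): (2417, 5788),
    ("ZSTD", "8192x8192"): (349, 404),
    ("ZSTD", "16384x16384"): (1325, 2087),
}

# Speedup = CPU time / GPU time, rounded to one decimal place.
speedups = {
    key: round(cpu_ms / gpu_ms, 1)
    for key, (gpu_ms, cpu_ms) in timings_ms.items()
}

for (codec, size), factor in speedups.items():
    print(f"{codec:8s} {size:12s} {factor}x")
```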

Both codecs decompress entirely on the GPU via the nvCOMP batch API.
No CPU decompression fallback is needed when nvCOMP is available.
Output verified as a 100% pixel-exact match.
diff --git a/xrspatial/geotiff/_gpu_decode.py b/xrspatial/geotiff/_gpu_decode.py
@@ -821,7 +821,8 @@ class _NvcompDeflateDecompOpts(ctypes.Structure):
             raw_tiles = [t[2:-4] if len(t) > 6 else t for t in compressed_tiles]
             get_temp_fn = 'nvcompBatchedDeflateDecompressGetTempSizeAsync'
             decomp_fn = 'nvcompBatchedDeflateDecompressAsync'
-            opts = _NvcompDeflateDecompOpts(backend=0, sort_before_hw_decompress=0,
+            # backend=2 (CUDA) works on all GPUs; backend=1 (HW) needs Ada/Hopper
+            opts = _NvcompDeflateDecompOpts(backend=2, sort_before_hw_decompress=0,
                                             reserved=b'\x00' * 56)
         elif compression == 50000:  # ZSTD
             raw_tiles = list(compressed_tiles)  # no header stripping
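
The `t[2:-4]` slice in the deflate branch above appears to remove the zlib wrapper (a 2-byte header and a 4-byte Adler-32 trailer) so that only the raw deflate stream is handed to nvCOMP. That reading is an inference from the diff, not something the commit states. A CPU-only sketch of the same slicing using Python's standard `zlib` module, with `strip_zlib_wrapper` as a hypothetical helper name:

```python
import zlib


def strip_zlib_wrapper(tile: bytes) -> bytes:
    """Drop the 2-byte zlib header and 4-byte Adler-32 trailer.

    Mirrors the slice in the diff; tiles of 6 bytes or fewer pass through
    unchanged because they cannot hold header, payload and trailer.
    """
    return tile[2:-4] if len(tile) > 6 else tile


payload = bytes(range(256)) * 16          # stand-in for one tile's pixels
wrapped = zlib.compress(payload)          # zlib container around deflate data
raw = strip_zlib_wrapper(wrapped)

# The trailer that was cut off is the Adler-32 checksum of the payload.
trailer = int.from_bytes(wrapped[-4:], "big")
assert trailer == zlib.adler32(payload)

# wbits=-15 tells zlib to expect a raw deflate stream with no wrapper.
restored = zlib.decompress(raw, wbits=-15)
assert restored == payload
```

Because the checksum is discarded along with the trailer, integrity has to be checked some other way, for example by comparing the decoded pixels against a reference decode.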