Commit 95c2a48

Add CUDA inflate (deflate decompression) kernel

Implements RFC 1951 deflate decompression as a Numba @cuda.jit kernel for GPU-accelerated TIFF tile decoding. One thread block per tile, all tiles decompress in parallel.

Supports all three deflate block types:
- BTYPE=0: stored (no compression)
- BTYPE=1: fixed Huffman codes
- BTYPE=2: dynamic Huffman codes (most common in real files)

Uses a two-level Huffman decode:
- Fast path: 10-bit shared-memory lookup table (1024 entries)
- Slow path: overflow array scan for codes > 10 bits (up to 15)

Fixes the infinite loop bug where 14-bit lit/len codes exceeded the original 10-bit table size.

Tested: 100% pixel-exact match on Copernicus deflate+pred3 COG (3600x3600, 16 tiles) vs CPU zlib.

Performance: GPU inflate is ~20x slower than CPU zlib for this file size (16 tiles). Deflate is inherently sequential per-stream, so each thread block runs a long serial loop while most SMs sit idle. The value is keeping data on GPU for end-to-end pipelines. For files with hundreds of tiles, the parallelism would help more.
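The two-level decode described in this message can be illustrated with a minimal CPU-side sketch in plain Python. This is not the code from `_gpu_decode.py`: the names `build_tables`, `decode_symbol`, and `reverse_bits` are hypothetical, the table layout is an assumption, and the real kernel keeps its fast table in CUDA shared memory rather than in a Python list. Only the 10-bit fast table, the overflow scan, and the 15-bit maximum come from the message above; the canonical code assignment follows RFC 1951 section 3.2.2.

```python
FAST_BITS = 10  # fast-table width named in the commit message
MAX_BITS = 15   # longest Huffman code length RFC 1951 allows


def reverse_bits(value, width):
    """Reverse the low `width` bits of `value`."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def build_tables(lengths):
    """Build (fast, overflow) from per-symbol code lengths (0 = unused).

    Hypothetical layout: fast[i] is (symbol, length) or None;
    overflow is a list of (reversed_code, length, symbol).
    """
    # Canonical Huffman code assignment (RFC 1951, section 3.2.2).
    bl_count = [0] * (MAX_BITS + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    next_code = [0] * (MAX_BITS + 1)
    code = 0
    for bits in range(1, MAX_BITS + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code

    fast = [None] * (1 << FAST_BITS)
    overflow = []
    for sym, n in enumerate(lengths):
        if n == 0:
            continue
        # Deflate packs Huffman codes MSB-first into an LSB-first bit
        # stream, so the bit-reversed code is what appears in the low
        # bits of the bit buffer.
        rev = reverse_bits(next_code[n], n)
        next_code[n] += 1
        if n <= FAST_BITS:
            # Fill every fast-table slot whose low n bits equal the code.
            for idx in range(rev, 1 << FAST_BITS, 1 << n):
                fast[idx] = (sym, n)
        else:
            overflow.append((rev, n, sym))
    return fast, overflow


def decode_symbol(bitbuf, fast, overflow):
    """Decode one symbol from `bitbuf` (upcoming stream bits, LSB first).

    Returns (symbol, bits_consumed).
    """
    entry = fast[bitbuf & ((1 << FAST_BITS) - 1)]
    if entry is not None:          # fast path: code length <= 10 bits
        return entry
    for rev, n, sym in overflow:   # slow path: 11..15-bit codes
        if bitbuf & ((1 << n) - 1) == rev:
            return sym, n
    # Failing loudly here, rather than consuming zero bits, is what keeps
    # a decode loop from spinning forever on a code the table cannot hold.
    raise ValueError("invalid Huffman code")
```

A caller would shift `bitbuf` right by the returned bit count after each symbol and refill it from the input stream. With code lengths such as `list(range(1, 12)) + [12, 12]`, the two 12-bit symbols land in the overflow list and are resolved only by the slow-path scan.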

1 parent d69d34f commit 95c2a48

1 file changed, +451 -3 lines changed

xrspatial/geotiff/_gpu_decode.py
