Commit 95c2a48

Add CUDA inflate (deflate decompression) kernel
Implements RFC 1951 deflate decompression as a Numba @cuda.jit kernel for GPU-accelerated TIFF tile decoding. One thread block per tile, all tiles decompress in parallel.

Supports all three deflate block types:
- BTYPE=0: stored (no compression)
- BTYPE=1: fixed Huffman codes
- BTYPE=2: dynamic Huffman codes (most common in real files)

Uses a two-level Huffman decode:
- Fast path: 10-bit shared-memory lookup table (1024 entries)
- Slow path: overflow array scan for codes > 10 bits (up to 15)

Fixes the infinite loop bug where 14-bit lit/len codes exceeded the original 10-bit table size.

Tested: 100% pixel-exact match on Copernicus deflate+pred3 COG (3600x3600, 16 tiles) vs CPU zlib.

Performance: GPU inflate is ~20x slower than CPU zlib for this file size (16 tiles). Deflate is inherently sequential per-stream, so each thread block runs a long serial loop while most SMs sit idle. The value is keeping data on GPU for end-to-end pipelines. For files with hundreds of tiles, the parallelism would help more.
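The two-level Huffman decode named in the commit message can be illustrated with a small host-side sketch. This is not the kernel code from the commit: it is plain Python rather than Numba CUDA so it runs without a GPU, a Python list stands in for the shared-memory table, and the names `build_tables`, `decode_symbol`, `FAST_BITS`, and `MAX_BITS` are hypothetical. It assumes the canonical code assignment from RFC 1951 section 3.2.2 and a bit buffer whose bit 0 is the next bit of the stream.

```python
FAST_BITS = 10   # assumed to match the 10-bit fast table in the commit message
MAX_BITS = 15    # longest code length RFC 1951 allows


def reverse_bits(value, nbits):
    """Reverse the low `nbits` bits of `value`."""
    out = 0
    for _ in range(nbits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def build_tables(lengths):
    """Build (fast, overflow) from per-symbol code lengths (0 = unused)."""
    # Canonical code assignment, RFC 1951 section 3.2.2.
    bl_count = [0] * (MAX_BITS + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    next_code = [0] * (MAX_BITS + 1)
    code = 0
    for bits in range(1, MAX_BITS + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code

    fast = [None] * (1 << FAST_BITS)  # indexed by the next FAST_BITS stream bits
    overflow = []                     # (reversed_code, length, symbol) for long codes
    for sym, n in enumerate(lengths):
        if n == 0:
            continue
        # Deflate packs Huffman codes MSB-first into an LSB-first stream,
        # so the table is keyed on the bit-reversed code.
        rev = reverse_bits(next_code[n], n)
        next_code[n] += 1
        if n <= FAST_BITS:
            # Replicate the entry over every value of the unused high bits.
            for idx in range(rev, 1 << FAST_BITS, 1 << n):
                fast[idx] = (sym, n)
        else:
            overflow.append((rev, n, sym))
    return fast, overflow


def decode_symbol(bitbuf, fast, overflow):
    """Return (symbol, bits_consumed) for the code at the low end of `bitbuf`."""
    entry = fast[bitbuf & ((1 << FAST_BITS) - 1)]
    if entry is not None:
        return entry                  # fast path: one table lookup
    for rev, n, sym in overflow:      # slow path: linear scan of long codes
        if bitbuf & ((1 << n) - 1) == rev:
            return sym, n
    raise ValueError("invalid Huffman code")
```

Raising on an unmatched code, instead of consuming zero bits and retrying, is one way to avoid the kind of infinite loop the commit message describes for codes longer than the fast table. How the actual kernel handles that case is not shown in the message.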
1 parent d69d34f commit 95c2a48

File tree

1 file changed

+451
-3
lines changed

