Cap LERC and JPEG 2000 decompression output to block bomb attacks (#1625) #1629
The deflate, zstd, lz4, and packbits wrappers all received a pre-decode output-size cap in #1533, so a crafted TIFF tile cannot expand to many GB before the post-decode size check fires. LERC and JPEG 2000 were missed: `lerc_decompress_with_mask` and `jpeg2000_decompress` called the underlying `lerc.decode` / `glymur.Jp2k[:]` with no bound, so the existing size check in `_decode_strip_or_tile` ran only after the external library had already materialised the full buffer.

LERC compresses constant-value blocks at >700,000:1, so a 94-byte blob can request 64 MiB of host memory, and a 1 KB on-disk LERC tile can ask for several GB before the reader rejects it. JPEG 2000 has similar amplification potential through codestream-declared dimensions.

Fix: query each codestream's declared dimensions before decoding.

- LERC: `lerc.getLercBlobInfo(blob)` returns `nCols`, `nRows`, `nBands`, and `dataType` from the header without decoding. Compute the projected output bytes and raise `ValueError` when they exceed the same `expected_size * 1.05 + 1` cap used by every other codec.
- JPEG 2000: `glymur.Jp2k(file).shape` parses only the SIZ marker, so inspecting it before `[:]` is cheap.
- Pillow's JPEG decoder already enforces `Image.MAX_IMAGE_PIXELS` and raises `DecompressionBombError`, so no wrapper-level cap is needed there.

Plumb `expected_size` through the `decompress()` dispatcher, `_decode_strip_or_tile`'s LERC branch, and the `_gpu_decode` CPU fallbacks for both codecs.

Tests cover bomb rejection and legitimate round-trips at the codec level for both LERC and JPEG 2000, plus the existing zero-expected-size backward-compat path.

Found by /sweep-security on the geotiff module.
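The LERC check above reduces to a few lines once the header fields are known. This is a minimal sketch, assuming the fields have already been read (for example from `lerc.getLercBlobInfo`). The helper names `max_output_with_margin` and `check_lerc_declared_size`, the truncation of the cap to an int, and the `LERC_DTYPE_BYTES` mapping from LERC `dataType` codes to element sizes are illustrative assumptions, not the code in `_compression.py`. Like the PR's formula, it multiplies `nCols * nRows * nBands * dtype_bytes`.

```python
# Assumed mapping from LERC dataType codes to element sizes in bytes
# (0=int8, 1=uint8, 2=int16, 3=uint16, 4=int32, 5=uint32, 6=float32, 7=float64).
LERC_DTYPE_BYTES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 8}


def max_output_with_margin(expected_size: int) -> int:
    """Hypothetical cap helper: expected_size * 1.05 + 1, truncated to an int."""
    return int(expected_size * 1.05) + 1


def check_lerc_declared_size(n_cols: int, n_rows: int, n_bands: int,
                             data_type: int, expected_size: int) -> None:
    """Raise ValueError if the header-declared output exceeds the cap.

    An expected_size of 0 or less disables the check, mirroring the
    backward-compatible zero-expected-size path.
    """
    if expected_size <= 0:
        return
    # Unknown type codes fall back to the widest element (8 bytes), so the
    # estimate can only over-count, never under-count.
    itemsize = LERC_DTYPE_BYTES.get(data_type, 8)
    # Python ints have arbitrary precision, so this product cannot wrap.
    declared = n_cols * n_rows * n_bands * itemsize
    cap = max_output_with_margin(expected_size)
    if declared > cap:
        raise ValueError(
            f"LERC blob declares {declared} output bytes; cap is {cap}"
        )
```

Because the check reads only header integers, it runs before any pixel buffer is allocated. That ordering is the whole point of the fix.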
Pull request overview
This PR extends the GeoTIFF reader’s decompression-bomb defenses to the LERC and JPEG 2000 codecs by adding a pre-decode output-size cap (matching the expected_size * 1.05 + 1 pattern introduced for other codecs in #1533), and wires the cap through CPU/GPU decode paths. This prevents external libraries (lerc.decode, glymur.Jp2k[:]) from allocating attacker-controlled multi-GB buffers before the reader’s post-decode size validation runs.
Changes:
- Add `expected_size` plumbing and pre-decode declared-size validation for `lerc_decompress(_with_mask)` and `jpeg2000_decompress`.
- Forward `expected_size` through `_decode_strip_or_tile` (LERC mask path) and the GPU decode CPU-fallback paths.
- Add direct codec-level tests for LERC/JPEG 2000 bomb rejection and backward compatibility when `expected_size=0`.
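The `expected_size` plumbing can be pictured as a dispatcher that forwards a per-tile byte budget to every codec wrapper, with `0` meaning "no cap". This is a hypothetical sketch: `decompress_sketch`, the `CODECS` table, and the stub `fake_bomb_codec` are illustrative and do not reproduce the real `decompress()` signature.

```python
from typing import Callable, Dict

# Hypothetical codec wrapper signature: (data, expected_size) -> bytes.
Codec = Callable[[bytes, int], bytes]


def fake_bomb_codec(data: bytes, expected_size: int = 0) -> bytes:
    """Stub codec whose 'header' always declares 1 GiB of output."""
    declared = 1 << 30
    if expected_size > 0 and declared > int(expected_size * 1.05) + 1:
        raise ValueError("declared output exceeds cap")
    # A real decoder would allocate `declared` bytes here; the stub does not.
    return b""


CODECS: Dict[str, Codec] = {"bomb": fake_bomb_codec}


def decompress_sketch(data: bytes, codec: str, expected_size: int = 0) -> bytes:
    """Dispatch to a codec wrapper, forwarding expected_size unchanged.

    expected_size=0 (the default) keeps the old, uncapped behaviour so
    callers that do not know the tile size keep working.
    """
    try:
        wrapper = CODECS[codec]
    except KeyError:
        raise ValueError(f"unsupported codec: {codec!r}") from None
    return wrapper(data, expected_size)
```

The weak spot this illustrates is any caller that reaches a codec without going through the dispatcher, such as the LERC mask path or a GPU CPU fallback. Those callers must pass `expected_size` themselves, which is why the PR threads it through those paths explicitly.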
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `xrspatial/geotiff/_compression.py` | Adds pre-decode declared-output checks for LERC and JPEG 2000 and threads `expected_size` through dispatch. |
| `xrspatial/geotiff/_reader.py` | Passes the per-tile expected size into `lerc_decompress_with_mask` to enforce pre-decode caps. |
| `xrspatial/geotiff/_gpu_decode.py` | Ensures the CPU fallback for JPEG 2000/LERC enforces the same `expected_size` cap during GPU decode. |
| `xrspatial/geotiff/tests/test_decompression_caps.py` | Adds direct bomb/round-trip/no-cap tests for LERC and JPEG 2000. |
| `.claude/sweep-security-state.csv` | Updates the sweep-security tracking entry for this issue/fix. |
```python
if shape is not None and dtype is not None:
    declared = int(np.prod(shape)) * dtype.itemsize
    cap = _max_output_with_margin(expected_size)
```

```python
if expected_size > 0:
    try:
        shape = jp2.shape
        dtype = np.dtype(getattr(jp2, 'dtype', np.uint8))
    except Exception:
        shape = None
        dtype = None
    if shape is not None and dtype is not None:
```

```python
    # so it produces the canonical error rather than masking it here.
    return
if len(info) < 7:
    return
```

```python
``jp2[:]``.
"""
import glymur
import os
```
Copilot review on #1629 flagged four issues. Fixes:

- np.prod overflow (JPEG 2000): replace ``int(np.prod(shape))`` with ``math.prod(int(d) for d in shape)``. ``np.prod`` multiplies in fixed-width intp and can wrap to a small or negative value on attacker-declared dimensions (the SIZ marker uses uint32 fields, so a malformed tile can declare a shape whose product exceeds int64), silently under-counting the declared size and bypassing the cap. ``math.prod`` uses Python arbitrary-precision ints.
- Fail closed on an unreadable SIZ marker (JPEG 2000): previously, an exception from ``jp2.shape`` or ``jp2.dtype`` disabled the cap and fell through to ``jp2[:]`` (the bomb path). ``getattr(jp2, 'dtype', np.uint8)`` also defaulted to the smallest dtype, under-counting declared bytes. Now both failure modes raise ValueError before any pixel decoding runs. Added test_jpeg2000_unreadable_shape_fails_closed to lock this in (it monkeypatches Jp2k.shape to raise and asserts jp2[:] is never invoked).
- LERC errCode not checked: ``lerc.getLercBlobInfo`` returns ``(errCode, ...)``, and when errCode != 0 the remaining fields can be garbage. ``_check_lerc_bomb`` now returns early in that case so the canonical error comes from ``lerc.decode``, not from a spurious declared-size computation.
- Unused ``import os`` in test_jpeg2000_bomb_raises (flake8 F401): removed.
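The first two fixes combine into one fail-closed check. This is a minimal sketch, assuming a `glymur.Jp2k`-like object that exposes `.shape` and `.dtype`; `check_jp2_declared_size` and the cap rounding are hypothetical, not the code in `_compression.py`. It computes the product with `math.prod` on Python ints, and any failure to read the header raises instead of disabling the cap.

```python
import math

import numpy as np


def check_jp2_declared_size(jp2, expected_size: int) -> None:
    """Fail closed: raise ValueError unless the declared output fits the cap."""
    if expected_size <= 0:
        return  # cap disabled (backward-compatible path)
    try:
        shape = jp2.shape
        # No getattr default: a missing dtype is an error, not uint8.
        itemsize = np.dtype(jp2.dtype).itemsize
    except Exception as exc:
        # An unreadable SIZ marker must not fall through to jp2[:].
        raise ValueError("cannot read JPEG 2000 header; refusing to decode") from exc
    # math.prod over Python ints cannot wrap, unlike np.prod in int64.
    declared = math.prod(int(d) for d in shape) * itemsize
    cap = int(expected_size * 1.05) + 1
    if declared > cap:
        raise ValueError(
            f"JPEG 2000 codestream declares {declared} bytes; cap is {cap}"
        )
```

The overflow is easy to reproduce. For a declared shape of `(2**31, 2**31, 4)`, where both dimensions fit in the SIZ marker's uint32 fields, `np.prod(np.array(shape, dtype=np.int64))` wraps to `0`, so the old check would have computed zero declared bytes and let the tile through. `math.prod(shape)` returns `2**64`.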
Addressed all four Copilot review comments in bb30f0e:
Summary
`lerc_decompress_with_mask` and `jpeg2000_decompress` did not enforce the same pre-decode output-size cap that deflate, zstd, lz4, and packbits got in #1533. A crafted GeoTIFF tile compressed with LERC or JPEG 2000 could declare arbitrarily large pixel dimensions in the codestream, and the underlying library (`lerc.decode` / `glymur.Jp2k[:]`) materialised the full buffer before the existing post-decode size check in `_decode_strip_or_tile` ever ran.

LERC compresses constant-value blocks at >700,000:1, so a 94-byte blob requests 64 MiB; a ~1 KB on-disk tile can ask for several GB. JPEG 2000 has similar amplification through codestream-declared dimensions.
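The amplification figure can be checked with a little arithmetic. The 4096 x 4096 single-band float32 geometry below is an assumed example of a header that declares exactly 64 MiB; the PR does not say which dimensions its 94-byte blob used.

```python
# Assumed example geometry for a 64 MiB declared output.
n_cols, n_rows, n_bands, itemsize = 4096, 4096, 1, 4  # float32

declared_bytes = n_cols * n_rows * n_bands * itemsize
blob_bytes = 94  # on-disk LERC blob size quoted in the PR

ratio = declared_bytes / blob_bytes
print(f"{declared_bytes} bytes = {declared_bytes / 2**20:.0f} MiB")
print(f"amplification ~ {ratio:,.0f}:1")
```

At a ratio of roughly 714,000:1 the post-decode check is too late: the allocation has already happened by the time the decoded length can be compared with the expected tile size.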
Fix
Pre-validate each codestream's declared output against the same `expected_size * 1.05 + 1` cap the other codecs use:

- `lerc.getLercBlobInfo(blob)` returns `nCols`, `nRows`, `nBands`, and `dataType` from the header without decoding. The wrapper computes `nCols * nRows * nBands * dtype_bytes` and raises `ValueError` when it exceeds the cap.
- `glymur.Jp2k(file).shape` reads only the SIZ marker (sub-millisecond on a 2000x2000 file), so checking it before `[:]` is cheap.

Pillow's JPEG decoder already has its own `Image.MAX_IMAGE_PIXELS` guard, so no wrapper-level cap is needed for the JPEG codec.

`expected_size` is plumbed through:

- `decompress()` -> `lerc_decompress` / `jpeg2000_decompress`
- `_decode_strip_or_tile`'s LERC branch (which bypasses `decompress()` to capture the valid mask)
- `_gpu_decode.py`'s CPU fallback paths for both codecs

Test plan

- `pytest xrspatial/geotiff/tests/test_decompression_caps.py` -- 24/24 pass (6 new tests covering LERC + JPEG 2000 bomb rejection, legitimate round-trip, and backward compat with `expected_size=0`)
- `pytest xrspatial/geotiff/tests/test_compression.py xrspatial/geotiff/tests/test_lerc.py xrspatial/geotiff/tests/test_jpeg2000.py xrspatial/geotiff/tests/test_security.py` -- 89/89 pass
- `RecursionError` failures in `TestPalette` plot tests (no compression code touched)

Closes #1625.