
Cap LERC and JPEG 2000 decompression output to block bomb attacks (#1625) #1629

Merged
brendancol merged 2 commits into main from deep-sweep-security-geotiff-2026-05-11-d on May 11, 2026

@brendancol
Contributor

Summary

lerc_decompress_with_mask and jpeg2000_decompress did not enforce the same pre-decode output-size cap that deflate, zstd, lz4, and packbits got in #1533. A crafted GeoTIFF tile compressed with LERC or JPEG 2000 could declare arbitrarily large pixel dimensions in the codestream, and the underlying library (lerc.decode / glymur.Jp2k[:]) materialised the full buffer before the existing post-decode size check in _decode_strip_or_tile ever ran.

LERC compresses constant-value blocks at >700,000:1, so a 94-byte blob requests 64 MiB; a ~1 KB on-disk tile can ask for several GB. JPEG 2000 has similar amplification through codestream-declared dimensions.
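The two LERC figures above are consistent with each other. A quick check using only the numbers quoted in this paragraph (64 MiB requested by a 94-byte blob):

```python
# Compression ratio implied by a 94-byte LERC blob that requests 64 MiB.
requested_bytes = 64 * 2**20   # 64 MiB = 67,108,864 bytes
blob_bytes = 94
ratio = requested_bytes / blob_bytes
print(f"{ratio:,.0f}:1")       # 713,924:1, consistent with ">700,000:1"
```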

Fix

Pre-validate each codestream's declared output against the same expected_size * 1.05 + 1 cap the other codecs use:

  • LERC: lerc.getLercBlobInfo(blob) returns nCols, nRows, nBands, and dataType from the header without decoding. The wrapper computes nCols * nRows * nBands * dtype_bytes and raises ValueError when it exceeds the cap.
  • JPEG 2000: glymur.Jp2k(file).shape reads only the SIZ marker (sub-millisecond on a 2000x2000 file), so checking it before [:] is cheap.

Pillow's JPEG decoder already has its own Image.MAX_IMAGE_PIXELS guard, so no wrapper-level cap is needed for the JPEG codec.

expected_size is plumbed through:

  • decompress() -> lerc_decompress / jpeg2000_decompress
  • _decode_strip_or_tile's LERC branch (which bypasses decompress() to capture the valid mask)
  • _gpu_decode.py's CPU fallback paths for both codecs

Test plan

  • pytest xrspatial/geotiff/tests/test_decompression_caps.py -- 24/24 pass (6 new tests covering LERC + JPEG 2000 bomb rejection, legitimate round-trip, and backward-compat with expected_size=0)
  • pytest xrspatial/geotiff/tests/test_compression.py xrspatial/geotiff/tests/test_lerc.py xrspatial/geotiff/tests/test_jpeg2000.py xrspatial/geotiff/tests/test_security.py -- 89/89 pass
  • Full geotiff suite: 1267 pass, 3 unrelated matplotlib RecursionError failures in TestPalette plot tests (no compression code touched)

Closes #1625.

The deflate, zstd, lz4, and packbits wrappers all received a pre-decode
output-size cap in #1533 so a crafted TIFF tile cannot expand to many
GB before the post-decode size check fires. LERC and JPEG 2000 were
missed: lerc_decompress_with_mask and jpeg2000_decompress called the
underlying lerc.decode / glymur.Jp2k[:] with no bound, so the existing
size check in _decode_strip_or_tile ran only after the full buffer had
already been materialised by the external library.

LERC compresses constant-value blocks at >700,000:1, so a 94-byte blob
can request 64 MiB of host memory. A 1 KB on-disk LERC tile can ask
for several GB before the reader rejects it. JPEG 2000 has similar
amplification potential through codestream-declared dimensions.

Fix: query each codestream's declared dimensions before decoding.
- LERC: lerc.getLercBlobInfo(blob) returns nCols, nRows, nBands, and
  dataType from the header without decoding. Compute the projected
  output bytes and raise ValueError when it exceeds the same
  expected_size * 1.05 + 1 cap used by every other codec.
- JPEG 2000: glymur.Jp2k(file).shape parses only the SIZ marker, so
  inspecting it before [:] is cheap.

Pillow's JPEG decoder already has Image.MAX_IMAGE_PIXELS and raises
DecompressionBombError, so no wrapper-level cap is needed there.

Plumb expected_size through the decompress() dispatcher,
_decode_strip_or_tile's LERC branch, and the _gpu_decode CPU
fallbacks for both codecs.

Tests cover bomb-rejection and legitimate round-trips at the codec
level for both LERC and JPEG 2000, plus the existing zero-expected-size
backward-compat path.

Found by /sweep-security on the geotiff module.
@github-actions github-actions Bot added the performance label (PR touches performance-sensitive code) on May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 20:51
Contributor

Copilot AI left a comment

Pull request overview

This PR extends the GeoTIFF reader’s decompression-bomb defenses to the LERC and JPEG 2000 codecs by adding a pre-decode output-size cap (matching the expected_size * 1.05 + 1 pattern introduced for other codecs in #1533), and wires the cap through CPU/GPU decode paths. This prevents external libraries (lerc.decode, glymur.Jp2k[:]) from allocating attacker-controlled multi-GB buffers before the reader’s post-decode size validation runs.

Changes:

  • Add expected_size plumbing and pre-decode declared-size validation for lerc_decompress(_with_mask) and jpeg2000_decompress.
  • Forward expected_size through _decode_strip_or_tile (LERC mask path) and GPU CPU-fallback decode paths.
  • Add direct codec-level tests for LERC/JPEG2000 bomb rejection and backward-compat when expected_size=0.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

  • xrspatial/geotiff/_compression.py: Adds pre-decode declared-output checks for LERC and JPEG2000 and threads expected_size through dispatch.
  • xrspatial/geotiff/_reader.py: Passes per-tile expected size into lerc_decompress_with_mask to enforce pre-decode caps.
  • xrspatial/geotiff/_gpu_decode.py: Ensures CPU fallback for JPEG2000/LERC enforces the same expected_size cap during GPU decode.
  • xrspatial/geotiff/tests/test_decompression_caps.py: Adds direct bomb/roundtrip/no-cap tests for LERC and JPEG2000.
  • .claude/sweep-security-state.csv: Updates the sweep-security tracking entry for this issue/fix.

Comment thread xrspatial/geotiff/_compression.py (outdated)
Comment on lines +1145 to +1147

    if shape is not None and dtype is not None:
        declared = int(np.prod(shape)) * dtype.itemsize
        cap = _max_output_with_margin(expected_size)

Comment thread xrspatial/geotiff/_compression.py (outdated)
Comment on lines +1138 to +1145

    if expected_size > 0:
        try:
            shape = jp2.shape
            dtype = np.dtype(getattr(jp2, 'dtype', np.uint8))
        except Exception:
            shape = None
            dtype = None
        if shape is not None and dtype is not None:

            # so it produces the canonical error rather than masking it here.
            return
    if len(info) < 7:
        return

    ``jp2[:]``.
    """
    import glymur
    import os

Copilot review on #1629 flagged four issues. Fixes:

- np.prod overflow (JPEG 2000): replace ``int(np.prod(shape))`` with
  ``math.prod(int(d) for d in shape)``. ``np.prod`` multiplies in
  fixed-width intp and can wrap to a small or negative value on
  attacker-declared dimensions (SIZ marker uses uint32 fields, so a
  malformed tile can declare a shape whose product exceeds int64),
  silently under-counting the declared size and bypassing the cap.
  ``math.prod`` uses Python arbitrary-precision ints.

- Fail-closed on unreadable SIZ marker (JPEG 2000): previously, an
  exception from ``jp2.shape`` or ``jp2.dtype`` disabled the cap and
  fell through to ``jp2[:]`` (the bomb path). ``getattr(jp2, 'dtype',
  np.uint8)`` also defaulted to the smallest dtype, undercounting
  declared bytes. Now both failure modes raise ValueError before any
  pixel decoding runs. Added test_jpeg2000_unreadable_shape_fails_closed
  to lock it in (monkeypatches Jp2k.shape to raise; asserts jp2[:] is
  never invoked).

- LERC errCode not checked: ``lerc.getLercBlobInfo`` returns
  ``(errCode, ...)`` and when errCode != 0 the remaining fields can be
  garbage. ``_check_lerc_bomb`` now returns early in that case so the
  canonical error comes from ``lerc.decode``, not from a spurious
  declared-size computation.

- Unused ``import os`` in test_jpeg2000_bomb_raises (flake8 F401):
  removed.
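The np.prod overflow in the first item can be reproduced directly. A minimal sketch, assuming NumPy uses a 64-bit integer for these values; the shape is a hypothetical attacker-declared (rows, cols, components) triple, not one taken from a real file, and the cap value is arbitrary.

```python
import math

import numpy as np

# Hypothetical SIZ-declared shape: two maximal uint32 dimensions, 3 components.
shape = (2**32 - 1, 2**32 - 1, 3)
itemsize = 1      # uint8 samples
cap = 1_000_000   # any realistic per-tile cap

# np.prod accumulates in a fixed-width int64 and wraps silently.
wrapped = int(np.prod(shape)) * itemsize
# math.prod over Python ints is exact (arbitrary precision).
exact = math.prod(int(d) for d in shape) * itemsize

print(wrapped, wrapped > cap)  # -25769803773 False: the bomb slips past the cap
print(exact, exact > cap)      # 55340232195358851075 True: the bomb is rejected
```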
@brendancol
Contributor Author

Addressed all four Copilot review comments in bb30f0e:

  1. np.prod overflow (JPEG 2000): replaced int(np.prod(shape)) with math.prod(int(d) for d in shape). np.prod multiplies in fixed-width intp and can wrap on attacker-declared dimensions (SIZ uses uint32 fields, so the product can exceed int64), silently under-counting the declared size and letting bombs through. math.prod uses Python arbitrary-precision ints.

  2. Fail-closed on unreadable SIZ marker (JPEG 2000): previously, an exception from jp2.shape/jp2.dtype set both to None, skipped the cap, and fell through to jp2[:] (the bomb path). The dtype fallback getattr(jp2, 'dtype', np.uint8) also defaulted to the smallest dtype, undercounting declared bytes. Both failure modes now raise ValueError before any pixel decoding runs. Added test_jpeg2000_unreadable_shape_fails_closed (monkeypatches Jp2k.shape to raise and asserts jp2[:] is never invoked).

  3. LERC errCode not checked: lerc.getLercBlobInfo returns (errCode, ...); when errCode != 0 the remaining fields may be garbage. _check_lerc_bomb now returns early in that case so lerc.decode produces the canonical error.

  4. Unused import os in test_jpeg2000_bomb_raises: removed (flake8 F401).

pyflakes clean. 25/25 in test_decompression_caps.py, 89/89 in the broader compression+security suite.
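The fail-closed JPEG 2000 behaviour from item 2 can be sketched with a duck-typed stand-in for glymur.Jp2k, so it runs without glymur or a real codestream. check_jp2_declared_size and FakeJp2k are hypothetical names; the sketch assumes only that the real object exposes .shape and .dtype from the header without decoding pixels, as the reviewed code does, and it inlines the same expected_size * 1.05 + 1 cap.

```python
import math

import numpy as np


def check_jp2_declared_size(jp2, expected_size: int) -> None:
    """Fail-closed pre-decode size check (hypothetical helper).

    ``jp2`` is assumed to expose ``.shape`` and ``.dtype`` from the header
    alone, the way the reviewed code uses glymur.Jp2k.
    """
    if expected_size <= 0:
        return  # no cap requested: backward-compatible path
    try:
        shape = jp2.shape
        dtype = np.dtype(jp2.dtype)  # no silent uint8 fallback
    except Exception as exc:
        # Fail closed: an unreadable header must never fall through to jp2[:].
        raise ValueError(f"unreadable JPEG 2000 header: {exc}") from exc
    declared = math.prod(int(d) for d in shape) * dtype.itemsize
    cap = int(expected_size * 1.05) + 1  # same expected_size * 1.05 + 1 cap
    if declared > cap:
        raise ValueError(
            f"JPEG 2000 codestream declares {declared} bytes; cap is {cap}"
        )


class FakeJp2k:
    """Hypothetical stand-in for glymur.Jp2k; decoding pixels is forbidden."""

    def __init__(self, shape, dtype="uint8", broken=False):
        self._shape = shape
        self.dtype = dtype
        self._broken = broken

    @property
    def shape(self):
        if self._broken:
            raise RuntimeError("unreadable SIZ marker")
        return self._shape

    def __getitem__(self, key):
        raise AssertionError("jp2[:] must not run during the size check")
```

Calling check_jp2_declared_size(FakeJp2k((1, 1, 1), broken=True), expected_size=65536) raises ValueError, and FakeJp2k.__getitem__ is never reached, which is the property the new unreadable-shape test locks in.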

@brendancol brendancol merged commit cdf7d43 into main May 11, 2026
10 of 11 checks passed

Development

Successfully merging this pull request may close these issues.

LERC and JPEG2000 codecs lack decompression-bomb cap
