Skip to content

Add Hypothesis property/fuzz tests for geotiff (#1661)#1666

Merged
brendancol merged 1 commit into
mainfrom
issue-1661
May 12, 2026
Merged

Add Hypothesis property/fuzz tests for geotiff (#1661)#1666
brendancol merged 1 commit into
mainfrom
issue-1661

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1661.

Summary

  • Adds xrspatial/geotiff/tests/test_fuzz_hypothesis_1661.py with three property groups: round-trip equality, IFD layout permutations, and single-byte mutation. hypothesis is gated via pytest.importorskip so it stays optional.
  • Fixes three bugs the byte-mutation property caught on its first run. Each turned a one-byte header corruption into a bare exception that violates the documented ValueError / TypeError parser contract.
  • Adds targeted regression tests next to the property tests that surfaced them.

Bugs surfaced by Hypothesis

The fuzzer ran an exhaustive sweep over the 198-byte le_strip_f32 corpus member (4x4 float32 stripped, no geo). Three offsets, all flipped to 0x00, produced non-typed exceptions:

Offset Field zeroed Old exception Fix
102 RowsPerStrip value ZeroDivisionError in _read_strips: r0 // rps Guard rps <= 0 -> ValueError
110 StripByteCounts count IndexError: tuple index out of range in _read_strips Length check on strip table after the existing dimension-safety check
122 SampleFormat count IndexError: tuple index out of range in IFD.sample_format Fall back to default sample_format = 1 when the tuple is empty

After the fixes, an exhaustive sweep across all four corpus members (le_strip_f32, be_strip_u16, le_tiled_f32, le_geo_f32) produces zero non-typed exceptions across every offset/byte combination.

Why this matters

These are the same bug shape as the recent commit log cited in the issue: a header field is read without a guard, a downstream operation assumes structure, and a corrupt or unusual input crashes with the wrong exception type. Hand-written tests cover the shapes the author thinks of; Hypothesis exercises the rest of the byte space cheaply.

Followups (not in this PR)

  • Add hypothesis to setup.cfg [tests] once the harness has run in CI for a release cycle without flaking.
  • Seed a real-file corpus (GDAL-generated COG, Sentinel-2-style fixture) for the property tests once we see what synthetic strategies miss.

Test plan

  • pytest xrspatial/geotiff/tests/test_fuzz_hypothesis_1661.py -v - 10 passed in 2.77s
  • pytest xrspatial/geotiff/tests/ - 1476 passed, 7 skipped, 0 new failures (3 deselected pre-existing matplotlib RecursionErrors unrelated to this PR)
  • Exhaustive byte-mutation sweep across the 4-file corpus - 0 non-typed exceptions after fixes

…ps (#1661)

Adds xrspatial/geotiff/tests/test_fuzz_hypothesis_1661.py with three
property groups: round-trip equality through to_geotiff/open_geotiff,
IFD layout permutations via make_minimal_tiff, and single-byte mutation
of a small corpus.  Hypothesis is gated with pytest.importorskip so
the suite still collects without the dependency installed.

The byte-mutation property surfaced three real bugs on first run.  All
three turned into bare IndexError / ZeroDivisionError on a corrupt
header byte instead of the documented ValueError / TypeError contract.
Fixed:

- _read_strips ZeroDivisionError when RowsPerStrip == 0
- _read_strips IndexError when StripByteCounts is shorter than the
  expected strip count (placed after the existing dimension safety
  check so a crafted huge-header still raises the safety-limit error
  first)
- IFD.sample_format IndexError when SampleFormat has count=0; falls
  back to the default sample_format = 1 (unsigned int)

Three targeted regression tests lock in the fixes alongside the
property tests that surfaced them.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 12, 2026
@brendancol brendancol merged commit 58a8b3d into main May 12, 2026
12 checks passed
brendancol added a commit that referenced this pull request May 12, 2026
Merge origin/main into the SSRF-hardening branch and address the
Copilot review comments.

Redirect handling (review #1, #2):
- urllib3 path: stop delegating redirect-follow to urllib3.
  _HTTPSource now GETs with redirect=False and walks the 3xx chain
  in Python, re-validating each Location against _validate_http_url.
  Caps the chain at _HTTP_MAX_REDIRECTS (5). A public URL can no
  longer 3xx into a loopback or private IP.
- stdlib path: install _ValidatingRedirectHandler on a module-level
  opener so the stdlib fallback gets the same per-hop re-validation
  and redirect cap. read_range / read_all both go through it now.

Scheme allow-list (review #3):
- Drop XRSPATIAL_GEOTIFF_ALLOWED_SCHEMES. _HTTPSource is a urllib3 /
  urllib Range-GET implementation; widening the validator without
  widening the source just moved the failure to connect time. fsspec
  handles every other scheme:// via _open_source's _CloudSource branch.

Env-var clamp (review #4):
- _max_tile_bytes_from_env now falls back to the default for zero or
  negative values, matching the timeout helpers. The previous
  max(1, val) clamp silently rejected every tile on a typoed
  XRSPATIAL_COG_MAX_TILE_BYTES=-1.

Docs and comments (review #5, #6):
- "Security and I/O limits" docs: explain that non-http(s) schemes
  dispatch via fsspec and that the redirect cap re-validates each
  hop. Drop the misleading XRSPATIAL_GEOTIFF_ALLOWED_SCHEMES example.
- test_read_to_array_rejects_file_url: comment corrected to match
  _open_source routing file:// through fsspec, and the expected-
  exception list now includes ImportError.

Merge conflict resolution against origin/main:
- __init__.py and test_features.py: keep both GeoTIFFFallbackWarning
  (#1662) and UnsafeURLError in __all__ / TestPublicAPI.expected.
- _reader.py _read_strips: byte-cap (this PR) and RowsPerStrip<=0
  guard (#1666) both kept; cap runs first.
- docs/geotiff.rst: "Security and I/O limits" and "Strict mode" live
  as sibling sections.

xrspatial/geotiff/tests/: 1611 passed (3 pre-existing matplotlib
palette failures unchanged).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Hypothesis fuzzing + property tests for geotiff

1 participant