Skip to content

geotiff: canonical round-trip invariants module (#1986)#2007

Merged
brendancol merged 3 commits into
mainfrom
issue-1986
May 18, 2026
Merged

geotiff: canonical round-trip invariants module (#1986)#2007
brendancol merged 3 commits into
mainfrom
issue-1986

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

First slice of issue #1986. Adds xrspatial/geotiff/tests/test_round_trip_invariants.py -- one canonical module that enumerates the supported round-trip cases and pins the invariant per case.

The existing per-incident round-trip files (test_metadata_round_trip_1484.py, test_int_coords_round_trip_hotfix_1962.py, test_no_georef_writer_round_trip_1949.py, test_miniswhite_writer_roundtrip_1836.py) stay as regression markers for their bug numbers. This module is the canonical contract going forward.

Cases covered

Case Invariant Notes
int single-band, no nodata byte-equivalent int16 / int32 / uint16
float single-band, no nodata byte-equivalent float32 / float64; NaN-as-data preserved without synthetic GDAL_NODATA
int with declared nodata documented drift reader promotes int -> float64 + NaN; valid pixels byte-equal, sentinel survives in attrs
multiband chunky byte-equivalent 2 / 3 / 4 bands
no-georef semantic transform / crs attrs absent on read and through round-trip
PixelIsPoint byte-equivalent raster_type attr round-trips; default PixelIsArea leaves it absent

Each case also asserts fixed-point convergence: a second write -> read cycle reproduces the first read exactly (pixels, dtype, dims, attrs key set).

Deferred to follow-up PRs

Called out in the module docstring:

Test plan

  • pytest xrspatial/geotiff/tests/test_round_trip_invariants.py -v -- 14 passed
  • CI green on the existing geotiff suite (the per-incident round-trip files are untouched)

Closes part of #1986.

Adds xrspatial/geotiff/tests/test_round_trip_invariants.py with the
first slice of the round-trip contract from issue #1986. Cases covered:

* int single-band, no nodata: byte-equivalent for pixels and dtype.
* float single-band, no nodata: byte-equivalent for pixels, including
  NaN-as-data positions. Asserts no synthetic GDAL_NODATA tag.
* int single-band with declared nodata: documented drift -- reader
  promotes to float64 + NaN; valid pixels are byte-equal, sentinel
  survives in attrs['nodata'].
* multiband chunky (2/3/4 bands, uint8): byte-equivalent for pixels,
  dtype, dims.
* no-georef: semantic-equivalent. transform attr stays absent on read
  and through round-trip.
* PixelIsPoint: byte-equivalent for raster_type attr and pixels.
  Defaults (PixelIsArea) leave raster_type absent.

Each case asserts fixed-point convergence: a second write -> read cycle
reproduces the first read exactly.

Deferred to follow-up PRs (called out in the module docstring):

* float with non-NaN declared nodata -- needs the masked / declared
  nodata split from issue #1988 to state the invariant cleanly.
* planar multiband -- writer emits chunky only.
* overviews / COG.
* VRT.

Per-incident round-trip files (test_metadata_round_trip_1484.py,
test_int_coords_round_trip_hotfix_1962.py, etc.) stay as regression
markers for their bug numbers.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 17, 2026
…1986)

Review follow-up for #2007. Tightens the round-trip contract:

* _assert_fixed_point now compares values of canonical attrs (crs,
  transform, nodata, raster_type) when both sides of a write -> read
  cycle have the key. transform compares up to float precision; the
  others must match exactly.
* TestIntSingleBandNoNodata, TestMultibandChunky: assert 'nodata' is
  absent on read and after round-trip. The reader must not invent a
  sentinel for int / uint8 rasters that never declared one.
* TestNoGeorefSemantic: assert dtype on the source read so the array
  comparison cannot accidentally pass via cross-dtype broadcasting.
* TestIntWithDeclaredNodata: drop redundant .ravel() on a boolean mask
  result (already 1D).
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: geotiff: canonical round-trip invariants module (#1986)

Test-only PR. Steps for algorithm correctness, backend dispatch, CUDA register pressure, and README feature matrix do not apply -- no production code changed, no new public API. Review focuses on test quality, the asserted invariants, and coverage gaps the module itself flags.

Blockers

None.

Suggestions

  • _assert_fixed_point compares the attrs key set but not the values of canonical attrs. A writer regression that flips crs from 32610 to a WKT string, or perturbs transform, would pass the current check. Lock value parity for crs, transform, nodata, raster_type when present on both sides. Fixed in df2273a.
  • TestIntSingleBandNoNodata and TestMultibandChunky do not assert 'nodata' not in attrs. The float-no-nodata test does. Add the symmetric assertion so the reader cannot quietly start emitting a synthetic sentinel. Fixed in df2273a.

Nits

  • arr[arr != -9999].ravel() in TestIntWithDeclaredNodata::test_int32_sentinel_promotes_and_masks -- a boolean-masked array is already 1D, .ravel() is a no-op. Fixed in df2273a.
  • TestNoGeorefSemantic::test_no_transform_survives_round_trip does not assert dtype on the read DataArray. Added a da1.dtype == np.float32 check so the array equality cannot accidentally pass via broadcasting. Fixed in df2273a.

What looks good

  • Module docstring states the two invariants explicitly and the deferred cases so a reader knows what is and is not promised.
  • Fixed-point convergence (_assert_fixed_point) is checked per case, which is the missing piece in the per-incident files.
  • The int-with-declared-nodata case is documented as drift rather than asserted-clean, which keeps the test honest while geotiff: split declared nodata from masked-data state in DataArray attrs #1988's masked / declared split is pending.
  • Filenames carry the _1986 suffix and live under tmp_path, so parallel test runs and other worktrees will not collide.

Checklist

  • Algorithm matches reference/paper -- N/A, test-only.
  • All implemented backends produce consistent results -- N/A, no backend dispatch changed; the reader/writer paths under test are numpy-only.
  • NaN handling is correct -- np.isnan mask, no == np.nan.
  • Edge cases are covered by tests -- multiple dtypes, NaN-as-data, no-georef, PixelIsPoint, default PixelIsArea absent.
  • Dask chunk boundaries handled correctly -- N/A, no dask in this slice.
  • No premature materialization or unnecessary copies -- N/A.
  • Benchmark exists or is not needed -- not needed; contract tests.
  • README feature matrix updated (if applicable) -- N/A.
  • Docstrings present and accurate -- module and class docstrings explain the invariant per case.

…1986)

Review follow-up for #2007. The PixelIsArea default test asserted
``da.attrs.get('raster_type') is None``, which passes both when the
key is absent and when it is present with value None. The canonical
signal for "default raster type" is absence -- mirror the
``'nodata' not in attrs`` style used in the int and multiband cases
so the contract is uniform across the module.
@brendancol brendancol merged commit c448ae6 into main May 18, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant