Skip to content

geotiff: bundle remaining ambiguous-metadata fail-closed checks (#1987)#2030

Open
brendancol wants to merge 1 commit into
1987-pr1-conflicting-crs-writefrom
1987-pr2to7-remaining-fail-closed
Open

geotiff: bundle remaining ambiguous-metadata fail-closed checks (#1987)#2030
brendancol wants to merge 1 commit into
1987-pr1-conflicting-crs-writefrom
1987-pr2to7-remaining-fail-closed

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Bundles four of the five remaining #1987 slices: UnparseableCRSError, RotatedTransformError, NonUniformCoordsError, and ConflictingNodataError. ConflictingCRSError already merged via #2021. MixedBandMetadataError is deferred to a follow-up so the ~35 existing VRT test fixtures can migrate to the new band_nodata='first' opt-out in their own commit.

Builds on

This branch was authored on top of 1987-pr1-conflicting-crs-write so the two stay clean together. Easiest if #2021 lands first.

Active checks added

UnparseableCRSError (#1987 PR 2)

  • Write: re-typed the existing _validate_crs_fallback raise from plain ValueError to UnparseableCRSError (subclass; no behaviour change).
  • Read: new _check_read_unparseable_crs that runs pyproj.CRS.from_user_input on crs_wkt. Tolerates EPSG:4326-style placeholders (the GDAL <SRS> convention).
  • Opt-out: allow_unparseable_crs=True kwarg, threaded through open_geotiff / read_geotiff_dask / read_geotiff_gpu / read_vrt.

RotatedTransformError (#1987 PR 3)

  • Read: rejects affine transforms with non-zero b / d terms. Downstream xrspatial ops assume axis-aligned grids and silently produce wrong results on a rotated grid.
  • Opt-out: allow_rotated=True, same threading.

NonUniformCoordsError (#1987 PR 4)

ConflictingNodataError (#1987 PR 7)

  • Write: rejects when attrs['nodata'] disagrees with every concrete entry in attrs['nodatavals']. None and NaN entries in the rioxarray tuple are skipped. NaN canonical + concrete tuple entry also raises (contradictory sentinels).
  • Opt-out: explicit nodata= writer kwarg overrides both attrs.

Shared infrastructure

  • _attrs.py gains _validate_read_geo_info(...), a one-call wrapper used from all four read backends so each backend has a single check call site.
  • Each writer entry point now builds a context dict carrying crs_kwarg / attrs_crs / attrs_crs_wkt / nodata_kwarg / attrs_nodata / attrs_nodatavals / coord_y / coord_x and feeds it to validate_write_metadata.

Deferred: MixedBandMetadataError (#1987 PR 5)

The check function and constants are in _validation.py but the registration is commented out. The band_nodata= kwarg is already threaded through read_vrt / _read_vrt_chunked so the follow-up is a one-line registration plus the test sweep over ~35 fixtures.

Test updates

File Change
test_attrs_contract_aliases_1984.py::test_canonical_nodata_wins_over_aliases Split into a resolver-layer test (canonical still wins at the read chokepoint) and a write-layer test (now asserts the fail-closed raise + the nodata= kwarg opt-out).
test_nodata_attr_aliases_1582.py::test_explicit_nodata_attr_wins_over_aliases Updated to expect ConflictingNodataError plus the kwarg-bypass path.
test_reader_kwarg_order_1935.py New kwargs added to the canonical order; band_nodata allowed in read_vrt's tail; gpu alias moved back to last on read_geotiff_gpu.
test_ambiguous_metadata_hooks_1987.py Framework dispatch tests now use an opaque payload key (_dispatch_probe) instead of the crs_wkt / MALFORMED placeholders the new check would refuse.
test_remaining_fail_closed_1987.py New file. 19 tests across the four active checks plus the read-then-write round-trip.

Test plan

  • pytest xrspatial/geotiff/tests/ -k "not gpu and not cuda" — 3013 passed (+19 new tests vs pre-PR baseline).
  • pytest xrspatial/geotiff/tests/test_remaining_fail_closed_1987.py -v — 19 passed.

Refs #1987.

Bundles four of the five remaining #1987 slices (PRs 2, 3, 4, 7 in the
issue's numbering). PR 6 (``ConflictingCRSError``) landed earlier as
this branch's parent. PR 5 (``MixedBandMetadataError``) is intentionally
deferred to a follow-up that also migrates ~35 existing VRT test
fixtures via the new ``band_nodata='first'`` opt-out kwarg.

Active checks added in this PR
------------------------------

* ``UnparseableCRSError`` (#1987 PR 2):
  - Write: typed the existing ``_validate_crs_fallback`` raise from
    plain ``ValueError`` to ``UnparseableCRSError``. No behaviour
    change (subclass relationship).
  - Read: new ``_check_read_unparseable_crs`` that runs ``pyproj.CRS.
    from_user_input`` on ``crs_wkt`` and raises if pyproj cannot parse.
    Tolerates pyproj-parseable placeholders like ``"EPSG:4326"`` that
    the GDAL VRT ``<SRS>`` convention stashes into ``crs_wkt``.
  - Opt-out: ``allow_unparseable_crs=True`` kwarg, threaded through
    ``open_geotiff`` / ``read_geotiff_dask`` / ``read_geotiff_gpu`` /
    ``read_vrt``.

* ``RotatedTransformError`` (#1987 PR 3):
  - Read: new ``_check_read_rotated_transform`` that rejects affine
    transforms with non-zero b / d terms (rasterio Affine ordering).
    Downstream xrspatial ops (slope, aspect, hillshade, proximity,
    zonal) assume axis-aligned grids; a rotated transform silently
    produced wrong results.
  - Opt-out: ``allow_rotated=True`` kwarg, same threading as
    ``allow_unparseable_crs``.

* ``NonUniformCoordsError`` (#1987 PR 4):
  - Write: new ``_check_write_non_uniform_coords`` that diffs the
    ``y`` and ``x`` coord arrays against the first step and rejects
    when relative drift exceeds 1e-5 (mirrors the existing #1720
    coord-regularity tolerance). The int-dtype sentinel from #1969 is
    exempted (the no-georef fallback uses 0..N-1 ints which the writer
    treats specially).
  - No new kwarg: the fix is to resample, not to opt out.

* ``ConflictingNodataError`` (#1987 PR 7):
  - Write: new ``_check_write_conflicting_nodata`` that refuses when
    ``attrs['nodata']`` disagrees with every concrete entry in
    ``attrs['nodatavals']``. ``None`` and NaN entries in the rioxarray
    tuple are skipped (same convention as ``_resolve_nodata_attr``).
    NaN as the canonical value paired with a concrete numeric in the
    tuple also raises -- "NaN is the sentinel" and "X is the sentinel"
    contradict.
  - Opt-out: explicit ``nodata=`` writer kwarg overrides both attrs
    and bypasses the check.

Shared infrastructure
---------------------

* ``_attrs.py`` gains ``_validate_read_geo_info(geo_info, *, window,
  allow_rotated, allow_unparseable_crs)`` -- one helper called from
  the four read backends so the check site is uniform.
* Each writer entry point (``to_geotiff``, ``_write_vrt_tiled``,
  ``write_geotiff_gpu``) now builds a context dict carrying
  ``crs_kwarg`` / ``attrs_crs`` / ``attrs_crs_wkt`` /
  ``nodata_kwarg`` / ``attrs_nodata`` / ``attrs_nodatavals`` /
  ``coord_y`` / ``coord_x`` and feeds it to ``validate_write_metadata``.

Deferred: MixedBandMetadataError (#1987 PR 5)
---------------------------------------------

The mixed-band check function and its constants are defined in
``_validation.py`` but the registration call is commented out. The
``band_nodata=`` kwarg is threaded through both ``read_vrt`` and
``_read_vrt_chunked`` so the follow-up PR is a one-line registration
plus the test-fixture migration. About 35 existing VRT tests would
need to opt in to ``band_nodata='first'`` to keep their legacy
assertions, which is its own commit.

Test updates
------------

* ``test_attrs_contract_aliases_1984.py::test_canonical_nodata_wins_over_aliases``
  split into a resolver-layer test (still asserts canonical wins) and a
  write-layer test (now asserts ``ConflictingNodataError``).
* ``test_nodata_attr_aliases_1582.py::test_explicit_nodata_attr_wins_over_aliases``
  updated to expect ``ConflictingNodataError`` on the disagreement and to
  show the ``nodata=`` kwarg opt-out.
* ``test_reader_kwarg_order_1935.py`` updated: the new
  ``allow_rotated`` / ``allow_unparseable_crs`` kwargs join the
  canonical order; ``band_nodata`` goes in ``read_vrt``'s
  ``allowed_tail`` since it is VRT-specific; ``read_geotiff_gpu``'s
  deprecated ``gpu`` alias moved back to the tail.
* ``test_ambiguous_metadata_hooks_1987.py`` framework tests now use an
  opaque ``_dispatch_probe`` payload key instead of ``"crs_wkt":
  "EPSG:4326"`` / ``"MALFORMED"`` placeholders that the newly registered
  ``_check_read_unparseable_crs`` would refuse.
* ``test_remaining_fail_closed_1987.py`` -- 19 new tests covering each
  active check plus the round-trip read-write contract.

Verification
------------

- ``pytest xrspatial/geotiff/tests/ -k 'not gpu and not cuda'`` --
  3013 passed (was 2994 pre-PR, +19 new).
- ``pytest xrspatial/geotiff/tests/test_remaining_fail_closed_1987.py
  -v`` -- 19 passed.

Refs #1987.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant