Skip to content

geotiff: ambiguous-metadata validator framework (#1987 PR 0)#2006

Merged
brendancol merged 4 commits into
mainfrom
1987-pr0-validation-layer
May 18, 2026
Merged

geotiff: ambiguous-metadata validator framework (#1987 PR 0)#2006
brendancol merged 4 commits into
mainfrom
1987-pr0-validation-layer

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Summary

Lands the foundation for issue #1987 PRs 1-7. Subsequent per-case PRs plug a check into this framework rather than scattering raise sites across the read/write entry points.

  • New xrspatial/geotiff/_errors.py: typed error hierarchy. GeoTIFFAmbiguousMetadataError subclasses ValueError; seven case-specific subclasses (InvalidCRSCodeError, UnparseableCRSError, RotatedTransformError, NonUniformCoordsError, MixedBandMetadataError, ConflictingCRSError, ConflictingNodataError).
  • _validation.py: adds validate_read_metadata and validate_write_metadata hooks plus a register/unregister API. Hooks are no-ops until a per-case PR registers a check, so this PR changes no behaviour at any entry point.
  • New test file test_ambiguous_metadata_hooks_1987.py covers the hierarchy, the no-op-when-empty guarantee, idempotent registration, dispatch order, error propagation, short-circuit on raise, and tolerant unregister.

This matches PR 0's scope per the issue's implementation plan: "no raises yet -- the validator is opt-in per case." Call sites land in the per-case PRs where the relevant context is naturally available.

Test plan

  • pytest xrspatial/geotiff/tests/test_ambiguous_metadata_hooks_1987.py (20 passed)
  • pytest xrspatial/geotiff/tests/test_crs_fail_closed_1929.py xrspatial/geotiff/tests/test_coord_regularity_1720.py xrspatial/geotiff/tests/test_rotated_transform_attr_1764.py (54 passed)
  • CI

Refs #1987.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 17, 2026
Adds the foundation for issue #1987 PRs 1-7. Each per-case PR
plugs its check into this framework rather than scattering custom
raise sites across the read/write entry points.

- New `xrspatial/geotiff/_errors.py` defines the typed error
  hierarchy: `GeoTIFFAmbiguousMetadataError` (subclass of
  `ValueError`) plus seven subclasses for invalid CRS codes,
  unparseable CRS strings, rotated transforms, non-uniform coords,
  mixed band metadata, conflicting `crs`/`crs_wkt`, and conflicting
  nodata aliases. `except ValueError` and `except GeoTIFFAmbiguous-
  MetadataError` and `except <Specific>` all work as expected.

- `_validation.py` gains `validate_read_metadata` and
  `validate_write_metadata` hooks plus a register/unregister API.
  Hooks are no-ops until a per-case PR registers a check, so PR 0
  changes no behaviour at any existing entry point. Subsequent
  PRs add the call sites where the relevant context is available.

- New test file exercises the hierarchy (catchable by base and by
  ValueError, siblings isolated), the no-op-when-empty guarantee,
  idempotent registration, dispatch ordering, error propagation,
  short-circuit on raise, and tolerant unregister.

The PR plan in issue #1987 explicitly carves PR 0 as "no raises
yet -- the validator is opt-in per case"; this commit matches
that scope. Existing fail-closed tests still pass.
Review follow-ups on PR 2006:

- Add an autouse fixture that snapshots the process-global check
  registries before each test and restores them in teardown. A
  test that registers a check and crashes before its `try/finally
  unregister` would otherwise leak a stale callable into later
  tests. The fixture makes the file robust to per-test mistakes
  and to any future tests added without the manual cleanup
  boilerplate.

- Add symmetric write-side coverage: registration-order dispatch
  and short-circuit-on-raise mirror the read-side tests. The two
  hooks share no code so the symmetry is a behavioural guarantee
  worth pinning.

- Add a cross-registry independence test: a read-side check must
  not fire from the write hook (and vice versa). The hooks read
  separate module-level lists, but a future refactor merging them
  would silently break this contract.

No behaviour change in the framework itself; only test hardening.
Second-pass review follow-up on PR 2006.

`validate_read_metadata` and `validate_write_metadata` used to iterate
the live registry list. A check that registers or unregisters another
callable during dispatch (deliberately, via an import side effect, or
by self-unregistering after a one-shot trigger) would reshape the list
mid-loop and either skip a sibling or visit one twice. Both hooks now
iterate over a `tuple(...)` snapshot; the cost is one tuple per
dispatch, paid only when at least one check is registered.

Adds two regression tests (read and write) that self-unregister from a
check and assert the next registered check still runs. They would
fail on the previous live-list iteration.

No behaviour change at any entry point; PR 0 still wires nothing.
Second-pass review follow-up on PR 2006.

The user-facing ambiguity error hierarchy lived in
`xrspatial.geotiff._errors`. The leading underscore marks the module
as private, so callers writing `except GeoTIFFAmbiguousMetadataError`
were coupled to an implementation-detail import path. The neighbouring
user-facing errors and warnings (`UnsafeURLError`,
`GeoTIFFFallbackWarning`) are re-exported from `xrspatial.geotiff`;
the #1987 family now follows the same convention.

Adds `GeoTIFFAmbiguousMetadataError` plus the seven case subclasses
to `__all__`, updates the frozen-public-API guard test to pin the
new names, and adds a regression test that imports each one from
the public namespace.

No raise-site change; per-case PRs still wire the checks. This is a
public-API surface addition only.
@brendancol brendancol force-pushed the 1987-pr0-validation-layer branch from bb3656e to 73bbdbd Compare May 18, 2026 01:50

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm deciding to merge this but may drop before 1.0.0 release

@brendancol brendancol merged commit cd454da into main May 18, 2026
5 checks passed
brendancol added a commit that referenced this pull request May 18, 2026
… 1) (#2021)

First fail-closed slice for issue #1987. The framework PR (#2006)
landed the error hierarchy and the validator-hook registry; this PR
wires the first check.

Behaviour today
---------------
``to_geotiff`` / ``_write_vrt_tiled`` / ``write_geotiff_gpu`` all
consult ``data.attrs.get('crs')`` first and fall back to
``data.attrs.get('crs_wkt')`` only when the EPSG attr is missing.
When the two attrs encode different CRSes the writer silently emits
the EPSG and drops the WKT, producing a TIFF whose on-disk CRS does
not match what the DataArray advertised.

Change
------
* ``_validation.py`` gains ``_check_write_conflicting_crs``: when
  both attrs are populated and neither is suppressed by an explicit
  ``crs=`` kwarg, the check canonicalises each via pyproj and raises
  ``ConflictingCRSError`` if they disagree. ``ConflictingCRSError``
  is exported from ``xrspatial.geotiff`` already (the framework PR
  added it to ``_errors.py``).
* The check is registered at module load via
  ``register_write_metadata_check``. Tests that need to scope around
  it can call ``unregister_write_metadata_check`` in a fixture.
* The three write entry points (``to_geotiff``,
  ``_write_vrt_tiled``, ``write_geotiff_gpu``) now call
  ``validate_write_metadata`` with a context dict carrying
  ``crs_kwarg``, ``attrs_crs``, and ``attrs_crs_wkt``.

Soft preconditions kept lenient
-------------------------------
* No pyproj installed -> no-op (the writer's own WKT-fallback path
  decides what happens; pyproj is otherwise listed in setup.cfg).
* Either attr unparseable -> no-op (sibling
  ``UnparseableCRSError`` PR will catch those).
* Caller passes ``crs=`` explicitly -> short-circuit, since the
  kwarg overrides both attrs for this write.

Round-trip safety
-----------------
The reader emits ``attrs['crs']`` and ``attrs['crs_wkt']`` whenever
the file has a CRS, so a typical ``open_geotiff -> to_geotiff`` pair
hits the check with both attrs set. They derive from the same on-disk
CRS, so pyproj equality passes and the write proceeds. The
end-to-end ``test_read_back_roundtrip_does_not_raise`` pins this.

Tests
-----
14 new tests in ``test_conflicting_crs_write_1987.py`` cover:
* registration in the write-side hook registry
* exception-class hierarchy (ValueError + GeoTIFFAmbiguousMetadataError)
* disagreement raises
* agreement passes (EPSG int + WKT, EPSG string + WKT)
* only-one-attr-set cases pass
* explicit ``crs=`` kwarg bypasses the check
* ``crs=0`` corner does not mask the existing kwarg validation
* unparseable WKT defers to the (future) ``UnparseableCRSError``
* end-to-end read-back round-trip

Full geotiff test suite passes (2993 tests, no regressions).
@brendancol brendancol deleted the 1987-pr0-validation-layer branch May 27, 2026 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant