Skip to content

Commit 2d9f54c

Browse files
committed
geotiff: reject writes whose crs and crs_wkt attrs disagree (#1987 PR 1)
First fail-closed slice for issue #1987. The framework PR (#2006) landed the error hierarchy and the validator-hook registry; this PR wires the first check. Behaviour today --------------- ``to_geotiff`` / ``_write_vrt_tiled`` / ``write_geotiff_gpu`` all consult ``data.attrs.get('crs')`` first and fall back to ``data.attrs.get('crs_wkt')`` only when the EPSG attr is missing. When the two attrs encode different CRSes the writer silently emits the EPSG and drops the WKT, producing a TIFF whose on-disk CRS does not match what the DataArray advertised. Change ------ * ``_validation.py`` gains ``_check_write_conflicting_crs``: when both attrs are populated and neither is suppressed by an explicit ``crs=`` kwarg, the check canonicalises each via pyproj and raises ``ConflictingCRSError`` if they disagree. ``ConflictingCRSError`` is exported from ``xrspatial.geotiff`` already (the framework PR added it to ``_errors.py``). * The check is registered at module load via ``register_write_metadata_check``. Tests that need to scope around it can call ``unregister_write_metadata_check`` in a fixture. * The three write entry points (``to_geotiff``, ``_write_vrt_tiled``, ``write_geotiff_gpu``) now call ``validate_write_metadata`` with a context dict carrying ``crs_kwarg``, ``attrs_crs``, and ``attrs_crs_wkt``. Soft preconditions kept lenient ------------------------------- * No pyproj installed -> no-op (the writer's own WKT-fallback path decides what happens; pyproj is otherwise listed in setup.cfg). * Either attr unparseable -> no-op (sibling ``UnparseableCRSError`` PR will catch those). * Caller passes ``crs=`` explicitly -> short-circuit, since the kwarg overrides both attrs for this write. Round-trip safety ----------------- The reader emits ``attrs['crs']`` and ``attrs['crs_wkt']`` whenever the file has a CRS, so a typical ``open_geotiff -> to_geotiff`` pair hits the check with both attrs set. They derive from the same on-disk CRS, so pyproj equality passes and the write proceeds. The end-to-end ``test_read_back_roundtrip_does_not_raise`` pins this. Tests ----- 14 new tests in ``test_conflicting_crs_write_1987.py`` cover: * registration in the write-side hook registry * exception-class hierarchy (ValueError + GeoTIFFAmbiguousMetadataError) * disagreement raises * agreement passes (EPSG int + WKT, EPSG string + WKT) * only-one-attr-set cases pass * explicit ``crs=`` kwarg bypasses the check * ``crs=0`` corner does not mask the existing kwarg validation * unparseable WKT defers to the (future) ``UnparseableCRSError`` * end-to-end read-back round-trip Full geotiff test suite passes (2993 tests, no regressions).
1 parent d1bd260 commit 2d9f54c

5 files changed

Lines changed: 381 additions & 0 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@
1212
- Decode TIFF predictor=3 un-transpose by file byte order so big-endian floating-point TIFFs read back exactly
1313
- Default internal `_vrt.read_vrt` `missing_sources` to `'raise'` so an unreadable VRT source no longer produces a silent zero-fill hole on integer rasters; pass `missing_sources='warn'` to opt back into the previous lenient behaviour (#1843)
1414
- Deprecate read-side emission of matplotlib colormap-derived attrs (cmap, colormap_rgba) on palette TIFFs; the writer cannot set Photometric=3 so they do not round-trip. Construct ListedColormap from attrs['colormap'] in caller code. These attrs still emit for now but trigger a DeprecationWarning. Removal planned for a future release. (#1984)
15+
- Reject writes whose `attrs['crs']` and `attrs['crs_wkt']` resolve to different CRSes (after pyproj canonicalisation) instead of silently emitting the EPSG and dropping the WKT. The new `ConflictingCRSError` (subclass of `ValueError` and `GeoTIFFAmbiguousMetadataError`) names the offending attrs; pass `crs=` explicitly to override both attrs and bypass the check. Read-back DataArrays carrying both attrs continue to round-trip because the reader's two attrs derive from the same on-disk CRS. (#1987)
1516

1617

1718
### Version 0.9.9 - 2026-05-05

xrspatial/geotiff/_validation.py

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@
2626
import numpy as np
2727

2828
from ._coords import _BAND_DIM_NAMES
29+
from ._errors import ConflictingCRSError
2930
from ._runtime import _TIME_DIM_NAMES, _X_DIM_NAMES, _Y_DIM_NAMES
3031

3132

@@ -433,3 +434,83 @@ def validate_write_metadata(context: Mapping[str, Any] | None = None) -> None:
433434
# Snapshot for the same reason as the read hook above.
434435
for check in tuple(_WRITE_METADATA_CHECKS):
435436
check(ctx)
437+
438+
439+
# ---------------------------------------------------------------------------
440+
# Conflicting crs / crs_wkt write check (issue #1987 PR 1)
441+
# ---------------------------------------------------------------------------
442+
443+
444+
def _check_write_conflicting_crs(context: Mapping[str, Any]) -> None:
445+
"""Refuse writes whose ``attrs['crs']`` and ``attrs['crs_wkt']`` disagree.
446+
447+
The legacy writer consults both attrs (``crs`` takes priority; the
448+
WKT is a fallback). When the two encode different CRSes, the writer
449+
silently emits the EPSG and drops the WKT, producing a TIFF whose
450+
on-disk CRS does not match what the caller's DataArray advertised.
451+
452+
This check runs whenever both attrs are populated. It canonicalises
453+
each via :mod:`pyproj` and raises :class:`ConflictingCRSError` if
454+
they do not match. A read-back DataArray that legitimately carries
455+
both attrs (the reader emits both when the file has a CRS) passes
456+
because the two derive from the same on-disk CRS.
457+
458+
Soft preconditions kept lenient:
459+
460+
* ``pyproj`` not installed -> no-op (downstream paths handle the
461+
missing dependency).
462+
* Either attr unparseable -> no-op (a sibling check in this issue's
463+
series, :class:`UnparseableCRSError`, will refuse those on its
464+
own PR).
465+
466+
Context keys consumed:
467+
468+
* ``crs_kwarg`` -- the value of the explicit ``crs=`` writer kwarg.
469+
When this is not ``None`` the kwarg overrides both attrs, so the
470+
attrs disagreement does not affect this write and the check
471+
short-circuits.
472+
* ``attrs_crs`` -- the value of ``data.attrs.get('crs')``.
473+
* ``attrs_crs_wkt`` -- the value of ``data.attrs.get('crs_wkt')``.
474+
475+
All keys are optional; absence is treated as "nothing to compare".
476+
"""
477+
if context.get('crs_kwarg') is not None:
478+
return
479+
crs_attr = context.get('attrs_crs')
480+
crs_wkt_attr = context.get('attrs_crs_wkt')
481+
if crs_attr is None or not crs_wkt_attr:
482+
return
483+
try:
484+
from pyproj import CRS as _PyProjCRS
485+
except ImportError:
486+
return
487+
try:
488+
crs_from_attr = _PyProjCRS.from_user_input(crs_attr)
489+
crs_from_wkt = _PyProjCRS.from_wkt(crs_wkt_attr)
490+
except Exception:
491+
return
492+
if crs_from_attr.equals(crs_from_wkt):
493+
return
494+
# Truncate the WKT in the message; full WKTs are 1-3 kB and would
495+
# make the error unreadable. The user's attrs still carry the full
496+
# string so they can introspect it directly.
497+
wkt_excerpt = crs_wkt_attr if len(crs_wkt_attr) <= 60 else (
498+
crs_wkt_attr[:57] + '...'
499+
)
500+
raise ConflictingCRSError(
501+
f"attrs['crs']={crs_attr!r} and attrs['crs_wkt']={wkt_excerpt!r} "
502+
f"resolve to different CRSes after pyproj canonicalisation. The "
503+
f"writer would silently pick attrs['crs'] and drop the WKT, so "
504+
f"the on-disk CRS would not match what the DataArray advertises. "
505+
f"Reconcile the two attrs (drop the stale one, or pass the "
506+
f"intended CRS explicitly via the ``crs=`` kwarg, which overrides "
507+
f"both attrs) and retry. See issue #1987."
508+
)
509+
510+
511+
# Active by default: the conflicting-CRS write check is registered at
512+
# import time so every entry point that calls ``validate_write_metadata``
513+
# enforces the rule. Tests that need to scope around the check should
514+
# use ``unregister_write_metadata_check(_check_write_conflicting_crs)``
515+
# in a fixture rather than mutating the registry directly.
516+
register_write_metadata_check(_check_write_conflicting_crs)

xrspatial/geotiff/_writers/eager.py

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@
4444
_validate_3d_writer_dims,
4545
_validate_nodata_arg,
4646
_validate_tile_size_arg,
47+
validate_write_metadata,
4748
)
4849
from .._writer import write
4950
from .gpu import write_geotiff_gpu
@@ -270,6 +271,18 @@ def to_geotiff(data: xr.DataArray | np.ndarray,
270271

271272
_validate_nodata_arg(nodata)
272273

274+
# Issue #1987 ambiguous-metadata checks. Today only the
275+
# conflicting-crs/crs_wkt write check is registered; other cases
276+
# land in follow-up PRs. The hook is a no-op when no check is
277+
# registered, so this call is safe even if every check is later
278+
# unregistered for a specific entry point.
279+
_attrs = getattr(data, 'attrs', None) or {}
280+
validate_write_metadata({
281+
'crs_kwarg': crs,
282+
'attrs_crs': _attrs.get('crs'),
283+
'attrs_crs_wkt': _attrs.get('crs_wkt'),
284+
})
285+
273286
# Up-front validation: catch bad compression names before they reach
274287
# any of the deeper write paths (streaming, GPU, VRT, COG) where the
275288
# error surfaces from _compression_tag with a less obvious traceback.
@@ -785,6 +798,17 @@ def _write_vrt_tiled(data, vrt_path, *, crs=None, nodata=None,
785798
full array in RAM.
786799
"""
787800
_validate_nodata_arg(nodata)
801+
802+
# Issue #1987 ambiguous-metadata checks; mirrors the call in
803+
# ``to_geotiff`` so the dask-VRT write path enforces the same
804+
# crs/crs_wkt consistency rule.
805+
_attrs = getattr(data, 'attrs', None) or {}
806+
validate_write_metadata({
807+
'crs_kwarg': crs,
808+
'attrs_crs': _attrs.get('crs'),
809+
'attrs_crs_wkt': _attrs.get('crs_wkt'),
810+
})
811+
788812
# Validate compression_level against codec-specific range
789813
if compression_level is not None:
790814
level_range = _LEVEL_RANGES.get(compression.lower())

xrspatial/geotiff/_writers/gpu.py

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,6 +29,7 @@
2929
_validate_3d_writer_dims,
3030
_validate_nodata_arg,
3131
_validate_tile_size_arg,
32+
validate_write_metadata,
3233
)
3334

3435

@@ -261,6 +262,15 @@ def write_geotiff_gpu(data: xr.DataArray | cupy.ndarray | np.ndarray,
261262
# keep parity with the public to_geotiff entry point.
262263
_validate_tile_size_arg(tile_size)
263264
_validate_nodata_arg(nodata)
265+
266+
# Issue #1987 ambiguous-metadata checks; mirrors ``to_geotiff`` so the
267+
# GPU writer enforces the same crs/crs_wkt consistency rule.
268+
_attrs = getattr(data, 'attrs', None) or {}
269+
validate_write_metadata({
270+
'crs_kwarg': crs,
271+
'attrs_crs': _attrs.get('crs'),
272+
'attrs_crs_wkt': _attrs.get('crs_wkt'),
273+
})
264274
if max_z_error < 0:
265275
raise ValueError(
266276
f"max_z_error must be >= 0, got {max_z_error}")

0 commit comments

Comments
 (0)