Skip to content

Commit 6de439d

Browse files
committed
geotiff: add attrs pass-through locking test (#1984)
Pin the current best-effort round-trip behaviour of every pass-through attr key so a later PR can decide which to promote to canonical. Reconstructible on round-trip (writer emits a TIFF tag the reader rebuilds the attr from): - image_description (tag 270, ImageDescription) - extra_samples (tag 338, ExtraSamples) - colormap (tag 320, raw uint16 ColorMap triples) Dropped on round-trip (writer never emits the GeoKey, reader has nothing to rebuild from -- build_geo_tags only writes the EPSG code into GeographicType / ProjectedCSType): - crs_name - geog_citation - datum_code - angular_units - linear_units - semi_major_axis - inv_flattening - projection_code - vertical_crs - vertical_citation - vertical_units - colormap_rgba (reader gates on Photometric == 3) - cmap (reader gates on Photometric == 3) Also pin two cross-cutting invariants: - pass-through keys are absent in attrs when the source file has no CRS at all - setting a pass-through key on a DataArray without crs / crs_wkt does not cause the writer to synthesise a CRS PR 6 of 7 on the attrs-contract roadmap. Tests only -- no production code touched.
1 parent 0570518 commit 6de439d

1 file changed

Lines changed: 243 additions & 0 deletions

File tree

Lines changed: 243 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,243 @@
1+
"""Locking test for the best-effort pass-through tier of the attrs contract.
2+
3+
Issue #1984, PR 6 of 7.
4+
5+
The attrs contract has three tiers:
6+
7+
1. Canonical: writers consume the attr; round-trip is guaranteed.
8+
2. Pass-through: writers do not consume the attr. The reader rebuilds
9+
the value from the GeoKey directory (or another TIFF tag) on read.
10+
Round-trip is best-effort: it works only when the writer happens to
11+
emit a tag the reader can rebuild the attr from.
12+
3. Ignored: writer never touches; attr is dropped silently.
13+
14+
This file pins the *current* behaviour of every key in the pass-through
15+
tier so future writer changes have to decide whether to promote a key to
16+
canonical or to keep it dropping. The split between "reconstructible"
17+
and "dropped on round-trip" is captured in the parametrisation below and
18+
mirrored in PR 6's body so the next PR (canonical promotion) has a
19+
shopping list.
20+
21+
Current state of every pass-through key, as locked here:
22+
23+
* Reconstructible (writer puts a TIFF tag the reader rebuilds from):
24+
- ``image_description`` -> tag 270 (ImageDescription)
25+
- ``extra_samples`` -> tag 338 (ExtraSamples)
26+
- ``colormap`` -> tag 320 (ColorMap, raw uint16 triples)
27+
28+
* Dropped on round-trip (writer never emits the GeoKey, reader has
29+
nothing to rebuild from):
30+
- ``crs_name`` (GTCitationGeoKey / ProjCitationGeoKey)
31+
- ``geog_citation`` (GeogCitationGeoKey)
32+
- ``datum_code`` (GeogGeodeticDatumGeoKey)
33+
- ``angular_units`` (GeogAngularUnitsGeoKey)
34+
- ``linear_units`` (ProjLinearUnitsGeoKey)
35+
- ``semi_major_axis`` (GeogSemiMajorAxisGeoKey)
36+
- ``inv_flattening`` (GeogInvFlatteningGeoKey)
37+
- ``projection_code`` (ProjectionGeoKey)
38+
- ``vertical_crs`` (VerticalCSTypeGeoKey)
39+
- ``vertical_citation`` (VerticalCitationGeoKey)
40+
- ``vertical_units`` (VerticalUnitsGeoKey)
41+
- ``colormap_rgba`` (only set on read when Photometric == 3)
42+
- ``cmap`` (matplotlib ListedColormap; same gate)
43+
"""
44+
from __future__ import annotations
45+
46+
import warnings
47+
48+
import numpy as np
49+
import pytest
50+
import xarray as xr
51+
52+
from xrspatial.geotiff import open_geotiff, to_geotiff
53+
54+
55+
# Full set of pass-through keys defined by the contract.
56+
_ALL_PASSTHROUGH_KEYS = (
57+
'crs_name',
58+
'geog_citation',
59+
'datum_code',
60+
'angular_units',
61+
'linear_units',
62+
'semi_major_axis',
63+
'inv_flattening',
64+
'projection_code',
65+
'vertical_crs',
66+
'vertical_citation',
67+
'vertical_units',
68+
'image_description',
69+
'extra_samples',
70+
'colormap',
71+
'colormap_rgba',
72+
'cmap',
73+
)
74+
75+
76+
def _make_da(crs=None, attrs=None, shape=(4, 4), dtype=np.float32):
77+
"""Build a minimal georeferenced DataArray with optional CRS and attrs."""
78+
data = np.ones(shape, dtype=dtype)
79+
h, w = shape
80+
coords = {
81+
'y': np.arange(h, 0, -1, dtype=np.float64),
82+
'x': np.arange(w, dtype=np.float64),
83+
}
84+
a = dict(attrs) if attrs else {}
85+
if crs is not None:
86+
a['crs'] = crs
87+
return xr.DataArray(data, dims=('y', 'x'), coords=coords, attrs=a)
88+
89+
90+
def _roundtrip(tmp_path, da, name='roundtrip.tif'):
91+
"""Write ``da`` to ``tmp_path/name`` and read it back."""
92+
path = str(tmp_path / name)
93+
to_geotiff(da, path)
94+
return open_geotiff(path)
95+
96+
97+
# (key, crs_to_use, value_set_on_write_or_None, expected_outcome)
98+
#
99+
# ``crs_to_use``: 4326 for geographic GeoKey-derived keys, 32633 for
100+
# projected ones. The CRS pins which GeoKeys the writer would emit if it
101+
# emitted any; today it emits only the EPSG code, but the test brackets
102+
# both branches anyway.
103+
#
104+
# ``value_set_on_write``: the value the test sets on ``da.attrs`` before
105+
# write. ``None`` means "do not set this key" (the test only needs the
106+
# CRS to be present so any reconstruction would have a path to fire).
107+
#
108+
# ``expected``: one of ``'reconstructible'`` or ``'dropped'``.
109+
# - reconstructible: key must be present in read-back attrs AND, when a
110+
# value was supplied, must equal that value.
111+
# - dropped: key must be absent from read-back attrs.
112+
_PASSTHROUGH_CASES = [
113+
# GeoKey-derived geographic CRS attrs -- writer emits only EPSG, so
114+
# the reader has no GeoKey to reconstruct these from.
115+
('crs_name', 4326, 'WGS 84', 'dropped'),
116+
('geog_citation', 4326, 'WGS 84', 'dropped'),
117+
('datum_code', 4326, 6326, 'dropped'),
118+
('angular_units', 4326, 'degree', 'dropped'),
119+
('semi_major_axis', 4326, 6378137.0, 'dropped'),
120+
('inv_flattening', 4326, 298.257223563, 'dropped'),
121+
# GeoKey-derived projected CRS attrs.
122+
('linear_units', 32633, 'metre', 'dropped'),
123+
('projection_code', 32633, 16033, 'dropped'),
124+
# Vertical CRS attrs -- writer never emits the vertical GeoKey block.
125+
('vertical_crs', 4326, 5703, 'dropped'),
126+
('vertical_citation', 4326, 'NAVD88', 'dropped'),
127+
('vertical_units', 4326, 'metre', 'dropped'),
128+
# Non-GeoKey tag passthroughs. The writer folds these into extra_tags
129+
# via _merge_friendly_extra_tags, so the reader can rebuild them.
130+
('image_description', 4326, 'pr1984 fixture', 'reconstructible'),
131+
('extra_samples', 4326, (1,), 'reconstructible'),
132+
# ``colormap`` round-trips as the raw uint16 triple list; the
133+
# derived ``colormap_rgba`` / ``cmap`` attrs are only emitted when
134+
# Photometric == 3 on read, which the writer does not set just
135+
# because attrs carries a colormap.
136+
('colormap', 4326, tuple([0] * 256 + [128] * 256 + [255] * 256),
137+
'reconstructible'),
138+
('colormap_rgba', 4326, None, 'dropped'),
139+
('cmap', 4326, None, 'dropped'),
140+
]
141+
142+
143+
@pytest.mark.parametrize(
144+
'key,crs,value,expected',
145+
_PASSTHROUGH_CASES,
146+
ids=[c[0] for c in _PASSTHROUGH_CASES],
147+
)
148+
def test_passthrough_key_roundtrip(tmp_path, key, crs, value, expected):
149+
"""Lock per-key round-trip outcome for the pass-through attr tier."""
150+
attrs = {}
151+
if value is not None:
152+
attrs[key] = value
153+
# For colormap-derived keys we still need an attrs payload that
154+
# actually puts a colormap tag on disk; setting ``colormap`` itself
155+
# does that, so probe ``colormap_rgba`` / ``cmap`` via a separate
156+
# write that does NOT include a colormap. The contract for those two
157+
# keys is "absent on round-trip unless Photometric=3", and the
158+
# writer never selects Photometric=3 from attrs alone.
159+
da = _make_da(crs=crs, attrs=attrs)
160+
# Single-band uint8 needed for the colormap tag to be valid in TIFF.
161+
if key == 'colormap':
162+
da = _make_da(crs=crs, attrs=attrs, dtype=np.uint8)
163+
164+
rd = _roundtrip(tmp_path, da, name=f'{key}.tif')
165+
166+
if expected == 'reconstructible':
167+
assert key in rd.attrs, (
168+
f"pass-through key {key!r} was expected to round-trip but is "
169+
f"absent. attrs keys present: {sorted(rd.attrs.keys())}"
170+
)
171+
if value is not None:
172+
got = rd.attrs[key]
173+
if isinstance(value, tuple):
174+
assert tuple(got) == value, (
175+
f"{key!r}: value mismatch on round-trip\n"
176+
f" written: {value}\n"
177+
f" read : {got}"
178+
)
179+
else:
180+
assert got == value, (
181+
f"{key!r}: value mismatch on round-trip\n"
182+
f" written: {value!r}\n"
183+
f" read : {got!r}"
184+
)
185+
else: # 'dropped'
186+
assert key not in rd.attrs, (
187+
f"pass-through key {key!r} was expected to drop on round-trip "
188+
f"(writer does not emit a tag the reader can rebuild it from) "
189+
f"but it is present with value {rd.attrs[key]!r}. If a writer "
190+
f"change started emitting this key, decide whether to promote "
191+
f"the key to canonical (issue #1984) and update this test."
192+
)
193+
194+
195+
def test_passthrough_dropped_when_no_crs(tmp_path):
196+
"""Files without a CRS do not surface any pass-through attrs."""
197+
da = _make_da(crs=None)
198+
rd = _roundtrip(tmp_path, da, name='no_crs.tif')
199+
200+
present = sorted(k for k in _ALL_PASSTHROUGH_KEYS if k in rd.attrs)
201+
assert present == [], (
202+
f"pass-through keys leaked into a no-CRS round-trip: {present}. "
203+
f"All keys present: {sorted(rd.attrs.keys())}"
204+
)
205+
# Sanity: no CRS attrs either.
206+
assert 'crs' not in rd.attrs
207+
assert 'crs_wkt' not in rd.attrs
208+
209+
210+
def test_passthrough_does_not_promote_to_canonical(tmp_path):
211+
"""Setting pass-through attrs without a CRS must not inject a CRS."""
212+
# Mix of GeoKey-derived and tag-derived pass-through keys, but no
213+
# ``crs`` / ``crs_wkt``. If the writer ever started inferring a CRS
214+
# from these (e.g. picking 4326 because angular_units == 'degree')
215+
# this test would fail.
216+
attrs = {
217+
'crs_name': 'WGS 84',
218+
'geog_citation': 'WGS 84',
219+
'angular_units': 'degree',
220+
'linear_units': 'metre',
221+
'semi_major_axis': 6378137.0,
222+
'inv_flattening': 298.257223563,
223+
'datum_code': 6326,
224+
}
225+
da = _make_da(crs=None, attrs=attrs)
226+
227+
with warnings.catch_warnings():
228+
# The writer warns on user-defined-CRS WKT writes; we are not on
229+
# that path here, but suppress generously so a future warning
230+
# tweak does not turn this test into a warning regression test.
231+
warnings.simplefilter('ignore')
232+
rd = _roundtrip(tmp_path, da, name='no_crs_with_attrs.tif')
233+
234+
assert 'crs' not in rd.attrs, (
235+
f"pass-through attrs caused the writer to synthesise a CRS: "
236+
f"crs={rd.attrs.get('crs')!r}. The contract says pass-through "
237+
f"attrs are advisory only; the writer must rely on attrs['crs'] "
238+
f"or attrs['crs_wkt'] to emit georeferencing."
239+
)
240+
assert 'crs_wkt' not in rd.attrs, (
241+
f"pass-through attrs caused the writer to synthesise a CRS WKT: "
242+
f"crs_wkt={rd.attrs.get('crs_wkt')!r}."
243+
)

0 commit comments

Comments
 (0)