Skip to content

Commit b883a38

Browse files
authored
geotiff: guard streaming writer against MinIsWhite extra_tags override (#2080)
* geotiff: guard streaming writer against TAG_PHOTOMETRIC extra_tags override (#2073) The eager writer rejects extra_tags overrides that would require writer-side pixel inversion for single-band MinIsWhite. The streaming dask path lacked the same guard, so an override silently bypassed the inversion and the file round-tripped with inverted pixel values. * geotiff: consolidate photometric-override guard + add multi-band pin (#2073) Address PR #2080 review: * Factor the duplicated TAG_PHOTOMETRIC-override guard out of the eager and streaming writers into ``_reject_disagreeing_photometric_override``. Both writers now call the helper, so the message text, the gating condition, and the single-band scope live in one place. * Add ``test_streaming_extra_tags_miniswhite_override_multiband_not_rejected_2073`` to pin the ``samples == 1`` gate. A regression that dropped or flipped the gate (so the guard fired for multi-band rasters too) would otherwise surface only by accident. * Move the regression test from ``xrspatial/tests/`` to ``xrspatial/geotiff/tests/`` to match every other geotiff regression test's location. Filename also shortened: the ``geotiff_`` prefix was redundant inside the geotiff subtree. The third dismissed nit (``_extra_tags_photo_ds`` naming) goes away with the refactor because the helper takes its own local variable name.
1 parent 4e3abcd commit b883a38

2 files changed

Lines changed: 163 additions & 24 deletions

File tree

xrspatial/geotiff/_writer.py

Lines changed: 56 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -209,6 +209,51 @@ def _resolve_photometric(photometric, samples_per_pixel: int):
209209
return photo_int, []
210210

211211

212+
def _reject_disagreeing_photometric_override(
213+
extra_tags, resolved_photo: int, samples: int, photometric
214+
) -> None:
215+
"""Reject an ``extra_tags`` entry that overrides ``TAG_PHOTOMETRIC``
216+
across the MinIsWhite boundary for a single-band raster.
217+
218+
The single-band MinIsWhite path requires the writer to pre-invert
219+
pixels (and the nodata sentinel) so the round-trip matches what the
220+
reader unconditionally inverts. An ``extra_tags`` entry that flips
221+
``TAG_PHOTOMETRIC`` between MinIsWhite (0) and anything else makes
222+
the on-disk tag advertise one model while the bytes were
223+
pre-processed for the other -- the round-trip silently corrupts.
224+
225+
The eager and streaming writers both call this guard before any
226+
pre-inversion runs. Only the MinIsWhite-crossing single-band case
227+
is rejected; multi-band rasters and non-crossing overrides (e.g.
228+
photometric='minisblack' with extra_tags=[(262, SHORT, 1, 1)])
229+
pass through unchanged. Issues #2073 / #1769 / #1836.
230+
"""
231+
if extra_tags is None:
232+
return
233+
override = None
234+
for _et in extra_tags:
235+
if _et[0] == TAG_PHOTOMETRIC:
236+
override = int(_et[3])
237+
break
238+
if override is None:
239+
return
240+
if override == resolved_photo:
241+
return
242+
if not (override == 0 or resolved_photo == 0):
243+
return
244+
if samples != 1:
245+
return
246+
raise ValueError(
247+
f"extra_tags TAG_PHOTOMETRIC override ({override}) "
248+
f"disagrees with photometric={photometric!r} for a "
249+
f"single-band raster where MinIsWhite (photometric=0) "
250+
f"requires writer-side pixel inversion. The override would "
251+
f"either pre-invert pixels for a non-MinIsWhite tag or skip "
252+
f"inversion for a MinIsWhite tag. Pass photometric= directly "
253+
f"instead, or drop the override."
254+
)
255+
256+
212257
# Byte order: always write little-endian
213258
BO = '<'
214259

@@ -1602,30 +1647,9 @@ def write(data: np.ndarray, path: str, *,
16021647
# them to NaN. Issue #1836.
16031648
_samples = data.shape[2] if data.ndim == 3 else 1
16041649
_resolved_photo, _ = _resolve_photometric(photometric, _samples)
1605-
# An ``extra_tags`` entry with TAG_PHOTOMETRIC silently overrides the
1606-
# kwarg-resolved value when the IFD is written (issue #1769). The
1607-
# pre-inversion decision below would otherwise transform pixels for
1608-
# one photometric value while the on-disk tag advertises a different
1609-
# one. Refuse the combination so callers do not end up with corrupt
1610-
# round-trips through the override path.
1611-
_extra_tags_photo = None
1612-
if extra_tags is not None:
1613-
for _et in extra_tags:
1614-
if _et[0] == TAG_PHOTOMETRIC:
1615-
_extra_tags_photo = int(_et[3])
1616-
break
1617-
if (_extra_tags_photo is not None
1618-
and _extra_tags_photo != _resolved_photo
1619-
and (_extra_tags_photo == 0 or _resolved_photo == 0)
1620-
and _samples == 1):
1621-
raise ValueError(
1622-
f"extra_tags TAG_PHOTOMETRIC override ({_extra_tags_photo}) "
1623-
f"disagrees with photometric={photometric!r} for a "
1624-
f"single-band raster where MinIsWhite (photometric=0) "
1625-
f"requires writer-side pixel inversion. The override would "
1626-
f"either pre-invert pixels for a non-MinIsWhite tag or skip "
1627-
f"inversion for a MinIsWhite tag. Pass photometric= directly "
1628-
f"instead, or drop the override.")
1650+
_reject_disagreeing_photometric_override(
1651+
extra_tags, _resolved_photo, _samples, photometric
1652+
)
16291653
if _resolved_photo == 0 and _samples == 1:
16301654
if cog or overview_levels is not None:
16311655
raise NotImplementedError(
@@ -1935,6 +1959,14 @@ def write_streaming(dask_data, path: str, *,
19351959
"match the reader's unconditional MinIsWhite inversion "
19361960
"(issue #1836). Call ``.compute()`` first to use the eager "
19371961
"writer, or write with photometric='minisblack' / 'auto'.")
1962+
# The kwarg guard above only catches photometric='miniswhite'. An
1963+
# ``extra_tags`` entry of ``(TAG_PHOTOMETRIC, ...)`` silently
1964+
# overrides the IFD tag further down, so the writer must reject the
1965+
# MinIsWhite-crossing single-band case the same way the eager
1966+
# writer does. Issue #2073.
1967+
_reject_disagreeing_photometric_override(
1968+
extra_tags, _resolved_photo_ds, samples, photometric
1969+
)
19381970

19391971
# Match the eager path's dtype promotion
19401972
out_dtype = dtype
Lines changed: 107 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,107 @@
1+
"""Regression tests for issue #2073.
2+
3+
The eager writer rejects an ``extra_tags`` entry that overrides
4+
``TAG_PHOTOMETRIC`` across the MinIsWhite boundary for a single-band
5+
raster because the reader unconditionally inverts MinIsWhite single-band
6+
data and the writer must pre-invert pixels to keep the round-trip honest.
7+
The streaming dask path now shares the same guard via
8+
``_reject_disagreeing_photometric_override`` in ``_writer.py``.
9+
10+
Three pins:
11+
12+
* the MinIsWhite-crossing single-band override is rejected;
13+
* the non-MinIsWhite-crossing override still round-trips;
14+
* multi-band rasters do not trigger the guard (the writer never
15+
pre-inverts there).
16+
"""
17+
from __future__ import annotations
18+
19+
import os
20+
21+
import dask.array as da
22+
import numpy as np
23+
import pytest
24+
import xarray as xr
25+
26+
from xrspatial.geotiff import open_geotiff, to_geotiff
27+
28+
29+
TAG_PHOTOMETRIC = 262
30+
TYPE_SHORT = 3
31+
32+
33+
def test_streaming_extra_tags_miniswhite_override_rejected_2073(tmp_path):
34+
"""Dask write with extra_tags forcing photometric=0 must raise."""
35+
arr = xr.DataArray(
36+
da.from_array(
37+
np.array([[10, 20], [30, 40]], dtype=np.uint8),
38+
chunks=(1, 2),
39+
),
40+
)
41+
arr.attrs['extra_tags'] = [(TAG_PHOTOMETRIC, TYPE_SHORT, 1, 0)]
42+
43+
out = tmp_path / 'tmp_2073_streaming_miniswhite.tif'
44+
with pytest.raises(ValueError) as excinfo:
45+
to_geotiff(arr, str(out))
46+
47+
msg = str(excinfo.value)
48+
assert 'extra_tags' in msg
49+
assert 'photometric' in msg.lower() or 'MinIsWhite' in msg
50+
51+
52+
def test_streaming_extra_tags_minisblack_override_roundtrips_2073(tmp_path):
53+
"""The valid (non-MinIsWhite-crossing) override should still work."""
54+
src = np.array([[10, 20], [30, 40]], dtype=np.uint8)
55+
arr = xr.DataArray(
56+
da.from_array(src, chunks=(1, 2)),
57+
dims=('y', 'x'),
58+
coords={'y': [1.0, 0.0], 'x': [0.0, 1.0]},
59+
)
60+
# photometric=1 (MinIsBlack) matches what the writer picks for a
61+
# single-band raster anyway: no pre-inversion needed, so the guard
62+
# must not fire.
63+
arr.attrs['extra_tags'] = [(TAG_PHOTOMETRIC, TYPE_SHORT, 1, 1)]
64+
65+
out = tmp_path / 'tmp_2073_streaming_minisblack.tif'
66+
to_geotiff(arr, str(out))
67+
assert os.path.exists(out)
68+
69+
back = open_geotiff(str(out))
70+
np.testing.assert_array_equal(np.asarray(back.values), src)
71+
72+
73+
def test_streaming_extra_tags_miniswhite_override_multiband_not_rejected_2073(
74+
tmp_path,
75+
):
76+
"""The guard fires only on single-band rasters.
77+
78+
Multi-band rasters do not pre-invert MinIsWhite, so a
79+
``TAG_PHOTOMETRIC`` override that crosses the MinIsWhite boundary
80+
is not the kind of corruption the guard exists to prevent. Pins
81+
the ``samples == 1`` gate inside
82+
``_reject_disagreeing_photometric_override``: a regression that
83+
dropped or flipped the gate would surface as a spurious
84+
``ValueError`` here.
85+
86+
Whether a 3-band raster tagged MinIsWhite is semantically useful
87+
is a separate concern; this test only locks in the guard's scope.
88+
"""
89+
src = np.zeros((2, 2, 3), dtype=np.uint8)
90+
src[..., 0] = 10
91+
src[..., 1] = 20
92+
src[..., 2] = 30
93+
arr = xr.DataArray(
94+
da.from_array(src, chunks=(2, 2, 3)),
95+
dims=('y', 'x', 'band'),
96+
coords={'y': [1.0, 0.0], 'x': [0.0, 1.0]},
97+
)
98+
arr.attrs['extra_tags'] = [(TAG_PHOTOMETRIC, TYPE_SHORT, 1, 0)]
99+
100+
out = tmp_path / 'tmp_2073_streaming_miniswhite_multiband.tif'
101+
# Must not raise: the writer does not pre-invert multi-band data,
102+
# so the override is not in the "corruption that the guard exists
103+
# to prevent" set. If it raises for an unrelated reason
104+
# (e.g. RGB-requires-3-bands check elsewhere), let the test
105+
# surface that as a real failure rather than swallowing it.
106+
to_geotiff(arr, str(out))
107+
assert os.path.exists(out)

0 commit comments

Comments
 (0)