Skip to content

Commit f79b55d

Browse files
committed
geotiff: inherit nodata + rich-tag set on overview reads (#1739)
`extract_geo_info_with_overview_inheritance` was inheriting only the CRS-side fields from the level-0 IFD. The COG writer also emits GDAL_NODATA, GDAL_METADATA, the resolution tags, and the extra-tag pass-through fields (colormap, image_description, extra_samples) only on level 0, so an `open_geotiff(path, overview_level=N)` for N >= 1 returned a DataArray whose `attrs['nodata']` was None even when the file declared a sentinel. The overview's on-disk pixels still carried that sentinel (writer rewrites NaN to sentinel before reducing), so downstream stats / threshold / plot pipelines silently folded sentinel values into their results. Fix: extend the inheritance helper to fill in the per-IFD pass-through tags whenever the overview IFD lacks its own value. Mirrors the existing CRS-side inheritance pattern (only inherit fields the overview doesn't already carry), so an overview that does re-declare any of these tags keeps its own value. Affects all four backends (numpy, dask+numpy, cupy, dask+cupy) since they all go through this helper. Adds a 4-backend regression test suite that exercises nodata, gdal_metadata, and the resolution tags across overview levels, and a contract test pinning the symmetric diff of the attrs keyset to zero between level 0 and overview reads. Updates one existing test in test_cog_overview_nodata_1613.py that was inadvertently asserting the buggy behaviour (raw sentinel in the overview output instead of NaN); the post-fix expectation matches the level-0 behaviour.
1 parent 7720ea9 commit f79b55d

4 files changed

Lines changed: 409 additions & 19 deletions

File tree

.claude/sweep-metadata-state.csv

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
11
module,last_inspected,issue,severity_max,categories_found,notes
2-
geotiff,2026-05-12,1657,HIGH,1;5,NewSubfileType (254) and SubIFDs (330) leaked through attrs['extra_tags'] across all four backends. Round-trip read overview -> write produced primary IFD wrongly marked as overview (NewSubfileType=1) and SubIFDs with stale byte offsets that crashed readers. Fixed by adding both tags to _MANAGED_TAGS and to _DANGEROUS_EXTRA_TAG_IDS writer-side filter (#1657).
2+
geotiff,2026-05-12,1739,HIGH,1;4,"COG overview reads dropped attrs['nodata'] from level 0, so the writer-baked sentinel survived as raw data in the overview pixels (silent numerical corruption). extract_geo_info_with_overview_inheritance was inheriting CRS-side fields only; extended to per-IFD pass-through tags (nodata, gdal_metadata*, resolution*, colormap, extra_tags, image_description, extra_samples). All four backends affected (numpy/dask/cupy/dask+cupy). Fixed in #1739."
33
reproject,2026-05-10,1572;1573,HIGH,1;3;4,geoid_height_raster dropped input attrs and used dims[-2:] for 3D inputs (#1572). reproject/merge ignored nodatavals (rasterio convention) when rioxarray absent (#1573). Fixed in same branch.

xrspatial/geotiff/_geotags.py

Lines changed: 77 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -703,24 +703,39 @@ def extract_geo_info_with_overview_inheritance(
703703
"""Extract geo metadata, inheriting from level 0 when the IFD lacks it.
704704
705705
Wraps :func:`extract_geo_info` for overview reads. GDAL-style COG
706-
writers (including this package's :func:`to_geotiff`) put the
707-
GeoKeyDirectory, ModelPixelScale and ModelTiepoint only on the
708-
level-0 IFD. Calling ``extract_geo_info`` directly on an overview
709-
IFD therefore returns a default :class:`GeoTransform` with
710-
``has_georef=False`` and no CRS, so overview reads silently lose
711-
their georeferencing.
706+
writers (including this package's :func:`to_geotiff`) put a handful
707+
of tags only on the level-0 IFD:
712708
713-
When ``ifd`` is a reduced-resolution overview (NewSubfileType bit 0
714-
set) that lacks its own georef, we re-run ``extract_geo_info`` on
715-
the first full-resolution IFD (NewSubfileType bit 0 clear, bit 2
716-
clear) and rescale the pixel size by ``width_full / width_overview``
717-
so coords cover the same extent as level 0.
709+
* GeoKeyDirectory, ModelPixelScale, ModelTiepoint (georef)
710+
* GDAL_NODATA, GDAL_METADATA (per-IFD pass-through tags)
711+
* XResolution, YResolution, ResolutionUnit (resolution tags)
712+
* ColorMap, ImageDescription, ExtraSamples (extra-tag pass-through)
713+
714+
Calling ``extract_geo_info`` directly on an overview IFD therefore
715+
returns a default :class:`GeoTransform` with ``has_georef=False``,
716+
no CRS, and a ``nodata=None`` field, so overview reads silently
717+
lose their georeferencing and their nodata sentinel.
718718
719-
If the overview IFD already carries its own geokeys (some writers do
720-
replicate them), this returns its own ``extract_geo_info`` output
721-
unchanged. If no full-resolution sibling exists or the parent's geo
722-
info is also missing, the overview's own (possibly empty) info is
723-
returned -- callers get the same fallback behaviour they used to.
719+
When ``ifd`` is a reduced-resolution overview (NewSubfileType bit 0
720+
set), we re-run ``extract_geo_info`` on the first full-resolution
721+
IFD (NewSubfileType bit 0 clear, bit 2 clear). Per-IFD pass-through
722+
tags (nodata, GDAL metadata, resolution, colormap, extra tags,
723+
image description, extra samples) are inherited when the overview
724+
lacks its own value, regardless of whether the overview has its own
725+
georef. The transform and CRS-side fields are additionally
726+
inherited when the overview lacks its own georef, with the pixel
727+
size rescaled by ``width_full / width_overview`` so coords cover
728+
the same extent as level 0.
729+
730+
If the overview IFD already carries its own value for a given
731+
field, that value wins -- inheritance is per-field and only fills
732+
in missing entries. If no full-resolution sibling exists, the
733+
overview's own (possibly empty) info is returned -- callers get the
734+
same fallback behaviour they used to.
735+
736+
Inheriting nodata + the rich-tag set fixes #1739 (silent numerical
737+
corruption when reading COG overview pixels because attrs['nodata']
738+
was lost). The georef inheritance is the original fix from #1640.
724739
725740
Parameters
726741
----------
@@ -744,7 +759,7 @@ def extract_geo_info_with_overview_inheritance(
744759
# page IFDs (bit 1) are filtered out by ``select_overview_ifd``
745760
# before reaching here, so we never inherit a mask's geo info.
746761
is_overview = bool(ifd.subfile_type & 1)
747-
if not is_overview or info.has_georef:
762+
if not is_overview:
748763
return info
749764

750765
# Find the level-0 IFD: NewSubfileType has bit 0 clear (not an
@@ -763,6 +778,51 @@ def extract_geo_info_with_overview_inheritance(
763778
return info
764779

765780
base_info = extract_geo_info(base_ifd, data, byte_order)
781+
782+
# Inherit the per-IFD metadata that the COG writer emits only on the
783+
# level-0 IFD: GDAL_NODATA, GDAL_METADATA, x/y resolution, colormap,
784+
# extra tags, image description, extra samples. Without this block
785+
# an overview read silently drops attrs['nodata'] (so the sentinel
786+
# pixels the writer baked into the overview survive as ordinary data
787+
# and poison downstream stats) and attrs['gdal_metadata'] (user
788+
# metadata loss). See issue #1739.
789+
#
790+
# Each field is inherited only when the overview lacks its own
791+
# value, so an overview IFD that does re-declare any of these keeps
792+
# its own copy. Mirrors the gate the CRS-side inheritance applies
793+
# below: prefer the overview's own value when present.
794+
if info.nodata is None and base_info.nodata is not None:
795+
info.nodata = base_info.nodata
796+
if (info.gdal_metadata is None
797+
and base_info.gdal_metadata is not None):
798+
info.gdal_metadata = base_info.gdal_metadata
799+
if (info.gdal_metadata_xml is None
800+
and base_info.gdal_metadata_xml is not None):
801+
info.gdal_metadata_xml = base_info.gdal_metadata_xml
802+
if info.x_resolution is None and base_info.x_resolution is not None:
803+
info.x_resolution = base_info.x_resolution
804+
if info.y_resolution is None and base_info.y_resolution is not None:
805+
info.y_resolution = base_info.y_resolution
806+
if (info.resolution_unit is None
807+
and base_info.resolution_unit is not None):
808+
info.resolution_unit = base_info.resolution_unit
809+
if info.colormap is None and base_info.colormap is not None:
810+
info.colormap = base_info.colormap
811+
if info.extra_tags is None and base_info.extra_tags is not None:
812+
info.extra_tags = base_info.extra_tags
813+
if (info.image_description is None
814+
and base_info.image_description is not None):
815+
info.image_description = base_info.image_description
816+
if (info.extra_samples is None
817+
and base_info.extra_samples is not None):
818+
info.extra_samples = base_info.extra_samples
819+
820+
# If the overview already has its own georef, the rest of the
821+
# inheritance (transform + CRS-side fields) is unnecessary -- return
822+
# now with just the per-IFD-tag inheritance applied above.
823+
if info.has_georef:
824+
return info
825+
766826
if not base_info.has_georef:
767827
return info
768828

xrspatial/geotiff/tests/test_cog_overview_nodata_1613.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -86,11 +86,15 @@ def test_cpu_cog_overview_mean_partial_block(tmp_path):
8686

8787
ov = open_geotiff(p, overview_level=1)
8888
# Top-left 2x2 was all-NaN -> reduces to NaN -> rewritten to -9999
89+
# on disk, then read back as NaN once the overview-nodata
90+
# inheritance fix (#1739) restores attrs['nodata'] and re-masks
91+
# the sentinel.
8992
# Top-right 2x2 [3,4,7,8] -> mean 5.5
9093
# Bottom-left [10,20,10,20] -> 15
9194
# Bottom-right [30,40,30,40] -> 35
9295
data = np.asarray(ov.data)
93-
assert data[0, 0] == -9999.0
96+
assert ov.attrs.get('nodata') == -9999.0
97+
assert np.isnan(data[0, 0])
9498
np.testing.assert_allclose(data[0, 1], 5.5)
9599
np.testing.assert_allclose(data[1, 0], 15.0)
96100
np.testing.assert_allclose(data[1, 1], 35.0)

0 commit comments

Comments
 (0)