Skip to content

geotiff: inherit nodata + rich-tag set on overview reads (#1739)#1745

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-metadata-geotiff-2026-05-12-ac32544f
May 12, 2026
Merged

geotiff: inherit nodata + rich-tag set on overview reads (#1739)#1745
brendancol merged 2 commits into
mainfrom
deep-sweep-metadata-geotiff-2026-05-12-ac32544f

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Fixes #1739. open_geotiff(path, overview_level=N) for N >= 1 was dropping attrs['nodata'] and nine other per-IFD tags from the returned DataArray, because extract_geo_info_with_overview_inheritance only inherited the CRS-side fields from level 0. The COG writer puts the rest of these tags (GDAL_NODATA, GDAL_METADATA, x/y resolution, resolution unit, colormap, extra tags, image description, extra samples) only on level 0.

The nodata case is the one that actually corrupted data. The writer rewrites NaN to the sentinel value before reducing each overview, so the overview pixels still carry the sentinel on disk. Without the inheritance fix, the reader handed back a DataArray with attrs['nodata'] == None and literal sentinel pixels, which then folded into downstream stats and plots.

All four backends (numpy, dask+numpy, cupy, dask+cupy) go through the same helper, so all four are fixed in one shot.

Changes

  • xrspatial/geotiff/_geotags.py::extract_geo_info_with_overview_inheritance: extend the inheritance block to cover the per-IFD pass-through tags. Each field is gated on the overview not already having its own value, mirroring the existing CRS-side pattern. Restructured the early-return so the pass-through inheritance runs even when the overview already has its own georef.
  • Updated the docstring to describe the new behaviour.
  • xrspatial/geotiff/tests/test_overview_nodata_inheritance_1739.py: new 4-backend regression suite. Covers attrs['nodata'] presence, NaN-mask substitution, np.nanmean correctness, gdal_metadata propagation, resolution-tag propagation, an attrs-keyset symmetric-diff contract test, and a per-field override test (overview with its own nodata keeps it).
  • xrspatial/geotiff/tests/test_cog_overview_nodata_1613.py::test_cpu_cog_overview_mean_partial_block: updated the assertion from raw sentinel value to NaN. The previous expectation was pinning the buggy behaviour.

Test plan

  • New tests pass on numpy, dask+numpy, cupy, dask+cupy.
  • Related existing tests still pass (test_overview_geo_inheritance_1640.py, test_overview_pixel_is_point_1642.py, test_cog.py, test_cog_overview_nodata_1613.py, test_cog_cubic_overview_nodata_1623.py, test_attrs_parity_1548.py, test_metadata_round_trip_1484.py, test_nodata_attr_aliases_1582.py): 102/102 pass.
  • Overview IFD that re-declares its own GDAL_NODATA keeps its own value (per-field gate test).
  • The repro from geotiff: overview reads silently drop attrs['nodata'] from level 0; pixel sentinel survives #1739 now produces nanmean == 100.0 on every overview level. Before the fix it was -531 at level 1.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 12, 2026
@brendancol brendancol requested a review from Copilot May 12, 2026 20:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes GeoTIFF overview reads (open_geotiff(..., overview_level>=1)) dropping attrs['nodata'] and other “rich” per-IFD tags when those tags exist only on the level-0 IFD (common in GDAL-style COGs). This prevents silent numerical corruption where overview pixels containing the on-disk nodata sentinel were previously returned as ordinary values.

Changes:

  • Extend extract_geo_info_with_overview_inheritance to inherit per-IFD pass-through tags (nodata, GDAL metadata, resolution tags, colormap, extra tags, image description, extra samples) from level 0 when missing on the overview IFD.
  • Add a new 4-backend regression test suite covering nodata inheritance/masking and rich-tag propagation.
  • Update an existing COG overview test assertion to expect NaN after nodata inheritance/remasking.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
xrspatial/geotiff/_geotags.py Adds per-field inheritance for nodata + rich per-IFD tags on overview reads, while preserving existing georef inheritance behavior.
xrspatial/geotiff/tests/test_overview_nodata_inheritance_1739.py New regression tests for nodata/rich-tag inheritance across numpy/dask/cupy backends.
xrspatial/geotiff/tests/test_cog_overview_nodata_1613.py Updates expectation to reflect remasked NaN on overview reads after nodata inheritance fix.
.claude/sweep-metadata-state.csv Updates internal sweep metadata entry to reflect issue #1739 state/details.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +271 to +275
we instead use a hand-built TIFF below would be overkill; the
simpler contract test in test_attrs_keysets_consistent_across_*
already pins the typical writer path. This test pins the
"overview already has its own value" branch by simulating it
directly against ``extract_geo_info_with_overview_inheritance``.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded in 1c194bc.

`extract_geo_info_with_overview_inheritance` was inheriting only the
CRS-side fields from the level-0 IFD. The COG writer also emits
GDAL_NODATA, GDAL_METADATA, the resolution tags, and the extra-tag
pass-through fields (colormap, image_description, extra_samples) only
on level 0, so an `open_geotiff(path, overview_level=N)` for N >= 1
returned a DataArray whose `attrs['nodata']` was None even when the
file declared a sentinel. The overview's on-disk pixels still carried
that sentinel (writer rewrites NaN to sentinel before reducing), so
downstream stats / threshold / plot pipelines silently folded sentinel
values into their results.

Fix: extend the inheritance helper to fill in the per-IFD pass-through
tags whenever the overview IFD lacks its own value. Mirrors the
existing CRS-side inheritance pattern (only inherit fields the
overview doesn't already carry), so an overview that does re-declare
any of these tags keeps its own value.

Affects all four backends (numpy, dask+numpy, cupy, dask+cupy) since
they all go through this helper. Adds a 4-backend regression test
suite that exercises nodata, gdal_metadata, and the resolution tags
across overview levels, and a contract test pinning the symmetric
diff of the attrs keyset to zero between level 0 and overview reads.

Updates one existing test in test_cog_overview_nodata_1613.py that
was inadvertently asserting the buggy behaviour (raw sentinel in the
overview output instead of NaN); the post-fix expectation matches
the level-0 behaviour.
@brendancol brendancol force-pushed the deep-sweep-metadata-geotiff-2026-05-12-ac32544f branch from f79b55d to ea1b5a7 Compare May 12, 2026 20:22
@brendancol brendancol merged commit cc77100 into main May 12, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: overview reads silently drop attrs['nodata'] from level 0; pixel sentinel survives

2 participants