geotiff: inherit nodata + rich-tag set on overview reads (#1739)#1745
Merged
brendancol merged 2 commits intoMay 12, 2026
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes GeoTIFF overview reads (open_geotiff(..., overview_level>=1)) dropping attrs['nodata'] and other “rich” per-IFD tags when those tags exist only on the level-0 IFD (common in GDAL-style COGs). This prevents silent numerical corruption where overview pixels containing the on-disk nodata sentinel were previously returned as ordinary values.
Changes:
- Extend
extract_geo_info_with_overview_inheritanceto inherit per-IFD pass-through tags (nodata, GDAL metadata, resolution tags, colormap, extra tags, image description, extra samples) from level 0 when missing on the overview IFD. - Add a new 4-backend regression test suite covering nodata inheritance/masking and rich-tag propagation.
- Update an existing COG overview test assertion to expect NaN after nodata inheritance/remasking.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
xrspatial/geotiff/_geotags.py |
Adds per-field inheritance for nodata + rich per-IFD tags on overview reads, while preserving existing georef inheritance behavior. |
xrspatial/geotiff/tests/test_overview_nodata_inheritance_1739.py |
New regression tests for nodata/rich-tag inheritance across numpy/dask/cupy backends. |
xrspatial/geotiff/tests/test_cog_overview_nodata_1613.py |
Updates expectation to reflect remasked NaN on overview reads after nodata inheritance fix. |
.claude/sweep-metadata-state.csv |
Updates internal sweep metadata entry to reflect issue #1739 state/details. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+271
to
+275
| we instead use a hand-built TIFF below would be overkill; the | ||
| simpler contract test in test_attrs_keysets_consistent_across_* | ||
| already pins the typical writer path. This test pins the | ||
| "overview already has its own value" branch by simulating it | ||
| directly against ``extract_geo_info_with_overview_inheritance``. |
`extract_geo_info_with_overview_inheritance` was inheriting only the CRS-side fields from the level-0 IFD. The COG writer also emits GDAL_NODATA, GDAL_METADATA, the resolution tags, and the extra-tag pass-through fields (colormap, image_description, extra_samples) only on level 0, so an `open_geotiff(path, overview_level=N)` for N >= 1 returned a DataArray whose `attrs['nodata']` was None even when the file declared a sentinel. The overview's on-disk pixels still carried that sentinel (writer rewrites NaN to sentinel before reducing), so downstream stats / threshold / plot pipelines silently folded sentinel values into their results. Fix: extend the inheritance helper to fill in the per-IFD pass-through tags whenever the overview IFD lacks its own value. Mirrors the existing CRS-side inheritance pattern (only inherit fields the overview doesn't already carry), so an overview that does re-declare any of these tags keeps its own value. Affects all four backends (numpy, dask+numpy, cupy, dask+cupy) since they all go through this helper. Adds a 4-backend regression test suite that exercises nodata, gdal_metadata, and the resolution tags across overview levels, and a contract test pinning the symmetric diff of the attrs keyset to zero between level 0 and overview reads. Updates one existing test in test_cog_overview_nodata_1613.py that was inadvertently asserting the buggy behaviour (raw sentinel in the overview output instead of NaN); the post-fix expectation matches the level-0 behaviour.
f79b55d to
ea1b5a7
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1739.
open_geotiff(path, overview_level=N)forN >= 1was droppingattrs['nodata']and nine other per-IFD tags from the returned DataArray, becauseextract_geo_info_with_overview_inheritanceonly inherited the CRS-side fields from level 0. The COG writer puts the rest of these tags (GDAL_NODATA, GDAL_METADATA, x/y resolution, resolution unit, colormap, extra tags, image description, extra samples) only on level 0.The
nodatacase is the one that actually corrupted data. The writer rewrites NaN to the sentinel value before reducing each overview, so the overview pixels still carry the sentinel on disk. Without the inheritance fix, the reader handed back a DataArray withattrs['nodata'] == Noneand literal sentinel pixels, which then folded into downstream stats and plots.All four backends (numpy, dask+numpy, cupy, dask+cupy) go through the same helper, so all four are fixed in one shot.
Changes
xrspatial/geotiff/_geotags.py::extract_geo_info_with_overview_inheritance: extend the inheritance block to cover the per-IFD pass-through tags. Each field is gated on the overview not already having its own value, mirroring the existing CRS-side pattern. Restructured the early-return so the pass-through inheritance runs even when the overview already has its own georef.xrspatial/geotiff/tests/test_overview_nodata_inheritance_1739.py: new 4-backend regression suite. Coversattrs['nodata']presence, NaN-mask substitution,np.nanmeancorrectness,gdal_metadatapropagation, resolution-tag propagation, an attrs-keyset symmetric-diff contract test, and a per-field override test (overview with its own nodata keeps it).xrspatial/geotiff/tests/test_cog_overview_nodata_1613.py::test_cpu_cog_overview_mean_partial_block: updated the assertion from raw sentinel value to NaN. The previous expectation was pinning the buggy behaviour.Test plan
test_overview_geo_inheritance_1640.py,test_overview_pixel_is_point_1642.py,test_cog.py,test_cog_overview_nodata_1613.py,test_cog_cubic_overview_nodata_1623.py,test_attrs_parity_1548.py,test_metadata_round_trip_1484.py,test_nodata_attr_aliases_1582.py): 102/102 pass.nanmean == 100.0on every overview level. Before the fix it was-531at level 1.