Skip to content

geotiff: surface VRT skipped sources as attrs['vrt_holes'] (#1734)#1747

Merged
brendancol merged 2 commits into
mainfrom
issue-1734
May 12, 2026
Merged

geotiff: surface VRT skipped sources as attrs['vrt_holes'] (#1734)#1747
brendancol merged 2 commits into
mainfrom
issue-1734

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • Lenient default read_vrt warns once per unreadable source and continues, leaving zero-filled holes in integer VRTs that downstream code cannot tell from real data.
  • Add a structured side-channel: VRTDataset.holes records {source, band, dst_rect, error} per skipped source. Public read_vrt copies a non-empty list to attrs['vrt_holes'] on the returned DataArray.
  • Fallback warning now names attrs['vrt_holes'] and XRSPATIAL_GEOTIFF_STRICT=1 so the recovery path is discoverable from a single captured warning.
  • Strict mode unchanged.

Closes #1734.

Test plan

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 12, 2026
@brendancol brendancol requested a review from Copilot May 12, 2026 20:11
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves xrspatial.geotiff.read_vrt’s default (lenient) behavior by surfacing machine-readable information about skipped/unreadable VRT sources, so downstream pipelines can detect partial mosaics without relying on warnings alone (issue #1734).

Changes:

  • Add VRTDataset.holes to record skipped source metadata (source, band, dst_rect, error) during lenient reads.
  • Extend the fallback warning to point users to DataArray.attrs['vrt_holes'] and XRSPATIAL_GEOTIFF_STRICT=1.
  • Propagate non-empty vrt.holes to the public read_vrt result as attrs['vrt_holes'], plus add a new regression test module.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_vrt.py Records skipped-source “holes” during lenient VRT reads and enhances the warning message.
xrspatial/geotiff/__init__.py Copies non-empty skipped-source records onto the returned DataArray as attrs['vrt_holes'].
xrspatial/geotiff/tests/test_vrt_holes_attr_1734.py Adds regression tests asserting the new vrt_holes attribute behavior and warning messaging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +38 to +39
' <VRTRasterBand dataType="Float32" band="1">\n'
' <NoDataValue>-9999</NoDataValue>\n'
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched the helper to Int32 with no NoDataValue so the missing source actually hits the zero-fill path (the integer failure mode #1734 was about). Also added a (da.values == 0).all() assert on the integer dtype, so the test pins the underlying regression and not just the attr. Fixed in 5915bd9.

missing_src = f'{tmp_path}/does_not_exist_1734_strict.tif'
_write_vrt_with_missing_source(vrt_path, missing_src)

with pytest.raises(Exception):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tightened to pytest.raises(FileNotFoundError, match=...) against the filename. Confirmed locally that the strict path actually raises FileNotFoundError (an OSError subclass) -- _FileSource opens the file via open(real, 'rb'), so a missing local source is a plain FileNotFoundError that propagates through read_to_array. Fixed in 5915bd9.

The default lenient mode of read_vrt warns once per unreadable source
and continues, returning a mosaic with zero-filled holes that
downstream integer-VRT code cannot distinguish from real data. The
warning is easy to miss in a pipeline that filters UserWarnings.

Add a structured side-channel: VRTDataset gains a .holes list that
the read loop populates with {source, band, dst_rect, error} on each
skipped source. The public read_vrt copies a non-empty list into the
returned DataArray as attrs['vrt_holes'] so callers can detect
partial mosaics by a single attribute lookup.

The fallback warning message now points at attrs['vrt_holes'] or the
XRSPATIAL_GEOTIFF_STRICT env var, so the recovery path is
discoverable from a single captured warning.

Strict mode (XRSPATIAL_GEOTIFF_STRICT=1) is unchanged and still
raises.

Adds regression coverage in test_vrt_holes_attr_1734.py.
…rror

Address PR #1747 review comments:

* The helper that wrote the VRT used `dataType="Float32"` with a
  `NoDataValue`, so the missing-source hole came out as NaN, which is
  not the failure mode #1734 documents. Switch to `Int32` and drop the
  NoDataValue so the lenient path produces the zero-filled integer
  hole the attr is meant to surface. Add an explicit
  `(da.values == 0).all()` assert on the integer dtype so the
  regression is pinned, not just the attr surface.

* The strict-mode test caught `Exception`, which would have happily
  swallowed any unrelated failure. The missing-source path raises
  `FileNotFoundError` (an `OSError` subclass) from `_FileSource` via
  `read_to_array`; assert that concrete class with a `match=` against
  the file basename so an unrelated read-path bug surfaces as a real
  test failure.
@brendancol brendancol merged commit 51408f3 into main May 12, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: read_vrt default lenient mode hides source read failures as zero-filled holes

2 participants