geotiff: support fsspec URIs in read_geotiff_dask (#1749)#1755
Open
brendancol wants to merge 1 commit into
Open
geotiff: support fsspec URIs in read_geotiff_dask (#1749)#1755brendancol wants to merge 1 commit into
brendancol wants to merge 1 commit into
Conversation
…1749) read_geotiff_dask failed on s3://, gs://, az://, memory:// and other fsspec URIs because the metadata-only step (_read_geo_info) used a plain open(source, 'rb') call. The eager path already handled cloud URIs via _read_to_array + _CloudSource, so the dask graph's per-chunk pixel reads were already cloud-aware; only the upfront metadata read broke. Detect fsspec URIs in _read_geo_info and pull the file bytes via _CloudSource.read_all(). Local-path mmap fast path is unchanged. HTTP path is unchanged and continues to use _parse_cog_http_meta. Closes #1749.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes open_geotiff(..., chunks=...) / read_geotiff_dask failures for fsspec-backed URIs (e.g., s3://, gs://, az://, memory://) by making the upfront metadata read (_read_geo_info) cloud-aware, aligning dask reads with the existing eager read path.
Changes:
- Route fsspec/cloud URIs in
_read_geo_infothrough_CloudSourceinstead ofopen(..., 'rb'). - Add a regression test that reads a GeoTIFF from
memory://using dask chunks and validates results against eager reads and the original array.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
xrspatial/geotiff/__init__.py |
Adds an fsspec URI branch in _read_geo_info to avoid local open() for cloud/memory schemes. |
xrspatial/geotiff/tests/test_features.py |
Adds a regression test covering dask reads from an fsspec memory:// URI. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+480
to
+484
| elif isinstance(source, str) and _is_fsspec_uri(source): | ||
| # fsspec URI (s3://, gs://, az://, memory://, ...): pull the | ||
| # whole file via _CloudSource for metadata parsing. Per-chunk | ||
| # pixel reads in the dask graph go through _read_to_array | ||
| # which opens its own _CloudSource, so this fetch is metadata-only. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
read_geotiff_daskfailed ons3://,gs://,az://,memory://and other fsspec URIs withFileNotFoundError. The eager path already handled cloud URIs via_read_to_array+_CloudSource, but the dask path's metadata-only step (_read_geo_infoinxrspatial/geotiff/__init__.py) used a plainopen(source, 'rb')call._read_to_arraywhich dispatches to_CloudSource), so only the upfront metadata read needed fixing._read_geo_infothrough_CloudSource.read_all(). The local-pathmmapfast path is unchanged. The HTTP path (which uses_parse_cog_http_meta) is unchanged.Closes #1749.
Test plan
test_dask_path_fsspec_uri_1749inTestCloudStoragewrites a small TIFF, copies it into the fsspecmemory://filesystem, and verifiesopen_geotiff('memory:///...', chunks=4)returns a dask-backed DataArray whose values match both the eager read and the original array.TestCloudStorageclass still passes (6/6).xrspatial/geotiff/tests/suite shows the same pre-existing failures asmain(GPU/matplotlib unrelated to this change).