Skip to content

geotiff: stamp _xrspatial_geotiff_contract=1 on every read (#1984)#2003

Merged
brendancol merged 2 commits into
mainfrom
1984-pr3-attrs-contract-version
May 17, 2026
Merged

geotiff: stamp _xrspatial_geotiff_contract=1 on every read (#1984)#2003
brendancol merged 2 commits into
mainfrom
1984-pr3-attrs-contract-version

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

PR 3 of 7 on issue #1984.

Stamps attrs['_xrspatial_geotiff_contract'] = 1 on every DataArray
returned by an xrspatial geotiff read path. Downstream code reads the
marker to learn which attrs-contract revision produced the array;
later PRs in the series will bump the value when the contract changes.

Summary

  • Added module-level constant _ATTRS_CONTRACT_VERSION = 1 in
    xrspatial/geotiff/_attrs.py.
  • Stamped the value in _populate_attrs_from_geo_info, which is the
    single chokepoint for the eager numpy, dask+numpy, GPU, dask+GPU,
    and COG/HTTP read paths.
  • Stamped the value inline in _backends/vrt.py for the two VRT
    paths (eager and chunked) that build their attrs dict directly
    rather than going through the helper.

Read paths covered

  • open_geotiff (eager numpy)
  • open_geotiff(chunks=...) / read_geotiff_dask (dask + numpy)
  • open_geotiff(gpu=True) / read_geotiff_gpu (cupy / GPU)
  • open_geotiff(gpu=True, chunks=...) (dask + cupy)
  • COG / HTTP eager read (funnels through _read_to_array and the
    shared helper)
  • read_vrt (VRT eager)
  • read_vrt(chunks=...) (VRT chunked / dask)

Test plan

  • pytest xrspatial/geotiff/tests/test_attrs_contract_version_1984.py -x -q
  • pytest xrspatial/geotiff/tests -q (pre-existing GPU and fuzz
    failures on origin/main are unaffected; no test was previously
    asserting on the exact attrs dict so no other test needed updating)

PR 3 of 7 on issue #1984. Adds a contract-version marker attr
(``_xrspatial_geotiff_contract``) to every DataArray returned by an
xrspatial geotiff read path so downstream code can identify which
attrs-contract revision produced an array.

The value lives as a module-level constant ``_ATTRS_CONTRACT_VERSION``
in ``_attrs.py``. The eager numpy, dask+numpy, GPU, dask+GPU, and the
COG/HTTP path all funnel through ``_populate_attrs_from_geo_info``, so
one stamp there covers four backends. The VRT backends in
``_backends/vrt.py`` build their attrs dict directly and stamp the
version inline; both the eager and chunked VRT paths reuse the same
constant so the value stays in lockstep when it is later bumped.

Adds ``test_attrs_contract_version_1984.py`` with one assertion per
read path (eager, dask, GPU, dask+GPU, VRT eager, VRT chunked) plus
a pin on the constant value.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 17, 2026
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: geotiff: stamp _xrspatial_geotiff_contract=1 on every read

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_attrs.py:113 — the stamp is the first line of _populate_attrs_from_geo_info. If a caller pre-populates attrs['_xrspatial_geotiff_contract'], it will be silently overwritten. No caller does today, but the unconditional overwrite is worth a one-line note in the helper's docstring.
  • xrspatial/geotiff/_backends/vrt.py:196, 566_ATTRS_CONTRACT_VERSION is referenced inline in two places. A short comment pointing at _populate_attrs_from_geo_info as the canonical site would help future maintainers know the helper is bypassed because VRT builds its attrs inline.

Nits (optional improvements)

  • xrspatial/geotiff/tests/test_attrs_contract_version_1984.py:106-113_gpu_available swallows any cupy import error as "no GPU". If cupy is installed but broken (driver mismatch, etc.) the GPU tests skip silently rather than surfacing the cause. A pytest.importorskip("cupy") plus cupy.cuda.is_available() two-step would be more explicit.

What looks good

  • One stamp site in _populate_attrs_from_geo_info covers four backends (eager, dask, GPU, dask+GPU) plus the COG/HTTP path.
  • VRT eager and chunked paths import the constant rather than hard-coding 1, so a future bump stays in lockstep.
  • Tests cover every read path the PR claims to cover, plus a pin on the literal value 1.

Checklist

  • All four primary backends covered
  • VRT eager + chunked covered
  • Constant imported (no duplicated literal)
  • Test pins the version value

- _attrs.py docstring: noted that the stamp overwrites any pre-existing
  value on the passed-in attrs dict; callers pass freshly built dicts.
- _backends/vrt.py (eager + chunked): added inline comments pointing at
  _populate_attrs_from_geo_info as the canonical stamp site, so future
  maintainers know why the helper is bypassed in the VRT path.
@brendancol brendancol merged commit 139e8f2 into main May 17, 2026
4 of 5 checks passed
brendancol added a commit that referenced this pull request May 18, 2026
* geotiff: add attrs canonical-tier locking test (#1984)

PR 4 of 7 from the attrs-contract plan in #1984. Adds
test_attrs_contract_canonical_1984.py: a single fixture that exercises
every canonical attr (crs, crs_wkt, transform, nodata, raster_type,
extra_tags, gdal_metadata, gdal_metadata_xml, x_resolution / y_resolution /
resolution_unit, _xrspatial_geotiff_contract), round-trips it through
to_geotiff -> open_geotiff, and asserts presence + value per key.

Sibling locking tests already cover the other tiers: aliases (#2002),
pass-through (#2004), and per-backend stamping of the contract version
(#2003). The canonical assertion list is what #1985 (parity matrix) and
#1986 (round-trip invariants) will import.

No production-code changes.

* geotiff: address review on attrs canonical-tier locking test

Addresses self-review on PR #2009:

* Add per-backend canonical-key presence parametrisation
  (eager-numpy, dask-numpy, gpu, dask-gpu). The 7-PR plan in #1984
  called for "one fixture per backend" coverage; previously only the
  eager read path was exercised.
* Comment ``_CANONICAL_KEYS`` to explain the deliberate omission of
  ``raster_type`` (the implicit 'area' default is encoded as absence,
  so the constant cannot express it; the two dedicated raster_type
  tests handle both branches).
* Drop ``AREA_OR_POINT: 'Area'`` from the shared ``_GDAL_META``
  fixture so the point-branch test does not inherit an inconsistent
  GDAL_METADATA entry.
* Relax the crs_wkt substring check to a regex covering 'WGS 84',
  'WGS_1984', and 'WGS-84' so the assertion survives PROJ-version
  variation across CI platforms.
* Note in a comment that ``gdal_metadata['TIFFTAG_SOFTWARE']`` and the
  raw Software TIFF tag (305) in ``extra_tags`` are independent
  channels the writer does not synchronise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant