Skip to content

Skip no-op astype in _delayed_read_window (#1624)#1626

Merged
brendancol merged 2 commits into
mainfrom
issue-1624
May 11, 2026
Merged

Skip no-op astype in _delayed_read_window (#1624)#1626
brendancol merged 2 commits into
mainfrom
issue-1624

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1624.

Summary

Test plan

  • New regression test test_dask_no_op_astype_1624.py wraps read_to_array so it returns an ndarray subclass that records astype calls, then asserts no same-dtype astype survives the per-chunk return path.
  • Test fails on main with the expected error message; passes after the fix.
  • Existing test_dask_int_nodata_chunks_1597.py (the regression suite from Fix read_geotiff_dask losing nodata mask in non-first chunks (#1597) #1601) still passes -- the int-with-sentinel promotion is intact.
  • Full xrspatial/geotiff/tests/ suite passes apart from three pre-existing TestPalette failures on main that are unrelated to this change.

After #1601 widened the call site to always pass target_dtype, every
dask chunk ran arr.astype(target_dtype) even when arr.dtype already
matched. numpy.astype defaults to copy=True and so allocated a same-
dtype chunk-sized buffer and memcpy on every read. Gate the cast on a
real dtype mismatch; the #1597 mask path still promotes uint to float64
inline so every chunk lands in the dask-declared dtype.

Regression test in test_dask_no_op_astype_1624.py wraps read_to_array
to capture an ndarray subclass that records astype calls, then asserts
no same-dtype call survives the per-chunk return path.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 11, 2026
@brendancol brendancol requested a review from Copilot May 11, 2026 20:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a performance regression in the GeoTIFF dask read path by avoiding redundant per-chunk astype() calls when the chunk dtype already matches the dask-declared dtype, while preserving the existing integer-nodata promotion behavior introduced for #1597/#1601.

Changes:

  • Skip the per-chunk arr.astype(target_dtype) in _delayed_read_window when arr.dtype == target_dtype to avoid unnecessary chunk-sized allocations/copies.
  • Add a regression test that instruments the per-chunk read_to_array return value to detect same-dtype astype() calls.
  • Add coverage to ensure the uint16 nodata-sentinel path still promotes to float64.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/__init__.py Gates per-chunk astype() on a real dtype mismatch to avoid no-op copy costs.
xrspatial/geotiff/tests/test_dask_no_op_astype_1624.py Adds regression tests to confirm no-op casts are skipped while nodata promotion remains intact.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/tests/test_dask_no_op_astype_1624.py Outdated
Comment thread xrspatial/geotiff/tests/test_dask_no_op_astype_1624.py
Copilot review on #1626 flagged two unused-variable issues. Fixes:

- Delete test_float32_chunks_avoid_redundant_copy. It declared
  same_dtype_casts and wrapped _delayed_read_window without recording
  anything, so it only asserted output dtype -- which does not
  validate the #1624 regression. test_astype_skipped_when_dtypes_match
  already covers the same surface properly via an ndarray subclass
  that tracks astype calls; verified it fails on the pre-fix code
  with "Same-dtype astype still runs per chunk".

- In test_uint16_mask_path_still_promotes, exercise the previously-
  unused arr fixture: assert that pixels equal to the 65535 sentinel
  become NaN and every other pixel matches arr.astype(float64).
  Anchors the test to fixture values so regressions in the mask
  branch surface here, not just as dtype drift.
@brendancol
Copy link
Copy Markdown
Contributor Author

Addressed both Copilot review comments in 4c66bb1:

  1. test_float32_chunks_avoid_redundant_copy — deleted. The wrapper around _delayed_read_window never recorded any astype calls, the same_dtype_casts list was unused, and the only assertion was on output dtype. test_astype_skipped_when_dtypes_match already covers the same case correctly via an ndarray subclass that records astype invocations. Confirmed it fails on the pre-fix code with AssertionError: Same-dtype astype still runs per chunk (dtype=float32, calls=[dtype('float32')]) and passes after the fix.

  2. test_uint16_mask_path_still_promotes — the previously-unused arr is now exercised: the test asserts that pixels equal to the 65535 sentinel become NaN and every other pixel matches arr.astype(float64) byte-for-byte. Catches mask-branch regressions instead of just dtype drift.

pyflakes clean. All three remaining tests pass.

@brendancol brendancol merged commit 8d17a84 into main May 11, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

read_geotiff_dask per-chunk astype copies even when dtype already matches

2 participants