Fix normalize dask paths: replace boolean indexing with lazy reductions #1125

Merged
brendancol merged 4 commits into master from issue-1124 on Mar 31, 2026
brendancol commented on Mar 31, 2026

Summary

  • Replace `data[finite_mask]` (boolean fancy indexing that materializes dask arrays) with `da.where(finite_mask, data, nan)` followed by `da.nanmin()`/`da.nanmax()`/`da.nanmean()`/`da.nanstd()`
  • Guard the division in rescale with `safe_range` so that a zero range cannot produce inf/nan; `da.where` evaluates both branches, so an unguarded division would still run even where its result is discarded
  • All four dask paths fixed: rescale (dask+numpy, dask+cupy) and standardize (dask+numpy, dask+cupy)
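
The pattern described in the list above can be sketched roughly as follows for the rescale path. This is a minimal illustration, not the code merged in this PR: the function name `rescale_lazy`, the `new_min`/`new_max` parameters, and the choice to map constant input to `new_min` are assumptions made for the example.

```python
import numpy as np
import dask.array as da


def rescale_lazy(data, new_min=0.0, new_max=1.0):
    """Hypothetical helper: min-max rescale over finite values without leaving dask."""
    finite_mask = da.isfinite(data)
    # Non-finite cells become NaN so the nan-aware reductions skip them.
    masked = da.where(finite_mask, data, np.nan)
    lo = da.nanmin(masked)
    hi = da.nanmax(masked)
    value_range = hi - lo
    # da.where computes both branches, so divide by a nonzero stand-in
    # whenever the range is zero.
    safe_range = da.where(value_range == 0, 1.0, value_range)
    scaled = (data - lo) / safe_range * (new_max - new_min) + new_min
    # Assumption for this sketch: constant input maps to new_min.
    return da.where(value_range == 0, new_min, scaled)


x = da.from_array(np.array([[1.0, np.nan], [3.0, 7.0]]), chunks=(1, 2))
result = rescale_lazy(x)   # still a lazy dask array; nothing computed yet
out = result.compute()     # finite cells land in [0, 1]; the NaN cell stays NaN
```

The standardize path would follow the same shape, with `da.nanmean` and `da.nanstd` taking the place of `da.nanmin` and `da.nanmax`.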

Context

Found during the performance sweep (#1124). Boolean fancy indexing on dask arrays forces full materialization into a single chunk. The lazy reduction functions (`da.nanmin` etc.) reduce each chunk independently and combine the partial results, so the full array is never materialized.
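
A small comparison of the two approaches, as a sketch on a toy array (the array values and chunking are made up for illustration). One observable difference is that the boolean-indexed result has a data-dependent length, so dask reports its chunk sizes as unknown, while the masked array keeps its original shape and chunking.

```python
import numpy as np
import dask.array as da

values = np.array([[1.0, np.nan, 4.0], [np.inf, 2.0, 8.0]])
data = da.from_array(values, chunks=(1, 3))
finite_mask = da.isfinite(data)

# Boolean indexing: the output length depends on the data, so the
# chunk sizes of the result are unknown (reported as nan).
selected = data[finite_mask]
print(selected.chunks)

# Masked reduction: shape and chunks are unchanged, and each chunk
# can be reduced on its own before the partial results are combined.
masked = da.where(finite_mask, data, np.nan)
print(masked.chunks)       # ((1, 1), (3,))
lazy_min = da.nanmin(masked)
lazy_max = da.nanmax(masked)

lo, hi = da.compute(lazy_min, lazy_max)
```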

Test plan

  • All 29 existing normalize tests pass (verified)

Parallel subagent triage + ralph-loop workflow for auditing all
xrspatial modules for performance bottlenecks, OOM risk under
30TB dask workloads, and backend-specific anti-patterns.

7 tasks covering command scaffold, module scoring, parallel subagent
dispatch, report merging, ralph-loop generation, and smoke tests.

…#1124)

Replace `data[finite_mask]` (boolean fancy indexing that materializes
dask arrays) with `da.where(finite_mask, data, nan)` + `da.nanmin()`/
`da.nanmax()`/`da.nanmean()`/`da.nanstd()` for lazy per-chunk
reductions.

Guard division by zero in rescale with safe_range to prevent inf/nan
in lazy evaluation (da.where evaluates both branches).
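
Why the guard is needed can be shown with plain NumPy, whose `np.where` has the same elementwise semantics: both value arguments are fully computed before the condition selects between them. The sketch below is illustrative only; the function names are hypothetical, and floating-point errors are turned into exceptions purely to make the hidden division visible.

```python
import numpy as np


def rescale_unguarded(data, lo, value_range):
    # The division is computed for every element, even though the
    # condition will pick 0.0 when value_range is zero.
    return np.where(value_range == 0, 0.0, (data - lo) / value_range)


def rescale_guarded(data, lo, value_range):
    # Swap a zero range for 1.0 first, so the division is always safe.
    safe_range = np.where(value_range == 0, 1.0, value_range)
    return np.where(value_range == 0, 0.0, (data - lo) / safe_range)


data = np.array([5.0, 5.0, 5.0])       # constant input, so the range is 0
lo, hi = data.min(), data.max()
value_range = hi - lo

with np.errstate(invalid="raise", divide="raise"):
    try:
        rescale_unguarded(data, lo, value_range)
        unguarded_raised = False
    except FloatingPointError:
        unguarded_raised = True        # the discarded 0/0 was still evaluated
    guarded = rescale_guarded(data, lo, value_range)
```

With default NumPy error settings the unguarded version only emits a warning, but inside a lazy dask graph the intermediate inf/nan values are still produced in every chunk.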
github-actions bot added the performance label (PR touches performance-sensitive code) on Mar 31, 2026
brendancol merged commit 9edd073 into master on Mar 31, 2026
11 checks passed