Skip to content

Commit 5dc25aa

Browse files
committed
Update hypsometric_integral spec with review fixes
Add column/rasterize_kw params, fix accessor namespace to .xrs, clarify nodata semantics, specify float64 output dtype, add list-of-pairs zones support, note dask chunk alignment strategy.
1 parent 2c2250c commit 5dc25aa

File tree

1 file changed

+45
-26
lines changed

1 file changed

+45
-26
lines changed

docs/superpowers/specs/2026-03-24-hypsometric-integral-design.md

Lines changed: 45 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,9 @@ catchment, or arbitrary polygon).
2020
def hypsometric_integral(
2121
zones,
2222
values,
23-
nodata=np.nan,
23+
nodata=0,
24+
column=None,
25+
rasterize_kw=None,
2426
name='hypsometric_integral',
2527
) -> xr.DataArray:
2628
```
@@ -29,16 +31,19 @@ def hypsometric_integral(
2931

3032
| Parameter | Type | Description |
3133
|-----------|------|-------------|
32-
| `zones` | `DataArray` or `GeoDataFrame` | 2D zone IDs (integer). GeoDataFrame is rasterized via existing `_maybe_rasterize_zones`. |
33-
| `values` | `DataArray` | 2D elevation raster, same shape as zones. |
34-
| `nodata` | `float` | Fill value for cells outside any zone. Default `np.nan`. |
34+
| `zones` | `DataArray`, `GeoDataFrame`, or list of `(geometry, value)` pairs | 2D integer zone IDs. Vector inputs are rasterized via `_maybe_rasterize_zones` using `values` as the template grid. |
35+
| `values` | `DataArray` | 2D elevation raster (float), same shape as zones. |
36+
| `nodata` | `int`, default `0` | Zone ID that represents "no zone". Cells with this zone ID are excluded from computation and filled with `NaN` in the output. Matches `apply()` convention. Set to `None` to include all zone IDs. |
37+
| `column` | `str` or `None` | Column in a GeoDataFrame containing zone IDs. Required when `zones` is a GeoDataFrame. |
38+
| `rasterize_kw` | `dict` or `None` | Extra keyword arguments passed to `rasterize()` when vector zones are provided. |
3539
| `name` | `str` | Name for the output DataArray. Default `'hypsometric_integral'`. |
3640

3741
### Returns
3842

39-
`xr.DataArray` — same shape, dims, coords, and attrs as `values`. Each cell
40-
contains the HI of its zone. Cells outside any zone get `nodata`. Zones with
41-
zero elevation range (flat) get `nodata`.
43+
`xr.DataArray` (dtype `float64`) — same shape, dims, coords, and attrs as
44+
`values`. Each cell contains the HI of its zone. Cells belonging to the
45+
`nodata` zone or with non-finite elevation values get `NaN`. Zones with zero
46+
elevation range (flat) also get `NaN`.
4247

4348
## Placement
4449

@@ -52,50 +57,64 @@ structurally a zonal operation, not a local neighborhood transform.
5257

5358
All four backends via `ArrayTypeFunctionMapping`:
5459

55-
- **numpy**: iterate unique zones, compute min/mean/max per zone, paint back.
56-
Can reuse existing `_sort_and_stride` infrastructure for grouping values by
57-
zone.
58-
- **cupy**: same logic on GPU arrays. Use `cupy.unique`, scatter/gather.
59-
- **dask+numpy**: `map_blocks` or blockwise aggregation. Two-pass: first pass
60-
computes per-zone min/sum/max/count across chunks, second pass reduces and
61-
paints back.
62-
- **dask+cupy**: same as dask+numpy but with cupy chunk functions.
60+
- **numpy**: use `_sort_and_stride` to group values by zone, compute
61+
min/mean/max per zone, build a zone-to-HI lookup, paint back with
62+
vectorized indexing.
63+
- **cupy**: same logic using `cupy.unique` and device-side scatter/gather.
64+
- **dask+numpy**: compute per-chunk partial aggregates (min, max, sum, count
65+
per zone) via `map_blocks`, reduce across chunks to get global per-zone
66+
stats, then `map_blocks` again to paint HI values back using the global
67+
lookup. Zones and values chunks must be aligned (use `validate_arrays`).
68+
- **dask+cupy**: same two-pass structure. Follows the existing pattern where
69+
chunk functions use cupy internally (same as `_stats_dask_cupy`).
6370

6471
## Algorithm
6572

66-
1. Validate inputs (2D, matching shapes).
67-
2. Identify unique zones (excluding NaN / 0 if used as nodata).
68-
3. For each zone `z`:
73+
1. Validate inputs (2D, matching shapes via `validate_arrays`).
74+
2. Rasterize vector zones if needed (`_maybe_rasterize_zones`).
75+
3. Identify unique zone IDs, excluding `nodata` zone and NaN.
76+
4. For each zone `z`:
6977
- Mask: cells where `zones == z` and `values` is finite.
7078
- Compute `min_z`, `mean_z`, `max_z`.
71-
- `hi_z = (mean_z - min_z) / (max_z - min_z)` if `max_z != min_z`, else `nodata`.
72-
4. Paint `hi_z` back into all cells belonging to zone `z`.
73-
5. Fill remaining cells with `nodata`.
79+
- `hi_z = (mean_z - min_z) / (max_z - min_z)` if `max_z != min_z`,
80+
else `NaN`.
81+
5. Paint `hi_z` back into all cells belonging to zone `z`.
82+
6. Fill remaining cells (nodata zone, non-finite values, flat zones) with
83+
`NaN`.
84+
85+
### Value nodata handling
86+
87+
Only non-finite values (`NaN`, `inf`) are excluded from per-zone statistics.
88+
Users with sentinel nodata values (e.g., -9999) should mask their DEM before
89+
calling this function. This matches the convention used by `apply()`.
7490

7591
## Accessor
7692

77-
Expose via `xrspatial.accessor` as:
93+
Expose via the existing `.xrs` accessor:
7894

7995
```python
80-
da.spatial.hypsometric_integral(zones)
96+
elevation.xrs.zonal_hypsometric_integral(zones)
8197
```
8298

83-
where `da` is the elevation DataArray.
99+
Following the `zonal_` prefix convention used by `zonal_stats`, `zonal_apply`,
100+
and `zonal_crosstab`.
84101

85102
## Tests
86103

87104
- **Hand-crafted case**: zones with known elevation distributions and
88105
pre-computed HI values.
89-
- **Edge cases**: single-cell zones, flat zones (range=0 returns nodata),
106+
- **Edge cases**: single-cell zones, flat zones (range=0 returns NaN),
90107
NaN cells within a zone (ignored in computation), zones with all-NaN values.
91108
- **Cross-backend parity**: standard `general_checks` pattern comparing
92109
numpy, cupy, dask+numpy, dask+cupy outputs.
93-
- **GeoDataFrame zones input**: verify rasterization path works.
110+
- **Vector zones input**: verify GeoDataFrame and list-of-pairs rasterization
111+
paths work.
94112

95113
## Scope
96114

97115
This is intentionally minimal. Future extensions (not in this iteration):
98116
- Hypsometric curve data (normalized area-altitude distribution)
99117
- Per-zone summary table output
118+
- `zone_ids` parameter to restrict computation to a subset of zones
100119
- Skewness / kurtosis of the hypsometric distribution
101120
- Integration as a stat option in `zonal.stats()`

0 commit comments

Comments
 (0)