Skip to content

Commit 96fe0c4

Browse files
committed
Add design spec for spatial autocorrelation (#1135)
Covers Global Moran's I, Local Moran's I (LISA), and queen/rook contiguity weights. Geary's C and distance-band weights deferred to follow-up issues.
1 parent d1d85d7 commit 96fe0c4

File tree

1 file changed

+259
-0
lines changed

1 file changed

+259
-0
lines changed
Lines changed: 259 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,259 @@
1+
# Spatial Autocorrelation: Moran's I and LISA
2+
3+
**Issue:** #1135 (partial -- Global Moran's I, Local Moran's I, queen/rook contiguity)
4+
**Date:** 2026-04-01
5+
6+
## Scope
7+
8+
This spec covers the first increment of #1135:
9+
10+
- Global Moran's I with analytical inference
11+
- Local Moran's I (LISA) with permutation-based pseudo p-values
12+
- Queen and rook contiguity weights (3x3 kernel)
13+
14+
Geary's C, join count statistics, and distance-band weights are deferred to follow-up issues.
15+
16+
## Public API
17+
18+
Two functions in `xrspatial/autocorrelation.py`:
19+
20+
### `morans_i`
21+
22+
```python
23+
def morans_i(
24+
raster: xr.DataArray,
25+
contiguity: str = 'queen',
26+
boundary: str = 'nan',
27+
) -> xr.DataArray:
28+
```
29+
30+
Returns a scalar (0-dimensional) DataArray. The `.item()` value is the I statistic.
31+
Attrs carry analytical inference results:
32+
33+
| Attr | Type | Description |
34+
|------|------|-------------|
35+
| `expected_I` | float | -1/(N-1) |
36+
| `variance_I` | float | Cliff & Ord analytical variance |
37+
| `z_score` | float | (I - E[I]) / sqrt(Var[I]) |
38+
| `p_value` | float | Two-sided, from normal approximation |
39+
| `N` | int | Count of non-NaN pixels |
40+
| `S0` | float | Sum of all weights |
41+
| `contiguity` | str | 'queen' or 'rook' |
42+
43+
### `lisa`
44+
45+
```python
46+
def lisa(
47+
raster: xr.DataArray,
48+
contiguity: str = 'queen',
49+
n_permutations: int = 999,
50+
boundary: str = 'nan',
51+
) -> xr.Dataset:
52+
```
53+
54+
Returns a Dataset with three DataVariables:
55+
56+
| Variable | Dims | Dtype | Description |
57+
|----------|------|-------|-------------|
58+
| `lisa_values` | (y, x) | float32 | Local I_i per pixel |
59+
| `p_values` | (y, x) | float32 | Pseudo p-value from permutation |
60+
| `cluster` | (y, x) | int8 | 0=NS, 1=HH, 2=LL, 3=HL, 4=LH |
61+
62+
Dataset attrs: `n_permutations`, `contiguity`, `global_morans_I`.
63+
64+
Cluster codes use significance threshold p <= 0.05. Pixels with p > 0.05 get code 0 regardless of their quadrant.
65+
66+
## Mathematics
67+
68+
### Global Moran's I
69+
70+
```
71+
z = x - mean(x)
72+
lag = convolve(z, W) # W = queen or rook kernel
73+
I = (N / S0) * sum(z * lag) / sum(z^2)
74+
```
75+
76+
S0 (total weight sum) accounts for border effects and NaN gaps by convolving a non-NaN mask with the weight kernel and summing the result.
77+
78+
Analytical expected value and variance follow Cliff & Ord (1981):
79+
80+
```
81+
E[I] = -1 / (N - 1)
82+
Var[I] uses S0, S1, S2, N, and the kurtosis of z
83+
```
84+
85+
Where S1 = (1/2) * sum_ij (w_ij + w_ji)^2 and S2 = sum_i (sum_j w_ij + sum_j w_ji)^2. For symmetric binary weights (queen/rook), S1 = 2 * S0 and S2 simplifies.
86+
87+
### Local Moran's I (LISA)
88+
89+
```
90+
I_i = (z_i / var(x)) * sum_j(w_ij * z_j)
91+
```
92+
93+
### Permutation pseudo p-value
94+
95+
For each pixel i:
96+
1. Extract the neighbor values (up to 8 for queen, 4 for rook).
97+
2. Shuffle the neighbor values n_permutations times (Fisher-Yates).
98+
3. Recompute I_i with each shuffled set.
99+
4. p_value = (count(|I_perm| >= |I_obs|) + 1) / (n_permutations + 1)
100+
101+
The +1 correction includes the observed value as one permutation (Davison & Hinkley, 1997).
102+
103+
### Cluster classification
104+
105+
| Code | Label | Condition (when p <= 0.05) |
106+
|------|-------|---------------------------|
107+
| 0 | NS | not significant |
108+
| 1 | HH | z_i > 0 and lag_i > 0 |
109+
| 2 | LL | z_i < 0 and lag_i < 0 |
110+
| 3 | HL | z_i > 0 and lag_i < 0 |
111+
| 4 | LH | z_i < 0 and lag_i > 0 |
112+
113+
### NaN handling
114+
115+
- NaN input pixels produce NaN in lisa_values and p_values, 0 in cluster.
116+
- NaN neighbors are excluded from lag sums (their weight drops to zero).
117+
- Constant rasters (zero variance) produce NaN for all statistics.
118+
119+
## Contiguity kernels
120+
121+
Queen (8 neighbors):
122+
```
123+
[[1, 1, 1],
124+
[1, 0, 1],
125+
[1, 1, 1]]
126+
```
127+
128+
Rook (4 neighbors):
129+
```
130+
[[0, 1, 0],
131+
[1, 0, 1],
132+
[0, 1, 0]]
133+
```
134+
135+
## Internal Architecture
136+
137+
### Backend dispatch
138+
139+
Both public functions validate input via `_validate_raster`, build the contiguity kernel, then dispatch through `ArrayTypeFunctionMapping`:
140+
141+
```
142+
morans_i / lisa
143+
-> _validate_raster(raster, ndim=2)
144+
-> kernel = _contiguity_kernel(contiguity)
145+
-> ArrayTypeFunctionMapping(numpy, cupy, dask, dask_cupy)(raster)(...)
146+
```
147+
148+
### Backend implementations
149+
150+
**numpy:** Compute mean/var eagerly. Spatial lag via `_convolve_2d_numpy` (imported from convolution module) or a local implementation. LISA permutation via `@ngjit` loop that iterates pixels, extracts 8 neighbors, runs Fisher-Yates shuffle n_permutations times.
151+
152+
**cupy:** Same structure but GPU arrays. Spatial lag via CuPy convolution. Permutation via `@cuda.jit` kernel where each thread handles one pixel. RNG uses `numba.cuda.random.xoroshiro128p_uniform_float32` for Fisher-Yates shuffle.
153+
154+
**dask+numpy:** Eagerly compute global mean, var, N with `da.compute()`. Pass as scalars to chunk function via `partial()`. Single `map_overlap(depth=1, boundary=...)` call with fused chunk function that computes lag + I_i + permutation + cluster. Returns `(3, H, W)` float32 array (lisa_values, p_values, cluster). Unpacked into Dataset after.
155+
156+
**dask+cupy:** Same structure as dask+numpy but chunk function dispatches to CUDA kernel internally.
157+
158+
### Fused LISA chunk function
159+
160+
```python
161+
def _lisa_chunk_numpy(chunk, kernel, global_mean, global_var, n_permutations, seed):
162+
"""Single-pass: lag + LISA + permutation + cluster for one chunk."""
163+
rows, cols = chunk.shape
164+
z = chunk - global_mean
165+
out = np.empty((3, rows, cols), dtype=np.float32)
166+
# For each pixel: read neighbors, compute lag, permute, classify
167+
_lisa_fused_ngjit(z, kernel, global_var, n_permutations, seed, out)
168+
return out
169+
```
170+
171+
The `@ngjit` inner function handles the pixel loop with neighbor extraction, lag computation, Fisher-Yates permutation, and cluster assignment in a single pass.
172+
173+
### Global Moran's I backend flow
174+
175+
```
176+
numpy:
177+
z = data - mean
178+
lag = convolve(z, kernel) # reuse focal convolution
179+
mask = ~isnan(data)
180+
S0 = sum(convolve(mask, kernel)) # total weight count
181+
N = sum(mask)
182+
I = (N / S0) * nansum(z * lag) / nansum(z^2)
183+
# analytical variance via S0, S1, S2, N
184+
185+
dask:
186+
mean, var, N = da.compute(da.nanmean(data), da.nanvar(data), da.sum(~da.isnan(data)))
187+
z = data - mean
188+
lag = map_overlap(_convolve_chunk, depth=1, ...)
189+
S0 = da.sum(map_overlap(_convolve_mask_chunk, depth=1, ...))
190+
I = (N / S0) * da.nansum(z * lag) / da.nansum(z**2)
191+
I = I.compute()
192+
```
193+
194+
## map_overlap specifics
195+
196+
- **depth:** 1 in both dimensions (3x3 kernel, half-size = 1)
197+
- **boundary:** passed through `_boundary_to_dask(boundary)`, default `np.nan`
198+
- **meta:** `np.array((), dtype=np.float32)` for numpy, `cupy.array((), dtype=cupy.float32)` for cupy
199+
- **Fused output:** LISA chunk function receives a 2D chunk and returns a `(3, H, W)` array. To make `map_overlap` accept this shape change, pass `new_axis=0` so dask knows the output gains a leading dimension. After the `map_overlap` call, slice `result[0]`, `result[1]`, `result[2]` to get the three output bands. If `new_axis` with `map_overlap` proves brittle at runtime, fall back to three separate `map_overlap` calls (one per output band) at the cost of 3x neighbor reads.
200+
201+
## Seed handling
202+
203+
Permutation tests need reproducible results for cross-backend testing.
204+
205+
- numpy/dask+numpy: `np.random.SeedSequence(seed)` spawns per-pixel child sequences. In practice, the `@ngjit` function uses a simple LCG seeded with `seed + pixel_linear_index` for Fisher-Yates shuffles over 8 elements. Good enough for 8-element shuffles.
206+
- cupy/dask+cupy: `xoroshiro128p` states initialized with `create_xoroshiro128p_states(n_threads, seed)`.
207+
- Public API does not expose seed. Internal backends accept it for testing. Default seed is 0 for determinism within a single call (users wanting different random draws can call twice -- permutation p-values are not sensitive to seed choice for n >= 999).
208+
209+
## Testing
210+
211+
File: `xrspatial/tests/test_autocorrelation.py`
212+
213+
### Known-value tests
214+
215+
| Input | Expected I | Rationale |
216+
|-------|-----------|-----------|
217+
| 4x4 checkerboard | I < -0.8 | Strong negative autocorrelation |
218+
| 4x4 row gradient | I > 0.5 | Positive autocorrelation |
219+
| 8x8 random (seeded) | -0.3 < I < 0.3 | Near zero |
220+
| Constant value | NaN | Zero variance |
221+
222+
### LISA tests
223+
224+
- Checkerboard: all I_i negative, all clusters HL or LH, all p-values < 0.05
225+
- Gradient: center pixels positive (HH or LL), p-values < 0.05
226+
- NaN corners: NaN in output at those positions, valid elsewhere
227+
228+
### Edge cases
229+
230+
- Single-cell raster: returns NaN
231+
- All-NaN raster: returns NaN
232+
- Raster with one non-NaN pixel: returns NaN
233+
234+
### Cross-backend parity
235+
236+
- `assert_numpy_equals_dask_numpy` for both functions
237+
- `assert_numpy_equals_cupy` (skip if no GPU)
238+
- Fixed seed ensures identical permutation sequences
239+
240+
### Contiguity
241+
242+
- Queen vs rook produce different I values on same input
243+
- Rook on 4x4 checkerboard: I = -1.0 (perfect negative, all 4 neighbors opposite)
244+
245+
## Documentation
246+
247+
- Add API entry in `docs/source/reference/` (new section or extend focal tools)
248+
- Add row to README feature matrix under a new "Spatial Statistics" category
249+
250+
## Files changed
251+
252+
| File | Change |
253+
|------|--------|
254+
| `xrspatial/autocorrelation.py` | New module |
255+
| `xrspatial/__init__.py` | Export `morans_i`, `lisa` |
256+
| `xrspatial/tests/test_autocorrelation.py` | New test file |
257+
| `docs/source/reference/autocorrelation.rst` | New API docs |
258+
| `README.md` | Feature matrix row |
259+
| `examples/user_guide/` | New notebook |

0 commit comments

Comments
 (0)