Skip to content

Commit fb1916f

Browse files
authored
Add caveat and assumption admonitions to docs (#1134)
* Add caveat and assumption admonitions to docs (#1133) New user guide page (caveats.rst) collecting the cross-cutting assumptions: WGS84 hardcoded in geodesic path, float32 output, NaN-as-nodata, pixel-unit proximity, dask chunk requirements, GPU memory, and dimension ordering. Admonition blocks added to existing reference pages: surface, proximity, focal, classification, hydrology, terrain_metrics, and the data_types guide. * Fix review findings: proximity units, terrain dtype, missing pages (#1133) - Proximity EUCLIDEAN returns coordinate-unit distances, not pixels - Great-circle returns metres, not kilometres - Terrain metrics output float64, not float32; drop Horn attribution - Add admonitions to multispectral, zonal, and pathfinding pages - Clarify haversine section title
1 parent 45c714e commit fb1916f

File tree

12 files changed

+342
-2
lines changed

12 files changed

+342
-2
lines changed

docs/source/reference/classification.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,12 @@
44
Classification
55
**************
66

7+
.. warning::
8+
9+
Classification functions silently set NaN and infinite input values
10+
to NaN in the output. Clean infinities before classifying if you
11+
want every cell assigned to a bin.
12+
713
Equal Interval
814
==============
915
.. autosummary::

docs/source/reference/focal.rst

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,18 @@
44
Focal
55
*****
66

7+
.. caution::
8+
9+
Focal operations use ``dask.array.map_overlap``. Each chunk dimension
10+
must be **larger than the kernel depth** (typically 1 cell for a 3x3
11+
kernel). Chunks smaller than the depth produce wrong results.
12+
13+
.. note::
14+
15+
Focal functions output **float32**. NaN handling varies: ``mean``
16+
excludes NaN neighbours from the average, while ``hotspots`` propagates
17+
NaN from any neighbour.
18+
719
Apply
820
=====
921
.. autosummary::

docs/source/reference/hydrology.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,13 @@
44
Hydrology
55
*********
66

7+
.. warning::
8+
9+
NaN cells act as **impassable barriers** in all hydrology functions.
10+
Flow cannot cross them. If your DEM has NaN holes (e.g. water bodies
11+
masked out), fill or interpolate them first, or expect disconnected
12+
drainage networks.
13+
714
Flow Direction (D8)
815
===================
916
.. autosummary::

docs/source/reference/multispectral.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,13 @@
44
Multispectral
55
*************
66

7+
.. note::
8+
9+
All spectral indices output **float32**. Division by zero (e.g.
10+
NDVI where NIR + Red = 0) produces NaN or inf silently. Clean
11+
the result with ``xr.where(np.isfinite(result), result, np.nan)``
12+
if needed.
13+
714
Atmospherically Resistant Vegetation Index (ARVI)
815
=================================================
916
.. autosummary::

docs/source/reference/pathfinding.rst

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,17 @@
44
Pathfinding
55
***********
66

7+
.. caution::
8+
9+
A* allocates about 65 bytes per pixel and will raise ``MemoryError``
10+
if the required memory exceeds 80 % of available RAM. Use Dask or
11+
set ``search_radius`` to limit the search area for large rasters.
12+
13+
.. warning::
14+
15+
NaN and non-positive friction values are treated as impassable
16+
barriers. Cells must have finite positive friction to be traversable.
17+
718
A* Pathfinding
819
==============
920
.. autosummary::

docs/source/reference/proximity.rst

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,20 @@
44
Proximity
55
*********
66

7+
.. warning::
8+
9+
``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default)
10+
returns distances in the **coordinate units** of the DataArray.
11+
With default integer indices this equals pixel counts. Use
12+
``'GREAT_CIRCLE'`` for lat/lon data (returns metres).
13+
14+
.. caution::
15+
16+
With Dask, ``proximity()`` expands each chunk by ``max_distance`` cells.
17+
If ``max_distance`` is infinite (the default), the whole array is loaded
18+
into a single chunk. Set a finite ``max_distance`` to keep memory
19+
bounded.
20+
721
Allocation
822
==========
923
.. autosummary::

docs/source/reference/surface.rst

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,20 @@
44
Surface
55
*******
66

7+
.. danger::
8+
9+
``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
10+
``method='geodesic'`` assume the **WGS84 ellipsoid** and require
11+
coordinates in **degrees** (geographic CRS). Passing projected
12+
coordinates (metres) to the geodesic method produces wrong results.
13+
Use ``method='planar'`` (the default) for projected data.
14+
15+
.. note::
16+
17+
All surface functions output **float32** regardless of input dtype.
18+
Edge cells within the 3x3 kernel radius are NaN by default
19+
(``boundary='nan'``).
20+
721
Aspect
822
======
923
.. autosummary::

docs/source/reference/terrain_metrics.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,12 @@
44
Terrain Metrics
55
***************
66

7+
.. note::
8+
9+
Terrain metrics use a 3x3 neighbourhood and output **float64**. Edge
10+
cells are NaN. These functions assume the input is on a regular grid
11+
with uniform cell spacing.
12+
713
Roughness
814
=========
915
.. autosummary::

docs/source/reference/zonal.rst

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,11 @@
44
Zonal
55
*****
66

7+
.. note::
8+
9+
NaN values are excluded from all zonal aggregations. A zone where
10+
every cell is NaN returns NaN (not zero) for sum, mean, etc.
11+
712
Apply
813
=====
914
.. autosummary::

docs/source/user_guide/caveats.rst

Lines changed: 253 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,253 @@
1+
.. _user_guide.caveats:
2+
3+
***********************
4+
Caveats & Assumptions
5+
***********************
6+
7+
xarray-spatial makes several assumptions about input data that can produce
8+
wrong or confusing results if they aren't met. This page collects them in
9+
one place. Each section maps to a Sphinx admonition level so you can gauge
10+
severity at a glance:
11+
12+
.. danger:: marks assumptions that cause **silently wrong output**.
13+
14+
.. warning:: marks gotchas that **change the meaning** of results.
15+
16+
.. caution:: marks **performance or memory** pitfalls.
17+
18+
.. tip:: gives practical workarounds.
19+
20+
.. note:: provides context that clarifies non-obvious behaviour.
21+
22+
23+
.. contents:: On this page
24+
:local:
25+
:depth: 1
26+
27+
28+
Geodesic terrain functions assume WGS84
29+
========================================
30+
31+
.. danger::
32+
33+
``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
34+
``method='geodesic'`` use the WGS84 ellipsoid (a = 6 378 137 m,
35+
b = 6 356 752 m). There is no parameter to select a different
36+
ellipsoid.
37+
38+
If your elevation data references a different ellipsoid (e.g. Clarke 1866 or
39+
Bessel 1841), the geodesic path will still run, but the ECEF projection and
40+
curvature correction will be slightly off. For most modern data this
41+
doesn't matter because WGS84 and GRS80 are functionally identical, but
42+
older survey-quality datasets on local datums can differ by tens of metres
43+
in the geoid.
44+
45+
46+
Projected coordinates + geodesic = wrong results
47+
=================================================
48+
49+
.. danger::
50+
51+
Passing a raster whose coordinates are in metres (UTM, state-plane, etc.)
52+
to ``method='geodesic'`` will produce garbage. The geodesic path expects
53+
latitude/longitude in degrees.
54+
55+
The library does validate that coordinate values fall within geographic
56+
ranges (latitude in [-90, 90], longitude in [-180, 360]) and will raise a
57+
``ValueError`` if they don't. But this check is not foolproof: coordinates
58+
in small projected grids that happen to look like valid lat/lon values will
59+
slip through.
60+
61+
.. tip::
62+
63+
Use ``method='planar'`` (the default) for projected CRS data. Use
64+
``method='geodesic'`` only when your coordinates are in degrees on a
65+
geographic CRS.
66+
67+
68+
All output is float32
69+
=====================
70+
71+
.. warning::
72+
73+
Every analytical function casts its output to ``float32`` regardless of
74+
input dtype. If you pass ``float64`` elevation data, the result is
75+
``float32``.
76+
77+
Float32 gives about 7 significant digits, which is more than enough for
78+
slope, NDVI, and similar metrics. But if you're chaining many operations
79+
or working with data that needs sub-centimetre precision (e.g. LiDAR
80+
heights in a small range), the truncation can matter.
81+
82+
See :ref:`user_guide.data_types` for the full rationale.
83+
84+
.. tip::
85+
86+
If you need float64 output, cast after the function returns:
87+
88+
.. code-block:: python
89+
90+
result = slope(agg).astype('float64')
91+
92+
93+
NaN is the nodata sentinel
94+
==========================
95+
96+
.. warning::
97+
98+
xarray-spatial uses ``NaN`` as its nodata value everywhere. There is
99+
no parameter to choose a different sentinel (e.g. -9999).
100+
101+
Functions generally **propagate** NaN: if any cell in a 3x3 kernel is NaN,
102+
the output cell is NaN. The exact behaviour varies by function:
103+
104+
- **Slope, aspect, hillshade** -- any NaN neighbour produces a NaN output
105+
cell.
106+
- **Focal mean/std** -- NaN neighbours are excluded from the average; the
107+
cell itself is still computed.
108+
- **Zonal stats** -- NaN values are excluded from the aggregation. An
109+
all-NaN zone returns NaN, not zero.
110+
- **Hydrology (flow direction)** -- NaN cells are treated as impassable
111+
barriers. Flow cannot cross them.
112+
- **Pathfinding (A*)** -- NaN cells are impassable, same as barriers.
113+
114+
.. note::
115+
116+
Edge cells (within the kernel radius of the raster boundary) default to
117+
NaN when ``boundary='nan'``. Use ``boundary='nearest'`` or
118+
``boundary='reflect'`` if you need values at the edges.
119+
120+
121+
Proximity distances depend on coordinate units
122+
================================================
123+
124+
.. warning::
125+
126+
``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default)
127+
computes distances in whatever units the DataArray's coordinates use.
128+
If coordinates are default integer indices (0, 1, 2, ...), the result
129+
is in pixel counts. If coordinates are in metres (UTM), the result is
130+
in metres. If coordinates are in degrees, the result is in degrees.
131+
132+
Switch to ``distance_metric='GREAT_CIRCLE'`` for lat/lon data to get
133+
distances in **metres** (the radius used is 6 378 137 m).
134+
135+
.. tip::
136+
137+
.. code-block:: python
138+
139+
from xrspatial.proximity import proximity
140+
141+
# EUCLIDEAN with default coords = pixel-count distances
142+
dist_px = proximity(raster)
143+
144+
# GREAT_CIRCLE with lat/lon coords = distances in metres
145+
dist_m = proximity(raster, distance_metric='GREAT_CIRCLE')
146+
147+
148+
Great-circle distances assume sphere radius = semi-major axis
149+
==============================================================
150+
151+
.. note::
152+
153+
Great-circle distances in ``proximity()`` and ``surface_distance()``
154+
model the Earth as a sphere with radius = WGS84 semi-major axis
155+
(6 378 137 m), not the mean radius (~6 371 km).
156+
157+
The difference is about 0.1 %. This is negligible for most use cases but
158+
worth knowing if you're comparing against a reference that uses the mean
159+
radius or the IUGG value.
160+
161+
162+
Dask chunk sizes and ``map_overlap``
163+
=====================================
164+
165+
.. caution::
166+
167+
Focal, slope, aspect, and other kernel-based operations use
168+
``dask.array.map_overlap``. If any chunk dimension is **smaller than
169+
the kernel depth** (typically 1 cell for a 3x3 kernel), the overlap
170+
will fail or produce wrong results.
171+
172+
.. caution::
173+
174+
``proximity()`` with Dask expands each chunk by ``max_distance`` cells.
175+
If ``max_distance`` is infinite (the default), the entire array is loaded
176+
into one chunk. Set a finite ``max_distance`` to keep memory bounded.
177+
178+
.. tip::
179+
180+
A good rule of thumb for focal operations: keep chunks at least 64 cells
181+
on each side. For proximity, set ``max_distance`` to the largest
182+
distance you actually care about.
183+
184+
.. code-block:: python
185+
186+
import dask.array as da
187+
import xarray as xr
188+
189+
# Rechunk to safe sizes before running slope
190+
agg = xr.DataArray(
191+
da.from_array(big_array, chunks=(512, 512))
192+
)
193+
194+
195+
GPU memory is not managed automatically
196+
========================================
197+
198+
.. caution::
199+
200+
CuPy-backed operations allocate the full output array on the GPU.
201+
There is no automatic tiling or fallback to CPU if the array exceeds
202+
VRAM.
203+
204+
If you hit ``cuda.cudadrv.driver.CudaAPIError`` or ``cupy.cuda.memory
205+
.OutOfMemoryError``, the array is too large for your GPU. Use Dask + CuPy
206+
to process in chunks, or fall back to NumPy.
207+
208+
.. tip::
209+
210+
Complex geodesic kernels (slope, aspect with ``method='geodesic'``)
211+
use many float64 registers, which limits thread-block occupancy.
212+
xarray-spatial already reduces thread blocks to 16x16 for these
213+
kernels, but very old GPUs (compute capability < 6.0) may still
214+
hit register spill.
215+
216+
217+
Dimension ordering
218+
==================
219+
220+
.. note::
221+
222+
xarray-spatial assumes the last two dimensions of a DataArray are
223+
``(y, x)`` in that order. It does not inspect ``attrs['crs']`` or
224+
CF conventions to verify this.
225+
226+
If your data has ``(x, y)`` ordering (rare, but it happens with some
227+
netCDF conventions), transpose before passing to any function:
228+
229+
.. code-block:: python
230+
231+
agg = agg.transpose('y', 'x')
232+
233+
234+
Classification and NaN/Inf
235+
===========================
236+
237+
.. warning::
238+
239+
``equal_interval()``, ``quantile()``, ``natural_breaks()``, and other
240+
classification functions set NaN and infinite input values to NaN in the
241+
output. They do not raise an error.
242+
243+
This means a raster with a few stray ``inf`` values (e.g. from a division
244+
by zero upstream) will silently produce NaN bins for those cells.
245+
246+
.. tip::
247+
248+
Clean infinities before classifying:
249+
250+
.. code-block:: python
251+
252+
import numpy as np
253+
agg = agg.where(np.isfinite(agg))

0 commit comments

Comments
 (0)