Skip to content

Commit 646f45e

Browse files
committed
Add caveat and assumption admonitions to docs (#1133)
New user guide page (caveats.rst) collecting the cross-cutting assumptions: WGS84 hardcoded in geodesic path, float32 output, NaN-as-nodata, pixel-unit proximity, dask chunk requirements, GPU memory, and dimension ordering. Admonition blocks added to existing reference pages: surface, proximity, focal, classification, hydrology, terrain_metrics, and the data_types guide.
1 parent 45c714e commit 646f45e

File tree

9 files changed

+319
-2
lines changed

9 files changed

+319
-2
lines changed

docs/source/reference/classification.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,12 @@
44
Classification
55
**************
66

7+
.. warning::
8+
9+
Classification functions silently set NaN and infinite input values
10+
to NaN in the output. Clean infinities before classifying if you
11+
want every cell assigned to a bin.
12+
713
Equal Interval
814
==============
915
.. autosummary::

docs/source/reference/focal.rst

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,18 @@
44
Focal
55
*****
66

7+
.. caution::
8+
9+
Focal operations use ``dask.array.map_overlap``. Each chunk dimension
10+
must be **larger than the kernel depth** (typically 1 cell for a 3x3
11+
kernel). Chunks smaller than the depth produce wrong results.
12+
13+
.. note::
14+
15+
Focal functions output **float32**. NaN handling varies: ``mean``
16+
excludes NaN neighbours from the average, while ``hotspots`` propagates
17+
NaN from any neighbour.
18+
719
Apply
820
=====
921
.. autosummary::

docs/source/reference/hydrology.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,13 @@
44
Hydrology
55
*********
66

7+
.. warning::
8+
9+
NaN cells act as **impassable barriers** in all hydrology functions.
10+
Flow cannot cross them. If your DEM has NaN holes (e.g. water bodies
11+
masked out), fill or interpolate them first, or expect disconnected
12+
drainage networks.
13+
714
Flow Direction (D8)
815
===================
916
.. autosummary::

docs/source/reference/proximity.rst

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,20 @@
44
Proximity
55
*********
66

7+
.. warning::
8+
9+
``proximity()`` returns distances in **pixel units** by default
10+
(``distance_metric='EUCLIDEAN'``). Multiply by cell size to get
11+
real-world units, or use ``'GREAT_CIRCLE'`` for lat/lon data
12+
(returns kilometres).
13+
14+
.. caution::
15+
16+
With Dask, ``proximity()`` expands each chunk by ``max_distance`` cells.
17+
If ``max_distance`` is infinite (the default), the whole array is loaded
18+
into a single chunk. Set a finite ``max_distance`` to keep memory
19+
bounded.
20+
721
Allocation
822
==========
923
.. autosummary::

docs/source/reference/surface.rst

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,20 @@
44
Surface
55
*******
66

7+
.. danger::
8+
9+
``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
10+
``method='geodesic'`` assume the **WGS84 ellipsoid** and require
11+
coordinates in **degrees** (geographic CRS). Passing projected
12+
coordinates (metres) to the geodesic method produces wrong results.
13+
Use ``method='planar'`` (the default) for projected data.
14+
15+
.. note::
16+
17+
All surface functions output **float32** regardless of input dtype.
18+
Edge cells within the 3x3 kernel radius are NaN by default
19+
(``boundary='nan'``).
20+
721
Aspect
822
======
923
.. autosummary::

docs/source/reference/terrain_metrics.rst

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,12 @@
44
Terrain Metrics
55
***************
66

7+
.. note::
8+
9+
Terrain metrics use a 3x3 kernel and output **float32**. Edge cells
10+
are NaN by default. These functions use the planar (Horn) algorithm
11+
and assume the input is on a regular grid with uniform cell spacing.
12+
713
Roughness
814
=========
915
.. autosummary::

docs/source/user_guide/caveats.rst

Lines changed: 253 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,253 @@
1+
.. _user_guide.caveats:
2+
3+
***********************
4+
Caveats & Assumptions
5+
***********************
6+
7+
xarray-spatial makes several assumptions about input data that can produce
8+
wrong or confusing results if they aren't met. This page collects them in
9+
one place. Each section maps to a Sphinx admonition level so you can gauge
10+
severity at a glance:
11+
12+
.. danger:: marks assumptions that cause **silently wrong output**.
13+
14+
.. warning:: marks gotchas that **change the meaning** of results.
15+
16+
.. caution:: marks **performance or memory** pitfalls.
17+
18+
.. tip:: gives practical workarounds.
19+
20+
.. note:: provides context that clarifies non-obvious behaviour.
21+
22+
23+
.. contents:: On this page
24+
:local:
25+
:depth: 1
26+
27+
28+
Geodesic terrain functions assume WGS84
29+
========================================
30+
31+
.. danger::
32+
33+
``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
34+
``method='geodesic'`` use the WGS84 ellipsoid (a = 6 378 137 m,
35+
b = 6 356 752 m). There is no parameter to select a different
36+
ellipsoid.
37+
38+
If your elevation data references a different ellipsoid (e.g. Clarke 1866 or
39+
Bessel 1841), the geodesic path will still run, but the ECEF projection and
40+
curvature correction will be slightly off. For most modern data this
41+
doesn't matter because WGS84 and GRS80 are functionally identical, but
42+
older survey-quality datasets on local datums can differ by tens of metres
43+
in the geoid.
44+
45+
46+
Projected coordinates + geodesic = wrong results
47+
=================================================
48+
49+
.. danger::
50+
51+
Passing a raster whose coordinates are in metres (UTM, state-plane, etc.)
52+
to ``method='geodesic'`` will produce garbage. The geodesic path expects
53+
latitude/longitude in degrees.
54+
55+
The library does validate that coordinate values fall within geographic
56+
ranges (latitude in [-90, 90], longitude in [-180, 360]) and will raise a
57+
``ValueError`` if they don't. But this check is not foolproof: coordinates
58+
in small projected grids that happen to look like valid lat/lon values will
59+
slip through.
60+
61+
.. tip::
62+
63+
Use ``method='planar'`` (the default) for projected CRS data. Use
64+
``method='geodesic'`` only when your coordinates are in degrees on a
65+
geographic CRS.
66+
67+
68+
All output is float32
69+
=====================
70+
71+
.. warning::
72+
73+
Every analytical function casts its output to ``float32`` regardless of
74+
input dtype. If you pass ``float64`` elevation data, the result is
75+
``float32``.
76+
77+
Float32 gives about 7 significant digits, which is more than enough for
78+
slope, NDVI, and similar metrics. But if you're chaining many operations
79+
or working with data that needs sub-centimetre precision (e.g. LiDAR
80+
heights in a small range), the truncation can matter.
81+
82+
See :ref:`user_guide.data_types` for the full rationale.
83+
84+
.. tip::
85+
86+
If you need float64 output, cast after the function returns:
87+
88+
.. code-block:: python
89+
90+
result = slope(agg).astype('float64')
91+
92+
93+
NaN is the nodata sentinel
94+
==========================
95+
96+
.. warning::
97+
98+
xarray-spatial uses ``NaN`` as its nodata value everywhere. There is
99+
no parameter to choose a different sentinel (e.g. -9999).
100+
101+
Functions generally **propagate** NaN: if any cell in a 3x3 kernel is NaN,
102+
the output cell is NaN. The exact behaviour varies by function:
103+
104+
- **Slope, aspect, hillshade** -- any NaN neighbour produces a NaN output
105+
cell.
106+
- **Focal mean/std** -- NaN neighbours are excluded from the average; the
107+
cell itself is still computed.
108+
- **Zonal stats** -- NaN values are excluded from the aggregation. An
109+
all-NaN zone returns NaN, not zero.
110+
- **Hydrology (flow direction)** -- NaN cells are treated as impassable
111+
barriers. Flow cannot cross them.
112+
- **Pathfinding (A*)** -- NaN cells are impassable, same as barriers.
113+
114+
.. note::
115+
116+
Edge cells (within the kernel radius of the raster boundary) default to
117+
NaN when ``boundary='nan'``. Use ``boundary='nearest'`` or
118+
``boundary='reflect'`` if you need values at the edges.
119+
120+
121+
Proximity defaults to pixel units
122+
==================================
123+
124+
.. warning::
125+
126+
``proximity()`` returns distances in **pixel units** when
127+
``distance_metric='EUCLIDEAN'`` (the default). It does not automatically
128+
convert to metres or kilometres.
129+
130+
To get distances in real-world units with Euclidean mode, multiply the
131+
result by your cell size. Or switch to ``distance_metric='GREAT_CIRCLE'``
132+
if your data is in degrees, which returns kilometres.
133+
134+
.. tip::
135+
136+
.. code-block:: python
137+
138+
from xrspatial.proximity import proximity
139+
140+
# Pixel-unit result
141+
dist_px = proximity(raster)
142+
143+
# Convert to metres (assumes square cells)
144+
cell_m = 30.0 # e.g. Landsat 30 m resolution
145+
dist_m = dist_px * cell_m
146+
147+
148+
Haversine uses the semi-major axis
149+
====================================
150+
151+
.. note::
152+
153+
Great-circle distances in ``proximity()`` and ``surface_distance()``
154+
use the WGS84 **semi-major axis** (6 378 137 m) as the Earth radius,
155+
not the mean radius (6 371 km).
156+
157+
The difference is about 0.1 %. This is negligible for most use cases but
158+
worth knowing if you're comparing against a reference that uses the mean
159+
radius or the IUGG value.
160+
161+
162+
Dask chunk sizes and ``map_overlap``
163+
=====================================
164+
165+
.. caution::
166+
167+
Focal, slope, aspect, and other kernel-based operations use
168+
``dask.array.map_overlap``. If any chunk dimension is **smaller than
169+
the kernel depth** (typically 1 cell for a 3x3 kernel), the overlap
170+
will fail or produce wrong results.
171+
172+
.. caution::
173+
174+
``proximity()`` with Dask expands each chunk by ``max_distance`` cells.
175+
If ``max_distance`` is infinite (the default), the entire array is loaded
176+
into one chunk. Set a finite ``max_distance`` to keep memory bounded.
177+
178+
.. tip::
179+
180+
A good rule of thumb for focal operations: keep chunks at least 64 cells
181+
on each side. For proximity, set ``max_distance`` to the largest
182+
distance you actually care about.
183+
184+
.. code-block:: python
185+
186+
import dask.array as da
187+
import xarray as xr
188+
189+
# Rechunk to safe sizes before running slope
190+
agg = xr.DataArray(
191+
da.from_array(big_array, chunks=(512, 512))
192+
)
193+
194+
195+
GPU memory is not managed automatically
196+
========================================
197+
198+
.. caution::
199+
200+
CuPy-backed operations allocate the full output array on the GPU.
201+
There is no automatic tiling or fallback to CPU if the array exceeds
202+
VRAM.
203+
204+
If you hit ``cuda.cudadrv.driver.CudaAPIError`` or ``cupy.cuda.memory
205+
.OutOfMemoryError``, the array is too large for your GPU. Use Dask + CuPy
206+
to process in chunks, or fall back to NumPy.
207+
208+
.. tip::
209+
210+
Complex geodesic kernels (slope, aspect with ``method='geodesic'``)
211+
use many float64 registers, which limits thread-block occupancy.
212+
xarray-spatial already reduces thread blocks to 16x16 for these
213+
kernels, but very old GPUs (compute capability < 6.0) may still
214+
hit register spill.
215+
216+
217+
Dimension ordering
218+
==================
219+
220+
.. note::
221+
222+
xarray-spatial assumes the last two dimensions of a DataArray are
223+
``(y, x)`` in that order. It does not inspect ``attrs['crs']`` or
224+
CF conventions to verify this.
225+
226+
If your data has ``(x, y)`` ordering (rare, but it happens with some
227+
netCDF conventions), transpose before passing to any function:
228+
229+
.. code-block:: python
230+
231+
agg = agg.transpose('y', 'x')
232+
233+
234+
Classification and NaN/Inf
235+
===========================
236+
237+
.. warning::
238+
239+
``equal_interval()``, ``quantile()``, ``natural_breaks()``, and other
240+
classification functions set NaN and infinite input values to NaN in the
241+
output. They do not raise an error.
242+
243+
This means a raster with a few stray ``inf`` values (e.g. from a division
244+
by zero upstream) will silently produce NaN bins for those cells.
245+
246+
.. tip::
247+
248+
Clean infinities before classifying:
249+
250+
.. code-block:: python
251+
252+
import numpy as np
253+
agg = agg.where(np.isfinite(agg))

docs/source/user_guide/data_types.rst

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,8 +11,12 @@ Overview
1111

1212
xarray-spatial standardizes on **float32** (32-bit floating-point) for output data in most analytical functions. This design decision provides a balance between computational precision, memory efficiency, and performance that is well-suited for raster analysis tasks.
1313

14-
.. note::
15-
All multispectral indices, terrain analysis functions, and most other operations in xarray-spatial output float32 data, regardless of input data type.
14+
.. warning::
15+
16+
All multispectral indices, terrain analysis functions, and most other
17+
operations in xarray-spatial output **float32** data, regardless of input
18+
data type. If you pass float64 data, the output is float32. Cast the
19+
result with ``.astype('float64')`` if you need higher precision.
1620

1721

1822
Why Float32?

docs/source/user_guide/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ User Guide
77
.. toctree::
88
:maxdepth: 1
99

10+
caveats
1011
data_types
1112
classification
1213
fire

0 commit comments

Comments
 (0)