Add caveat and assumption admonitions to docs (#1134)

brendancol · web-flow · commit fb1916f65168 · 2026-04-01T08:20:02.000-07:00
* Add caveat and assumption admonitions to docs (#1133) New user guide page (caveats.rst) collecting the cross-cutting assumptions: WGS84 hardcoded in geodesic path, float32 output, NaN-as-nodata, pixel-unit proximity, dask chunk requirements, GPU memory, and dimension ordering. Admonition blocks added to existing reference pages: surface, proximity, focal, classification, hydrology, terrain_metrics, and the data_types guide. * Fix review findings: proximity units, terrain dtype, missing pages (#1133) - Proximity EUCLIDEAN returns coordinate-unit distances, not pixels - Great-circle returns metres, not kilometres - Terrain metrics output float64, not float32; drop Horn attribution - Add admonitions to multispectral, zonal, and pathfinding pages - Clarify haversine section title
diff --git a/docs/source/reference/classification.rst b/docs/source/reference/classification.rst
@@ -4,6 +4,12 @@
 Classification
 **************
 
+.. warning::
+
+   Classification functions silently set NaN and infinite input values
+   to NaN in the output.  Clean infinities before classifying if you
+   want every cell assigned to a bin.
+
 Equal Interval
 ==============
 .. autosummary::
diff --git a/docs/source/reference/focal.rst b/docs/source/reference/focal.rst
@@ -4,6 +4,18 @@
 Focal
 *****
 
+.. caution::
+
+   Focal operations use ``dask.array.map_overlap``.  Each chunk dimension
+   must be **larger than the kernel depth** (typically 1 cell for a 3x3
+   kernel).  Chunks smaller than the depth produce wrong results.
+
+.. note::
+
+   Focal functions output **float32**.  NaN handling varies: ``mean``
+   excludes NaN neighbours from the average, while ``hotspots`` propagates
+   NaN from any neighbour.
+
 Apply
 =====
 .. autosummary::
diff --git a/docs/source/reference/hydrology.rst b/docs/source/reference/hydrology.rst
@@ -4,6 +4,13 @@
 Hydrology
 *********
 
+.. warning::
+
+   NaN cells act as **impassable barriers** in all hydrology functions.
+   Flow cannot cross them.  If your DEM has NaN holes (e.g. water bodies
+   masked out), fill or interpolate them first, or expect disconnected
+   drainage networks.
+
 Flow Direction (D8)
 ===================
 .. autosummary::
diff --git a/docs/source/reference/multispectral.rst b/docs/source/reference/multispectral.rst
@@ -4,6 +4,13 @@
 Multispectral
 *************
 
+.. note::
+
+   All spectral indices output **float32**.  Division by zero (e.g.
+   NDVI where NIR + Red = 0) produces NaN or inf silently.  Clean
+   the result with ``xr.where(np.isfinite(result), result, np.nan)``
+   if needed.
+
 Atmospherically Resistant Vegetation Index (ARVI)
 =================================================
 .. autosummary::
diff --git a/docs/source/reference/pathfinding.rst b/docs/source/reference/pathfinding.rst
@@ -4,6 +4,17 @@
 Pathfinding
 ***********
 
+.. caution::
+
+   A* allocates about 65 bytes per pixel and will raise ``MemoryError``
+   if the required memory exceeds 80 % of available RAM.  Use Dask or
+   set ``search_radius`` to limit the search area for large rasters.
+
+.. warning::
+
+   NaN and non-positive friction values are treated as impassable
+   barriers.  Cells must have finite positive friction to be traversable.
+
 A* Pathfinding
 ==============
 .. autosummary::
diff --git a/docs/source/reference/proximity.rst b/docs/source/reference/proximity.rst
@@ -4,6 +4,20 @@
 Proximity
 *********
 
+.. warning::
+
+   ``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default)
+   returns distances in the **coordinate units** of the DataArray.
+   With default integer indices this equals pixel counts.  Use
+   ``'GREAT_CIRCLE'`` for lat/lon data (returns metres).
+
+.. caution::
+
+   With Dask, ``proximity()`` expands each chunk by ``max_distance`` cells.
+   If ``max_distance`` is infinite (the default), the whole array is loaded
+   into a single chunk.  Set a finite ``max_distance`` to keep memory
+   bounded.
+
 Allocation
 ==========
 .. autosummary::
diff --git a/docs/source/reference/surface.rst b/docs/source/reference/surface.rst
@@ -4,6 +4,20 @@
 Surface
 *******
 
+.. danger::
+
+   ``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
+   ``method='geodesic'`` assume the **WGS84 ellipsoid** and require
+   coordinates in **degrees** (geographic CRS).  Passing projected
+   coordinates (metres) to the geodesic method produces wrong results.
+   Use ``method='planar'`` (the default) for projected data.
+
+.. note::
+
+   All surface functions output **float32** regardless of input dtype.
+   Edge cells within the 3x3 kernel radius are NaN by default
+   (``boundary='nan'``).
+
 Aspect
 ======
 .. autosummary::
diff --git a/docs/source/reference/terrain_metrics.rst b/docs/source/reference/terrain_metrics.rst
@@ -4,6 +4,12 @@
 Terrain Metrics
 ***************
 
+.. note::
+
+   Terrain metrics use a 3x3 neighbourhood and output **float64**.  Edge
+   cells are NaN.  These functions assume the input is on a regular grid
+   with uniform cell spacing.
+
 Roughness
 =========
 .. autosummary::
diff --git a/docs/source/reference/zonal.rst b/docs/source/reference/zonal.rst
@@ -4,6 +4,11 @@
 Zonal
 *****
 
+.. note::
+
+   NaN values are excluded from all zonal aggregations.  A zone where
+   every cell is NaN returns NaN (not zero) for sum, mean, etc.
+
 Apply
 =====
 .. autosummary::
diff --git a/docs/source/user_guide/caveats.rst b/docs/source/user_guide/caveats.rst
@@ -0,0 +1,253 @@
+..  _user_guide.caveats:
+
+***********************
+Caveats & Assumptions
+***********************
+
+xarray-spatial makes several assumptions about input data that can produce
+wrong or confusing results if they aren't met.  This page collects them in
+one place.  Each section maps to a Sphinx admonition level so you can gauge
+severity at a glance:
+
+.. danger:: marks assumptions that cause **silently wrong output**.
+
+.. warning:: marks gotchas that **change the meaning** of results.
+
+.. caution:: marks **performance or memory** pitfalls.
+
+.. tip:: gives practical workarounds.
+
+.. note:: provides context that clarifies non-obvious behaviour.
+
+
+.. contents:: On this page
+   :local:
+   :depth: 1
+
+
+Geodesic terrain functions assume WGS84
+========================================
+
+.. danger::
+
+   ``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
+   ``method='geodesic'`` use the WGS84 ellipsoid (a = 6 378 137 m,
+   b = 6 356 752 m).  There is no parameter to select a different
+   ellipsoid.
+
+If your elevation data references a different ellipsoid (e.g. Clarke 1866 or
+Bessel 1841), the geodesic path will still run, but the ECEF projection and
+curvature correction will be slightly off.  For most modern data this
+doesn't matter because WGS84 and GRS80 are functionally identical, but
+older survey-quality datasets on local datums can differ by tens of metres
+in the geoid.
+
+
+Projected coordinates + geodesic = wrong results
+=================================================
+
+.. danger::
+
+   Passing a raster whose coordinates are in metres (UTM, state-plane, etc.)
+   to ``method='geodesic'`` will produce garbage.  The geodesic path expects
+   latitude/longitude in degrees.
+
+The library does validate that coordinate values fall within geographic
+ranges (latitude in [-90, 90], longitude in [-180, 360]) and will raise a
+``ValueError`` if they don't.  But this check is not foolproof: coordinates
+in small projected grids that happen to look like valid lat/lon values will
+slip through.
+
+.. tip::
+
+   Use ``method='planar'`` (the default) for projected CRS data.  Use
+   ``method='geodesic'`` only when your coordinates are in degrees on a
+   geographic CRS.
+
+
+All output is float32
+=====================
+
+.. warning::
+
+   Every analytical function casts its output to ``float32`` regardless of
+   input dtype.  If you pass ``float64`` elevation data, the result is
+   ``float32``.
+
+Float32 gives about 7 significant digits, which is more than enough for
+slope, NDVI, and similar metrics.  But if you're chaining many operations
+or working with data that needs sub-centimetre precision (e.g. LiDAR
+heights in a small range), the truncation can matter.
+
+See :ref:`user_guide.data_types` for the full rationale.
+
+.. tip::
+
+   If you need float64 output, cast after the function returns:
+
+   .. code-block:: python
+
+      result = slope(agg).astype('float64')
+
+
+NaN is the nodata sentinel
+==========================
+
+.. warning::
+
+   xarray-spatial uses ``NaN`` as its nodata value everywhere.  There is
+   no parameter to choose a different sentinel (e.g. -9999).
+
+Functions generally **propagate** NaN: if any cell in a 3x3 kernel is NaN,
+the output cell is NaN.  The exact behaviour varies by function:
+
+- **Slope, aspect, hillshade** -- any NaN neighbour produces a NaN output
+  cell.
+- **Focal mean/std** -- NaN neighbours are excluded from the average; the
+  cell itself is still computed.
+- **Zonal stats** -- NaN values are excluded from the aggregation.  An
+  all-NaN zone returns NaN, not zero.
+- **Hydrology (flow direction)** -- NaN cells are treated as impassable
+  barriers.  Flow cannot cross them.
+- **Pathfinding (A*)** -- NaN cells are impassable, same as barriers.
+
+.. note::
+
+   Edge cells (within the kernel radius of the raster boundary) default to
+   NaN when ``boundary='nan'``.  Use ``boundary='nearest'`` or
+   ``boundary='reflect'`` if you need values at the edges.
+
+
+Proximity distances depend on coordinate units
+================================================
+
+.. warning::
+
+   ``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default)
+   computes distances in whatever units the DataArray's coordinates use.
+   If coordinates are default integer indices (0, 1, 2, ...), the result
+   is in pixel counts.  If coordinates are in metres (UTM), the result is
+   in metres.  If coordinates are in degrees, the result is in degrees.
+
+Switch to ``distance_metric='GREAT_CIRCLE'`` for lat/lon data to get
+distances in **metres** (the radius used is 6 378 137 m).
+
+.. tip::
+
+   .. code-block:: python
+
+      from xrspatial.proximity import proximity
+
+      # EUCLIDEAN with default coords = pixel-count distances
+      dist_px = proximity(raster)
+
+      # GREAT_CIRCLE with lat/lon coords = distances in metres
+      dist_m = proximity(raster, distance_metric='GREAT_CIRCLE')
+
+
+Great-circle distances assume sphere radius = semi-major axis
+==============================================================
+
+.. note::
+
+   Great-circle distances in ``proximity()`` and ``surface_distance()``
+   model the Earth as a sphere with radius = WGS84 semi-major axis
+   (6 378 137 m), not the mean radius (~6 371 km).
+
+The difference is about 0.1 %.  This is negligible for most use cases but
+worth knowing if you're comparing against a reference that uses the mean
+radius or the IUGG value.
+
+
+Dask chunk sizes and ``map_overlap``
+=====================================
+
+.. caution::
+
+   Focal, slope, aspect, and other kernel-based operations use
+   ``dask.array.map_overlap``.  If any chunk dimension is **smaller than
+   the kernel depth** (typically 1 cell for a 3x3 kernel), the overlap
+   will fail or produce wrong results.
+
+.. caution::
+
+   ``proximity()`` with Dask expands each chunk by ``max_distance`` cells.
+   If ``max_distance`` is infinite (the default), the entire array is loaded
+   into one chunk.  Set a finite ``max_distance`` to keep memory bounded.
+
+.. tip::
+
+   A good rule of thumb for focal operations: keep chunks at least 64 cells
+   on each side.  For proximity, set ``max_distance`` to the largest
+   distance you actually care about.
+
+   .. code-block:: python
+
+      import dask.array as da
+      import xarray as xr
+
+      # Rechunk to safe sizes before running slope
+      agg = xr.DataArray(
+          da.from_array(big_array, chunks=(512, 512))
+      )
+
+
+GPU memory is not managed automatically
+========================================
+
+.. caution::
+
+   CuPy-backed operations allocate the full output array on the GPU.
+   There is no automatic tiling or fallback to CPU if the array exceeds
+   VRAM.
+
+If you hit ``cuda.cudadrv.driver.CudaAPIError`` or ``cupy.cuda.memory
+.OutOfMemoryError``, the array is too large for your GPU.  Use Dask + CuPy
+to process in chunks, or fall back to NumPy.
+
+.. tip::
+
+   Complex geodesic kernels (slope, aspect with ``method='geodesic'``)
+   use many float64 registers, which limits thread-block occupancy.
+   xarray-spatial already reduces thread blocks to 16x16 for these
+   kernels, but very old GPUs (compute capability < 6.0) may still
+   hit register spill.
+
+
+Dimension ordering
+==================
+
+.. note::
+
+   xarray-spatial assumes the last two dimensions of a DataArray are
+   ``(y, x)`` in that order.  It does not inspect ``attrs['crs']`` or
+   CF conventions to verify this.
+
+If your data has ``(x, y)`` ordering (rare, but it happens with some
+netCDF conventions), transpose before passing to any function:
+
+.. code-block:: python
+
+   agg = agg.transpose('y', 'x')
+
+
+Classification and NaN/Inf
+===========================
+
+.. warning::
+
+   ``equal_interval()``, ``quantile()``, ``natural_breaks()``, and other
+   classification functions set NaN and infinite input values to NaN in the
+   output.  They do not raise an error.
+
+This means a raster with a few stray ``inf`` values (e.g. from a division
+by zero upstream) will silently produce NaN bins for those cells.
+
+.. tip::
+
+   Clean infinities before classifying:
+
+   .. code-block:: python
+
+      import numpy as np
+      agg = agg.where(np.isfinite(agg))
diff --git a/docs/source/user_guide/data_types.rst b/docs/source/user_guide/data_types.rst
diff --git a/docs/source/user_guide/index.rst b/docs/source/user_guide/index.rst