|
| 1 | +.. _user_guide.caveats: |
| 2 | + |
| 3 | +*********************** |
| 4 | +Caveats & Assumptions |
| 5 | +*********************** |
| 6 | + |
| 7 | +xarray-spatial makes several assumptions about input data that can produce |
| 8 | +wrong or confusing results if they aren't met. This page collects them in |
| 9 | +one place. Each section maps to a Sphinx admonition level so you can gauge |
| 10 | +severity at a glance: |
| 11 | + |
| 12 | +.. danger:: marks assumptions that cause **silently wrong output**. |
| 13 | + |
| 14 | +.. warning:: marks gotchas that **change the meaning** of results. |
| 15 | + |
| 16 | +.. caution:: marks **performance or memory** pitfalls. |
| 17 | + |
| 18 | +.. tip:: gives practical workarounds. |
| 19 | + |
| 20 | +.. note:: provides context that clarifies non-obvious behaviour. |
| 21 | + |
| 22 | + |
| 23 | +.. contents:: On this page |
| 24 | + :local: |
| 25 | + :depth: 1 |
| 26 | + |
| 27 | + |
| 28 | +Geodesic terrain functions assume WGS84 |
| 29 | +======================================== |
| 30 | + |
| 31 | +.. danger:: |
| 32 | + |
| 33 | + ``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with |
| 34 | + ``method='geodesic'`` use the WGS84 ellipsoid (a = 6 378 137 m, |
| 35 | + b = 6 356 752 m). There is no parameter to select a different |
| 36 | + ellipsoid. |
| 37 | + |
| 38 | +If your elevation data references a different ellipsoid (e.g. Clarke 1866 or |
| 39 | +Bessel 1841), the geodesic path will still run, but the ECEF projection and |
| 40 | +curvature correction will be slightly off. For most modern data this |
| 41 | +doesn't matter because WGS84 and GRS80 are functionally identical, but |
| 42 | +older survey-quality datasets on local datums can differ by tens of metres |
| 43 | +in the geoid. |
| 44 | + |
| 45 | + |
| 46 | +Projected coordinates + geodesic = wrong results |
| 47 | +================================================= |
| 48 | + |
| 49 | +.. danger:: |
| 50 | + |
| 51 | + Passing a raster whose coordinates are in metres (UTM, state-plane, etc.) |
| 52 | + to ``method='geodesic'`` will produce garbage. The geodesic path expects |
| 53 | + latitude/longitude in degrees. |
| 54 | + |
| 55 | +The library does validate that coordinate values fall within geographic |
| 56 | +ranges (latitude in [-90, 90], longitude in [-180, 360]) and will raise a |
| 57 | +``ValueError`` if they don't. But this check is not foolproof: coordinates |
| 58 | +in small projected grids that happen to look like valid lat/lon values will |
| 59 | +slip through. |
| 60 | + |
| 61 | +.. tip:: |
| 62 | + |
| 63 | + Use ``method='planar'`` (the default) for projected CRS data. Use |
| 64 | + ``method='geodesic'`` only when your coordinates are in degrees on a |
| 65 | + geographic CRS. |
| 66 | + |
| 67 | + |
| 68 | +All output is float32 |
| 69 | +===================== |
| 70 | + |
| 71 | +.. warning:: |
| 72 | + |
| 73 | + Every analytical function casts its output to ``float32`` regardless of |
| 74 | + input dtype. If you pass ``float64`` elevation data, the result is |
| 75 | + ``float32``. |
| 76 | + |
| 77 | +Float32 gives about 7 significant digits, which is more than enough for |
| 78 | +slope, NDVI, and similar metrics. But if you're chaining many operations |
| 79 | +or working with data that needs sub-centimetre precision (e.g. LiDAR |
| 80 | +heights in a small range), the truncation can matter. |
| 81 | + |
| 82 | +See :ref:`user_guide.data_types` for the full rationale. |
| 83 | + |
| 84 | +.. tip:: |
| 85 | + |
| 86 | + If you need float64 output, cast after the function returns: |
| 87 | + |
| 88 | + .. code-block:: python |
| 89 | +
|
| 90 | + result = slope(agg).astype('float64') |
| 91 | +
|
| 92 | +
|
| 93 | +NaN is the nodata sentinel |
| 94 | +========================== |
| 95 | + |
| 96 | +.. warning:: |
| 97 | + |
| 98 | + xarray-spatial uses ``NaN`` as its nodata value everywhere. There is |
| 99 | + no parameter to choose a different sentinel (e.g. -9999). |
| 100 | + |
| 101 | +Functions generally **propagate** NaN: if any cell in a 3x3 kernel is NaN, |
| 102 | +the output cell is NaN. The exact behaviour varies by function: |
| 103 | + |
| 104 | +- **Slope, aspect, hillshade** -- any NaN neighbour produces a NaN output |
| 105 | + cell. |
| 106 | +- **Focal mean/std** -- NaN neighbours are excluded from the average; the |
| 107 | + cell itself is still computed. |
| 108 | +- **Zonal stats** -- NaN values are excluded from the aggregation. An |
| 109 | + all-NaN zone returns NaN, not zero. |
| 110 | +- **Hydrology (flow direction)** -- NaN cells are treated as impassable |
| 111 | + barriers. Flow cannot cross them. |
| 112 | +- **Pathfinding (A*)** -- NaN cells are impassable, same as barriers. |
| 113 | + |
| 114 | +.. note:: |
| 115 | + |
| 116 | + Edge cells (within the kernel radius of the raster boundary) default to |
| 117 | + NaN when ``boundary='nan'``. Use ``boundary='nearest'`` or |
| 118 | + ``boundary='reflect'`` if you need values at the edges. |
| 119 | + |
| 120 | + |
| 121 | +Proximity distances depend on coordinate units |
| 122 | +================================================ |
| 123 | + |
| 124 | +.. warning:: |
| 125 | + |
| 126 | + ``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default) |
| 127 | + computes distances in whatever units the DataArray's coordinates use. |
| 128 | + If coordinates are default integer indices (0, 1, 2, ...), the result |
| 129 | + is in pixel counts. If coordinates are in metres (UTM), the result is |
| 130 | + in metres. If coordinates are in degrees, the result is in degrees. |
| 131 | + |
| 132 | +Switch to ``distance_metric='GREAT_CIRCLE'`` for lat/lon data to get |
| 133 | +distances in **metres** (the radius used is 6 378 137 m). |
| 134 | + |
| 135 | +.. tip:: |
| 136 | + |
| 137 | + .. code-block:: python |
| 138 | +
|
| 139 | + from xrspatial.proximity import proximity |
| 140 | +
|
| 141 | + # EUCLIDEAN with default coords = pixel-count distances |
| 142 | + dist_px = proximity(raster) |
| 143 | +
|
| 144 | + # GREAT_CIRCLE with lat/lon coords = distances in metres |
| 145 | + dist_m = proximity(raster, distance_metric='GREAT_CIRCLE') |
| 146 | +
|
| 147 | +
|
| 148 | +Great-circle distances assume sphere radius = semi-major axis |
| 149 | +============================================================== |
| 150 | + |
| 151 | +.. note:: |
| 152 | + |
| 153 | + Great-circle distances in ``proximity()`` and ``surface_distance()`` |
| 154 | + model the Earth as a sphere with radius = WGS84 semi-major axis |
| 155 | + (6 378 137 m), not the mean radius (~6 371 km). |
| 156 | + |
| 157 | +The difference is about 0.1 %. This is negligible for most use cases but |
| 158 | +worth knowing if you're comparing against a reference that uses the mean |
| 159 | +radius or the IUGG value. |
| 160 | + |
| 161 | + |
| 162 | +Dask chunk sizes and ``map_overlap`` |
| 163 | +===================================== |
| 164 | + |
| 165 | +.. caution:: |
| 166 | + |
| 167 | + Focal, slope, aspect, and other kernel-based operations use |
| 168 | + ``dask.array.map_overlap``. If any chunk dimension is **smaller than |
| 169 | + the kernel depth** (typically 1 cell for a 3x3 kernel), the overlap |
| 170 | + will fail or produce wrong results. |
| 171 | + |
| 172 | +.. caution:: |
| 173 | + |
| 174 | + ``proximity()`` with Dask expands each chunk by ``max_distance`` cells. |
| 175 | + If ``max_distance`` is infinite (the default), the entire array is loaded |
| 176 | + into one chunk. Set a finite ``max_distance`` to keep memory bounded. |
| 177 | + |
| 178 | +.. tip:: |
| 179 | + |
| 180 | + A good rule of thumb for focal operations: keep chunks at least 64 cells |
| 181 | + on each side. For proximity, set ``max_distance`` to the largest |
| 182 | + distance you actually care about. |
| 183 | + |
| 184 | + .. code-block:: python |
| 185 | +
|
| 186 | + import dask.array as da |
| 187 | + import xarray as xr |
| 188 | +
|
| 189 | + # Rechunk to safe sizes before running slope |
| 190 | + agg = xr.DataArray( |
| 191 | + da.from_array(big_array, chunks=(512, 512)) |
| 192 | + ) |
| 193 | +
|
| 194 | +
|
| 195 | +GPU memory is not managed automatically |
| 196 | +======================================== |
| 197 | + |
| 198 | +.. caution:: |
| 199 | + |
| 200 | + CuPy-backed operations allocate the full output array on the GPU. |
| 201 | + There is no automatic tiling or fallback to CPU if the array exceeds |
| 202 | + VRAM. |
| 203 | + |
| 204 | +If you hit ``cuda.cudadrv.driver.CudaAPIError`` or ``cupy.cuda.memory |
| 205 | +.OutOfMemoryError``, the array is too large for your GPU. Use Dask + CuPy |
| 206 | +to process in chunks, or fall back to NumPy. |
| 207 | + |
| 208 | +.. tip:: |
| 209 | + |
| 210 | + Complex geodesic kernels (slope, aspect with ``method='geodesic'``) |
| 211 | + use many float64 registers, which limits thread-block occupancy. |
| 212 | + xarray-spatial already reduces thread blocks to 16x16 for these |
| 213 | + kernels, but very old GPUs (compute capability < 6.0) may still |
| 214 | + hit register spill. |
| 215 | + |
| 216 | + |
| 217 | +Dimension ordering |
| 218 | +================== |
| 219 | + |
| 220 | +.. note:: |
| 221 | + |
| 222 | + xarray-spatial assumes the last two dimensions of a DataArray are |
| 223 | + ``(y, x)`` in that order. It does not inspect ``attrs['crs']`` or |
| 224 | + CF conventions to verify this. |
| 225 | + |
| 226 | +If your data has ``(x, y)`` ordering (rare, but it happens with some |
| 227 | +netCDF conventions), transpose before passing to any function: |
| 228 | + |
| 229 | +.. code-block:: python |
| 230 | +
|
| 231 | + agg = agg.transpose('y', 'x') |
| 232 | +
|
| 233 | +
|
| 234 | +Classification and NaN/Inf |
| 235 | +=========================== |
| 236 | + |
| 237 | +.. warning:: |
| 238 | + |
| 239 | + ``equal_interval()``, ``quantile()``, ``natural_breaks()``, and other |
| 240 | + classification functions set NaN and infinite input values to NaN in the |
| 241 | + output. They do not raise an error. |
| 242 | + |
| 243 | +This means a raster with a few stray ``inf`` values (e.g. from a division |
| 244 | +by zero upstream) will silently produce NaN bins for those cells. |
| 245 | + |
| 246 | +.. tip:: |
| 247 | + |
| 248 | + Clean infinities before classifying: |
| 249 | + |
| 250 | + .. code-block:: python |
| 251 | +
|
| 252 | + import numpy as np |
| 253 | + agg = agg.where(np.isfinite(agg)) |
0 commit comments