Commit 29556ed

Add dask laziness docs and regression tests (#1164) (#1165)
Reference page listing every public function's dask laziness level (fully lazy / partially lazy / fully materialized) so users can plan pipelines without reading source. 22 tests assert that functions expected to stay lazy actually return dask collections.
1 parent 3b56af8 commit 29556ed

3 files changed: +431 -0 lines changed

Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
.. _reference.dask_laziness:

*********************
Dask backend behavior
*********************

When you pass a dask-backed ``DataArray`` to an xarray-spatial function, the
result *should* also be dask-backed so your pipeline stays lazy until you call
``.compute()``. Most functions do this, but some algorithms need random access
to the full array and have to materialize intermediate results.

This page lists every public function and its laziness level so you can plan
dask pipelines without reading source code.
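
One way to use this page when planning is to copy the relevant table rows into
a small lookup and ask which step of a pipeline is the least lazy. The sketch
below uses only the standard library. ``LAZINESS``, ``worst_level`` and
``materializing_steps`` are hypothetical names chosen for illustration and are
not part of xarray-spatial; the dictionary is a hand-copied subset of the
tables on this page.

```python
# Hand-copied subset of the tables on this page (illustrative, not exhaustive).
LAZINESS = {
    "slope": "fully lazy",
    "aspect": "fully lazy",
    "hillshade": "fully lazy",
    "hotspots": "partially lazy",
    "quantile": "partially lazy",
    "equal_interval": "partially lazy",
    "viewshed": "fully materialized",
    "proximity": "fully materialized",
    "a_star_search": "fully materialized",
}

# Levels ordered from cheapest to most memory-hungry.
_ORDER = ["fully lazy", "partially lazy", "fully materialized"]


def worst_level(function_names):
    """Return the least lazy level among the named pipeline steps."""
    levels = [LAZINESS[name] for name in function_names]
    return max(levels, key=_ORDER.index)


def materializing_steps(function_names):
    """Return the steps that will pull the full array into memory."""
    return [name for name in function_names
            if LAZINESS[name] == "fully materialized"]


pipeline = ["slope", "quantile", "viewshed"]
print(worst_level(pipeline))
print(materializing_steps(pipeline))
```

A pipeline whose worst level is "fully materialized" needs a memory check
before it is run on a large raster.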

Laziness levels
===============

**Fully lazy** -- the function returns a dask array without triggering any
computation. Safe for arbitrarily large out-of-core datasets.

**Partially lazy** -- the function computes small bounded statistics (scalars,
quartiles, a ~20K sample) during setup, then returns a dask array for the main
result. The statistics are cheap; the heavy work stays lazy.

**Fully materialized** -- the algorithm needs the entire array in memory
(connected-component labeling, A* search, viewshed sweepline, etc.). The
result may be re-wrapped as dask, but the function calls ``.compute()``
internally. Watch your memory on large inputs.
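
The three levels can be imitated with plain ``dask.array`` operations. The
sketch below does not call xarray-spatial; the arithmetic is a stand-in for
real algorithms, and the variable names are illustrative. It uses
``dask.is_dask_collection`` to check whether a result is still lazy.

```python
import dask
import dask.array as da
import numpy as np

elevation = da.random.random((200, 200), chunks=(100, 100))

# Fully lazy: only the task graph grows, nothing is computed.
fully_lazy = elevation * 2.0 + 1.0

# Partially lazy: one small scalar is computed during setup,
# then the main result is built lazily from it.
global_mean = float(elevation.mean().compute())
partially_lazy = elevation - global_mean

# Fully materialized: the whole array is pulled into memory as NumPy.
# The result can be re-wrapped as dask, but the memory cost is already paid.
in_memory = elevation.compute()
rewrapped = da.from_array(in_memory, chunks=(100, 100))

for name, obj in [("fully_lazy", fully_lazy),
                  ("partially_lazy", partially_lazy),
                  ("in_memory", in_memory),
                  ("rewrapped", rewrapped)]:
    print(name, dask.is_dask_collection(obj))
```

Note that ``rewrapped`` reports as a dask collection even though its data
already lives in memory, which is why a type check alone cannot distinguish
the fully materialized level.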


Terrain metrics
===============

.. list-table::
   :header-rows: 1
   :widths: 30 20 50

   * - Function
     - Laziness
     - Notes
   * - ``slope``
     - Fully lazy
     - ``map_overlap``, planar and geodesic
   * - ``aspect``
     - Fully lazy
     - ``map_overlap``, planar and geodesic
   * - ``curvature``
     - Fully lazy
     - ``map_overlap``
   * - ``hillshade``
     - Fully lazy
     - ``map_overlap``
   * - ``northness``
     - Fully lazy
     - Uses ``da.cos`` / ``da.deg2rad`` on aspect output
   * - ``eastness``
     - Fully lazy
     - Uses ``da.sin`` / ``da.deg2rad`` on aspect output
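
Neighborhood operations stay lazy under ``map_overlap`` because each chunk
only needs a thin halo from its neighbors. The sketch below is not the
xarray-spatial slope kernel; ``_x_gradient`` is a hypothetical one-cell
central difference used only to show the pattern, with ``depth=1`` and a NaN
boundary as assumed settings.

```python
import dask.array as da
import numpy as np


def _x_gradient(block):
    """Central difference along x; the outermost columns are left as NaN."""
    out = np.full(block.shape, np.nan, dtype="float64")
    out[:, 1:-1] = (block[:, 2:] - block[:, :-2]) / 2.0
    return out


# Each cell holds 8 * row + col, so the x-gradient is 1 everywhere.
data = np.arange(48, dtype="float64").reshape(6, 8)
elevation = da.from_array(data, chunks=(3, 4))

# depth=1 shares a one-cell halo between chunks; the raster edge is padded
# with NaN, so edge columns come out as NaN.
gradient = elevation.map_overlap(
    _x_gradient, depth=1, boundary=np.nan, dtype="float64"
)

result = gradient.compute()  # computation happens only here
```

Interior values are correct across the chunk boundary between columns 3 and 4
because the halo supplies the neighboring column.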


Focal operations
================

.. list-table::
   :header-rows: 1
   :widths: 30 20 50

   * - Function
     - Laziness
     - Notes
   * - ``mean``
     - Fully lazy
     - Iterative ``map_overlap``
   * - ``apply``
     - Fully lazy
     - ``map_overlap`` with user kernel
   * - ``focal_stats``
     - Fully lazy
     - Multiple stats via ``map_overlap``, 3D output
   * - ``hotspots``
     - Partially lazy
     - Computes global mean and std, result is dask


Classification
==============

.. list-table::
   :header-rows: 1
   :widths: 30 20 50

   * - Function
     - Laziness
     - Notes
   * - ``binary``
     - Fully lazy
     - ``map_blocks``
   * - ``reclassify``
     - Fully lazy
     - ``map_blocks``
   * - ``quantile``
     - Partially lazy
     - Computes percentiles from ~20K sample
   * - ``natural_breaks``
     - Partially lazy
     - Computes Jenks breaks from ~20K sample + scalar max
   * - ``equal_interval``
     - Partially lazy
     - Computes scalar min/max
   * - ``std_mean``
     - Partially lazy
     - Computes scalar mean/std/max
   * - ``head_tail_breaks``
     - Partially lazy
     - Computes O(log N) scalar means
   * - ``percentiles``
     - Partially lazy
     - Computes percentiles from ~20K sample
   * - ``maximum_breaks``
     - Partially lazy
     - Computes breaks from ~20K sample
   * - ``box_plot``
     - Partially lazy
     - Computes scalar quartiles and max
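
The partially lazy pattern in this table follows one shape: compute a few
scalars eagerly, derive bin edges from them, then classify lazily per chunk.
The sketch below shows that shape for an equal-interval scheme. It is a
simplified stand-in, not the xarray-spatial implementation;
``equal_interval_sketch`` and ``_digitize`` are hypothetical names, and NaN
handling is left out.

```python
import dask
import dask.array as da
import numpy as np


def _digitize(block, bins):
    """Assign each cell the index of the first bin edge it does not exceed."""
    return np.digitize(block, bins, right=True)


def equal_interval_sketch(values, k=4):
    # Eager step: two scalars, computed in a single pass over the data.
    lo, hi = dask.compute(da.nanmin(values), da.nanmax(values))
    # Upper edges of k equal-width bins.
    bins = np.linspace(lo, hi, k + 1)[1:]
    # Lazy step: classification is deferred, one task per chunk.
    return values.map_blocks(_digitize, bins=bins, dtype="int64")


values = da.from_array(
    np.arange(100, dtype="float64").reshape(10, 10), chunks=(5, 5)
)
classes = equal_interval_sketch(values, k=4)  # still a dask array
```

Only the min/max pass runs when ``equal_interval_sketch`` is called; the
per-cell classification waits for ``classes.compute()``.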


Normalization
=============

.. list-table::
   :header-rows: 1
   :widths: 30 20 50

   * - Function
     - Laziness
     - Notes
   * - ``rescale``
     - Fully lazy
     - ``da.nanmin`` / ``da.nanmax`` (lazy reductions)
   * - ``standardize``
     - Fully lazy
     - ``da.nanmean`` / ``da.nanstd`` (lazy reductions)
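
These functions can be fully lazy even though they depend on global
statistics, because a dask reduction is itself a lazy 0-d array that can be
used in further arithmetic. A minimal sketch of that idea, assuming a
min-max rescale; ``rescale_sketch`` and its parameters are hypothetical and
not the library's signature:

```python
import dask
import dask.array as da
import numpy as np


def rescale_sketch(values, new_min=0.0, new_max=1.0):
    lo = da.nanmin(values)  # lazy 0-d dask array, nothing computed yet
    hi = da.nanmax(values)  # likewise lazy
    # The reductions become part of the same task graph as the arithmetic.
    return (values - lo) / (hi - lo) * (new_max - new_min) + new_min


data = np.array([[2.0, 4.0], [np.nan, 10.0]])
scaled = rescale_sketch(da.from_array(data, chunks=(1, 2)))
out = scaled.compute()  # the reductions and the rescale run together here
```

Contrast this with the partially lazy classifiers above, which need concrete
scalar values to build bin edges and therefore compute them during setup.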


Visibility
==========

.. list-table::
   :header-rows: 1
   :widths: 30 20 50

   * - Function
     - Laziness
     - Notes
   * - ``viewshed``
     - Fully materialized
     - Sweepline algorithm needs random access
   * - ``line_of_sight``
     - Fully materialized
     - Extracts 1D transect via ``.compute()``
   * - ``cumulative_viewshed``
     - Fully materialized
     - Runs multiple viewshed calls
   * - ``visibility_frequency``
     - Fully materialized
     - Wraps ``cumulative_viewshed``
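
Before calling a fully materialized function on a large raster, it can help to
check how much memory the input alone would occupy. A dask array reports its
size from metadata, so the check itself costs nothing. The sketch below is an
illustration only: ``check_budget`` is a hypothetical helper, the budget is an
arbitrary number, and the estimate covers the input array, not any working
memory the algorithm needs on top.

```python
import dask.array as da


def check_budget(values, budget_bytes):
    """Raise before a materializing call if the input exceeds the budget."""
    needed = values.nbytes  # derived from shape and dtype, no computation
    if needed > budget_bytes:
        raise MemoryError(
            f"input needs {needed} bytes, budget is {budget_bytes}"
        )
    return needed


# 20,000 x 20,000 float32 cells: 1.6 GB once materialized.
elevation = da.zeros((20_000, 20_000), chunks=(2_000, 2_000), dtype="float32")

needed = check_budget(elevation, budget_bytes=2 * 10**9)
```

If the check fails, options include cropping to the area of interest or
coarsening the raster before the materializing step.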


Morphology
==========

.. list-table::
   :header-rows: 1
   :widths: 30 20 50

   * - Function
     - Laziness
     - Notes
   * - ``sieve``
     - Fully materialized
     - Connected-component labeling needs the full array; result re-wrapped as dask
182+
183+
184+
Proximity
185+
=========
186+
187+
.. list-table::
188+
:header-rows: 1
189+
:widths: 30 20 50
190+
191+
* - Function
192+
- Laziness
193+
- Notes
194+
* - ``proximity``
195+
- Fully materialized
196+
- Distance computation needs full array
197+
* - ``allocation``
198+
- Fully materialized
199+
- Nearest-source allocation
200+
* - ``direction``
201+
- Fully materialized
202+
- Direction to nearest source
203+
204+
205+
Zonal
206+
=====
207+
208+
.. list-table::
209+
:header-rows: 1
210+
:widths: 30 20 50
211+
212+
* - Function
213+
- Laziness
214+
- Notes
215+
* - ``zonal_stats`` / ``stats``
216+
- Partially lazy
217+
- Groupby aggregation via dask dataframe
218+
* - ``zonal_crosstab`` / ``crosstab``
219+
- Partially lazy
220+
- Groupby cross-tabulation
221+
* - ``zonal_apply`` / ``apply``
222+
- Fully lazy
223+
- ``map_blocks`` per zone
224+
* - ``regions``
225+
- Fully materialized
226+
- Connected-component labeling
227+
* - ``trim``
228+
- Fully lazy
229+
- Lazy slicing
230+
* - ``crop``
231+
- Fully lazy
232+
- Lazy slicing
233+
234+
235+
Pathfinding
236+
===========
237+
238+
.. list-table::
239+
:header-rows: 1
240+
:widths: 30 20 50
241+
242+
* - Function
243+
- Laziness
244+
- Notes
245+
* - ``a_star_search``
246+
- Fully materialized
247+
- A* needs random access and visited-set tracking
248+
* - ``multi_stop_search``
249+
- Fully materialized
250+
- Iterative A*

docs/source/reference/index.rst

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ Reference
 .. toctree::
    :maxdepth: 2
 
+   dask_laziness
    classification
    dasymetric
    diffusion
