
Commit ebb4872

Fixes #190: GPU-enable all classification operations (#852)
* GPU-enable all classification ops: equal_interval, quantile, natural_breaks now support all 4 backends (#190)
  - Add Dask+CuPy backend for equal_interval via _run_dask_cupy_equal_interval
  - Replace quantile Dask+CuPy NotImplementedError with working implementation that materializes data to CPU for percentile computation
  - Add CuPy, Dask+NumPy, and Dask+CuPy backends for natural_breaks by extracting shared _compute_natural_break_bins helper
  - Add 7 new tests covering all new backend combinations
  - Update README feature matrix to reflect full backend support
* fix OOM in dask classification backends for large datasets
  - quantile dask+cupy: replace full materialization with map_blocks(cupy.asnumpy) to convert chunks to CPU one at a time, then delegate to dask's streaming approximate percentile
  - natural_breaks dask backends: sample lazily from the dask array and only materialize the sample (default 20k points), not the entire dataset. Add _generate_sample_indices helper that uses O(num_sample) memory via RandomState.choice() for large datasets, falling back to the original linspace+shuffle for small datasets to preserve determinism with numpy
* optimize classification ops: reduce memory allocations and dask passes
  - Remove unnecessary .ravel() in _run_equal_interval; nanmin/nanmax work on 2D
  - Combine double where(±inf) into single isinf pass in _run_equal_interval and _run_cupy_bin, halving temporary allocations
  - Use dask.compute(min, max) instead of two separate .compute() calls so dask reads data once instead of twice
  - Build cuts as numpy array for all backends (was needlessly dask for k elements)
  - Replace boolean fancy indexing in dask natural_break functions with da.where + da.nanmax to preserve chunk structure
  - Delete _run_dask_cupy_equal_interval; unified _run_equal_interval with module=da handles both dask+numpy and dask+cupy
* add 14 new classify tests: mutation safety, edge cases, cross-backend consistency
  - Missing backend: natural_breaks dask+cupy num_sample
  - Input mutation: verify all 5 functions don't modify input DataArray
  - Untested path: natural_breaks with num_sample=None
  - Edge cases: equal_interval k=1, all-NaN input for equal_interval and natural_breaks
  - Name parameter: verify default and custom name on all 5 functions
  - Cross-backend: verify natural_breaks cupy and dask match numpy results on a separate 10x10 dataset
* added a few more classify methods and experimental geodesic slope and aspect
* added supported geodesic helpers and test
* add description column to README feature matrix tables
* fix non-deterministic maximum_breaks when gaps are equal
  Replace np.argpartition with np.argsort(kind='stable') so that tied gap sizes are broken by index order, consistently selecting the highest-value gaps.
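A minimal NumPy sketch of the equal-interval optimizations listed above: a single isinf pass replaces two where(±inf) calls, nanmin/nanmax run directly on the 2D array without .ravel(), and the k cut points are built as a small NumPy array. The helper name equal_interval_sketch and the rule that a value equal to a cut falls into that cut's class are assumptions for illustration; this is not the _run_equal_interval code from the commit and only covers the plain NumPy case.

```python
import numpy as np


def equal_interval_sketch(data, k=5):
    """Hypothetical helper: equal-width class indices 0..k-1 for a 2D array.

    NaN and +/-inf cells come back as NaN. The input array is not modified.
    """
    # One isinf pass (instead of two where() calls for +inf and -inf);
    # np.where returns a new float array, so `data` itself is untouched.
    clean = np.where(np.isinf(data), np.nan, data)
    # nanmin/nanmax reduce over every axis of a 2D array; no .ravel() needed.
    lo, hi = np.nanmin(clean), np.nanmax(clean)
    # The k upper bounds form a small NumPy array regardless of input size.
    cuts = lo + (hi - lo) * np.arange(1, k + 1) / k
    cuts[-1] = hi  # pin the last bound to the max to avoid rounding drift
    # side="left": a value equal to a cut is placed in that cut's class.
    classes = np.searchsorted(cuts, clean, side="left").astype("float64")
    classes[np.isnan(clean)] = np.nan
    return classes, cuts
```

With k=1 every finite value lands in class 0, which is one of the edge cases the new tests mention.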
1 parent a28019b commit ebb4872
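One of the classify methods added in this commit is Box Plot. As a rough illustration of the general technique (not the xrspatial implementation), class upper bounds can be taken as the lower fence, the three quartiles, the upper fence, and the data maximum when outliers lie above the upper fence. The helper name box_plot_bins_sketch, the hinge=1.5 default, and this fence handling are assumptions borrowed from the common box-plot convention.

```python
import numpy as np


def box_plot_bins_sketch(values, hinge=1.5):
    """Hypothetical helper: upper bounds for box-plot classes.

    Bounds are lower fence, Q1, median, Q3, upper fence, plus the data
    maximum when outliers lie above the upper fence.
    """
    data = np.asarray(values, dtype="float64")
    data = data[np.isfinite(data)]  # ignore NaN and +/-inf cells
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - hinge * iqr
    upper_fence = q3 + hinge * iqr
    bins = [lower_fence, q1, q2, q3, upper_fence]
    # Only add an extra class for high outliers when they actually exist.
    if data.max() > upper_fence:
        bins.append(data.max())
    return np.array(bins)
```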


10 files changed (+2656, -202 lines)


README.md

Lines changed: 70 additions & 65 deletions
@@ -135,112 +135,117 @@ In the GIS world, rasters are used for representing continuous phenomena (e.g. e
 
 ### **Classification**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:----------:|:----------------------:|:--------------------:|:-------------------:|:------:|
-| [Equal Interval](xrspatial/classify.py) |✅️ ||||
-| [Natural Breaks](xrspatial/classify.py) |✅️ | | ||
-| [Reclassify](xrspatial/classify.py) |✅️ ||||
-| [Quantile](xrspatial/classify.py) |✅️ ||||
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:----------:|:------------|:----------------------:|:--------------------:|:-------------------:|:------:|
+| [Box Plot](xrspatial/classify.py) | Classifies values into bins based on box plot quartile boundaries | ✅️ ||||
+| [Equal Interval](xrspatial/classify.py) | Divides the value range into equal-width bins | ✅️ ||||
+| [Head/Tail Breaks](xrspatial/classify.py) | Classifies heavy-tailed distributions using recursive mean splitting | ✅️ ||||
+| [Maximum Breaks](xrspatial/classify.py) | Finds natural groupings by maximizing differences between sorted values | ✅️ ||||
+| [Natural Breaks](xrspatial/classify.py) | Optimizes class boundaries to minimize within-class variance (Jenks) | ✅️ ||||
+| [Percentiles](xrspatial/classify.py) | Assigns classes based on user-defined percentile breakpoints | ✅️ ||||
+| [Quantile](xrspatial/classify.py) | Distributes values into classes with equal observation counts | ✅️ ||||
+| [Reclassify](xrspatial/classify.py) | Remaps pixel values to new classes using a user-defined lookup | ✅️ ||||
+| [Std Mean](xrspatial/classify.py) | Classifies values by standard deviation intervals from the mean | ✅️ ||||
 
 -------
 
 ### **Focal**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:----------:|:----------------------:|:--------------------:|:-------------------:|:------:|
-| [Apply](xrspatial/focal.py) | ✅️ | ✅️ | | |
-| [Hotspots](xrspatial/focal.py) | ✅️ | ✅️ | ✅️ | |
-| [Mean](xrspatial/focal.py) | ✅️ | ✅️ | ✅️ | |
-| [Focal Statistics](xrspatial/focal.py) | ✅️ | ✅️ | ✅️ | |
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:----------:|:------------|:----------------------:|:--------------------:|:-------------------:|:------:|
+| [Apply](xrspatial/focal.py) | Applies a custom function over a sliding neighborhood window | ✅️ | ✅️ | | |
+| [Hotspots](xrspatial/focal.py) | Identifies statistically significant spatial clusters using Getis-Ord Gi* | ✅️ | ✅️ | ✅️ | |
+| [Mean](xrspatial/focal.py) | Computes the mean value within a sliding neighborhood window | ✅️ | ✅️ | ✅️ | |
+| [Focal Statistics](xrspatial/focal.py) | Computes summary statistics over a sliding neighborhood window | ✅️ | ✅️ | ✅️ | |
 
 -------
 
 ### **Multispectral**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:----------:|:----------------------:|:--------------------:|:-------------------:|:------:|
-| [Atmospherically Resistant Vegetation Index (ARVI)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Enhanced Built-Up and Bareness Index (EBBI)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Enhanced Vegetation Index (EVI)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Green Chlorophyll Index (GCI)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Normalized Burn Ratio (NBR)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Normalized Burn Ratio 2 (NBR2)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Normalized Difference Moisture Index (NDMI)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Normalized Difference Vegetation Index (NDVI)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Soil Adjusted Vegetation Index (SAVI)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [Structure Insensitive Pigment Index (SIPI)](xrspatial/multispectral.py) | ✅️ |✅️ | ✅️ |✅️ |
-| [True Color](xrspatial/multispectral.py) | ✅️ || ✅️ ||
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:----------:|:------------|:----------------------:|:--------------------:|:-------------------:|:------:|
+| [Atmospherically Resistant Vegetation Index (ARVI)](xrspatial/multispectral.py) | Vegetation index resistant to atmospheric effects using blue band correction | ✅️ |✅️ | ✅️ |✅️ |
+| [Enhanced Built-Up and Bareness Index (EBBI)](xrspatial/multispectral.py) | Highlights built-up areas and barren land from thermal and SWIR bands | ✅️ |✅️ | ✅️ |✅️ |
+| [Enhanced Vegetation Index (EVI)](xrspatial/multispectral.py) | Enhanced vegetation index reducing soil and atmospheric noise | ✅️ |✅️ | ✅️ |✅️ |
+| [Green Chlorophyll Index (GCI)](xrspatial/multispectral.py) | Estimates leaf chlorophyll content from green and NIR reflectance | ✅️ |✅️ | ✅️ |✅️ |
+| [Normalized Burn Ratio (NBR)](xrspatial/multispectral.py) | Measures burn severity using NIR and SWIR band difference | ✅️ |✅️ | ✅️ |✅️ |
+| [Normalized Burn Ratio 2 (NBR2)](xrspatial/multispectral.py) | Refines burn severity mapping using two SWIR bands | ✅️ |✅️ | ✅️ |✅️ |
+| [Normalized Difference Moisture Index (NDMI)](xrspatial/multispectral.py) | Detects vegetation moisture stress from NIR and SWIR reflectance | ✅️ |✅️ | ✅️ |✅️ |
+| [Normalized Difference Vegetation Index (NDVI)](xrspatial/multispectral.py) | Quantifies vegetation density from red and NIR band difference | ✅️ |✅️ | ✅️ |✅️ |
+| [Soil Adjusted Vegetation Index (SAVI)](xrspatial/multispectral.py) | Vegetation index with soil brightness correction factor | ✅️ |✅️ | ✅️ |✅️ |
+| [Structure Insensitive Pigment Index (SIPI)](xrspatial/multispectral.py) | Estimates carotenoid-to-chlorophyll ratio for plant stress detection | ✅️ |✅️ | ✅️ |✅️ |
+| [True Color](xrspatial/multispectral.py) | Composites red, green, and blue bands into a natural color image | ✅️ || ✅️ ||
 
 -------
 
 
 ### **Pathfinding**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:----------:|:----------------------:|:--------------------:|:-------------------:|:------:|
-| [A* Pathfinding](xrspatial/pathfinding.py) | ✅️ | | | |
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:----------:|:------------|:----------------------:|:--------------------:|:-------------------:|:------:|
+| [A* Pathfinding](xrspatial/pathfinding.py) | Finds the least-cost path between two cells on a cost surface | ✅️ | | | |
 
 ----------
 
 ### **Proximity**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:----------:|:----------------------:|:--------------------:|:-------------------:|:------:|
-| [Allocation](xrspatial/proximity.py) | ✅️ || | |
-| [Direction](xrspatial/proximity.py) | ✅️ || | |
-| [Proximity](xrspatial/proximity.py) | ✅️ || | |
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:----------:|:------------|:----------------------:|:--------------------:|:-------------------:|:------:|
+| [Allocation](xrspatial/proximity.py) | Assigns each cell to the identity of the nearest source feature | ✅️ || | |
+| [Direction](xrspatial/proximity.py) | Computes the direction from each cell to the nearest source feature | ✅️ || | |
+| [Proximity](xrspatial/proximity.py) | Computes the distance from each cell to the nearest source feature | ✅️ || | |
 
 --------
 
 ### **Raster to vector**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:-----|:------------------:|:-----------------:|:---------------------:|:---------------------:|
-| [Polygonize](xrspatial/experimental/polygonize.py) | ✅️ | | | |
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:-----|:------------|:------------------:|:-----------------:|:---------------------:|:---------------------:|
+| [Polygonize](xrspatial/experimental/polygonize.py) | Converts contiguous regions of equal value into vector polygons | ✅️ | | | |
 
 --------
 
 ### **Surface**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:----------:|:----------------------:|:--------------------:|:-------------------:|:------:|
-| [Aspect](xrspatial/aspect.py) | ✅️ | ✅️ | ✅️ | ✅️ |
-| [Curvature](xrspatial/curvature.py) | ✅️ |✅️ |✅️ | ✅️ |
-| [Hillshade](xrspatial/hillshade.py) | ✅️ | ✅️ | | |
-| [Slope](xrspatial/slope.py) | ✅️ | ✅️ | ✅️ | ✅️ |
-| [Terrain Generation](xrspatial/terrain.py) | ✅️ | ✅️ | ✅️ | |
-| [Viewshed](xrspatial/viewshed.py) | ✅️ | | | |
-| [Perlin Noise](xrspatial/perlin.py) | ✅️ | ✅️ | ✅️ | |
-| [Bump Mapping](xrspatial/bump.py) | ✅️ | | | |
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:----------:|:------------|:----------------------:|:--------------------:|:-------------------:|:------:|
+| [Aspect](xrspatial/aspect.py) | Computes downslope direction of each cell in degrees | ✅️ | ✅️ | ✅️ | ✅️ |
+| [Curvature](xrspatial/curvature.py) | Measures rate of slope change (concavity/convexity) at each cell | ✅️ |✅️ |✅️ | ✅️ |
+| [Hillshade](xrspatial/hillshade.py) | Simulates terrain illumination from a given sun angle and azimuth | ✅️ | ✅️ | | |
+| [Slope](xrspatial/slope.py) | Computes terrain gradient steepness at each cell in degrees | ✅️ | ✅️ | ✅️ | ✅️ |
+| [Terrain Generation](xrspatial/terrain.py) | Generates synthetic terrain elevation using fractal noise | ✅️ | ✅️ | ✅️ | |
+| [Viewshed](xrspatial/viewshed.py) | Determines visible cells from a given observer point on terrain | ✅️ | | | |
+| [Perlin Noise](xrspatial/perlin.py) | Generates smooth continuous random noise for procedural textures | ✅️ | ✅️ | ✅️ | |
+| [Bump Mapping](xrspatial/bump.py) | Adds randomized bump features to simulate natural terrain variation | ✅️ | | | |
 
 -----------
 
 ### **Zonal**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:----------:|:----------------------:|:--------------------:|:-------------------:|:------:|
-| [Apply](xrspatial/zonal.py) | ✅️ | ✅️ | | |
-| [Crop](xrspatial/zonal.py) | ✅️ | | | |
-| [Regions](xrspatial/zonal.py) | | | | |
-| [Trim](xrspatial/zonal.py) | ✅️ | | | |
-| [Zonal Statistics](xrspatial/zonal.py) | ✅️ | ✅️| | |
-| [Zonal Cross Tabulate](xrspatial/zonal.py) | ✅️ | ✅️| | |
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:----------:|:------------|:----------------------:|:--------------------:|:-------------------:|:------:|
+| [Apply](xrspatial/zonal.py) | Applies a custom function to each zone in a classified raster | ✅️ | ✅️ | | |
+| [Crop](xrspatial/zonal.py) | Extracts the bounding rectangle of a specific zone | ✅️ | | | |
+| [Regions](xrspatial/zonal.py) | Identifies connected regions of non-zero cells | | | | |
+| [Trim](xrspatial/zonal.py) | Removes nodata border rows and columns from a raster | ✅️ | | | |
+| [Zonal Statistics](xrspatial/zonal.py) | Computes summary statistics for a value raster within each zone | ✅️ | ✅️| | |
+| [Zonal Cross Tabulate](xrspatial/zonal.py) | Cross-tabulates agreement between two categorical rasters | ✅️ | ✅️| | |
 
 -----------
 
 ### **Local**
 
-| Name | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
-|:----------:|:----------------------:|:--------------------:|:-------------------:|:------:|
-| [Cell Stats](xrspatial/local.py) | ✅️ | | | |
-| [Combine](xrspatial/local.py) | ✅️ | | | |
-| [Lesser Frequency](xrspatial/local.py) | ✅️ | | | |
-| [Equal Frequency](xrspatial/local.py) | ✅️ | | | |
-| [Greater Frequency](xrspatial/local.py) | ✅️ | | | |
-| [Lowest Position](xrspatial/local.py) | ✅️ | | | |
-| [Highest Position](xrspatial/local.py) | ✅️ | | | |
-| [Popularity](xrspatial/local.py) | ✅️ | | | |
-| [Rank](xrspatial/local.py) | ✅️ | | | |
+| Name | Description | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:----------:|:------------|:----------------------:|:--------------------:|:-------------------:|:------:|
+| [Cell Stats](xrspatial/local.py) | Computes summary statistics across multiple rasters per cell | ✅️ | | | |
+| [Combine](xrspatial/local.py) | Assigns unique IDs to each distinct combination across rasters | ✅️ | | | |
+| [Lesser Frequency](xrspatial/local.py) | Counts how many rasters have values less than a reference | ✅️ | | | |
+| [Equal Frequency](xrspatial/local.py) | Counts how many rasters have values equal to a reference | ✅️ | | | |
+| [Greater Frequency](xrspatial/local.py) | Counts how many rasters have values greater than a reference | ✅️ | | | |
+| [Lowest Position](xrspatial/local.py) | Identifies which raster has the minimum value at each cell | ✅️ | | | |
+| [Highest Position](xrspatial/local.py) | Identifies which raster has the maximum value at each cell | ✅️ | | | |
+| [Popularity](xrspatial/local.py) | Returns the value from the most common position across rasters | ✅️ | | | |
+| [Rank](xrspatial/local.py) | Ranks cell values across multiple rasters per cell | ✅️ | | | |
 
 #### Usage
 
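The Head/Tail Breaks row in the README describes recursive mean splitting. A minimal sketch of the generic technique: take the mean as a break, keep only the values above it (the head), and repeat while the head remains a small minority of its parent. The helper name head_tail_breaks_sketch and the 0.4 head-ratio stopping rule (a common choice for this method) are assumptions; xrspatial's head_tail_breaks may use a different stopping rule.

```python
import numpy as np


def head_tail_breaks_sketch(values, head_ratio=0.4):
    """Hypothetical helper: head/tail break points for a 1D array.

    Repeatedly takes the mean of the current head (values above the
    previous mean) while the head stays below head_ratio of its parent.
    """
    data = np.asarray(values, dtype="float64")
    data = data[np.isfinite(data)]  # drop NaN and +/-inf
    breaks = []
    while data.size > 1:
        mean = data.mean()
        head = data[data > mean]
        breaks.append(mean)
        # Stop once the head is empty or no longer a small minority.
        if head.size == 0 or head.size / data.size >= head_ratio:
            break
        data = head
    return np.array(breaks)
```

For the heavy-tailed input 1, 1, 1, 1, 1, 1, 1, 2, 4, 10 the sketch yields breaks near 2.3 and 7.0: the first head (4 and 10) is 20% of the data, so the split recurses once more.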
xrspatial/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -1,6 +1,11 @@
 from xrspatial.aspect import aspect # noqa
 from xrspatial.bump import bump # noqa
 from xrspatial.classify import binary # noqa
+from xrspatial.classify import box_plot # noqa
+from xrspatial.classify import head_tail_breaks # noqa
+from xrspatial.classify import maximum_breaks # noqa
+from xrspatial.classify import percentiles # noqa
+from xrspatial.classify import std_mean # noqa
 from xrspatial.diagnostics import diagnose # noqa
 from xrspatial.classify import equal_interval # noqa
 from xrspatial.classify import natural_breaks # noqa
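The last item in the commit message swaps np.argpartition for np.argsort(kind='stable') in maximum_breaks. The sketch below shows why that makes tie-breaking deterministic: a stable ascending sort keeps tied gaps in index order, so the last k-1 positions always select the highest-value gaps among ties. The helper name maximum_breaks_sketch and the use of gap midpoints as break values are assumptions for illustration, not the code added in this commit.

```python
import numpy as np


def maximum_breaks_sketch(values, k=3):
    """Hypothetical helper: k-1 breaks at the midpoints of the largest gaps."""
    if k < 2:
        return np.empty(0)
    data = np.unique(np.asarray(values, dtype="float64"))  # sorted, de-duplicated
    data = data[np.isfinite(data)]
    gaps = np.diff(data)
    # Stable ascending sort keeps tied gaps in index order, so the last k-1
    # entries are the largest gaps, with ties resolved toward higher values.
    # np.argpartition gives no ordering guarantee among ties.
    order = np.argsort(gaps, kind="stable")
    chosen = np.sort(order[-(k - 1):])
    # Each break sits halfway across its selected gap.
    return (data[chosen] + data[chosen + 1]) / 2.0
```

For example, with values 1, 2, 3, 5, 7, 9 and k=2 the three gaps of size 2 tie, and the sketch always returns the single break 8.0 from the highest of them.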
