Skip to content

Commit 0914a9c

Browse files
authored
Add sieve filter to remove small raster clumps (#1159)
* Add sieve filter to remove small raster clumps (#1149) Implements a sieve() function that identifies connected components of same-value pixels and replaces regions smaller than a threshold with the value of their largest spatial neighbor. Supports 4- and 8-connectivity, selective sieving via skip_values, and all four backends (numpy, cupy via CPU fallback, dask+numpy, dask+cupy). * Add sieve docs, README entry, and user guide notebook (#1149)
1 parent b7ddf67 commit 0914a9c

File tree

6 files changed

+1067
-0
lines changed

6 files changed

+1067
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -486,6 +486,7 @@ Same-CRS tiles skip reprojection entirely and are placed by direct coordinate al
486486
| [Gradient](xrspatial/morphology.py) | Dilation minus erosion (edge detection) | Standard (morphology) | ✅️ | ✅️ | ✅️ | ✅️ |
487487
| [White Top-hat](xrspatial/morphology.py) | Original minus opening (isolate bright features) | Standard (morphology) | ✅️ | ✅️ | ✅️ | ✅️ |
488488
| [Black Top-hat](xrspatial/morphology.py) | Closing minus original (isolate dark features) | Standard (morphology) | ✅️ | ✅️ | ✅️ | ✅️ |
489+
| [Sieve](xrspatial/sieve.py) | Remove small connected clumps from classified rasters | GDAL sieve | ✅️ | ✅️ | 🔄 | 🔄 |
489490

490491
-------
491492

docs/source/reference/zonal.rst

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,13 @@ Regions
3030

3131
xrspatial.zonal.regions
3232

33+
Sieve
34+
=====
35+
.. autosummary::
36+
:toctree: _autosummary
37+
38+
xrspatial.sieve.sieve
39+
3340
Trim
3441
====
3542
.. autosummary::
Lines changed: 261 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,261 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Sieve Filter: Removing Small Raster Clumps\n",
8+
"\n",
9+
"Classification outputs often contain salt-and-pepper noise: tiny clumps of 1-3 pixels that don't represent real features. The `sieve` function removes these by replacing connected regions smaller than a given threshold with the value of their largest spatial neighbor.\n",
10+
"\n",
11+
"This is the xarray-spatial equivalent of GDAL's `gdal_sieve.py`, and it pairs naturally with classification functions like `natural_breaks()` or `reclassify()` and with `polygonize()` for cleaning results before vectorization."
12+
]
13+
},
14+
{
15+
"cell_type": "code",
16+
"execution_count": null,
17+
"metadata": {},
18+
"outputs": [],
19+
"source": [
20+
"import numpy as np\n",
21+
"import xarray as xr\n",
22+
"import matplotlib.pyplot as plt\n",
23+
"from matplotlib.colors import ListedColormap\n",
24+
"\n",
25+
"from xrspatial.sieve import sieve\n",
26+
"from xrspatial.classify import natural_breaks"
27+
]
28+
},
29+
{
30+
"cell_type": "markdown",
31+
"metadata": {},
32+
"source": [
33+
"## Generate a Noisy Classified Raster\n",
34+
"\n",
35+
"We'll create a synthetic classified raster with three land-cover classes and scatter some salt-and-pepper noise across it."
36+
]
37+
},
38+
{
39+
"cell_type": "code",
40+
"execution_count": null,
41+
"metadata": {},
42+
"outputs": [],
43+
"source": [
44+
"np.random.seed(42)\n",
45+
"rows, cols = 80, 100\n",
46+
"\n",
47+
"# Build a base classification with three broad zones\n",
48+
"base = np.ones((rows, cols), dtype=np.float64)\n",
49+
"base[:, 40:70] = 2.0\n",
50+
"base[30:60, :] = 3.0\n",
51+
"base[30:60, 40:70] = 2.0\n",
52+
"\n",
53+
"# Add salt-and-pepper noise: randomly flip ~8% of pixels\n",
54+
"noise_mask = np.random.random((rows, cols)) < 0.08\n",
55+
"noise_vals = np.random.choice([1.0, 2.0, 3.0], size=(rows, cols))\n",
56+
"noisy = base.copy()\n",
57+
"noisy[noise_mask] = noise_vals[noise_mask]\n",
58+
"\n",
59+
"# Sprinkle some NaN (nodata) pixels\n",
60+
"noisy[0:3, 0:3] = np.nan\n",
61+
"noisy[77:, 97:] = np.nan\n",
62+
"\n",
63+
"raster = xr.DataArray(noisy, dims=['y', 'x'], name='landcover')\n",
64+
"print(f'Raster shape: {raster.shape}')\n",
65+
"print(f'Unique values (excl. NaN): {np.unique(raster.values[~np.isnan(raster.values)])}')"
66+
]
67+
},
68+
{
69+
"cell_type": "code",
70+
"execution_count": null,
71+
"metadata": {},
72+
"outputs": [],
73+
"source": [
74+
"cmap = ListedColormap(['#2ecc71', '#3498db', '#e74c3c'])\n",
75+
"\n",
76+
"fig, ax = plt.subplots(figsize=(8, 5))\n",
77+
"im = ax.imshow(raster.values, cmap=cmap, vmin=0.5, vmax=3.5, interpolation='nearest')\n",
78+
"ax.set_title('Noisy classified raster')\n",
79+
"cbar = fig.colorbar(im, ax=ax, ticks=[1, 2, 3])\n",
80+
"cbar.ax.set_yticklabels(['Class 1', 'Class 2', 'Class 3'])\n",
81+
"plt.tight_layout()\n",
82+
"plt.show()"
83+
]
84+
},
85+
{
86+
"cell_type": "markdown",
87+
"metadata": {},
88+
"source": [
89+
"## Basic Sieve: Remove Single-Pixel Noise\n",
90+
"\n",
91+
"The simplest use case: set a threshold so isolated pixels are absorbed by their surroundings."
92+
]
93+
},
94+
{
95+
"cell_type": "code",
96+
"execution_count": null,
97+
"metadata": {},
98+
"outputs": [],
99+
"source": [
100+
"sieved = sieve(raster, threshold=4)\n",
101+
"\n",
102+
"fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
103+
"for ax, data, title in zip(axes, [raster, sieved], ['Before sieve', 'After sieve (threshold=4)']):\n",
104+
" im = ax.imshow(data.values, cmap=cmap, vmin=0.5, vmax=3.5, interpolation='nearest')\n",
105+
" ax.set_title(title)\n",
106+
"fig.colorbar(im, ax=axes, ticks=[1, 2, 3], shrink=0.8)\n",
107+
"plt.tight_layout()\n",
108+
"plt.show()"
109+
]
110+
},
111+
{
112+
"cell_type": "markdown",
113+
"metadata": {},
114+
"source": [
115+
"## Connectivity: 4 vs 8\n",
116+
"\n",
117+
"With 4-connectivity (rook), only pixels sharing an edge are considered connected. With 8-connectivity (queen), diagonally adjacent pixels also form part of the same region. This affects which clumps are identified as \"small.\""
118+
]
119+
},
120+
{
121+
"cell_type": "code",
122+
"execution_count": null,
123+
"metadata": {},
124+
"outputs": [],
125+
"source": [
126+
"sieved_4 = sieve(raster, threshold=6, neighborhood=4)\n",
127+
"sieved_8 = sieve(raster, threshold=6, neighborhood=8)\n",
128+
"\n",
129+
"fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n",
130+
"for ax, data, title in zip(\n",
131+
" axes,\n",
132+
" [raster, sieved_4, sieved_8],\n",
133+
" ['Original', '4-connectivity (threshold=6)', '8-connectivity (threshold=6)'],\n",
134+
"):\n",
135+
" im = ax.imshow(data.values, cmap=cmap, vmin=0.5, vmax=3.5, interpolation='nearest')\n",
136+
" ax.set_title(title)\n",
137+
"fig.colorbar(im, ax=axes, ticks=[1, 2, 3], shrink=0.8)\n",
138+
"plt.tight_layout()\n",
139+
"plt.show()"
140+
]
141+
},
142+
{
143+
"cell_type": "markdown",
144+
"metadata": {},
145+
"source": [
146+
"## Selective Sieving with `skip_values`\n",
147+
"\n",
148+
"Sometimes certain class values should never be removed, even if their regions are small. Use `skip_values` to protect specific categories from merging while still allowing other small regions to be cleaned up."
149+
]
150+
},
151+
{
152+
"cell_type": "code",
153+
"execution_count": null,
154+
"metadata": {},
155+
"outputs": [],
156+
"source": [
157+
"# Protect class 3 from sieving\n",
158+
"sieved_skip = sieve(raster, threshold=10, skip_values=[3.0])\n",
159+
"sieved_noskip = sieve(raster, threshold=10)\n",
160+
"\n",
161+
"fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n",
162+
"for ax, data, title in zip(\n",
163+
" axes,\n",
164+
" [raster, sieved_noskip, sieved_skip],\n",
165+
" ['Original', 'threshold=10 (no skip)', 'threshold=10 (skip class 3)'],\n",
166+
"):\n",
167+
" im = ax.imshow(data.values, cmap=cmap, vmin=0.5, vmax=3.5, interpolation='nearest')\n",
168+
" ax.set_title(title)\n",
169+
"fig.colorbar(im, ax=axes, ticks=[1, 2, 3], shrink=0.8)\n",
170+
"plt.tight_layout()\n",
171+
"plt.show()"
172+
]
173+
},
174+
{
175+
"cell_type": "markdown",
176+
"metadata": {},
177+
"source": [
178+
"## Practical Example: Clean Up a Classification\n",
179+
"\n",
180+
"Generate a continuous surface, classify it with `natural_breaks`, and then sieve the result to remove small artifacts."
181+
]
182+
},
183+
{
184+
"cell_type": "code",
185+
"execution_count": null,
186+
"metadata": {},
187+
"outputs": [],
188+
"source": [
189+
"# Create a smooth surface with some high-frequency variation\n",
190+
"y = np.linspace(0, 4 * np.pi, rows)\n",
191+
"x = np.linspace(0, 4 * np.pi, cols)\n",
192+
"Y, X = np.meshgrid(y, x, indexing='ij')\n",
193+
"surface = np.sin(Y) * np.cos(X) + 0.4 * np.random.randn(rows, cols)\n",
194+
"\n",
195+
"surface_da = xr.DataArray(surface, dims=['y', 'x'])\n",
196+
"classified = natural_breaks(surface_da, k=5)\n",
197+
"\n",
198+
"# Sieve the classification\n",
199+
"cleaned = sieve(classified, threshold=8)\n",
200+
"\n",
201+
"fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n",
202+
"axes[0].imshow(surface, cmap='terrain', interpolation='nearest')\n",
203+
"axes[0].set_title('Continuous surface')\n",
204+
"axes[1].imshow(classified.values, cmap='tab10', interpolation='nearest')\n",
205+
"axes[1].set_title('natural_breaks (k=5)')\n",
206+
"axes[2].imshow(cleaned.values, cmap='tab10', interpolation='nearest')\n",
207+
"axes[2].set_title('After sieve (threshold=8)')\n",
208+
"plt.tight_layout()\n",
209+
"plt.show()"
210+
]
211+
},
212+
{
213+
"cell_type": "markdown",
214+
"metadata": {},
215+
"source": [
216+
"## Threshold Selection\n",
217+
"\n",
218+
"The right threshold depends on pixel resolution and the minimum feature size you care about. Here's a comparison across threshold values."
219+
]
220+
},
221+
{
222+
"cell_type": "code",
223+
"execution_count": null,
224+
"metadata": {},
225+
"outputs": [],
226+
"source": [
227+
"thresholds = [2, 5, 15, 50]\n",
228+
"fig, axes = plt.subplots(1, len(thresholds), figsize=(5 * len(thresholds), 5))\n",
229+
"\n",
230+
"for ax, t in zip(axes, thresholds):\n",
231+
" result = sieve(classified, threshold=t)\n",
232+
" ax.imshow(result.values, cmap='tab10', interpolation='nearest')\n",
233+
" ax.set_title(f'threshold={t}')\n",
234+
"\n",
235+
"plt.suptitle('Effect of sieve threshold on classified raster', y=1.02)\n",
236+
"plt.tight_layout()\n",
237+
"plt.show()"
238+
]
239+
}
240+
],
241+
"metadata": {
242+
"kernelspec": {
243+
"display_name": "Python 3 (ipykernel)",
244+
"language": "python",
245+
"name": "python3"
246+
},
247+
"language_info": {
248+
"codemirror_mode": {
249+
"name": "ipython",
250+
"version": 3
251+
},
252+
"file_extension": ".py",
253+
"mimetype": "text/x-python",
254+
"name": "python",
255+
"nbformat_minor": 4,
256+
"version": "3.11.0"
257+
}
258+
},
259+
"nbformat": 4,
260+
"nbformat_minor": 4
261+
}

xrspatial/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -104,6 +104,7 @@
104104
from xrspatial.hydro import stream_link_d8, stream_link_dinf, stream_link_mfd # noqa
105105
from xrspatial.hydro import stream_order # noqa: unified wrapper
106106
from xrspatial.hydro import stream_order_d8, stream_order_dinf, stream_order_mfd # noqa
107+
from xrspatial.sieve import sieve # noqa
107108
from xrspatial.sky_view_factor import sky_view_factor # noqa
108109
from xrspatial.slope import slope # noqa
109110
from xrspatial.surface_distance import surface_allocation # noqa

0 commit comments

Comments
 (0)