Skip to content

Commit 2b7611c

Browse files
committed
Add design spec for polygonize simplification (#1151)
1 parent 46ee269 commit 2b7611c

File tree

1 file changed

+138
-0
lines changed

1 file changed

+138
-0
lines changed
Lines changed: 138 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,138 @@
1+
# Polygonize Geometry Simplification
2+
3+
**Issue:** #1151
4+
**Date:** 2026-04-06
5+
6+
## Problem
7+
8+
`polygonize()` produces exact pixel-boundary polygons. On high-resolution
9+
rasters this creates dense geometries with thousands of vertices per polygon,
10+
making output slow to render, large on disk, and unwieldy for spatial joins.
11+
12+
The current workaround chains GDAL's `gdal_polygonize.py` with
13+
`ogr2ogr -simplify`, adding an external dependency and intermediate file.
14+
15+
## API
16+
17+
Two new parameters on `polygonize()`:
18+
19+
```python
20+
def polygonize(
21+
raster, mask=None, connectivity=4, transform=None,
22+
column_name="DN", return_type="numpy",
23+
simplify_tolerance=None, # float, coordinate units
24+
simplify_method="douglas-peucker", # str
25+
):
26+
```
27+
28+
- `simplify_tolerance=None` or `0.0`: no simplification (backward compatible).
29+
- `simplify_tolerance > 0`: apply Douglas-Peucker with given tolerance.
30+
- `simplify_method="visvalingam-whyatt"`: raises `NotImplementedError`.
31+
- Negative tolerance raises `ValueError`.
32+
33+
## Algorithm: Shared-Edge Douglas-Peucker
34+
35+
Topology-preserving simplification via shared-edge decomposition, same
36+
approach used by TopoJSON and GRASS `v.generalize`.
37+
38+
### Pipeline position
39+
40+
Simplification runs between boundary tracing and output conversion:
41+
42+
```
43+
CCL -> boundary tracing -> [simplification] -> output conversion
44+
```
45+
46+
For dask backends, simplification runs after chunk merging.
47+
48+
### Steps
49+
50+
1. **Find junctions.** Scan all ring vertices. A junction is any coordinate
51+
that appears as a vertex in 3 or more distinct rings. These points are
52+
pinned and never removed by simplification.
53+
54+
2. **Split rings into edge chains.** Walk each ring and split at junction
55+
vertices. Each resulting chain connects two junctions (or forms a closed
56+
loop when the ring contains no junctions). Each chain is shared by at
57+
most 2 adjacent polygons.
58+
59+
3. **Deduplicate chains.** Normalize each chain by its sorted endpoint pair
60+
so shared edges between adjacent polygons are identified and simplified
61+
only once.
62+
63+
4. **Simplify each chain.** Apply Douglas-Peucker to each unique chain.
64+
Junction endpoints are fixed. The DP implementation is numba-compiled
65+
(`@ngjit`) for performance on large coordinate arrays.
66+
67+
5. **Reassemble rings.** Replace each ring's chain segments with their
68+
simplified versions and rebuild the ring coordinate arrays.
69+
70+
### Why this preserves topology
71+
72+
Adjacent polygons reference the same physical edge chain. Simplifying
73+
each chain once means both neighbors get identical simplified boundaries.
74+
No gaps or overlaps can arise because there is no independent simplification
75+
of shared geometry.
76+
77+
## Implementation
78+
79+
All new code lives in `xrspatial/polygonize.py` as internal functions.
80+
81+
### New functions
82+
83+
| Function | Decorator | Purpose |
84+
|---|---|---|
85+
| `_find_junctions(all_rings)` | pure Python | Scan rings, return set of junction coords |
86+
| `_split_ring_at_junctions(ring, junctions)` | pure Python | Break one ring into chains at junctions |
87+
| `_normalize_chain(chain)` | pure Python | Canonical key for deduplication |
88+
| `_douglas_peucker(coords, tolerance)` | `@ngjit` | DP simplification on Nx2 array |
89+
| `_simplify_polygons(polygon_points, tolerance)` | pure Python | Orchestrator: junctions -> split -> DP -> reassemble |
90+
91+
### Integration point
92+
93+
In `polygonize()`, after the `mapper(raster)(...)` call returns
94+
`(column, polygon_points)` and before the return-type conversion block:
95+
96+
```python
97+
if simplify_tolerance and simplify_tolerance > 0:
98+
polygon_points = _simplify_polygons(polygon_points, simplify_tolerance)
99+
```
100+
101+
### Backend behavior
102+
103+
- **NumPy / CuPy:** simplification runs on CPU-side coordinate arrays
104+
returned by boundary tracing (CuPy already transfers to CPU for tracing).
105+
- **Dask:** simplification runs after `_merge_chunk_polygons()`, on the
106+
fully merged result.
107+
- No GPU-side simplification. Boundary tracing is already CPU-bound;
108+
simplification follows the same pattern.
109+
110+
## Constraints
111+
112+
- No Visvalingam-Whyatt yet. The `simplify_method` parameter is present
113+
in the API for forward compatibility; passing `"visvalingam-whyatt"`
114+
raises `NotImplementedError`.
115+
- No streaming simplification. The full polygon set must fit in memory,
116+
same constraint as existing boundary tracing.
117+
- Minimum ring size after simplification: exterior rings keep at least 4
118+
vertices (3 unique + closing). Degenerate rings (area below tolerance
119+
squared) are dropped.
120+
121+
## Testing
122+
123+
- Correctness: known 4x4 raster, verify simplified polygon areas match
124+
originals (simplification must not change topology, only vertex count).
125+
- Vertex reduction: verify output has fewer vertices than unsimplified.
126+
- Topology: verify no gaps between adjacent polygons (union of simplified
127+
polygons equals union of originals, within floating-point tolerance).
128+
- Edge cases: tolerance=0, tolerance=None, negative tolerance, single-pixel
129+
raster, raster with one uniform value.
130+
- Backend parity: numpy and dask produce same results.
131+
- Return types: simplification works with all five return types.
132+
133+
## Out of scope
134+
135+
- Visvalingam-Whyatt implementation (future PR).
136+
- GPU-accelerated simplification.
137+
- Per-chunk simplification for dask (simplification is post-merge only).
138+
- Area-weighted simplification or other adaptive tolerance schemes.

0 commit comments

Comments
 (0)