|
| 1 | +# Polygonize Geometry Simplification |
| 2 | + |
| 3 | +**Issue:** #1151 |
| 4 | +**Date:** 2026-04-06 |
| 5 | + |
| 6 | +## Problem |
| 7 | + |
| 8 | +`polygonize()` produces exact pixel-boundary polygons. On high-resolution |
| 9 | +rasters this creates dense geometries with thousands of vertices per polygon, |
| 10 | +making output slow to render, large on disk, and unwieldy for spatial joins. |
| 11 | + |
| 12 | +The current workaround chains GDAL's `gdal_polygonize.py` with |
| 13 | +`ogr2ogr -simplify`, adding an external dependency and intermediate file. |
| 14 | + |
| 15 | +## API |
| 16 | + |
| 17 | +Two new parameters on `polygonize()`: |
| 18 | + |
| 19 | +```python |
| 20 | +def polygonize( |
| 21 | + raster, mask=None, connectivity=4, transform=None, |
| 22 | + column_name="DN", return_type="numpy", |
| 23 | + simplify_tolerance=None, # float, coordinate units |
| 24 | + simplify_method="douglas-peucker", # str |
| 25 | +): |
| 26 | +``` |
| 27 | + |
| 28 | +- `simplify_tolerance=None` or `0.0`: no simplification (backward compatible). |
| 29 | +- `simplify_tolerance > 0`: apply Douglas-Peucker with given tolerance. |
| 30 | +- `simplify_method="visvalingam-whyatt"`: raises `NotImplementedError`. |
| 31 | +- Negative tolerance raises `ValueError`. |
| 32 | + |
| 33 | +## Algorithm: Shared-Edge Douglas-Peucker |
| 34 | + |
| 35 | +Topology-preserving simplification via shared-edge decomposition, same |
| 36 | +approach used by TopoJSON and GRASS `v.generalize`. |
| 37 | + |
| 38 | +### Pipeline position |
| 39 | + |
| 40 | +Simplification runs between boundary tracing and output conversion: |
| 41 | + |
| 42 | +``` |
| 43 | +CCL -> boundary tracing -> [simplification] -> output conversion |
| 44 | +``` |
| 45 | + |
| 46 | +For dask backends, simplification runs after chunk merging. |
| 47 | + |
| 48 | +### Steps |
| 49 | + |
| 50 | +1. **Find junctions.** Scan all ring vertices. A junction is any coordinate |
| 51 | + that appears as a vertex in 3 or more distinct rings. These points are |
| 52 | + pinned and never removed by simplification. |
| 53 | + |
| 54 | +2. **Split rings into edge chains.** Walk each ring and split at junction |
| 55 | + vertices. Each resulting chain connects two junctions (or forms a closed |
| 56 | + loop when the ring contains no junctions). Each chain is shared by at |
| 57 | + most 2 adjacent polygons. |
| 58 | + |
| 59 | +3. **Deduplicate chains.** Normalize each chain by its sorted endpoint pair |
| 60 | + so shared edges between adjacent polygons are identified and simplified |
| 61 | + only once. |
| 62 | + |
| 63 | +4. **Simplify each chain.** Apply Douglas-Peucker to each unique chain. |
| 64 | + Junction endpoints are fixed. The DP implementation is numba-compiled |
| 65 | + (`@ngjit`) for performance on large coordinate arrays. |
| 66 | + |
| 67 | +5. **Reassemble rings.** Replace each ring's chain segments with their |
| 68 | + simplified versions and rebuild the ring coordinate arrays. |
| 69 | + |
| 70 | +### Why this preserves topology |
| 71 | + |
| 72 | +Adjacent polygons reference the same physical edge chain. Simplifying |
| 73 | +each chain once means both neighbors get identical simplified boundaries. |
| 74 | +No gaps or overlaps can arise because there is no independent simplification |
| 75 | +of shared geometry. |
| 76 | + |
| 77 | +## Implementation |
| 78 | + |
| 79 | +All new code lives in `xrspatial/polygonize.py` as internal functions. |
| 80 | + |
| 81 | +### New functions |
| 82 | + |
| 83 | +| Function | Decorator | Purpose | |
| 84 | +|---|---|---| |
| 85 | +| `_find_junctions(all_rings)` | pure Python | Scan rings, return set of junction coords | |
| 86 | +| `_split_ring_at_junctions(ring, junctions)` | pure Python | Break one ring into chains at junctions | |
| 87 | +| `_normalize_chain(chain)` | pure Python | Canonical key for deduplication | |
| 88 | +| `_douglas_peucker(coords, tolerance)` | `@ngjit` | DP simplification on Nx2 array | |
| 89 | +| `_simplify_polygons(polygon_points, tolerance)` | pure Python | Orchestrator: junctions -> split -> DP -> reassemble | |
| 90 | + |
| 91 | +### Integration point |
| 92 | + |
| 93 | +In `polygonize()`, after the `mapper(raster)(...)` call returns |
| 94 | +`(column, polygon_points)` and before the return-type conversion block: |
| 95 | + |
| 96 | +```python |
| 97 | +if simplify_tolerance and simplify_tolerance > 0: |
| 98 | + polygon_points = _simplify_polygons(polygon_points, simplify_tolerance) |
| 99 | +``` |
| 100 | + |
| 101 | +### Backend behavior |
| 102 | + |
| 103 | +- **NumPy / CuPy:** simplification runs on CPU-side coordinate arrays |
| 104 | + returned by boundary tracing (CuPy already transfers to CPU for tracing). |
| 105 | +- **Dask:** simplification runs after `_merge_chunk_polygons()`, on the |
| 106 | + fully merged result. |
| 107 | +- No GPU-side simplification. Boundary tracing is already CPU-bound; |
| 108 | + simplification follows the same pattern. |
| 109 | + |
| 110 | +## Constraints |
| 111 | + |
| 112 | +- No Visvalingam-Whyatt yet. The `simplify_method` parameter is present |
| 113 | + in the API for forward compatibility; passing `"visvalingam-whyatt"` |
| 114 | + raises `NotImplementedError`. |
| 115 | +- No streaming simplification. The full polygon set must fit in memory, |
| 116 | + same constraint as existing boundary tracing. |
| 117 | +- Minimum ring size after simplification: exterior rings keep at least 4 |
| 118 | + vertices (3 unique + closing). Degenerate rings (area below tolerance |
| 119 | + squared) are dropped. |
| 120 | + |
| 121 | +## Testing |
| 122 | + |
| 123 | +- Correctness: known 4x4 raster, verify simplified polygon areas match |
| 124 | + originals (simplification must not change topology, only vertex count). |
| 125 | +- Vertex reduction: verify output has fewer vertices than unsimplified. |
| 126 | +- Topology: verify no gaps between adjacent polygons (union of simplified |
| 127 | + polygons equals union of originals, within floating-point tolerance). |
| 128 | +- Edge cases: tolerance=0, tolerance=None, negative tolerance, single-pixel |
| 129 | + raster, raster with one uniform value. |
| 130 | +- Backend parity: numpy and dask produce same results. |
| 131 | +- Return types: simplification works with all five return types. |
| 132 | + |
| 133 | +## Out of scope |
| 134 | + |
| 135 | +- Visvalingam-Whyatt implementation (future PR). |
| 136 | +- GPU-accelerated simplification. |
| 137 | +- Per-chunk simplification for dask (simplification is post-merge only). |
| 138 | +- Area-weighted simplification or other adaptive tolerance schemes. |
0 commit comments