Merged
3 changes: 2 additions & 1 deletion README.md
@@ -223,10 +223,11 @@ ds.xrs.open_geotiff('large_dem.tif') # read windowed to Dataset
**Consistency:** 100% pixel-exact match vs rioxarray on all tested files (Landsat 8, Copernicus DEM, USGS 1-arc-second, USGS 1-meter).

-----------
### **Reproject / Merge**
### **Reproject / Merge / Resample**

| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
|:----------:|:------------|:------:|:----------------------:|:--------------------:|:-------------------:|:------:|
| [Resample](xrspatial/resample.py) | Changes raster resolution (cell size) without reprojection. Nearest, bilinear, cubic, average, mode, min, max, median methods | Standard (interpolation / block aggregation) | ✅️ | ✅️ | ✅️ | ✅️ |
| [Reproject](xrspatial/reproject/__init__.py) | Reprojects a raster to a new CRS with Numba JIT / CUDA coordinate transforms and resampling. Supports vertical datums (EGM96, EGM2008) and horizontal datum shifts (NAD27, OSGB36, etc.) | Standard (inverse mapping) | ✅️ | ✅️ | ✅️ | ✅️ |
| [Merge](xrspatial/reproject/__init__.py) | Merges multiple rasters into a single mosaic with configurable overlap strategy | Standard (mosaic) | ✅️ | ✅️ | 🔄 | 🔄 |

1 change: 1 addition & 0 deletions docs/source/reference/index.rst
@@ -22,6 +22,7 @@ Reference
multispectral
pathfinding
proximity
resample
surface
terrain_metrics
utilities
14 changes: 14 additions & 0 deletions docs/source/reference/resample.rst
@@ -0,0 +1,14 @@
.. _reference.resample:

********
Resample
********

Change raster resolution (cell size) without changing its CRS.

Resample
========
.. autosummary::
:toctree: _autosummary

xrspatial.resample.resample
138 changes: 138 additions & 0 deletions docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md
@@ -0,0 +1,138 @@
# Polygonize Geometry Simplification

**Issue:** #1151
**Date:** 2026-04-06

## Problem

`polygonize()` produces exact pixel-boundary polygons. On high-resolution
rasters this creates dense geometries with thousands of vertices per polygon,
making output slow to render, large on disk, and unwieldy for spatial joins.

The current workaround chains GDAL's `gdal_polygonize.py` with
`ogr2ogr -simplify`, adding an external dependency and intermediate file.

## API

Two new parameters on `polygonize()`:

```python
def polygonize(
raster, mask=None, connectivity=4, transform=None,
column_name="DN", return_type="numpy",
simplify_tolerance=None, # float, coordinate units
simplify_method="douglas-peucker", # str
):
```

- `simplify_tolerance=None` or `0.0`: no simplification (backward compatible).
- `simplify_tolerance > 0`: apply Douglas-Peucker with given tolerance.
- `simplify_method="visvalingam-whyatt"`: raises `NotImplementedError`.
- Negative tolerance raises `ValueError`.

## Algorithm: Shared-Edge Douglas-Peucker

Topology-preserving simplification via shared-edge decomposition, the same
approach used by TopoJSON and GRASS `v.generalize`.

### Pipeline position

Simplification runs between boundary tracing and output conversion:

```
CCL -> boundary tracing -> [simplification] -> output conversion
```

For dask backends, simplification runs after chunk merging.

### Steps

1. **Find junctions.** Scan all ring vertices. A junction is any coordinate
that appears as a vertex in 3 or more distinct rings. These points are
pinned and never removed by simplification.

2. **Split rings into edge chains.** Walk each ring and split at junction
vertices. Each resulting chain connects two junctions (or forms a closed
loop when the ring contains no junctions). Each chain is shared by at
most 2 adjacent polygons.

3. **Deduplicate chains.** Normalize each chain by its sorted endpoint pair
so shared edges between adjacent polygons are identified and simplified
only once.

4. **Simplify each chain.** Apply Douglas-Peucker to each unique chain.
Junction endpoints are fixed. The DP implementation is numba-compiled
(`@ngjit`) for performance on large coordinate arrays.

5. **Reassemble rings.** Replace each ring's chain segments with their
simplified versions and rebuild the ring coordinate arrays.

### Why this preserves topology

Adjacent polygons reference the same physical edge chain. Simplifying
each chain once means both neighbors get identical simplified boundaries.
No gaps or overlaps can arise because there is no independent simplification
of shared geometry.

## Implementation

All new code lives in `xrspatial/polygonize.py` as internal functions.

### New functions

| Function | Decorator | Purpose |
|---|---|---|
| `_find_junctions(all_rings)` | pure Python | Scan rings, return set of junction coords |
| `_split_ring_at_junctions(ring, junctions)` | pure Python | Break one ring into chains at junctions |
| `_normalize_chain(chain)` | pure Python | Canonical key for deduplication |
| `_douglas_peucker(coords, tolerance)` | `@ngjit` | DP simplification on Nx2 array |
| `_simplify_polygons(polygon_points, tolerance)` | pure Python | Orchestrator: junctions -> split -> DP -> reassemble |

### Integration point

In `polygonize()`, after the `mapper(raster)(...)` call returns
`(column, polygon_points)` and before the return-type conversion block:

```python
if simplify_tolerance and simplify_tolerance > 0:
polygon_points = _simplify_polygons(polygon_points, simplify_tolerance)
```

### Backend behavior

- **NumPy / CuPy:** simplification runs on CPU-side coordinate arrays
returned by boundary tracing (CuPy already transfers to CPU for tracing).
- **Dask:** simplification runs after `_merge_chunk_polygons()`, on the
fully merged result.
- No GPU-side simplification. Boundary tracing is already CPU-bound;
simplification follows the same pattern.

## Constraints

- No Visvalingam-Whyatt yet. The `simplify_method` parameter is present
in the API for forward compatibility; passing `"visvalingam-whyatt"`
raises `NotImplementedError`.
- No streaming simplification. The full polygon set must fit in memory,
same constraint as existing boundary tracing.
- Minimum ring size after simplification: exterior rings keep at least 4
vertices (3 unique + closing). Degenerate rings (area below tolerance
squared) are dropped.

## Testing

- Correctness: known 4x4 raster, verify simplified polygon areas match the
  originals within tolerance (simplification must preserve topology, changing
  only vertex count).
- Vertex reduction: verify output has fewer vertices than unsimplified.
- Topology: verify no gaps between adjacent polygons (union of simplified
polygons equals union of originals, within floating-point tolerance).
- Edge cases: tolerance=0, tolerance=None, negative tolerance, single-pixel
raster, raster with one uniform value.
- Backend parity: numpy and dask produce same results.
- Return types: simplification works with all five return types.

## Out of scope

- Visvalingam-Whyatt implementation (future PR).
- GPU-accelerated simplification.
- Per-chunk simplification for dask (simplification is post-merge only).
- Area-weighted simplification or other adaptive tolerance schemes.
210 changes: 210 additions & 0 deletions examples/user_guide/50_Resample.ipynb
@@ -0,0 +1,210 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Raster Resampling\n",
"\n",
"The `resample` function changes a raster's resolution (cell size) without\n",
"changing its CRS. This is the operation you'd reach for when you need to\n",
"match two rasters to a common grid or reduce a raster's memory footprint\n",
"before analysis.\n",
"\n",
"**Methods**:\n",
"\n",
"| Method | Direction | Best for |\n",
"|--------|-----------|----------|\n",
"| `nearest` | up/down | Categorical data, fast preview |\n",
"| `bilinear` | up/down | Smooth continuous surfaces |\n",
"| `cubic` | up/down | High-quality continuous surfaces |\n",
"| `average` | down only | Aggregating high-res to low-res |\n",
"| `min`, `max` | down only | Extremes within each output cell |\n",
"| `median` | down only | Robust centre, ignores outliers |\n",
"| `mode` | down only | Majority class in categorical rasters |"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import xarray as xr\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from xrspatial import resample\n",
"from xrspatial.terrain import generate_terrain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate synthetic terrain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dem = generate_terrain(width=200, height=200)\n",
"# Assign a regular coordinate grid\n",
"dem = dem.assign_coords(\n",
" y=np.linspace(100, 0, dem.sizes['y']),\n",
" x=np.linspace(0, 100, dem.sizes['x']),\n",
")\n",
"dem.attrs['res'] = (0.5, 0.5)\n",
"\n",
"fig, ax = plt.subplots(figsize=(6, 5))\n",
"dem.plot(ax=ax, cmap='terrain')\n",
"ax.set_title(f'Original DEM ({dem.shape[0]}x{dem.shape[1]}, res={dem.attrs[\"res\"][0]:.1f}m)')\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Downsample with `scale_factor`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"down = resample(dem, scale_factor=0.25, method='bilinear')\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n",
"dem.plot(ax=axes[0], cmap='terrain')\n",
"axes[0].set_title(f'Original ({dem.shape[0]}x{dem.shape[1]})')\n",
"down.plot(ax=axes[1], cmap='terrain')\n",
"axes[1].set_title(f'Downsampled 4x ({down.shape[0]}x{down.shape[1]})')\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upsample with `target_resolution`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"up = resample(down, target_resolution=0.5, method='cubic')\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n",
"down.plot(ax=axes[0], cmap='terrain')\n",
"axes[0].set_title(f'Coarse ({down.shape[0]}x{down.shape[1]})')\n",
"up.plot(ax=axes[1], cmap='terrain')\n",
"axes[1].set_title(f'Upsampled to 0.5m ({up.shape[0]}x{up.shape[1]})')\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compare resampling methods"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"methods = ['nearest', 'bilinear', 'cubic', 'average']\n",
"fig, axes = plt.subplots(1, 4, figsize=(16, 4))\n",
"\n",
"for ax, method in zip(axes, methods):\n",
" out = resample(dem, scale_factor=0.1, method=method)\n",
" out.plot(ax=ax, cmap='terrain', add_colorbar=False)\n",
" ax.set_title(method)\n",
" ax.set_aspect('equal')\n",
"\n",
"plt.suptitle('Downsample 10x with different methods', y=1.02)\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Categorical raster with `mode`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from xrspatial import equal_interval\n",
"\n",
"# Classify elevation into 5 zones\n",
"classes = equal_interval(dem, k=5)\n",
"classes.attrs = dem.attrs.copy()\n",
"classes = classes.assign_coords(dem.coords)\n",
"\n",
"# Downsample: mode preserves class boundaries\n",
"classes_down = resample(classes.astype('float32'),\n",
" scale_factor=0.2, method='mode')\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n",
"classes.plot(ax=axes[0], cmap='Set2')\n",
"axes[0].set_title(f'Classes ({classes.shape[0]}x{classes.shape[1]})')\n",
"classes_down.plot(ax=axes[1], cmap='Set2')\n",
"axes[1].set_title(f'Mode downsample ({classes_down.shape[0]}x{classes_down.shape[1]})')\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Works with Dask"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import dask.array as da\n",
"\n",
"dask_dem = dem.copy()\n",
"dask_dem.data = da.from_array(dem.values, chunks=(100, 100))\n",
"\n",
"result = resample(dask_dem, scale_factor=0.5, method='bilinear')\n",
"print(f'Input: {dask_dem.shape} (dask, chunks={dask_dem.data.chunksize})')\n",
"print(f'Output: {result.shape} (dask, chunks={result.data.chunksize})')\n",
"print(f'Computed shape: {result.compute().shape}')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.14.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
1 change: 1 addition & 0 deletions xrspatial/__init__.py
@@ -93,6 +93,7 @@
from xrspatial.preview import preview # noqa
from xrspatial.proximity import allocation # noqa
from xrspatial.rasterize import rasterize # noqa
from xrspatial.resample import resample # noqa
from xrspatial.proximity import direction # noqa
from xrspatial.proximity import euclidean_distance # noqa
from xrspatial.proximity import great_circle_distance # noqa