You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* Fix OOM in visibility module for dask-backed rasters
_extract_transect was calling .compute() on the full dask array just to
read a handful of transect cells. Now uses vindex fancy indexing so only
the relevant chunks are materialized.
cumulative_viewshed was allocating a full-size np.zeros count array and
calling .values on each viewshed result, forcing materialization every
iteration. Now accumulates lazily with da.zeros and dask array addition
when the input is dask-backed.
* Tighten viewshed Tier B memory estimate and avoid needless copy
The dask Tier B memory guard underestimated peak usage at 280 bytes/pixel.
Actual peak during lexsort reaches ~360 bytes/pixel (sorted + unsorted
event_list coexist) plus 8 bytes/pixel for the computed raster. Updated
estimate to 368 bytes/pixel to prevent borderline OOM.
Also use astype(copy=False) to skip the float64 copy when data is already
float64.
* Add kernel density estimation module (#1143)
Implements kde() and line_density() for point-to-raster and
line-to-raster density surfaces. Supports Gaussian, Epanechnikov,
and quartic kernels with automatic bandwidth selection via
Silverman's rule. All four backends: numpy, cupy, dask+numpy,
dask+cupy.
* Add KDE test suite and fix CUDA kernel cutoff (#1143)
35 tests covering correctness, edge cases, kernel types,
bandwidth selection, weights, and cross-backend parity
(dask+numpy, cupy). Removes hard cutoff from GPU Gaussian
kernel to avoid box-vs-circle mismatch with CPU.
* Add KDE docs and exports (#1143)
Creates docs/source/reference/kde.rst with autosummary entries
for kde() and line_density(). Adds both functions to __init__.py
and the docs toctree.
* Add KDE user guide notebook (#1143)
Covers Gaussian/Epanechnikov/quartic kernels, bandwidth effects,
weighted KDE, and line density with synthetic earthquake and
road network data.
* Add KDE section to README feature matrix (#1143)
"source": "# Xarray-Spatial KDE: Point density, kernel shapes, and line density\n\nTurning scattered point data into a continuous surface is one of the most common spatial analysis tasks. It comes up whenever you have event locations (earthquakes, crimes, disease cases) and want a smooth density map instead of a dot plot. xrspatial's KDE tools produce density rasters from raw coordinates, with configurable kernels and bandwidth.",
7
+
"metadata": {}
8
+
},
9
+
{
10
+
"cell_type": "markdown",
11
+
"id": "ee7bb1d4",
12
+
"source": "### What you'll build\n\n1. Generate synthetic earthquake cluster data\n2. Compute a Gaussian density surface with `kde()`\n3. Compare three kernel shapes: Gaussian, Epanechnikov, and quartic\n4. See how bandwidth controls smoothness\n5. Use per-point weights for magnitude-weighted density\n6. Map road density with `line_density()`\n\n\n\n[Gaussian KDE](#Gaussian-KDE) · [Kernel comparison](#Kernel-comparison) · [Bandwidth effects](#Bandwidth-effects) · [Weighted KDE](#Weighted-KDE) · [Line density](#Line-density)",
13
+
"metadata": {}
14
+
},
15
+
{
16
+
"cell_type": "markdown",
17
+
"id": "92a81141",
18
+
"source": "Standard imports plus the two KDE functions.",
19
+
"metadata": {}
20
+
},
21
+
{
22
+
"cell_type": "code",
23
+
"id": "a1764753",
24
+
"source": "import numpy as np\nimport xarray as xr\n\nimport matplotlib.pyplot as plt\nfrom matplotlib.patches import Patch\n\nfrom xrspatial.kde import kde, line_density",
25
+
"metadata": {},
26
+
"execution_count": null,
27
+
"outputs": []
28
+
},
29
+
{
30
+
"cell_type": "markdown",
31
+
"id": "965e4d4d",
32
+
"source": "## Synthetic earthquake data\n\nThree clusters of varying tightness and size, plus some background scatter. Each point gets a random magnitude between 2 and 6 that we'll use as weights later.",
"source": "## Gaussian KDE\n\n[Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) places a smooth kernel function at each data point and sums the contributions at every output pixel. The Gaussian kernel is the most common choice because it produces infinitely smooth surfaces with no hard edges.\n\nThe plot below shows the density surface with the original points overlaid. Brighter areas have more points per unit area.",
"source": "The three clusters show up clearly. The tight cluster in the lower middle produces the sharpest peak, while the spread-out upper-right cluster has a broader, lower dome.",
75
+
"metadata": {}
76
+
},
77
+
{
78
+
"cell_type": "markdown",
79
+
"id": "d9a7148a",
80
+
"source": "## Kernel comparison\n\nThree kernel shapes are available:\n\n- **Gaussian**: bell curve, infinite range, always smooth\n- **Epanechnikov**: parabolic, drops to zero at the bandwidth radius, statistically optimal for minimizing mean integrated squared error\n- **Quartic** (biweight): similar compact support but smoother falloff than Epanechnikov\n\nThe next plot shows all three side by side on the same data and bandwidth.",
81
+
"metadata": {}
82
+
},
83
+
{
84
+
"cell_type": "code",
85
+
"id": "cff98669",
86
+
"source": "kernels = ['gaussian', 'epanechnikov', 'quartic']\nresults = {}\nfor k in kernels:\n results[k] = kde(x, y, bandwidth=0.8, kernel=k,\n x_range=(0, 10), y_range=(0, 10), width=300, height=300)\n\nfig, axes = plt.subplots(1, 3, figsize=(15, 5))\nfor ax, k in zip(axes, kernels):\n results[k].plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n ax.set_title(k.capitalize(), fontsize=13)\n ax.set_xticks([])\n ax.set_yticks([])\n ax.set_xlabel('')\n ax.set_ylabel('')\nplt.tight_layout()\nplt.show()",
87
+
"metadata": {},
88
+
"execution_count": null,
89
+
"outputs": []
90
+
},
91
+
{
92
+
"cell_type": "markdown",
93
+
"id": "69773ca7",
94
+
"source": "The Gaussian surface bleeds out to the edges because it has infinite range. Epanechnikov and quartic cut off at exactly the bandwidth radius, leaving the corners at zero. For most practical work the differences are small; pick Gaussian for smoothness, Epanechnikov if you need a hard cutoff boundary.",
95
+
"metadata": {}
96
+
},
97
+
{
98
+
"cell_type": "markdown",
99
+
"id": "d3e9f621",
100
+
"source": "## Bandwidth effects\n\nBandwidth controls how far each point's influence reaches. Too narrow and you get spiky noise; too wide and real structure gets washed out. The default `'silverman'` rule works well for unimodal data but can oversmooth multimodal clusters.\n\nBelow: four bandwidths from very narrow (0.2) to very wide (2.0).",
"source": "At bw=0.2 you can pick out individual points. At bw=2.0 the three clusters have merged into a single blob. Something around 0.5 to 0.8 captures the cluster structure without too much noise.\n\n<div class=\"alert alert-block alert-warning\">\n<b>Units matter.</b> Bandwidth is in the same units as your coordinates. If x and y are in meters, a bandwidth of 500 means 500 m. If they're in degrees, 0.01 means roughly 1 km at mid-latitudes. Always think about what physical distance makes sense for your data before picking a number.\n</div>",
115
+
"metadata": {}
116
+
},
117
+
{
118
+
"cell_type": "markdown",
119
+
"id": "2465dd33",
120
+
"source": "## Weighted KDE\n\nNot all points are equal. An earthquake of magnitude 5 releases 30x more energy than a magnitude 4. Passing per-point weights lets the density surface reflect importance, not just count.\n\nBelow: unweighted (left) vs magnitude-weighted (right). The weighted surface shifts toward wherever the larger events happened to fall.",
"source": "## Line density\n\n`line_density()` works the same way but for linear features. Each line segment is uniformly sampled and the samples are convolved with the kernel. The result shows where lines are concentrated, useful for things like road network analysis or fault-line mapping.\n\nInput is four arrays: start x, start y, end x, end y for each segment. The plot below shows a synthetic road grid.",
135
+
"metadata": {}
136
+
},
137
+
{
138
+
"cell_type": "code",
139
+
"id": "eab23452",
140
+
"source": "# Synthetic road segments: a loose grid with some randomness\nroad_x1, road_y1, road_x2, road_y2 = [], [], [], []\n\n# Horizontal roads\nfor yy in np.linspace(1, 9, 6):\n offset = rng.normal(0, 0.2)\n road_x1.append(0.5)\n road_y1.append(yy + offset)\n road_x2.append(9.5)\n road_y2.append(yy + rng.normal(0, 0.3))\n\n# Vertical roads (fewer in the east = lower density there)\nfor xx in np.linspace(1, 5, 6):\n road_x1.append(xx + rng.normal(0, 0.1))\n road_y1.append(0.5)\n road_x2.append(xx + rng.normal(0, 0.1))\n road_y2.append(9.5)\n\nfor xx in np.linspace(6, 9, 2):\n road_x1.append(xx)\n road_y1.append(0.5)\n road_x2.append(xx)\n road_y2.append(9.5)\n\nroad_x1 = np.array(road_x1)\nroad_y1 = np.array(road_y1)\nroad_x2 = np.array(road_x2)\nroad_y2 = np.array(road_y2)\n\nld = line_density(road_x1, road_y1, road_x2, road_y2,\n bandwidth=0.8,\n x_range=(0, 10), y_range=(0, 10),\n width=300, height=300)\n\nfig, ax = plt.subplots(figsize=(10, 7.5))\nld.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n# Draw the actual road segments on top\nfor i in range(len(road_x1)):\n ax.plot([road_x1[i], road_x2[i]], [road_y1[i], road_y2[i]],\n color='white', alpha=0.4, linewidth=0.8)\nax.set_title('')\nax.set_xlabel('')\nax.set_ylabel('')\nax.set_xticks([])\nax.set_yticks([])\nax.legend(handles=[Patch(facecolor='yellow', alpha=0.78, label='High road density'),\n Patch(facecolor='white', alpha=0.4, label='Road segments')],\n loc='lower right', fontsize=11, framealpha=0.9)\nplt.show()",
141
+
"metadata": {},
142
+
"execution_count": null,
143
+
"outputs": []
144
+
},
145
+
{
146
+
"cell_type": "markdown",
147
+
"id": "43046e13",
148
+
"source": "The denser grid on the west side shows up clearly as a brighter band. Intersections where horizontal and vertical roads cross produce the brightest hotspots.",
149
+
"metadata": {}
150
+
},
151
+
{
152
+
"cell_type": "markdown",
153
+
"id": "d1036afc",
154
+
"source": "<div class=\"alert alert-block alert-info\">\n<b>Template grids.</b> Both <code>kde()</code> and <code>line_density()</code> accept a <code>template</code> DataArray instead of <code>x_range</code>/<code>y_range</code>/<code>width</code>/<code>height</code>. The output will match the template's shape, extent, and coordinates. If the template is dask-backed, the output is computed lazily in chunks.\n</div>",
155
+
"metadata": {}
156
+
},
157
+
{
158
+
"cell_type": "markdown",
159
+
"id": "5dd32bc7",
160
+
"source": "### References\n\n- [Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation), Wikipedia\n- [Silverman, B.W. (1986). *Density Estimation for Statistics and Data Analysis*.](https://ned.ipac.caltech.edu/level5/March02/Silverman/paper.pdf) Chapman & Hall.\n- [Epanechnikov, V.A. (1969). Non-parametric estimation of a multivariate probability density.](https://doi.org/10.1137/1114019) *Theory of Probability & Its Applications*, 14(1), 153-158.",
0 commit comments