|
25 | 25 | "cell_type": "code", |
26 | 26 | "execution_count": null, |
27 | 27 | "id": "3a06tk9sxuh", |
28 | | - "metadata": {}, |
| 28 | + "metadata": { |
| 29 | + "execution": { |
| 30 | + "iopub.execute_input": "2026-03-09T15:30:49.123715Z", |
| 31 | + "iopub.status.busy": "2026-03-09T15:30:49.123629Z", |
| 32 | + "iopub.status.idle": "2026-03-09T15:30:50.272830Z", |
| 33 | + "shell.execute_reply": "2026-03-09T15:30:50.272259Z" |
| 34 | + } |
| 35 | + }, |
29 | 36 | "outputs": [], |
30 | 37 | "source": [ |
31 | 38 | "import numpy as np\n", |
|
48 | 55 | "cell_type": "code", |
49 | 56 | "execution_count": null, |
50 | 57 | "id": "mqtgqmsnvlp", |
51 | | - "metadata": {}, |
| 58 | + "metadata": { |
| 59 | + "execution": { |
| 60 | + "iopub.execute_input": "2026-03-09T15:30:50.274712Z", |
| 61 | + "iopub.status.busy": "2026-03-09T15:30:50.274450Z", |
| 62 | + "iopub.status.idle": "2026-03-09T15:30:50.338586Z", |
| 63 | + "shell.execute_reply": "2026-03-09T15:30:50.338001Z" |
| 64 | + } |
| 65 | + }, |
52 | 66 | "outputs": [], |
53 | 67 | "source": [ |
54 | 68 | "# Build a 100x100 raster with four texture quadrants\n", |
|
93 | 107 | "cell_type": "code", |
94 | 108 | "execution_count": null, |
95 | 109 | "id": "2fgm4xz3uwj", |
96 | | - "metadata": {}, |
| 110 | + "metadata": { |
| 111 | + "execution": { |
| 112 | + "iopub.execute_input": "2026-03-09T15:30:50.339809Z", |
| 113 | + "iopub.status.busy": "2026-03-09T15:30:50.339694Z", |
| 114 | + "iopub.status.idle": "2026-03-09T15:30:51.046768Z", |
| 115 | + "shell.execute_reply": "2026-03-09T15:30:51.046259Z" |
| 116 | + } |
| 117 | + }, |
97 | 118 | "outputs": [], |
98 | 119 | "source": [ |
99 | 120 | "contrast = glcm_texture(agg, metric='contrast', window_size=7, levels=64)\n", |
|
122 | 143 | "cell_type": "code", |
123 | 144 | "execution_count": null, |
124 | 145 | "id": "7ah4bm6v5ut", |
125 | | - "metadata": {}, |
| 146 | + "metadata": { |
| 147 | + "execution": { |
| 148 | + "iopub.execute_input": "2026-03-09T15:30:51.047924Z", |
| 149 | + "iopub.status.busy": "2026-03-09T15:30:51.047834Z", |
| 150 | + "iopub.status.idle": "2026-03-09T15:30:51.568614Z", |
| 151 | + "shell.execute_reply": "2026-03-09T15:30:51.567915Z" |
| 152 | + } |
| 153 | + }, |
126 | 154 | "outputs": [], |
127 | 155 | "source": [ |
128 | 156 | "metrics = ['contrast', 'dissimilarity', 'homogeneity', 'energy', 'correlation', 'entropy']\n", |
|
160 | 188 | "cell_type": "code", |
161 | 189 | "execution_count": null, |
162 | 190 | "id": "nkzw9wmyxio", |
163 | | - "metadata": {}, |
| 191 | + "metadata": { |
| 192 | + "execution": { |
| 193 | + "iopub.execute_input": "2026-03-09T15:30:51.570036Z", |
| 194 | + "iopub.status.busy": "2026-03-09T15:30:51.569932Z", |
| 195 | + "iopub.status.idle": "2026-03-09T15:30:51.920501Z", |
| 196 | + "shell.execute_reply": "2026-03-09T15:30:51.919825Z" |
| 197 | + } |
| 198 | + }, |
164 | 199 | "outputs": [], |
165 | 200 | "source": [ |
166 | 201 | "fig, axes = plt.subplots(1, 4, figsize=(16, 4))\n", |
|
188 | 223 | "cell_type": "code", |
189 | 224 | "execution_count": null, |
190 | 225 | "id": "keurc3dmugs", |
191 | | - "metadata": {}, |
| 226 | + "metadata": { |
| 227 | + "execution": { |
| 228 | + "iopub.execute_input": "2026-03-09T15:30:51.921620Z", |
| 229 | + "iopub.status.busy": "2026-03-09T15:30:51.921522Z", |
| 230 | + "iopub.status.idle": "2026-03-09T15:30:52.045363Z", |
| 231 | + "shell.execute_reply": "2026-03-09T15:30:52.044567Z" |
| 232 | + } |
| 233 | + }, |
192 | 234 | "outputs": [], |
193 | 235 | "source": [ |
194 | 236 | "import dask.array as da\n", |
|
228 | 270 | "\n", |
229 | 271 | "Lower `levels` values run faster but lose gray-level resolution. For most remote sensing work, 32-64 levels is a good balance." |
230 | 272 | ] |
| 273 | + }, |
| 274 | + { |
| 275 | + "cell_type": "markdown", |
| 276 | + "id": "r8ezsmezbt9", |
| 277 | + "source": [ |
| 278 | + "## Ok, how do we apply this to GIS?\n", |
| 279 | + "\n", |
| 280 | + "The synthetic quadrants above make the math obvious, but the real payoff is on actual imagery. Water and land have very different spatial textures in satellite data: water is smooth and uniform, land is rough and varied. GLCM features capture that difference even when raw pixel brightness alone is ambiguous (shadows, mudflats, dark pavement).\n", |
| 281 | + "\n", |
| 282 | + "Below we'll grab a Sentinel-2 NIR band crop of the San Francisco Bay coastline, compute GLCM texture features, and use KMeans clustering to classify water vs. land \u2014 no training labels required." |
| 283 | + ], |
| 284 | + "metadata": {} |
| 285 | + }, |
| 286 | + { |
| 287 | + "cell_type": "markdown", |
| 288 | + "id": "ec79xdunce9", |
| 289 | + "source": [ |
| 290 | + "### Step 1 \u2014 Download a Sentinel-2 NIR band\n", |
| 291 | + "\n", |
| 292 | + "We read a 500 x 500 pixel window (5 km x 5 km at 10 m resolution) straight from a\n", |
| 293 | + "Cloud-Optimized GeoTIFF hosted on AWS. The scene is\n", |
| 294 | + "[S2B_10SEG_20230921](https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/10/S/EG/2023/9/S2B_10SEG_20230921_0_L2A/),\n", |
| 295 | + "a near-cloudless capture of the Bay Area from September 2023.\n", |
| 296 | + "\n", |
| 297 | + "If the remote file is unreachable, a synthetic coastal raster is used instead\n", |
| 298 | + "so the rest of the notebook still runs." |
| 299 | + ], |
| 300 | + "metadata": {} |
| 301 | + }, |
| 302 | + { |
| 303 | + "cell_type": "code", |
| 304 | + "id": "2hogpjtllw7", |
| 305 | + "source": [ |
| 306 | + "import os\n", |
| 307 | + "import rioxarray\n", |
| 308 | + "\n", |
| 309 | + "os.environ['AWS_NO_SIGN_REQUEST'] = 'YES'\n", |
| 310 | + "os.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'EMPTY_DIR'\n", |
| 311 | + "\n", |
| 312 | + "COG_URL = (\n", |
| 313 | + " 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/'\n", |
| 314 | + " 'sentinel-s2-l2a-cogs/10/S/EG/2023/9/'\n", |
| 315 | + " 'S2B_10SEG_20230921_0_L2A/B08.tif'\n", |
| 316 | + ")\n", |
| 317 | + "\n", |
| 318 | + "try:\n", |
| 319 | + " nir_da = rioxarray.open_rasterio(COG_URL).isel(band=0, y=slice(2100, 2600), x=slice(5300, 5800))\n", |
| 320 | + " nir = nir_da.load().values.astype(np.float64)\n", |
| 321 | + " print(f'Downloaded NIR band: {nir.shape}, range {nir.min():.0f}\u2013{nir.max():.0f}')\n", |
| 322 | + "except Exception as exc:\n", |
| 323 | + " print(f'Remote read failed ({exc}), using synthetic fallback')\n", |
| 324 | + " rng_sat = np.random.default_rng(99)\n", |
| 325 | + " nir = np.zeros((500, 500), dtype=np.float64)\n", |
| 326 | + " # Water: low reflectance, smooth\n", |
| 327 | + " nir[:, 250:] = rng_sat.normal(80, 10, (500, 250)).clip(20, 200)\n", |
| 328 | + " # Land: high reflectance, rough\n", |
| 329 | + " nir[:, :250] = rng_sat.normal(1800, 400, (500, 250)).clip(300, 4000)\n", |
| 330 | + "\n", |
| 331 | + "satellite = xr.DataArray(nir, dims=['y', 'x'])\n", |
| 332 | + "\n", |
| 333 | + "fig, ax = plt.subplots(figsize=(6, 6))\n", |
| 334 | + "ax.imshow(nir, cmap='gray', vmin=0, vmax=np.percentile(nir, 98))\n", |
| 335 | + "ax.set_title('Sentinel-2 NIR \u2014 SF Bay coastline')\n", |
| 336 | + "plt.tight_layout()\n", |
| 337 | + "plt.show()" |
| 338 | + ], |
| 339 | + "metadata": { |
| 340 | + "execution": { |
| 341 | + "iopub.execute_input": "2026-03-09T15:30:52.046807Z", |
| 342 | + "iopub.status.busy": "2026-03-09T15:30:52.046700Z", |
| 343 | + "iopub.status.idle": "2026-03-09T15:30:53.261434Z", |
| 344 | + "shell.execute_reply": "2026-03-09T15:30:53.260895Z" |
| 345 | + } |
| 346 | + }, |
| 347 | + "execution_count": null, |
| 348 | + "outputs": [] |
| 349 | + }, |
| 350 | + { |
| 351 | + "cell_type": "markdown", |
| 352 | + "id": "joxz7n8olpc", |
| 353 | + "source": [ |
| 354 | + "### Step 2 \u2014 Compute GLCM texture features\n", |
| 355 | + "\n", |
| 356 | + "We pick four metrics that tend to separate water (uniform, high energy, high homogeneity) from land (rough, high contrast):\n", |
| 357 | + "\n", |
| 358 | + "| Metric | Water (smooth) | Land (rough) |\n", |
| 359 | + "|---|---|---|\n", |
| 360 | + "| contrast | low | high |\n", |
| 361 | + "| homogeneity | high | low |\n", |
| 362 | + "| energy | high | low |\n", |
| 363 | + "| correlation | high (uniform) | lower (varied) |\n", |
| 364 | + "\n", |
| 365 | + "An 11 x 11 window gives enough spatial context at 10 m resolution (110 m footprint)." |
| 366 | + ], |
| 367 | + "metadata": {} |
| 368 | + }, |
| 369 | + { |
| 370 | + "cell_type": "code", |
| 371 | + "id": "ytdru4ssilp", |
| 372 | + "source": [ |
| 373 | + "sat_metrics = ['contrast', 'homogeneity', 'energy', 'correlation']\n", |
| 374 | + "sat_textures = glcm_texture(satellite, metric=sat_metrics, window_size=11, levels=64)\n", |
| 375 | + "\n", |
| 376 | + "fig, axes = plt.subplots(1, 4, figsize=(18, 4))\n", |
| 377 | + "for ax, name in zip(axes, sat_metrics):\n", |
| 378 | + " vals = sat_textures.sel(metric=name).values\n", |
| 379 | + " im = ax.imshow(vals, cmap='viridis')\n", |
| 380 | + " ax.set_title(name.capitalize())\n", |
| 381 | + " plt.colorbar(im, ax=ax, shrink=0.7)\n", |
| 382 | + "plt.suptitle('GLCM features on satellite NIR', fontsize=13, y=1.02)\n", |
| 383 | + "plt.tight_layout()\n", |
| 384 | + "plt.show()" |
| 385 | + ], |
| 386 | + "metadata": { |
| 387 | + "execution": { |
| 388 | + "iopub.execute_input": "2026-03-09T15:30:53.262766Z", |
| 389 | + "iopub.status.busy": "2026-03-09T15:30:53.262453Z", |
| 390 | + "iopub.status.idle": "2026-03-09T15:30:58.515207Z", |
| 391 | + "shell.execute_reply": "2026-03-09T15:30:58.514628Z" |
| 392 | + } |
| 393 | + }, |
| 394 | + "execution_count": null, |
| 395 | + "outputs": [] |
| 396 | + }, |
| 397 | + { |
| 398 | + "cell_type": "markdown", |
| 399 | + "id": "xya2v71uhb", |
| 400 | + "source": [ |
| 401 | + "### Step 3 \u2014 Classify water vs. land with KMeans\n", |
| 402 | + "\n", |
| 403 | + "We stack the four texture layers into a feature vector per pixel, normalize them,\n", |
| 404 | + "and let KMeans find two clusters. No training labels needed \u2014 the texture\n", |
| 405 | + "difference between water and land is large enough that unsupervised clustering\n", |
| 406 | + "picks it up cleanly.\n", |
| 407 | + "\n", |
| 408 | + "After clustering, we check which cluster has lower mean NIR reflectance and\n", |
| 409 | + "label that one \"water.\"" |
| 410 | + ], |
| 411 | + "metadata": {} |
| 412 | + }, |
| 413 | + { |
| 414 | + "cell_type": "code", |
| 415 | + "id": "keramnxr509", |
| 416 | + "source": [ |
| 417 | + "from sklearn.cluster import KMeans\n", |
| 418 | + "from sklearn.preprocessing import StandardScaler\n", |
| 419 | + "\n", |
| 420 | + "# Stack texture features: (H, W, 4)\n", |
| 421 | + "feature_stack = np.stack(\n", |
| 422 | + " [sat_textures.sel(metric=m).values for m in sat_metrics], axis=-1\n", |
| 423 | + ")\n", |
| 424 | + "h, w, n_feat = feature_stack.shape\n", |
| 425 | + "\n", |
| 426 | + "# Mask out NaN edges left by the GLCM window\n", |
| 427 | + "valid = ~np.any(np.isnan(feature_stack), axis=-1)\n", |
| 428 | + "X = feature_stack[valid]\n", |
| 429 | + "\n", |
| 430 | + "# Normalize and cluster\n", |
| 431 | + "X_scaled = StandardScaler().fit_transform(X)\n", |
| 432 | + "labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X_scaled)\n", |
| 433 | + "\n", |
| 434 | + "# Build the classification raster\n", |
| 435 | + "class_map = np.full((h, w), np.nan)\n", |
| 436 | + "class_map[valid] = labels\n", |
| 437 | + "\n", |
| 438 | + "# Assign \"water\" to the cluster with lower NIR reflectance\n", |
| 439 | + "cluster_nir = [np.nanmean(nir[class_map == c]) for c in range(2)]\n", |
| 440 | + "water_id = int(np.argmin(cluster_nir))\n", |
| 441 | + "\n", |
| 442 | + "water_mask = np.where(valid, class_map == water_id, np.nan)\n", |
| 443 | + "print(f'Water cluster: {water_id} (mean NIR {cluster_nir[water_id]:.0f})')\n", |
| 444 | + "print(f'Land cluster: {1 - water_id} (mean NIR {cluster_nir[1 - water_id]:.0f})')" |
| 445 | + ], |
| 446 | + "metadata": { |
| 447 | + "execution": { |
| 448 | + "iopub.execute_input": "2026-03-09T15:30:58.517250Z", |
| 449 | + "iopub.status.busy": "2026-03-09T15:30:58.517125Z", |
| 450 | + "iopub.status.idle": "2026-03-09T15:30:59.860739Z", |
| 451 | + "shell.execute_reply": "2026-03-09T15:30:59.860156Z" |
| 452 | + } |
| 453 | + }, |
| 454 | + "execution_count": null, |
| 455 | + "outputs": [] |
| 456 | + }, |
| 457 | + { |
| 458 | + "cell_type": "markdown", |
| 459 | + "id": "pbro14h2x4", |
| 460 | + "source": [ |
| 461 | + "### Result\n", |
| 462 | + "\n", |
| 463 | + "Left: the NIR image (bright = land, dark = water). Right: the GLCM-based classification. The shoreline comes through cleanly without any hand-drawn training polygons." |
| 464 | + ], |
| 465 | + "metadata": {} |
| 466 | + }, |
| 467 | + { |
| 468 | + "cell_type": "code", |
| 469 | + "id": "q1zl6bioi7e", |
| 470 | + "source": [ |
| 471 | + "from matplotlib.colors import ListedColormap\n", |
| 472 | + "\n", |
| 473 | + "fig, axes = plt.subplots(1, 2, figsize=(13, 6))\n", |
| 474 | + "\n", |
| 475 | + "# NIR image\n", |
| 476 | + "axes[0].imshow(nir, cmap='gray', vmin=0, vmax=np.percentile(nir, 98))\n", |
| 477 | + "axes[0].set_title('Sentinel-2 NIR band')\n", |
| 478 | + "\n", |
| 479 | + "# Classification\n", |
| 480 | + "water_cmap = ListedColormap(['#a6611a', '#2166ac']) # 0=land(brown), 1=water(blue)\n", |
| 481 | + "im = axes[1].imshow(\n", |
| 482 | + " np.where(valid, class_map == water_id, np.nan),\n", |
| 483 | + " cmap=water_cmap, vmin=0, vmax=1, interpolation='nearest',\n", |
| 484 | + ")\n", |
| 485 | + "axes[1].set_title('GLCM water / land classification')\n", |
| 486 | + "\n", |
| 487 | + "# Legend\n", |
| 488 | + "from matplotlib.patches import Patch\n", |
| 489 | + "axes[1].legend(\n", |
| 490 | + " handles=[Patch(color='#2166ac', label='Water'),\n", |
| 491 | + " Patch(color='#a6611a', label='Land')],\n", |
| 492 | + " loc='lower right', framealpha=0.9,\n", |
| 493 | + ")\n", |
| 494 | + "\n", |
| 495 | + "plt.tight_layout()\n", |
| 496 | + "plt.show()" |
| 497 | + ], |
| 498 | + "metadata": { |
| 499 | + "execution": { |
| 500 | + "iopub.execute_input": "2026-03-09T15:30:59.862223Z", |
| 501 | + "iopub.status.busy": "2026-03-09T15:30:59.861896Z", |
| 502 | + "iopub.status.idle": "2026-03-09T15:31:00.062349Z", |
| 503 | + "shell.execute_reply": "2026-03-09T15:31:00.061769Z" |
| 504 | + } |
| 505 | + }, |
| 506 | + "execution_count": null, |
| 507 | + "outputs": [] |
| 508 | + }, |
| 509 | + { |
| 510 | + "cell_type": "markdown", |
| 511 | + "id": "yqyikmy8is", |
| 512 | + "source": [ |
| 513 | + "### Where to go from here\n", |
| 514 | + "\n", |
| 515 | + "- **More clusters.** Bump `n_clusters` to 3 or 4 to split out mud flats, urban areas, or vegetation.\n", |
| 516 | + "- **Supervised classification.** Replace KMeans with a Random Forest trained on labeled polygons for higher accuracy.\n", |
| 517 | + "- **Larger areas.** Wrap the satellite DataArray in `dask.array.from_array` and `glcm_texture` will process it chunk-by-chunk, the same way we showed in the Dask section above.\n", |
| 518 | + "- **Multi-band features.** Compute GLCM on several spectral bands and stack the results for a richer feature space." |
| 519 | + ], |
| 520 | + "metadata": {} |
231 | 521 | } |
232 | 522 | ], |
233 | 523 | "metadata": { |
|
0 commit comments