|
89 | 89 | "id": "vda99az5wo", |
90 | 90 | "source": "## Parameters reference\n\n| Parameter | Default | Description |\n|---|---|---|\n| `metric` | `'contrast'` | One metric name (str) or a list of names |\n| `window_size` | `7` | Side length of the sliding window (must be odd, >= 3) |\n| `levels` | `64` | Number of gray levels for quantization (2-256) |\n| `distance` | `1` | Pixel pair distance |\n| `angle` | `None` | 0, 45, 90, 135, or None (average all four) |\n\nLower `levels` values run faster but lose gray-level resolution. For most remote sensing work, 32-64 levels is a good balance.", |
91 | 91 | "metadata": {} |
| 92 | + }, |
| 93 | + { |
| 94 | + "cell_type": "markdown", |
| 95 | + "id": "r8ezsmezbt9", |
| 96 | + "source": "## Ok, how do we apply this to GIS?\n\nThe synthetic quadrants above make the math obvious, but the real payoff is on actual imagery. Water and land have very different spatial textures in satellite data: water is smooth and uniform, land is rough and varied. GLCM features capture that difference even when raw pixel brightness alone is ambiguous (shadows, mudflats, dark pavement).\n\nBelow we'll grab a Sentinel-2 NIR band crop of the San Francisco Bay coastline, compute GLCM texture features, and use KMeans clustering to classify water vs. land — no training labels required.", |
| 97 | + "metadata": {} |
| 98 | + }, |
| 99 | + { |
| 100 | + "cell_type": "markdown", |
| 101 | + "id": "ec79xdunce9", |
| 102 | + "source": "### Step 1 — Download a Sentinel-2 NIR band\n\nWe read a 500 x 500 pixel window (5 km x 5 km at 10 m resolution) straight from a\nCloud-Optimized GeoTIFF hosted on AWS. The scene is\n[S2B_10SEG_20230921](https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/10/S/EG/2023/9/S2B_10SEG_20230921_0_L2A/),\na near-cloudless capture of the Bay Area from September 2023.\n\nIf the remote file is unreachable, a synthetic coastal raster is used instead\nso the rest of the notebook still runs.", |
| 103 | + "metadata": {} |
| 104 | + }, |
| 105 | + { |
| 106 | + "cell_type": "code", |
| 107 | + "id": "2hogpjtllw7", |
| 108 | + "source": "import os, rasterio\nfrom rasterio.windows import Window\n\nos.environ['AWS_NO_SIGN_REQUEST'] = 'YES'\nos.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'EMPTY_DIR'\n\nCOG_URL = (\n 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/'\n 'sentinel-s2-l2a-cogs/10/S/EG/2023/9/'\n 'S2B_10SEG_20230921_0_L2A/B08.tif'\n)\n\ntry:\n with rasterio.open(COG_URL) as src:\n nir = src.read(1, window=Window(5300, 2100, 500, 500)).astype(np.float64)\n print(f'Downloaded NIR band: {nir.shape}, range {nir.min():.0f}–{nir.max():.0f}')\nexcept Exception as exc:\n print(f'Remote read failed ({exc}), using synthetic fallback')\n rng_sat = np.random.default_rng(99)\n nir = np.zeros((500, 500), dtype=np.float64)\n # Water: low reflectance, smooth\n nir[:, 250:] = rng_sat.normal(80, 10, (500, 250)).clip(20, 200)\n # Land: high reflectance, rough\n nir[:, :250] = rng_sat.normal(1800, 400, (500, 250)).clip(300, 4000)\n\nsatellite = xr.DataArray(nir, dims=['y', 'x'])\n\nfig, ax = plt.subplots(figsize=(6, 6))\nax.imshow(nir, cmap='gray', vmin=0, vmax=np.percentile(nir, 98))\nax.set_title('Sentinel-2 NIR — SF Bay coastline')\nplt.tight_layout()\nplt.show()", |
| 109 | + "metadata": {}, |
| 110 | + "execution_count": null, |
| 111 | + "outputs": [] |
| 112 | + }, |
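| 113 | +   {
| 114 | +    "cell_type": "markdown",
| 115 | +    "id": "qzlvlpreview",
| 116 | +    "source": "### Optional — preview the quantization\n\n`glcm_texture` re-bins pixel values into `levels` gray levels before building co-occurrence matrices. The sketch below previews what 64-level binning does to the NIR dynamic range using a plain `np.digitize` — an illustrative stand-in, not necessarily the library's internal binning scheme. It assumes `nir` from the cell above.",
| 117 | +    "metadata": {}
| 118 | +   },
| 119 | +   {
| 120 | +    "cell_type": "code",
| 121 | +    "id": "qzlvlcode",
| 122 | +    "source": "# Illustrative 64-level quantization (may differ from glcm_texture's internal binning)\nedges = np.linspace(nir.min(), nir.max(), 64 + 1)[1:-1]\nnir_q = np.digitize(nir, edges)\n\nfig, axes = plt.subplots(1, 2, figsize=(10, 3.5))\naxes[0].hist(nir.ravel(), bins=100)\naxes[0].set_title('Raw NIR values')\naxes[1].hist(nir_q.ravel(), bins=64)\naxes[1].set_title('Quantized to 64 levels')\nplt.tight_layout()\nplt.show()",
| 123 | +    "metadata": {},
| 124 | +    "execution_count": null,
| 125 | +    "outputs": []
| 126 | +   },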
| 113 | + { |
| 114 | + "cell_type": "markdown", |
| 115 | + "id": "joxz7n8olpc", |
| 116 | + "source": "### Step 2 — Compute GLCM texture features\n\nWe pick four metrics that tend to separate water (uniform, high energy, high homogeneity) from land (rough, high contrast):\n\n| Metric | Water (smooth) | Land (rough) |\n|---|---|---|\n| contrast | low | high |\n| homogeneity | high | low |\n| energy | high | low |\n| correlation | high (uniform) | lower (varied) |\n\nAn 11 x 11 window gives enough spatial context at 10 m resolution (110 m footprint).", |
| 117 | + "metadata": {} |
| 118 | + }, |
| 119 | + { |
| 120 | + "cell_type": "code", |
| 121 | + "id": "ytdru4ssilp", |
| 122 | + "source": "sat_metrics = ['contrast', 'homogeneity', 'energy', 'correlation']\nsat_textures = glcm_texture(satellite, metric=sat_metrics, window_size=11, levels=64)\n\nfig, axes = plt.subplots(1, 4, figsize=(18, 4))\nfor ax, name in zip(axes, sat_metrics):\n vals = sat_textures.sel(metric=name).values\n im = ax.imshow(vals, cmap='viridis')\n ax.set_title(name.capitalize())\n plt.colorbar(im, ax=ax, shrink=0.7)\nplt.suptitle('GLCM features on satellite NIR', fontsize=13, y=1.02)\nplt.tight_layout()\nplt.show()", |
| 123 | + "metadata": {}, |
| 124 | + "execution_count": null, |
| 125 | + "outputs": [] |
| 126 | + }, |
| 127 | + { |
| 128 | + "cell_type": "markdown", |
| 129 | + "id": "xya2v71uhb", |
| 130 | + "source": "### Step 3 — Classify water vs. land with KMeans\n\nWe stack the four texture layers into a feature vector per pixel, normalize them,\nand let KMeans find two clusters. No training labels needed — the texture\ndifference between water and land is large enough that unsupervised clustering\npicks it up cleanly.\n\nAfter clustering, we check which cluster has lower mean NIR reflectance and\nlabel that one \"water.\"", |
| 131 | + "metadata": {} |
| 132 | + }, |
| 133 | + { |
| 134 | + "cell_type": "code", |
| 135 | + "id": "keramnxr509", |
| 136 | + "source": "from sklearn.cluster import KMeans\nfrom sklearn.preprocessing import StandardScaler\n\n# Stack texture features: (H, W, 4)\nfeature_stack = np.stack(\n [sat_textures.sel(metric=m).values for m in sat_metrics], axis=-1\n)\nh, w, n_feat = feature_stack.shape\n\n# Mask out NaN edges left by the GLCM window\nvalid = ~np.any(np.isnan(feature_stack), axis=-1)\nX = feature_stack[valid]\n\n# Normalize and cluster\nX_scaled = StandardScaler().fit_transform(X)\nlabels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X_scaled)\n\n# Build the classification raster\nclass_map = np.full((h, w), np.nan)\nclass_map[valid] = labels\n\n# Assign \"water\" to the cluster with lower NIR reflectance\ncluster_nir = [np.nanmean(nir[class_map == c]) for c in range(2)]\nwater_id = int(np.argmin(cluster_nir))\n\nwater_mask = np.where(valid, class_map == water_id, np.nan)\nprint(f'Water cluster: {water_id} (mean NIR {cluster_nir[water_id]:.0f})')\nprint(f'Land cluster: {1 - water_id} (mean NIR {cluster_nir[1 - water_id]:.0f})')", |
| 137 | + "metadata": {}, |
| 138 | + "execution_count": null, |
| 139 | + "outputs": [] |
| 140 | + }, |
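| 141 | +   {
| 142 | +    "cell_type": "markdown",
| 143 | +    "id": "sntychkmd",
| 144 | +    "source": "### Optional — sanity-check the clusters\n\nBefore plotting, a quick label-free check: water is dark in NIR, so the texture-based mask ought to agree broadly with a simple brightness threshold. The Otsu cutoff below is an illustrative choice (it assumes scikit-image is available in this environment); high agreement only tells us the two clusters line up with the physical classes, not that either mask is exact.",
| 145 | +    "metadata": {}
| 146 | +   },
| 147 | +   {
| 148 | +    "cell_type": "code",
| 149 | +    "id": "sntychkcode",
| 150 | +    "source": "from skimage.filters import threshold_otsu\n\n# Naive spectral mask: water = dark NIR pixels, with Otsu picking the cutoff\notsu = threshold_otsu(nir[valid])\nnaive_water = nir < otsu\n\n# Fraction of valid pixels where texture clustering and the threshold agree\ntexture_water = class_map == water_id\nagreement = np.mean(texture_water[valid] == naive_water[valid])\nprint(f'Otsu NIR threshold: {otsu:.0f}')\nprint(f'Agreement with texture-based mask: {agreement:.1%}')",
| 151 | +    "metadata": {},
| 152 | +    "execution_count": null,
| 153 | +    "outputs": []
| 154 | +   },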
| 141 | + { |
| 142 | + "cell_type": "markdown", |
| 143 | + "id": "pbro14h2x4", |
| 144 | + "source": "### Result\n\nLeft: the NIR image (bright = land, dark = water). Right: the GLCM-based classification. The shoreline comes through cleanly without any hand-drawn training polygons.", |
| 145 | + "metadata": {} |
| 146 | + }, |
| 147 | + { |
| 148 | + "cell_type": "code", |
| 149 | + "id": "q1zl6bioi7e", |
| 150 | + "source": "from matplotlib.colors import ListedColormap\n\nfig, axes = plt.subplots(1, 2, figsize=(13, 6))\n\n# NIR image\naxes[0].imshow(nir, cmap='gray', vmin=0, vmax=np.percentile(nir, 98))\naxes[0].set_title('Sentinel-2 NIR band')\n\n# Classification: water_mask is 1 = water, 0 = land, NaN = window edges\nwater_cmap = ListedColormap(['#a6611a', '#2166ac'])  # 0 = land (brown), 1 = water (blue)\nim = axes[1].imshow(\n    water_mask,\n    cmap=water_cmap, vmin=0, vmax=1, interpolation='nearest',\n)\naxes[1].set_title('GLCM water / land classification')\n\n# Legend\nfrom matplotlib.patches import Patch\naxes[1].legend(\n    handles=[Patch(color='#2166ac', label='Water'),\n             Patch(color='#a6611a', label='Land')],\n    loc='lower right', framealpha=0.9,\n)\n\nplt.tight_layout()\nplt.show()",
| 151 | + "metadata": {}, |
| 152 | + "execution_count": null, |
| 153 | + "outputs": [] |
| 154 | + }, |
| 155 | + { |
| 156 | + "cell_type": "markdown", |
| 157 | + "id": "yqyikmy8is", |
| 158 | + "source": "### Where to go from here\n\n- **More clusters.** Bump `n_clusters` to 3 or 4 to split out mud flats, urban areas, or vegetation.\n- **Supervised classification.** Replace KMeans with a Random Forest trained on labeled polygons for higher accuracy.\n- **Larger areas.** Wrap the satellite DataArray in `dask.array.from_array` and `glcm_texture` will process it chunk-by-chunk, the same way we showed in the Dask section above.\n- **Multi-band features.** Compute GLCM on several spectral bands and stack the results for a richer feature space.", |
| 159 | + "metadata": {} |
92 | 160 | } |
93 | 161 | ], |
94 | 162 | "metadata": { |
|