|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "<!--\n", |
| 8 | + " Licensed to the Apache Software Foundation (ASF) under one\n", |
| 9 | + " or more contributor license agreements. See the NOTICE file\n", |
| 10 | + " distributed with this work for additional information\n", |
| 11 | + " regarding copyright ownership. The ASF licenses this file\n", |
| 12 | + " to you under the Apache License, Version 2.0 (the\n", |
| 13 | + " \"License\"); you may not use this file except in compliance\n", |
| 14 | + " with the License. You may obtain a copy of the License at\n", |
| 15 | + "\n", |
| 16 | + " http://www.apache.org/licenses/LICENSE-2.0\n", |
| 17 | + "\n", |
| 18 | + " Unless required by applicable law or agreed to in writing,\n", |
| 19 | + " software distributed under the License is distributed on an\n", |
| 20 | + " \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n", |
| 21 | + " KIND, either express or implied. See the License for the\n", |
| 22 | + " specific language governing permissions and limitations\n", |
| 23 | + " under the License.\n", |
| 24 | + "-->\n", |
| 25 | + "\n", |
| 26 | + "# Did this farmland green up?\n", |
| 27 | + "\n", |
| 28 | + "A complete raster pipeline. We answer:\n", |
| 29 | + "\n", |
| 30 | + "> **Between two satellite scenes a season apart, which farm parcels in this AOI greened up the most?**\n", |
| 31 | + "\n", |
| 32 | + "End-to-end on Sedona's raster surface: load two GeoTIFFs through the new tiling raster data source, compute NDVI per scene with `RS_MapAlgebra`, take their difference for a \"greening\" delta raster, run `RS_ZonalStats` per parcel, rank, clip the delta raster to the top parcel with `RS_Clip`, round-trip through `RS_AsCOG` to prove the result is a valid Cloud-Optimized GeoTIFF, and visualize the four stages as a single matplotlib figure.\n", |
| 33 | + "\n", |
| 34 | + "The two source scenes are **synthesized** at the top of the notebook (256 \u00d7 256 px, 2 bands each: red + NIR, EPSG:4326) so the notebook is fully reproducible and ships no new bytes. The same code path runs unchanged against real Sentinel-2 chips \u2014 only the raster filenames change.\n", |
| 35 | + "\n", |
| 36 | + "**Requires Sedona \u2265 1.9.0.** The `raster` data source (auto-tiling GeoTIFF reader, GH-2672) and `RS_AsCOG` (GH-2652) land in 1.9.0; the notebook will fail on older versions of the docker image." |
| 37 | + ] |
| 38 | + }, |
| 39 | + { |
| 40 | + "cell_type": "markdown", |
| 41 | + "metadata": {}, |
| 42 | + "source": [ |
| 43 | + "## 1. Connect to Spark through SedonaContext" |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "code", |
| 48 | + "execution_count": null, |
| 49 | + "metadata": {}, |
| 50 | + "outputs": [], |
| 51 | + "source": [ |
| 52 | + "from sedona.spark import SedonaContext\n", |
| 53 | + "\n", |
| 54 | + "config = SedonaContext.builder().master(\"spark://localhost:7077\").getOrCreate()\n", |
| 55 | + "sedona = SedonaContext.create(config)" |
| 56 | + ] |
| 57 | + }, |
| 58 | + { |
| 59 | + "cell_type": "markdown", |
| 60 | + "metadata": {}, |
| 61 | + "source": [ |
| 62 | + "## 2. Synthesize two scenes a season apart\n", |
| 63 | + "\n", |
| 64 | + "We build two 256\u00d7256, 2-band GeoTIFFs in `/tmp` representing red and near-infrared reflectance over the same AOI. The \"before\" scene is mostly bare; the \"after\" scene has a circular field of vegetation in the south-west corner with elevated NIR. Pixel values are scaled to the same 0-10000 reflectance range as Sentinel-2 surface-reflectance products so the NDVI math is identical to the real-world case." |
| 65 | + ] |
| 66 | + }, |
| 67 | + { |
| 68 | + "cell_type": "code", |
| 69 | + "execution_count": null, |
| 70 | + "metadata": {}, |
| 71 | + "outputs": [], |
| 72 | + "source": "import os\n\nimport numpy as np\nimport rasterio\nfrom rasterio.transform import from_bounds\n\nWORK = \"/tmp/veg-change\"\nos.makedirs(WORK, exist_ok=True)\n\nAOI = (-91.10, 41.50, -91.00, 41.60) # (xmin, ymin, xmax, ymax) in EPSG:4326\nW = H = 256\ntransform = from_bounds(*AOI, W, H)\nrng = np.random.default_rng(42)\n\nys, xs = np.mgrid[0:H, 0:W]\nfield_mask = ((xs - 64) ** 2 + (ys - 192) ** 2) < 50**2 # circular field\n\n\ndef synth_scene(green_strength):\n red = (1500 + 200 * rng.standard_normal((H, W))).clip(0, 10000)\n nir = (1800 + 200 * rng.standard_normal((H, W))).clip(0, 10000)\n nir = np.where(field_mask, nir + green_strength, nir)\n return red.astype(\"uint16\"), nir.astype(\"uint16\")\n\n\nfor name, strength in [(\"before\", 200), (\"after\", 4000)]:\n red, nir = synth_scene(strength)\n with rasterio.open(\n f\"{WORK}/scene_{name}.tif\",\n \"w\",\n driver=\"GTiff\",\n tiled=True,\n blockxsize=256,\n blockysize=256,\n height=H,\n width=W,\n count=2,\n dtype=\"uint16\",\n crs=\"EPSG:4326\",\n transform=transform,\n ) as dst:\n dst.write(red, 1)\n dst.write(nir, 2)\n dst.set_band_description(1, \"red\")\n dst.set_band_description(2, \"nir\")\n\nfor f in (f\"{WORK}/scene_before.tif\", f\"{WORK}/scene_after.tif\"):\n print(f\"{f}: {os.path.getsize(f)} bytes\")" |
| 73 | + }, |
| 74 | + { |
| 75 | + "cell_type": "markdown", |
| 76 | + "metadata": {}, |
| 77 | + "source": "## 3. Load both scenes with the `raster` data source\n\n`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and yields one `Raster`-typed row per tile. Each row carries `(x, y)` tile-index columns \u2014 keep them; we'll join on those when computing \u0394NDVI so the same query works for single-tile inputs (as here) and for multi-gigabyte rasters that yield thousands of tiles per scene." |
| 78 | + }, |
| 79 | + { |
| 80 | + "cell_type": "code", |
| 81 | + "execution_count": null, |
| 82 | + "metadata": {}, |
| 83 | + "outputs": [], |
| 84 | + "source": "scenes = (\n sedona.read.format(\"raster\")\n .load(f\"{WORK}/scene_*.tif\")\n .selectExpr(\"split(name, '\\\\\\\\.')[0] as scene\", \"x\", \"y\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)" |
| 85 | + }, |
| 86 | + { |
| 87 | + "cell_type": "markdown", |
| 88 | + "metadata": {}, |
| 89 | + "source": [ |
| 90 | + "## 4. Compute NDVI per scene with `RS_MapAlgebra`\n", |
| 91 | + "\n", |
| 92 | + "`RS_MapAlgebra(raster, pixelType, script)` runs a single-raster pixel script. The script syntax (`out[0] = ...; rast[i]` indexing) is documented [here](../api/sql/Raster-map-algebra.md). We coerce the result to `D` (double) so the negative side of NDVI survives." |
| 93 | + ] |
| 94 | + }, |
| 95 | + { |
| 96 | + "cell_type": "code", |
| 97 | + "execution_count": null, |
| 98 | + "metadata": {}, |
| 99 | + "outputs": [], |
| 100 | + "source": "scenes.createOrReplaceTempView(\"scenes\")\n\nndvi = sedona.sql(\"\"\"\n SELECT scene, x, y,\n RS_MapAlgebra(\n rast, 'D',\n 'out[0] = (rast[1] - rast[0]) / (rast[1] + rast[0] + 0.000001);'\n ) AS rast\n FROM scenes\n\"\"\")\nndvi.cache()\nndvi.createOrReplaceTempView(\"ndvi\")\nndvi.selectExpr(\"scene\", \"x\", \"y\", \"RS_BandPixelType(rast, 1) as dtype\").show()" |
| 101 | + }, |
| 102 | + { |
| 103 | + "cell_type": "markdown", |
| 104 | + "metadata": {}, |
| 105 | + "source": "## 5. Compute the greening delta raster\n\nTwo-raster `RS_MapAlgebra(rast0, rast1, pixelType, script, noDataValue)` subtracts the before-NDVI from the after-NDVI in a single pass. We join the two NDVI dataframes on the tile coordinates `(x, y)` so the same query works whether each scene produced a single tile (as here) or thousands." |
| 106 | + }, |
| 107 | + { |
| 108 | + "cell_type": "code", |
| 109 | + "execution_count": null, |
| 110 | + "metadata": {}, |
| 111 | + "outputs": [], |
| 112 | + "source": "delta = sedona.sql(\"\"\"\n SELECT a.x, a.y,\n RS_MapAlgebra(\n a.rast, b.rast, 'D',\n 'out[0] = rast0[0] - rast1[0];',\n -9999.0\n ) AS rast\n FROM (SELECT x, y, rast FROM ndvi WHERE scene = 'scene_after') a\n JOIN (SELECT x, y, rast FROM ndvi WHERE scene = 'scene_before') b\n ON a.x = b.x AND a.y = b.y\n\"\"\")\ndelta.cache()\ndelta.createOrReplaceTempView(\"delta\")\ndelta.selectExpr(\n \"x\",\n \"y\",\n \"RS_BandPixelType(rast, 1) as dtype\",\n \"RS_Width(rast) as w\",\n \"RS_Height(rast) as h\",\n).show()" |
| 113 | + }, |
| 114 | + { |
| 115 | + "cell_type": "markdown", |
| 116 | + "metadata": {}, |
| 117 | + "source": "## 6. Synthesize parcel polygons and run `RS_ZonalStats`\n\nBuild a 4\u00d74 grid of square parcels covering the AOI and compute the mean \u0394NDVI per parcel. `RS_ZonalStats(raster, zone, statType)` is the canonical raster\u2192vector aggregation: every pixel inside `zone` contributes to the statistic.\n\nFor tiled inputs, the cross-join below produces one row per `(parcel, tile)`; with our single-tile delta it collapses to one row per parcel. To handle truly tiled inputs robustly you would compute `RS_ZonalStats(rast, geom, 'sum')` and `'count'` per tile and aggregate `SUM(sum) / SUM(count)` per parcel \u2014 same idiom, one extra `GROUP BY`." |
| 118 | + }, |
| 119 | + { |
| 120 | + "cell_type": "code", |
| 121 | + "execution_count": null, |
| 122 | + "metadata": {}, |
| 123 | + "outputs": [], |
| 124 | + "source": [ |
| 125 | + "from pyspark.sql import Row\n", |
| 126 | + "\n", |
| 127 | + "xmin, ymin, xmax, ymax = AOI\n", |
| 128 | + "step_x = (xmax - xmin) / 4\n", |
| 129 | + "step_y = (ymax - ymin) / 4\n", |
| 130 | + "parcel_rows = []\n", |
| 131 | + "for ix in range(4):\n", |
| 132 | + " for iy in range(4):\n", |
| 133 | + " x0, x1 = xmin + ix * step_x, xmin + (ix + 1) * step_x\n", |
| 134 | + " y0, y1 = ymin + iy * step_y, ymin + (iy + 1) * step_y\n", |
| 135 | + " wkt = f\"POLYGON(({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, {x0} {y0}))\"\n", |
| 136 | + " parcel_rows.append(Row(parcel_id=f\"P{ix}{iy}\", wkt=wkt))\n", |
| 137 | + "\n", |
| 138 | + "parcels = sedona.createDataFrame(parcel_rows).selectExpr(\n", |
| 139 | + " \"parcel_id\", \"ST_GeomFromText(wkt) as geom\"\n", |
| 140 | + ")\n", |
| 141 | + "parcels.createOrReplaceTempView(\"parcels\")\n", |
| 142 | + "\n", |
| 143 | + "ranked = sedona.sql(\"\"\"\n", |
| 144 | + " SELECT p.parcel_id,\n", |
| 145 | + " ROUND(RS_ZonalStats(d.rast, p.geom, 'mean'), 4) AS mean_delta_ndvi\n", |
| 146 | + " FROM parcels p, delta d\n", |
| 147 | + " ORDER BY mean_delta_ndvi DESC\n", |
| 148 | + "\"\"\")\n", |
| 149 | + "ranked.show()" |
| 150 | + ] |
| 151 | + }, |
| 152 | + { |
| 153 | + "cell_type": "markdown", |
| 154 | + "metadata": {}, |
| 155 | + "source": [ |
| 156 | + "## 7. Clip the delta raster to the top-greening parcel\n", |
| 157 | + "\n", |
| 158 | + "`RS_Clip(raster, band, geom)` crops the raster to the polygon's extent. We pick the parcel with the highest mean \u0394NDVI and zoom in on its delta values." |
| 159 | + ] |
| 160 | + }, |
| 161 | + { |
| 162 | + "cell_type": "code", |
| 163 | + "execution_count": null, |
| 164 | + "metadata": {}, |
| 165 | + "outputs": [], |
| 166 | + "source": [ |
| 167 | + "top_id = ranked.first()[\"parcel_id\"]\n", |
| 168 | + "print(f\"top-greening parcel: {top_id}\")\n", |
| 169 | + "\n", |
| 170 | + "clipped = sedona.sql(f\"\"\"\n", |
| 171 | + " SELECT RS_Clip(d.rast, 1, p.geom) AS rast\n", |
| 172 | + " FROM delta d, parcels p\n", |
| 173 | + " WHERE p.parcel_id = '{top_id}'\n", |
| 174 | + "\"\"\")\n", |
| 175 | + "clipped.cache()\n", |
| 176 | + "clipped.createOrReplaceTempView(\"clipped\")\n", |
| 177 | + "clipped.selectExpr(\n", |
| 178 | + " \"RS_Width(rast) as w\",\n", |
| 179 | + " \"RS_Height(rast) as h\",\n", |
| 180 | + ").show()" |
| 181 | + ] |
| 182 | + }, |
| 183 | + { |
| 184 | + "cell_type": "markdown", |
| 185 | + "metadata": {}, |
| 186 | + "source": [ |
| 187 | + "## 8. Round-trip through `RS_AsCOG`\n", |
| 188 | + "\n", |
| 189 | + "`RS_AsCOG` returns the raster as a binary Cloud-Optimized GeoTIFF byte array. We persist it to disk, then read it back with the same `raster` data source we used to load the input \u2014 proving the output is a valid GeoTIFF that downstream tools can stream from object storage." |
| 190 | + ] |
| 191 | + }, |
| 192 | + { |
| 193 | + "cell_type": "code", |
| 194 | + "execution_count": null, |
| 195 | + "metadata": {}, |
| 196 | + "outputs": [], |
| 197 | + "source": "cog_bytes = clipped.selectExpr(\"RS_AsCOG(rast) as cog\").first()[\"cog\"]\nwith open(f\"{WORK}/delta_topparcel_cog.tif\", \"wb\") as fh:\n fh.write(cog_bytes)\nprint(f\"COG size: {len(cog_bytes):,} bytes\")\n\nroundtrip = sedona.read.format(\"raster\").load(f\"{WORK}/delta_topparcel_cog.tif\")\nroundtrip.selectExpr(\n \"RS_Width(rast) as w\",\n \"RS_Height(rast) as h\",\n \"RS_BandPixelType(rast, 1) as dtype\",\n).show()" |
| 198 | + }, |
| 199 | + { |
| 200 | + "cell_type": "markdown", |
| 201 | + "metadata": {}, |
| 202 | + "source": [ |
| 203 | + "## 9. Visualize the pipeline as a 4-panel figure\n", |
| 204 | + "\n", |
| 205 | + "Read the rasters back via `rasterio` and lay them out side by side: NDVI before, NDVI after, \u0394NDVI across the full AOI with parcel boundaries overlaid, and the top-parcel close-up." |
| 206 | + ] |
| 207 | + }, |
| 208 | + { |
| 209 | + "cell_type": "code", |
| 210 | + "execution_count": null, |
| 211 | + "metadata": {}, |
| 212 | + "outputs": [], |
| 213 | + "source": "import matplotlib.patches as mpatches\nimport matplotlib.pyplot as plt\n\n\ndef ndvi_arr(path):\n with rasterio.open(path) as ds:\n red = ds.read(1).astype(\"float32\")\n nir = ds.read(2).astype(\"float32\")\n return (nir - red) / (nir + red + 1e-6)\n\n\nndvi_before = ndvi_arr(f\"{WORK}/scene_before.tif\")\nndvi_after = ndvi_arr(f\"{WORK}/scene_after.tif\")\ndelta_arr = ndvi_after - ndvi_before\nwith rasterio.open(f\"{WORK}/delta_topparcel_cog.tif\") as ds:\n top_arr = ds.read(1)\n top_extent = (ds.bounds.left, ds.bounds.right, ds.bounds.bottom, ds.bounds.top)\n\nfig, axes = plt.subplots(1, 4, figsize=(16, 4))\nextent = (AOI[0], AOI[2], AOI[1], AOI[3])\naxes[0].imshow(ndvi_before, vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", extent=extent)\naxes[0].set_title(\"NDVI before\")\naxes[1].imshow(ndvi_after, vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", extent=extent)\naxes[1].set_title(\"NDVI after\")\naxes[2].imshow(delta_arr, vmin=-0.5, vmax=0.5, cmap=\"PiYG\", extent=extent)\naxes[2].set_title(\"\u0394NDVI (after \u2212 before) with parcel grid\")\nxmin, ymin, xmax, ymax = AOI\nstep_x = (xmax - xmin) / 4\nstep_y = (ymax - ymin) / 4\nfor ix in range(4):\n for iy in range(4):\n x0 = xmin + ix * step_x\n y0 = ymin + iy * step_y\n axes[2].add_patch(\n mpatches.Rectangle(\n (x0, y0), step_x, step_y, fill=False, edgecolor=\"black\", linewidth=0.6\n )\n )\n axes[2].text(\n x0 + step_x / 2,\n y0 + step_y / 2,\n f\"P{ix}{iy}\",\n ha=\"center\",\n va=\"center\",\n fontsize=7,\n color=\"black\",\n )\naxes[3].imshow(top_arr, vmin=-0.5, vmax=0.5, cmap=\"PiYG\", extent=top_extent)\naxes[3].set_title(f\"Top parcel ({top_id}) \u0394NDVI\")\nfor ax in axes:\n ax.set_xticks([])\n ax.set_yticks([])\nfig.tight_layout()\nfig" |
| 214 | + }, |
| 215 | + { |
| 216 | + "cell_type": "markdown", |
| 217 | + "metadata": {}, |
| 218 | + "source": [ |
| 219 | + "## What just happened?\n", |
| 220 | + "\n", |
| 221 | + "We turned two synthetic Sentinel-2-style scenes into a ranked parcel-level greening report in nine steps:\n", |
| 222 | + "\n", |
| 223 | + "1. Synthesized two 256\u00d7256 red+NIR GeoTIFFs in `/tmp`.\n", |
| 224 | + "2. Loaded both with `sedona.read.format(\"raster\")` \u2014 the new auto-tiling reader handles GeoTIFFs of any size with the same call shape.\n", |
| 225 | + "3. Computed per-pixel NDVI per scene with single-raster `RS_MapAlgebra`.\n", |
| 226 | + "4. Subtracted the two NDVI rasters with two-raster `RS_MapAlgebra` to get a \u0394NDVI \"greening\" layer.\n", |
| 227 | + "5. Synthesized a 4\u00d74 parcel grid and ranked parcels by mean \u0394NDVI via `RS_ZonalStats`.\n", |
| 228 | + "6. Clipped the delta raster to the top-greening parcel with `RS_Clip`.\n", |
| 229 | + "7. Encoded the result as a Cloud-Optimized GeoTIFF with `RS_AsCOG` and read it back with the `raster` data source \u2014 proving the output is valid for cloud-hosted streaming.\n", |
| 230 | + "8. Plotted the four pipeline stages side by side.\n", |
| 231 | + "\n", |
| 232 | + "Swap `/tmp/scene_*.tif` for a glob over real Sentinel-2 chips and the same SQL runs on a cluster against terabytes of imagery." |
| 233 | + ] |
| 234 | + } |
| 235 | + ], |
| 236 | + "metadata": { |
| 237 | + "kernelspec": { |
| 238 | + "display_name": "Python 3", |
| 239 | + "language": "python", |
| 240 | + "name": "python3" |
| 241 | + }, |
| 242 | + "language_info": { |
| 243 | + "codemirror_mode": { |
| 244 | + "name": "ipython", |
| 245 | + "version": 3 |
| 246 | + }, |
| 247 | + "file_extension": ".py", |
| 248 | + "mimetype": "text/x-python", |
| 249 | + "name": "python", |
| 250 | + "nbconvert_exporter": "python", |
| 251 | + "pygments_lexer": "ipython3", |
| 252 | + "version": "3.11" |
| 253 | + } |
| 254 | + }, |
| 255 | + "nbformat": 4, |
| 256 | + "nbformat_minor": 4 |
| 257 | +} |
0 commit comments