
Commit 173e719

updated classification notebook with more reliable data source and rioxarray loading
1 parent a37d5a8 commit 173e719

File tree

1 file changed (+31, −156 lines)


examples/xarray-spatial_classification-methods.ipynb

Lines changed: 31 additions & 156 deletions
@@ -5,7 +5,7 @@
    "id": "89bdd2bf",
    "metadata": {},
    "source": [
-    "## Grand Canyon elevation classification using the NASADEM dataset"
+    "## Elevation binning using the NASADEM dataset"
    ]
   },
   {
@@ -22,7 +22,7 @@
    "2. Classifying the data using xarray-spatial's [natural breaks](https://xarray-spatial.readthedocs.io/reference/_autosummary/xrspatial.classify.natural_breaks.html), [equal interval](https://xarray-spatial.readthedocs.io/reference/_autosummary/xrspatial.classify.equal_interval.html), [quantile](https://xarray-spatial.readthedocs.io/reference/_autosummary/xrspatial.classify.quantile.html), and [reclassify](https://xarray-spatial.readthedocs.io/reference/_autosummary/xrspatial.classify.reclassify.html) functions.\n",
    "\n",
    "\n",
-    "This tutorial uses the [NASADEM](https://github.com/microsoft/AIforEarthDatasets#nasadem) dataset from the [Microsoft Planetary Computer Data Catalog](https://planetarycomputer.microsoft.com/catalog). The area of interest roughly covers the Grand Canyon National Park. The [NASADEM](https://github.com/microsoft/AIforEarthDatasets#nasadem) dataset provides global topographic data at 1 arc-second (~30m) horizontal resolution. The data is derived primarily from data captured via the [Shuttle Radar Topography Mission](https://www2.jpl.nasa.gov/srtm/) (SRTM) and is stored on Azure Storage in [cloud-optimized GeoTIFF](https://www.cogeo.org/) format.\n"
+    "The data is derived primarily from data captured via the [Shuttle Radar Topography Mission](https://www2.jpl.nasa.gov/srtm/) (SRTM) and is stored on Azure Storage in [cloud-optimized GeoTIFF](https://www.cogeo.org/) format.\n"
    ]
   },
   {
@@ -33,111 +33,6 @@
    "### 1. Load the area of interest data"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "90ef8812",
-   "metadata": {},
-   "source": [
-    "To load NASADEM data for the Grand Canyon, use the following approach described in [Accessing NASADEM data on Azure (NetCDF)](https://github.com/microsoft/AIforEarthDataSets/blob/main/data/nasadem-nc.ipynb)."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d69cc680-4ee5-4ed6-a785-31aa44173618",
-   "metadata": {},
-   "source": [
-    "First, set up the necessary constants and generate a list of all available GeoTIFF files:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4c4c7fae",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import requests\n",
-    "\n",
-    "nasadem_blob_root = \"https://nasademeuwest.blob.core.windows.net/nasadem-cog/v001/\"\n",
-    "nasadem_file_index_url = nasadem_blob_root + \"index/nasadem_cog_list.txt\"\n",
-    "nasadem_content_extension = \".tif\"\n",
-    "nasadem_file_prefix = \"NASADEM_HGT_\"\n",
-    "nasadem_file_list = None\n",
-    "\n",
-    "nasadem_file_list = requests.get(nasadem_file_index_url).text.split(\"\\n\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b257fbfb-889c-4686-a527-2eb9b81c63e3",
-   "metadata": {},
-   "source": [
-    "Next, define a function that selects a filename from the list generated in the previous step. This function accepts a list of latitude and longitude coordinates and returns the name of the file matching these coordinates:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0fc048ca",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import math\n",
-    "\n",
-    "\n",
-    "def get_nasadem_filename(coord):\n",
-    "    \"\"\"\n",
-    "    Get the NASADEM filename for a specified latitude and longitude.\n",
-    "    \"\"\"\n",
-    "    lat = coord[0]\n",
-    "    lon = coord[1]\n",
-    "\n",
-    "    ns_token = \"n\" if lat >= 0 else \"s\"\n",
-    "    ew_token = \"e\" if lon >= 0 else \"w\"\n",
-    "\n",
-    "    lat_index = abs(math.floor(lat))\n",
-    "    lon_index = abs(math.floor(lon))\n",
-    "\n",
-    "    lat_string = ns_token + \"{:02d}\".format(lat_index)\n",
-    "    lon_string = ew_token + \"{:03d}\".format(lon_index)\n",
-    "\n",
-    "    filename = nasadem_file_prefix + lat_string + lon_string + nasadem_content_extension\n",
-    "\n",
-    "    if filename not in nasadem_file_list:\n",
-    "        print(\"Lat/lon {},{} not available\".format(lat, lon))\n",
-    "        filename = None\n",
-    "\n",
-    "    return filename"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c581d16d-c1a8-4cce-ad77-94b9f98ca9da",
-   "metadata": {},
-   "source": [
-    "Finally, use the function defined above to generate a URL pointing to the geodata for the Grand Canyon area:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "074aa0f3",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "grand_canyon_coord = [36.101690, -112.107676]\n",
-    "\n",
-    "url = nasadem_blob_root + get_nasadem_filename(grand_canyon_coord)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9149231c",
-   "metadata": {},
-   "source": [
-    "After retrieving the raw data for the Grand Canyon, use xarray's [open_rasterio](http://xarray.pydata.org/en/stable/generated/xarray.open_rasterio.html) function to load the data into an array:"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -146,8 +41,16 @@
    "outputs": [],
    "source": [
    "import xarray as xr\n",
+    "import rioxarray\n",
+    "\n",
+    "file_name = 'data/colorado_merge_3arc_resamp.tif'\n",
+    "raster = rioxarray.open_rasterio(file_name).sel(band=1).drop_vars('band')\n",
+    "raster.name = 'Colorado Elevation Raster'\n",
    "\n",
-    "img_arr = xr.open_rasterio(url).squeeze().drop(\"band\")"
+    "xmin, xmax = raster.x.data.min(), raster.x.data.max()\n",
+    "ymin, ymax = raster.y.data.min(), raster.y.data.max()\n",
+    "\n",
+    "xmin, xmax, ymin, ymax"
    ]
   },
   {
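The replacement cell relies on rioxarray's convention that `open_rasterio` returns a `(band, y, x)` DataArray, which `.sel(band=1).drop_vars('band')` reduces to a 2-D grid. Since the local `data/colorado_merge_3arc_resamp.tif` file is not reproduced here, a minimal sketch of that dimension handling with a hand-built DataArray standing in for the opened raster:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the (band, y, x) DataArray that
# rioxarray.open_rasterio returns for a single-band GeoTIFF.
data = np.arange(12, dtype="float32").reshape(1, 3, 4)
raster = xr.DataArray(
    data,
    dims=("band", "y", "x"),
    coords={
        "band": [1],
        "y": [40.0, 39.0, 38.0],
        "x": [-109.0, -108.0, -107.0, -106.0],
    },
)

# Same reduction as the new cell: select band 1, drop the scalar coord.
raster = raster.sel(band=1).drop_vars("band")
raster.name = "Colorado Elevation Raster"

# Bounding box, as computed at the end of the cell.
xmin, xmax = raster.x.data.min(), raster.x.data.max()
ymin, ymax = raster.y.data.min(), raster.y.data.max()
print(raster.dims, (xmin, xmax, ymin, ymax))
```

The `drop_vars('band')` step matters because xarray-spatial's classify functions expect a 2-D aggregate; a leftover `band` coordinate is harmless but noisy in plots.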
@@ -157,7 +60,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "img_arr.plot.imshow(figsize=(15, 10));"
+    "raster.plot.imshow(figsize=(15, 10));"
    ]
   },
   {
@@ -195,7 +98,7 @@
    "import matplotlib.pyplot as plt\n",
    "from xrspatial.classify import natural_breaks\n",
    "\n",
-    "natural_breaks_agg = natural_breaks(img_arr, num_sample=20000, k=15)\n",
+    "natural_breaks_agg = natural_breaks(raster, num_sample=20000, k=15)\n",
    "\n",
    "shade(natural_breaks_agg, cmap=plt.get_cmap(\"terrain\"), how=\"linear\")"
    ]
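Only the input name changes in this hunk. As background, natural (Jenks-style) breaks choose class edges that minimize within-class variance; the following is a brute-force 1-D sketch for k=2 classes, not xarray-spatial's actual implementation:

```python
import numpy as np

def natural_breaks_k2(values):
    """Brute-force Jenks-style split of values into 2 classes,
    minimizing the total within-class sum of squared deviations."""
    v = np.sort(np.asarray(values, dtype=float))
    best_split, best_cost = 1, np.inf
    for i in range(1, len(v)):
        left, right = v[:i], v[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_split, best_cost = i, cost
    return v[best_split - 1]  # upper edge of the lower class

# Two well-separated clusters: the break falls between them.
edge = natural_breaks_k2([1, 2, 3, 100, 101, 102])
print(edge)  # -> 3.0
```

The real function samples the data (`num_sample=20000` above) because exact Jenks optimization is expensive on full-resolution rasters.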
@@ -233,7 +136,7 @@
    "source": [
    "from xrspatial.classify import equal_interval\n",
    "\n",
-    "equal_interval_agg = equal_interval(img_arr, k=15)\n",
+    "equal_interval_agg = equal_interval(raster, k=15)\n",
    "\n",
    "shade(equal_interval_agg, cmap=plt.get_cmap(\"terrain\"), how=\"linear\")"
    ]
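`equal_interval` splits the data range into k equal-width bins and labels each pixel with its bin index. A numpy sketch of that binning logic (illustrative only, not xarray-spatial's code):

```python
import numpy as np

def equal_interval_sketch(values, k):
    """Label each value with the index of its equal-width bin."""
    values = np.asarray(values, dtype=float)
    # k classes need k - 1 interior edges between min and max.
    edges = np.linspace(values.min(), values.max(), k + 1)[1:-1]
    return np.digitize(values, edges)

labels = equal_interval_sketch([0, 1, 5, 9, 10], k=2)
print(labels)  # -> [0 0 1 1 1]
```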
@@ -263,7 +166,7 @@
    "source": [
    "from xrspatial.classify import quantile\n",
    "\n",
-    "quantile_agg = quantile(img_arr, k=15)\n",
+    "quantile_agg = quantile(raster, k=15)\n",
    "\n",
    "shade(quantile_agg, cmap=plt.get_cmap(\"terrain\"), how=\"linear\")"
    ]
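Unlike equal intervals, `quantile` places roughly the same number of pixels in each class by using quantiles of the data as bin edges. A numpy sketch of that idea (again illustrative, not the library's implementation):

```python
import numpy as np

def quantile_sketch(values, k):
    """Label each value by quantile-based class, ~equal counts per class."""
    values = np.asarray(values, dtype=float)
    # Interior quantile edges: k classes need k - 1 cut points.
    edges = np.quantile(values, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(values, edges)

labels = quantile_sketch([1, 2, 3, 4, 5, 6, 7, 8], k=4)
print(labels)  # -> [0 0 1 1 2 2 3 3], two values per class
```

On skewed elevation data this evens out the color distribution, at the cost of class widths that no longer correspond to fixed elevation ranges.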
@@ -297,60 +200,32 @@
    "new_vals = [val if val > 2500 else 0 for val in bins]\n",
    "\n",
    "reclass_agg = reclassify(\n",
-    "    agg=img_arr,\n",
+    "    agg=raster,\n",
    "    bins=bins,\n",
    "    new_values=new_vals,\n",
    ")\n",
    "\n",
    "shade(reclass_agg, cmap=plt.get_cmap(\"terrain\"), how=\"linear\")"
    ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "bd7822fe",
-   "metadata": {},
-   "source": [
-    "## Next steps: classify different datasets"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "cea4d59c",
-   "metadata": {},
-   "source": [
-    "The [Microsoft Planetary Computer Data Catalog](https://planetarycomputer.microsoft.com/catalog) includes petabytes of environmental monitoring data. All data sets are available in consistent, analysis-ready formats. You can access them through APIs as well as directly via [Azure Storage](https://docs.microsoft.com/en-us/azure/storage/). \n",
-    "\n",
-    "Try using [xarray-spatial's](https://xarray-spatial.readthedocs.io/index.html) classification methods with these datasets:"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5bb5b5a0",
-   "metadata": {},
-   "source": [
-    "<div style=\"width: 100%; overflow: hidden;\">\n",
-    "    <div style=\"width: 50%; float: left;\"> \n",
-    "    \n",
-    "    <center><img src=\"https://ai4edatasetspublicassets.blob.core.windows.net/assets/pc_thumbnails/additional_datasets/RE3CbUs.jpg\" /></center>\n",
-    "<br>\n",
-    "<center><font size=\"5\">Daymet</font>\n",
-    "<center><font size=\"2\">Gridded temperature data across North America</font>\n",
-    "<center><a href=\"http://aka.ms/ai4edata-daymet\" target=\"_blank\">Get Daymet temperature data</a>\n",
-    "    </div>\n",
-    "    <div style=\"margin-left: 50%;\"> \n",
-    "    <center><img src=\"https://ai4edatasetspublicassets.blob.core.windows.net/assets/pc_thumbnails/additional_datasets/gbif.jpg\" /><center>\n",
-    "<br>\n",
-    "<center><font size=\"5\">GBIF</font>\n",
-    "<center><font size=\"2\">Species occurrences shared through the Global Biodiversity Information Facility</font>\n",
-    "<center><a href=\"http://aka.ms/ai4edata-gbif/\" target=\"_blank\">Get GBIF occurrence data</a>\n",
-    "    </div>\n",
-    "</div>"
-   ]
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.14.2"
   }
  },
  "nbformat": 4,
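The reclassify cell in the diff maps each input bin to a new value, zeroing everything at or below 2500. xarray-spatial's `reclassify` operates on DataArrays; the numpy sketch below shows just the bin-remapping logic under the assumption that each value takes the new value of the first bin edge at or above it:

```python
import numpy as np

def reclassify_sketch(values, bins, new_values):
    """Map each value to new_values[i] for the first bin edge >= the value."""
    values = np.asarray(values, dtype=float)
    # right=True: index of the first bin whose upper edge is >= the value.
    idx = np.digitize(values, bins, right=True)
    idx = np.clip(idx, 0, len(new_values) - 1)
    return np.asarray(new_values, dtype=float)[idx]

bins = [1000, 2500, 4000]
new_vals = [val if val > 2500 else 0 for val in bins]  # -> [0, 0, 4000]
out = reclassify_sketch([500, 2000, 3000], bins, new_vals)
print(out)  # -> [0. 0. 4000.]
```

This mirrors the notebook's pattern of building `new_vals` from `bins` with a comprehension, so only classes above the 2500 threshold keep a nonzero value.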
