From 1783f2c88dfb8b30d028d28016a15c2c76b404d3 Mon Sep 17 00:00:00 2001
From: thodson-usgs <thodson@usgs.gov>
Date: Wed, 27 May 2026 08:08:34 -0500
Subject: [PATCH 1/2] docs: add Python ports of the new USGS Water Data API
 vignettes

Port five new R dataRetrieval Water Data API vignettes to the Python
`waterdata` module as executable demo notebooks, wired into the Sphinx
docs under a new "USGS Water Data API vignettes" section:

- USGS_WaterData_Introduction_Examples  (read_waterdata_functions.Rmd)
- USGS_WaterData_DiscreteSamples_Examples  (samples_data.Rmd)
- USGS_WaterData_DailyStatistics_Examples  (daily_data_statistics.Rmd)
- USGS_WaterData_ContinuousData_Examples  (continuous_pr.Rmd)
- USGS_WaterData_ReferenceLists_Examples  (Reference_Lists.Rmd)

Each notebook was executed end-to-end against the live USGS Water Data
API during development; outputs are cleared per the repo convention
(the Sphinx docs build re-executes notebooks at build time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 ...GS_WaterData_ContinuousData_Examples.ipynb | 257 +++++++
 ...S_WaterData_DailyStatistics_Examples.ipynb | 437 ++++++++++++
 ...S_WaterData_DiscreteSamples_Examples.ipynb | 541 ++++++++++++++
 ...USGS_WaterData_Introduction_Examples.ipynb | 660 ++++++++++++++++++
 ...GS_WaterData_ReferenceLists_Examples.ipynb | 138 ++++
 ...S_WaterData_ContinuousData_Examples.nblink |   3 +
 ..._WaterData_DailyStatistics_Examples.nblink |   3 +
 ..._WaterData_DiscreteSamples_Examples.nblink |   3 +
 ...SGS_WaterData_Introduction_Examples.nblink |   3 +
 ...S_WaterData_ReferenceLists_Examples.nblink |   3 +
 docs/source/examples/index.rst                |  17 +
 11 files changed, 2065 insertions(+)
 create mode 100644 demos/USGS_WaterData_ContinuousData_Examples.ipynb
 create mode 100644 demos/USGS_WaterData_DailyStatistics_Examples.ipynb
 create mode 100644 demos/USGS_WaterData_DiscreteSamples_Examples.ipynb
 create mode 100644 demos/USGS_WaterData_Introduction_Examples.ipynb
 create mode 100644 demos/USGS_WaterData_ReferenceLists_Examples.ipynb
 create mode 100644 docs/source/examples/USGS_WaterData_ContinuousData_Examples.nblink
 create mode 100644 docs/source/examples/USGS_WaterData_DailyStatistics_Examples.nblink
 create mode 100644 docs/source/examples/USGS_WaterData_DiscreteSamples_Examples.nblink
 create mode 100644 docs/source/examples/USGS_WaterData_Introduction_Examples.nblink
 create mode 100644 docs/source/examples/USGS_WaterData_ReferenceLists_Examples.nblink

diff --git a/demos/USGS_WaterData_ContinuousData_Examples.ipynb b/demos/USGS_WaterData_ContinuousData_Examples.ipynb
new file mode 100644
index 00000000..735e5439
--- /dev/null
+++ b/demos/USGS_WaterData_ContinuousData_Examples.ipynb
@@ -0,0 +1,257 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "d664492b",
+   "metadata": {},
+   "source": [
+    "# Continuous Data\n",
+    "\n",
+    "Continuous data are collected by automated sensors, typically at a fixed\n",
+    "15-minute interval (you may also hear them called \"instantaneous values\" or\n",
+    "\"IV\"). They are described by parameter name and parameter code, and retrieved\n",
+    "with `get_continuous`.\n",
+    "\n",
+    "This notebook covers what matters when a continuous pull gets large: how\n",
+    "`dataretrieval` **chunks big requests for you**, how it can **resume** a pull\n",
+    "that was interrupted partway through, and the one case you still handle\n",
+    "yourself: the service's 3-year limit on the time span of each request."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e7e06e81",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "from dataretrieval import waterdata\n",
+    "\n",
+    "site = \"USGS-0208458892\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b0136bd1",
+   "metadata": {},
+   "source": [
+    "## What continuous data are available?\n",
+    "\n",
+    "Filter the combined metadata to `data_type=\"Continuous values\"` to see which\n",
+    "time series a site offers and how far back each goes:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6f8a9d87",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "continuous_available, _ = waterdata.get_combined_metadata(\n",
+    "    monitoring_location_id=site,\n",
+    "    data_type=\"Continuous values\",\n",
+    ")\n",
+    "avail = continuous_available[[\"parameter_code\", \"parameter_name\", \"begin\", \"end\"]]\n",
+    "avail.sort_values(\"parameter_code\").reset_index(drop=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fdaa8150",
+   "metadata": {},
+   "source": [
+    "## Large requests are chunked for you\n",
+    "\n",
+    "A long list-valued argument (many monitoring locations, several parameter codes)\n",
+    "or a complex CQL filter can push a single request URL past the server's\n",
+    "~8 KB limit. `dataretrieval` handles this automatically: it splits the query into\n",
+    "URL-sized sub-requests, issues them, and recombines (and de-duplicates) the\n",
+    "results into one frame. **You never need to loop over sites yourself** — request\n",
+    "everything in one call.\n",
+    "\n",
+    "For example, asking for several parameter codes at once just returns one combined\n",
+    "long-format frame:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6bc05102",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "multi, _ = waterdata.get_continuous(\n",
+    "    monitoring_location_id=site,\n",
+    "    parameter_code=[\"00095\", \"00010\"],  # specific conductance + water temperature\n",
+    "    time=\"2024-07-01/2024-07-02\",\n",
+    ")\n",
+    "multi.groupby(\"parameter_code\")[\"value\"].agg([\"count\", \"min\", \"max\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "353ad4ec",
+   "metadata": {},
+   "source": [
+    "## Resilient pulls: resume after an interruption\n",
+    "\n",
+    "A large request becomes many sub-requests under the hood, so a long pull can be\n",
+    "interrupted partway through by a rate limit (HTTP 429) or a transient server\n",
+    "error (HTTP 5xx). Rather than discard the work already done, `dataretrieval`\n",
+    "raises a `ChunkInterrupted` that **preserves the completed sub-requests** and\n",
+    "lets you continue:\n",
+    "\n",
+    "- `QuotaExhausted` (429) and `ServiceInterrupted` (5xx) both subclass\n",
+    "  `ChunkInterrupted`.\n",
+    "- `exc.partial_frame` holds whatever completed before the failure.\n",
+    "- `exc.retry_after` is the server's suggested wait (when provided).\n",
+    "- `exc.call.resume()` re-issues **only the still-pending** sub-requests and\n",
+    "  returns the full `(data, metadata)`.\n",
+    "\n",
+    "The pattern below waits out the interruption and resumes until the pull\n",
+    "finishes. (In normal conditions the request completes on the first try and the\n",
+    "`except` block never runs.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e2e9ddff",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "from dataretrieval.waterdata.chunking import ChunkInterrupted\n",
+    "\n",
+    "try:\n",
+    "    sensor_data, _ = waterdata.get_continuous(\n",
+    "        monitoring_location_id=site,\n",
+    "        parameter_code=\"00095\",\n",
+    "        time=\"2024-07-01/2024-07-08\",\n",
+    "    )\n",
+    "except ChunkInterrupted as exc:\n",
+    "    print(\n",
+    "        f\"interrupted after {exc.completed_chunks}/{exc.total_chunks} chunks; resuming\"\n",
+    "    )\n",
+    "    while True:\n",
+    "        time.sleep(exc.retry_after or 5 * 60)  # honor Retry-After, else back off\n",
+    "        try:\n",
+    "            sensor_data, _ = exc.call.resume()\n",
+    "            break\n",
+    "        except ChunkInterrupted as again:\n",
+    "            exc = again\n",
+    "\n",
+    "print(f\"{len(sensor_data):,} rows\")\n",
+    "sensor_data[[\"time\", \"parameter_code\", \"value\", \"approval_status\"]].head()"
+   ]
+  },
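+  {
+   "cell_type": "markdown",
+   "id": "5a7c2e91",
+   "metadata": {},
+   "source": [
+    "For unattended jobs you may prefer to wrap that pattern in a function with a cap\n",
+    "on attempts, so a persistent outage raises instead of looping forever. The helper\n",
+    "below is a hypothetical sketch, not part of `dataretrieval`; it assumes only the\n",
+    "`ChunkInterrupted` attributes described above (`retry_after`, `call.resume()`)\n",
+    "and takes the same keyword arguments as `get_continuous`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9d3f6b04",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Hypothetical helper (not part of dataretrieval): the resume pattern above,\n",
+    "# but it gives up after `max_attempts` resumes and re-raises the last\n",
+    "# ChunkInterrupted so the caller can inspect or log it.\n",
+    "def get_continuous_resumable(max_attempts=5, default_wait=5 * 60, **kwargs):\n",
+    "    try:\n",
+    "        return waterdata.get_continuous(**kwargs)\n",
+    "    except ChunkInterrupted as first:\n",
+    "        exc = first\n",
+    "    for _ in range(max_attempts):\n",
+    "        time.sleep(exc.retry_after or default_wait)  # honor Retry-After, else back off\n",
+    "        try:\n",
+    "            return exc.call.resume()\n",
+    "        except ChunkInterrupted as again:\n",
+    "            exc = again\n",
+    "    raise exc\n",
+    "\n",
+    "\n",
+    "# Usage (same arguments as get_continuous), e.g.:\n",
+    "# data, meta = get_continuous_resumable(\n",
+    "#     monitoring_location_id=site, parameter_code=\"00095\", time=\"2024-07-01/2024-07-08\"\n",
+    "# )"
+   ]
+  },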
+  {
+   "cell_type": "markdown",
+   "id": "397e87b5",
+   "metadata": {},
+   "source": [
+    "## The 3-year window: the one axis you split yourself\n",
+    "\n",
+    "There is one limit the library does **not** chunk for you: the continuous service\n",
+    "returns at most **3 years of data per request**, and a time window is not a\n",
+    "list-shaped axis it can fan out. (With no `time` argument the service returns the\n",
+    "latest year; continuous data also has no geometry column and ignores bounding-box\n",
+    "queries.)\n",
+    "\n",
+    "So a multi-year, single-site pull is the one place you still split by time. The\n",
+    "service is most efficient one calendar year at a time, so build a list of yearly\n",
+    "windows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bd26d199",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Split [start, end] into per-calendar-year (start, end) date strings.\n",
+    "def year_chunks(start, end):\n",
+    "    start, end = pd.Timestamp(start), pd.Timestamp(end)\n",
+    "    edges = pd.to_datetime([f\"{y}-01-01\" for y in range(start.year + 1, end.year + 1)])\n",
+    "    starts = [start, *edges]\n",
+    "    ends = [*(edges - pd.Timedelta(days=1)), end]\n",
+    "    return [\n",
+    "        (s.strftime(\"%Y-%m-%d\"), e.strftime(\"%Y-%m-%d\")) for s, e in zip(starts, ends)\n",
+    "    ]\n",
+    "\n",
+    "\n",
+    "# Covering a full multi-year record (no data downloaded here):\n",
+    "pd.DataFrame(year_chunks(\"2012-10-01\", \"2025-09-30\"), columns=[\"start\", \"end\"])"
+   ]
+  },
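+  {
+   "cell_type": "markdown",
+   "id": "c81e4a2f",
+   "metadata": {},
+   "source": [
+    "Before downloading anything, it can be worth checking that the windows tile the\n",
+    "full span with no gaps or overlaps, and that none crosses a calendar-year boundary\n",
+    "(so each stays well under the 3-year limit). A quick sanity check using the\n",
+    "`year_chunks` helper defined above:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2b6d9e73",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Parse the windows back into Timestamps so we can compare them.\n",
+    "windows = [\n",
+    "    (pd.Timestamp(s), pd.Timestamp(e)) for s, e in year_chunks(\"2012-10-01\", \"2025-09-30\")\n",
+    "]\n",
+    "\n",
+    "# The windows start and end exactly on the requested dates...\n",
+    "assert windows[0][0] == pd.Timestamp(\"2012-10-01\")\n",
+    "assert windows[-1][1] == pd.Timestamp(\"2025-09-30\")\n",
+    "# ...each begins the day after the previous one ends (no gaps or overlaps)...\n",
+    "for (_, prev_end), (next_start, _) in zip(windows, windows[1:]):\n",
+    "    assert next_start - prev_end == pd.Timedelta(days=1)\n",
+    "# ...and none spans more than one calendar year.\n",
+    "assert all(s.year == e.year for s, e in windows)\n",
+    "print(f\"{len(windows)} contiguous windows, each within a single calendar year\")"
+   ]
+  },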
+  {
+   "cell_type": "markdown",
+   "id": "3bc4f40f",
+   "metadata": {},
+   "source": [
+    "Then request each window and concatenate. (We use a short two-window span here so\n",
+    "the notebook runs quickly; widen the dates for a full period of record.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "01ebb4a0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "chunks = year_chunks(\"2023-10-01\", \"2024-03-31\")\n",
+    "\n",
+    "frames = []\n",
+    "for start, end in chunks:\n",
+    "    part, _ = waterdata.get_continuous(\n",
+    "        monitoring_location_id=site,\n",
+    "        parameter_code=\"00095\",\n",
+    "        time=f\"{start}/{end}\",\n",
+    "    )\n",
+    "    frames.append(part)\n",
+    "\n",
+    "por = pd.concat(frames, ignore_index=True)\n",
+    "print(\n",
+    "    f\"{len(por):,} rows from {len(chunks)} windows, \"\n",
+    "    f\"{por['time'].min()} -> {por['time'].max()}\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2487bf4",
+   "metadata": {},
+   "source": [
+    "Wrap each window's call in the resume pattern above for an unattended,\n",
+    "restart-safe pull. USGS also expects to offer a direct full-period-of-record\n",
+    "download before the legacy NWIS services are decommissioned, which may make\n",
+    "time-window splitting unnecessary — check the documentation for updates.\n",
+    "\n",
+    "## More help\n",
+    "\n",
+    "- Documentation: <https://doi-usgs.github.io/dataretrieval-python/>\n",
+    "- Chunking and resume internals: `dataretrieval.waterdata.chunking`\n",
+    "- Issues / questions: <https://github.com/DOI-USGS/dataretrieval-python/issues>\n",
+    "- Equivalent R article: [Continuous Data](https://doi-usgs.github.io/dataRetrieval/articles/continuous_pr.html)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/demos/USGS_WaterData_DailyStatistics_Examples.ipynb b/demos/USGS_WaterData_DailyStatistics_Examples.ipynb
new file mode 100644
index 00000000..f35f52c9
--- /dev/null
+++ b/demos/USGS_WaterData_DailyStatistics_Examples.ipynb
@@ -0,0 +1,437 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "fe73969b",
+   "metadata": {},
+   "source": [
+    "# Daily statistics: `get_stats_por` and `get_stats_date_range`\n",
+    "\n",
+    "`get_stats_por` and `get_stats_date_range` return pre-computed temporal\n",
+    "statistics from the [modernized statistics API](https://api.waterdata.usgs.gov/statistics/v0/docs),\n",
+    "which replaces the legacy NWIS statistics service. The two functions wrap\n",
+    "endpoints that look similar but answer different questions:\n",
+    "\n",
+    "| Function | API endpoint | Returns |\n",
+    "| --- | --- | --- |\n",
+    "| `get_stats_por` | `observationNormals` | day-of-year and month-of-year statistics across the period of record |\n",
+    "| `get_stats_date_range` | `observationIntervals` | monthly and annual statistics within a requested date range |\n",
+    "\n",
+    "A couple of usage notes:\n",
+    "\n",
+    "- Pass `computation_type=` to choose the statistic — `arithmetic_mean`,\n",
+    "  `median`, `minimum`, `maximum`, or `percentile`.\n",
+    "- There is no dedicated argument to return only day-of-year vs. month-of-year\n",
+    "  (or only calendar vs. water year), so filter the returned `time_of_year_type`\n",
+    "  / `interval_type` column in pandas, as shown below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d6ab1ce4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.dates as mdates\n",
+    "import matplotlib.pyplot as plt\n",
+    "import pandas as pd\n",
+    "\n",
+    "from dataretrieval import waterdata\n",
+    "\n",
+    "%matplotlib inline\n",
+    "\n",
+    "site1 = \"USGS-02037500\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf8868ae",
+   "metadata": {},
+   "source": [
+    "## Fetching day-of-year and month-of-year statistics\n",
+    "\n",
+    "Day-of-year and month-of-year statistics aggregate observations for the same\n",
+    "calendar day or month across many years to describe typical seasonal conditions\n",
+    "(all Januarys, or all January 1sts). Below we request day-of-year discharge\n",
+    "averages for January 1 and 2 — note `start_date`/`end_date` are in `MM-DD`\n",
+    "format:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f0ab13bb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "jan_por_mean, _ = waterdata.get_stats_por(\n",
+    "    monitoring_location_id=site1,\n",
+    "    parameter_code=\"00060\",\n",
+    "    computation_type=\"arithmetic_mean\",\n",
+    "    start_date=\"01-01\",\n",
+    "    end_date=\"01-02\",\n",
+    ")\n",
+    "jan_por_mean[[\"time_of_year\", \"time_of_year_type\", \"computation\", \"value\"]]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3dc2b04f",
+   "metadata": {},
+   "source": [
+    "The first two rows are the day-of-year averages. What's the third row? Its\n",
+    "`time_of_year_type` is `month_of_year` — it's the average across all *Januarys*.\n",
+    "This is a quirk of the statistics API: whenever the `start_date`–`end_date` range\n",
+    "overlaps the first day of a month (here `01-01`), you also get the month-of-year\n",
+    "summary.\n",
+    "\n",
+    "To return only one type, filter the `time_of_year_type` column — here,\n",
+    "month-of-year only:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4d561aba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "moy = jan_por_mean[jan_por_mean[\"time_of_year_type\"] == \"month_of_year\"]\n",
+    "moy[[\"time_of_year\", \"time_of_year_type\", \"computation\", \"value\"]]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43fe1eef",
+   "metadata": {},
+   "source": [
+    "### Percentile band plot\n",
+    "\n",
+    "Now an example that shows the power of the statistics API: we pull *all*\n",
+    "day-of-year discharge percentiles for the site. Computing these without the API\n",
+    "would mean downloading the entire daily period of record and computing\n",
+    "percentiles by hand.\n",
+    "\n",
+    "By default `get_stats_por` sets `expand_percentiles=True`, returning one row per\n",
+    "percentile with the value in `value` and the threshold in `percentile`\n",
+    "(minimum is reported as percentile 0, maximum as 100)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "18bd842c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "full_por_percentiles, _ = waterdata.get_stats_por(\n",
+    "    monitoring_location_id=site1,\n",
+    "    parameter_code=\"00060\",\n",
+    "    computation_type=[\"minimum\", \"maximum\", \"percentile\"],\n",
+    "    start_date=\"01-01\",\n",
+    "    end_date=\"12-31\",\n",
+    ")\n",
+    "# The January 1 day-of-year percentiles (used on the WDFN state pages):\n",
+    "jan1 = full_por_percentiles[\n",
+    "    (full_por_percentiles[\"time_of_year\"] == \"01-01\")\n",
+    "    & (full_por_percentiles[\"time_of_year_type\"] == \"day_of_year\")\n",
+    "]\n",
+    "jan1.sort_values(\"percentile\")[[\"time_of_year\", \"computation\", \"percentile\", \"value\"]]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c8fc4a28",
+   "metadata": {},
+   "source": [
+    "Pivoting the day-of-year rows so each percentile is a column lets us draw the\n",
+    "percentile \"ribbons\": each band is bounded by a pair of percentiles (min–5th, 5th–10th, …, 95th–max):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aaa72823",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "doy = full_por_percentiles[\n",
+    "    full_por_percentiles[\"time_of_year_type\"] == \"day_of_year\"\n",
+    "].copy()\n",
+    "doy[\"value\"] = pd.to_numeric(doy[\"value\"], errors=\"coerce\")  # API returns strings\n",
+    "bands = doy.pivot_table(index=\"time_of_year\", columns=\"percentile\", values=\"value\")\n",
+    "bands.columns = [int(c) for c in bands.columns]\n",
+    "bands = bands.sort_index()  # \"MM-DD\" strings sort chronologically within a year\n",
+    "\n",
+    "# x positions: map MM-DD onto a reference (leap) year so 02-29 is included\n",
+    "x = pd.to_datetime(\"2024-\" + bands.index, format=\"%Y-%m-%d\")\n",
+    "\n",
+    "# (lo, hi) percentile range, fill color, legend label\n",
+    "band_defs = [\n",
+    "    ((95, 100), \"#292f6b\", \"95th Percentile - Max\"),\n",
+    "    ((90, 95), \"#5699c0\", \"90th - 95th Percentile\"),\n",
+    "    ((75, 90), \"#aacee0\", \"75th - 90th Percentile\"),\n",
+    "    ((25, 75), \"#e9e9e9\", \"25th - 75th Percentile\"),\n",
+    "    ((10, 25), \"#ebd6ab\", \"10th - 25th Percentile\"),\n",
+    "    ((5, 10), \"#dcb668\", \"5th - 10th Percentile\"),\n",
+    "    ((0, 5), \"#8f4f1f\", \"Min - 5th Percentile\"),\n",
+    "]\n",
+    "\n",
+    "fig, ax = plt.subplots(figsize=(9, 5))\n",
+    "for (lo, hi), color, label in band_defs:\n",
+    "    ax.fill_between(x, bands[lo], bands[hi], facecolor=color, alpha=0.7, label=label)\n",
+    "ax.set_yscale(\"log\")\n",
+    "ax.xaxis.set_major_locator(mdates.MonthLocator())\n",
+    "ax.xaxis.set_major_formatter(mdates.DateFormatter(\"%b\"))\n",
+    "ax.set_xlabel(\"Month\")\n",
+    "ax.set_ylabel(\"Discharge, cubic feet per second\")\n",
+    "ax.set_title(\"Day-of-year percentile bands (USGS-02037500)\")\n",
+    "ax.legend(title=\"Historical percentiles\", fontsize=7, loc=\"upper right\")\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b4075bd",
+   "metadata": {},
+   "source": [
+    "Finally, overlay the actual daily mean discharge so we can see where recent\n",
+    "conditions fall relative to the historical bands — exactly the view on the\n",
+    "[Water Data for the Nation (WDFN) statistical graphs](https://waterdata.usgs.gov/monitoring-location/USGS-02037500/statistical-graphs/).\n",
+    "We pull two calendar years (2024–2025) of daily means and join them to the bands by month-day."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "961eea3a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "daily, _ = waterdata.get_daily(\n",
+    "    monitoring_location_id=site1,\n",
+    "    parameter_code=\"00060\",\n",
+    "    statistic_id=\"00003\",\n",
+    "    time=[\"2024-01-01\", \"2025-12-31\"],\n",
+    ")\n",
+    "daily = daily.sort_values(\"time\").reset_index(drop=True)\n",
+    "daily[\"md\"] = daily[\"time\"].dt.strftime(\"%m-%d\")\n",
+    "\n",
+    "# Repeat the day-of-year bands across each actual calendar date\n",
+    "b = bands.reindex(daily[\"md\"]).reset_index(drop=True)\n",
+    "\n",
+    "fig, ax = plt.subplots(figsize=(9, 5))\n",
+    "for (lo, hi), color, label in band_defs:\n",
+    "    ax.fill_between(\n",
+    "        daily[\"time\"], b[lo], b[hi], facecolor=color, alpha=0.7, label=label\n",
+    "    )\n",
+    "ax.plot(daily[\"time\"], daily[\"value\"], color=\"black\", lw=0.9, label=\"Daily mean\")\n",
+    "prov = daily[daily[\"approval_status\"] == \"Provisional\"]\n",
+    "ax.scatter(prov[\"time\"], prov[\"value\"], color=\"red\", s=5, zorder=3, label=\"Provisional\")\n",
+    "ax.set_yscale(\"log\")\n",
+    "ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))\n",
+    "ax.xaxis.set_major_formatter(mdates.DateFormatter(\"%b %Y\"))\n",
+    "ax.set_ylabel(\"Discharge, cubic feet per second\")\n",
+    "ax.set_title(\"Daily mean discharge vs. historical percentile bands\")\n",
+    "ax.legend(fontsize=7, ncol=2, loc=\"upper right\")\n",
+    "fig.autofmt_xdate()\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e31f1726",
+   "metadata": {},
+   "source": [
+    "## Fetching monthly and annual statistics within a date range\n",
+    "\n",
+    "Unlike the day-/month-of-year normals, `get_stats_date_range` summarizes specific\n",
+    "months and years inside a requested window. Here we ask for the average discharge\n",
+    "for January 2024 — note the `YYYY-MM-DD` date format:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0bc8cd83",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "jan_daterange_mean, _ = waterdata.get_stats_date_range(\n",
+    "    monitoring_location_id=site1,\n",
+    "    parameter_code=\"00060\",\n",
+    "    computation_type=\"arithmetic_mean\",\n",
+    "    start_date=\"2024-01-01\",\n",
+    "    end_date=\"2024-01-31\",\n",
+    ")\n",
+    "jan_daterange_mean[[\"start_date\", \"end_date\", \"interval_type\", \"value\"]]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7d915aed",
+   "metadata": {},
+   "source": [
+    "Instead of `time_of_year`, the output has `start_date`, `end_date`, and\n",
+    "`interval_type`. The first row is the monthly average; the API also returns the\n",
+    "**calendar year** and **water year** averages for any year intersecting the\n",
+    "range. A 93-day window can therefore touch two calendar and two water years:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cfe28029",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "multiyear, _ = waterdata.get_stats_date_range(\n",
+    "    monitoring_location_id=site1,\n",
+    "    parameter_code=\"00060\",\n",
+    "    computation_type=\"arithmetic_mean\",\n",
+    "    start_date=\"2023-09-30\",\n",
+    "    end_date=\"2024-01-01\",\n",
+    ")\n",
+    "multiyear[[\"start_date\", \"end_date\", \"interval_type\", \"value\"]]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9c30978f",
+   "metadata": {},
+   "source": [
+    "Filter the `interval_type` column (values `month`, `calendar_year`,\n",
+    "`water_year`) to keep only certain intervals — here, the annual rows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7ff90e81",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "multiyear[multiyear[\"interval_type\"].isin([\"calendar_year\", \"water_year\"])][\n",
+    "    [\"start_date\", \"end_date\", \"interval_type\", \"value\"]\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "061b9cbe",
+   "metadata": {},
+   "source": [
+    "### Monthly mean table\n",
+    "\n",
+    "We can reproduce something like a Water Year Summary monthly-mean table. We pull\n",
+    "the full period of record (no dates), keep the monthly intervals from October 2004\n",
+    "through August 2025, and aggregate by calendar month in water-year order. (Values\n",
+    "may differ slightly from the official summaries due to rounding.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1c705056",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "monthly_raw, _ = waterdata.get_stats_date_range(\n",
+    "    monitoring_location_id=site1,\n",
+    "    parameter_code=\"00060\",\n",
+    "    computation_type=\"arithmetic_mean\",\n",
+    ")\n",
+    "m = monthly_raw[monthly_raw[\"interval_type\"] == \"month\"].copy()\n",
+    "m[\"start_date\"] = pd.to_datetime(m[\"start_date\"])\n",
+    "m[\"value\"] = pd.to_numeric(m[\"value\"], errors=\"coerce\")\n",
+    "m = m[(m[\"start_date\"] >= \"2004-10-01\") & (m[\"start_date\"] < \"2025-09-01\")]\n",
+    "m = m.dropna(subset=[\"value\"])\n",
+    "m[\"month\"] = m[\"start_date\"].dt.strftime(\"%b\")\n",
+    "m[\"water_year\"] = (m[\"start_date\"] + pd.DateOffset(months=3)).dt.year\n",
+    "\n",
+    "\n",
+    "def summarize(g):\n",
+    "    hi = g.loc[g[\"value\"].idxmax()]\n",
+    "    lo = g.loc[g[\"value\"].idxmin()]\n",
+    "    return pd.Series(\n",
+    "        {\n",
+    "            \"Mean\": round(g[\"value\"].mean()),\n",
+    "            \"Max (WY)\": f\"{round(hi['value'])} ({int(hi['water_year'])})\",\n",
+    "            \"Min (WY)\": f\"{round(lo['value'])} ({int(lo['water_year'])})\",\n",
+    "        }\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "wy_order = [\n",
+    "    \"Oct\",\n",
+    "    \"Nov\",\n",
+    "    \"Dec\",\n",
+    "    \"Jan\",\n",
+    "    \"Feb\",\n",
+    "    \"Mar\",\n",
+    "    \"Apr\",\n",
+    "    \"May\",\n",
+    "    \"Jun\",\n",
+    "    \"Jul\",\n",
+    "    \"Aug\",\n",
+    "    \"Sep\",\n",
+    "]\n",
+    "table = m.groupby(\"month\")[[\"value\", \"water_year\"]].apply(summarize).reindex(wy_order)\n",
+    "table.T"
+   ]
+  },
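+  {
+   "cell_type": "markdown",
+   "id": "f4a0c3d8",
+   "metadata": {},
+   "source": [
+    "The `water_year` column above relies on a small date trick: a USGS water year runs\n",
+    "from October 1 through September 30 and is named for the calendar year in which it\n",
+    "ends, so shifting a date forward three months lands it in its water year's\n",
+    "calendar year. A quick check on a few boundary months:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e15b9a6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dates = pd.Series(pd.to_datetime([\"2023-09-01\", \"2023-10-01\", \"2024-09-01\"]))\n",
+    "# Sep 2023 falls in WY2023; Oct 2023 starts WY2024; Sep 2024 ends WY2024\n",
+    "pd.DataFrame(\n",
+    "    {\"month_start\": dates, \"water_year\": (dates + pd.DateOffset(months=3)).dt.year}\n",
+    ")"
+   ]
+  },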
+  {
+   "cell_type": "markdown",
+   "id": "31cd8b14",
+   "metadata": {},
+   "source": [
+    "## Statistics API tips\n",
+    "\n",
+    "The statistics API does **not** follow the OGC standards used by the\n",
+    "`api.waterdata.usgs.gov/ogcapi/v0/` endpoints. A few things to keep in mind:\n",
+    "\n",
+    "- **Higher rate limits.** At the time of writing the statistics API allows ~4000\n",
+    "  requests/hour per IP (per token if a token is supplied).\n",
+    "- **All columns, always.** There is no `skip_geometry` or `properties` argument —\n",
+    "  the API returns the full column set.\n",
+    "- **Month-of-year normals.** To get month-of-year statistics from\n",
+    "  `get_stats_por`, make the `start_date`–`end_date` range overlap the first of\n",
+    "  the month (e.g. `01-01`–`03-01` returns the January, February, and March\n",
+    "  month-of-year stats in addition to each day-of-year).\n",
+    "- **Monthly/annual intervals.** `get_stats_date_range` returns a summary for\n",
+    "  every calendar month, calendar year, and water year that intersects the range.\n",
+    "- **Median = the 50th percentile.** Requesting both `median` and `percentile`\n",
+    "  duplicates the median; you rarely need both.\n",
+    "- **Min/max are not percentiles.** Use\n",
+    "  `computation_type=[\"minimum\", \"maximum\", \"percentile\"]` for a complete set of\n",
+    "  order statistics (as we did for the band plot).\n",
+    "- **Fixed percentiles.** `percentile` only ever returns the 5th, 10th, 25th,\n",
+    "  50th, 75th, 90th, and 95th. For other percentiles, pull the daily record with\n",
+    "  `get_daily` and compute them yourself.\n",
+    "- **Watch `sample_count`.** It's the number of observations behind a statistic;\n",
+    "  there is no minimum, so a monthly/annual value can rest on a single daily\n",
+    "  observation.\n",
+    "\n",
+    "## More help\n",
+    "\n",
+    "- Documentation: <https://doi-usgs.github.io/dataretrieval-python/>\n",
+    "- Statistics documentation: <https://waterdata.usgs.gov/statistics-documentation/>\n",
+    "- Equivalent R article: [daily statistics](https://doi-usgs.github.io/dataRetrieval/articles/daily_data_statistics.html)\n",
+    "- Issues / questions: <https://github.com/DOI-USGS/dataretrieval-python/issues>"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb b/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb
new file mode 100644
index 00000000..a58e6f56
--- /dev/null
+++ b/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb
@@ -0,0 +1,541 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "438bbb08",
+   "metadata": {},
+   "source": [
+    "# Discrete water-quality samples: `get_samples`\n",
+    "\n",
+    "As USGS retires the legacy NWIS discrete water-quality services, the new\n",
+    "*Water Data for the Nation* samples service takes their place. In Python it is\n",
+    "exposed through three functions in `dataretrieval.waterdata`:\n",
+    "\n",
+    "- `get_samples` — retrieve discrete water-quality results (or, with `service=`,\n",
+    "  the matching locations, activities, projects, or organizations).\n",
+    "- `get_samples_summary` — summarize what data a single site has.\n",
+    "- `get_codes` — list the allowable values for the categorical query arguments.\n",
+    "\n",
+    "We'll cover retrieving data from a known site, using geographic filters, and\n",
+    "discovering what data are available. The interactive web UI is at\n",
+    "<https://waterdata.usgs.gov/download-samples/> and the API docs are at\n",
+    "<https://api.waterdata.usgs.gov/samples-data/docs>.\n",
+    "\n",
+    "> Column names: unlike the OGC `get_daily` / `get_monitoring_locations`\n",
+    "> functions, the samples service uses WQX3-style names such as\n",
+    "> `Location_Latitude`, `Activity_StartDateTime`, and `Result_Measure`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "257b6197",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "import pandas as pd\n",
+    "\n",
+    "from dataretrieval import waterdata\n",
+    "from dataretrieval.waterdata import PROFILE_LOOKUP\n",
+    "\n",
+    "%matplotlib inline\n",
+    "plt.rcParams[\"figure.figsize\"] = (7, 4)\n",
+    "\n",
+    "\n",
+    "# Scatter plot of sample-site locations (a static map; use folium for an\n",
+    "# interactive version).\n",
+    "def map_sites(df, title=\"\"):\n",
+    "    lon = pd.to_numeric(df[\"Location_Longitude\"], errors=\"coerce\")\n",
+    "    lat = pd.to_numeric(df[\"Location_Latitude\"], errors=\"coerce\")\n",
+    "    _, ax = plt.subplots(figsize=(7, 5))\n",
+    "    ax.scatter(lon, lat, s=10, color=\"red\", alpha=0.7)\n",
+    "    ax.set_xlabel(\"Longitude\")\n",
+    "    ax.set_ylabel(\"Latitude\")\n",
+    "    ax.set_title(f\"{title} ({len(df)} sites)\")\n",
+    "    plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3c166c18",
+   "metadata": {},
+   "source": [
+    "## Retrieving data from a known site\n",
+    "\n",
+    "Given a USGS site, `get_samples_summary` reports what discrete-sample data are\n",
+    "available there — one row per (characteristic group, characteristic,\n",
+    "user-supplied characteristic) with result and activity counts."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "27e0d33a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "site = \"USGS-04183500\"\n",
+    "data_at_site, _ = waterdata.get_samples_summary(monitoringLocationIdentifier=site)\n",
+    "data_at_site.sort_values(\"resultCount\", ascending=False).head(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e388b2d5",
+   "metadata": {},
+   "source": [
+    "Note the `characteristicUserSupplied` column: asking for a bare characteristic\n",
+    "like *Phosphorus* would return both filtered and unfiltered values mixed\n",
+    "together. `characteristicUserSupplied` is a very specific descriptor (similar to\n",
+    "a long-form USGS parameter code) that lets you isolate exactly the constituent\n",
+    "you want. To pull the underlying data, use `get_samples`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "86bfc2b5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "user_char = \"Phosphorus as phosphorus, water, unfiltered\"\n",
+    "phos_data, _ = waterdata.get_samples(\n",
+    "    monitoringLocationIdentifier=site,\n",
+    "    characteristicUserSupplied=user_char,\n",
+    ")\n",
+    "print(f\"default ('fullphyschem') profile -> {phos_data.shape[1]} columns\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "593529c6",
+   "metadata": {},
+   "source": [
+    "The default profile (`fullphyschem`, the \"Full physical chemical\" profile) is\n",
+    "comprehensive, hence the very wide table. For plotting we usually only need a few\n",
+    "columns, so ask for the `narrow` profile instead:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "682226d1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "phos_narrow, _ = waterdata.get_samples(\n",
+    "    monitoringLocationIdentifier=site,\n",
+    "    characteristicUserSupplied=user_char,\n",
+    "    profile=\"narrow\",\n",
+    ")\n",
+    "print(f\"'narrow' profile -> {phos_narrow.shape[1]} columns\")\n",
+    "phos_narrow[[\"Activity_StartDateTime\", \"Result_Measure\", \"Result_MeasureUnit\"]].head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "697e0827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = pd.to_datetime(phos_narrow[\"Activity_StartDateTime\"], errors=\"coerce\")\n",
+    "y = pd.to_numeric(phos_narrow[\"Result_Measure\"], errors=\"coerce\")\n",
+    "fig, ax = plt.subplots(figsize=(7, 4))\n",
+    "ax.scatter(x, y, s=10)\n",
+    "ax.set_xlabel(\"Date\")\n",
+    "ax.set_ylabel(user_char, wrap=True)\n",
+    "ax.set_title(phos_narrow[\"Location_Name\"].iloc[0])\n",
+    "fig.autofmt_xdate()\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6573353a",
+   "metadata": {},
+   "source": [
+    "## Return data types\n",
+    "\n",
+    "Two arguments control what comes back: `service` defines the *kind* of data and\n",
+    "`profile` defines which columns of that kind are returned. The valid combinations\n",
+    "are published in `PROFILE_LOOKUP`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "49ceacca",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "PROFILE_LOOKUP"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "380fb6d1",
+   "metadata": {},
+   "source": [
+    "## Geographic filters\n",
+    "\n",
+    "Often you don't know a site number but you do have an area of interest. Below we\n",
+    "keep the queries lightweight by setting `service=\"locations\"` and\n",
+    "`profile=\"site\"` (so we get *where* data exists, not the result values\n",
+    "themselves) and filter on our phosphorus characteristic.\n",
+    "\n",
+    "### Bounding box\n",
+    "\n",
+    "A bounding box is `[west, south, east, north]` (min lon, min lat, max lon, max lat):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d2d582ff",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "bbox = [-90.8, 44.2, -89.9, 45.0]\n",
+    "bbox_sites, _ = waterdata.get_samples(\n",
+    "    boundingBox=bbox,\n",
+    "    characteristicUserSupplied=user_char,\n",
+    "    service=\"locations\",\n",
+    "    profile=\"site\",\n",
+    ")\n",
+    "map_sites(bbox_sites, \"Phosphorus sites in bounding box\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c05a5786",
+   "metadata": {},
+   "source": [
+    "### Hydrologic unit codes (HUCs)\n",
+    "\n",
+    "HUCs identify drainage areas; this filter accepts 2-, 4-, 6-, 8-, 10-, or\n",
+    "12-digit codes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fbbf7898",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "huc_sites, _ = waterdata.get_samples(\n",
+    "    hydrologicUnit=\"070700\",\n",
+    "    characteristicUserSupplied=user_char,\n",
+    "    service=\"locations\",\n",
+    "    profile=\"site\",\n",
+    ")\n",
+    "map_sites(huc_sites, \"Phosphorus sites in HUC 070700\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "151d88ba",
+   "metadata": {},
+   "source": [
+    "### Distance from a point\n",
+    "\n",
+    "Supply a latitude, longitude, and radius in miles:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9711e26c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "point_sites, _ = waterdata.get_samples(\n",
+    "    pointLocationLatitude=43.074680,\n",
+    "    pointLocationLongitude=-89.428054,\n",
+    "    pointLocationWithinMiles=20,\n",
+    "    characteristicUserSupplied=user_char,\n",
+    "    service=\"locations\",\n",
+    "    profile=\"site\",\n",
+    ")\n",
+    "map_sites(point_sites, \"Phosphorus sites within 20 mi of Madison, WI\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ec22beac",
+   "metadata": {},
+   "source": [
+    "### County FIPS\n",
+    "\n",
+    "County FIPS codes take the form `US:SS:CCC` (two-digit state, three-digit county).\n",
+    "Wisconsin's state code is available from `dataretrieval.codes`; Dane County is `US:55:025`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f07b210b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dataretrieval.codes import states\n",
+    "\n",
+    "wi = states.fips_codes[\"Wisconsin\"]  # \"55\"\n",
+    "dane_county = f\"US:{wi}:025\"\n",
+    "county_sites, _ = waterdata.get_samples(\n",
+    "    countyFips=dane_county,\n",
+    "    characteristicUserSupplied=user_char,\n",
+    "    service=\"locations\",\n",
+    "    profile=\"site\",\n",
+    ")\n",
+    "map_sites(county_sites, \"Phosphorus sites in Dane County, WI\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8e43b993",
+   "metadata": {},
+   "source": [
+    "### State FIPS\n",
+    "\n",
+    "State FIPS codes take the form `US:SS`. A whole-state query can return a lot of\n",
+    "sites, so here we also constrain the activity start date to October–November 2024\n",
+    "(see *Additional query parameters* below):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "83519737",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "state_fip = f\"US:{wi}\"  # \"US:55\"\n",
+    "state_sites_recent, _ = waterdata.get_samples(\n",
+    "    stateFips=state_fip,\n",
+    "    characteristicUserSupplied=user_char,\n",
+    "    service=\"locations\",\n",
+    "    activityStartDateLower=\"2024-10-01\",\n",
+    "    activityStartDateUpper=\"2024-11-30\",\n",
+    "    profile=\"site\",\n",
+    ")\n",
+    "map_sites(state_sites_recent, \"WI phosphorus sites, Oct-Nov 2024\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0aab190b",
+   "metadata": {},
+   "source": [
+    "## Additional query parameters\n",
+    "\n",
+    "Several parameters narrow the results further. The allowable values for the\n",
+    "categorical ones come from `get_codes`. Note that `get_codes` returns a plain\n",
+    "`DataFrame` (no metadata tuple).\n",
+    "\n",
+    "### `siteTypeCode` / `siteTypeName`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f21e23e7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "site_type_info = waterdata.get_codes(code_service=\"sitetype\")\n",
+    "site_type_info[[\"typeCode\", \"typeLongName\"]].head(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fcdf0025",
+   "metadata": {},
+   "source": [
+    "### `activityMediaName`\n",
+    "\n",
+    "The environmental medium that was sampled or analyzed:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "64369260",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "waterdata.get_codes(code_service=\"samplemedia\")[\"activityMedia\"].tolist()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "647de77a",
+   "metadata": {},
+   "source": [
+    "### `characteristicGroup`\n",
+    "\n",
+    "A broad category describing the measurement (generally following the Water\n",
+    "Quality Portal groups):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d1b139a9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "waterdata.get_codes(code_service=\"characteristicgroup\")[\"characteristicGroup\"].tolist()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cfa4bfbf",
+   "metadata": {},
+   "source": [
+    "### `characteristic` and `usgsPCode`\n",
+    "\n",
+    "The `characteristics` table lists specific constituents along with their USGS\n",
+    "parameter codes:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "72c32873",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "characteristic_info = waterdata.get_codes(code_service=\"characteristics\")\n",
+    "print(\"unique characteristic names:\")\n",
+    "print(characteristic_info[\"characteristicName\"].drop_duplicates().head().tolist())\n",
+    "print(\"\\nexample USGS parameter codes:\")\n",
+    "print(characteristic_info[\"parameterCode\"].dropna().drop_duplicates().head().tolist())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c0872a69",
+   "metadata": {},
+   "source": [
+    "### `characteristicUserSupplied`\n",
+    "\n",
+    "The USGS \"observed property\" — the detailed descriptor that replaces the old\n",
+    "parameter name / pcode for discrete data, and the value we filtered on above:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "236c0f76",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "waterdata.get_codes(code_service=\"observedproperty\")[\"observedProperty\"].head().tolist()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3caa694d",
+   "metadata": {},
+   "source": [
+    "Other filters worth knowing about: `projectIdentifier` (you must already know the\n",
+    "project ID), `recordIdentifierUserSupplied` (you must know the supplier's record ID),\n",
+    "and `activityStartDateLower` / `activityStartDateUpper` for date ranges (used above)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6dfb0d2f",
+   "metadata": {},
+   "source": [
+    "## Data discovery\n",
+    "\n",
+    "Combining a geographic filter with site-type and characteristic filters lets you\n",
+    "zero in on candidate sites. For example, lakes in Dane County, WI that measured\n",
+    "our phosphorus characteristic:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8af3af88",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "county_lake_sites, _ = waterdata.get_samples(\n",
+    "    countyFips=dane_county,\n",
+    "    characteristicUserSupplied=user_char,\n",
+    "    siteTypeName=\"Lake, Reservoir, Impoundment\",\n",
+    "    service=\"locations\",\n",
+    "    profile=\"site\",\n",
+    ")\n",
+    "print(f\"{len(county_lake_sites)} lake sites measuring phosphorus in Dane County, WI\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "87f31bda",
+   "metadata": {},
+   "source": [
+    "`get_samples_summary` accepts one site at a time, so we loop over the candidate\n",
+    "sites to tally how much phosphorus data each has — useful for deciding which\n",
+    "sites to actually pull results from."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "421b6982",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rows = []\n",
+    "for loc_id in county_lake_sites[\"Location_Identifier\"]:\n",
+    "    avail, _ = waterdata.get_samples_summary(monitoringLocationIdentifier=loc_id)\n",
+    "    rows.append(avail[avail[\"characteristicUserSupplied\"] == user_char])\n",
+    "\n",
+    "all_data = pd.concat(rows, ignore_index=True)\n",
+    "all_data.sort_values(\"resultCount\", ascending=False)[\n",
+    "    [\n",
+    "        \"monitoringLocationIdentifier\",\n",
+    "        \"resultCount\",\n",
+    "        \"activityCount\",\n",
+    "        \"firstActivity\",\n",
+    "        \"mostRecentActivity\",\n",
+    "    ]\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d2614654",
+   "metadata": {},
+   "source": [
+    "This summary helps narrow down which sites to request data from — whether you\n",
+    "need sites with recent data, lots of data, or just any measurement at all.\n",
+    "\n",
+    "## More help\n",
+    "\n",
+    "- Documentation: <https://doi-usgs.github.io/dataretrieval-python/>\n",
+    "- Samples API docs: <https://api.waterdata.usgs.gov/samples-data/docs>\n",
+    "- Equivalent R article: [Introducing read_waterdata_samples](https://doi-usgs.github.io/dataRetrieval/articles/samples_data.html)\n",
+    "- Issues / questions: <https://github.com/DOI-USGS/dataretrieval-python/issues>"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/demos/USGS_WaterData_Introduction_Examples.ipynb b/demos/USGS_WaterData_Introduction_Examples.ipynb
new file mode 100644
index 00000000..3ca30420
--- /dev/null
+++ b/demos/USGS_WaterData_Introduction_Examples.ipynb
@@ -0,0 +1,660 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a8818c89",
+   "metadata": {},
+   "source": [
+    "# Introduction to the USGS Water Data APIs\n",
+    "\n",
+    "The [USGS Water Data APIs](https://api.waterdata.usgs.gov/ogcapi/v0/) are the\n",
+    "modern, OGC-based replacement for the legacy NWIS web services. In Python they are\n",
+    "exposed through the `dataretrieval.waterdata` module, which will gradually replace\n",
+    "the older `dataretrieval.nwis` functions.\n",
+    "\n",
+    "This notebook tours each new function. The NWIS shutdown timeline is still\n",
+    "uncertain, so we recommend migrating to the `waterdata` functions sooner rather\n",
+    "than later.\n",
+    "\n",
+    "If you are coming from the R `dataRetrieval` package, the functions map across as\n",
+    "follows:\n",
+    "\n",
+    "| R `dataRetrieval` | Python `dataretrieval.waterdata` |\n",
+    "| --- | --- |\n",
+    "| `read_waterdata_monitoring_location` | `get_monitoring_locations` |\n",
+    "| `read_waterdata_ts_meta` / `read_waterdata_combined_meta` | `get_time_series_metadata` / `get_combined_metadata` |\n",
+    "| `read_waterdata_parameter_codes` | `get_reference_table(collection=\"parameter-codes\")` |\n",
+    "| `read_waterdata_daily` | `get_daily` |\n",
+    "| `read_waterdata_continuous` | `get_continuous` |\n",
+    "| `read_waterdata_field_measurements` | `get_field_measurements` |\n",
+    "| `read_waterdata_channel` | `get_channel` |\n",
+    "| `read_waterdata_latest_continuous` / `read_waterdata_latest_daily` | `get_latest_continuous` / `get_latest_daily` |\n",
+    "| `read_waterdata` (CQL) | the `filter` / `filter_lang` arguments on any function |\n",
+    "| `read_waterdata_metadata` | `get_reference_table` |\n",
+    "| `read_waterdata_samples` | `get_samples` |\n",
+    "| `read_waterdata_stats_por` / `read_waterdata_stats_daterange` | `get_stats_por` / `get_stats_date_range` |"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "03b51493",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "import pandas as pd\n",
+    "\n",
+    "from dataretrieval import waterdata\n",
+    "\n",
+    "%matplotlib inline\n",
+    "plt.rcParams[\"figure.figsize\"] = (7, 4)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27cea444",
+   "metadata": {},
+   "source": [
+    "> **Return values.** Nearly every `dataretrieval.waterdata` function returns a\n",
+    "> `(data, metadata)` tuple (`get_codes`, which returns a plain `DataFrame`, is an\n",
+    "> exception). The first element is a `pandas.DataFrame` (or a `geopandas.GeoDataFrame`\n",
+    "> when the service returns a geometry column); the second is a small metadata object\n",
+    "> describing the request. Throughout this notebook we keep only the data, e.g. `df, _ = waterdata.get_...(...)`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e38880f",
+   "metadata": {},
+   "source": [
+    "## New features\n",
+    "\n",
+    "The new API endpoints each deliver a different type of USGS water data, and they\n",
+    "all share features the legacy services lacked.\n",
+    "\n",
+    "### Flexible queries\n",
+    "\n",
+    "The new functions expose **all** of the query parameters the API supports, each\n",
+    "defaulting to `None`. You do **not** need to (and usually should not) specify\n",
+    "them all. Separate arguments are combined with a Boolean *AND*, and values listed\n",
+    "within one argument with *OR*: a list of monitoring locations plus a list of\n",
+    "parameter codes returns records matching any listed site *and* any listed code.\n",
+    "Because every argument is named, your IDE can autocomplete the options.\n",
+    "\n",
+    "### Flexible columns returned\n",
+    "\n",
+    "Use the `properties` argument to choose which columns come back. The available\n",
+    "properties for a collection are listed at that collection's `queryables` endpoint,\n",
+    "e.g. <https://api.waterdata.usgs.gov/ogcapi/v0/collections/daily/queryables>."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d59a461b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Ask for just a few columns instead of the full ~40-column record.\n",
+    "site_info, _ = waterdata.get_monitoring_locations(\n",
+    "    monitoring_location_id=\"USGS-01491000\",\n",
+    "    properties=[\n",
+    "        \"monitoring_location_id\",\n",
+    "        \"site_type\",\n",
+    "        \"drainage_area\",\n",
+    "        \"monitoring_location_name\",\n",
+    "    ],\n",
+    ")\n",
+    "site_info.drop(columns=\"geometry\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54ace335",
+   "metadata": {},
+   "source": [
+    "### API tokens\n",
+    "\n",
+    "USGS now rate-limits requests per IP address per hour. If you hit the limit you\n",
+    "can request a free API token at <https://api.waterdata.usgs.gov/signup/>. Keep it\n",
+    "out of shared scripts and version control. (At the time of writing the Python\n",
+    "`dataretrieval` package does not yet wire a token into these calls; the rate\n",
+    "limits are generous for the queries below.)\n",
+    "\n",
+    "### Common Query Language (CQL2)\n",
+    "\n",
+    "The APIs accept [CQL2](https://docs.ogc.org/is/21-065r2/21-065r2.html) expressions for\n",
+    "complex queries through the `filter` / `filter_lang` arguments. See the\n",
+    "[General retrieval and CQL2](#General-retrieval-and-CQL2) section below.\n",
+    "\n",
+    "### Simple features\n",
+    "\n",
+    "Spatial collections return a `geometry` column, so `get_*` calls give you a\n",
+    "`geopandas.GeoDataFrame` that drops straight into geospatial workflows. Pass `skip_geometry=True` to get a plain `DataFrame`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "20c5dd03",
+   "metadata": {},
+   "source": [
+    "## Lessons learned\n",
+    "\n",
+    "### Request many sites in one call\n",
+    "\n",
+    "`dataretrieval` automatically splits a large request — many monitoring\n",
+    "locations, several parameter codes, or a complex filter — into URL-sized\n",
+    "sub-requests and recombines the results, and it can resume a long pull that hits\n",
+    "a rate limit or transient server error without refetching completed work. So\n",
+    "pass all your sites in one call rather than looping over them.\n",
+    "\n",
+    "The main exception is **continuous** data, which is capped at 3 years per\n",
+    "request. See the *Continuous Data* notebook for large continuous pulls."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e49f3ad0",
+   "metadata": {},
+   "source": [
+    "## New functions\n",
+    "\n",
+    "### Monitoring location\n",
+    "\n",
+    "`get_monitoring_locations` returns site metadata. To browse the service in a\n",
+    "web browser, visit\n",
+    "<https://api.waterdata.usgs.gov/ogcapi/v0/collections/monitoring-locations>.\n",
+    "\n",
+    "A simple request for one known USGS site:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d42fc61a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sites_information, _ = waterdata.get_monitoring_locations(\n",
+    "    monitoring_location_id=\"USGS-01491000\"\n",
+    ")\n",
+    "print(f\"{sites_information.shape[1]} columns returned\")\n",
+    "sites_information.drop(columns=\"geometry\").T"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29d1be4d",
+   "metadata": {},
+   "source": [
+    "Any returned column can also be used as an input filter. For example, to find\n",
+    "every stream site in Wisconsin:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cf090884",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sites_wi, _ = waterdata.get_monitoring_locations(\n",
+    "    state_name=\"Wisconsin\",\n",
+    "    site_type=\"Stream\",\n",
+    ")\n",
+    "print(f\"{len(sites_wi)} Wisconsin stream sites\")\n",
+    "sites_wi[[\"monitoring_location_id\", \"monitoring_location_name\", \"geometry\"]].head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4096bf3",
+   "metadata": {},
+   "source": [
+    "Because the result is a `GeoDataFrame`, plotting the locations is a one-liner.\n",
+    "For an interactive map, `folium` works well with the same data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce4c88a7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ax = sites_wi.plot(markersize=4, figsize=(7, 5))\n",
+    "ax.set_title(\"USGS stream monitoring locations in Wisconsin\")\n",
+    "ax.set_xlabel(\"Longitude\")\n",
+    "ax.set_ylabel(\"Latitude\")\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e162fcf",
+   "metadata": {},
+   "source": [
+    "### Time series & combined metadata\n",
+    "\n",
+    "`get_combined_metadata` merges time-series metadata\n",
+    "(`get_time_series_metadata`) and field-measurement metadata by site, telling you\n",
+    "which time series a site offers and the span of each."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a593b5e8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ts_available, _ = waterdata.get_combined_metadata(\n",
+    "    monitoring_location_id=\"USGS-01491000\",\n",
+    "    parameter_code=[\"00060\", \"00010\"],\n",
+    ")\n",
+    "cols = [\"parameter_name\", \"statistic_id\", \"begin\", \"end\", \"last_modified\"]\n",
+    "ts_available[cols]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c294fee3",
+   "metadata": {},
+   "source": [
+    "### Parameter codes\n",
+    "\n",
+    "Parameter-code descriptions come from the `parameter-codes` reference table:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cc1601c0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pcode_info, _ = waterdata.get_reference_table(\n",
+    "    collection=\"parameter-codes\",\n",
+    "    query={\"id\": \"00660\"},\n",
+    ")\n",
+    "pcode_info.T"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "330064f2",
+   "metadata": {},
+   "source": [
+    "### Daily values\n",
+    "\n",
+    "`get_daily` returns daily values. Browse it at\n",
+    "<https://api.waterdata.usgs.gov/ogcapi/v0/collections/daily>."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d1fef3df",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "daily_modern, _ = waterdata.get_daily(\n",
+    "    monitoring_location_id=\"USGS-01491000\",\n",
+    "    parameter_code=[\"00060\", \"00010\"],\n",
+    "    statistic_id=\"00003\",\n",
+    "    time=[\"2023-10-01\", \"2024-09-30\"],\n",
+    ")\n",
+    "daily_modern[[\"time\", \"parameter_code\", \"value\", \"approval_status\"]].head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4817c8c9",
+   "metadata": {},
+   "source": [
+    "Notice the data come back in **long** format — one observation per row. Long\n",
+    "data are usually easier to work with; here we facet by `parameter_code`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f0578529",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "params = sorted(daily_modern[\"parameter_code\"].unique())\n",
+    "fig, axes = plt.subplots(len(params), 1, figsize=(7, 5), sharex=True, squeeze=False)\n",
+    "axes = axes[:, 0]  # squeeze=False -> always a 2-D array, even for one param\n",
+    "for ax, pcode in zip(axes, params):\n",
+    "    sub = daily_modern[daily_modern[\"parameter_code\"] == pcode]\n",
+    "    ax.scatter(sub[\"time\"], sub[\"value\"], s=4)\n",
+    "    ax.set_ylabel(pcode)\n",
+    "axes[0].set_title(\"Daily values at USGS-01491000 (water year 2024)\")\n",
+    "axes[-1].set_xlabel(\"time\")\n",
+    "fig.autofmt_xdate()\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8af565c7",
+   "metadata": {},
+   "source": [
+    "### Continuous\n",
+    "\n",
+    "`get_continuous` returns instantaneous (sensor) values. Browse it at\n",
+    "<https://api.waterdata.usgs.gov/ogcapi/v0/collections/continuous>.\n",
+    "\n",
+    "This service currently allows at most **3 years** of data per request; with no\n",
+    "`time` argument it returns the latest year. Continuous data have no geometry\n",
+    "column and do not support bounding-box queries. For large pulls, see the\n",
+    "*Continuous Data* notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2dbcdd47",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sensor_data, _ = waterdata.get_continuous(\n",
+    "    monitoring_location_id=\"USGS-01491000\",\n",
+    "    parameter_code=\"00060\",\n",
+    "    time=\"2024-09-01/2024-09-03\",\n",
+    ")\n",
+    "sensor_data[[\"time\", \"parameter_code\", \"value\", \"approval_status\"]].head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b4fa772",
+   "metadata": {},
+   "source": [
+    "### Field measurements\n",
+    "\n",
+    "`get_field_measurements` returns discrete field measurements, including\n",
+    "groundwater levels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "12d4649a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "field_modern, _ = waterdata.get_field_measurements(\n",
+    "    monitoring_location_id=[\n",
+    "        \"USGS-451605097071701\",\n",
+    "        \"USGS-263819081585801\",\n",
+    "    ],\n",
+    "    time=[\"2023-10-01\", \"2024-09-30\"],\n",
+    ")\n",
+    "field_modern[[\"time\", \"monitoring_location_id\", \"parameter_code\", \"value\"]].head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6a6f9ba4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fig, ax = plt.subplots(figsize=(7, 4))\n",
+    "for (site, pcode), sub in field_modern.groupby([\"monitoring_location_id\", \"parameter_code\"]):\n",
+    "    ax.scatter(sub[\"time\"], sub[\"value\"], s=12, label=f\"{site} ({pcode})\")\n",
+    "ax.set_ylabel(\"value\")\n",
+    "ax.set_title(\"Field measurements\")\n",
+    "ax.legend(fontsize=7)\n",
+    "fig.autofmt_xdate()\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5a709774",
+   "metadata": {},
+   "source": [
+    "### Channel measurements\n",
+    "\n",
+    "`get_channel` returns channel measurements (width, area, velocity) collected\n",
+    "alongside the discharge measurements returned by `get_field_measurements`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fb3105ff",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "channel, _ = waterdata.get_channel(monitoring_location_id=\"USGS-02238500\")\n",
+    "channel[[\"time\", \"channel_width\", \"channel_area\", \"channel_velocity\"]].head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0495e7ac",
+   "metadata": {},
+   "source": [
+    "### Latest continuous & latest daily\n",
+    "\n",
+    "`get_latest_continuous` and `get_latest_daily` have no NWIS equivalent — they\n",
+    "return the single most recent observation for each time series."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d82d74ba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "latest_uv, _ = waterdata.get_latest_continuous(\n",
+    "    monitoring_location_id=\"USGS-01491000\",\n",
+    "    parameter_code=\"00060\",\n",
+    ")\n",
+    "cols = [\"time\", \"value\", \"approval_status\", \"parameter_code\", \"unit_of_measure\"]\n",
+    "latest_uv[cols].T"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a624271d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "latest_dv, _ = waterdata.get_latest_daily(\n",
+    "    monitoring_location_id=\"USGS-01491000\",\n",
+    "    parameter_code=\"00060\",\n",
+    ")\n",
+    "latest_dv[cols].T"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "65390398",
+   "metadata": {},
+   "source": [
+    "### General retrieval and CQL2\n",
+    "\n",
+    "The OGC `get_*` functions accept a CQL2 expression through the `filter` /\n",
+    "`filter_lang` arguments, so even complex queries run against these same\n",
+    "functions — there is no separate \"general retrieval\" call.\n",
+    "\n",
+    "CQL2 supports wildcard matching via `LIKE` (`%` matches any run of characters). This\n",
+    "is handy for hydrologic unit codes, which may be stored as `02070010` or as a\n",
+    "longer code beginning with those digits. To get every site whose HUC starts with\n",
+    "`02070010`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "455de9d3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "what_huc_sites, _ = waterdata.get_monitoring_locations(\n",
+    "    filter=\"hydrologic_unit_code LIKE '02070010%'\",\n",
+    "    filter_lang=\"cql-text\",\n",
+    ")\n",
+    "print(f\"{len(what_huc_sites)} sites in HUC 02070010\")\n",
+    "ax = what_huc_sites.plot(markersize=2, figsize=(7, 5))\n",
+    "ax.set_title(\"Sites within HUC 02070010\")\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3fb5d920",
+   "metadata": {},
+   "source": [
+    "> **Numeric filters.** Every queryable on the Water Data API is typed as a\n",
+    "> *string*, so an unquoted numeric comparison like `drainage_area > 1000` is\n",
+    "> rejected by the server (and quoting it gives a misleading lexicographic\n",
+    "> comparison). `dataretrieval` catches this and raises a `ValueError`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "82f8f1b5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "try:\n",
+    "    waterdata.get_monitoring_locations(\n",
+    "        filter=\"drainage_area > 1000\",\n",
+    "        filter_lang=\"cql-text\",\n",
+    "    )\n",
+    "except ValueError as e:\n",
+    "    print(type(e).__name__, \"->\", str(e)[:120], \"...\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dd8ae008",
+   "metadata": {},
+   "source": [
+    "The recommended pattern is to filter on the string-valued attributes the server\n",
+    "understands (state, site type, HUC, …) and then apply **numeric** thresholds in\n",
+    "pandas. For example, Wisconsin and Minnesota stream sites draining > 1,000 sq mi:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a13e984e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sites, _ = waterdata.get_monitoring_locations(\n",
+    "    state_name=[\"Wisconsin\", \"Minnesota\"],\n",
+    "    site_type=\"Stream\",\n",
+    "    properties=[\n",
+    "        \"monitoring_location_id\",\n",
+    "        \"monitoring_location_name\",\n",
+    "        \"state_name\",\n",
+    "        \"drainage_area\",\n",
+    "    ],\n",
+    ")\n",
+    "big = sites[pd.to_numeric(sites[\"drainage_area\"], errors=\"coerce\") > 1000]\n",
+    "print(f\"{len(big)} of {len(sites)} WI/MN stream sites drain > 1000 sq mi\")\n",
+    "big.drop(columns=\"geometry\").head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "10f0a74d",
+   "metadata": {},
+   "source": [
+    "### Reference tables\n",
+    "\n",
+    "`get_reference_table` returns the metadata tables that list allowable values for\n",
+    "filter arguments such as `parameter_code`; any returned column can be filtered\n",
+    "on. See the *USGS Reference Lists* notebook for the full list of collections.\n",
+    "\n",
+    "### Discrete samples\n",
+    "\n",
+    "Discrete USGS water-quality data are served from a separate (non-OGC) endpoint\n",
+    "via `get_samples`. See the *Discrete water-quality samples* notebook.\n",
+    "\n",
+    "### Daily data statistics\n",
+    "\n",
+    "Pre-computed summary statistics are available through `get_stats_por`\n",
+    "(day-of-year and month-of-year statistics over the period of record) and\n",
+    "`get_stats_date_range` (calendar month, calendar year, and water year). See the *Daily statistics* notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "21d144d3",
+   "metadata": {},
+   "source": [
+    "## More notes\n",
+    "\n",
+    "### `limit` and paging\n",
+    "\n",
+    "The `limit` argument sets how many rows come back **per page**, not the overall\n",
+    "total; by default, `dataretrieval` requests every page and combines the results.\n",
+    "You rarely need to change it, but a smaller value can help on an unreliable connection.\n",
+    "\n",
+    "### The `id` column\n",
+    "\n",
+    "Each endpoint natively returns an `id` column, and that value is used as an input\n",
+    "to *other* endpoints under a different name (the monitoring-locations `id` is the\n",
+    "`monitoring_location_id` everywhere else). `dataretrieval` renames `id`\n",
+    "accordingly, but you can request the raw `id` column via `properties`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fa2f8528",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "site = \"USGS-02238500\"\n",
+    "site_1, _ = waterdata.get_monitoring_locations(\n",
+    "    monitoring_location_id=site,\n",
+    "    properties=[\"monitoring_location_id\", \"state_name\", \"country_name\"],\n",
+    ")\n",
+    "site_2, _ = waterdata.get_monitoring_locations(\n",
+    "    monitoring_location_id=site,\n",
+    "    properties=[\"id\", \"state_name\", \"country_name\"],\n",
+    ")\n",
+    "print(\"renamed:\", [c for c in site_1.columns if c != \"geometry\"])\n",
+    "print(\"raw id :\", [c for c in site_2.columns if c != \"geometry\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7dcc03a9",
+   "metadata": {},
+   "source": [
+    "## More help\n",
+    "\n",
+    "- Documentation: <https://doi-usgs.github.io/dataretrieval-python/>\n",
+    "- R package docs (source of these examples): <https://doi-usgs.github.io/dataRetrieval/>\n",
+    "- Issues / questions: <https://github.com/DOI-USGS/dataretrieval-python/issues>"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/demos/USGS_WaterData_ReferenceLists_Examples.ipynb b/demos/USGS_WaterData_ReferenceLists_Examples.ipynb
new file mode 100644
index 00000000..9799ba16
--- /dev/null
+++ b/demos/USGS_WaterData_ReferenceLists_Examples.ipynb
@@ -0,0 +1,138 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "aef324b2",
+   "metadata": {},
+   "source": [
+    "# USGS Reference Lists\n",
+    "\n",
+    "`get_reference_table` returns the metadata \"reference\" tables for the USGS Water\n",
+    "Data API. These tables enumerate the allowable values for the filter arguments\n",
+    "used elsewhere in the `waterdata` module — for example, the `site-types` table\n",
+    "lists every valid `site_type_code`, and `parameter-codes` lists every valid\n",
+    "`parameter_code`.\n",
+    "\n",
+    "`get_reference_table` returns a `(data, metadata)` tuple."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6f365047",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from typing import get_args\n",
+    "\n",
+    "from IPython.display import Markdown, display\n",
+    "\n",
+    "from dataretrieval import waterdata\n",
+    "from dataretrieval.waterdata.types import METADATA_COLLECTIONS\n",
+    "\n",
+    "collections = list(get_args(METADATA_COLLECTIONS))\n",
+    "collections"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "af9731ff",
+   "metadata": {},
+   "source": [
+    "## A single reference table\n",
+    "\n",
+    "Fetch one table by name. The first column is the table's primary code (the\n",
+    "collection name, singularized, with hyphens turned into underscores — e.g.\n",
+    "`site-types` -> `site_type`):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9840b289",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "site_types, _ = waterdata.get_reference_table(collection=\"site-types\")\n",
+    "print(f\"{len(site_types)} rows\")\n",
+    "site_types.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ec3a00c8",
+   "metadata": {},
+   "source": [
+    "You can also pass a `query` dictionary to retrieve a subset; for example, specific\n",
+    "parameter codes by `id` (given as a comma-separated string):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "09b5de2d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "some_pcodes, _ = waterdata.get_reference_table(\n",
+    "    collection=\"parameter-codes\",\n",
+    "    query={\"id\": \"00060,00065,00010\"},\n",
+    ")\n",
+    "some_pcodes[[\"parameter_code\", \"parameter_name\", \"unit_of_measure\"]]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b02202cf",
+   "metadata": {},
+   "source": [
+    "## All reference tables\n",
+    "\n",
+    "The full set of collections is enumerated in `METADATA_COLLECTIONS`. Below we\n",
+    "fetch each one and preview its first three rows. Most are small lookup tables,\n",
+    "but a few (notably `parameter-codes` and `hydrologic-unit-codes`) are large, so\n",
+    "this cell downloads a lot of data and may take a while to run."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "514392c0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for collection in collections:\n",
+    "    df, _ = waterdata.get_reference_table(collection=collection)\n",
+    "    preview = df.drop(columns=\"geometry\") if \"geometry\" in df.columns else df\n",
+    "    display(Markdown(f\"### `{collection}`  \\n{len(df):,} rows\"))\n",
+    "    display(preview.head(3))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9d820806",
+   "metadata": {},
+   "source": [
+    "## More help\n",
+    "\n",
+    "- Documentation: <https://doi-usgs.github.io/dataretrieval-python/>\n",
+    "- See the *Introduction to the USGS Water Data APIs* notebook for how these reference\n",
+    "  values feed the `get_*` filter arguments.\n",
+    "- Equivalent R article: [USGS Reference Lists](https://doi-usgs.github.io/dataRetrieval/articles/Reference_Lists.html)\n",
+    "- Issues / questions: <https://github.com/DOI-USGS/dataretrieval-python/issues>"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/source/examples/USGS_WaterData_ContinuousData_Examples.nblink b/docs/source/examples/USGS_WaterData_ContinuousData_Examples.nblink
new file mode 100644
index 00000000..b169abdf
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_ContinuousData_Examples.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../demos/USGS_WaterData_ContinuousData_Examples.ipynb"
+}
diff --git a/docs/source/examples/USGS_WaterData_DailyStatistics_Examples.nblink b/docs/source/examples/USGS_WaterData_DailyStatistics_Examples.nblink
new file mode 100644
index 00000000..c12f7840
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_DailyStatistics_Examples.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../demos/USGS_WaterData_DailyStatistics_Examples.ipynb"
+}
diff --git a/docs/source/examples/USGS_WaterData_DiscreteSamples_Examples.nblink b/docs/source/examples/USGS_WaterData_DiscreteSamples_Examples.nblink
new file mode 100644
index 00000000..4729fe36
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_DiscreteSamples_Examples.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../demos/USGS_WaterData_DiscreteSamples_Examples.ipynb"
+}
diff --git a/docs/source/examples/USGS_WaterData_Introduction_Examples.nblink b/docs/source/examples/USGS_WaterData_Introduction_Examples.nblink
new file mode 100644
index 00000000..9a442fe4
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_Introduction_Examples.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../demos/USGS_WaterData_Introduction_Examples.ipynb"
+}
diff --git a/docs/source/examples/USGS_WaterData_ReferenceLists_Examples.nblink b/docs/source/examples/USGS_WaterData_ReferenceLists_Examples.nblink
new file mode 100644
index 00000000..0600ecac
--- /dev/null
+++ b/docs/source/examples/USGS_WaterData_ReferenceLists_Examples.nblink
@@ -0,0 +1,3 @@
+{
+    "path": "../../../demos/USGS_WaterData_ReferenceLists_Examples.ipynb"
+}
diff --git a/docs/source/examples/index.rst b/docs/source/examples/index.rst
index 6011fc4b..de6f1b25 100644
--- a/docs/source/examples/index.rst
+++ b/docs/source/examples/index.rst
@@ -15,6 +15,23 @@ covers a basic introduction to module functions and usage.
 
     WaterData_demo
 
+USGS Water Data API vignettes
+-----------------------------
+These notebooks are Python ports of the new USGS Water Data API vignettes from
+the R `dataRetrieval`_ package. Each introduces a family of ``waterdata``
+functions and is executed against the live USGS Water Data API when the docs are built.
+
+.. _dataRetrieval: https://doi-usgs.github.io/dataRetrieval/
+
+.. toctree::
+    :maxdepth: 1
+
+    USGS_WaterData_Introduction_Examples
+    USGS_WaterData_DiscreteSamples_Examples
+    USGS_WaterData_DailyStatistics_Examples
+    USGS_WaterData_ContinuousData_Examples
+    USGS_WaterData_ReferenceLists_Examples
+
 Simple uses of the ``dataretrieval`` package
 --------------------------------------------
 

From 95c44d523f3eee16c5a7ce2a05f518b9bbfcf22b Mon Sep 17 00:00:00 2001
From: thodson-usgs <thodson@usgs.gov>
Date: Thu, 28 May 2026 10:28:50 -0400
Subject: [PATCH 2/2] docs: clean up Water Data API vignettes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Rename `daily_modern`/`field_modern` → `daily_data`/`field_data` (the
  "_modern" suffix was a leftover NWIS-vs-modern comparison artifact)
- Rename `what_huc_sites` → `huc_sites`, `site_info`/`sites_information` →
  `sites_info`, `site_1`/`site_2` → `renamed`/`raw_id`, and `site1` → `site` in
  the daily statistics notebook (no `site2` ever existed)
- Fix broken intra-notebook anchor `#General-retrieval-and-CQL2`
  → `#general-retrieval-and-cql2`
- Simplify the daily-values facet plot by dropping the `squeeze=False` +
  axes-indexing workaround (the query always requests two parameters)
- Clean up the `map_sites` helper in the samples notebook to use the
  conventional `fig, ax = plt.subplots(...)` unpacking and a docstring
- Refresh the kernelspec and language_info metadata in the samples and
  introduction notebooks

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 ...S_WaterData_DailyStatistics_Examples.ipynb | 14 ++---
 ...S_WaterData_DiscreteSamples_Examples.ipynb | 18 ++++--
 ...USGS_WaterData_Introduction_Examples.ipynb | 56 +++++++++++--------
 3 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/demos/USGS_WaterData_DailyStatistics_Examples.ipynb b/demos/USGS_WaterData_DailyStatistics_Examples.ipynb
index f35f52c9..ffe9647d 100644
--- a/demos/USGS_WaterData_DailyStatistics_Examples.ipynb
+++ b/demos/USGS_WaterData_DailyStatistics_Examples.ipynb
@@ -41,7 +41,7 @@
     "\n",
     "%matplotlib inline\n",
     "\n",
-    "site1 = \"USGS-02037500\""
+    "site = \"USGS-02037500\""
    ]
   },
   {
@@ -66,7 +66,7 @@
    "outputs": [],
    "source": [
     "jan_por_mean, _ = waterdata.get_stats_por(\n",
-    "    monitoring_location_id=site1,\n",
+    "    monitoring_location_id=site,\n",
     "    parameter_code=\"00060\",\n",
     "    computation_type=\"arithmetic_mean\",\n",
     "    start_date=\"01-01\",\n",
@@ -126,7 +126,7 @@
    "outputs": [],
    "source": [
     "full_por_percentiles, _ = waterdata.get_stats_por(\n",
-    "    monitoring_location_id=site1,\n",
+    "    monitoring_location_id=site,\n",
     "    parameter_code=\"00060\",\n",
     "    computation_type=[\"minimum\", \"maximum\", \"percentile\"],\n",
     "    start_date=\"01-01\",\n",
@@ -210,7 +210,7 @@
    "outputs": [],
    "source": [
     "daily, _ = waterdata.get_daily(\n",
-    "    monitoring_location_id=site1,\n",
+    "    monitoring_location_id=site,\n",
     "    parameter_code=\"00060\",\n",
     "    statistic_id=\"00003\",\n",
     "    time=[\"2024-01-01\", \"2025-12-31\"],\n",
@@ -259,7 +259,7 @@
    "outputs": [],
    "source": [
     "jan_daterange_mean, _ = waterdata.get_stats_date_range(\n",
-    "    monitoring_location_id=site1,\n",
+    "    monitoring_location_id=site,\n",
     "    parameter_code=\"00060\",\n",
     "    computation_type=\"arithmetic_mean\",\n",
     "    start_date=\"2024-01-01\",\n",
@@ -287,7 +287,7 @@
    "outputs": [],
    "source": [
     "multiyear, _ = waterdata.get_stats_date_range(\n",
-    "    monitoring_location_id=site1,\n",
+    "    monitoring_location_id=site,\n",
     "    parameter_code=\"00060\",\n",
     "    computation_type=\"arithmetic_mean\",\n",
     "    start_date=\"2023-09-30\",\n",
@@ -338,7 +338,7 @@
    "outputs": [],
    "source": [
     "monthly_raw, _ = waterdata.get_stats_date_range(\n",
-    "    monitoring_location_id=site1,\n",
+    "    monitoring_location_id=site,\n",
     "    parameter_code=\"00060\",\n",
     "    computation_type=\"arithmetic_mean\",\n",
     ")\n",
diff --git a/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb b/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb
index a58e6f56..ea8deac2 100644
--- a/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb
+++ b/demos/USGS_WaterData_DiscreteSamples_Examples.ipynb
@@ -43,12 +43,11 @@
     "plt.rcParams[\"figure.figsize\"] = (7, 4)\n",
     "\n",
     "\n",
-    "# Scatter plot of sample-site locations (a static map; use folium for an\n",
-    "# interactive version).\n",
     "def map_sites(df, title=\"\"):\n",
+    "    \"\"\"Static scatter plot of sample-site locations; use folium for an interactive map.\"\"\"\n",
     "    lon = pd.to_numeric(df[\"Location_Longitude\"], errors=\"coerce\")\n",
     "    lat = pd.to_numeric(df[\"Location_Latitude\"], errors=\"coerce\")\n",
-    "    ax = plt.subplots(figsize=(7, 5))[1]\n",
+    "    fig, ax = plt.subplots(figsize=(7, 5))\n",
     "    ax.scatter(lon, lat, s=10, color=\"red\", alpha=0.7)\n",
     "    ax.set_xlabel(\"Longitude\")\n",
     "    ax.set_ylabel(\"Latitude\")\n",
@@ -528,12 +527,21 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.10"
   }
  },
  "nbformat": 4,
diff --git a/demos/USGS_WaterData_Introduction_Examples.ipynb b/demos/USGS_WaterData_Introduction_Examples.ipynb
index 3ca30420..4c8b9935 100644
--- a/demos/USGS_WaterData_Introduction_Examples.ipynb
+++ b/demos/USGS_WaterData_Introduction_Examples.ipynb
@@ -97,7 +97,7 @@
    "outputs": [],
    "source": [
     "# Ask for just a few columns instead of the full ~40-column record.\n",
-    "site_info, _ = waterdata.get_monitoring_locations(\n",
+    "sites_info, _ = waterdata.get_monitoring_locations(\n",
     "    monitoring_location_id=\"USGS-01491000\",\n",
     "    properties=[\n",
     "        \"monitoring_location_id\",\n",
@@ -106,7 +106,7 @@
     "        \"monitoring_location_name\",\n",
     "    ],\n",
     ")\n",
-    "site_info.drop(columns=\"geometry\")"
+    "sites_info.drop(columns=\"geometry\")"
    ]
   },
   {
@@ -126,7 +126,7 @@
     "\n",
     "The APIs accept [CQL2](https://www.loc.gov/standards/sru/cql/) expressions for\n",
     "complex queries through the `filter` / `filter_lang` arguments. See the\n",
-    "[General retrieval and CQL2](#General-retrieval-and-CQL2) section below.\n",
+    "[General retrieval and CQL2](#general-retrieval-and-cql2) section below.\n",
     "\n",
     "### Simple features\n",
     "\n",
@@ -176,11 +176,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "sites_information, _ = waterdata.get_monitoring_locations(\n",
+    "sites_info, _ = waterdata.get_monitoring_locations(\n",
     "    monitoring_location_id=\"USGS-01491000\"\n",
     ")\n",
-    "print(f\"{sites_information.shape[1]} columns returned\")\n",
-    "sites_information.drop(columns=\"geometry\").T"
+    "print(f\"{sites_info.shape[1]} columns returned\")\n",
+    "sites_info.drop(columns=\"geometry\").T"
    ]
   },
   {
@@ -299,13 +299,13 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "daily_modern, _ = waterdata.get_daily(\n",
+    "daily_data, _ = waterdata.get_daily(\n",
     "    monitoring_location_id=\"USGS-01491000\",\n",
     "    parameter_code=[\"00060\", \"00010\"],\n",
     "    statistic_id=\"00003\",\n",
     "    time=[\"2023-10-01\", \"2024-09-30\"],\n",
     ")\n",
-    "daily_modern[[\"time\", \"parameter_code\", \"value\", \"approval_status\"]].head()"
+    "daily_data[[\"time\", \"parameter_code\", \"value\", \"approval_status\"]].head()"
    ]
   },
   {
@@ -324,11 +324,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "params = sorted(daily_modern[\"parameter_code\"].unique())\n",
-    "fig, axes = plt.subplots(len(params), 1, figsize=(7, 5), sharex=True, squeeze=False)\n",
-    "axes = axes[:, 0]  # squeeze=False -> always a 2-D array, even for one param\n",
+    "params = sorted(daily_data[\"parameter_code\"].unique())\n",
+    "fig, axes = plt.subplots(len(params), 1, figsize=(7, 5), sharex=True)\n",
     "for ax, pcode in zip(axes, params):\n",
-    "    sub = daily_modern[daily_modern[\"parameter_code\"] == pcode]\n",
+    "    sub = daily_data[daily_data[\"parameter_code\"] == pcode]\n",
     "    ax.scatter(sub[\"time\"], sub[\"value\"], s=4)\n",
     "    ax.set_ylabel(pcode)\n",
     "axes[0].set_title(\"Daily values at USGS-01491000 (water year 2024)\")\n",
@@ -386,14 +385,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "field_modern, _ = waterdata.get_field_measurements(\n",
+    "field_data, _ = waterdata.get_field_measurements(\n",
     "    monitoring_location_id=[\n",
     "        \"USGS-451605097071701\",\n",
     "        \"USGS-263819081585801\",\n",
     "    ],\n",
     "    time=[\"2023-10-01\", \"2024-09-30\"],\n",
     ")\n",
-    "field_modern[[\"time\", \"monitoring_location_id\", \"parameter_code\", \"value\"]].head()"
+    "field_data[[\"time\", \"monitoring_location_id\", \"parameter_code\", \"value\"]].head()"
    ]
   },
   {
@@ -404,7 +403,7 @@
    "outputs": [],
    "source": [
     "fig, ax = plt.subplots(figsize=(7, 4))\n",
-    "for site, sub in field_modern.groupby(\"monitoring_location_id\"):\n",
+    "for site, sub in field_data.groupby(\"monitoring_location_id\"):\n",
     "    ax.scatter(sub[\"time\"], sub[\"value\"], s=12, label=site)\n",
     "ax.set_ylabel(\"value\")\n",
     "ax.set_title(\"Field measurements\")\n",
@@ -499,12 +498,12 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "what_huc_sites, _ = waterdata.get_monitoring_locations(\n",
+    "huc_sites, _ = waterdata.get_monitoring_locations(\n",
     "    filter=\"hydrologic_unit_code LIKE '02070010%'\",\n",
     "    filter_lang=\"cql-text\",\n",
     ")\n",
-    "print(f\"{len(what_huc_sites)} sites in HUC 02070010\")\n",
-    "ax = what_huc_sites.plot(markersize=2, figsize=(7, 5))\n",
+    "print(f\"{len(huc_sites)} sites in HUC 02070010\")\n",
+    "ax = huc_sites.plot(markersize=2, figsize=(7, 5))\n",
     "ax.set_title(\"Sites within HUC 02070010\")\n",
     "plt.show()"
    ]
@@ -620,16 +619,16 @@
    "outputs": [],
    "source": [
     "site = \"USGS-02238500\"\n",
-    "site_1, _ = waterdata.get_monitoring_locations(\n",
+    "renamed, _ = waterdata.get_monitoring_locations(\n",
     "    monitoring_location_id=site,\n",
     "    properties=[\"monitoring_location_id\", \"state_name\", \"country_name\"],\n",
     ")\n",
-    "site_2, _ = waterdata.get_monitoring_locations(\n",
+    "raw_id, _ = waterdata.get_monitoring_locations(\n",
     "    monitoring_location_id=site,\n",
     "    properties=[\"id\", \"state_name\", \"country_name\"],\n",
     ")\n",
-    "print(\"renamed:\", [c for c in site_1.columns if c != \"geometry\"])\n",
-    "print(\"raw id :\", [c for c in site_2.columns if c != \"geometry\"])"
+    "print(\"renamed:\", [c for c in renamed.columns if c != \"geometry\"])\n",
+    "print(\"raw id :\", [c for c in raw_id.columns if c != \"geometry\"])"
    ]
   },
   {
@@ -647,12 +646,21 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.10"
   }
  },
  "nbformat": 4,