Commit 2bf5962

thodson-usgs and claude committed
docs(demos): fix notebook narrative and code inconsistencies
Every docs notebook now executes end-to-end against the live USGS Water Data API; each fix below was verified against live output before editing.

- DiscreteSamples: `get_codes` returns a `(df, md)` tuple, but five cells indexed the tuple with a column list (raising TypeError) and the prose claimed it "returns a plain DataFrame". Unpack the tuple and correct the claim. (This notebook previously failed to execute.)
- SiteInfo: `state_code="UT"` returned an empty frame under an "all locations in a state" heading; `state_code` is a two-digit ANSI code, so use "49" (Utah).
- UnitValues: two notes claimed returned timestamps are "in local time" -- they are tz-aware UTC. Removed a dead duplicate of Example 5.
- Samples: "181 fields" -> the default profile returns 187 columns; replaced six references to nonexistent `*_lookup()` helpers with the real `get_codes(code_service=...)`.
- GroundwaterLevels: stale comment said partial dates "show up as NaT" in the index -- the index is a plain RangeIndex and dates live in a normalized UTC `time` column; print that instead. Relabel a y-axis that mixed depth-below-surface with NGVD29/NAVD88 elevations.
- Introduction: `get_combined_metadata` joins monitoring-location and time-series metadata, not "field-measurement metadata".
- R vignette: `nwis.get_water_use()` is defunct (raises NameError); note that instead of presenting it as runnable.
- SiteInventory: Example 3 duplicated Example 2 verbatim -- repurpose it as a `skip_geometry=True` demonstration.
- peak_streamflow_trends: the live-data migration (ad4e980) removed the CSV-load cell but left narrative cells describing it; rewrite them to describe the live `final_df`, and correct the chunker comment (Rhode Island's 350 gages return in a single request).
- Disambiguate the two identical `get_field_measurements()` notebook titles (Surface-Water vs Groundwater-Level).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent c52e9b3 commit 2bf5962

10 files changed

Lines changed: 62 additions & 84 deletions

demos/R Python Vignette equivalents.ipynb

Lines changed: 1 addition & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -307,17 +307,8 @@
307307
" header # the response headers\n",
308308
"```\n",
309309
"\n",
310-
"Note: USGS *water use* data has no Water Data API equivalent yet, so it remains available only through the deprecated `nwis` module:\n",
311-
"\n",
312-
"```\n",
313-
"national, md = nwis.get_water_use()\n",
314-
"```"
310+
"Note: USGS *water use* data has no Water Data API equivalent yet. The legacy `nwis.get_water_use()` service has been decommissioned and now raises a \"defunct\" error, so there is currently no runnable way to retrieve water-use data through `dataretrieval`."
315311
]
316-
},
317-
{
318-
"cell_type": "markdown",
319-
"metadata": {},
320-
"source": []
321312
}
322313
],
323314
"metadata": {

demos/USGS_WaterData_DiscreteSamples_Examples.ipynb

Lines changed: 11 additions & 7 deletions
@@ -331,8 +331,9 @@
331331
"## Additional query parameters\n",
332332
"\n",
333333
"Several parameters narrow the results further. The allowable values for the\n",
334-
"categorical ones come from `get_codes`. Note that `get_codes` returns a plain\n",
335-
"`DataFrame` (no metadata tuple).\n",
334+
"categorical ones come from `get_codes`, which — like the other `waterdata`\n",
335+
"functions — returns a `(DataFrame, metadata)` tuple; we unpack it and keep the\n",
336+
"DataFrame.\n",
336337
"\n",
337338
"### `siteTypeCode` / `siteTypeName`"
338339
]
@@ -344,7 +345,7 @@
344345
"metadata": {},
345346
"outputs": [],
346347
"source": [
347-
"site_type_info = waterdata.get_codes(code_service=\"sitetype\")\n",
348+
"site_type_info, _ = waterdata.get_codes(code_service=\"sitetype\")\n",
348349
"site_type_info[[\"typeCode\", \"typeLongName\"]].head(10)"
349350
]
350351
},
@@ -365,7 +366,8 @@
365366
"metadata": {},
366367
"outputs": [],
367368
"source": [
368-
"waterdata.get_codes(code_service=\"samplemedia\")[\"activityMedia\"].tolist()"
369+
"media, _ = waterdata.get_codes(code_service=\"samplemedia\")\n",
370+
"media[\"activityMedia\"].tolist()"
369371
]
370372
},
371373
{
@@ -386,7 +388,8 @@
386388
"metadata": {},
387389
"outputs": [],
388390
"source": [
389-
"waterdata.get_codes(code_service=\"characteristicgroup\")[\"characteristicGroup\"].tolist()"
391+
"char_groups, _ = waterdata.get_codes(code_service=\"characteristicgroup\")\n",
392+
"char_groups[\"characteristicGroup\"].tolist()"
390393
]
391394
},
392395
{
@@ -407,7 +410,7 @@
407410
"metadata": {},
408411
"outputs": [],
409412
"source": [
410-
"characteristic_info = waterdata.get_codes(code_service=\"characteristics\")\n",
413+
"characteristic_info, _ = waterdata.get_codes(code_service=\"characteristics\")\n",
411414
"print(\"unique characteristic names:\")\n",
412415
"print(characteristic_info[\"characteristicName\"].drop_duplicates().head().tolist())\n",
413416
"print(\"\\nexample USGS parameter codes:\")\n",
@@ -432,7 +435,8 @@
432435
"metadata": {},
433436
"outputs": [],
434437
"source": [
435-
"waterdata.get_codes(code_service=\"observedproperty\")[\"observedProperty\"].head().tolist()"
438+
"observed, _ = waterdata.get_codes(code_service=\"observedproperty\")\n",
439+
"observed[\"observedProperty\"].head().tolist()"
436440
]
437441
},
438442
{
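
The `get_codes` hunks above all apply the same fix: unpack the returned tuple before selecting a column. A minimal sketch of why the old cells failed, using a hypothetical stand-in `fake_get_codes` (not part of `dataretrieval`) that returns a `(table, metadata)` tuple. A plain dict stands in for the DataFrame, and its rows are made up, so the sketch runs without network access or pandas:

```python
def fake_get_codes(code_service):
    """Hypothetical stand-in for waterdata.get_codes: returns (table, metadata)."""
    table = {"activityMedia": ["Water", "Air", "Soil"]}  # made-up rows for illustration
    metadata = {"code_service": code_service}
    return table, metadata


def old_style(code_service, column):
    # Pre-fix pattern: index the returned tuple directly with a column name.
    return fake_get_codes(code_service)[column]


def new_style(code_service, column):
    # Post-fix pattern: unpack the tuple first, then index the table.
    table, _ = fake_get_codes(code_service)
    return table[column]


try:
    old_style("samplemedia", "activityMedia")
    old_error = None
except TypeError as exc:
    # Tuples accept only integer or slice indices, so a column name fails.
    old_error = exc

media = new_style("samplemedia", "activityMedia")
print("old pattern raised:", type(old_error).__name__)
print("new pattern returned:", media)
```

The same `TypeError` arises whether the tuple is indexed with a single column name or with a list of names, which is why all five cells needed the unpacking step.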

demos/USGS_WaterData_Introduction_Examples.ipynb

Lines changed: 5 additions & 3 deletions
@@ -237,9 +237,11 @@
237237
"source": [
238238
"### Time series & combined metadata\n",
239239
"\n",
240-
"`get_combined_metadata` merges time-series metadata\n",
241-
"(`get_time_series_metadata`) and field-measurement metadata by site, telling you\n",
242-
"which time series a site offers and the span of each."
240+
"`get_combined_metadata` joins the monitoring-location catalog\n",
241+
"(`get_monitoring_locations`) with the time-series metadata\n",
242+
"(`get_time_series_metadata`), returning one row per available time series with\n",
243+
"both the site attributes and the series' period of record — a convenient \"what\n",
244+
"data is available\" view."
243245
]
244246
},
245247
{

demos/hydroshare/USGS_WaterData_GroundwaterLevels_Examples.ipynb

Lines changed: 9 additions & 5 deletions
@@ -4,7 +4,7 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"# USGS dataretrieval Python Package `get_field_measurements()` Examples\n",
7+
"# USGS dataretrieval Python Package Groundwater-Level `get_field_measurements()` Examples\n",
88
"\n",
99
"This notebook provides examples of using the Python dataretrieval package to retrieve groundwater level field measurements for a United States Geological Survey (USGS) monitoring location. The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data."
1010
]
@@ -152,9 +152,11 @@
152152
"metadata": {},
153153
"outputs": [],
154154
"source": [
155+
"# This site reports several quantities (depth below land surface as well as\n",
156+
"# water-surface elevations above NGVD29/NAVD88), so use a datum-neutral label.\n",
155157
"ax = data[0][[\"time\", \"value\"]].plot(x=\"time\", y=\"value\", style=\".\")\n",
156158
"ax.set_xlabel(\"Date\")\n",
157-
"ax.set_ylabel(\"Water Level (feet below land surface)\")"
159+
"ax.set_ylabel(\"Water level (ft)\")"
158160
]
159161
},
160162
{
@@ -233,9 +235,11 @@
233235
"data3 = waterdata.get_field_measurements(monitoring_location_id=\"USGS-425957088141001\")\n",
234236
"print(\"Retrieved \" + str(len(data3[0])) + \" data values.\")\n",
235237
"\n",
236-
"# Print the date/time index values, which show up as NaT because\n",
237-
"# the dates can't be converted to a date/time data type\n",
238-
"print(data3[0].index)"
238+
"# Observation dates live in the 'time' column (the data frame uses a plain\n",
239+
"# integer index). Where the original record gave only a year or a year and\n",
240+
"# month, the Water Data API normalizes the value to a UTC timestamp with the\n",
241+
"# missing day/time defaulted, so these appear as ordinary timestamps.\n",
242+
"print(data3[0][\"time\"].head(10))"
239243
]
240244
},
241245
{

demos/hydroshare/USGS_WaterData_Measurements_Examples.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
44
"cell_type": "markdown",
55
"metadata": {},
66
"source": [
7-
"# USGS dataretrieval Python Package `get_field_measurements()` Examples\n",
7+
"# USGS dataretrieval Python Package Surface-Water `get_field_measurements()` Examples\n",
88
"\n",
99
"This notebook provides examples of using the Python dataretrieval package to retrieve surface water field measurement data for a United States Geological Survey (USGS) monitoring location. The dataretrieval package provides a collection of functions to get data from the USGS Water Data API and other online sources of hydrology and water quality data."
1010
]

demos/hydroshare/USGS_WaterData_Samples_Examples.ipynb

Lines changed: 13 additions & 19 deletions
@@ -51,7 +51,7 @@
5151
"source": [
5252
"### Basic Usage\n",
5353
"\n",
54-
"The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_samples()` function to retrieve water quality sample data for USGS monitoring locations from Samples. The following arguments are supported:\n",
54+
"The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the `get_samples()` function to retrieve water quality sample data for USGS monitoring locations from Samples. The allowable values for the categorical arguments below come from `waterdata.get_codes()` (see the *Discrete water-quality samples* notebook). The following arguments are supported:\n",
5555
"\n",
5656
"* **ssl_check** : boolean, optional\n",
5757
" Check the SSL certificate.\n",
@@ -72,8 +72,7 @@
7272
" organizations - \"organization\", \"count\"\n",
7373
"* **activityMediaName** : string or list of strings, optional\n",
7474
" Name or code indicating environmental medium in which sample was taken.\n",
75-
" Check the `activityMediaName_lookup()` function in this module for all\n",
76-
" possible inputs.\n",
75+
" Use `get_codes(code_service=\"samplemedia\")` for all possible inputs.\n",
7776
" Example: \"Water\".\n",
7877
"* **activityStartDateLower** : string, optional\n",
7978
" The start date if using a date range. Takes the format YYYY-MM-DD.\n",
@@ -90,16 +89,16 @@
9089
" Example: \"Sample-Routine, regular\".\n",
9190
"* **characteristicGroup** : string or list of strings, optional\n",
9291
" Characteristic group is a broad category of characteristics\n",
93-
" describing one or more results. Check the `characteristicGroup_lookup()`\n",
94-
" function in this module for all possible inputs.\n",
92+
" describing one or more results. Use\n",
93+
" `get_codes(code_service=\"characteristicgroup\")` for all possible inputs.\n",
9594
" Example: \"Organics, PFAS\"\n",
9695
"* **characteristic** : string or list of strings, optional\n",
9796
" Characteristic is a specific category describing one or more results.\n",
98-
" Check the `characteristic_lookup()` function in this module for all\n",
99-
" possible inputs.\n",
97+
" Use `get_codes(code_service=\"characteristics\")` for all possible inputs.\n",
10098
" Example: \"Suspended Sediment Discharge\"\n",
10199
"* **characteristicUserSupplied** : string or list of strings, optional\n",
102100
" A user supplied characteristic name describing one or more results.\n",
101+
" Use `get_codes(code_service=\"observedproperty\")` for all possible inputs.\n",
103102
"* **boundingBox**: list of four floats, optional\n",
104103
" Filters on the the associated monitoring location's point location\n",
105104
" by checking if it is located within the specified geographic area. \n",
@@ -116,27 +115,22 @@
116115
"* **countryFips** : string or list of strings, optional\n",
117116
" Example: \"US\" (United States)\n",
118117
"* **stateFips** : string or list of strings, optional\n",
119-
" Check the `stateFips_lookup()` function in this module for all\n",
120-
" possible inputs.\n",
121118
" Example: \"US:15\" (United States: Hawaii)\n",
122119
"* **countyFips** : string or list of strings, optional\n",
123-
" Check the `countyFips_lookup()` function in this module for all\n",
124-
" possible inputs.\n",
125120
" Example: \"US:15:001\" (United States: Hawaii, Hawaii County)\n",
126121
"* **siteTypeCode** : string or list of strings, optional\n",
127-
" An abbreviation for a certain site type. Check the `siteType_lookup()`\n",
128-
" function in this module for all possible inputs.\n",
122+
" An abbreviation for a certain site type. Use\n",
123+
" `get_codes(code_service=\"sitetype\")` for all possible inputs.\n",
129124
" Example: \"GW\" (Groundwater site)\n",
130125
"* **siteTypeName** : string or list of strings, optional\n",
131-
" A full name for a certain site type. Check the `siteType_lookup()`\n",
132-
" function in this module for all possible inputs.\n",
126+
" A full name for a certain site type. Use\n",
127+
" `get_codes(code_service=\"sitetype\")` for all possible inputs.\n",
133128
" Example: \"Well\"\n",
134129
"* **usgsPCode** : string or list of strings, optional\n",
135130
" 5-digit number used in the US Geological Survey computerized\n",
136131
" data system, National Water Information System (NWIS), to\n",
137-
" uniquely identify a specific constituent. Check the \n",
138-
" `characteristic_lookup()` function in this module for all possible\n",
139-
" inputs.\n",
132+
" uniquely identify a specific constituent. Use\n",
133+
" `get_codes(code_service=\"characteristics\")` for all possible inputs.\n",
140134
" Example: \"00060\" (Discharge, cubic feet per second)\n",
141135
"* **hydrologicUnit** : string or list of strings, optional\n",
142136
" Max 12-digit number used to describe a hydrologic unit.\n",
@@ -300,7 +294,7 @@
300294
"source": [
301295
"#### Example 4: Retrieve water quality sample data for one site and convert to a wide format\n",
302296
"\n",
303-
"Note that the USGS Samples database returns multiple parameters in a \"long\" format: each row in the resulting table represents a single observation of a single parameter. Furthermore, every observation has 181 fields of metadata. However, if you wanted to place your water quality data into a \"wide\" format, where each column represents a water quality parameter code, the code below details one solution."
297+
"Note that the USGS Samples database returns multiple parameters in a \"long\" format: each row in the resulting table represents a single observation of a single parameter. Furthermore, every observation comes with more than 180 fields of metadata (the default `fullphyschem` profile returns 187 columns). However, if you wanted to place your water quality data into a \"wide\" format, where each column represents a water quality parameter code, the code below details one solution."
304298
]
305299
},
306300
{

demos/hydroshare/USGS_WaterData_SiteInfo_Examples.ipynb

Lines changed: 3 additions & 2 deletions
@@ -172,8 +172,9 @@
172172
"metadata": {},
173173
"outputs": [],
174174
"source": [
175-
"# Get the site information for a state\n",
176-
"siteINFO_state = waterdata.get_monitoring_locations(state_code=\"UT\")\n",
175+
"# Get the site information for a state. state_code is a two-digit ANSI code;\n",
176+
"# 49 is Utah. (The postal abbreviation \"UT\" returns no results.)\n",
177+
"siteINFO_state = waterdata.get_monitoring_locations(state_code=\"49\")\n",
177178
"display(siteINFO_state[0])"
178179
]
179180
},
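
The `state_code` fix above swaps a postal abbreviation for the two-digit code. One way to guard against the silent empty result is to normalize the value before the request. This is only a sketch: `POSTAL_TO_CODE` and `to_state_code` are hypothetical helpers, not part of `dataretrieval`, and the table lists just three states for illustration.

```python
# Partial, illustrative table: postal abbreviation -> two-digit state code.
POSTAL_TO_CODE = {"UT": "49", "HI": "15", "RI": "44"}


def to_state_code(value):
    """Return a two-digit state code for a postal abbreviation or numeric code."""
    text = str(value).strip().upper()
    if text.isdigit():
        # Already numeric: left-pad single digits, e.g. 6 -> "06".
        return text.zfill(2)
    try:
        return POSTAL_TO_CODE[text]
    except KeyError:
        # Fail loudly instead of sending a value that matches nothing.
        raise ValueError(f"unknown state abbreviation: {value!r}") from None


print(to_state_code("UT"))
print(to_state_code(6))
```

The returned string could then be passed as `state_code=` to `waterdata.get_monitoring_locations`, as in the corrected cell.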

demos/hydroshare/USGS_WaterData_SiteInventory_Examples.ipynb

Lines changed: 8 additions & 3 deletions
@@ -148,7 +148,9 @@
148148
"cell_type": "markdown",
149149
"metadata": {},
150150
"source": [
151-
"#### Example 3: Retrieve information for a single monitoring location"
151+
"#### Example 3: Retrieve a single monitoring location without geometry\n",
152+
"\n",
153+
"Pass `skip_geometry=True` to get a plain `pandas.DataFrame` (no `geometry` column) instead of a `geopandas.GeoDataFrame`."
152154
]
153155
},
154156
{
@@ -157,8 +159,11 @@
157159
"metadata": {},
158160
"outputs": [],
159161
"source": [
160-
"oneSite = waterdata.get_monitoring_locations(monitoring_location_id=\"USGS-05114000\")\n",
161-
"display(oneSite[0])"
162+
"oneSite_nogeom = waterdata.get_monitoring_locations(\n",
163+
" monitoring_location_id=\"USGS-05114000\", skip_geometry=True\n",
164+
")\n",
165+
"print(\"geometry column present:\", \"geometry\" in oneSite_nogeom[0].columns)\n",
166+
"display(oneSite_nogeom[0])"
162167
]
163168
},
164169
{

demos/hydroshare/USGS_WaterData_UnitValues_Examples.ipynb

Lines changed: 2 additions & 25 deletions
@@ -181,7 +181,7 @@
181181
"\n",
182182
"#### Example 2: Get unit values for an individual monitoring location and parameter between a start and end date.\n",
183183
"\n",
184-
"NOTE: By default, start and end date are evaluated as local time, and the result is returned with the timestamps in the local time of the monitoring location."
184+
"NOTE: By default, the start and end dates are interpreted in the monitoring location's local time. Regardless of the input time zone, the returned `time` column is tz-aware UTC (see Example 4 for supplying UTC input explicitly)."
185185
]
186186
},
187187
{
@@ -228,7 +228,7 @@
228228
"source": [
229229
"#### Example 4: Retrieve data using UTC times\n",
230230
"\n",
231-
"NOTE: Adding 'Z' to the input time parameters indicates that they are in UTC rather than local time. The time stamps associated with the data returned are still in the local time of the USGS monitoring location."
231+
"NOTE: Adding 'Z' to the input time parameters indicates that they are in UTC rather than local time. The returned timestamps are tz-aware UTC in either case."
232232
]
233233
},
234234
{
@@ -267,29 +267,6 @@
267267
"print(\"Retrieved \" + str(len(discharge_multisite[0])) + \" data values.\")\n",
268268
"display(discharge_multisite[0])"
269269
]
270-
},
271-
{
272-
"cell_type": "markdown",
273-
"metadata": {},
274-
"source": [
275-
"The following example requests the same two-location data as the previous example."
276-
]
277-
},
278-
{
279-
"cell_type": "code",
280-
"execution_count": null,
281-
"metadata": {},
282-
"outputs": [],
283-
"source": [
284-
"discharge_multisite = waterdata.get_continuous(\n",
285-
" monitoring_location_id=[\"USGS-04024430\", \"USGS-04024000\"],\n",
286-
" parameter_code=parameterCode,\n",
287-
" time=\"2013-10-01/2013-10-01\",\n",
288-
" \n",
289-
")\n",
290-
"print(\"Retrieved \" + str(len(discharge_multisite[0])) + \" data values.\")\n",
291-
"display(discharge_multisite[0])"
292-
]
293270
}
294271
],
295272
"metadata": {
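
The two corrected notes above say the returned `time` column is tz-aware UTC, so converting to local wall-clock time is left to the caller. A sketch using only the standard library; the timestamp value and the fixed UTC-5 offset are assumptions for illustration (a real workflow would more likely use a named zone so that daylight saving time is handled):

```python
from datetime import datetime, timedelta, timezone

# A tz-aware UTC timestamp of the kind the notes describe (value is made up).
utc_stamp = datetime(2013, 10, 1, 5, 0, tzinfo=timezone.utc)

# Assumed fixed offset for the monitoring location: UTC-5.
local_zone = timezone(timedelta(hours=-5))
local_stamp = utc_stamp.astimezone(local_zone)

print(utc_stamp.isoformat())
print(local_stamp.isoformat())

# Both objects denote the same instant; only the displayed offset differs.
print(utc_stamp == local_stamp)
```

Because aware datetimes compare by instant, filtering or joining on the UTC values gives the same result as doing so on the converted ones.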

demos/peak_streamflow_trends.ipynb

Lines changed: 9 additions & 9 deletions
@@ -8,7 +8,7 @@
88
"\n",
99
"## Introduction\n",
1010
"\n",
11-
"This notebook demonstrates a slightly more advanced application of the `dataretrieval.waterdata` module: assembling a dataset of historical annual peak streamflow and regressing peak discharge against time to look for trends \u2014 not at a single station, but across many."
11+
"This notebook demonstrates a slightly more advanced application of the `dataretrieval.waterdata` module: assembling a dataset of historical annual peak streamflow and regressing peak discharge against time to look for trends not at a single station, but across many."
1212
]
1313
},
1414
{
@@ -151,7 +151,7 @@
151151
"cell_type": "markdown",
152152
"metadata": {},
153153
"source": [
154-
"To run the analysis for all states since 1970, one would only need to uncomment and run the following lines. However, pulling all that data from the Water Data API takes time and could put a burden on resources."
154+
"Running the analysis for every state since 1970 would pull a large amount of data from the Water Data API and could burden the service. To keep this demo light, we run a single small state — Rhode Island — below."
155155
]
156156
},
157157
{
@@ -160,10 +160,10 @@
160160
"metadata": {},
161161
"outputs": [],
162162
"source": [
163-
"# Download peak discharge for every stream gage in Rhode Island and run\n",
164-
"# the trend regression on each. The async chunker (default concurrency 16)\n",
165-
"# fans the ``get_peaks`` call across all sites in a single pool; the full\n",
166-
"# run completes in roughly two seconds.\n",
163+
"# Download peak discharge for every stream gage in Rhode Island and run the\n",
164+
"# trend regression on each. For larger states the async chunker (default\n",
165+
"# concurrency 16) automatically fans the ``get_peaks`` call across many sites;\n",
166+
"# Rhode Island is small enough to return in a single request.\n",
167167
"start = \"1970-01-01\"\n",
168168
"states = [\"Rhode Island\"]\n",
169169
"final_df = peak_trend_analysis(state_names=states, start_date=start)\n",
@@ -174,22 +174,22 @@
174174
"cell_type": "markdown",
175175
"metadata": {},
176176
"source": [
177-
"Instead, let's quickly load some pre-generated results bundled with this notebook. (This example dataset was produced by an earlier run of the analysis and retains the column names from that run.)"
177+
"The cell above ran the full analysis for a single small state (Rhode Island) and returned `final_df` — one row per gage whose peak-discharge trend is statistically significant (`p_value < 0.05`), carrying the regression slope, intercept, p-value, and standard error."
178178
]
179179
},
180180
{
181181
"cell_type": "markdown",
182182
"metadata": {},
183183
"source": [
184-
"Notice how the data has been transformed. In addition to statistics about the peak streamflow trends, the analysis joined monitoring-location metadata to add latitude and longitude for each station."
184+
"`final_df` pairs each station's trend statistics with its monitoring-location metadata (`monitoring_location_id`, `state_name`, `site_type_code`, …). Because `peak_trend_analysis` requests the locations with `skip_geometry=True`, `final_df` carries no coordinate columns; drop that argument if you want the `geometry` for mapping."
185185
]
186186
},
187187
{
188188
"cell_type": "markdown",
189189
"metadata": {},
190190
"source": [
191191
"## Plotting the results\n",
192-
"Finally we'll use `basemap` and `matplotlib`, along with the location information from the Water Data API, to plot the results on a map (shown below). Monitoring locations with increasing peak annual discharge are shown in red, and those with decreasing peaks in blue."
192+
"The commented cell below sketches how one might map the results with `basemap` and `matplotlib`, coloring monitoring locations with increasing peak annual discharge in red and decreasing peaks in blue. It is left commented because `basemap` is awkward to install on a remote machine. Note that it refers to a pre-generated national result set using the legacy NWIS coordinate columns (`dec_lat_va` / `dec_long_va`) rather than the Rhode Island `final_df` computed above; to map `final_df` directly, rerun `peak_trend_analysis` without `skip_geometry=True` and use the resulting `geometry` column."
193193
]
194194
},
195195
{
