Commit 5dc0146

thodson-usgs and claude committed
Add waterdata.get_samples_summary for per-location sample inventory
Wraps the Samples database /summary/{monitoringLocationIdentifier} endpoint, mirroring the R package's summarize_waterdata_samples. Returns per-characteristic result and activity counts plus first / most recent activity dates for a single monitoring location, which is useful for taking inventory of what discrete-sample data exists at a site before pulling observations with get_samples.

The Samples summary endpoint accepts only a single monitoring location per request, so the function takes a string (not a list).

Closes #261.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent dd70cfa commit 5dc0146
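
Because the endpoint takes one monitoring location per request, an inventory across several sites needs one call per site. The sketch below shows one way to do that; `summarize_sites`, `fetch_one`, `fake_fetch`, and the second site ID are hypothetical names used only for illustration and are not part of this commit. In real use, `fetch_one` could wrap `get_samples_summary` and return `df.to_dict("records")`.

```python
from typing import Callable, Iterable, Union

Row = dict  # one summary row, keyed by column name


def summarize_sites(
    site_ids: Union[str, Iterable[str]],
    fetch_one: Callable[[str], list],
) -> list:
    """Collect summary rows for several sites, one request per site.

    Hypothetical helper: `fetch_one` stands in for a call that returns the
    summary rows of a single monitoring location.
    """
    # A bare string would otherwise be iterated character by character.
    if isinstance(site_ids, str):
        site_ids = [site_ids]

    rows: list = []
    for site_id in site_ids:
        rows.extend(fetch_one(site_id))
    return rows


def fake_fetch(site_id: str) -> list:
    """Offline stand-in for the network call (assumption for the demo)."""
    return [{"monitoringLocationIdentifier": site_id, "resultCount": 1}]


# "USGS-00000000" is a placeholder, not a real site.
combined = summarize_sites(["USGS-04183500", "USGS-00000000"], fake_fetch)
single = summarize_sites("USGS-04183500", fake_fetch)
```

Passing the fetcher in as an argument keeps the loop testable without network access.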

5 files changed

Lines changed: 96 additions & 0 deletions

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+**05/05/2026:** Added `waterdata.get_samples_summary(monitoringLocationIdentifier=...)` — wraps the Samples database `/summary/{id}` endpoint, returning per-characteristic result and activity counts plus first / most recent activity dates for a single monitoring location. Useful for taking inventory of available discrete-sample data before pulling observations with `get_samples`.
+
 **05/01/2026:** The `nadp` module is now deprecated. Calling any of `get_annual_MDN_map`, `get_annual_NTN_map`, or `get_zip` will emit a `DeprecationWarning`. The module is scheduled for removal on or after **2026-11-01**. NADP is not a USGS data source; users should retrieve NADP data directly from https://nadp.slh.wisc.edu/.

 **04/23/2026:** Added `waterdata.get_nearest_continuous(targets, ...)` — for each of N target timestamps, fetches the single continuous observation closest to that timestamp in one HTTP round-trip (auto-chunked when the resulting CQL filter is long, via the facility added in #238). The helper is designed for workflows that pair many discrete-measurement timestamps with surrounding instantaneous data, which the OGC `time` parameter can't express since it only accepts one instant or one interval per request. Ties at window midpoints are resolved per a configurable `on_tie` ∈ {`"first"`, `"last"`, `"mean"`}; the default `window="PT7M30S"` matches a 15-minute continuous gauge.

dataretrieval/waterdata/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,7 @@
     get_monitoring_locations,
     get_reference_table,
     get_samples,
+    get_samples_summary,
     get_stats_date_range,
     get_stats_por,
     get_time_series_metadata,
@@ -51,6 +52,7 @@
     "get_nearest_continuous",
     "get_reference_table",
     "get_samples",
+    "get_samples_summary",
     "get_stats_date_range",
     "get_stats_por",
     "get_time_series_metadata",

dataretrieval/waterdata/api.py

Lines changed: 58 additions & 0 deletions
@@ -1800,6 +1800,64 @@ def get_samples(
     return df, BaseMetadata(response)


+def get_samples_summary(
+    monitoringLocationIdentifier: str,
+    ssl_check: bool = True,
+) -> tuple[pd.DataFrame, BaseMetadata]:
+    """Get a summary of samples available at a single monitoring location.
+
+    Wraps the Samples database ``/summary/{monitoringLocationIdentifier}``
+    endpoint, which returns one row per (characteristic group, characteristic,
+    user-supplied characteristic) combination with result and activity counts
+    and the first / most recent activity dates. This is useful for taking an
+    inventory of what discrete-sample data exists at a site before pulling
+    the underlying observations with :func:`get_samples`.
+
+    The Samples summary endpoint only accepts a single monitoring location
+    per request.
+
+    See https://api.waterdata.usgs.gov/samples-data/docs#/summaries for the
+    full API reference.
+
+    Parameters
+    ----------
+    monitoringLocationIdentifier : string
+        A monitoring location identifier in ``AGENCY-ID`` format, e.g.
+        ``"USGS-04183500"``.
+    ssl_check : bool, optional
+        Check the SSL certificate. Default is True.
+
+    Returns
+    -------
+    df : ``pandas.DataFrame``
+        Formatted data returned from the API query.
+    md : :obj:`dataretrieval.utils.Metadata`
+        Custom ``dataretrieval`` metadata object pertaining to the query.
+
+    Examples
+    --------
+    .. code::
+
+        >>> # What discrete-sample data is available at this site?
+        >>> df, md = dataretrieval.waterdata.get_samples_summary(
+        ...     monitoringLocationIdentifier="USGS-04183500"
+        ... )
+
+    """
+    url = f"{SAMPLES_URL}/summary/{monitoringLocationIdentifier}"
+    params = {"mimeType": "text/csv"}
+
+    response = requests.get(
+        url, params=params, verify=ssl_check, headers=_default_headers()
+    )
+
+    response.raise_for_status()
+
+    df = pd.read_csv(StringIO(response.text), delimiter=",")
+
+    return df, BaseMetadata(response)
+
+
 def get_stats_por(
     approval_status: str | None = None,
     computation_type: str | list[str] | None = None,

tests/data/samples_summary.txt

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+monitoringLocationIdentifier,characteristicGroup,characteristic,characteristicUserSupplied,resultCount,activityCount,firstActivity,mostRecentActivity
+USGS-04183500,Information,Bottle or bag sampler material (construction),Bottle or bag sampler material (construction),893,893,2017-01-02,2026-04-28
+USGS-04183500,Information,NWIS lot number,"NWIS lot number, sulfuric acid, 4.5 normal (1:7), 1 milliliter, National Field Supply Service (NFSS) stock number Q438FLD",893,893,2017-01-02,2026-04-28
+USGS-04183500,Information,NWIS lot number,"NWIS lot number, vacuum tube, 10.5 milliliters, FCCVT (filtered, chilled, vacuum tube)",877,877,2017-01-02,2026-04-28
+USGS-04183500,Information,Number of sampling points,Number of sampling points,136,136,2013-10-23,2026-04-28
+USGS-04183500,Information,Sampler nozzle diameter,Sampler nozzle diameter,97,97,2017-01-24,2026-04-28
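
As a rough illustration of the inventory this fixture supports, the sketch below parses the same rows with the standard library only (the commit itself uses `pd.read_csv`). It totals `resultCount` per characteristic and finds the earliest `firstActivity`. The variable names are illustrative and not part of the commit.

```python
import csv
import io
from datetime import date

# Rows copied from tests/data/samples_summary.txt in this commit.
SUMMARY_CSV = """\
monitoringLocationIdentifier,characteristicGroup,characteristic,characteristicUserSupplied,resultCount,activityCount,firstActivity,mostRecentActivity
USGS-04183500,Information,Bottle or bag sampler material (construction),Bottle or bag sampler material (construction),893,893,2017-01-02,2026-04-28
USGS-04183500,Information,NWIS lot number,"NWIS lot number, sulfuric acid, 4.5 normal (1:7), 1 milliliter, National Field Supply Service (NFSS) stock number Q438FLD",893,893,2017-01-02,2026-04-28
USGS-04183500,Information,NWIS lot number,"NWIS lot number, vacuum tube, 10.5 milliliters, FCCVT (filtered, chilled, vacuum tube)",877,877,2017-01-02,2026-04-28
USGS-04183500,Information,Number of sampling points,Number of sampling points,136,136,2013-10-23,2026-04-28
USGS-04183500,Information,Sampler nozzle diameter,Sampler nozzle diameter,97,97,2017-01-24,2026-04-28
"""

# The quoted characteristicUserSupplied fields contain commas, so a real
# CSV parser is needed; a plain str.split(",") would misalign the columns.
rows = list(csv.DictReader(io.StringIO(SUMMARY_CSV)))

# Total result count per characteristic (several rows can share one).
totals: dict = {}
for row in rows:
    name = row["characteristic"]
    totals[name] = totals.get(name, 0) + int(row["resultCount"])

# Dates arrive as ISO strings; convert them before comparing.
earliest = min(date.fromisoformat(row["firstActivity"]) for row in rows)

for name, count in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{count:5d}  {name}")
print("earliest activity:", earliest.isoformat())
```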

tests/waterdata_test.py

Lines changed: 28 additions & 0 deletions
@@ -17,6 +17,7 @@
     get_monitoring_locations,
     get_reference_table,
     get_samples,
+    get_samples_summary,
     get_stats_date_range,
     get_stats_por,
     get_time_series_metadata,
@@ -57,6 +58,33 @@ def test_mock_get_samples(requests_mock):
     assert md.comment is None


+def test_mock_get_samples_summary(requests_mock):
+    """Tests USGS Samples summary query"""
+    request_url = (
+        "https://api.waterdata.usgs.gov/samples-data/summary/USGS-04183500"
+        "?mimeType=text%2Fcsv"
+    )
+    response_file_path = "tests/data/samples_summary.txt"
+    mock_request(requests_mock, request_url, response_file_path)
+    df, md = get_samples_summary(monitoringLocationIdentifier="USGS-04183500")
+    assert type(df) is DataFrame
+    assert list(df.columns) == [
+        "monitoringLocationIdentifier",
+        "characteristicGroup",
+        "characteristic",
+        "characteristicUserSupplied",
+        "resultCount",
+        "activityCount",
+        "firstActivity",
+        "mostRecentActivity",
+    ]
+    assert (df["monitoringLocationIdentifier"] == "USGS-04183500").all()
+    assert md.url == request_url
+    assert isinstance(md.query_time, datetime.timedelta)
+    assert md.header == {"mock_header": "value"}
+    assert md.comment is None
+
+
 def test_check_profiles():
     """Tests that correct errors are raised for invalid profiles."""
     with pytest.raises(ValueError):
