
Commit fca3d6c

thodson-usgs and claude authored
Add waterdata.get_peaks for the annual peak-streamflow OGC collection (#267)
Wraps the new /ogcapi/v0/collections/peaks collection. Returns the annual peak record for a monitoring location, one row per (location, parameter, water year), which is the standard input to flood-frequency analysis (log-Pearson Type III, etc.). The collection covers stage (parameter 00065) and discharge (00060); typical streamgages have a series for each.

Implementation reuses the existing get_ogc_data infrastructure:

- service = "peaks"
- output_id = "peak_id" (the API's `id` field is renamed for users, matching the project's other get_* functions)

R has no equivalent yet; the docstring was written from scratch following the project's existing get_* style.

Two live tests cover the happy path (single-site, both parameters present) and a water-year filter.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ba11d6f commit fca3d6c

4 files changed

Lines changed: 147 additions & 0 deletions
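The commit message names flood-frequency analysis (log-Pearson Type III) as the main consumer of this record. As a rough illustration of that downstream step, here is a minimal method-of-moments LP3 sketch: it takes base-10 logs of the annual peaks, estimates their mean, standard deviation, and skew, and converts a return period into a frequency factor with the Wilson-Hilferty approximation. `lp3_quantile` is a hypothetical helper, not part of dataretrieval, and the method is a textbook simplification, not the Bulletin 17C procedure (no Expected Moments Algorithm, regional skew weighting, historical peaks, or low-outlier screening).

```python
import math
from statistics import NormalDist, mean, stdev


def lp3_quantile(peaks: list[float], return_period: float) -> float:
    """Estimate the T-year peak from a method-of-moments log-Pearson III fit.

    Hypothetical sketch: sample moments of log10(peaks) plus the
    Wilson-Hilferty approximation for the frequency factor K.
    """
    if return_period <= 1:
        raise ValueError("return_period must be greater than 1")
    # Zero or negative peaks cannot be log-transformed; drop them here.
    logs = [math.log10(q) for q in peaks if q > 0]
    n = len(logs)
    if n < 3:
        raise ValueError("need at least 3 positive peaks to estimate skew")
    m = mean(logs)
    s = stdev(logs)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("peaks have zero spread; skew is undefined")
    # Bias-corrected sample skew coefficient of the logs.
    g = n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s**3)
    # Standard normal deviate for annual exceedance probability 1 / T.
    z = NormalDist().inv_cdf(1 - 1 / return_period)
    if abs(g) < 1e-9:
        k = z  # zero skew: LP3 reduces to a log-normal distribution
    else:
        k = (2 / g) * ((1 + g * z / 6 - g**2 / 36) ** 3 - 1)
    return 10 ** (m + k * s)
```

Assuming `df` came from `get_peaks(monitoring_location_id="USGS-02238500", parameter_code="00060")`, the discharge series could be passed as `lp3_quantile(df["value"].dropna().tolist(), 100)`.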


NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 
 **05/06/2026:** Added `waterdata.get_field_measurements_metadata(...)` — wraps the OGC `field-measurements-metadata` collection. Returns one row per (location, parameter) field-measurement series describing its period of record, units, etc., without the underlying observations. Discrete-measurement analogue to `get_time_series_metadata`. Mirrors R's `read_waterdata_field_meta`.
 
+**05/06/2026:** Added `waterdata.get_peaks(...)` — wraps the new OGC `peaks` collection, returning the annual peak streamflow / stage record for a monitoring location (one row per water year, per parameter). Standard input to flood-frequency analysis. Supports calendar/water-year filters and the usual location/parameter/CQL knobs shared with the other OGC getters.
+
 **05/05/2026:** Added `waterdata.get_combined_metadata(...)` — wraps the Water Data API's `combined-metadata` collection, which joins the monitoring-locations catalog with the time-series-metadata catalog and returns one row per (location, parameter, statistic) inventory entry. This is the most flexible "what data is available" endpoint in the API: any location attribute (state, HUC, site type, drainage area, well-construction depth, …) can be combined with any time-series attribute (parameter code, statistic, data type, period of record, …) in a single query. Mirrors R's `read_waterdata_combined_meta`.
 
 **05/05/2026:** Added `waterdata.get_samples_summary(monitoringLocationIdentifier=...)` — wraps the Samples database `/summary/{id}` endpoint, returning per-characteristic result and activity counts plus first / most recent activity dates for a single monitoring location. Useful for taking inventory of available discrete-sample data before pulling observations with `get_samples`.
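The NEWS entry mentions calendar and water-year filters, and the docstring defines the water year as ending September 30. The hypothetical helpers below (not part of dataretrieval) sketch that convention: `water_year` maps a date to its water year, and `water_year_interval` expresses a run of water years as an ISO 8601 `start/end` date interval. Whether the `time` argument accepts exactly this interval form is an assumption; the docstring defers to `get_time_series_metadata` for the full grammar.

```python
from datetime import date


def water_year(day: date) -> int:
    """Return the USGS water year (Oct 1 through Sep 30) containing ``day``.

    A water year is named for the calendar year in which it ends, so
    October through December belong to the following year's water year.
    """
    return day.year + 1 if day.month >= 10 else day.year


def water_year_interval(first_wy: int, last_wy: int) -> str:
    """Render water years first_wy..last_wy as an ISO 8601 ``start/end`` interval."""
    if last_wy < first_wy:
        raise ValueError("last_wy must not precede first_wy")
    start = date(first_wy - 1, 10, 1)  # WY N starts Oct 1 of calendar year N - 1
    end = date(last_wy, 9, 30)  # and ends Sep 30 of calendar year N
    return f"{start.isoformat()}/{end.isoformat()}"
```

For example, `water_year_interval(2020, 2023)` returns `"2019-10-01/2023-09-30"`, the date span covered by `water_year=[2020, 2021, 2022, 2023]` in the docstring's multi-site example.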

dataretrieval/waterdata/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,7 @@
     get_latest_continuous,
     get_latest_daily,
     get_monitoring_locations,
+    get_peaks,
     get_reference_table,
     get_samples,
     get_samples_summary,
@@ -55,6 +56,7 @@
     "get_latest_daily",
     "get_monitoring_locations",
     "get_nearest_continuous",
+    "get_peaks",
     "get_ratings",
     "get_reference_table",
     "get_samples",

dataretrieval/waterdata/api.py

Lines changed: 120 additions & 0 deletions
@@ -1878,6 +1878,126 @@ def get_field_measurements_metadata(
     return get_ogc_data(args, output_id, service)
 
 
+def get_peaks(
+    monitoring_location_id: str | list[str] | None = None,
+    parameter_code: str | list[str] | None = None,
+    time_series_id: str | list[str] | None = None,
+    unit_of_measure: str | list[str] | None = None,
+    time: str | list[str] | None = None,
+    last_modified: str | list[str] | None = None,
+    water_year: int | list[int] | None = None,
+    year: int | list[int] | None = None,
+    month: int | list[int] | None = None,
+    day: int | list[int] | None = None,
+    peak_since: int | list[int] | None = None,
+    properties: str | list[str] | None = None,
+    skip_geometry: bool | None = None,
+    bbox: list[float] | None = None,
+    limit: int | None = None,
+    filter: str | None = None,
+    filter_lang: FILTER_LANG | None = None,
+    convert_type: bool = True,
+) -> tuple[pd.DataFrame, BaseMetadata]:
+    """Get the annual peak streamflow / stage record for a monitoring location.
+
+    Peaks are the largest values observed at a site each water year and are
+    the standard input to flood-frequency analysis (e.g. log-Pearson Type III
+    fits). The endpoint returns one row per (monitoring location, parameter,
+    water year), with the peak ``value`` and the ``time`` it occurred.
+
+    The collection covers both stage (parameter ``"00065"``, ``ft``) and
+    discharge (parameter ``"00060"``, ``ft^3/s``); a typical streamgage has a
+    series for each. Reference docs:
+    https://api.waterdata.usgs.gov/ogcapi/v0/openapi?f=html#/peaks
+
+    Parameters
+    ----------
+    monitoring_location_id : string or list of strings, optional
+        A unique identifier representing a single monitoring location, in
+        ``AGENCY-ID`` form (e.g. ``"USGS-02238500"``).
+    parameter_code : string or list of strings, optional
+        5-digit parameter code. Most peaks records are ``"00060"`` (discharge)
+        or ``"00065"`` (stage / gage height). Full list at
+        https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
+    time_series_id : string or list of strings, optional
+        ID of the time series the peak belongs to.
+    unit_of_measure : string or list of strings, optional
+        Human-readable units (e.g. ``"ft^3/s"``, ``"ft"``).
+    time : string, optional
+        Datetime, interval, or duration filter on the peak's date.
+        See :func:`get_time_series_metadata` for the full grammar.
+    last_modified : string, optional
+        Same datetime grammar as ``time``; filters on the database
+        last-modified timestamp (useful for incremental ETL polling).
+    water_year, year, month, day : int or list of ints, optional
+        Calendar / water-year filters on the peak event. The water year ends
+        September 30 (e.g. WY2024 = Oct 1, 2023 – Sep 30, 2024).
+    peak_since : int or list of ints, optional
+        Filter on the year since which the peak value has stood as the
+        record (the API serves this field as an integer; many rows are
+        ``null``).
+    properties : string or list of strings, optional
+        Subset of columns to return. Defaults to every available property.
+    skip_geometry : boolean, optional
+        Skip per-feature geometries; the returned object will be a plain
+        ``DataFrame`` with no spatial information.
+    bbox : list of numbers, optional
+        Only features whose geometry intersects the bounding box are
+        selected. Format: ``[xmin, ymin, xmax, ymax]`` in CRS 4326
+        (longitude / latitude, west-south-east-north).
+    limit : numeric, optional
+        Page size; the maximum allowable value is 50000. Default
+        (``None``) requests the maximum allowable limit.
+    filter, filter_lang : optional
+        Server-side CQL filter passed through as the OGC ``filter`` /
+        ``filter-lang`` query parameters. See
+        :mod:`dataretrieval.waterdata.filters` for syntax, auto-chunking,
+        and the lexicographic-comparison pitfall.
+    convert_type : boolean, optional
+        If True, converts columns to appropriate types.
+
+    Returns
+    -------
+    df : ``pandas.DataFrame`` or ``geopandas.GeoDataFrame``
+        Formatted data returned from the API query.
+    md : :obj:`dataretrieval.utils.Metadata`
+        A custom metadata object pertaining to the query.
+
+    Examples
+    --------
+    .. code::
+
+        >>> # Full annual peak record at one site (both stage and discharge)
+        >>> df, md = dataretrieval.waterdata.get_peaks(
+        ...     monitoring_location_id="USGS-02238500"
+        ... )
+
+        >>> # Discharge peaks only
+        >>> df, md = dataretrieval.waterdata.get_peaks(
+        ...     monitoring_location_id="USGS-02238500",
+        ...     parameter_code="00060",
+        ... )
+
+        >>> # Multi-site peaks for a parameter, narrowed to a water-year range
+        >>> df, md = dataretrieval.waterdata.get_peaks(
+        ...     monitoring_location_id=[
+        ...         "USGS-07069000",
+        ...         "USGS-07064000",
+        ...         "USGS-07068000",
+        ...     ],
+        ...     parameter_code="00060",
+        ...     water_year=[2020, 2021, 2022, 2023],
+        ... )
+
+    """
+    service = "peaks"
+    output_id = "peak_id"
+
+    args = _get_args(locals())
+
+    return get_ogc_data(args, output_id, service)
+
+
 def get_reference_table(
     collection: str,
     limit: int | None = None,
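The `filter` / `filter_lang` arguments pass a server-side CQL expression straight through to the OGC query. The hypothetical helpers below (not part of dataretrieval) sketch one way to assemble a CQL2-text expression equivalent to the `parameter_code` and `water_year` keyword filters. The docstring points to `dataretrieval.waterdata.filters` for a lexicographic-comparison pitfall; independent of the details there, this sketch quotes string codes such as `"00060"` so their leading zeros survive, and leaves integer fields such as `water_year` unquoted. Two assumptions to verify: that the `peaks` endpoint supports the CQL2 `IN` predicate, and which `filter_lang` value (see `FILTER_LANG`) selects CQL2 text.

```python
def cql_literal(value: object) -> str:
    """Render a Python value as a CQL2-text literal (hypothetical helper)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)  # numbers stay bare; convert NumPy scalars with int() first
    # Strings are single-quoted; embedded quotes are escaped by doubling.
    return "'" + str(value).replace("'", "''") + "'"


def cql_in(prop: str, values: list[object]) -> str:
    """Build ``prop IN (v1, v2, ...)``."""
    if not values:
        raise ValueError("IN needs at least one value")
    return f"{prop} IN ({', '.join(cql_literal(v) for v in values)})"


def cql_and(*clauses: str) -> str:
    """Join clauses with AND, parenthesizing each one."""
    return " AND ".join(f"({c})" for c in clauses)
```

The resulting string would go in `filter=`, e.g. `get_peaks(monitoring_location_id="USGS-02238500", filter=cql_and(cql_in("parameter_code", ["00060"]), cql_in("water_year", [2020, 2021])))`; for simple equality lists like these, the plain keyword arguments are the more direct route.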

tests/waterdata_test.py

Lines changed: 23 additions & 0 deletions
@@ -17,6 +17,7 @@
     get_latest_continuous,
     get_latest_daily,
     get_monitoring_locations,
+    get_peaks,
     get_reference_table,
     get_samples,
     get_samples_summary,
@@ -399,6 +400,28 @@ def test_get_field_measurements_metadata_multi_site():
     }
 
 
+def test_get_peaks():
+    df, md = get_peaks(monitoring_location_id="USGS-02238500", skip_geometry=True)
+    assert "peak_id" in df.columns
+    assert "value" in df.columns
+    assert "water_year" in df.columns
+    assert (df["monitoring_location_id"] == "USGS-02238500").all()
+    assert set(df["parameter_code"].unique()).issubset({"00060", "00065"})
+    assert hasattr(md, "url")
+    assert hasattr(md, "query_time")
+
+
+def test_get_peaks_water_year_filter():
+    df, _ = get_peaks(
+        monitoring_location_id="USGS-02238500",
+        parameter_code="00060",
+        water_year=[2020, 2021, 2022],
+        skip_geometry=True,
+    )
+    assert (df["parameter_code"] == "00060").all()
+    assert set(df["water_year"].unique()).issubset({2020, 2021, 2022})
+
+
 def test_get_reference_table():
     df, md = get_reference_table("agency-codes")
     assert "agency_code" in df.columns
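The tests above check that each row carries a `water_year` and a `value`. Those two columns are enough for the empirical side of flood-frequency work: rank one site's peaks and assign Weibull plotting positions, where the m-th largest of n annual peaks gets return period T = (n + 1) / m (annual exceedance probability m / (n + 1)). The sketch works on a plain `{water_year: value}` dict; `weibull_return_periods` is hypothetical, and it breaks ties by input order, which is a simplification.

```python
def weibull_return_periods(peaks: dict[int, float]) -> dict[int, float]:
    """Map water year -> empirical return period T = (n + 1) / m.

    m is the descending rank of that year's peak (1 = largest). Python's
    sort is stable, so tied peaks keep their input order and receive
    consecutive ranks.
    """
    n = len(peaks)
    # Water years ordered from largest to smallest peak.
    ordered = sorted(peaks, key=peaks.get, reverse=True)
    return {wy: (n + 1) / rank for rank, wy in enumerate(ordered, start=1)}
```

With a `get_peaks` result filtered to one site and one parameter (so each water year appears once), `dict(zip(df["water_year"], df["value"]))` would build the input.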
