
Commit 714d3cd

thodson-usgs and claude committed
feat(waterdata): Auto-chunk OGC requests over the URL byte limit
The OGC `waterdata` getters previously failed with HTTP 414 when the request URL exceeded the server's ~8 KB byte limit. A common pattern, pulling a long site list from `get_monitoring_locations` and feeding it into `get_daily`, was the main offender:

    sites_df, _ = get_monitoring_locations(state_name="Ohio")
    df, md = get_daily(
        monitoring_location_id=sites_df["monitoring_location_id"].tolist(),
        parameter_code="00060",
        time="P7D",
    )

Introduces a joint chunker that models every multi-value list parameter and the cql-text `filter` (split on top-level `OR`) as a chunkable axis. Greedy halving splits the biggest chunk across all axes until each sub-request URL fits; the chunker fans out under the hood and returns one combined DataFrame. Callers see no API change.

Mid-stream 429 / 5xx surface as `ChunkInterrupted` subclasses (`QuotaExhausted` / `ServiceInterrupted`) carrying the partial result plus a `.call` resumable handle; `exc.call.resume()` continues only the still-pending sub-requests. Pre-emptive `RequestExceedsQuota` catches plans that won't fit the remaining rate-limit window; `API_USGS_LIMIT=0` bypasses the check.

Behavior changes for paginated / chunked calls:

- `BaseMetadata.url` still reflects the user's original query.
- `BaseMetadata.header` now carries the LAST page's headers so `x-ratelimit-remaining` is current (was: first page's).
- `BaseMetadata.query_time` is now cumulative wall-clock across pages (was: first page's elapsed).

Mirrors R `dataRetrieval`'s [#870](DOI-USGS/dataRetrieval#870), generalized from one filter axis to N joint axes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
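The greedy-halving idea in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code added by this commit: `plan_chunks`, `url_bytes`, `URL_BYTE_LIMIT`, and `BASE_URL` are hypothetical names, the 8000-byte limit is a stand-in for the server's "~8 KB", list values are assumed to be comma-joined in the query string, and the `filter` axis (split on top-level `OR`) is left out for brevity.

```python
from urllib.parse import urlencode

URL_BYTE_LIMIT = 8000  # assumption: stand-in for the server's ~8 KB limit
BASE_URL = "https://example.test/collections/daily/items"  # placeholder URL


def url_bytes(base_url, params):
    """Byte length of the full URL, with list values comma-joined."""
    flat = {k: ",".join(v) if isinstance(v, list) else v for k, v in params.items()}
    return len(f"{base_url}?{urlencode(flat)}".encode("utf-8"))


def plan_chunks(base_url, params, limit=URL_BYTE_LIMIT):
    """Halve the largest list-valued axis until every sub-request URL fits."""
    pending, done = [params], []
    while pending:
        current = pending.pop()
        if url_bytes(base_url, current) <= limit:
            done.append(current)
            continue
        # Only list parameters with more than one value can be split further.
        axes = [k for k, v in current.items() if isinstance(v, list) and len(v) > 1]
        if not axes:
            raise ValueError("request cannot be split to fit the URL limit")
        # Greedy choice: split the axis that contributes the most characters.
        axis = max(axes, key=lambda k: len(",".join(current[k])))
        values = current[axis]
        mid = len(values) // 2
        # Push the right half first so the left half is processed first,
        # which keeps the original value order across the finished chunks.
        pending.append({**current, axis: values[mid:]})
        pending.append({**current, axis: values[:mid]})
    return done


# Usage: a long, made-up site list is split into several sub-requests.
sites = [f"USGS-{i:08d}" for i in range(1000)]
chunks = plan_chunks(
    BASE_URL,
    {"monitoring_location_id": sites, "parameter_code": ["00060"], "time": "P7D"},
)
print(len(chunks), "sub-requests")
```

Each returned dict is one sub-request's parameters; a caller would issue them in order and concatenate the resulting DataFrames. Measuring the encoded URL in bytes rather than counting list items matters because percent-encoding (for example `,` becoming `%2C`) inflates the length.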
1 parent 4a65fb1 commit 714d3cd

9 files changed

Lines changed: 3374 additions & 676 deletions


NEWS.md

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
+**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit.
+
 **05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
 
 **05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.
@@ -36,4 +38,4 @@
 
 **03/01/2024:** USGS data availability and format have changed on Water Quality Portal (WQP). Since March 2024, data obtained from WQP legacy profiles will not include new USGS data or recent updates to existing data. All USGS data (up to and beyond March 2024) are available using the new WQP beta services. You can access the beta services by setting `legacy=False` in the functions in the `wqp` module.
 
-To view the status of changes in data availability and code functionality, visit: https://doi-usgs.github.io/dataRetrieval/articles/Status.html
+To view the status of changes in data availability and code functionality, visit: https://doi-usgs.github.io/dataRetrieval/articles/Status.html

dataretrieval/waterdata/api.py

Lines changed: 21 additions & 6 deletions
@@ -113,7 +113,7 @@ def get_daily(
 data are released on the condition that neither the USGS nor the United
 States Government may be held liable for any damages resulting from its
 use. This field reflects the approval status of each record, and is either
-"Approved", meaining processing review has been completed and the data is
+"Approved", meaning processing review has been completed and the data is
 approved for publication, or "Provisional" and subject to revision. For
 more information about provisional data, go to:
 https://waterdata.usgs.gov/provisional-data-statement/.
@@ -230,6 +230,21 @@ def get_daily(
 ... parameter_code="00060",
 ... last_modified="P7D",
 ... )
+
+>>> # Chain queries: pull all stream sites in a state, then their
+>>> # daily discharge for the last week. The site list can be hundreds
+>>> # of values long — the request is transparently chunked across
+>>> # multiple sub-requests so the URL stays under the server's byte
+>>> # limit. Combined output looks like a single query.
+>>> sites_df, _ = dataretrieval.waterdata.get_monitoring_locations(
+... state_name="Ohio",
+... site_type="Stream",
+... )
+>>> df, md = dataretrieval.waterdata.get_daily(
+... monitoring_location_id=sites_df["monitoring_location_id"].tolist(),
+... parameter_code="00060",
+... time="P7D",
+... )
 """
 service = "daily"
 output_id = "daily_id"
@@ -259,7 +274,7 @@ def get_continuous(
 convert_type: bool = True,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
 """
-Continuous data provide instantanous water conditions.
+Continuous data provide instantaneous water conditions.
 
 This is an early version of the continuous endpoint that is feature-complete
 and is being made available for limited use. Geometries are not included
@@ -320,7 +335,7 @@ def get_continuous(
 data are released on the condition that neither the USGS nor the United
 States Government may be held liable for any damages resulting from its
 use. This field reflects the approval status of each record, and is either
-"Approved", meaining processing review has been completed and the data is
+"Approved", meaning processing review has been completed and the data is
 approved for publication, or "Provisional" and subject to revision. For
 more information about provisional data, go to:
 https://waterdata.usgs.gov/provisional-data-statement/.
@@ -1254,7 +1269,7 @@ def get_latest_continuous(
 data are released on the condition that neither the USGS nor the United
 States Government may be held liable for any damages resulting from its
 use. This field reflects the approval status of each record, and is either
-"Approved", meaining processing review has been completed and the data is
+"Approved", meaning processing review has been completed and the data is
 approved for publication, or "Provisional" and subject to revision. For
 more information about provisional data, go to:
 https://waterdata.usgs.gov/provisional-data-statement/.
@@ -1451,7 +1466,7 @@ def get_latest_daily(
 data are released on the condition that neither the USGS nor the United
 States Government may be held liable for any damages resulting from its
 use. This field reflects the approval status of each record, and is either
-"Approved", meaining processing review has been completed and the data is
+"Approved", meaning processing review has been completed and the data is
 approved for publication, or "Provisional" and subject to revision. For
 more information about provisional data, go to:
 https://waterdata.usgs.gov/provisional-data-statement/.
@@ -1633,7 +1648,7 @@ def get_field_measurements(
 data are released on the condition that neither the USGS nor the United
 States Government may be held liable for any damages resulting from its
 use. This field reflects the approval status of each record, and is either
-"Approved", meaining processing review has been completed and the data is
+"Approved", meaning processing review has been completed and the data is
 approved for publication, or "Provisional" and subject to revision. For
 more information about provisional data, go to:
 https://waterdata.usgs.gov/provisional-data-statement/.
