feat(waterdata): Auto-chunk OGC requests over the URL byte limit (#283)
The OGC `waterdata` getters previously failed with HTTP 414 when the
request URL exceeded the server's ~8 KB byte limit.
This PR introduces a joint chunker that models every multi-value list
parameter and the cql-text `filter` as a chunkable axis.
Greedy halving splits the biggest chunk across all
axes until each sub-request URL fits; the chunker fans out under the
hood and returns one combined DataFrame. Callers see no API change.
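The halving idea described above can be sketched in a few lines. This is a simplified illustration, not the PR's implementation: `BASE_URL`, `MAX_URL_BYTES`, `url_bytes`, and `plan_chunks` are hypothetical names, the endpoint is a placeholder, the 8000-byte threshold is an assumed stand-in for the ~8 KB limit, and the cql-text `filter` axis is not modeled.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.invalid/collections/daily/items"  # placeholder endpoint
MAX_URL_BYTES = 8000  # assumed stand-in for the server's ~8 KB limit


def url_bytes(params):
    """Byte length of the URL built from one chunk's parameters."""
    query = urlencode({k: ",".join(v) for k, v in params.items()})
    return len(f"{BASE_URL}?{query}".encode("utf-8"))


def plan_chunks(params, limit=MAX_URL_BYTES):
    """Halve multi-value axes until every sub-request URL fits.

    `params` maps an axis name to its list of string values.
    Returns a list of dicts, one per sub-request.
    """
    done, pending = [], [dict(params)]
    while pending:
        chunk = pending.pop()
        if url_bytes(chunk) <= limit:
            done.append(chunk)
            continue
        # Only axes with more than one value can still be split.
        splittable = [k for k, v in chunk.items() if len(v) > 1]
        if not splittable:
            raise ValueError("a single-value request still exceeds the URL limit")
        # Greedy choice: halve the axis whose joined values are longest.
        axis = max(splittable, key=lambda k: len(",".join(chunk[k])))
        values = chunk[axis]
        mid = len(values) // 2
        pending.append({**chunk, axis: values[:mid]})
        pending.append({**chunk, axis: values[mid:]})
    return done
```

Each returned dict would map to one sub-request; concatenating the per-chunk results gives the combined frame the caller sees.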
Mid-stream HTTP 429 / 5xx responses surface as `ChunkInterrupted`
subclasses (`QuotaExhausted` / `ServiceInterrupted`) carrying the
partial result plus a resumable `.call` handle: `exc.call.resume()`
continues with only the still-pending sub-requests. Pre-emptive
`RequestExceedsQuota` catches plans that won't fit the remaining
rate-limit window; setting `API_USGS_LIMIT=0` bypasses the check.
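The resume pattern might look roughly like the toy below. The classes are self-contained stand-ins that only borrow the names from this message; they are not the library's classes. The `partial` attribute name, the `ChunkedCall` handle, and `run_with_resume` are assumptions made for illustration, and a simulated `ConnectionError` stands in for a 429 / 5xx response.

```python
class ChunkInterrupted(Exception):
    """Stand-in for the exception named above (not the library's class)."""

    def __init__(self, partial, call):
        super().__init__("chunked request interrupted")
        self.partial = partial  # hypothetical attribute name for the partial result
        self.call = call        # resumable handle


class ChunkedCall:
    """Toy resumable handle that tracks the still-pending sub-requests."""

    def __init__(self, chunks, fetch):
        self.pending = list(chunks)
        self.fetch = fetch
        self.rows = []

    def resume(self):
        while self.pending:
            try:
                self.rows.extend(self.fetch(self.pending[0]))
            except ConnectionError as err:
                # Hand back what was collected so far plus the handle.
                raise ChunkInterrupted(list(self.rows), self) from err
            self.pending.pop(0)  # drop a chunk only after it succeeded
        return self.rows


def run_with_resume(call, max_attempts=3):
    """Retry via the handle so completed sub-requests are not refetched."""
    for _ in range(max_attempts):
        try:
            return call.resume()
        except ChunkInterrupted as exc:
            call = exc.call  # a real caller would wait or back off here
    raise RuntimeError("still interrupted after retries")
```

The point of the handle is that a retry costs only the unfinished sub-requests, which matters when the interruption was a rate limit in the first place.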
Behavior changes for paginated / chunked calls:
- `BaseMetadata.url` still reflects the user's original query.
- `BaseMetadata.header` now carries the LAST page's headers so
`x-ratelimit-remaining` is current (was: first page's).
- `BaseMetadata.query_time` is now cumulative wall-clock across pages
(was: first page's elapsed).
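As a rough illustration of the three metadata rules above, the sketch below folds per-page results into one summary. `Page` and `summarize_pages` are hypothetical helpers, not part of `BaseMetadata`, and summing per-page elapsed times is only an approximation of cumulative wall-clock time.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical record of one fetched page."""
    headers: dict
    elapsed: float  # seconds spent on this page


def summarize_pages(original_url, pages):
    """Combine per-page results following the rules listed above."""
    return {
        "url": original_url,                          # the user's original query
        "header": pages[-1].headers,                  # last page's headers
        "query_time": sum(p.elapsed for p in pages),  # cumulative across pages
    }
```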
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
NEWS.md: 3 additions, 1 deletion
@@ -1,3 +1,5 @@
+**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit.
+
**05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
**05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.
@@ -36,4 +38,4 @@
**03/01/2024:** USGS data availability and format have changed on Water Quality Portal (WQP). Since March 2024, data obtained from WQP legacy profiles will not include new USGS data or recent updates to existing data. All USGS data (up to and beyond March 2024) are available using the new WQP beta services. You can access the beta services by setting `legacy=False` in the functions in the `wqp` module.
-To view the status of changes in data availability and code functionality, visit: https://doi-usgs.github.io/dataRetrieval/articles/Status.html
+To view the status of changes in data availability and code functionality, visit: https://doi-usgs.github.io/dataRetrieval/articles/Status.html