Commit fcee959

Merge remote-tracking branch 'origin/main' into worktree-waterdata-progress-bar
# Conflicts:
#   README.md
#   dataretrieval/waterdata/filters.py
#   dataretrieval/waterdata/utils.py
#   tests/waterdata_utils_test.py
2 parents 16fc920 + 092f1b0 commit fcee959

13 files changed

Lines changed: 3595 additions & 811 deletions

NEWS.md

Lines changed: 4 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,6 @@
1-
**05/14/2026:** Fixed two latent bugs in the paginated `waterdata` request loop (`_walk_pages` and `get_stats_data`). Previously, when `requests.Session.request(...)` itself raised mid-pagination (network error, timeout), the except block called `_error_body()` on the *prior page's* response, so the logged "error" described the wrong request and could itself crash on non-JSON bodies. Separately, no status-code check was performed on subsequent paginated responses, so a 5xx body that didn't include `numberReturned` was silently treated as an empty page — pagination quietly stopped and the user got truncated data with no error logged. The loop now status-checks each page like the initial request and reports the actual exception. The "best-effort" behavior (return whatever pages were collected) is preserved.
1+
**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit.
2+
3+
**05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
24

35
**05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.
46

@@ -36,4 +38,4 @@
3638

3739
**03/01/2024:** USGS data availability and format have changed on Water Quality Portal (WQP). Since March 2024, data obtained from WQP legacy profiles will not include new USGS data or recent updates to existing data. All USGS data (up to and beyond March 2024) are available using the new WQP beta services. You can access the beta services by setting `legacy=False` in the functions in the `wqp` module.
3840

39-
To view the status of changes in data availability and code functionality, visit: https://doi-usgs.github.io/dataRetrieval/articles/Status.html
41+
To view the status of changes in data availability and code functionality, visit: https://doi-usgs.github.io/dataRetrieval/articles/Status.html
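
The 05/16/2026 entry in this NEWS.md diff changes mid-pagination failures from silent truncation to a raised `RuntimeError` and suggests retrying the call. A minimal sketch of such a retry wrapper follows. `call_with_retry` and the stand-in `flaky_fetch` are hypothetical names used only for illustration and are not part of `dataretrieval`; the sketch assumes only that the failure surfaces as a `RuntimeError`, as the entry describes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``fetch`` until it succeeds or ``attempts`` is exhausted.

    Hypothetical helper: waits base_delay, 2 * base_delay, ... between
    tries and re-raises the last RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except RuntimeError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


# Stand-in for a waterdata call that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated mid-pagination failure")
    return "complete result"


# Record the delays instead of sleeping so the example runs instantly.
delays: list = []
result = call_with_retry(flaky_fetch, attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

In real use, `fetch` could be a `lambda` wrapping a `waterdata.get_daily(...)` call, possibly with a smaller `limit` or a narrower query on later attempts.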

README.md

Lines changed: 30 additions & 63 deletions
@@ -4,29 +4,6 @@
44
![Conda Version](https://img.shields.io/conda/v/conda-forge/dataretrieval)
55
![Downloads](https://static.pepy.tech/badge/dataretrieval)
66

7-
## Latest Announcements
8-
9-
**02/24/2026** The `get_gwlevels`, `get_discharge_measurements` functions in the `nwis` module are defunct and have been replaced with the `get_field_measurements` function in the `waterdata` module. The `get_pmcodes` function in the `nwis` module has been replaced with the `get_reference_table(collection='parameter_code)` function. Finally, the `get_water_use` function in the `nwis` module is defunct with no current replacement.
10-
11-
:mega: **01/16/2026:** `dataretrieval` now features the `waterdata` module,
12-
which provides access to USGS's modernized [Water Data
13-
APIs](https://api.waterdata.usgs.gov/). The Water Data API endpoints include
14-
daily values, instantaneous values, field measurements, time series metadata, statistics,
15-
and discrete water quality data from the [Samples database](https://waterdata.usgs.gov/download-samples/#dataProfile=site). This new module replaces the `nwis` module, which provides access to the legacy [NWIS
16-
Water Services](https://waterservices.usgs.gov/). Take a look at the new [`waterdata` module demo notebook](demos/WaterData_demo.ipynb), which walks through an extended example using a majority of the available `waterdata` functions.
17-
18-
Check out the [NEWS](NEWS.md) file for all updates and announcements.
19-
20-
**Important:** Users of the Water Data APIs are strongly encouraged to obtain an
21-
API key for higher rate limits and greater access to USGS data. [Register for
22-
an API key](https://api.waterdata.usgs.gov/signup/) and set it as an
23-
environment variable:
24-
25-
```python
26-
import os
27-
os.environ["API_USGS_PAT"] = "your_api_key_here"
28-
```
29-
307
## What is dataretrieval?
318

329
`dataretrieval` simplifies the process of loading hydrologic data into Python.
@@ -36,6 +13,8 @@ U.S. Geological Survey (USGS) hydrology data types available on the Web, as well
3613
as data from the Water Quality Portal (WQP) and Network Linked Data Index
3714
(NLDI).
3815

16+
Check the [NEWS](NEWS.md) for all updates and announcements.
17+
3918
## Installation
4019

4120
Install dataretrieval using pip:
@@ -44,13 +23,13 @@ Install dataretrieval using pip:
4423
pip install dataretrieval
4524
```
4625

47-
Or using conda:
26+
Or conda:
4827

4928
```bash
5029
conda install -c conda-forge dataretrieval
5130
```
5231

53-
To install the "main" branch directly from GitHub, use:
32+
Or directly from GitHub:
5433

5534
```bash
5635
pip install git+https://github.com/DOI-USGS/dataretrieval-python.git
@@ -60,11 +39,20 @@ pip install git+https://github.com/DOI-USGS/dataretrieval-python.git
6039

6140
### Water Data API (Recommended - Modern USGS Data)
6241

63-
The `waterdata` module provides access to modern USGS Water Data APIs.
42+
Access USGS water-monitoring data.
6443

65-
Some basic usage examples include retrieving daily streamflow data for a
66-
specific monitoring location, where the `/` in the `time` argument indicates
67-
the desired range:
44+
**Important:** Users are strongly encouraged to obtain an API key for higher
45+
rate limits. [Register for an API key](https://api.waterdata.usgs.gov/signup/)
46+
and set it as an environment variable:
47+
48+
```python
49+
import os
50+
os.environ["API_USGS_PAT"] = "your_api_key_here"
51+
```
52+
53+
The following example retrieves daily streamflow data for a specific
54+
monitoring location. The `/` in the `time` argument separates the start and
55+
end of the desired range:
6856

6957
```python
7058
from dataretrieval import waterdata
@@ -80,7 +68,7 @@ print(f"Retrieved {len(df)} records")
8068
print(f"Site: {df['monitoring_location_id'].iloc[0]}")
8169
print(f"Mean discharge: {df['value'].mean():.2f} {df['unit_of_measure'].iloc[0]}")
8270
```
83-
Retrieving streamflow at multiple locations from October 1, 2024 to the present:
71+
Retrieve streamflow at multiple locations from October 1, 2024 to the present:
8472

8573
```python
8674
df, metadata = waterdata.get_daily(
@@ -91,8 +79,8 @@ df, metadata = waterdata.get_daily(
9179

9280
print(f"Retrieved {len(df)} records")
9381
```
94-
Retrieving location information for all monitoring locations categorized as
95-
stream sites in the state of Maryland:
82+
Retrieve location information for all monitoring locations categorized as
83+
stream sites in Maryland:
9684

9785
```python
9886
# Get monitoring location information
@@ -103,8 +91,9 @@ df, metadata = waterdata.get_monitoring_locations(
10391

10492
print(f"Found {len(df)} stream monitoring locations in Maryland")
10593
```
106-
Finally, retrieving continuous (a.k.a. "instantaneous") data
107-
for one location. We *strongly advise* breaking up continuous data requests into smaller time periods and collections to avoid timeouts and other issues:
94+
Finally, retrieve continuous (a.k.a. "instantaneous") data for one location.
95+
We *strongly advise* breaking continuous data requests into smaller time
96+
windows to avoid timeouts and other issues:
10897

10998
```python
11099
# Get continuous data for a single monitoring location and water year
@@ -118,14 +107,15 @@ print(f"Retrieved {len(df)} continuous gage height measurements")
118107

119108
Visit the
120109
[API Reference](https://doi-usgs.github.io/dataretrieval-python/reference/waterdata.html)
121-
for more information and examples on available services and input parameters.
110+
for more information and examples on available services and input parameters.
122111

123112
**Tracking progress:** Paginated and chunked `waterdata` queries report their
124113
progress on a single, self-updating line on `stderr` — showing the chunk and
125-
page counts, rows retrieved so far, and the API requests remaining for the hour:
114+
page counts, rows retrieved so far, and the API requests remaining (with the
115+
time until the hourly limit resets, when the server reports it):
126116

127117
```text
128-
Progress: chunk 2/5 · 14 pages · 8,421 rows · 4,870 requests left
118+
Progress: chunk 2/5 · 14 pages · 8,421 rows · 4,870 requests remaining, resets in 47m
129119
```
130120

131121
The line appears automatically when `stderr` is an interactive terminal.
@@ -139,29 +129,6 @@ import logging
139129
logging.basicConfig(level=logging.DEBUG)
140130
```
141131

142-
### Legacy NWIS Services (Deprecated but still functional)
143-
144-
The `nwis` module accesses legacy NWIS Water Services:
145-
146-
```python
147-
from dataretrieval import nwis
148-
149-
# Get site information
150-
info, metadata = nwis.get_info(sites='01646500')
151-
152-
print(f"Site name: {info['station_nm'].iloc[0]}")
153-
154-
# Get daily values
155-
dv, metadata = nwis.get_dv(
156-
sites='01646500',
157-
start='2024-10-01',
158-
end='2024-10-02',
159-
parameterCd='00060',
160-
)
161-
162-
print(f"Retrieved {len(dv)} daily values")
163-
```
164-
165132
### Water Quality Portal (WQP)
166133

167134
Access water quality data from multiple agencies:
@@ -247,13 +214,13 @@ print(f"Found {len(flowlines)} upstream tributaries within 50km")
247214
## More Examples
248215

249216
Explore additional examples in the
250-
[`demos`](https://github.com/USGS-python/dataretrieval/tree/main/demos)
217+
[`demos`](https://github.com/DOI-USGS/dataretrieval-python/tree/main/demos)
251218
directory, including Jupyter notebooks demonstrating advanced usage patterns.
252219

253220
## Getting Help
254221

255-
- **Issue tracker**: Report bugs and request features at https://github.com/USGS-python/dataretrieval/issues
256-
- **Documentation**: Full API documentation available in the source code docstrings
222+
- **Issue tracker**: Report bugs and request features at https://github.com/DOI-USGS/dataretrieval-python/issues
223+
- **Documentation**: https://doi-usgs.github.io/dataretrieval-python/
257224

258225
## Contributing
259226

dataretrieval/waterdata/_progress.py

Lines changed: 48 additions & 9 deletions
@@ -6,7 +6,7 @@
66
and ``utils.get_stats_data``). This module surfaces that work as one line on
77
stderr, rewritten in place as data arrives::
88
9-
Progress: chunk 2/5 · 14 pages · 8,421 rows · 4,870 requests left
9+
Progress: chunk 2/5 · 14 pages · 8,421 rows · 4,870 requests remaining
1010
1111
It replaces the per-page ``logger.info`` calls that previously narrated the same
1212
events one line at a time.
@@ -27,10 +27,24 @@
2727
import contextvars
2828
import os
2929
import sys
30+
import time
3031
from collections.abc import Iterator
3132
from contextlib import contextmanager
3233
from typing import TextIO
3334

35+
36+
def _format_duration(seconds: float) -> str:
37+
"""Compact human duration: ``45s``, ``12m``, ``1h03m`` (clamped at 0)."""
38+
secs = int(max(0, seconds))
39+
if secs < 60:
40+
return f"{secs}s"
41+
if secs < 3600:
42+
return f"{secs // 60}m"
43+
hours, rem = divmod(secs, 3600)
44+
minutes = rem // 60
45+
return f"{hours}h{minutes:02d}m" if minutes else f"{hours}h"
46+
47+
3448
# The reporter active for the current query. A ContextVar (not a module global)
3549
# so concurrent queries — threads or async tasks sharing a client — each track
3650
# their own progress line.
@@ -74,6 +88,9 @@ def __init__(
7488
self.pages = 0
7589
self.rows = 0
7690
self.rate_remaining: str | None = None
91+
# Absolute epoch second when the rate-limit window resets, derived from
92+
# the server's reset header so the rendered countdown stays live.
93+
self._reset_at: float | None = None
7794
self._last_len = 0
7895
self._closed = False
7996

@@ -82,24 +99,42 @@ def set_chunks(self, total: int) -> None:
8299
self.total_chunks = max(int(total), 1)
83100

84101
def start_chunk(self, index: int) -> None:
85-
"""Mark the start of chunk ``index`` (1-based) and redraw."""
102+
"""Mark the start of chunk ``index`` (1-based) and redraw.
103+
104+
Only redraws when actually chunking (``total_chunks > 1``); a
105+
single-chunk plan has nothing chunk-specific to show yet, so it
106+
avoids a premature "0 pages" frame before the first page arrives.
107+
"""
86108
self.current_chunk = index
87-
self._render()
109+
if self.total_chunks > 1:
110+
self._render()
88111

89112
def add_page(self, rows: int = 0) -> None:
90113
"""Record one fetched page carrying ``rows`` rows and redraw."""
91114
self.pages += 1
92115
self.rows += int(rows)
93116
self._render()
94117

95-
def set_rate_remaining(self, value: str | int | None) -> None:
96-
"""Update the remaining-requests count from an ``x-ratelimit-remaining`` header.
97-
98-
Ignores empty/missing values so a page that omits the header doesn't
99-
blank out the last known count.
118+
def set_rate_remaining(
119+
self, value: str | int | None, reset: str | int | None = None
120+
) -> None:
121+
"""Update the rate-limit display from the response headers.
122+
123+
``value`` is ``x-ratelimit-remaining``; ``reset`` is the optional
124+
``x-ratelimit-reset`` companion. Empty/missing values are ignored so a
125+
page that omits a header doesn't blank out the last known value. The
126+
reset value is interpreted as an absolute epoch second when large
127+
(the conventional form) and as seconds-until-reset otherwise; either
128+
way it's stored as an absolute deadline so the countdown stays live.
100129
"""
101130
if value not in (None, ""):
102131
self.rate_remaining = str(value)
132+
if reset not in (None, ""):
133+
try:
134+
secs = float(reset)
135+
except (TypeError, ValueError):
136+
return
137+
self._reset_at = secs if secs > 1_000_000 else time.time() + secs
103138

104139
def _format(self) -> str:
105140
parts: list[str] = []
@@ -114,7 +149,11 @@ def _format(self) -> str:
114149
# alone is True for non-decimal unicode digits that ``int`` rejects.)
115150
rate = self.rate_remaining
116151
rate = f"{int(rate):,}" if rate.isascii() and rate.isdigit() else rate
117-
parts.append(f"{rate} requests left")
152+
segment = f"{rate} requests remaining"
153+
if self._reset_at is not None:
154+
eta = _format_duration(self._reset_at - time.time())
155+
segment += f", resets in {eta}"
156+
parts.append(segment)
118157
return "Progress: " + " · ".join(parts)
119158

120159
def _render(self) -> None:
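
The `set_rate_remaining` docstring in this diff describes how the reset header is read: large values are taken as an absolute epoch second, small values as seconds until reset, and unparseable values are ignored. The standalone sketch below reproduces that rule outside the reporter class. `format_duration` mirrors `_format_duration` from the diff, while `reset_deadline` is a hypothetical helper introduced here for illustration, with a fixed `now` so the result is reproducible.

```python
import time
from typing import Optional


def format_duration(seconds: float) -> str:
    """Compact duration, mirroring ``_format_duration`` in the diff."""
    secs = int(max(0, seconds))
    if secs < 60:
        return f"{secs}s"
    if secs < 3600:
        return f"{secs // 60}m"
    hours, rem = divmod(secs, 3600)
    minutes = rem // 60
    return f"{hours}h{minutes:02d}m" if minutes else f"{hours}h"


def reset_deadline(reset: str, now: Optional[float] = None) -> Optional[float]:
    """Hypothetical helper: turn a reset header value into an absolute deadline."""
    now = time.time() if now is None else now
    try:
        secs = float(reset)
    except (TypeError, ValueError):
        return None  # unparseable header: keep whatever was known before
    # Large values are already epoch seconds; small ones are relative.
    return secs if secs > 1_000_000 else now + secs


now = 1_750_000_000.0  # fixed clock for a reproducible example
for header in ("1750002820", "2820", "not-a-number"):
    deadline = reset_deadline(header, now=now)
    label = "ignored" if deadline is None else format_duration(deadline - now)
    print(f"{header!r}: {label}")
```

Storing an absolute deadline, rather than the raw header value, is what lets the rendered countdown shrink between pages without another response from the server.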

dataretrieval/waterdata/api.py

Lines changed: 21 additions & 6 deletions
@@ -113,7 +113,7 @@ def get_daily(
113113
data are released on the condition that neither the USGS nor the United
114114
States Government may be held liable for any damages resulting from its
115115
use. This field reflects the approval status of each record, and is either
116-
"Approved", meaining processing review has been completed and the data is
116+
"Approved", meaning processing review has been completed and the data is
117117
approved for publication, or "Provisional" and subject to revision. For
118118
more information about provisional data, go to:
119119
https://waterdata.usgs.gov/provisional-data-statement/.
@@ -230,6 +230,21 @@ def get_daily(
230230
... parameter_code="00060",
231231
... last_modified="P7D",
232232
... )
233+
234+
>>> # Chain queries: pull all stream sites in a state, then their
235+
>>> # daily discharge for the last week. The site list can be hundreds
236+
>>> # of values long — the request is transparently chunked across
237+
>>> # multiple sub-requests so the URL stays under the server's byte
238+
>>> # limit. Combined output looks like a single query.
239+
>>> sites_df, _ = dataretrieval.waterdata.get_monitoring_locations(
240+
... state_name="Ohio",
241+
... site_type="Stream",
242+
... )
243+
>>> df, md = dataretrieval.waterdata.get_daily(
244+
... monitoring_location_id=sites_df["monitoring_location_id"].tolist(),
245+
... parameter_code="00060",
246+
... time="P7D",
247+
... )
233248
"""
234249
service = "daily"
235250
output_id = "daily_id"
@@ -259,7 +274,7 @@ def get_continuous(
259274
convert_type: bool = True,
260275
) -> tuple[pd.DataFrame, BaseMetadata]:
261276
"""
262-
Continuous data provide instantanous water conditions.
277+
Continuous data provide instantaneous water conditions.
263278
264279
This is an early version of the continuous endpoint that is feature-complete
265280
and is being made available for limited use. Geometries are not included
@@ -320,7 +335,7 @@ def get_continuous(
320335
data are released on the condition that neither the USGS nor the United
321336
States Government may be held liable for any damages resulting from its
322337
use. This field reflects the approval status of each record, and is either
323-
"Approved", meaining processing review has been completed and the data is
338+
"Approved", meaning processing review has been completed and the data is
324339
approved for publication, or "Provisional" and subject to revision. For
325340
more information about provisional data, go to:
326341
https://waterdata.usgs.gov/provisional-data-statement/.
@@ -1254,7 +1269,7 @@ def get_latest_continuous(
12541269
data are released on the condition that neither the USGS nor the United
12551270
States Government may be held liable for any damages resulting from its
12561271
use. This field reflects the approval status of each record, and is either
1257-
"Approved", meaining processing review has been completed and the data is
1272+
"Approved", meaning processing review has been completed and the data is
12581273
approved for publication, or "Provisional" and subject to revision. For
12591274
more information about provisional data, go to:
12601275
https://waterdata.usgs.gov/provisional-data-statement/.
@@ -1451,7 +1466,7 @@ def get_latest_daily(
14511466
data are released on the condition that neither the USGS nor the United
14521467
States Government may be held liable for any damages resulting from its
14531468
use. This field reflects the approval status of each record, and is either
1454-
"Approved", meaining processing review has been completed and the data is
1469+
"Approved", meaning processing review has been completed and the data is
14551470
approved for publication, or "Provisional" and subject to revision. For
14561471
more information about provisional data, go to:
14571472
https://waterdata.usgs.gov/provisional-data-statement/.
@@ -1633,7 +1648,7 @@ def get_field_measurements(
16331648
data are released on the condition that neither the USGS nor the United
16341649
States Government may be held liable for any damages resulting from its
16351650
use. This field reflects the approval status of each record, and is either
1636-
"Approved", meaining processing review has been completed and the data is
1651+
"Approved", meaning processing review has been completed and the data is
16371652
approved for publication, or "Provisional" and subject to revision. For
16381653
more information about provisional data, go to:
16391654
https://waterdata.usgs.gov/provisional-data-statement/.
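
The new `get_daily` docstring example relies on long site lists being split into sub-requests so each URL stays under the server's byte limit. The chunking code itself is not part of the hunks shown here, so the following is only a sketch of one plausible approach (greedy packing), not the library's implementation. `chunk_values` is a hypothetical helper, and it budgets only the comma-joined value list rather than the full URL.

```python
from typing import List


def chunk_values(values: List[str], max_bytes: int, sep: str = ",") -> List[List[str]]:
    """Greedily pack values so each joined chunk fits in ``max_bytes``.

    Hypothetical helper: a single value larger than the budget still gets
    its own chunk rather than being dropped.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for value in values:
        value_len = len(value.encode("utf-8"))
        # A separator is only needed when the chunk already holds a value.
        cost = value_len + (len(sep.encode("utf-8")) if current else 0)
        if current and size + cost > max_bytes:
            chunks.append(current)
            current, size = [value], value_len
        else:
            current.append(value)
            size += cost
    if current:
        chunks.append(current)
    return chunks


# Illustrative IDs only; each is 7 bytes, so two fit in a 15-byte budget.
sites = ["USGS-01", "USGS-02", "USGS-03"]
chunks = chunk_values(sites, max_bytes=15)
print(chunks)
```

Each chunk would then be sent as its own request and the results concatenated, which matches the docstring's description that the combined output looks like a single query.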
