Skip to content

Commit 13b9032

Browse files
thodson-usgsclaude
andcommitted
Surface real failures in paginated waterdata responses
Two bugs in _walk_pages (and the parallel get_stats_data pagination loop) caused silent or misleading failures when a paginated request was interrupted mid-walk: 1. The except handler called _error_body(resp) — but when client.request() itself raised (ConnectionError, Timeout, etc.), `resp` still pointed at the *previous successful page*. The "incomplete" log message therefore described the wrong request, and on a non-JSON 200 body would itself raise inside resp.json() and bury the original failure. 2. No status-code check was performed on paginated responses. The initial request guards `if resp.status_code != 200`, but the loop doesn't. A 5xx body that didn't include "numberReturned" was silently turned into an empty DataFrame by _get_resp_data, the `next` link wasn't found, and the loop quietly exited — handing the user truncated data with no error logged anywhere. This affects every paginated waterdata getter (get_daily, get_continuous, get_monitoring_locations, …). Fix: - Guard the loop body with `if resp.status_code != 200: raise RuntimeError(_error_body(resp))`, mirroring the initial request. - Capture the exception with `as e` and log `e` instead of touching the stale `resp`. Same change in get_stats_data. Behavior preserved: on failure the loop still logs and returns whatever pages were collected ("best effort"). The change is purely to make the failure observable and the log message accurate. Two new tests cover (a) network-exception mid-pagination and (b) 5xx mid-pagination, asserting that the actual failure surfaces in the error log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 18f831f commit 13b9032

3 files changed

Lines changed: 104 additions & 7 deletions

File tree

NEWS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
**05/08/2026:** Fixed two latent bugs in the paginated `waterdata` request loop (`_walk_pages` and `get_stats_data`). Previously, when `requests.Session.request(...)` itself raised mid-pagination (network error, timeout), the except block called `_error_body()` on the *prior page's* response, so the logged "error" described the wrong request and could itself crash on non-JSON bodies. Separately, no status-code check was performed on subsequent paginated responses, so a 5xx body that didn't include `numberReturned` was silently treated as an empty page — pagination quietly stopped and the user got truncated data with no error logged. The loop now status-checks each page like the initial request and reports the actual exception. The "best-effort" behavior (return whatever pages were collected) is preserved.
2+
13
**05/06/2026:** Each remaining active function in `dataretrieval.nwis` now emits a per-function `DeprecationWarning` naming the `waterdata` replacement to migrate to (visible the first time users call each getter). The `nwis` module is scheduled for removal on or after **2027-05-06**.
24

35
**05/06/2026:** Added `waterdata.get_ratings(...)` — wraps the new Water Data STAC catalog (`api.waterdata.usgs.gov/stac/v0/search`) for USGS stage-discharge rating curves. Returns parsed `exsa` / `base` / `corr` rating tables as a dict of DataFrames keyed by feature ID, or just the list of available STAC features when `download_and_parse=False`. Mirrors R's `read_waterdata_ratings`.

dataretrieval/waterdata/utils.py

Lines changed: 15 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -635,11 +635,18 @@ def _walk_pages(
635635
headers=headers,
636636
data=content if method == "POST" else None,
637637
)
638+
# Mirror the initial-request check; otherwise a 5xx body
639+
# without "numberReturned" silently yields an empty frame
640+
# and the loop quietly stops.
641+
if resp.status_code != 200:
642+
raise RuntimeError(_error_body(resp))
638643
dfs.append(_get_resp_data(resp, geopd=geopd))
639644
curr_url = _next_req_url(resp)
640-
except Exception: # noqa: BLE001
641-
error_text = _error_body(resp)
642-
logger.error("Request incomplete. %s", error_text)
645+
except Exception as e: # noqa: BLE001
646+
# Report the *actual* failure — not _error_body(resp) on a
647+
# stale prior-page response (which would describe the wrong
648+
# request and can itself raise on non-JSON bodies).
649+
logger.error("Request incomplete: %s", e)
643650
logger.warning(
644651
"Request failed for URL: %s. Data download interrupted.", curr_url
645652
)
@@ -1099,14 +1106,15 @@ def get_stats_data(
10991106
params=args,
11001107
headers=headers,
11011108
)
1109+
if resp.status_code != 200:
1110+
raise RuntimeError(_error_body(resp))
11021111
body = resp.json()
11031112
all_dfs.append(_handle_stats_nesting(body, geopd=GEOPANDAS))
11041113
next_token = body["next"]
1105-
except Exception: # noqa: BLE001
1106-
error_text = _error_body(resp)
1107-
logger.error("Request incomplete. %s", error_text)
1114+
except Exception as e: # noqa: BLE001
1115+
logger.error("Request incomplete: %s", e)
11081116
logger.warning(
1109-
"Request failed for URL: %s. Data download interrupted.", resp.url
1117+
"Request failed for URL: %s. Data download interrupted.", url
11101118
)
11111119
next_token = None
11121120

tests/waterdata_utils_test.py

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -86,6 +86,93 @@ def test_walk_pages_multiple_mocked():
8686
assert mock_client.request.call_args[0][1] == "https://example.com/page2"
8787

8888

89+
def _resp_ok(features):
90+
"""Build a 200-OK mock response carrying the given features list."""
91+
resp = mock.MagicMock()
92+
resp.json.return_value = {
93+
"numberReturned": len(features),
94+
"features": features,
95+
"links": [{"rel": "next", "href": "https://example.com/page2"}]
96+
if features
97+
else [],
98+
}
99+
resp.headers = {}
100+
resp.links = {"next": {"url": "https://example.com/page2"}} if features else {}
101+
resp.status_code = 200
102+
resp.url = "https://example.com/page1"
103+
return resp
104+
105+
106+
def _walk_pages_with_failure(failure_resp_or_exc):
107+
"""Run _walk_pages where page 1 succeeds and page 2 fails as given."""
108+
resp1 = _resp_ok([{"id": "1", "properties": {"val": "a"}}])
109+
110+
mock_client = mock.MagicMock(spec=requests.Session)
111+
mock_client.send.return_value = resp1
112+
if isinstance(failure_resp_or_exc, BaseException):
113+
mock_client.request.side_effect = failure_resp_or_exc
114+
else:
115+
mock_client.request.return_value = failure_resp_or_exc
116+
117+
mock_req = mock.MagicMock(spec=requests.PreparedRequest)
118+
mock_req.method = "GET"
119+
mock_req.headers = {}
120+
mock_req.url = "https://example.com/page1"
121+
122+
return _walk_pages(geopd=False, req=mock_req, client=mock_client)
123+
124+
125+
def test_walk_pages_logs_actual_exception_when_request_raises(caplog):
126+
"""When client.request() itself raises (e.g. a connection error), the
127+
logged failure must reflect that exception — not _error_body() on the
128+
stale prior-page response, which described the wrong request and could
129+
itself crash on non-JSON bodies."""
130+
import logging
131+
132+
caplog.set_level(logging.ERROR, logger="dataretrieval.waterdata.utils")
133+
134+
df, _ = _walk_pages_with_failure(requests.ConnectionError("boom"))
135+
136+
# First page's data is preserved (best-effort behavior).
137+
assert list(df["val"]) == ["a"]
138+
# Logged error mentions the actual ConnectionError, not a stale page body.
139+
error_messages = [
140+
r.getMessage() for r in caplog.records if r.levelno >= logging.ERROR
141+
]
142+
assert any("boom" in m for m in error_messages), error_messages
143+
144+
145+
def test_walk_pages_surfaces_5xx_mid_pagination(caplog):
146+
"""A non-200 response on a paginated page must be detected; previously
147+
the loop happily called _get_resp_data() on the 5xx body, which —
148+
lacking 'numberReturned' — silently produced an empty DataFrame and
149+
the loop quietly stopped with no error logged."""
150+
import logging
151+
152+
caplog.set_level(logging.ERROR, logger="dataretrieval.waterdata.utils")
153+
154+
page2_500 = mock.MagicMock()
155+
page2_500.status_code = 503
156+
page2_500.json.return_value = {
157+
"code": "ServiceUnavailable",
158+
"description": "upstream timeout",
159+
}
160+
page2_500.url = "https://example.com/page2"
161+
162+
df, _ = _walk_pages_with_failure(page2_500)
163+
164+
assert list(df["val"]) == ["a"]
165+
error_messages = [
166+
r.getMessage() for r in caplog.records if r.levelno >= logging.ERROR
167+
]
168+
# The 5xx is now visible in the error log (it would previously have
169+
# been silently swallowed because _get_resp_data returned an empty
170+
# frame and the loop stopped quietly).
171+
assert any("503" in m or "ServiceUnavailable" in m for m in error_messages), (
172+
error_messages
173+
)
174+
175+
89176
def test_handle_stats_nesting_tolerates_missing_drop_columns():
90177
"""If the upstream stats response shape ever changes such that one of
91178
the columns we try to drop ("type", "properties.data") is absent, the

0 commit comments

Comments
 (0)