
Commit ff8f535

thodson-usgs and claude authored
refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy (#319)
Every request failure raises a subclass of DataRetrievalError, so a caller can handle any of them with a single `except dataretrieval.DataRetrievalError`. The taxonomy stays small -- it adds only what the underlying httpx exceptions can't express:

    DataRetrievalError(Exception)   # .status_code / .retry_after / .retryable
    |- HTTPError                    # .status_code -- the server returned an error status
    |  '- TransientError            # .retry_after -- retryable (429 / 5xx)
    |     |- RateLimited            # 429
    |     '- ServiceUnavailable     # 5xx
    |- RequestTooLarge              # the request can't fit
    |  |- URLTooLong                # 414 / client-side over-long URL
    |  '- Unchunkable               # the Water Data chunker can't split the call
    |- NetworkError                 # no response: timeout / DNS / refused connection
    '- NoSitesError                 # no-data on the legacy nwis path (see below)

One factory -- error_for_status(status, message, *, retry_after) -- maps a status to its type, and every request path routes through it (the legacy `query` path, the Water Data chunker, nldi, nadp, streamstats), so a given status surfaces as the same type everywhere. A fatal 4xx is a generic HTTPError carrying .status_code (inspect the code rather than a class per code). The chunker keys retry/resume on TransientError.

Every DataRetrievalError exposes three read-anywhere fields -- .status_code (None when there is no HTTP status), .retry_after, and .retryable -- so a single `except DataRetrievalError as e` clause can branch on the status or drive a backoff loop without importing or isinstance-checking the concrete subclass.

Connection-level failures (no HTTP response: timeout, DNS, refused connection) are wrapped as NetworkError, with the underlying httpx exception on __cause__, so one `except DataRetrievalError` truly spans every failure. The single-shot paths route their GETs through a thin `utils._get` wrapper that does the translation; the chunker keeps its own client and wraps transport failures as resumable interruptions instead.
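The tree and the status-to-type factory above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the shipped `dataretrieval.exceptions` module: the constructor signatures, the class-level defaults for the three read-anywhere fields, and the choice to map only 414 (not 413) to `URLTooLong` are all made up for the sketch.

```python
# Minimal sketch of the taxonomy and the error_for_status factory.
# Assumption: class bodies and constructor signatures are illustrative; they
# mirror the commit message's tree, not the shipped dataretrieval source.
from __future__ import annotations


class DataRetrievalError(Exception):
    """Base class: every request failure is an instance of this."""

    # Read-anywhere fields with safe defaults; subclasses override them.
    status_code: int | None = None
    retry_after: float | None = None
    retryable: bool = False


class HTTPError(DataRetrievalError):
    """The server returned an error status."""

    def __init__(self, message: str, *, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class TransientError(HTTPError):
    """Retryable statuses (429 / 5xx), optionally with a Retry-After hint."""

    retryable = True

    def __init__(self, message: str, *, status_code: int,
                 retry_after: float | None = None) -> None:
        super().__init__(message, status_code=status_code)
        self.retry_after = retry_after


class RateLimited(TransientError):
    """429 Too Many Requests."""


class ServiceUnavailable(TransientError):
    """Any 5xx status."""


class RequestTooLarge(DataRetrievalError):
    """The request cannot fit; status_code is None if caught client-side."""

    def __init__(self, message: str, *, status_code: int | None = None) -> None:
        super().__init__(message)
        self.status_code = status_code


class URLTooLong(RequestTooLarge):
    """414, or an over-long URL rejected before sending."""


class Unchunkable(RequestTooLarge):
    """The chunker cannot split the call small enough."""


class NetworkError(DataRetrievalError):
    """No HTTP response at all: timeout, DNS, refused connection."""

    retryable = True


class NoSitesError(DataRetrievalError):
    """No data, raised only on the deprecated nwis path."""


def error_for_status(status: int, message: str, *,
                     retry_after: float | None = None) -> DataRetrievalError:
    """Map one HTTP status to the one type every request path raises."""
    if status == 429:
        return RateLimited(message, status_code=status, retry_after=retry_after)
    if 500 <= status <= 599:
        return ServiceUnavailable(message, status_code=status,
                                  retry_after=retry_after)
    if status == 414:  # assumption: only 414 is mapped to URLTooLong here
        return URLTooLong(message, status_code=status)
    # Any other error status (e.g. a fatal 4xx) is a generic HTTPError;
    # callers branch on .status_code rather than on a class per code.
    return HTTPError(message, status_code=status)
```

Because the three fields have class-level defaults on the base, `e.status_code` and `e.retryable` are readable on any subclass without an `isinstance` check; `NetworkError` keeps `status_code` at `None` while still reporting `retryable`.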
NetworkError carries no .status_code but is .retryable; with TransientError it forms the retryable set.

A no-data result is not an error: the modern getters (waterdata, wqp, nldi) return an empty DataFrame when nothing matches. Only the deprecated nwis (waterservices) path still raises NoSitesError on no data.

The typed errors are picklable via the standard __getstate__/__setstate__ protocol, so they survive a pickle / deepcopy back from a multiprocessing / lithops worker. A chunk-interruption error sheds its live resume handle (.call) on that trip -- keeping the diagnostic counts and partial frame/response -- while in-process callers still get full `exc.call.resume()`.

A too-long-URL status (413 / 414) on the legacy `query` path keeps the actionable "split your query" remediation message (the same one the client-side over-long-URL case raises), rather than degrading to a bare HTTP-status line.

Also adds a dataretrieval.exceptions API docs page, a "Handling errors" user guide, and a NEWS.md changelog entry.

ruff clean (pre-commit hooks); mypy --strict and the full pytest suite are re-verified by CI on push.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
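The pickling behavior can be illustrated with a hypothetical `ChunkInterrupted` exception. The commit message does not name the chunk-interruption class or its fields, so `completed`, `total`, and `partial` are invented for the sketch; only `.call` comes from the text. `__getstate__` drops the live `.call` handle and keeps everything else. The sketch also defines `__reduce__` explicitly so that both `pickle` and `copy.deepcopy` go through `__getstate__` regardless of how `BaseException`'s default reduction treats instance state; whether the shipped code needs the same step is not shown here.

```python
# Sketch of a picklable chunk-interruption error that sheds a live handle.
# Assumptions: `ChunkInterrupted` and its `completed`, `total` and `partial`
# fields are hypothetical; only the `.call` handle comes from the text.
from __future__ import annotations

import copy
import pickle
import threading
from typing import Any


class ChunkInterrupted(Exception):
    """Interruption that carries a live resume handle in-process."""

    def __init__(self, message: str, *, completed: int = 0, total: int = 0,
                 partial: Any = None, call: Any = None) -> None:
        super().__init__(message)
        self.completed = completed  # diagnostic counts survive the trip
        self.total = total
        self.partial = partial      # partial frame/response collected so far
        self.call = call            # live resume handle; may be unpicklable

    def __getstate__(self) -> dict[str, Any]:
        state = self.__dict__.copy()
        state["call"] = None        # shed the live handle, keep the rest
        return state

    def __setstate__(self, state: dict[str, Any]) -> None:
        self.__dict__.update(state)

    def __reduce__(self) -> tuple[Any, ...]:
        # Rebuild from the message, then pass the trimmed state to
        # __setstate__; both pickle and copy.deepcopy use this path.
        return (self.__class__, self.args, self.__getstate__())


# A lock stands in for the live handle because locks cannot be pickled.
err = ChunkInterrupted("chunk 3 of 5 failed", completed=2, total=5,
                       partial=[1, 2], call=threading.Lock())
clone = pickle.loads(pickle.dumps(err))
copied = copy.deepcopy(err)
```

The original `err` keeps its handle, so an in-process caller can still resume; only the copies that crossed a pickle or deepcopy boundary lose it.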
1 parent ecf2833 commit ff8f535

21 files changed

Lines changed: 726 additions & 324 deletions

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+**06/03/2026:** The request-error hierarchy is now unified. Every module (`nwis`, `wqp`, `nldi`, `waterdata`, `nadp`, `streamstats`) raises a subclass of `dataretrieval.DataRetrievalError` on a failed request, so a single `except dataretrieval.DataRetrievalError` spans them all. An HTTP error status surfaces as an `HTTPError` carrying `.status_code` (inspect it to branch on a specific code); the retryable 429/5xx subset is `TransientError` (`RateLimited` / `ServiceUnavailable`, carrying `.retry_after`); and a request too large to satisfy is a `RequestTooLarge` (`URLTooLong` for an over-long single request, `Unchunkable` when the Water Data chunker cannot split a call small enough). Connection-level failures (timeouts, DNS, refused connections) are wrapped as a `NetworkError`, with the underlying `httpx` exception on `__cause__`. Every `DataRetrievalError` also exposes `.status_code` (`None` when there is no HTTP status), `.retry_after`, and `.retryable`, so a single `except dataretrieval.DataRetrievalError as e` clause can branch on the status or retry transient failures without knowing the concrete subclass. **Breaking change:** these exceptions no longer multiply-inherit a built-in, so code that caught request failures with `except ValueError` or `except RuntimeError` should switch to `except dataretrieval.DataRetrievalError` (or a specific subclass). A no-data result is **not** an error: the modern getters (`waterdata`, `wqp`, `nldi`) return an empty DataFrame when nothing matches. Only the deprecated `nwis` (waterservices) path still raises `NoSitesError` on no data.
+
 **05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit.
 
 **05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed: pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
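The NEWS entry says one `except dataretrieval.DataRetrievalError as e` clause can retry transient failures without knowing the concrete subclass. A caller-side helper in that spirit might look like the sketch below. It is hypothetical (`call_with_retry` is not part of dataretrieval), and it defines a stand-in `DataRetrievalError` carrying the three fields so it runs without the package installed; real code would import the class from `dataretrieval` instead. It honors `.retry_after` when the server sent one and otherwise backs off exponentially.

```python
# Sketch of a caller-side retry loop driven only by the base-class fields.
# Assumptions: `call_with_retry` is a hypothetical helper, not part of
# dataretrieval, and the DataRetrievalError below is a stand-in so the
# sketch runs on its own; real code would import it from `dataretrieval`.
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DataRetrievalError(Exception):
    """Stand-in exposing .status_code, .retry_after and .retryable."""

    def __init__(self, message: str, *, status_code: int | None = None,
                 retry_after: float | None = None,
                 retryable: bool = False) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.retry_after = retry_after
        self.retryable = retryable


def call_with_retry(fetch: Callable[[], T], *, attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fetch`, sleeping between retryable failures; re-raise others."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except DataRetrievalError as e:
            # Fatal errors (e.g. a 4xx) and the last attempt propagate as-is.
            if not e.retryable or attempt == attempts - 1:
                raise
            # Prefer the server's Retry-After hint, else back off exponentially.
            if e.retry_after is not None:
                delay = e.retry_after
            else:
                delay = base_delay * 2**attempt
            sleep(delay)
    raise AssertionError("unreachable: the loop returns or raises")
```

Pass any zero-argument callable that performs the request, for example a `lambda` wrapping one of the getters. A fatal error such as a 404 propagates on the first attempt, so the caller can still read `e.status_code`.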

dataretrieval/__init__.py

Lines changed: 7 additions & 6 deletions
@@ -17,8 +17,9 @@
 ``nldi`` requires geopandas (``pip install dataretrieval[nldi]``) and is
 imported on demand: ``from dataretrieval import nldi``.
 
-Every request failure raises a subclass of :class:`dataretrieval.DataRetrievalError`;
-the taxonomy lives in ``dataretrieval.exceptions``.
+A failed request raises a subclass of :class:`dataretrieval.DataRetrievalError`
+(the taxonomy lives in ``dataretrieval.exceptions``); connection-level failures
+(timeouts, DNS) are wrapped as :class:`dataretrieval.NetworkError`.
 """
 
 from importlib.metadata import PackageNotFoundError, version
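The docstring change above says connection-level failures surface as `NetworkError`, and the commit message attributes that to a thin `utils._get` wrapper. The translation can be sketched as follows. This is not the shipped `_get`: `guarded_get` and its `send` parameter are hypothetical, the classes are stand-ins, and the built-in `TimeoutError` / `ConnectionError` replace httpx's transport exceptions so the example runs without httpx. `raise ... from exc` is what places the original exception on `__cause__`.

```python
# Minimal sketch of a `_get`-style translation layer; not the shipped code.
# Assumptions: `guarded_get` and `send` are hypothetical names, the classes
# are stand-ins for dataretrieval's, and built-in TimeoutError /
# ConnectionError stand in for httpx's transport exceptions.
from __future__ import annotations

from typing import Callable, TypeVar

R = TypeVar("R")


class DataRetrievalError(Exception):
    """Stand-in base class with the read-anywhere fields."""

    status_code: int | None = None
    retry_after: float | None = None
    retryable: bool = False


class NetworkError(DataRetrievalError):
    """No HTTP response at all; retryable, with no status code."""

    retryable = True


# With httpx, a comparable tuple might be (httpx.TransportError,).
TRANSPORT_ERRORS: tuple[type[Exception], ...] = (TimeoutError, ConnectionError)


def guarded_get(send: Callable[[str], R], url: str) -> R:
    """Run `send(url)`, re-raising transport failures as NetworkError."""
    try:
        return send(url)
    except TRANSPORT_ERRORS as exc:
        # `from exc` chains the original exception onto __cause__.
        raise NetworkError(f"request to {url} failed: {exc}") from exc
```

Anything that is not a transport failure (a `ValueError` from parsing, say) passes through untouched, so the wrapper only widens what `except DataRetrievalError` covers.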
@@ -29,10 +30,10 @@
     __version__ = "version-unknown"
 
 from dataretrieval.exceptions import (
-    BadRequestError,
     DataRetrievalError,
+    HTTPError,
+    NetworkError,
     NoSitesError,
-    NotFoundError,
     RateLimited,
     RequestTooLarge,
     ServiceUnavailable,
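The import list above drops `BadRequestError` and `NotFoundError` in favor of a single `HTTPError`, matching the commit message's advice to inspect `.status_code` rather than rely on a class per code. A caller that used to catch `NotFoundError` (presumably the 404 case; the old mapping is not shown here) might migrate as sketched below. `fetch_or_none` is a hypothetical helper, and the two classes are stand-ins for `dataretrieval.DataRetrievalError` and `dataretrieval.HTTPError` so the example runs on its own.

```python
# Hypothetical migration sketch: one HTTPError catch that branches on
# .status_code replaces a per-code exception class.
# Assumption: the classes below are stand-ins for dataretrieval's.
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


class DataRetrievalError(Exception):
    status_code: int | None = None


class HTTPError(DataRetrievalError):
    def __init__(self, message: str, *, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


def fetch_or_none(fetch: Callable[[], T]) -> T | None:
    """Return None when the server says 404; re-raise every other failure."""
    try:
        return fetch()
    # Before (older releases): `except NotFoundError: return None`
    except HTTPError as e:
        if e.status_code == 404:
            return None
        raise  # e.g. a 400, formerly BadRequestError, still propagates
```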
@@ -64,10 +65,10 @@
     # error taxonomy (canonical home: ``dataretrieval.exceptions``), re-exported
     # so callers can ``except dataretrieval.DataRetrievalError``
     "exceptions",
-    "BadRequestError",
     "DataRetrievalError",
+    "HTTPError",
+    "NetworkError",
     "NoSitesError",
-    "NotFoundError",
     "RateLimited",
     "RequestTooLarge",
     "ServiceUnavailable",
