
Commit 4587964

thodson-usgs and claude committed
refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy
Every request failure raises a subclass of DataRetrievalError, so a caller can handle any of them with a single `except dataretrieval.DataRetrievalError`. The taxonomy stays small -- it adds only what the underlying httpx exceptions can't express:

    DataRetrievalError(Exception)
    |- HTTPError                  # .status_code -- the server returned an error status
    |  '- TransientError          # .retry_after -- retryable (429 / 5xx)
    |     |- RateLimited          # 429
    |     '- ServiceUnavailable   # 5xx
    |- RequestTooLarge            # the request can't fit
    |  |- URLTooLong              # 414 / client-side over-long URL
    |  '- Unchunkable             # the Water Data chunker can't split the call
    |- NetworkError               # no response: timeout / DNS / refused connection
    '- NoDataError                # a 200 response with no data

One factory -- error_for_status(status, message, *, retry_after) -- maps a status to its type, and every request path routes through it (the legacy `query` path, the Water Data chunker, nldi, nadp, streamstats), so a given status surfaces as the same type everywhere. A fatal 4xx is a generic HTTPError carrying .status_code (inspect the code rather than a class per code). The chunker keys retry/resume on TransientError.

Connection-level failures (no HTTP response: timeout, DNS, refused connection) are wrapped as NetworkError, with the underlying httpx exception on __cause__, so one `except DataRetrievalError` truly spans every failure. The single-shot paths route their GETs through a thin `utils._get` wrapper that does the translation; the chunker keeps its own client and wraps transport failures as resumable interruptions instead. NetworkError carries no .status_code (no response arrived); with TransientError it forms the retryable set.

The typed errors are picklable via the standard __getstate__/__setstate__ protocol, so they survive a pickle / deepcopy back from a multiprocessing / lithops worker. A chunk-interruption error sheds its live resume handle (.call) on that trip -- keeping the diagnostic counts and partial frame/response -- while in-process callers still get full `exc.call.resume()`.

A too-long-URL status (413 / 414) on the legacy `query` path keeps the actionable "split your query" remediation message (the same one the client-side over-long-URL case raises), rather than degrading to a bare HTTP-status line.

BREAKING CHANGES

- Request failures raise typed DataRetrievalError subclasses instead of bare ValueError / RuntimeError / httpx.HTTPStatusError. The exceptions root only at DataRetrievalError(Exception) and no longer also inherit ValueError / RuntimeError -- catch DataRetrievalError (or a subclass), not the builtins. This now also covers ChunkInterrupted (previously a RuntimeError) and mid-pagination failures (previously a bare RuntimeError).
- Connection-level failures are wrapped as NetworkError instead of surfacing as raw httpx exceptions on the single-shot paths -- catch NetworkError (or DataRetrievalError); the httpx exception is preserved on __cause__.
- A fatal 4xx raises HTTPError (read .status_code); there are no per-code types.
- The empty-result error is renamed NoSitesError -> NoDataError (it is raised from the shared query path for any module, not just NWIS "sites"). NoSitesError stays as a deprecated alias -- referencing it now emits a DeprecationWarning -- and will be removed in a future release.

Also adds a dataretrieval.exceptions API docs page and a NEWS.md changelog entry.

mypy --strict clean; ruff clean; full suite green (489 passed, 2 skipped); the Water Data chunker's resume tests pass unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
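The status-to-type mapping described in the commit message can be sketched roughly as follows. This is a stand-alone illustration, not the code in `dataretrieval.exceptions`: the class bodies, the constructor signatures, the choice to return (rather than raise) the exception, and the exact status cutoffs (here only 414 maps to `URLTooLong`) are all assumptions inferred from the message. Only the class names, the hierarchy, and the `error_for_status(status, message, *, retry_after)` signature come from the source.

```python
from __future__ import annotations


class DataRetrievalError(Exception):
    """Root of the taxonomy; catching it spans every request failure."""


class HTTPError(DataRetrievalError):
    """The server returned an error status."""

    def __init__(self, message: str, *, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class TransientError(HTTPError):
    """Retryable status (429 / 5xx)."""

    def __init__(
        self, message: str, *, status_code: int, retry_after: float | None = None
    ) -> None:
        super().__init__(message, status_code=status_code)
        self.retry_after = retry_after


class RateLimited(TransientError):
    """429."""


class ServiceUnavailable(TransientError):
    """5xx."""


class RequestTooLarge(DataRetrievalError):
    """The request can't fit."""


class URLTooLong(RequestTooLarge):
    """414, or a URL rejected client-side as over-long."""


def error_for_status(
    status: int, message: str, *, retry_after: float | None = None
) -> DataRetrievalError:
    """Map an HTTP status to one exception type (assumed cutoffs, see lead-in)."""
    if status == 429:
        return RateLimited(message, status_code=status, retry_after=retry_after)
    if 500 <= status < 600:
        return ServiceUnavailable(message, status_code=status, retry_after=retry_after)
    if status == 414:
        return URLTooLong(message)
    # Any other error status is a generic HTTPError; callers branch on .status_code.
    return HTTPError(message, status_code=status)


err = error_for_status(404, "not found")
print(type(err).__name__, err.status_code)
```

Because every path would go through one function like this, a caller that wants to special-case a 404 checks `err.status_code == 404` on an `HTTPError` instead of importing a dedicated class.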
1 parent ecf2833 commit 4587964

18 files changed

Lines changed: 632 additions & 318 deletions

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+**06/03/2026:** The request-error hierarchy is now unified. Every module (`nwis`, `wqp`, `nldi`, `waterdata`, `nadp`, `streamstats`) raises a subclass of `dataretrieval.DataRetrievalError` on a failed request, so a single `except dataretrieval.DataRetrievalError` spans them all. An HTTP error status surfaces as an `HTTPError` carrying `.status_code` (inspect it to branch on a specific code); the retryable 429/5xx subset is `TransientError` (`RateLimited` / `ServiceUnavailable`, carrying `.retry_after`); and a request too large to satisfy is a `RequestTooLarge` (`URLTooLong` for an over-long single request, `Unchunkable` when the Water Data chunker cannot split a call small enough). Connection-level failures (timeouts, DNS, refused connections) are wrapped as a `NetworkError`, with the underlying `httpx` exception on `__cause__`. **Breaking change:** these exceptions no longer multiply-inherit a built-in — code that caught request failures with `except ValueError` or `except RuntimeError` should switch to `except dataretrieval.DataRetrievalError` (or a specific subclass). The error raised on a 200-but-empty result, formerly `NoSitesError`, is renamed `NoDataError` (the old name leaked NWIS-era "sites" terminology and the condition is general); `NoSitesError` remains as a deprecated alias and will be removed in a future release.
+
 **05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit.

 **05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
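The `NetworkError` wrapping described in the 06/03/2026 entry follows the usual `raise ... from exc` pattern. The sketch below is a stand-alone illustration under stated assumptions: the two classes are stand-ins for the library's, `fetch` and `flaky` are hypothetical names, and the built-in `TimeoutError` / `ConnectionError` take the place of the `httpx` exceptions so the example runs without `httpx` installed.

```python
class DataRetrievalError(Exception):
    """Stand-in for dataretrieval.DataRetrievalError."""


class NetworkError(DataRetrievalError):
    """Stand-in: no HTTP response arrived (timeout, DNS, refused connection)."""


def fetch(transport):
    """Call `transport()` and translate connection-level failures.

    Hypothetical helper; the built-in exceptions below stand in for the
    httpx transport exceptions that the library is described as wrapping.
    """
    try:
        return transport()
    except (TimeoutError, ConnectionError) as exc:
        # `from exc` stores the original exception on __cause__.
        raise NetworkError(f"no response: {exc}") from exc


def flaky():
    """Simulated transport that always times out."""
    raise TimeoutError("read timed out")


try:
    fetch(flaky)
except DataRetrievalError as err:  # one handler spans every request failure
    caught = err

print(type(caught).__name__, "caused by", type(caught.__cause__).__name__)
```

Code that previously wrote `except ValueError` or `except RuntimeError` would not catch `caught` here, since the stand-in classes (like the ones the entry describes) inherit only from the taxonomy root.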

dataretrieval/__init__.py

Lines changed: 19 additions & 8 deletions
@@ -17,22 +17,24 @@
 ``nldi`` requires geopandas (``pip install dataretrieval[nldi]``) and is
 imported on demand: ``from dataretrieval import nldi``.

-Every request failure raises a subclass of :class:`dataretrieval.DataRetrievalError`;
-the taxonomy lives in ``dataretrieval.exceptions``.
+A failed request raises a subclass of :class:`dataretrieval.DataRetrievalError`
+(the taxonomy lives in ``dataretrieval.exceptions``); connection-level failures
+(timeouts, DNS) are wrapped as :class:`dataretrieval.NetworkError`.
 """

 from importlib.metadata import PackageNotFoundError, version
+from typing import Any

 try:
     __version__ = version("dataretrieval")
 except PackageNotFoundError:
     __version__ = "version-unknown"

 from dataretrieval.exceptions import (
-    BadRequestError,
     DataRetrievalError,
-    NoSitesError,
-    NotFoundError,
+    HTTPError,
+    NetworkError,
+    NoDataError,
     RateLimited,
     RequestTooLarge,
     ServiceUnavailable,
@@ -64,10 +66,10 @@
     # error taxonomy (canonical home: ``dataretrieval.exceptions``), re-exported
     # so callers can ``except dataretrieval.DataRetrievalError``
     "exceptions",
-    "BadRequestError",
     "DataRetrievalError",
-    "NoSitesError",
-    "NotFoundError",
+    "HTTPError",
+    "NetworkError",
+    "NoDataError",
     "RateLimited",
     "RequestTooLarge",
     "ServiceUnavailable",
@@ -76,3 +78,12 @@
     "Unchunkable",
     "__version__",
 ]
+
+
+def __getattr__(name: str) -> Any:
+    # ``NoSitesError`` is the pre-rename alias of ``NoDataError``; resolve it
+    # lazily so referencing it emits the deprecation warning (see
+    # ``dataretrieval.exceptions``) rather than being a silent re-export.
+    if name == "NoSitesError":
+        return exceptions._deprecated_nosites()
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
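The module-level `__getattr__` added above relies on PEP 562: Python calls it only when normal attribute lookup on the module fails, which is what lets the old name warn on access. The sketch below reproduces the mechanism on a throwaway module object. The body of `_deprecated_nosites` is not shown in this diff, so the version here (warn, then return `NoDataError`) is an assumption, and `demo_pkg` is a hypothetical module name.

```python
import types
import warnings


class NoDataError(Exception):
    """Stand-in for dataretrieval.NoDataError."""


def _deprecated_nosites():
    """Assumed behaviour: warn, then hand back the renamed class."""
    warnings.warn(
        "NoSitesError is deprecated; use NoDataError",
        DeprecationWarning,
        stacklevel=3,
    )
    return NoDataError


def _module_getattr(name):
    # Called only for names missing from the module's __dict__ (PEP 562).
    if name == "NoSitesError":
        return _deprecated_nosites()
    raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")


# Build a throwaway module so the sketch runs as a single file.
demo = types.ModuleType("demo_pkg")
demo.NoDataError = NoDataError
demo.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    alias = demo.NoSitesError  # falls through to __getattr__ and warns
    current = demo.NoDataError  # found normally, no warning

print(alias is current, len(caught))
```

Resolving the alias lazily means existing `except dataretrieval.NoSitesError` clauses keep working during the deprecation window while still surfacing a warning at the point of use.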
