Commit 276970a

thodson-usgs and claude committed
refactor(errors)!: a lean, idiomatic DataRetrievalError taxonomy

Every request failure raises a subclass of DataRetrievalError, so a caller can
handle any of them with a single `except dataretrieval.DataRetrievalError`. The
taxonomy stays small -- it adds only what the underlying httpx exceptions can't
express:

    DataRetrievalError(Exception)
    |- HTTPError                  # .status_code -- the server returned an error status
    |  '- TransientError          # .retry_after -- retryable (429 / 5xx)
    |     |- RateLimited          # 429
    |     '- ServiceUnavailable   # 5xx
    |- RequestTooLarge            # the request can't fit
    |  |- URLTooLong              # 414 / client-side over-long URL
    |  '- Unchunkable             # the Water Data chunker can't split the call
    '- NoDataError                # a 200 response with no data

One factory -- error_for_status(status, message, *, retry_after) -- maps a
status to its type, and every request path routes through it (the legacy
`query` path, the Water Data chunker, nldi, nadp, streamstats), so a given
status surfaces as the same type everywhere. A fatal 4xx is a generic HTTPError
carrying .status_code (inspect the code rather than a class per code). The
chunker keys retry/resume on TransientError; connection-level failures
(timeouts, DNS) surface as httpx exceptions on the single-shot paths.

BREAKING CHANGES

- Request failures raise typed DataRetrievalError subclasses instead of bare
  ValueError / RuntimeError / httpx.HTTPStatusError. The exceptions root only
  at DataRetrievalError(Exception) and no longer also inherit ValueError /
  RuntimeError -- catch DataRetrievalError (or a subclass), not the builtins.
- A fatal 4xx raises HTTPError (read .status_code); there are no per-code types.
- The empty-result error is renamed NoSitesError -> NoDataError (it is raised
  from the shared query path for any module, not just NWIS "sites").
  NoSitesError stays as a deprecated alias and will be removed in a future
  release.

Also adds a dataretrieval.exceptions API docs page and a NEWS.md changelog entry.

mypy --strict clean; ruff clean; full suite green (483 passed, 2 skipped); the
Water Data chunker's resume tests pass unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
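
To make the catch order implied by the tree concrete, here is a minimal sketch. The classes are local stand-ins that mirror only the shape described in the commit message (the real ones live in `dataretrieval.exceptions`), and `describe` is a hypothetical helper written for this illustration.

```python
class DataRetrievalError(Exception):
    """Stand-in root; the real class lives in dataretrieval.exceptions."""


class HTTPError(DataRetrievalError):
    def __init__(self, message, *, status_code):
        super().__init__(message)
        self.status_code = status_code


class TransientError(HTTPError):
    def __init__(self, message, *, status_code, retry_after=None):
        super().__init__(message, status_code=status_code)
        self.retry_after = retry_after


class RequestTooLarge(DataRetrievalError):
    """Stand-in for the "request can't fit" family."""


def describe(exc):
    """Hypothetical helper: handle from most specific to least specific."""
    try:
        raise exc
    except TransientError as err:  # 429 / 5xx: worth retrying
        return f"retryable {err.status_code}"
    except HTTPError as err:  # any other error status
        return f"fatal {err.status_code}"
    except DataRetrievalError:  # everything else in the taxonomy
        return "other request failure"


print(describe(TransientError("busy", status_code=503)))
print(describe(HTTPError("missing", status_code=404)))
print(describe(RequestTooLarge("too big")))
```

Because the stand-ins root only at `DataRetrievalError`, an `except ValueError` or `except RuntimeError` handler would not intercept any of them, which is the breaking change listed above.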
1 parent ecf2833 commit 276970a

17 files changed

Lines changed: 300 additions & 260 deletions

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+**06/03/2026:** The request-error hierarchy is now unified. Every module (`nwis`, `wqp`, `nldi`, `waterdata`, `nadp`, `streamstats`) raises a subclass of `dataretrieval.DataRetrievalError` on a failed request, so a single `except dataretrieval.DataRetrievalError` spans them all. An HTTP error status surfaces as an `HTTPError` carrying `.status_code` (inspect it to branch on a specific code); the retryable 429/5xx subset is `TransientError` (`RateLimited` / `ServiceUnavailable`, carrying `.retry_after`); and a request too large to satisfy is a `RequestTooLarge` (`URLTooLong` for an over-long single request, `Unchunkable` when the Water Data chunker cannot split a call small enough). Connection-level failures (timeouts, DNS) still surface as `httpx` exceptions on the single-shot paths. **Breaking change:** these exceptions no longer multiply-inherit a built-in -- code that caught request failures with `except ValueError` or `except RuntimeError` should switch to `except dataretrieval.DataRetrievalError` (or a specific subclass). The error raised on a 200-but-empty result, formerly `NoSitesError`, is renamed `NoDataError` (the old name leaked NWIS-era "sites" terminology and the condition is general); `NoSitesError` remains as a deprecated alias and will be removed in a future release.
+
 **05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit.

 **05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed -- pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
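
The changelog notes that `TransientError` carries `.retry_after` and that single-shot callers may wait and retry. A caller-side retry loop might look roughly like the sketch below. `call_with_retry`, its parameters, and the stand-in `TransientError` are assumptions made for this illustration; none of them is part of the package.

```python
import time


class TransientError(Exception):
    """Stand-in for dataretrieval.TransientError (illustration only)."""

    def __init__(self, message, *, status_code, retry_after=None):
        super().__init__(message)
        self.status_code = status_code
        self.retry_after = retry_after


def call_with_retry(fetch, *, attempts=3, default_wait=1.0, sleep=time.sleep):
    """Hypothetical helper: call fetch(), re-issuing it on TransientError.

    Waits exc.retry_after seconds when the server supplied one, otherwise
    default_wait. The last failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TransientError as exc:
            if attempt == attempts:
                raise
            wait = exc.retry_after if exc.retry_after is not None else default_wait
            sleep(wait)
```

Passing `sleep` as a parameter keeps the helper testable without real delays. A production loop would probably also want backoff and an upper bound on the wait.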

dataretrieval/__init__.py

Lines changed: 9 additions & 7 deletions
@@ -17,8 +17,10 @@
 ``nldi`` requires geopandas (``pip install dataretrieval[nldi]``) and is
 imported on demand: ``from dataretrieval import nldi``.

-Every request failure raises a subclass of :class:`dataretrieval.DataRetrievalError`;
-the taxonomy lives in ``dataretrieval.exceptions``.
+When a request gets an HTTP error response it raises a subclass of
+:class:`dataretrieval.DataRetrievalError` (the taxonomy lives in
+``dataretrieval.exceptions``). Connection-level failures (timeouts, DNS) still
+surface as ``httpx`` exceptions on the single-shot service paths.
 """

 from importlib.metadata import PackageNotFoundError, version
@@ -29,10 +31,10 @@
     __version__ = "version-unknown"

 from dataretrieval.exceptions import (
-    BadRequestError,
     DataRetrievalError,
+    HTTPError,
+    NoDataError,
     NoSitesError,
-    NotFoundError,
     RateLimited,
     RequestTooLarge,
     ServiceUnavailable,
@@ -64,10 +66,10 @@
     # error taxonomy (canonical home: ``dataretrieval.exceptions``), re-exported
     # so callers can ``except dataretrieval.DataRetrievalError``
     "exceptions",
-    "BadRequestError",
     "DataRetrievalError",
-    "NoSitesError",
-    "NotFoundError",
+    "HTTPError",
+    "NoDataError",
+    "NoSitesError",  # deprecated alias for NoDataError
     "RateLimited",
     "RequestTooLarge",
     "ServiceUnavailable",
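
Since `NoSitesError` is kept as a plain alias for `NoDataError`, the two names refer to the same class object, so legacy handlers keep matching. A small sketch with stand-in classes (the `lookup` function and the example URL are hypothetical):

```python
class DataRetrievalError(Exception):
    """Stand-in root (illustration only)."""


class NoDataError(DataRetrievalError):
    """Stand-in mirroring the NoDataError shown in this commit."""

    def __init__(self, url):
        self.url = url

    def __str__(self):
        return f"No data found using the selection criteria specified in url: {self.url}"


# Module-level alias: same class object under the old name.
NoSitesError = NoDataError


def lookup(url):
    """Hypothetical request that succeeds with HTTP 200 but returns no rows."""
    raise NoDataError(url)


try:
    lookup("https://example.invalid/query?site=0")
except NoSitesError as exc:  # legacy handler written against the old name
    caught = exc

print(type(caught).__name__, "-", caught)
```

The reported class name is `NoDataError` even when the handler names `NoSitesError`, because an alias does not create a second type.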

dataretrieval/exceptions.py

Lines changed: 130 additions & 78 deletions
@@ -1,74 +1,137 @@
 """Exception taxonomy for ``dataretrieval``.

-A failed request from any service module (``nwis``, ``wqp``, ``waterdata``,
-``nldi``, ...) raises a subclass of :class:`DataRetrievalError`, so a caller can
-handle any request failure with a single ``except dataretrieval.DataRetrievalError``.
-
-The tree has two intermediate bases a caller can catch to span a whole family:
-:class:`RequestTooLarge` (the request can't fit, however it was issued) and
-:class:`TransientError` (a temporary failure worth retrying).
-
-This module deliberately has no third-party dependencies, so any module can
-import it without pulling in pandas/httpx.
+When a request gets an HTTP error response, the service modules (``nwis``,
+``wqp``, ``nldi``, ``waterdata``, ``nadp``, ``streamstats``) raise a subclass of
+:class:`DataRetrievalError`, so a caller can handle any of them with one
+``except dataretrieval.DataRetrievalError``. Connection-level failures (timeouts,
+DNS, refused connections) surface as ``httpx`` exceptions on the single-shot
+request paths.
+
+A status error is an :class:`HTTPError` carrying ``.status_code`` (inspect it to
+branch on the specific code); :class:`TransientError` is the retryable subset
+(429 / 5xx). A few failures are not a plain status -- :class:`RequestTooLarge`
+(:class:`URLTooLong` / :class:`Unchunkable`) and :class:`NoDataError`.
+
+This module imports only ``httpx`` (the package's core HTTP dependency, always
+installed) -- not pandas/geopandas -- so it stays cheap to import and free of
+import cycles.
 """

 from __future__ import annotations

-from typing import TYPE_CHECKING
-
-if TYPE_CHECKING:
-    import httpx
+import httpx

 __all__ = [
     "DataRetrievalError",
-    "BadRequestError",
-    "NotFoundError",
-    "RequestTooLarge",
-    "URLTooLong",
-    "Unchunkable",
-    "NoSitesError",
+    "HTTPError",
     "TransientError",
     "RateLimited",
     "ServiceUnavailable",
+    "RequestTooLarge",
+    "URLTooLong",
+    "Unchunkable",
+    "NoDataError",
+    "NoSitesError",  # deprecated alias for NoDataError
+    "error_for_status",
 ]


 class DataRetrievalError(Exception):
-    """Base class for errors raised when a request to a USGS or EPA web
+    """Base class for every error raised when a request to a USGS or EPA web
     service fails.

-    Every service module (``nwis``, ``wqp``, ``waterdata``, ``nldi``, ...)
-    raises a subclass of this when a request fails, so a caller can handle any
-    request failure uniformly::
+    Service modules raise a subclass of this on a failed request, so a caller
+    can handle them uniformly::

         try:
             df, md = dataretrieval.wqp.get_results(...)
         except dataretrieval.DataRetrievalError:
             ...

-    Subclasses also inherit from the built-in exception this package has
-    historically raised for the condition's *kind* -- :class:`ValueError` for a
-    request that can't succeed as written (bad params, too large), and
-    :class:`RuntimeError` for a transient transport failure -- so existing
-    ``except ValueError`` / ``except RuntimeError`` handlers keep working.
+    Connection-level failures (timeouts, DNS) still surface as ``httpx``
+    exceptions on the single-shot request paths.
     """


-# --- Fatal client errors -------------------------------------------------
-# The request can't succeed as written; retrying it unchanged won't help. Each
-# is also a ``ValueError`` -- the built-in the legacy ``query`` path has always
-# raised -- so existing ``except ValueError`` handlers keep working.
+# --- HTTP status errors --------------------------------------------------


-class BadRequestError(DataRetrievalError, ValueError):
-    """The service rejected the request parameters (HTTP 400)."""
+class HTTPError(DataRetrievalError):
+    """The service returned an error HTTP status.
+
+    The numeric status is on :attr:`status_code`; inspect it to branch on the
+    specific code, e.g. ``except HTTPError as e: ... e.status_code == 404``.
+    :class:`TransientError` (429 / 5xx) is the retryable subset and is itself an
+    ``HTTPError``. The one carve-out: a 413/414 surfaces as :class:`URLTooLong`
+    (a :class:`RequestTooLarge`), *not* an ``HTTPError`` -- catch
+    :class:`DataRetrievalError` to span every failure.
+
+    Parameters
+    ----------
+    message : str
+        Human-readable error message.
+    status_code : int
+        The HTTP status the service returned.
+    """
+
+    def __init__(self, message: str, *, status_code: int) -> None:
+        super().__init__(message)
+        self.status_code = status_code


-class NotFoundError(DataRetrievalError, ValueError):
-    """The requested resource was not found; often an empty query (HTTP 404)."""
+class TransientError(HTTPError):
+    """A 429 or 5xx the server may serve on a later try (:class:`RateLimited`
+    for 429, :class:`ServiceUnavailable` for 5xx).

+    This classifies the HTTP condition; it does not by itself retry the request.
+    Whether a transient is retried is up to the calling path -- a single-shot
+    request raises it for the caller to handle (e.g. wait :attr:`retry_after`
+    and re-issue).

-class RequestTooLarge(DataRetrievalError, ValueError):
+    Parameters
+    ----------
+    message : str
+        Human-readable error message.
+    status_code : int
+        The HTTP status the service returned.
+    retry_after : float, optional
+        Seconds to wait before retrying, parsed from the ``Retry-After``
+        response header; ``None`` when the header is absent or unparseable.
+    """
+
+    def __init__(
+        self, message: str, *, status_code: int, retry_after: float | None = None
+    ) -> None:
+        super().__init__(message, status_code=status_code)
+        self.retry_after = retry_after
+
+
+class RateLimited(TransientError):
+    """A request was rejected with HTTP 429 (too many requests)."""
+
+    def __init__(
+        self, message: str, *, status_code: int = 429, retry_after: float | None = None
+    ) -> None:
+        super().__init__(message, status_code=status_code, retry_after=retry_after)
+
+
+class ServiceUnavailable(TransientError):
+    """A request was rejected with a server error (HTTP 5xx).
+
+    Raised by both the legacy ``query`` path and the Water Data path, so a 5xx
+    surfaces as one type regardless of which subsystem issued the request.
+    """
+
+    def __init__(
+        self, message: str, *, status_code: int = 503, retry_after: float | None = None
+    ) -> None:
+        super().__init__(message, status_code=status_code, retry_after=retry_after)
+
+
+# --- Request can't fit (not necessarily an HTTP status) ------------------
+
+
+class RequestTooLarge(DataRetrievalError):
     """The request is too large for the service to satisfy.

     A base for the two ways a request can exceed what the service accepts;
@@ -99,56 +162,45 @@ class Unchunkable(RequestTooLarge):
     """


-class NoSitesError(DataRetrievalError):
-    """The selection criteria matched no sites/data."""
+# --- Empty result --------------------------------------------------------
+
+
+class NoDataError(DataRetrievalError):
+    """The request succeeded (HTTP 200) but the selection criteria matched
+    no data."""

     def __init__(self, url: httpx.URL) -> None:
         self.url = url

     def __str__(self) -> str:
         return (
-            "No sites/data found using the selection criteria specified in "
-            f"url: {self.url}"
+            f"No data found using the selection criteria specified in url: {self.url}"
         )


-# --- Transient transport errors ------------------------------------------
-# The service was reachable but temporarily refused the request; the same call
-# may succeed if retried. Each is also a ``RuntimeError`` (the built-in the
-# waterdata path has always raised). The Water Data chunker recognizes them via
-# ``isinstance(exc, TransientError)`` and wraps them as resumable
-# ``ChunkInterrupted`` subclasses.
-
-
-class TransientError(DataRetrievalError, RuntimeError):
-    """Base for transient HTTP failures that are worth an automatic retry.
-
-    One subclass per recoverable HTTP status family (429 -> :class:`RateLimited`,
-    5xx -> :class:`ServiceUnavailable`); the Water Data chunker recognizes them
-    by this shared base and wraps them as resumable interruptions.
-
-    Parameters
-    ----------
-    message : str
-        Human-readable error message.
-    retry_after : float, optional
-        Seconds to wait before retrying, parsed from the ``Retry-After``
-        response header; stored on the :attr:`retry_after` attribute (``None``
-        when the header is absent or unparseable).
-    """
-
-    def __init__(self, message: str, *, retry_after: float | None = None) -> None:
-        super().__init__(message)
-        self.retry_after = retry_after
+#: Deprecated alias for :class:`NoDataError`. The original name leaked NWIS-era
+#: "sites" terminology; it is retained so existing ``except NoSitesError``
+#: handlers keep working, and will be removed in a future release.
+NoSitesError = NoDataError


-class RateLimited(TransientError):
-    """A request was rejected with HTTP 429 (too many requests)."""
-
-
-class ServiceUnavailable(TransientError):
-    """A request was rejected with a server error (HTTP 5xx).
+def error_for_status(
+    status: int, message: str, *, retry_after: float | None = None
+) -> DataRetrievalError:
+    """Return the typed :class:`DataRetrievalError` for an HTTP error *status*.

-    Raised by both the legacy ``query`` path and the Water Data path, so a 5xx
-    surfaces as one type regardless of which subsystem issued the request.
+    The single status-to-type mapping shared by every request path (the legacy
+    ``query`` path, ``waterdata``, ``nadp`` / ``streamstats``), so a given status
+    surfaces as the same type everywhere. ``message`` is used verbatim;
+    ``retry_after`` is attached only to the transient (:class:`TransientError`)
+    types. A 413/414 surfaces as :class:`URLTooLong` (a :class:`RequestTooLarge`)
+    rather than a generic :class:`HTTPError`, matching the client-side
+    over-long-URL case.
     """
+    if status in (413, 414):
+        return URLTooLong(message)
+    if status == 429:
+        return RateLimited(message, status_code=status, retry_after=retry_after)
+    if 500 <= status < 600:
+        return ServiceUnavailable(message, status_code=status, retry_after=retry_after)
+    return HTTPError(message, status_code=status)
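
To see how the status-to-type mapping behaves across a few codes, the sketch below re-creates the factory against simplified stand-in classes. The branch order follows the diff above, but the class bodies are trimmed for illustration; in particular, the real `URLTooLong` body is not shown in this commit excerpt, so its stand-in here is an assumption.

```python
class DataRetrievalError(Exception):
    pass


class HTTPError(DataRetrievalError):
    def __init__(self, message, *, status_code):
        super().__init__(message)
        self.status_code = status_code


class TransientError(HTTPError):
    def __init__(self, message, *, status_code, retry_after=None):
        super().__init__(message, status_code=status_code)
        self.retry_after = retry_after


class RateLimited(TransientError):
    pass


class ServiceUnavailable(TransientError):
    pass


class RequestTooLarge(DataRetrievalError):
    pass


class URLTooLong(RequestTooLarge):
    pass  # assumed shape; the real class body is outside this excerpt


def error_for_status(status, message, *, retry_after=None):
    """Stand-in factory with the same branch order as the diff."""
    if status in (413, 414):
        return URLTooLong(message)
    if status == 429:
        return RateLimited(message, status_code=status, retry_after=retry_after)
    if 500 <= status < 600:
        return ServiceUnavailable(message, status_code=status, retry_after=retry_after)
    return HTTPError(message, status_code=status)


for status in (400, 404, 414, 429, 502):
    exc = error_for_status(status, f"HTTP {status}", retry_after=2.0)
    print(status, type(exc).__name__, getattr(exc, "retry_after", None))
```

Two details are worth noticing: a 413/414 result has no `status_code` attribute because it is not an `HTTPError`, and `retry_after` is dropped for non-transient statuses even when the caller passes it.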

dataretrieval/nadp.py

Lines changed: 2 additions & 2 deletions
@@ -45,7 +45,7 @@

 import httpx

-from dataretrieval.utils import HTTPX_DEFAULTS
+from dataretrieval.utils import HTTPX_DEFAULTS, _raise_for_status

 _DEPRECATION_MESSAGE = (
     "The `nadp` module is deprecated and will be removed from `dataretrieval` "
@@ -230,7 +230,7 @@ def get_zip(url: str, filename: str) -> NADP_ZipFile:
     _warn_deprecated()

     req = httpx.get(url + filename, **HTTPX_DEFAULTS)
-    req.raise_for_status()
+    _raise_for_status(req)

     # z = zipfile.ZipFile(io.BytesIO(req.content))
     z = NADP_ZipFile(io.BytesIO(req.content))
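
The body of `_raise_for_status` is not part of this excerpt. As a rough idea of what a helper in that position could do, the sketch below checks a response-like object and raises a typed error, parsing `Retry-After` as a number of seconds. Everything here (`raise_for_status`, `parse_retry_after`, the message format, the stand-in classes, the fake response) is hypothetical and is not the package's implementation.

```python
from types import SimpleNamespace


class HTTPError(Exception):
    """Stand-in for dataretrieval.HTTPError (illustration only)."""

    def __init__(self, message, *, status_code):
        super().__init__(message)
        self.status_code = status_code


class TransientError(HTTPError):
    def __init__(self, message, *, status_code, retry_after=None):
        super().__init__(message, status_code=status_code)
        self.retry_after = retry_after


def parse_retry_after(value):
    """Return Retry-After as seconds, or None when absent or unparseable.

    Only the delta-seconds form is handled in this sketch; the HTTP-date
    form is treated as unparseable.
    """
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def raise_for_status(response):
    """Hypothetical analogue of the helper imported in the diff above."""
    status = response.status_code
    if status < 400:
        return
    message = f"HTTP {status} for {response.url}"
    if status == 429 or 500 <= status < 600:
        raise TransientError(
            message,
            status_code=status,
            retry_after=parse_retry_after(response.headers.get("Retry-After")),
        )
    raise HTTPError(message, status_code=status)


# Fake response object carrying only the attributes the sketch reads.
fake = SimpleNamespace(
    status_code=429,
    headers={"Retry-After": "7"},
    url="https://example.invalid/a.zip",
)
try:
    raise_for_status(fake)
except TransientError as exc:
    print(exc.status_code, exc.retry_after)
```

The difference from `req.raise_for_status()` is that callers see the package's own exception types rather than `httpx.HTTPStatusError`, which is the change this hunk makes for `nadp`.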
