
Commit 5509be6

thodson-usgs and claude committed
feat(errors): unify HTTP status->exception across all request paths
Adds a single classifier so every HTTP error response raises the correct typed
DataRetrievalError, replacing four divergent paths (legacy query handled only
400/404/414/5xx; waterdata mapped other 4xx to a bare RuntimeError; nldi raised
a bare ValueError; nadp/streamstats raised httpx.HTTPStatusError).

- exceptions.py: add `error_for_status(status, message, *, retry_after)` -- the
  one status->type map (400->BadRequestError, 404->NotFoundError,
  413/414->URLTooLong, 429->RateLimited, 5xx->ServiceUnavailable, other
  4xx->new ClientError) -- plus `ClientError(DataRetrievalError, ValueError)`.
- Route every path through it: utils._raise_for_status (legacy query, now also
  429 + generic 4xx), waterdata._raise_for_non_200 (drops the bare
  RuntimeError), nldi._query_nldi, and nadp/streamstats (were
  raise_for_status()).
- A given status now surfaces as the same type everywhere, and a 429 is a
  RateLimited rather than a fatal ValueError on the single-shot paths too.

The type classifies the HTTP *condition* only -- it does NOT imply the path
retries it. Auto-retry/resume remains a Water Data chunker feature; the
single-shot query/nadp/streamstats paths raise the typed error without retrying
(a caller can read retry_after and re-issue). Connection-level failures
(timeouts/DNS) still surface as httpx errors on those paths.

The general exceptions module documents none of the chunker's resume internals
-- that stays in waterdata/chunking (the correct dependency direction).

mypy --strict clean; ruff clean; suite green (nwis live-test network flakes are
environmental, pass in isolation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent c8a4361 commit 5509be6
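
The status-to-type map listed in the commit message can be exercised in isolation. The sketch below is not the package's module: it re-declares minimal stand-in classes so it runs without `dataretrieval` installed, and it follows the branch order of `error_for_status` as shown in this commit. The `URLTooLong`-under-`RequestTooLarge` relationship and the `TransientError` constructor are assumptions inferred from the diff, not confirmed signatures.

```python
from __future__ import annotations


# Stand-in hierarchy (assumed shapes; the real classes live in
# dataretrieval.exceptions and carry more behavior than shown here).
class DataRetrievalError(Exception): ...
class BadRequestError(DataRetrievalError, ValueError): ...
class NotFoundError(DataRetrievalError, ValueError): ...
class ClientError(DataRetrievalError, ValueError): ...
class RequestTooLarge(DataRetrievalError, ValueError): ...
class URLTooLong(RequestTooLarge): ...  # assumption: subclass of RequestTooLarge


class TransientError(DataRetrievalError, RuntimeError):
    def __init__(self, message: str, *, retry_after: float | None = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after  # seconds to wait, if the caller knows it


class RateLimited(TransientError): ...
class ServiceUnavailable(TransientError): ...


def error_for_status(
    status: int, message: str, *, retry_after: float | None = None
) -> DataRetrievalError:
    """Same branch order as the function added in this commit."""
    if status == 400:
        return BadRequestError(message)
    if status == 404:
        return NotFoundError(message)
    if status in (413, 414):
        return URLTooLong(message)
    if status == 429:
        return RateLimited(message, retry_after=retry_after)
    if 500 <= status < 600:
        return ServiceUnavailable(message, retry_after=retry_after)
    return ClientError(message)


# Print the type each status is classified as.
for status in (400, 403, 404, 413, 414, 429, 503):
    print(status, type(error_for_status(status, f"HTTP {status}")).__name__)
```

Note that the function returns the exception rather than raising it, so each call site writes `raise error_for_status(...)` and the traceback points at the request path that failed.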

10 files changed

Lines changed: 161 additions & 116 deletions

File tree

dataretrieval/__init__.py

Lines changed: 6 additions & 4 deletions
@@ -17,10 +17,10 @@
 ``nldi`` requires geopandas (``pip install dataretrieval[nldi]``) and is
 imported on demand: ``from dataretrieval import nldi``.
 
-Most request failures raise a subclass of :class:`dataretrieval.DataRetrievalError`
-(the taxonomy lives in ``dataretrieval.exceptions``); a few paths -- ``nadp``,
-``streamstats``, and some ``nldi`` / ``waterdata`` status codes -- are not
-migrated yet.
+When a request gets an HTTP error response it raises a subclass of
+:class:`dataretrieval.DataRetrievalError` (the taxonomy lives in
+``dataretrieval.exceptions``). Connection-level failures (timeouts, DNS) still
+surface as ``httpx`` exceptions on the single-shot service paths.
 """
 
 from importlib.metadata import PackageNotFoundError, version
@@ -32,6 +32,7 @@
 
 from dataretrieval.exceptions import (
     BadRequestError,
+    ClientError,
     DataRetrievalError,
     NoSitesError,
     NotFoundError,
@@ -67,6 +68,7 @@
     # so callers can ``except dataretrieval.DataRetrievalError``
     "exceptions",
     "BadRequestError",
+    "ClientError",
     "DataRetrievalError",
     "NoSitesError",
     "NotFoundError",

dataretrieval/exceptions.py

Lines changed: 57 additions & 23 deletions
@@ -1,19 +1,16 @@
 """Exception taxonomy for ``dataretrieval``.
 
-Most request failures raise a subclass of :class:`DataRetrievalError` -- the
-legacy ``query`` path (``nwis`` / ``wqp`` / ``nldi``) and the modern ``waterdata``
-chunker both do -- so a caller can handle them with one
-``except dataretrieval.DataRetrievalError``. A few paths are not migrated yet:
-``nadp`` and ``streamstats`` raise ``httpx.HTTPStatusError``, and some ``nldi`` /
-``waterdata`` status codes still raise a bare ``ValueError`` / ``RuntimeError``.
+When a request gets an HTTP error response, the service modules (``nwis``,
+``wqp``, ``nldi``, ``waterdata``, ``nadp``, ``streamstats``) raise a subclass of
+:class:`DataRetrievalError`, so a caller can handle any of them with one
+``except dataretrieval.DataRetrievalError``. Connection-level failures (timeouts,
+DNS, refused connections) surface as ``httpx`` exceptions on the single-shot
+request paths.
 
 Two intermediate bases let a caller catch a whole family: :class:`RequestTooLarge`
-(the request can't fit, however it was issued) and :class:`TransientError` (a raw
-retryable transport failure). Note that after the Water Data chunker exhausts its
-own retries it raises a resumable
-:class:`~dataretrieval.waterdata.chunking.ChunkInterrupted` subclass
-(``QuotaExhausted`` / ``ServiceInterrupted``); those are *not* ``TransientError``
-instances -- catch ``ChunkInterrupted`` to resume a chunked call.
+(the request can't fit, however it was issued) and :class:`TransientError` (a
+429 or 5xx the server may serve on a later try). These classify the HTTP
+condition; they do not by themselves retry the request.
 
 This module imports only ``httpx`` (the package's core HTTP dependency, always
 installed) -- not pandas/geopandas -- so it stays cheap to import and free of
@@ -28,29 +25,33 @@
     "DataRetrievalError",
     "BadRequestError",
     "NotFoundError",
+    "ClientError",
     "RequestTooLarge",
     "URLTooLong",
     "Unchunkable",
     "NoSitesError",
     "TransientError",
     "RateLimited",
     "ServiceUnavailable",
+    "error_for_status",
 ]
 
 
 class DataRetrievalError(Exception):
     """Base class for errors raised when a request to a USGS or EPA web
     service fails.
 
-    The legacy ``query`` path (``nwis`` / ``wqp`` / ``nldi``) and the modern
-    ``waterdata`` chunker raise a subclass of this on a failed request, so a
-    caller can handle those uniformly::
+    Service modules raise a subclass of this when a request gets an HTTP error
+    response, so a caller can handle them uniformly::
 
         try:
             df, md = dataretrieval.wqp.get_results(...)
         except dataretrieval.DataRetrievalError:
             ...
 
+    (Connection-level failures still surface as ``httpx`` exceptions on the
+    single-shot paths.)
+
     Subclasses also inherit from the built-in exception this package has
     historically raised for the condition's *kind* -- :class:`ValueError` for a
     request that can't succeed as written (bad params, too large), and
@@ -73,6 +74,12 @@ class NotFoundError(DataRetrievalError, ValueError):
     """The requested resource was not found; often an empty query (HTTP 404)."""
 
 
+class ClientError(DataRetrievalError, ValueError):
+    """The service rejected the request with a 4xx not covered by a more
+    specific type (e.g. 401 Unauthorized, 403 Forbidden, 405). Fatal -- retrying
+    it unchanged won't help."""
+
+
 class RequestTooLarge(DataRetrievalError, ValueError):
     """The request is too large for the service to satisfy.
 
@@ -119,18 +126,19 @@ def __str__(self) -> str:
 
 # --- Transient transport errors ------------------------------------------
 # The service was reachable but temporarily refused the request; the same call
-# may succeed if retried. Each is also a ``RuntimeError`` (the built-in the
-# waterdata path has always raised). The Water Data chunker recognizes them via
-# ``isinstance(exc, TransientError)`` and wraps them as resumable
-# ``ChunkInterrupted`` subclasses.
+# may succeed on a later try. Each is also a ``RuntimeError`` for backward
+# compatibility. Whether a transient is actually retried is up to the calling
+# path.
 
 
 class TransientError(DataRetrievalError, RuntimeError):
-    """Base for transient HTTP failures that are worth an automatic retry.
+    """A 429 or 5xx the server may serve on a later try (:class:`RateLimited`
+    for 429, :class:`ServiceUnavailable` for 5xx).
 
-    One subclass per recoverable HTTP status family (429 -> :class:`RateLimited`,
-    5xx -> :class:`ServiceUnavailable`); the Water Data chunker recognizes them
-    by this shared base and wraps them as resumable interruptions.
+    This classifies the HTTP condition; it does not by itself retry the request.
+    Whether a transient is retried is up to the calling path -- a single-shot
+    request raises it for the caller to handle (e.g. wait :attr:`retry_after`
+    and re-issue).
 
     Parameters
     ----------
@@ -157,3 +165,29 @@ class ServiceUnavailable(TransientError):
     Raised by both the legacy ``query`` path and the Water Data path, so a 5xx
     surfaces as one type regardless of which subsystem issued the request.
     """
+
+
+def error_for_status(
+    status: int, message: str, *, retry_after: float | None = None
+) -> DataRetrievalError:
+    """Return the typed :class:`DataRetrievalError` for an HTTP error *status*.
+
+    The single status-to-type mapping shared by every request path (the legacy
+    ``query`` path, ``waterdata``, ``nadp`` / ``streamstats``), so a given status
+    surfaces as the same type everywhere. ``message`` is used
+    verbatim; ``retry_after`` is attached only to the transient
+    (:class:`TransientError`) types. The returned type classifies the HTTP
+    condition only -- it does not imply the caller's path retries it (see
+    :class:`TransientError`).
+    """
+    if status == 400:
+        return BadRequestError(message)
+    if status == 404:
+        return NotFoundError(message)
+    if status in (413, 414):
+        return URLTooLong(message)
+    if status == 429:
+        return RateLimited(message, retry_after=retry_after)
+    if 500 <= status < 600:
+        return ServiceUnavailable(message, retry_after=retry_after)
+    return ClientError(message)
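
The `TransientError` docstring leaves retrying to the caller on single-shot paths ("wait `retry_after` and re-issue"). A minimal caller-side sketch of that idea follows. `call_with_retry`, its parameters, and the stand-in `TransientError` class are hypothetical names introduced here for illustration and are not part of `dataretrieval`. In this diff `utils._raise_for_status` calls `error_for_status` without a `retry_after`, so the attribute may be `None` on those paths; the sketch falls back to a default wait in that case.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(RuntimeError):
    """Stand-in for dataretrieval.exceptions.TransientError (assumed shape)."""

    def __init__(self, message: str, *, retry_after: float | None = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


def call_with_retry(
    fetch: Callable[[], T],
    *,
    attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call *fetch*, re-issuing it after a transient error, up to *attempts* tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TransientError as exc:
            if attempt == attempts:
                raise  # out of tries: let the typed error reach the caller
            # Prefer the server's hint; fall back when none was attached.
            wait = exc.retry_after if exc.retry_after is not None else default_wait
            sleep(wait)
    raise AssertionError("unreachable")


# Demo with a fake request that fails twice, then succeeds. The sleep function
# is injected so the demo records the waits instead of actually sleeping.
outcomes = iter([TransientError("429", retry_after=2.0), TransientError("503"), "ok"])
waits: list[float] = []


def flaky() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, TransientError):
        raise outcome
    return outcome


result = call_with_retry(flaky, default_wait=0.5, sleep=waits.append)
print(result, waits)
```

Only `TransientError` is retried here; the fatal 4xx types propagate immediately, which matches the `ClientError` docstring's note that retrying such a request unchanged won't help.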

dataretrieval/nadp.py

Lines changed: 2 additions & 2 deletions
@@ -45,7 +45,7 @@
 
 import httpx
 
-from dataretrieval.utils import HTTPX_DEFAULTS
+from dataretrieval.utils import HTTPX_DEFAULTS, _raise_for_status
 
 _DEPRECATION_MESSAGE = (
     "The `nadp` module is deprecated and will be removed from `dataretrieval` "
@@ -230,7 +230,7 @@ def get_zip(url: str, filename: str) -> NADP_ZipFile:
     _warn_deprecated()
 
     req = httpx.get(url + filename, **HTTPX_DEFAULTS)
-    req.raise_for_status()
+    _raise_for_status(req)
 
     # z = zipfile.ZipFile(io.BytesIO(req.content))
     z = NADP_ZipFile(io.BytesIO(req.content))

dataretrieval/nldi.py

Lines changed: 5 additions & 1 deletion
@@ -3,6 +3,7 @@
 from json import JSONDecodeError
 from typing import Any, Literal, cast
 
+from dataretrieval.exceptions import error_for_status
 from dataretrieval.utils import query
 
 try:
@@ -24,7 +25,10 @@ def _query_nldi(
     # A helper function to query the NLDI API
     response = query(url, payload=query_params)
     if response.status_code != 200:
-        raise ValueError(f"{error_message}. Error reason: {response.reason_phrase}")
+        raise error_for_status(
+            response.status_code,
+            f"{error_message} Error reason: {response.reason_phrase}",
+        )
 
     response_data: dict[str, Any] | list[Any] = {}
     try:
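
The `nldi` change swaps a bare `ValueError` for a typed error. Because `ClientError` is declared as `ClientError(DataRetrievalError, ValueError)`, a caller that already catches `ValueError` should still catch the fatal 4xx types, while the transient 429/5xx types are `RuntimeError`-based and would not be caught that way. The sketch below illustrates this with flattened stand-in classes (in the package, `RateLimited` derives from `TransientError`); `caught_by` is a hypothetical helper for the demonstration only.

```python
class DataRetrievalError(Exception):
    """Stand-in for dataretrieval.DataRetrievalError."""


class ClientError(DataRetrievalError, ValueError):
    """Stand-in: a fatal 4xx that is still a ValueError."""


class RateLimited(DataRetrievalError, RuntimeError):
    """Stand-in: a transient 429, a RuntimeError rather than a ValueError."""


def caught_by(exc: Exception) -> list[str]:
    """Name the handler types, out of three common ones, that would catch *exc*."""
    handlers = (ValueError, RuntimeError, DataRetrievalError)
    return [handler.__name__ for handler in handlers if isinstance(exc, handler)]


print(caught_by(ClientError("403 Forbidden")))
print(caught_by(RateLimited("429 Too Many Requests")))
```

Catching `DataRetrievalError` covers both families, which is the uniform handling the module docstring describes.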

dataretrieval/streamstats.py

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@
 
 import httpx
 
-from dataretrieval.utils import HTTPX_DEFAULTS
+from dataretrieval.utils import HTTPX_DEFAULTS, _raise_for_status
 
 
 def download_workspace(workspaceID: str, format: str = "") -> httpx.Response:
@@ -39,7 +39,7 @@ def download_workspace(workspaceID: str, format: str = "") -> httpx.Response:
 
     r = httpx.get(url, params=payload, **HTTPX_DEFAULTS)
 
-    r.raise_for_status()
+    _raise_for_status(r)
     return r
     # data = r.raw.read()
 
@@ -146,7 +146,7 @@ def get_watershed(
 
     r = httpx.get(url, params=payload, **HTTPX_DEFAULTS)
 
-    r.raise_for_status()
+    _raise_for_status(r)
 
     if format == "geojson":
         return r

dataretrieval/utils.py

Lines changed: 24 additions & 29 deletions
@@ -14,11 +14,9 @@
 import dataretrieval
 from dataretrieval.codes import tz
 from dataretrieval.exceptions import (
-    BadRequestError,
     NoSitesError,
-    NotFoundError,
-    ServiceUnavailable,
     URLTooLong,
+    error_for_status,
 )
 
 # Typed as ``dict[str, Any]`` (not the inferred ``dict[str, object]``) so that
@@ -290,31 +288,22 @@ def _url_too_long_error(detail: str) -> URLTooLong:
 
 
 def _raise_for_status(response: httpx.Response) -> None:
-    """Map an unsuccessful HTTP status to a typed :class:`DataRetrievalError`;
+    """Raise the typed :class:`DataRetrievalError` for an HTTP error response;
     return ``None`` on success.
 
-    Shared by the legacy :func:`query` path. The 4xx types stay
-    :class:`ValueError`-compatible (this path's historical contract), but a 5xx
-    raises the transient :class:`ServiceUnavailable` (a :class:`RuntimeError`),
-    since a server failure is retryable rather than a bad request.
+    Shared by the legacy :func:`query` path (and ``nadp`` / ``streamstats``).
+    Delegates the status-to-type mapping to
+    :func:`dataretrieval.exceptions.error_for_status`; a 414 keeps the richer
+    "split your query" guidance via :func:`_url_too_long_error`.
     """
     status = response.status_code
-    if status == 400:
-        raise BadRequestError(
-            f"Bad Request, check that your parameters are correct. URL: {response.url}"
-        )
-    elif status == 404:
-        raise NotFoundError(
-            "Page Not Found Error. May be the result of an empty query. "
-            f"URL: {response.url}"
-        )
-    elif status == 414:
+    if status < 400:
+        return
+    if status in (413, 414):
         raise _url_too_long_error(f"API response reason: {response.reason_phrase}")
-    elif 500 <= status < 600:
-        raise ServiceUnavailable(
-            f"Service Unavailable: {status} {response.reason_phrase}. "
-            f"The service at {response.url} may be down or experiencing issues."
-        )
+    raise error_for_status(
+        status, f"{response.reason_phrase} (HTTP {status}). URL: {response.url}"
+    )
 
 
 def query(
@@ -348,13 +337,19 @@ def query(
     Raises
     ------
     DataRetrievalError
-        On failure: :class:`~dataretrieval.exceptions.BadRequestError` (400),
+        On an HTTP error response, the typed subclass for the status (via
+        :func:`dataretrieval.exceptions.error_for_status`) -- e.g.
+        :class:`~dataretrieval.exceptions.BadRequestError` (400),
         :class:`~dataretrieval.exceptions.NotFoundError` (404),
-        :class:`~dataretrieval.exceptions.URLTooLong` (414 or a client-side
-        over-long URL), :class:`~dataretrieval.exceptions.ServiceUnavailable`
-        (5xx), or :class:`~dataretrieval.exceptions.NoSitesError` (no sites/data
-        matched). The 4xx types are also :class:`ValueError`;
-        ``ServiceUnavailable`` is a :class:`RuntimeError`.
+        :class:`~dataretrieval.exceptions.URLTooLong` (413/414 or a client-side
+        over-long URL), :class:`~dataretrieval.exceptions.RateLimited` (429),
+        :class:`~dataretrieval.exceptions.ServiceUnavailable` (5xx), or
+        :class:`~dataretrieval.exceptions.ClientError` for any other 4xx -- or
+        :class:`~dataretrieval.exceptions.NoSitesError` when a 200 response
+        reports no sites/data matched. The 4xx types are also
+        :class:`ValueError`; the transient 429/5xx types are
+        :class:`RuntimeError`. Connection-level failures (timeouts, DNS) instead
+        surface as ``httpx`` exceptions.
     """
 
     for key, value in payload.items():
