Commit 9801177

thodson-usgs and claude committed
refactor(errors): unify request errors under a DataRetrievalError taxonomy
An HTTP failure used to surface as a different exception type depending on which module made the request:

- legacy query() (wqp, nwis, ngwmn, nldi): ValueError, or NoSitesError(Exception)
- waterdata: RateLimited / ServiceUnavailable (RuntimeError), RequestTooLarge (ValueError), ChunkInterrupted (RuntimeError)
- nadp / streamstats: bare httpx.HTTPStatusError

so no single `except` clause could catch "any dataretrieval request failure."

Add a `DataRetrievalError` base and root the existing exceptions on it, so a caller can `except dataretrieval.DataRetrievalError` for any request failure. Each subclass also keeps the built-in it has historically raised (BadRequestError is a ValueError; RateLimited is a RuntimeError), so existing `except ValueError` / `except RuntimeError` handlers keep working unchanged.

The taxonomy lives in a new, dependency-free `dataretrieval/exceptions.py`, the single home for it (cf. requests.exceptions, botocore.exceptions), rather than buried in the pandas/httpx-heavy utils.py:

- exceptions.py: DataRetrievalError + the legacy query() types (BadRequestError, NotFoundError, RequestTooLargeError, ServiceUnavailableError; all ValueError) + NoSitesError + the Water Data transport types (RateLimited, ServiceUnavailable, RequestTooLarge). Explicit __all__, re-exported from dataretrieval/__init__.py so the top-level export is intentional, not an accident of `import *`.
- utils.py keeps the behavior: query() and a consolidated _raise_for_status() status->exception mapper; it imports the types it raises.
- chunking.py keeps the chunker-specific resumable types (ChunkInterrupted and its subclasses, which carry a ChunkedCall resume handle) and imports the transport types from exceptions.

The old import paths still resolve (utils.NoSitesError, waterdata.chunking.RateLimited, ...) via re-import, so nothing downstream breaks.

Out of scope (follow-ups): nadp/streamstats still raise httpx.HTTPStatusError; nldi's manual non-200 ValueError isn't rooted; waterdata.utils._raise_for_non_200's catch-all for non-retryable 4xx stays a bare RuntimeError (a deliberate fatal/non-resumable signal the chunker relies on).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
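
The backward-compatibility claim in the message rests on multiple inheritance: each typed error derives from both the new base and the built-in it used to be. A minimal sketch of that idea follows, using local stand-in classes rather than imports from dataretrieval; the helper `classify` is hypothetical and exists only for illustration.

```python
from __future__ import annotations


# Stand-in classes mirroring the shapes described in the commit message.
# They are defined locally for illustration, not imported from dataretrieval.
class DataRetrievalError(Exception):
    """Common root for request failures."""


class BadRequestError(DataRetrievalError, ValueError):
    """HTTP 400 on the legacy query() path."""


class RateLimited(DataRetrievalError, RuntimeError):
    """HTTP 429 on the Water Data path."""


def classify(exc: Exception) -> list[str]:
    """Hypothetical helper: names of the handler types that would catch ``exc``."""
    candidates = (DataRetrievalError, ValueError, RuntimeError)
    return [t.__name__ for t in candidates if isinstance(exc, t)]


caught_bad_request = classify(BadRequestError("bad parameters"))
caught_rate_limit = classify(RateLimited("slow down"))
print(caught_bad_request)  # ['DataRetrievalError', 'ValueError']
print(caught_rate_limit)   # ['DataRetrievalError', 'RuntimeError']
```

Under this layout a pre-existing `except ValueError` handler and a new `except DataRetrievalError` handler both catch the same instance, which is the property the commit relies on.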
1 parent 1adf174 commit 9801177

6 files changed

Lines changed: 258 additions & 98 deletions

File tree

dataretrieval/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
 except PackageNotFoundError:
     __version__ = "version-unknown"

+from dataretrieval.exceptions import *
 from dataretrieval.nadp import *
 from dataretrieval.nwis import *
 from dataretrieval.samples import *

dataretrieval/exceptions.py

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
+"""Exception taxonomy for ``dataretrieval``.
+
+A failed request from any service module (``nwis``, ``wqp``, ``waterdata``,
+``nldi``, ...) raises a subclass of :class:`DataRetrievalError`, so a caller can
+handle any request failure with a single ``except dataretrieval.DataRetrievalError``.
+
+This module deliberately has no third-party dependencies, so any module can
+import it without pulling in pandas/httpx.
+"""
+
+from __future__ import annotations
+
+__all__ = [
+    "DataRetrievalError",
+    "BadRequestError",
+    "NotFoundError",
+    "RequestTooLargeError",
+    "ServiceUnavailableError",
+    "NoSitesError",
+    "RateLimited",
+    "ServiceUnavailable",
+    "RequestTooLarge",
+]
+
+
+class DataRetrievalError(Exception):
+    """Base class for errors raised when a request to a USGS or EPA web
+    service fails.
+
+    Every service module (``nwis``, ``wqp``, ``waterdata``, ``nldi``, ...)
+    raises a subclass of this when a request fails, so a caller can handle any
+    request failure uniformly::
+
+        try:
+            df, md = dataretrieval.wqp.get_results(...)
+        except dataretrieval.DataRetrievalError:
+            ...
+
+    Subclasses also inherit from the built-in exception this package has
+    historically raised for the same condition (e.g. :class:`BadRequestError`
+    is also a :class:`ValueError`, :class:`RateLimited` is also a
+    :class:`RuntimeError`), so existing ``except ValueError`` / ``except
+    RuntimeError`` handlers keep working unchanged.
+    """
+
+
+# Legacy ``query()`` path: HTTP status families mapped to ValueError-compatible
+# types (the type that path has always raised).
+class BadRequestError(DataRetrievalError, ValueError):
+    """The service rejected the request parameters (HTTP 400)."""
+
+
+class NotFoundError(DataRetrievalError, ValueError):
+    """The requested resource was not found; often an empty query (HTTP 404)."""
+
+
+class RequestTooLargeError(DataRetrievalError, ValueError):
+    """The request URL was too long for the service (HTTP 414, or rejected
+    client-side before it was sent)."""
+
+
+class ServiceUnavailableError(DataRetrievalError, ValueError):
+    """The service is down or returned a server error (HTTP 5xx)."""
+
+
+class NoSitesError(DataRetrievalError):
+    """The selection criteria matched no sites/data."""
+
+    def __init__(self, url):
+        self.url = url
+
+    def __str__(self):
+        return (
+            "No sites/data found using the selection criteria specified in "
+            f"url: {self.url}"
+        )
+
+
+# Water Data API transport errors: retryable HTTP status families, surfaced as
+# RuntimeError-compatible types the chunker detects via ``isinstance`` and wraps
+# as resumable interruptions.
+class _RetryableTransportError(DataRetrievalError, RuntimeError):
+    """
+    Base for typed HTTP transport failures the chunker recognizes as
+    transient.
+
+    Raised by :func:`dataretrieval.waterdata.utils._raise_for_non_200`
+    and walked by :func:`dataretrieval.waterdata.chunking._classify_chunk_error`.
+    One subclass per recoverable HTTP status family (429 → :class:`RateLimited`,
+    5xx → :class:`ServiceUnavailable`); ``ChunkedCall`` wraps them as resumable
+    :class:`~dataretrieval.waterdata.chunking.ChunkInterrupted` subclasses.
+
+    Parameters
+    ----------
+    message : str
+        Human-readable error message.
+    retry_after : float, optional
+        Seconds to wait before retrying, parsed from the
+        ``Retry-After`` response header.
+
+    Attributes
+    ----------
+    retry_after : float or None
+        Seconds to wait before retrying, parsed from the
+        ``Retry-After`` response header. ``None`` when the header was
+        absent or unparseable.
+    """
+
+    def __init__(self, message: str, *, retry_after: float | None = None) -> None:
+        super().__init__(message)
+        self.retry_after = retry_after
+
+
+class RateLimited(_RetryableTransportError):
+    """
+    A USGS Water Data API request was rejected with HTTP 429.
+
+    Exposed as a typed exception so callers (notably the multi-value
+    chunker) can detect rate-limit failures via ``isinstance`` instead
+    of string-matching error messages.
+    """
+
+
+class ServiceUnavailable(_RetryableTransportError):
+    """
+    A USGS Water Data API request was rejected with HTTP 5xx.
+
+    Surfaced as a typed exception (parallel to :class:`RateLimited`)
+    so ``ChunkedCall`` can treat transient server failures as
+    resumable interruptions rather than fatal programmer errors.
+    """
+
+
+class RequestTooLarge(DataRetrievalError, ValueError):
+    """
+    No chunking plan fits the URL byte limit.
+
+    Raised when even the smallest reducible plan (every list axis at
+    singleton chunks and the filter at one clause per sub-request)
+    still exceeds the server's byte limit. Shrink the input lists,
+    simplify the filter, or split the call manually.
+    """

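The `retry_after` attribute on the transport errors lends itself to a small caller-side retry loop. The sketch below is one possible way a caller might use it, not something this commit adds: the classes are local stand-ins with the same shape as those in the diff, and `call_with_retry`, `default_wait`, and the injectable `sleep` parameter are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Local stand-ins with the same shape as the classes in the diff above.
class DataRetrievalError(Exception):
    pass


class _RetryableTransportError(DataRetrievalError, RuntimeError):
    def __init__(self, message: str, *, retry_after: float | None = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


class RateLimited(_RetryableTransportError):
    pass


def call_with_retry(
    fn: Callable[[], T],
    *,
    attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry ``fn`` on transient errors, honoring retry_after."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except _RetryableTransportError as exc:
            if attempt == attempts:
                raise  # out of attempts: let the typed error propagate
            # Fall back to a fixed wait when the server gave no usable hint.
            sleep(exc.retry_after if exc.retry_after is not None else default_wait)
    raise ValueError("attempts must be at least 1")


# Demo: a function that fails twice, then succeeds. Waits are recorded
# instead of slept so the example runs instantly.
failures = [RateLimited("429", retry_after=2.0), RateLimited("429")]
waits: list[float] = []


def flaky() -> str:
    if failures:
        raise failures.pop(0)
    return "ok"


result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)  # ok [2.0, 1.0]
```

Catching the private base here is only for brevity; code outside the package would more plausibly catch the public `RateLimited` and `ServiceUnavailable` types.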
dataretrieval/utils.py

Lines changed: 38 additions & 31 deletions
@@ -10,6 +10,13 @@

 import dataretrieval
 from dataretrieval.codes import tz
+from dataretrieval.exceptions import (
+    BadRequestError,
+    NoSitesError,
+    NotFoundError,
+    RequestTooLargeError,
+    ServiceUnavailableError,
+)

 HTTPX_DEFAULTS = {
     "follow_redirects": True,
@@ -270,14 +277,42 @@ def __repr__(self) -> str:
 data_list.append(data) # append results to list"""


-def _url_too_long_error(detail: str) -> ValueError:
-    return ValueError(
+def _url_too_long_error(detail: str) -> RequestTooLargeError:
+    return RequestTooLargeError(
         "Request URL too long. Modify your query to use fewer sites. "
         f"{detail}. Pseudo-code example of how to split your query: "
         f"\n {_URL_TOO_LONG_EXAMPLE}"
     )


+def _raise_for_status(response: httpx.Response) -> None:
+    """Raise a typed :class:`DataRetrievalError` for an unsuccessful response.
+
+    Centralizes the HTTP-status-to-exception mapping for the shared
+    :func:`query` path so every legacy service module (``wqp``, ``nwis``,
+    ``ngwmn``, ``nldi``) surfaces request failures the same way. A successful
+    response returns ``None``. The raised types are also :class:`ValueError`
+    subclasses, preserving this module's historical contract.
+    """
+    status = response.status_code
+    if status == 400:
+        raise BadRequestError(
+            f"Bad Request, check that your parameters are correct. URL: {response.url}"
+        )
+    elif status == 404:
+        raise NotFoundError(
+            "Page Not Found Error. May be the result of an empty query. "
+            f"URL: {response.url}"
+        )
+    elif status == 414:
+        raise _url_too_long_error(f"API response reason: {response.reason_phrase}")
+    elif 500 <= status < 600:
+        raise ServiceUnavailableError(
+            f"Service Unavailable: {status} {response.reason_phrase}. "
+            f"The service at {response.url} may be down or experiencing issues."
+        )
+
+
 def query(url, payload, delimiter=",", ssl_check=True):
     """Send a query.

@@ -321,37 +356,9 @@ def query(url, payload, delimiter=",", ssl_check=True):
     except httpx.InvalidURL as exc:
         raise _url_too_long_error(f"httpx rejected the URL client-side: {exc}") from exc

-    if response.status_code == 400:
-        raise ValueError(
-            f"Bad Request, check that your parameters are correct. URL: {response.url}"
-        )
-    elif response.status_code == 404:
-        raise ValueError(
-            "Page Not Found Error. May be the result of an empty query. "
-            + f"URL: {response.url}"
-        )
-    elif response.status_code == 414:
-        raise _url_too_long_error(f"API response reason: {response.reason_phrase}")
-    elif 500 <= response.status_code < 600:
-        raise ValueError(
-            f"Service Unavailable: {response.status_code} {response.reason_phrase}. "
-            + f"The service at {response.url} may be down or experiencing issues."
-        )
+    _raise_for_status(response)

     if response.text.startswith("No sites/data"):
         raise NoSitesError(response.url)

     return response
-
-
-class NoSitesError(Exception):
-    """Custom error class used when selection criteria returns no sites/data."""
-
-    def __init__(self, url):
-        self.url = url
-
-    def __str__(self):
-        return (
-            "No sites/data found using the selection criteria specified in "
-            f"url: {self.url}"
-        )
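
The status-to-exception mapping consolidated in `_raise_for_status` can be read as a small lookup. The sketch below restates that mapping over a plain integer status so it runs without httpx; the classes are local stand-ins, and `error_type_for_status` is a hypothetical function for illustration, not part of the package.

```python
from __future__ import annotations


# Local stand-ins for the legacy query() error types shown in the diff.
class DataRetrievalError(Exception):
    pass


class BadRequestError(DataRetrievalError, ValueError):
    pass


class NotFoundError(DataRetrievalError, ValueError):
    pass


class RequestTooLargeError(DataRetrievalError, ValueError):
    pass


class ServiceUnavailableError(DataRetrievalError, ValueError):
    pass


def error_type_for_status(status: int) -> type[DataRetrievalError] | None:
    """Hypothetical restatement of the mapping: the error type for a status,
    or None when the legacy path would not raise for it."""
    if status == 400:
        return BadRequestError
    if status == 404:
        return NotFoundError
    if status == 414:
        return RequestTooLargeError
    if 500 <= status < 600:
        return ServiceUnavailableError
    return None


for status in (200, 400, 404, 414, 429, 503):
    mapped = error_type_for_status(status)
    print(status, mapped.__name__ if mapped else None)
```

As in the diff, statuses outside 400, 404, 414, and 5xx (429, for example) fall through without raising on this path; the Water Data path handles 429 with its own `RateLimited` type.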

dataretrieval/waterdata/chunking.py

Lines changed: 7 additions & 64 deletions
@@ -66,6 +66,12 @@
 import pandas as pd
 from anyio.from_thread import start_blocking_portal

+from dataretrieval.exceptions import (
+    DataRetrievalError,
+    RateLimited,
+    RequestTooLarge,
+    ServiceUnavailable,
+)
 from dataretrieval.utils import HTTPX_DEFAULTS

 from . import _progress
@@ -383,70 +389,7 @@ def _passthrough_result(
     return frame, response


-class _RetryableTransportError(RuntimeError):
-    """
-    Base for typed HTTP transport failures the chunker recognizes as
-    transient.
-
-    Raised by :func:`dataretrieval.waterdata.utils._raise_for_non_200`
-    and walked by :func:`_classify_chunk_error`. One subclass per
-    recoverable HTTP status family (429 → :class:`RateLimited`,
-    5xx → :class:`ServiceUnavailable`); ``ChunkedCall`` wraps them as
-    resumable :class:`ChunkInterrupted` subclasses.
-
-    Parameters
-    ----------
-    message : str
-        Human-readable error message.
-    retry_after : float, optional
-        Seconds to wait before retrying, parsed from the
-        ``Retry-After`` response header.
-
-    Attributes
-    ----------
-    retry_after : float or None
-        Seconds to wait before retrying, parsed from the
-        ``Retry-After`` response header. ``None`` when the header was
-        absent or unparseable.
-    """
-
-    def __init__(self, message: str, *, retry_after: float | None = None) -> None:
-        super().__init__(message)
-        self.retry_after = retry_after
-
-
-class RateLimited(_RetryableTransportError):
-    """
-    A USGS Water Data API request was rejected with HTTP 429.
-
-    Exposed as a typed exception so callers (notably the multi-value
-    chunker) can detect rate-limit failures via ``isinstance`` instead
-    of string-matching error messages.
-    """
-
-
-class ServiceUnavailable(_RetryableTransportError):
-    """
-    A USGS Water Data API request was rejected with HTTP 5xx.
-
-    Surfaced as a typed exception (parallel to :class:`RateLimited`)
-    so ``ChunkedCall`` can treat transient server failures as
-    resumable interruptions rather than fatal programmer errors.
-    """
-
-
-class RequestTooLarge(ValueError):
-    """
-    No chunking plan fits the URL byte limit.
-
-    Raised when even the smallest reducible plan (every list axis at
-    singleton chunks and the filter at one clause per sub-request)
-    still exceeds the server's byte limit. Shrink the input lists,
-    simplify the filter, or split the call manually.
-    """
-
-
-class ChunkInterrupted(RuntimeError):
+class ChunkInterrupted(DataRetrievalError, RuntimeError):
     """
     Base class for mid-stream chunk failures whose completed work is
     preserved and resumable.

dataretrieval/waterdata/utils.py

Lines changed: 1 addition & 2 deletions
@@ -26,12 +26,11 @@
 from anyio.from_thread import start_blocking_portal

 from dataretrieval import __version__
+from dataretrieval.exceptions import RateLimited, ServiceUnavailable
 from dataretrieval.utils import HTTPX_DEFAULTS, BaseMetadata
 from dataretrieval.waterdata import _progress, chunking
 from dataretrieval.waterdata.chunking import (
     _QUOTA_HEADER,
-    RateLimited,
-    ServiceUnavailable,
     _safe_elapsed,
     get_active_client,
 )
