Commit 5bc68a6

thodson-usgs and claude committed
refactor(errors)!: unify request failures under a DataRetrievalError taxonomy
Before, an HTTP failure surfaced as a different exception type depending on
which module made the request -- a ValueError (or bare Exception) on the legacy
query() path, RuntimeError-based types on the waterdata path, a bare
httpx.HTTPStatusError elsewhere -- so there was no single `except` for "any
dataretrieval request failure".

Introduce dataretrieval/exceptions.py (dependency-free, re-exported at top level
as dataretrieval.<Name>), rooted at DataRetrievalError, with two intermediate
bases that name the axes a caller reasons about:

    DataRetrievalError(Exception)
    |- BadRequestError(.., ValueError)     # 400
    |- NotFoundError(.., ValueError)       # 404
    |- RequestTooLarge(.., ValueError)     # base: request too large to satisfy
    |  |- URLTooLong                       # 414 / client-side URL reject
    |  '- Unchunkable                      # chunker planner floor
    |- NoSitesError                        # empty result
    '- TransientError(.., RuntimeError)    # base: retryable; carries retry_after
       |- RateLimited                      # 429
       '- ServiceUnavailable               # 5xx (both paths)

- One type per condition, raised by both the legacy query() path and the Water
  Data chunker. Callers can catch a whole family (`except RequestTooLarge` /
  `except TransientError`); the chunker's retry check is a single
  isinstance(exc, TransientError).
- query()'s inline status ladder is extracted into a reusable
  _raise_for_status().
- NoSitesError now subclasses DataRetrievalError (was Exception).
- Built-in compatibility by kind: fatal client errors are also ValueError,
  transient transport errors also RuntimeError, so existing `except ValueError`
  / `except RuntimeError` handlers keep working.

BREAKING CHANGES

- The legacy query() path raises typed errors instead of ad-hoc ValueErrors
  (400 -> BadRequestError, 404 -> NotFoundError, 414/over-long URL ->
  URLTooLong).
- A 5xx on the legacy query() path now raises ServiceUnavailable, a RuntimeError
  (was a ValueError): a transient server failure is a runtime condition, not a
  bad value.
- The Water Data chunker's planner-floor error is Unchunkable (a RequestTooLarge
  subclass).
- Import the transport types/bases from dataretrieval / dataretrieval.exceptions,
  not from dataretrieval.waterdata.chunking.

Verified: 477 passed / 2 skipped, ruff clean; live API spot checks
(404/400/over-long URL raise the typed errors, 200 unaffected); all 21 example
notebooks execute end-to-end against the live API (227/227 cells, 0 errors).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
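To make the family and built-in compatibility claims concrete, here is a minimal sketch. It re-declares part of the tree locally as stand-ins (same names and bases as in the commit message, but not imported from the installed package), and `classify` is a hypothetical helper added only for illustration.

```python
# Stand-in copies of part of the taxonomy, declared locally so the sketch
# runs without dataretrieval installed. Names and bases follow the tree above.
class DataRetrievalError(Exception):
    pass


class BadRequestError(DataRetrievalError, ValueError):
    pass


class RequestTooLarge(DataRetrievalError, ValueError):
    pass


class URLTooLong(RequestTooLarge):
    pass


class Unchunkable(RequestTooLarge):
    pass


class TransientError(DataRetrievalError, RuntimeError):
    def __init__(self, message, *, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


class ServiceUnavailable(TransientError):
    pass


def classify(exc):
    """Hypothetical helper: pick a reaction based on the error family."""
    if isinstance(exc, TransientError):
        return "retry later"
    if isinstance(exc, RequestTooLarge):
        return "shrink the request"
    if isinstance(exc, DataRetrievalError):
        return "fix the request"
    return "not a dataretrieval failure"


print(classify(URLTooLong("url too long")))          # family: RequestTooLarge
print(classify(ServiceUnavailable("HTTP 503")))      # family: TransientError
print(isinstance(BadRequestError("bad"), ValueError))          # legacy handlers still match
print(isinstance(ServiceUnavailable("HTTP 503"), ValueError))  # 5xx is no longer a ValueError
```

With the real package the classes would presumably be imported from `dataretrieval` or `dataretrieval.exceptions`, as the commit message describes.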
1 parent 1adf174 commit 5bc68a6

8 files changed

Lines changed: 292 additions & 117 deletions

dataretrieval/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 except PackageNotFoundError:
     __version__ = "version-unknown"

+from dataretrieval.exceptions import *
 from dataretrieval.nadp import *
 from dataretrieval.nwis import *
 from dataretrieval.samples import *

dataretrieval/exceptions.py

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
+"""Exception taxonomy for ``dataretrieval``.
+
+A failed request from any service module (``nwis``, ``wqp``, ``waterdata``,
+``nldi``, ...) raises a subclass of :class:`DataRetrievalError`, so a caller can
+handle any request failure with a single ``except dataretrieval.DataRetrievalError``.
+
+The tree has two intermediate bases a caller can catch to span a whole family:
+:class:`RequestTooLarge` (the request can't fit, however it was issued) and
+:class:`TransientError` (a temporary failure worth retrying).
+
+This module deliberately has no third-party dependencies, so any module can
+import it without pulling in pandas/httpx.
+"""
+
+from __future__ import annotations
+
+__all__ = [
+    "DataRetrievalError",
+    "BadRequestError",
+    "NotFoundError",
+    "RequestTooLarge",
+    "URLTooLong",
+    "Unchunkable",
+    "NoSitesError",
+    "TransientError",
+    "RateLimited",
+    "ServiceUnavailable",
+]
+
+
+class DataRetrievalError(Exception):
+    """Base class for errors raised when a request to a USGS or EPA web
+    service fails.
+
+    Every service module (``nwis``, ``wqp``, ``waterdata``, ``nldi``, ...)
+    raises a subclass of this when a request fails, so a caller can handle any
+    request failure uniformly::
+
+        try:
+            df, md = dataretrieval.wqp.get_results(...)
+        except dataretrieval.DataRetrievalError:
+            ...
+
+    Subclasses also inherit from the built-in exception this package has
+    historically raised for the condition's *kind* -- :class:`ValueError` for a
+    request that can't succeed as written (bad params, too large), and
+    :class:`RuntimeError` for a transient transport failure -- so existing
+    ``except ValueError`` / ``except RuntimeError`` handlers keep working.
+    """
+
+
+# --- Fatal client errors -------------------------------------------------
+# The request can't succeed as written; retrying it unchanged won't help. Each
+# is also a ``ValueError`` -- the built-in the legacy ``query`` path has always
+# raised -- so existing ``except ValueError`` handlers keep working.
+
+
+class BadRequestError(DataRetrievalError, ValueError):
+    """The service rejected the request parameters (HTTP 400)."""
+
+
+class NotFoundError(DataRetrievalError, ValueError):
+    """The requested resource was not found; often an empty query (HTTP 404)."""
+
+
+class RequestTooLarge(DataRetrievalError, ValueError):
+    """The request is too large for the service to satisfy.
+
+    A base for the two ways a request can exceed what the service accepts;
+    catch it to handle either. The concrete subclasses are :class:`URLTooLong`
+    (a single request the server rejected) and :class:`Unchunkable` (the Water
+    Data chunker could not split the call small enough to fit).
+    """
+
+
+class URLTooLong(RequestTooLarge):
+    """A single request URL exceeded the service's limit (HTTP 414, or rejected
+    client-side before it was sent).
+
+    Raised by the legacy ``query`` path, which issues one request without
+    chunking. Remediation: query fewer sites, or split the call manually.
+    """
+
+
+class Unchunkable(RequestTooLarge):
+    """No chunking plan fits the URL byte limit.
+
+    Raised by the Water Data chunker when even the smallest reducible plan
+    (every list axis at one atom per sub-request, the filter at one clause per
+    sub-request) still exceeds the server's byte limit -- so unlike
+    :class:`URLTooLong`, automatic splitting has already been tried and
+    exhausted. Shrink the input lists, simplify the filter, or split the call
+    manually.
+    """
+
+
+class NoSitesError(DataRetrievalError):
+    """The selection criteria matched no sites/data."""
+
+    def __init__(self, url):
+        self.url = url
+
+    def __str__(self):
+        return (
+            "No sites/data found using the selection criteria specified in "
+            f"url: {self.url}"
+        )
+
+
+# --- Transient transport errors ------------------------------------------
+# The service was reachable but temporarily refused the request; the same call
+# may succeed if retried. Each is also a ``RuntimeError`` (the built-in the
+# waterdata path has always raised). The Water Data chunker recognizes them via
+# ``isinstance(exc, TransientError)`` and wraps them as resumable
+# ``ChunkInterrupted`` subclasses.
+
+
+class TransientError(DataRetrievalError, RuntimeError):
+    """Base for transient HTTP failures that are worth an automatic retry.
+
+    One subclass per recoverable HTTP status family (429 -> :class:`RateLimited`,
+    5xx -> :class:`ServiceUnavailable`); the Water Data chunker recognizes them
+    by this shared base and wraps them as resumable interruptions.
+
+    Parameters
+    ----------
+    message : str
+        Human-readable error message.
+    retry_after : float, optional
+        Seconds to wait before retrying, parsed from the ``Retry-After``
+        response header; stored on the :attr:`retry_after` attribute (``None``
+        when the header is absent or unparseable).
+    """
+
+    def __init__(self, message: str, *, retry_after: float | None = None) -> None:
+        super().__init__(message)
+        self.retry_after = retry_after
+
+
+class RateLimited(TransientError):
+    """A request was rejected with HTTP 429 (too many requests)."""
+
+
+class ServiceUnavailable(TransientError):
+    """A request was rejected with a server error (HTTP 5xx).
+
+    Raised by both the legacy ``query`` path and the Water Data path, so a 5xx
+    surfaces as one type regardless of which subsystem issued the request.
+    """
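One way a caller might use the `retry_after` attribute carried by `TransientError` is a small retry loop. The sketch below is an illustration only: the classes are local stand-ins mirroring the ones added in this file, and `call_with_retry`, `flaky`, `attempts`, and `default_wait` are hypothetical names that do not appear in the commit.

```python
import time


# Stand-in copies of the classes added in this file (same names, same bases),
# declared locally so the sketch runs without dataretrieval installed.
class DataRetrievalError(Exception):
    pass


class TransientError(DataRetrievalError, RuntimeError):
    def __init__(self, message, *, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


class RateLimited(TransientError):
    pass


def call_with_retry(fn, *, attempts=3, default_wait=1.0, sleep=time.sleep):
    """Hypothetical helper: retry fn() on any TransientError.

    Waits exc.retry_after seconds when the error carries a hint, otherwise
    default_wait. The final failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransientError as exc:
            if attempt == attempts:
                raise
            sleep(default_wait if exc.retry_after is None else exc.retry_after)


# Demo: fail twice with a 429-style error, then succeed. Waits are recorded
# in a list instead of actually sleeping.
calls = {"n": 0}
waits = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("HTTP 429", retry_after=0.5)
    return "ok"


result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)
```

Catching the `TransientError` base rather than each subclass means a newly added transient type would be retried without changing the loop.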

dataretrieval/utils.py

Lines changed: 38 additions & 31 deletions
@@ -10,6 +10,13 @@

 import dataretrieval
 from dataretrieval.codes import tz
+from dataretrieval.exceptions import (
+    BadRequestError,
+    NoSitesError,
+    NotFoundError,
+    ServiceUnavailable,
+    URLTooLong,
+)

 HTTPX_DEFAULTS = {
     "follow_redirects": True,
@@ -270,14 +277,42 @@ def __repr__(self) -> str:
         data_list.append(data) # append results to list"""


-def _url_too_long_error(detail: str) -> ValueError:
-    return ValueError(
+def _url_too_long_error(detail: str) -> URLTooLong:
+    return URLTooLong(
         "Request URL too long. Modify your query to use fewer sites. "
         f"{detail}. Pseudo-code example of how to split your query: "
         f"\n {_URL_TOO_LONG_EXAMPLE}"
     )


+def _raise_for_status(response: httpx.Response) -> None:
+    """Map an unsuccessful HTTP status to a typed :class:`DataRetrievalError`;
+    return ``None`` on success.
+
+    Shared by the legacy :func:`query` path. The 4xx types stay
+    :class:`ValueError`-compatible (this path's historical contract), but a 5xx
+    raises the transient :class:`ServiceUnavailable` (a :class:`RuntimeError`),
+    since a server failure is retryable rather than a bad request.
+    """
+    status = response.status_code
+    if status == 400:
+        raise BadRequestError(
+            f"Bad Request, check that your parameters are correct. URL: {response.url}"
+        )
+    elif status == 404:
+        raise NotFoundError(
+            "Page Not Found Error. May be the result of an empty query. "
+            f"URL: {response.url}"
+        )
+    elif status == 414:
+        raise _url_too_long_error(f"API response reason: {response.reason_phrase}")
+    elif 500 <= status < 600:
+        raise ServiceUnavailable(
+            f"Service Unavailable: {status} {response.reason_phrase}. "
+            f"The service at {response.url} may be down or experiencing issues."
+        )
+
+
 def query(url, payload, delimiter=",", ssl_check=True):
     """Send a query.

@@ -321,37 +356,9 @@ def query(url, payload, delimiter=",", ssl_check=True):
     except httpx.InvalidURL as exc:
         raise _url_too_long_error(f"httpx rejected the URL client-side: {exc}") from exc

-    if response.status_code == 400:
-        raise ValueError(
-            f"Bad Request, check that your parameters are correct. URL: {response.url}"
-        )
-    elif response.status_code == 404:
-        raise ValueError(
-            "Page Not Found Error. May be the result of an empty query. "
-            + f"URL: {response.url}"
-        )
-    elif response.status_code == 414:
-        raise _url_too_long_error(f"API response reason: {response.reason_phrase}")
-    elif 500 <= response.status_code < 600:
-        raise ValueError(
-            f"Service Unavailable: {response.status_code} {response.reason_phrase}. "
-            + f"The service at {response.url} may be down or experiencing issues."
-        )
+    _raise_for_status(response)

     if response.text.startswith("No sites/data"):
         raise NoSitesError(response.url)

     return response
-
-
-class NoSitesError(Exception):
-    """Custom error class used when selection criteria returns no sites/data."""
-
-    def __init__(self, url):
-        self.url = url
-
-    def __str__(self):
-        return (
-            "No sites/data found using the selection criteria specified in "
-            f"url: {self.url}"
-        )
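The practical effect of the 5xx change on existing callers can be sketched as follows. This is a simplified illustration under stated assumptions: `FakeResponse` is a hypothetical stand-in for `httpx.Response`, `raise_for_status` is a cut-down copy of the status ladder above (404 and 5xx only), the exception classes are local stand-ins, and `legacy_handler` represents a caller written against the old ValueError behaviour.

```python
from dataclasses import dataclass


# Local stand-ins mirroring the names and bases in dataretrieval/exceptions.py.
class DataRetrievalError(Exception):
    pass


class NotFoundError(DataRetrievalError, ValueError):
    pass


class TransientError(DataRetrievalError, RuntimeError):
    pass


class ServiceUnavailable(TransientError):
    pass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for httpx.Response with only the fields used here."""

    status_code: int
    url: str = "https://example.invalid/query"


def raise_for_status(response):
    """Cut-down copy of the status ladder: 404 and 5xx only."""
    status = response.status_code
    if status == 404:
        raise NotFoundError(f"Page Not Found Error. URL: {response.url}")
    if 500 <= status < 600:
        raise ServiceUnavailable(f"Service Unavailable: {status}. URL: {response.url}")


def legacy_handler(response):
    """A pre-existing caller that only catches ValueError."""
    try:
        raise_for_status(response)
    except ValueError as exc:
        return f"handled {type(exc).__name__}"
    return "ok"


print(legacy_handler(FakeResponse(200)))  # success path is unaffected
print(legacy_handler(FakeResponse(404)))  # still caught: NotFoundError is a ValueError

try:
    legacy_handler(FakeResponse(503))
except DataRetrievalError as exc:
    # The 5xx escapes the ValueError handler; the shared base still catches it.
    escaped = type(exc).__name__
    print(f"escaped as {escaped}")
```

A caller in this position would need to add `except RuntimeError`, `except ServiceUnavailable`, or the broader `except DataRetrievalError` to keep handling server failures.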

dataretrieval/waterdata/chunking.py

Lines changed: 13 additions & 69 deletions
@@ -66,6 +66,13 @@
 import pandas as pd
 from anyio.from_thread import start_blocking_portal

+from dataretrieval.exceptions import (
+    DataRetrievalError,
+    RateLimited,
+    ServiceUnavailable,
+    TransientError,
+    Unchunkable,
+)
 from dataretrieval.utils import HTTPX_DEFAULTS

 from . import _progress
@@ -383,70 +390,7 @@ def _passthrough_result(
     return frame, response


-class _RetryableTransportError(RuntimeError):
-    """
-    Base for typed HTTP transport failures the chunker recognizes as
-    transient.
-
-    Raised by :func:`dataretrieval.waterdata.utils._raise_for_non_200`
-    and walked by :func:`_classify_chunk_error`. One subclass per
-    recoverable HTTP status family (429 → :class:`RateLimited`,
-    5xx → :class:`ServiceUnavailable`); ``ChunkedCall`` wraps them as
-    resumable :class:`ChunkInterrupted` subclasses.
-
-    Parameters
-    ----------
-    message : str
-        Human-readable error message.
-    retry_after : float, optional
-        Seconds to wait before retrying, parsed from the
-        ``Retry-After`` response header.
-
-    Attributes
-    ----------
-    retry_after : float or None
-        Seconds to wait before retrying, parsed from the
-        ``Retry-After`` response header. ``None`` when the header was
-        absent or unparseable.
-    """
-
-    def __init__(self, message: str, *, retry_after: float | None = None) -> None:
-        super().__init__(message)
-        self.retry_after = retry_after
-
-
-class RateLimited(_RetryableTransportError):
-    """
-    A USGS Water Data API request was rejected with HTTP 429.
-
-    Exposed as a typed exception so callers (notably the multi-value
-    chunker) can detect rate-limit failures via ``isinstance`` instead
-    of string-matching error messages.
-    """
-
-
-class ServiceUnavailable(_RetryableTransportError):
-    """
-    A USGS Water Data API request was rejected with HTTP 5xx.
-
-    Surfaced as a typed exception (parallel to :class:`RateLimited`)
-    so ``ChunkedCall`` can treat transient server failures as
-    resumable interruptions rather than fatal programmer errors.
-    """
-
-
-class RequestTooLarge(ValueError):
-    """
-    No chunking plan fits the URL byte limit.
-
-    Raised when even the smallest reducible plan (every list axis at
-    singleton chunks and the filter at one clause per sub-request)
-    still exceeds the server's byte limit. Shrink the input lists,
-    simplify the filter, or split the call manually.
-    """
-
-
-class ChunkInterrupted(RuntimeError):
+class ChunkInterrupted(DataRetrievalError, RuntimeError):
     """
     Base class for mid-stream chunk failures whose completed work is
     preserved and resumable.
@@ -854,7 +798,7 @@ class ChunkPlan:

     Raises
     ------
-    RequestTooLarge
+    Unchunkable
         If the request needs chunking but even the singleton plan
         doesn't fit ``url_limit``.
     """
@@ -923,7 +867,7 @@ def _plan(

     Raises
     ------
-    RequestTooLarge
+    Unchunkable
         If even the singleton plan (every axis at one atom per
         chunk) still exceeds ``url_limit``.
     """
@@ -944,7 +888,7 @@ def _plan(
                 biggest_axis, biggest_idx, biggest_size = axis, idx, size

         if biggest_axis is None:
-            raise RequestTooLarge(
+            raise Unchunkable(
                 f"Request exceeds {url_limit} bytes (URL + body) at the "
                 f"smallest reducible plan (every axis at one atom per "
                 f"sub-request). Reduce input sizes, shorten or simplify "
@@ -1119,7 +1063,7 @@ def _retryable(exc: BaseException) -> tuple[bool, float | None]:
     ``(retryable, retry_after)`` — the server ``Retry-After`` hint
     (seconds) when the transient carried one, else ``None``.
     """
-    if isinstance(exc, (RateLimited, ServiceUnavailable)):
+    if isinstance(exc, TransientError):
         return True, exc.retry_after
     if isinstance(exc, httpx.TransportError):
         return True, None
@@ -1708,7 +1652,7 @@ def multi_value_chunked(

     Raises
     ------
-    RequestTooLarge
+    Unchunkable
         If no plan can fit ``url_limit``.
     ChunkInterrupted
         On a mid-execution transient — 429, 5xx, or a bare transport
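The switch from a tuple of concrete types to the `TransientError` base in `_retryable` can be illustrated with a standalone sketch. Several parts are assumptions: the classes are local stand-ins, the built-in `ConnectionError` substitutes for `httpx.TransportError` so the snippet has no third-party dependency, and the final `return False, None` is a guess, since the rest of the real function is not shown in the hunk.

```python
# Local stand-ins mirroring the names and bases in dataretrieval/exceptions.py.
class DataRetrievalError(Exception):
    pass


class TransientError(DataRetrievalError, RuntimeError):
    def __init__(self, message, *, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


class RateLimited(TransientError):
    pass


class ServiceUnavailable(TransientError):
    pass


class RequestTooLarge(DataRetrievalError, ValueError):
    pass


class Unchunkable(RequestTooLarge):
    pass


def retryable(exc):
    """Return (retryable, retry_after) for an exception.

    One isinstance check against the shared base covers every transient
    subclass, including any added later.
    """
    if isinstance(exc, TransientError):
        return True, exc.retry_after
    # Assumption: ConnectionError stands in for httpx.TransportError here.
    if isinstance(exc, ConnectionError):
        return True, None
    # Assumption: anything else is treated as fatal.
    return False, None


print(retryable(RateLimited("HTTP 429", retry_after=3.0)))
print(retryable(ServiceUnavailable("HTTP 503")))
print(retryable(ConnectionError("connection reset")))
print(retryable(Unchunkable("no plan fits")))
```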
