Skip to content

Commit 7cf9b87

Browse files
thodson-usgsclaude
andcommitted
Auto-retry live-API tests on transient upstream errors
#273 surfaced a transient HTTP 502 from upstream on its merge-to-main CI run that pre-PR-273 would have been silently swallowed by _walk_pages and turned into an empty DataFrame. The status-code-aware behavior is correct for users, but it makes every test that hits the live USGS Water Data API susceptible to flaking on a transient blip. This PR adds: - pytest-rerunfailures to the `test` optional dependency set. - tests/conftest.py that (a) registers a `live` pytest marker; (b) auto-applies it to every test that does not take `requests_mock` as a fixture (the existing mock-driven convention in this repo); and (c) configures live-marked tests to retry up to twice on a 5-second backoff — but ONLY when the failure trace matches one of a narrow set of transient-upstream patterns: `429:` / `5xx:` prefixes from `_raise_for_non_200`, `ConnectionError` shapes, and timeout strings from the requests/urllib3 stack. Library bugs raising other exception types are not retried — the `only_rerun` pattern set deliberately matches on the failure-text shape produced by `_raise_for_non_200` rather than on generic `RuntimeError`, so a real bug fails on the first attempt. Verified: pytest collection identifies 59 of 61 tests in waterdata_test.py as live (the two `*_mock_*` tests correctly stay unmarked). Offline tests (`_get_args`, `_check_*`, `_normalize_str_*`, `_construct_api_requests_*`) pass unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 0a14ba5 commit 7cf9b87

3 files changed

Lines changed: 74 additions & 0 deletions

File tree

NEWS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
**05/15/2026:** Live-API tests are now auto-retried on transient upstream errors. PR #273 (above) intentionally surfaces mid-pagination HTTP failures to callers instead of silently truncating results, which made any CI test that hits the live USGS Water Data API susceptible to flaking on transient 429/5xx/connection blips — a 502 Bad Gateway from upstream broke CI on the #273 merge to main. A new `tests/conftest.py` registers a `live` pytest marker, auto-applies it to every test that does not take `requests_mock` (the existing mock-driven convention), and via `pytest-rerunfailures` retries those tests up to twice on a 5-second backoff — but only when the failure trace matches a narrow transient-upstream pattern (`429:` / `5xx:` prefixes from `_raise_for_non_200`, plus `ConnectionError` / timeout shapes). Library bugs raising other exception types still fail on the first attempt.
2+
13
**05/14/2026:** Fixed two latent bugs in the paginated `waterdata` request loop (`_walk_pages` and `get_stats_data`). Previously, when `requests.Session.request(...)` itself raised mid-pagination (network error, timeout), the except block called `_error_body()` on the *prior page's* response, so the logged "error" described the wrong request and could itself crash on non-JSON bodies. Separately, no status-code check was performed on subsequent paginated responses, so a 5xx body that didn't include `numberReturned` was silently treated as an empty page — pagination quietly stopped and the user got truncated data with no error logged. The loop now status-checks each page like the initial request and reports the actual exception. The "best-effort" behavior (return whatever pages were collected) is preserved.
24

35
**05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.

pyproject.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ dataretrieval = ["py.typed"]
3535
test = [
3636
"pytest > 5.0.0",
3737
"pytest-cov[all]",
38+
"pytest-rerunfailures",
3839
"coverage",
3940
"requests-mock",
4041
"ruff",

tests/conftest.py

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
1+
"""Test infrastructure shared across all test modules.
2+
3+
After #273 the paginated ``waterdata`` getters surface mid-walk HTTP
4+
errors (429 / 5xx / connection drops) to the caller instead of silently
5+
truncating the result. That's the correct behavior for users — but it
6+
makes any CI test that hits the live USGS Water Data API susceptible to
7+
flaking on a transient upstream blip (e.g. the 502 Bad Gateway that
8+
broke CI on #273's merge to main).
9+
10+
This file:
11+
12+
* registers a ``live`` pytest marker;
13+
* auto-applies it to every test that does **not** take ``requests_mock``
14+
as a fixture — the existing convention in this repo for mock-driven
15+
tests, so the marker tracks "this test hits the network" without
16+
needing to decorate 35 functions by hand;
17+
* via ``pytest-rerunfailures``, configures live-marked tests to retry
18+
up to twice (5-second backoff) ONLY when the failure trace matches a
19+
transient-upstream pattern: ``429:`` / ``5xx:`` prefixes that
20+
``_raise_for_non_200`` produces, plus ``ConnectionError`` / timeout
21+
shapes from the ``requests`` library.
22+
23+
Library bugs that raise unrelated exception types are NOT retried —
24+
the regex set deliberately omits generic ``RuntimeError`` matches.
25+
"""
26+
27+
import pytest
28+
29+
# Match anywhere in the failure traceback. Anchor-free because the
30+
# trace embeds the exception message after a long preamble.
31+
_TRANSIENT_PATTERNS = [
32+
r"RuntimeError:\s*(?:429|5\d\d):", # _raise_for_non_200 output
33+
r"ConnectionError", # requests/urllib3
34+
r"ReadTimeout|ConnectTimeout|Timeout",
35+
r"timed out",
36+
r"Bad Gateway|Service Unavailable|Gateway Timeout",
37+
]
38+
39+
_MAX_RERUNS = 2
40+
_RERUN_DELAY_SEC = 5
41+
42+
43+
def pytest_configure(config):
44+
"""Register the ``live`` marker so pytest doesn't warn about it."""
45+
config.addinivalue_line(
46+
"markers",
47+
"live: marks tests that hit the live USGS Water Data API; "
48+
"retried up to twice on transient upstream HTTP errors.",
49+
)
50+
51+
52+
def pytest_collection_modifyitems(config, items):
53+
"""Auto-mark live-API tests with ``live`` + ``flaky`` retry config.
54+
55+
Heuristic: any test that does NOT request the ``requests_mock``
56+
fixture is treated as live. This catches every test in
57+
``waterdata_test.py`` and similar modules without needing to
58+
decorate them individually, and it auto-tracks new tests written
59+
in the same style.
60+
"""
61+
for item in items:
62+
if "requests_mock" in getattr(item, "fixturenames", ()):
63+
continue
64+
item.add_marker(pytest.mark.live)
65+
item.add_marker(
66+
pytest.mark.flaky(
67+
reruns=_MAX_RERUNS,
68+
reruns_delay=_RERUN_DELAY_SEC,
69+
only_rerun=_TRANSIENT_PATTERNS,
70+
)
71+
)

0 commit comments

Comments
 (0)