Skip to content

Commit 5b4c1f0

Browse files
thodson-usgsclaude
andauthored
test: unify a shared flaky-rerun marker; fix nwis collection-time call (#329)
The Ubuntu CI on main (run 27969553989, on the #326 merge) failed: a transient 503 from the live legacy NWIS site service hit the live tests in nwis_test.py. Windows passed by timing. Not a regression from #326 — the live tests and the offending call date to #62. Two things made the transient fatal: - nwis_test.py had no flaky-rerun marker (waterdata/ngwmn got one in #325), and - TestTZ fetched its sites in the class body (`sites, _ = what_sites(...)`), i.e. at collection time, where a 503 aborts the whole module and can never be reran (rerunfailures retries failed tests, not collection errors). Changes: - Add one shared marker, `conftest.flaky_api`, with a unified transient pattern list, and apply it to every live suite: nwis_test.py / waterdata_test.py / ngwmn_test.py (module-level) and utils_test.py::Test_query (class-level — the only other live suite, found by auditing each module behind a dead proxy). Replaces three drifted inline copies. - The status pattern now matches both the OGC path's "<status>:" and the legacy query path's "HTTP <status>" message shapes, so one list serves all four. This closes a latent gap: ngwmn's old marker omitted ServiceUnavailable and the chunked QuotaExhausted/ServiceInterrupted wrappers (#325 fixed waterdata but missed ngwmn), so a 503 in an ngwmn live test would have failed CI too. - Move TestTZ's class-body fetch into a class-scoped `sites` fixture so it runs at test time, where the marker can retry a transient. Everything else is mocked, module-skipped (nadp), or — the chunking ..._on_real_transport test — a deterministic local http.server, not external. Verified: nwis collection is network-free; the markers attach with the unified config; ngwmn now covers ServiceUnavailable/ServiceInterrupted; live Test_query and TestTZ pass. Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent a7d7f55 commit 5b4c1f0

5 files changed

Lines changed: 53 additions & 40 deletions

File tree

tests/conftest.py

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,30 @@
1414

1515
import pytest
1616

17+
#: Trace patterns that ``pytest-rerunfailures`` retries on the live-API test
18+
#: modules: a transient upstream 429/5xx or dropped connection is retried,
19+
#: deterministic failures (assertion errors, 4xx, etc.) are not. The OGC engine
20+
#: renders a status error as ``"<status>: ..."`` while the legacy ``query`` path
21+
#: renders ``"HTTP <status> ..."``, so the status pattern allows either shape;
22+
#: the chunked fan-out wraps a transient sub-request as ``QuotaExhausted`` /
23+
#: ``ServiceInterrupted``.
24+
_TRANSIENT_RERUN_PATTERNS = [
25+
r"(?:RateLimited|ServiceUnavailable|RuntimeError):\s*(?:HTTP\s+)?(?:429|5\d\d)",
26+
r"(?:QuotaExhausted|ServiceInterrupted):",
27+
r"Connect(ion)?Error", # requests' ConnectionError + httpx' ConnectError
28+
r"ReadTimeout|ConnectTimeout|Timeout",
29+
]
30+
31+
#: Apply to a test module (``pytestmark = flaky_api``) or class (``@flaky_api``)
32+
#: that hits live USGS services, so a transient upstream failure is retried
33+
#: instead of failing CI. Mocked tests are unaffected — the patterns match only
34+
#: real round-trip error traces.
35+
flaky_api = pytest.mark.flaky(
36+
reruns=2,
37+
reruns_delay=5,
38+
only_rerun=_TRANSIENT_RERUN_PATTERNS,
39+
)
40+
1741

1842
def pytest_collection_modifyitems(config, items):
1943
"""Apply relaxed ``pytest-httpx`` strict-mode settings to every test

tests/ngwmn_test.py

Lines changed: 4 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -16,16 +16,11 @@
1616

1717
from dataretrieval import ngwmn
1818
from dataretrieval.utils import BaseMetadata
19+
from tests.conftest import flaky_api
1920

20-
pytestmark = pytest.mark.flaky(
21-
reruns=2,
22-
reruns_delay=5,
23-
only_rerun=[
24-
r"(?:RateLimited|RuntimeError):\s*(?:429|5\d\d):",
25-
r"Connect(ion)?Error",
26-
r"ReadTimeout|ConnectTimeout|Timeout",
27-
],
28-
)
21+
# These hit the live NGWMN OGC API, so retry transient upstream failures
22+
# rather than flaking CI (see ``conftest.flaky_api``).
23+
pytestmark = flaky_api
2924

3025
# A site with water-level, construction, and lithology records (per the R
3126
# dataRetrieval NGWMN examples), plus a non-USGS-agency id to exercise the

tests/nwis_test.py

Lines changed: 16 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -24,13 +24,18 @@
2424
preformat_peaks_response,
2525
what_sites,
2626
)
27+
from tests.conftest import flaky_api
2728

2829
START_DATE = "2018-01-24"
2930
END_DATE = "2018-01-25"
3031

3132
DATETIME_COL = "datetime"
3233
SITENO_COL = "site_no"
3334

35+
# Several tests in this module hit the live NWIS services, so retry a transient
36+
# upstream failure rather than failing CI (see ``conftest.flaky_api``).
37+
pytestmark = flaky_api
38+
3439

3540
def _load_mock_json(file_name):
3641
"""Helper to load mock JSON from tests/data."""
@@ -254,21 +259,27 @@ def test_get_record_defunct_service_water_use(self):
254259
class TestTZ:
255260
"""Tests relating to GitHub Issue #60."""
256261

257-
sites, _ = what_sites(stateCd="MD")
262+
@pytest.fixture(scope="class")
263+
def sites(self):
264+
# Fetch once per class, at test time (not at collection) so a transient
265+
# upstream failure is retried by the module ``flaky`` marker instead of
266+
# aborting collection — a class-body call cannot be reran.
267+
sites, _ = what_sites(stateCd="MD")
268+
return sites
258269

259-
def test_multiple_tz_01(self):
270+
def test_multiple_tz_01(self, sites):
260271
"""Test based on GitHub Issue #60 - error merging different time zones."""
261272
# this test fails before issue #60 is fixed
262-
iv, _ = get_iv(sites=self.sites.site_no.values[:25].tolist())
273+
iv, _ = get_iv(sites=sites.site_no.values[:25].tolist())
263274
# assert that the datetime column exists
264275
assert "datetime" in iv.index.names
265276
# assert that it is a datetime type
266277
assert isinstance(iv.index[0][1], datetime.datetime)
267278

268-
def test_multiple_tz_02(self):
279+
def test_multiple_tz_02(self, sites):
269280
"""Test based on GitHub Issue #60 - confirm behavior for same tz."""
270281
# this test passes before issue #60 is fixed
271-
iv, _ = get_iv(sites=self.sites.site_no.values[:20].tolist())
282+
iv, _ = get_iv(sites=sites.site_no.values[:20].tolist())
272283
# assert that the datetime column exists
273284
assert "datetime" in iv.index.names
274285
# assert that it is a datetime type

tests/utils_test.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,8 +6,12 @@
66
import pytest
77

88
from dataretrieval import exceptions, nwis, utils
9+
from tests.conftest import flaky_api
910

1011

12+
# Test_query hits the live NWIS services (the rest of this module is mocked),
13+
# so retry transient upstream failures rather than flaking CI.
14+
@flaky_api
1115
class Test_query:
1216
"""Tests of the query function."""
1317

tests/waterdata_test.py

Lines changed: 5 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -40,33 +40,12 @@
4040
_get_args,
4141
_normalize_str_iterable,
4242
)
43+
from tests.conftest import flaky_api
4344

44-
# Most tests in this module call the live USGS Water Data API. After
45-
# PR #273, transient upstream errors (5xx / 429 / connection drops)
46-
# propagate instead of silently truncating, which makes CI susceptible
47-
# to flaking on a brief upstream blip. Auto-retry such failures, but
48-
# only for the narrow set of transient-error trace patterns below —
49-
# library bugs raising other exception types still fail on the first
50-
# try. The marker is attached to every test in the module, but the
51-
# patterns match only traces produced by real network round-trips
52-
# (``_raise_for_non_200`` output, ``requests`` exceptions), so tests
53-
# using ``httpx_mock`` or ``mock.patch`` are no-ops for the rerun.
54-
pytestmark = pytest.mark.flaky(
55-
reruns=2,
56-
reruns_delay=5,
57-
only_rerun=[
58-
# Transient HTTP errors (429 / 5xx) on the direct path: RateLimited /
59-
# ServiceUnavailable carry a "<status>: ..." message (the RuntimeError
60-
# shape is kept for any legacy call site).
61-
r"(?:RateLimited|ServiceUnavailable|RuntimeError):\s*(?:429|5\d\d):",
62-
# The chunked fan-out wraps a transient sub-request as a ChunkInterrupted
63-
# subclass (QuotaExhausted for 429, ServiceInterrupted for 5xx), whose
64-
# message has no leading status token.
65-
r"(?:QuotaExhausted|ServiceInterrupted):",
66-
r"Connect(ion)?Error", # requests' ConnectionError + httpx' ConnectError
67-
r"ReadTimeout|ConnectTimeout|Timeout",
68-
],
69-
)
45+
# Most tests in this module call the live USGS Water Data API; transient
46+
# upstream errors propagate (since #273) instead of silently truncating, so
47+
# retry them rather than flaking CI (see ``conftest.flaky_api``).
48+
pytestmark = flaky_api
7049

7150

7251
@pytest.fixture(autouse=True)

0 commit comments

Comments
 (0)