Skip to content

Commit 88dd3f6

Browse files
thodson-usgsclaude
andcommitted
Tighten live-test selection: source-inspection on public getters
The "auto-mark every test that doesn't take requests_mock" heuristic was overkill — out of 244 collected tests it marked 200, including every pure unit test of internal helpers (rdb parsing, _to_str, CQL filter splitting, NWIS_Metadata setters, validation logic, etc). Marker over-application is cheap on the success path but conveys the wrong intent and is one more thing to keep in sync. Refined heuristic: a test is live iff (a) it does not take requests_mock as a fixture, AND (b) its function-body source references one of the package's public user-facing getters (the _PUBLIC_GETTERS set — all `get_*` / `what_*` entry points exposed by waterdata, wqp, samples, nadp, streamstats, nldi, nwis, and ngwmn modules). The set is maintained explicitly because there's no programmatic way for the conftest to know "what counts as a user-facing entry point"; adding a new public getter means adding it here. A regex with `\b` word boundaries avoids matching internal helpers that happen to share a prefix (e.g. `_get_args` won't match `get_args`). Dry-run against the current test inventory: - Old heuristic marked: 200 / 244 - New heuristic marked: 89 / 244 - Now-unmarked unit tests: 111 Spot-checked the 111 de-marked tests — all are genuinely offline: rdb_test.py parsers, utils_test.py to_str/BaseMetadata/datetime helpers, waterdata_filters_test.py CQL-filter logic, nldi_test.py validators, nwis_test.py metadata setters, etc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 8906299 commit 88dd3f6

1 file changed

Lines changed: 93 additions & 11 deletions

File tree

tests/conftest.py

Lines changed: 93 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,21 +1,97 @@
11
"""Auto-retry live-API tests on transient upstream errors.
22
33
After PR #273, paginated ``waterdata`` getters propagate mid-walk HTTP
4-
errors (429 / 5xx / connection drops) instead of silently truncating the
5-
result. That's the correct behavior for users but makes any CI test that
6-
hits the live USGS Water Data API susceptible to flaking on a transient
7-
upstream blip — e.g. the HTTP 502 Bad Gateway that broke CI on PR #273's
8-
merge to main.
9-
10-
Heuristic: any test that does NOT request the ``requests_mock`` fixture
11-
is treated as live and gets retried on transient-error patterns only.
12-
Library bugs raising other exception types still fail on the first try.
4+
errors (429 / 5xx / connection drops) instead of silently truncating —
5+
the right behavior, but it makes any CI test that calls the live USGS
6+
Water Data API susceptible to flaking on a transient upstream blip
7+
(e.g. the HTTP 502 Bad Gateway that broke CI on PR #273's merge).
8+
9+
Selection: a test is "live" iff (a) it does not request the
10+
``requests_mock`` fixture, AND (b) its function body references one of
11+
the package's public user-facing getters (the ``_PUBLIC_GETTERS`` set).
12+
This skips pure unit tests of internal helpers (``_get_args``,
13+
``_check_*``, ``_normalize_*``, ``_construct_api_requests``) that
14+
share the test file with live tests.
15+
16+
Retry: live tests are retried up to twice on a 5-second backoff,
17+
but only when the failure trace matches a narrow transient-upstream
18+
pattern. Library bugs raising other exception types still fail on
19+
the first try.
1320
"""
1421

22+
import inspect
23+
import re
24+
1525
import pytest
1626

17-
# Anchored loosely — the failure trace embeds the exception message
18-
# after a long preamble.
27+
# Public, network-doing entry points. The set is intentionally exhaustive
28+
# so the source-inspection heuristic catches every test that exercises
29+
# the library's user-facing API surface. Add new public getters here.
30+
_PUBLIC_GETTERS = frozenset(
31+
{
32+
# waterdata
33+
"get_channel",
34+
"get_combined_metadata",
35+
"get_continuous",
36+
"get_daily",
37+
"get_field_measurements",
38+
"get_field_measurements_metadata",
39+
"get_latest_continuous",
40+
"get_latest_daily",
41+
"get_monitoring_locations",
42+
"get_nearest_continuous",
43+
"get_peaks",
44+
"get_ratings",
45+
"get_reference_table",
46+
"get_time_series_metadata",
47+
"get_stats_por",
48+
"get_stats_date_range",
49+
# wqp / samples
50+
"get_results",
51+
"get_samples",
52+
"get_samples_summary",
53+
"what_activities",
54+
"what_activity_metrics",
55+
"what_detection_limits",
56+
"what_habitat_metrics",
57+
"what_organizations",
58+
"what_project_weights",
59+
"what_projects",
60+
"what_sites",
61+
# nadp
62+
"get_annual_MDN_map",
63+
"get_annual_NTN_map",
64+
"get_zip",
65+
# streamstats / nldi
66+
"get_basin",
67+
"get_features",
68+
"get_features_by_data_source",
69+
"get_flowlines",
70+
"get_sample_watershed",
71+
"get_watershed",
72+
# nwis (legacy)
73+
"get_discharge_measurements",
74+
"get_discharge_peaks",
75+
"get_dv",
76+
"get_gwlevels",
77+
"get_info",
78+
"get_iv",
79+
"get_pmcodes",
80+
"get_qwdata",
81+
"get_record",
82+
"get_stats",
83+
"get_water_use",
84+
# ngwmn
85+
"get_data",
86+
"get_levels",
87+
"get_sites",
88+
}
89+
)
90+
91+
_PUBLIC_GETTER_RE = re.compile(
92+
r"\b(?:" + "|".join(re.escape(n) for n in _PUBLIC_GETTERS) + r")\b"
93+
)
94+
1995
_TRANSIENT_PATTERNS = [
2096
r"RuntimeError:\s*(?:429|5\d\d):", # _raise_for_non_200 output
2197
r"ConnectionError", # requests/urllib3
@@ -32,6 +108,12 @@ def pytest_collection_modifyitems(config, items):
32108
for item in items:
33109
if "requests_mock" in item.fixturenames:
34110
continue
111+
try:
112+
src = inspect.getsource(item.function)
113+
except (OSError, TypeError):
114+
continue
115+
if not _PUBLIC_GETTER_RE.search(src):
116+
continue
35117
item.add_marker(
36118
pytest.mark.flaky(
37119
reruns=_MAX_RERUNS,

0 commit comments

Comments
 (0)