
Commit 84d1d33

Merge remote-tracking branch 'upstream/main' into multivalue-chunker
# Conflicts:
#   NEWS.md
#   dataretrieval/waterdata/utils.py
#   tests/waterdata_test.py
2 parents 0768245 + 36866a0 commit 84d1d33

8 files changed

Lines changed: 240 additions & 23 deletions

.github/workflows/python-package.yml

Lines changed: 4 additions & 4 deletions
@@ -13,9 +13,9 @@ jobs:
   lint:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
       - name: Set up Python 3.14
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: "3.14"
           cache: "pip"
@@ -36,9 +36,9 @@ jobs:
         python-version: ["3.9", "3.13", "3.14"]

     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: ${{ matrix.python-version }}
           cache: "pip"

.github/workflows/python-publish.yml

Lines changed: 2 additions & 2 deletions
@@ -21,9 +21,9 @@ jobs:
     runs-on: ubuntu-latest

     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: '3.x'
           cache: 'pip'

.github/workflows/sphinx-docs.yml

Lines changed: 2 additions & 2 deletions
@@ -11,11 +11,11 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
         with:
           persist-credentials: false
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: "3.13"
           cache: "pip"

NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 **05/15/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit. A common chained-query pattern — pull a long site list from `get_monitoring_locations`, then feed it into `get_daily` — previously failed with HTTP 414 once the resulting URL grew past the limit; it now fans out across multiple sub-requests under the hood and returns one combined DataFrame. The chunker coordinates with the existing CQL `filter` chunker (long top-level-`OR` filters still split correctly when used alongside long multi-value lists), caps cartesian-product plans at 1000 sub-requests (the default USGS hourly quota), and aborts mid-call with a structured `QuotaExhausted` exception — carrying the partial result and a resume offset — if `x-ratelimit-remaining` drops below a safety floor. Mirrors R `dataRetrieval`'s [#870](https://github.com/DOI-USGS/dataRetrieval/pull/870), generalized to N dimensions.

+**05/14/2026:** Fixed two latent bugs in the paginated `waterdata` request loop (`_walk_pages` and `get_stats_data`). Previously, when `requests.Session.request(...)` itself raised mid-pagination (network error, timeout), the except block called `_error_body()` on the *prior page's* response, so the logged "error" described the wrong request and could itself crash on non-JSON bodies. Separately, no status-code check was performed on subsequent paginated responses, so a 5xx body that didn't include `numberReturned` was silently treated as an empty page — pagination quietly stopped and the user got truncated data with no error logged. The loop now status-checks each page like the initial request and reports the actual exception. The "best-effort" behavior (return whatever pages were collected) is preserved.
+
 **05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.

 **05/07/2026:** `waterdata.get_samples()` and `wqp.get_results()` now append a derived `<prefix>DateTime` UTC column for every Date/Time/TimeZone triplet in the response (e.g. `Activity_StartDate` + `Activity_StartTime` + `Activity_StartTimeZone` → `Activity_StartDateTime`). Both the WQX3 (`<X>Date`/`<X>Time`/`<X>TimeZone`) and legacy WQP (`<X>Date`/`<X>Time/Time`/`<X>Time/TimeZoneCode`) shapes are recognized; abbreviations like EST/EDT/CST/PST resolve to a UTC `Timestamp`, unknown codes resolve to `NaT`, and the original triplet columns are preserved. Returned rows are also now sorted by `Activity_StartDateTime` (or the legacy `ActivityStartDateTime`) — the underlying APIs return rows in an unstable order. Mirrors R's `create_dateTime` and end-of-pipeline sort. Closes #266.
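The 05/15/2026 entry describes splitting a long multi-value list so that no single request URL exceeds the server's byte limit. The sketch below illustrates that idea in one dimension only. It is not the library's chunker: the function name `chunk_by_url_length`, the 8000-byte default, and the comma-joined query encoding are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def chunk_by_url_length(base_url, param, values, max_bytes=8000):
    """Greedily group values so each request URL stays within max_bytes.

    Hypothetical helper: the real chunker's name, limit, and encoding
    are not shown in this commit.
    """
    chunks, current = [], []
    for value in values:
        candidate = current + [value]
        query = urlencode({param: ",".join(candidate)})
        url = base_url + "?" + query
        if current and len(url.encode("utf-8")) > max_bytes:
            # Adding this value would overflow: close the current chunk.
            chunks.append(current)
            current = [value]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Placeholder site IDs and URL, for illustration only.
sites = [f"USGS-{i:08d}" for i in range(2000)]
chunks = chunk_by_url_length(
    "https://example.invalid/items", "monitoring_location_id", sites
)
print(len(chunks), "sub-requests for", len(sites), "sites")
```

Each chunk would become one sub-request, and the per-chunk results would then be concatenated into a single DataFrame. The N-dimensional planning, the sub-request cap, and the quota handling mentioned in the entry are out of scope for this sketch.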

dataretrieval/waterdata/utils.py

Lines changed: 27 additions & 14 deletions
@@ -410,6 +410,18 @@ def _error_body(resp: requests.Response):
     )


+def _raise_for_non_200(resp: requests.Response) -> None:
+    """Raise ``RuntimeError(_error_body(resp))`` if ``resp`` is not 200.
+
+    Routes through ``_error_body`` (USGS-API-aware: handles 429/403
+    specially, extracts ``code``/``description`` from JSON error bodies)
+    rather than ``Response.raise_for_status``, which raises
+    ``HTTPError`` with a generic message.
+    """
+    if resp.status_code != 200:
+        raise RuntimeError(_error_body(resp))
+
+
 def _construct_api_requests(
     service: str,
     properties: list[str] | None = None,
@@ -464,12 +476,12 @@ def _construct_api_requests(

     if service in _CQL2_REQUIRED_SERVICES:
         # POST with CQL2 JSON: multi-value params go in the request body.
+        # The date-range loop above has already collapsed any _DATE_RANGE_PARAMS
+        # value to a string, so the list/tuple check below cannot match them.
         post_params = {
             k: v
             for k, v in kwargs.items()
-            if k not in _DATE_RANGE_PARAMS
-            and isinstance(v, (list, tuple))
-            and len(v) > 1
+            if isinstance(v, (list, tuple)) and len(v) > 1
         }
         params = {k: v for k, v in kwargs.items() if k not in post_params}
     else:
@@ -652,8 +664,7 @@ def _walk_pages(
     client = client or requests.Session()
     try:
         resp = client.send(req)
-        if resp.status_code != 200:
-            raise RuntimeError(_error_body(resp))
+        _raise_for_non_200(resp)

         # Store the initial response for metadata
         initial_response = resp
@@ -675,11 +686,11 @@ def _walk_pages(
                     headers=headers,
                     data=content if method == "POST" else None,
                 )
+                _raise_for_non_200(resp)
                 dfs.append(_get_resp_data(resp, geopd=geopd))
                 curr_url = _next_req_url(resp)
-            except Exception:  # noqa: BLE001
-                error_text = _error_body(resp)
-                logger.error("Request incomplete. %s", error_text)
+            except Exception as e:  # noqa: BLE001
+                logger.error("Request incomplete: %s", e)
                 logger.warning(
                     "Request failed for URL: %s. Data download interrupted.", curr_url
                 )
@@ -1115,8 +1126,7 @@ def get_stats_data(

     try:
         resp = client.send(req)
-        if resp.status_code != 200:
-            raise RuntimeError(_error_body(resp))
+        _raise_for_non_200(resp)

         # Store the initial response for metadata
         initial_response = resp
@@ -1142,14 +1152,17 @@ def get_stats_data(
                     params=args,
                     headers=headers,
                 )
+                _raise_for_non_200(resp)
                 body = resp.json()
                 all_dfs.append(_handle_stats_nesting(body, geopd=GEOPANDAS))
                 next_token = body["next"]
-            except Exception:  # noqa: BLE001
-                error_text = _error_body(resp)
-                logger.error("Request incomplete. %s", error_text)
+            except Exception as e:  # noqa: BLE001
+                logger.error("Request incomplete: %s", e)
                 logger.warning(
-                    "Request failed for URL: %s. Data download interrupted.", resp.url
+                    "Request failed for URL: %s (next_token=%s). "
+                    "Data download interrupted.",
+                    url,
+                    next_token,
                 )
                 next_token = None
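The pattern these hunks converge on is: status-check every page, log the exception that actually occurred, and return whatever pages were already collected. The sketch below shows that pattern in isolation. `Page`, `collect_pages`, and the `fetch` callable are hypothetical stand-ins, not names from `dataretrieval`, and the error message format is an assumption.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class Page:
    """Hypothetical stand-in for one paginated HTTP response."""

    status_code: int
    rows: list = field(default_factory=list)
    next_url: Optional[str] = None


def collect_pages(fetch: Callable[[str], Page], first_url: str) -> list:
    """Walk pages best-effort: stop at the first failure, keep prior rows."""
    rows, url = [], first_url
    while url is not None:
        try:
            page = fetch(url)
            # Check every page, not just the first, so a 5xx body is
            # never mistaken for an empty final page.
            if page.status_code != 200:
                raise RuntimeError(f"{page.status_code}: unexpected status")
            rows.extend(page.rows)
            url = page.next_url
        except Exception as exc:  # best-effort: log and stop paginating
            # Log the exception itself rather than re-reading an earlier
            # response, which may describe a different request.
            logger.error("Request incomplete: %s", exc)
            logger.warning(
                "Request failed for URL: %s. Data download interrupted.", url
            )
            url = None
    return rows


pages = {"p1": Page(200, [1, 2], "p2"), "p2": Page(503)}
print(collect_pages(pages.__getitem__, "p1"))
```

With the second page returning 503, the call logs the failure and returns only the first page's rows, which corresponds to the best-effort behavior the NEWS entry says is preserved.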

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ dataretrieval = ["py.typed"]
 test = [
     "pytest > 5.0.0",
     "pytest-cov[all]",
+    "pytest-rerunfailures",
     "coverage",
     "requests-mock",
     "ruff",

tests/waterdata_test.py

Lines changed: 33 additions & 1 deletion
@@ -1,4 +1,5 @@
 import datetime
+import json
 import sys
 from unittest import mock

@@ -45,6 +46,26 @@
     _normalize_str_iterable,
 )

+# Most tests in this module call the live USGS Water Data API. After
+# PR #273, transient upstream errors (5xx / 429 / connection drops)
+# propagate instead of silently truncating, which makes CI susceptible
+# to flaking on a brief upstream blip. Auto-retry such failures, but
+# only for the narrow set of transient-error trace patterns below —
+# library bugs raising other exception types still fail on the first
+# try. The marker is attached to every test in the module, but the
+# patterns match only traces produced by real network round-trips
+# (``_raise_for_non_200`` output, ``requests`` exceptions), so tests
+# using ``requests_mock`` or ``mock.patch`` are no-ops for the rerun.
+pytestmark = pytest.mark.flaky(
+    reruns=2,
+    reruns_delay=5,
+    only_rerun=[
+        r"RuntimeError:\s*(?:429|5\d\d):",  # _raise_for_non_200 output
+        r"ConnectionError",
+        r"ReadTimeout|ConnectTimeout|Timeout",
+    ],
+)
+

 def mock_request(requests_mock, request_url, file_path):
     """Mock request code"""
@@ -142,7 +163,18 @@ def test_construct_api_requests_monitoring_locations_post():
         hydrologic_unit_code=["010802050102", "010802050103"],
     )
     assert req.method == "POST"
-    assert req.body is not None
+    assert req.headers["Content-Type"] == "application/query-cql-json"
+
+    body = json.loads(req.body)
+    # Top-level shape: AND over a list of per-param predicates.
+    assert body["op"] == "and"
+    assert isinstance(body["args"], list) and len(body["args"]) == 1
+
+    # The single predicate is an IN over hydrologic_unit_code with both values.
+    predicate = body["args"][0]
+    assert predicate["op"] == "in"
+    assert predicate["args"][0] == {"property": "hydrologic_unit_code"}
+    assert predicate["args"][1] == ["010802050102", "010802050103"]


 def test_construct_api_requests_single_value_stays_get():
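To see which failures the `only_rerun` patterns in this diff would select, they can be exercised directly with `re.search`. This sketch assumes the rerun plugin applies each pattern as a regex search over the failure text, and that `_raise_for_non_200` messages begin with the status code followed by a colon. The second assumption is inferred from the first pattern, not confirmed by the diff, and the sample messages are invented.

```python
import re

# Patterns copied from the pytestmark in the diff.
ONLY_RERUN = [
    r"RuntimeError:\s*(?:429|5\d\d):",
    r"ConnectionError",
    r"ReadTimeout|ConnectTimeout|Timeout",
]


def is_transient(failure_text: str) -> bool:
    """Return True if any rerun pattern is found in the failure text."""
    return any(re.search(pattern, failure_text) for pattern in ONLY_RERUN)


# Invented sample failure lines, for illustration only.
samples = [
    "RuntimeError: 503: Service Unavailable",
    "RuntimeError: 429: Too many requests",
    "RuntimeError: 404: Not Found",
    "requests.exceptions.ReadTimeout: read timed out",
    "KeyError: 'next'",
]
for text in samples:
    print(is_transient(text), text)
```

Under these assumptions, 429 and 5xx errors and the `requests` timeout and connection exceptions would be retried, while a 404 or a `KeyError` would fail on the first attempt. In the third pattern, the bare `Timeout` alternative already matches `ReadTimeout` and `ConnectTimeout`, so the first two alternatives are redundant but harmless.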
