Commit 7aeacd5

thodson-usgs and claude committed
Return latest paginated/chunked response so QuotaExhausted floor sees current quota
The ``multi_value_chunked`` decorator reads ``x-ratelimit-remaining`` from the response returned by ``fetch_once(sub_args)`` to honor its documented ``QuotaExhausted`` safety floor. That response was two layers stale:

1. ``_walk_pages`` captured ``initial_response = resp`` before pagination and returned it, so any sub-request with N > 1 pages bubbled up only the first page's headers. The loop already kept overwriting ``resp`` each iteration; we just weren't returning the latest.
2. ``_combine_chunk_responses`` returned ``responses[0]`` with summed ``elapsed``, so when ``filters.chunked`` fanned out a long OR-filter into N sub-chunks the outer wrapper only saw the first sub-chunk's headers.

Composed, the staleness gap per outer chunk was ``inner_chunks × pages_per_inner_chunk − 1`` HTTP requests of quota consumption the chunker was blind to. For the canonical workload (chained query, long site list, paginated filter) that gap easily exceeds the default floor of 50, so the guard never tripped: users hit ``RuntimeError("429: Too many requests...")`` from ``_raise_for_non_200`` instead of the structured ``QuotaExhausted`` with ``partial_frame``/``completed_chunks`` they were promised.

Fix both layers: ``_walk_pages`` returns the latest ``resp`` (which the loop was already maintaining), and ``_combine_chunk_responses`` returns ``responses[-1]`` (with ``elapsed`` summed onto it instead of onto ``responses[0]``). Both changes match ``QuotaExhausted.partial_response``'s docstring ("metadata for the last successful sub-request").

Same fix applied to the parallel pagination loop in the stats helper for consistency.

No behavior change for single-page mocked tests (initial == latest). 209 waterdata unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
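The staleness gap described in the commit message can be illustrated with a small simulation. This is a minimal sketch, not code from the library: ``simulate_outer_chunk``, its parameters, the starting quota of 100, and the assumption that each HTTP request lowers the quota by exactly one are all hypothetical.

```python
def simulate_outer_chunk(start_remaining, inner_chunks, pages_per_inner_chunk):
    """Simulate one outer chunk and report what each strategy would observe.

    Hypothetical model: every HTTP request lowers the server-side quota by
    one, and each response reports the quota remaining after that request.
    """
    remaining = start_remaining
    observed = []  # quota value carried by each response, in request order
    for _ in range(inner_chunks):
        for _ in range(pages_per_inner_chunk):
            remaining -= 1
            observed.append(remaining)
    first_seen = observed[0]    # old behavior: first page of first sub-chunk
    latest_seen = observed[-1]  # fixed behavior: last page of last sub-chunk
    return first_seen, latest_seen


first_seen, latest_seen = simulate_outer_chunk(
    start_remaining=100, inner_chunks=6, pages_per_inner_chunk=10
)
floor = 50  # default floor quoted in the commit message

print(first_seen, latest_seen)    # 99 40
print(first_seen - latest_seen)   # 59, i.e. 6 * 10 - 1 unseen requests
print(first_seen < floor)         # False: a guard fed the first response stays silent
print(latest_seen < floor)        # True: a guard fed the latest response can trip
```

The exact comparison the real guard uses against the floor is not shown in this commit; the ``<`` above is only an assumption for illustration.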
1 parent a9cf2d7 commit 7aeacd5

2 files changed

Lines changed: 16 additions & 14 deletions

dataretrieval/waterdata/filters.py

Lines changed: 10 additions & 5 deletions
@@ -268,15 +268,20 @@ def _combine_chunk_frames(frames: list[pd.DataFrame]) -> pd.DataFrame:
 def _combine_chunk_responses(
     responses: list[requests.Response],
 ) -> requests.Response:
-    """Return one response: first chunk's URL/headers + summed ``elapsed``.
+    """Return one response: last chunk's URL/headers + summed ``elapsed``.
 
-    Mutates the first response in place (only ``elapsed``); downstream only
+    Returning the latest sub-response (rather than the first) preserves
+    current rate-limit headers (e.g. ``x-ratelimit-remaining``), which the
+    outer ``multi_value_chunked`` decorator inspects to honor its
+    ``QuotaExhausted`` safety floor between sub-requests.
+
+    Mutates the last response in place (only ``elapsed``); downstream only
     reads ``elapsed`` (in ``BaseMetadata.query_time``), URL, and headers.
     """
-    head = responses[0]
+    tail = responses[-1]
     if len(responses) > 1:
-        head.elapsed = sum((r.elapsed for r in responses[1:]), start=head.elapsed)
-    return head
+        tail.elapsed = sum((r.elapsed for r in responses[:-1]), start=tail.elapsed)
+    return tail
 
 
 _FetchOnce = TypeVar(
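
The effect of the ``filters.py`` change can be checked in isolation. The sketch below copies the new function body from the diff under the hypothetical name ``combine_chunk_responses`` and feeds it ``SimpleNamespace`` stand-ins rather than real ``requests.Response`` objects; only ``elapsed`` and ``headers`` are modeled, and the header values are made up.

```python
from datetime import timedelta
from types import SimpleNamespace


def combine_chunk_responses(responses):
    """Return the last response with all ``elapsed`` values summed onto it."""
    tail = responses[-1]
    if len(responses) > 1:
        # Start from the tail's own elapsed time, then add every earlier one.
        tail.elapsed = sum((r.elapsed for r in responses[:-1]), start=tail.elapsed)
    return tail


# Stand-ins for three sub-chunk responses, oldest first (values are invented).
responses = [
    SimpleNamespace(elapsed=timedelta(seconds=1), headers={"x-ratelimit-remaining": "120"}),
    SimpleNamespace(elapsed=timedelta(seconds=2), headers={"x-ratelimit-remaining": "119"}),
    SimpleNamespace(elapsed=timedelta(seconds=3), headers={"x-ratelimit-remaining": "118"}),
]

combined = combine_chunk_responses(responses)
print(combined.headers["x-ratelimit-remaining"])  # 118
print(combined.elapsed.total_seconds())           # 6.0
```

The total elapsed time is unchanged from the old behavior; only the object that carries it, and therefore the headers a caller sees, differs.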

dataretrieval/waterdata/utils.py

Lines changed: 6 additions & 9 deletions
@@ -643,7 +643,10 @@ def _walk_pages(
     pd.DataFrame
         A DataFrame containing the aggregated results from all pages.
     requests.Response
-        The initial response object containing metadata about the first request.
+        The latest response from the pagination walk. Returning the most
+        recent response (not the first) lets downstream callers observe
+        current rate-limit headers (e.g. ``x-ratelimit-remaining``) on
+        which the multi-value chunker's ``QuotaExhausted`` guard relies.
 
     Raises
     ------
@@ -666,9 +669,6 @@ def _walk_pages(
         resp = client.send(req)
         _raise_for_non_200(resp)
 
-        # Store the initial response for metadata
-        initial_response = resp
-
         # Grab some aspects of the original request: headers and the
         # request type (GET or POST)
         method = req.method.upper()
@@ -697,7 +697,7 @@ def _walk_pages(
                 curr_url = None
 
         # Concatenate all pages at once for efficiency
-        return pd.concat(dfs, ignore_index=True), initial_response
+        return pd.concat(dfs, ignore_index=True), resp
     finally:
         if close_client:
             client.close()
@@ -1128,9 +1128,6 @@ def get_stats_data(
         resp = client.send(req)
         _raise_for_non_200(resp)
 
-        # Store the initial response for metadata
-        initial_response = resp
-
         # Grab some aspects of the original request: headers and the
         # request type (GET or POST)
         method = req.method.upper()
@@ -1173,7 +1170,7 @@ def get_stats_data(
         if expand_percentiles:
             dfs = _expand_percentiles(dfs)
 
-        return dfs, BaseMetadata(initial_response)
+        return dfs, BaseMetadata(resp)
     finally:
         if close_client:
             client.close()
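
Both pagination loops touched in ``utils.py`` rely on the same pattern: ``resp`` is rebound on every page, so returning it yields the most recent headers. The following is a simplified sketch of that pattern only, not the library's ``_walk_pages``; ``walk_pages``, the ``send`` callable, the ``rows``/``next_url`` attributes, and the page data are hypothetical stand-ins.

```python
from types import SimpleNamespace


def walk_pages(first_url, send):
    """Follow next-page links; return (all rows, latest response)."""
    rows = []
    resp = None
    curr_url = first_url
    while curr_url:
        resp = send(curr_url)      # rebound on every iteration
        rows.extend(resp.rows)
        curr_url = resp.next_url   # None on the final page ends the loop
    return rows, resp


# Hypothetical three-page result set keyed by URL (all values invented).
PAGES = {
    "page-1": SimpleNamespace(
        rows=[1, 2], next_url="page-2", headers={"x-ratelimit-remaining": "52"}
    ),
    "page-2": SimpleNamespace(
        rows=[3, 4], next_url="page-3", headers={"x-ratelimit-remaining": "51"}
    ),
    "page-3": SimpleNamespace(
        rows=[5], next_url=None, headers={"x-ratelimit-remaining": "50"}
    ),
}

rows, latest = walk_pages("page-1", PAGES.__getitem__)
print(rows)                                     # [1, 2, 3, 4, 5]
print(latest.headers["x-ratelimit-remaining"])  # 50
```

With a single page the first and latest responses are the same object, which is consistent with the commit's note that single-page mocked tests see no behavior change.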
