Skip to content

Commit 585b2a6

Browse files
thodson-usgsclaude
andcommitted
Clarify chunker docstrings to match URL+body sizing
Copilot review surfaced that ``_plan_chunks`` and ``multi_value_chunked`` docstrings advertised the budget as URL length, but the implementation gates on ``_request_bytes`` (URL + body) so POST routes are sized correctly. The public ``url_limit`` parameter name is kept for API stability; only the prose is updated to honestly describe what's measured. Also tightened ``_worst_case_args`` wording from "URL probe" to "request-size probe" for consistency. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5c9ee68 commit 585b2a6

1 file changed

Lines changed: 22 additions & 9 deletions

File tree

dataretrieval/waterdata/chunking.py

Lines changed: 22 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -240,8 +240,9 @@ def _worst_case_args(
240240
each dim's largest chunk (by URL-encoded bytes), against the
241241
``probe_args`` already returned by ``_filter_aware_probe_args``
242242
(so any chunkable filter is at the inner chunker's bail-floor
243-
size). Driving the URL probe with this args dict tells the
244-
planner whether the *biggest* sub-request it would emit fits."""
243+
size). Driving the request-size probe with this args dict tells
244+
the planner whether the *biggest* sub-request it would emit fits
245+
the byte budget (URL + body, per ``_request_bytes``)."""
245246
out = dict(probe_args)
246247
for k, chunks in plan.items():
247248
out[k] = max(chunks, key=_chunk_bytes)
@@ -254,7 +255,12 @@ def _plan_chunks(
254255
url_limit: int,
255256
max_chunks: int | None = None,
256257
) -> dict[str, list[list[Any]]] | None:
257-
"""Greedy halving until the worst-case sub-request URL fits.
258+
"""Greedy halving until the worst-case sub-request fits ``url_limit``.
259+
260+
The budget is total request bytes (URL + body, via ``_request_bytes``)
261+
so POST routes are sized correctly — the public parameter name
262+
``url_limit`` is kept for API stability even though it caps URL +
263+
body, not URL alone.
258264
259265
Returns ``None`` when no chunking is needed (request as-is fits or
260266
no chunkable lists). Raises ``RequestTooLarge`` when:
@@ -341,12 +347,19 @@ def multi_value_chunked(
341347
quota_safety_floor: int | None = None,
342348
) -> Callable[[_FetchOnce], _FetchOnce]:
343349
"""Decorator that splits multi-value list params across sub-requests
344-
so each URL fits ``url_limit`` bytes (defaults to
345-
``filters._WATERDATA_URL_BYTE_LIMIT``) and the cartesian-product
346-
plan stays ≤ ``max_chunks`` sub-requests (defaults to
347-
``_DEFAULT_MAX_CHUNKS``). All defaults are resolved at call time so
348-
tests/users that patch the module constants affect this decorator
349-
uniformly.
350+
so each sub-request fits ``url_limit`` bytes (URL + body, per
351+
``_request_bytes``; defaults to ``filters._WATERDATA_URL_BYTE_LIMIT``)
352+
and the cartesian-product plan stays ≤ ``max_chunks`` sub-requests
353+
(defaults to ``_DEFAULT_MAX_CHUNKS``). All defaults are resolved at
354+
call time so tests/users that patch the module constants affect
355+
this decorator uniformly.
356+
357+
``url_limit`` caps total request bytes — the parameter name reflects
358+
the dominant GET case where body is empty, but POST routes (e.g.
359+
``monitoring-locations`` via CQL2 JSON) are sized against URL + body
360+
so they chunk correctly. Sizing against URL+body when the upstream
361+
limit is URL-only is conservative-safe (no false passes) but can
362+
over-chunk POSTs at the body's actual ceiling.
350363
351364
Between sub-requests the wrapper reads ``x-ratelimit-remaining`` from
352365
each response. If it drops below ``quota_safety_floor`` (default

0 commit comments

Comments
 (0)