Skip to content

Commit 0e10c50

Browse files
thodson-usgsclaude
andcommitted
Clarify chunker docstrings to match URL+body sizing
Copilot review surfaced that ``_plan_chunks`` and ``multi_value_chunked`` docstrings advertised the budget as URL length, but the implementation gates on ``_request_bytes`` (URL + body) so POST routes are sized correctly. The public ``url_limit`` parameter name is kept for API stability; only the prose is updated to honestly describe what's measured. Also tightened ``_worst_case_args`` wording from "URL probe" to "request-size probe" for consistency. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5c9ee68 commit 0e10c50

1 file changed

Lines changed: 17 additions & 7 deletions

File tree

dataretrieval/waterdata/chunking.py

Lines changed: 17 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -237,11 +237,11 @@ def _worst_case_args(
237237
probe_args: dict[str, Any], plan: dict[str, list[list[Any]]]
238238
) -> dict[str, Any]:
239239
"""Args representing the worst-case sub-request the plan will issue:
240-
each dim's largest chunk (by URL-encoded bytes), against the
241-
``probe_args`` already returned by ``_filter_aware_probe_args``
242-
(so any chunkable filter is at the inner chunker's bail-floor
243-
size). Driving the URL probe with this args dict tells the
244-
planner whether the *biggest* sub-request it would emit fits."""
240+
each dim's largest chunk (by URL-encoded bytes), composed onto
241+
the ``probe_args`` already returned by ``_filter_aware_probe_args``
242+
so any chunkable filter sits at the inner chunker's bail-floor
243+
size. The planner feeds these args through ``_request_bytes`` to
244+
decide whether the biggest sub-request fits the budget."""
245245
out = dict(probe_args)
246246
for k, chunks in plan.items():
247247
out[k] = max(chunks, key=_chunk_bytes)
@@ -254,7 +254,11 @@ def _plan_chunks(
254254
url_limit: int,
255255
max_chunks: int | None = None,
256256
) -> dict[str, list[list[Any]]] | None:
257-
"""Greedy halving until the worst-case sub-request URL fits.
257+
"""Greedy halving until the worst-case sub-request fits ``url_limit``.
258+
259+
Budget is total request bytes (URL + body, via ``_request_bytes``)
260+
so POST routes size correctly — see ``multi_value_chunked`` for the
261+
parameter-name caveat.
258262
259263
Returns ``None`` when no chunking is needed (request as-is fits or
260264
no chunkable lists). Raises ``RequestTooLarge`` when:
@@ -341,13 +345,19 @@ def multi_value_chunked(
341345
quota_safety_floor: int | None = None,
342346
) -> Callable[[_FetchOnce], _FetchOnce]:
343347
"""Decorator that splits multi-value list params across sub-requests
344-
so each URL fits ``url_limit`` bytes (defaults to
348+
so each sub-request fits ``url_limit`` bytes (defaults to
345349
``filters._WATERDATA_URL_BYTE_LIMIT``) and the cartesian-product
346350
plan stays ≤ ``max_chunks`` sub-requests (defaults to
347351
``_DEFAULT_MAX_CHUNKS``). All defaults are resolved at call time so
348352
tests/users that patch the module constants affect this decorator
349353
uniformly.
350354
355+
``url_limit`` is enforced against total request bytes (URL + body,
356+
via ``_request_bytes``); the name reflects the dominant GET case
357+
where body is empty. POST routes (e.g. ``monitoring-locations`` via
358+
CQL2 JSON) are conservatively sized — never under-chunks, but may
359+
over-chunk at the body's true ceiling.
360+
351361
Between sub-requests the wrapper reads ``x-ratelimit-remaining`` from
352362
each response. If it drops below ``quota_safety_floor`` (default
353363
``_DEFAULT_QUOTA_SAFETY_FLOOR``), the wrapper raises ``QuotaExhausted``

0 commit comments

Comments
 (0)