
Commit 79a9017

thodson-usgs and claude committed
docs(waterdata): editorial pass on chunking.py docs
Readability + accuracy:
- Module docstring: 'ChunkedCall iterates the joint cartesian product so every sub-request URL fits' attributed the fit guarantee to ChunkedCall, but that's ChunkPlan's job — reworded so ChunkPlan keeps each URL under budget and ChunkedCall fetches the resulting product.
- Dropped two duplicated explanations: the sparse-completion [0,2,5] example (kept on the class docstring, trimmed from __init__) and the 'no semaphore' note (kept in _run's docstring, trimmed from its inline comment).

Verified the docs carry no stale references after the async-only refactor + renames: every :meth:/:func:/:class:/:attr: cross-ref resolves, the retry defaults (4 / 0.5s / 30s / 60s) match the constants, and the only 'semaphore' mentions are correct negations (pool throttles, not a semaphore).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c507d62 commit 79a9017

1 file changed

Lines changed: 6 additions & 9 deletions

dataretrieval/waterdata/chunking.py

@@ -4,9 +4,9 @@
 parameter (sites, parameter codes, …) plus the cql-text ``filter``,
 which splits along its top-level OR clauses. Any of them can fan the
 URL past the server's ~8 KB byte limit. ``ChunkPlan`` picks a fan-out
-for each axis that minimizes total sub-requests under the URL budget;
-``ChunkedCall`` iterates the joint cartesian product so every
-sub-request URL fits. Requests that already fit get a trivial
+for each axis that minimizes total sub-requests while keeping every
+sub-request URL under the budget; ``ChunkedCall`` fetches the resulting
+cartesian product of chunks. Requests that already fit get a trivial
 single-step plan — ``ChunkedCall`` has one code path either way.
 
 Concurrency: ``multi_value_chunked`` fans every pending sub-request out
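
The reworded docstring separates two jobs: choosing a fan-out per axis, and fetching the cartesian product of the resulting chunks. The sketch below illustrates only the second job, taking the fan-out as given. The names `split` and `sub_requests` and the sample axes are hypothetical and are not taken from `chunking.py`; how `ChunkPlan` actually sizes the fan-out against the URL budget is not reproduced here.

```python
from itertools import product


def split(values, parts):
    """Split ``values`` into ``parts`` contiguous, near-equal chunks."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first ``extra`` chunks take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sub_requests(axes, fan_out):
    """Yield one argument dict per cell of the cartesian product of chunks."""
    names = list(axes)
    per_axis = [split(axes[name], fan_out[name]) for name in names]
    for combo in product(*per_axis):
        yield dict(zip(names, combo))


# Hypothetical request: two multi-value axes, each split in two.
axes = {"sites": ["A", "B", "C", "D"], "parameter_codes": ["00060", "00065"]}
plan = {"sites": 2, "parameter_codes": 2}
requests = list(sub_requests(axes, plan))  # 2 x 2 = 4 sub-requests
```

A fan-out of 1 on every axis yields a single sub-request, which matches the "trivial single-step plan" case the docstring describes.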
@@ -1412,10 +1412,8 @@ def __init__(
         self.fetch = fetch
         self.retry_policy = retry_policy
         self.finalize = finalize
-        # Completed (frame, response) pairs keyed by sub-args index.
-        # Sparse so the gather can record scattered completions (e.g.
-        # indices [0, 2, 5] when 1/3/4 failed) and a subsequent
-        # ``resume()`` only re-issues the missing indices.
+        # Completed (frame, response) pairs keyed by sub-args index; sparse
+        # (gathered sub-requests complete out of order — see class docstring).
         self._chunks: dict[int, tuple[pd.DataFrame, httpx.Response]] = {}
 
     def record(self, index: int, pair: tuple[pd.DataFrame, httpx.Response]) -> None:
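
The comment trimmed in this hunk described why the completion map is sparse: some indices finish while others fail, and a resume should re-issue only the missing ones. A minimal sketch of that bookkeeping follows, assuming a plain dict keyed by sub-request index. `missing_indices` is a hypothetical helper, and the string values stand in for the real `(frame, response)` pairs.

```python
def missing_indices(completed, total):
    """Return the sub-request indices that have no recorded result yet."""
    return [i for i in range(total) if i not in completed]


# Stand-ins for (frame, response) pairs; indices 1, 3 and 4 failed.
completed = {0: "pair-0", 2: "pair-2", 5: "pair-5"}
to_retry = missing_indices(completed, total=6)

# A later pass records only the retried indices; finished work is untouched.
for index in to_retry:
    completed[index] = f"pair-{index}"
```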
@@ -1669,8 +1667,7 @@ async def _run(self, max_concurrent: int | None) -> tuple[pd.DataFrame, Any]:
         # ``httpx.Limits()`` defaults to ``max_connections=100`` — at higher
         # concurrency the pool would silently bottleneck the fan-out behind
         # that cap. Set it to the resolved concurrency so the pool *is* the
-        # throttle (``None`` for truly unbounded). No semaphore: we gather
-        # every pending sub-request and let the pool serialize.
+        # throttle (``None`` for truly unbounded).
         limits = httpx.Limits(
             max_connections=max_concurrent, max_keepalive_connections=max_concurrent
         )
