docs(waterdata): editorial pass on chunking.py docs

thodson-usgs · claude · thodson-usgs · commit 79a90172b68c · 2026-05-27T16:21:43.000-05:00
Readability + accuracy:
- Module docstring: 'ChunkedCall iterates the joint cartesian product so
  every sub-request URL fits' credited ChunkedCall with the fit guarantee,
  but that guarantee is ChunkPlan's job. Reworded so that ChunkPlan keeps
  each URL under budget and ChunkedCall fetches the resulting product.
- Dropped two duplicated explanations: the sparse-completion [0,2,5] example
  (kept on the class docstring, trimmed from __init__) and the 'no semaphore'
  note (kept in _run's docstring, trimmed from its inline comment).
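
For readers unfamiliar with the split of responsibilities described in the
first bullet, here is a minimal, hypothetical sketch. It is not the code in
chunking.py: the names plan, split, url_len and sub_requests, the example
base URL, and the brute-force search are all assumptions made for
illustration. The "plan" step picks a per-axis fan-out so every sub-request
URL fits the budget, and the "call" step would then simply fetch each element
of the resulting cartesian product.

```python
from itertools import product
from urllib.parse import urlencode


def split(values, n_chunks):
    """Split values into at most n_chunks contiguous, near-equal pieces."""
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def url_len(base, params):
    """Byte length of the URL for one sub-request (comma-joined values)."""
    query = urlencode({k: ",".join(v) for k, v in params.items()})
    return len(f"{base}?{query}".encode())


def sub_requests(axes, fanout):
    """Cartesian product of the per-axis chunks for a given fan-out."""
    names = list(axes)
    chunked = [split(axes[n], fanout[n]) for n in names]
    return [dict(zip(names, combo)) for combo in product(*chunked)]


def plan(base, axes, budget):
    """Brute-force the fan-out with the fewest sub-requests that all fit.

    Returns the list of sub-request parameter dicts, or None when even
    single-value chunks exceed the budget. A real planner would search
    more cleverly; this is only meant to show the guarantee.
    """
    names = list(axes)
    best = None
    for counts in product(*(range(1, len(axes[n]) + 1) for n in names)):
        subs = sub_requests(axes, dict(zip(names, counts)))
        if all(url_len(base, s) <= budget for s in subs):
            if best is None or len(subs) < len(best):
                best = subs
    return best


if __name__ == "__main__":
    # Hypothetical inputs; the budget is tiny so the split is visible.
    axes = {
        "sites": [f"USGS-{i:08d}" for i in range(6)],
        "pcodes": ["00060", "00065"],
    }
    for params in plan("https://example.invalid/items", axes, budget=120):
        print(params)
```

In this sketch a request that already fits comes back as a one-element plan,
so the caller needs only one code path, which mirrors the "trivial
single-step plan" wording in the module docstring.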

Verified the docs carry no stale references after the async-only refactor +
renames: every :meth:/:func:/:class:/:attr: cross-ref resolves, the retry
defaults (4 / 0.5s / 30s / 60s) match the constants, and the only
'semaphore' mentions are correct negations (pool throttles, not a semaphore).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/dataretrieval/waterdata/chunking.py b/dataretrieval/waterdata/chunking.py
@@ -4,9 +4,9 @@
 parameter (sites, parameter codes, …) plus the cql-text ``filter``,
 which splits along its top-level OR clauses. Any of them can fan the
 URL past the server's ~8 KB byte limit. ``ChunkPlan`` picks a fan-out
-for each axis that minimizes total sub-requests under the URL budget;
-``ChunkedCall`` iterates the joint cartesian product so every
-sub-request URL fits. Requests that already fit get a trivial
+for each axis that minimizes total sub-requests while keeping every
+sub-request URL under the budget; ``ChunkedCall`` fetches the resulting
+cartesian product of chunks. Requests that already fit get a trivial
 single-step plan — ``ChunkedCall`` has one code path either way.
 
 Concurrency: ``multi_value_chunked`` fans every pending sub-request out
@@ -1412,10 +1412,8 @@ def __init__(
         self.fetch = fetch
         self.retry_policy = retry_policy
         self.finalize = finalize
-        # Completed (frame, response) pairs keyed by sub-args index.
-        # Sparse so the gather can record scattered completions (e.g.
-        # indices [0, 2, 5] when 1/3/4 failed) and a subsequent
-        # ``resume()`` only re-issues the missing indices.
+        # Completed (frame, response) pairs keyed by sub-args index; sparse
+        # (gathered sub-requests complete out of order — see class docstring).
         self._chunks: dict[int, tuple[pd.DataFrame, httpx.Response]] = {}
 
     def record(self, index: int, pair: tuple[pd.DataFrame, httpx.Response]) -> None:
@@ -1669,8 +1667,7 @@ async def _run(self, max_concurrent: int | None) -> tuple[pd.DataFrame, Any]:
         # ``httpx.Limits()`` defaults to ``max_connections=100`` — at higher
         # concurrency the pool would silently bottleneck the fan-out behind
         # that cap. Set it to the resolved concurrency so the pool *is* the
-        # throttle (``None`` for truly unbounded). No semaphore: we gather
-        # every pending sub-request and let the pool serialize.
+        # throttle (``None`` for truly unbounded).
         limits = httpx.Limits(
             max_connections=max_concurrent, max_keepalive_connections=max_concurrent
         )