refactor(waterdata): single gated path and one canonical pool-timeout rationale

thodson-usgs · claude · thodson-usgs · commit 8a3885be7bd3 · 2026-06-09T23:02:17.000-05:00
Post-review cleanup of the semaphore gate:

- Collapse the None-vs-Semaphore fork: "unbounded" is now a degenerate
  cap at the plan total (a gate that can never block), so the gated
  fetch is the only code path — matching the chunker's no-special-branch
  style elsewhere.
- Keep the full why-not-the-pool rationale only on ChunkedCall._run;
  the module docstring, inline comment, and test docstrings now point
  there instead of retelling it, and prose no longer hardcodes the 60 s
  pool-timeout literal that belongs to HTTPX_DEFAULTS.
- Merge the two concurrency-probe tests into one parametrized test and
  trim the e2e stall repro from 6 sub-requests x 0.7 s to 4 x 0.5 s
  (~1 s saved per suite run, same >=2x timeout margin).

Both regression cases still fail against the pre-fix chunker
(verified by running them with HEAD~1's chunking.py).
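
For reviewers, a minimal standalone sketch of the single gated path
described in the first bullet. It is illustrative only and is not the
chunker's code: ``high_water_mark``, ``_run_gated`` and the sleeping
``fetch`` stand-in are hypothetical names, and ``total`` plays the role
of the plan total. It shows that a semaphore sized to the number of
tasks never blocks, so "unbounded" needs no separate branch.

```python
import asyncio
from typing import Optional


async def _run_gated(total: int, max_concurrent: Optional[int]) -> int:
    """Run ``total`` fake fetches behind one gate; return the peak in flight."""
    # ``None`` (unbounded) degenerates to a cap at the total: the gate
    # exists but can never block, so there is only one code path.
    semaphore = asyncio.Semaphore(total if max_concurrent is None else max_concurrent)
    in_flight = {"now": 0, "max": 0}

    async def fetch(i: int) -> int:
        # Hypothetical stand-in for a real sub-request.
        async with semaphore:
            in_flight["now"] += 1
            in_flight["max"] = max(in_flight["max"], in_flight["now"])
            await asyncio.sleep(0.05)  # pretend to wait on the server
            in_flight["now"] -= 1
            return i

    results = await asyncio.gather(*(fetch(i) for i in range(total)))
    assert results == list(range(total))  # every task completed, in order
    return in_flight["max"]


def high_water_mark(total: int, max_concurrent: Optional[int]) -> int:
    """Synchronous wrapper so the sketch can be called from plain code."""
    return asyncio.run(_run_gated(total, max_concurrent))
```

Calling ``high_water_mark(8, 2)`` and ``high_water_mark(8, None)``
mirrors the two parametrized cases in the merged test: the peak equals
the cap when one is set, and equals the total when it is lifted.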

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
diff --git a/dataretrieval/waterdata/chunking.py b/dataretrieval/waterdata/chunking.py
@@ -10,24 +10,18 @@
 single-step plan — ``ChunkedCall`` has one code path either way.
 
 Concurrency: ``multi_value_chunked`` fans every pending sub-request out
-under one ``asyncio.gather`` sharing a single ``httpx.AsyncClient``; an
-``asyncio.Semaphore`` caps the sub-requests in flight at ``N``, and the
-client's connection pool is sized to match
-(``httpx.Limits(max_connections=N, max_keepalive_connections=N)``) so
-the in-flight sub-requests reuse keepalive connections. The semaphore —
-not the pool — is the throttle: excess sub-requests park on the
-semaphore instead of queueing on connection acquisition, whose wait
-counts against the client's pool-acquire timeout (``HTTPX_DEFAULTS``)
-and would otherwise expire the queued tail of the fan-out whenever
-every connection stays busy past it (see :meth:`ChunkedCall._run`).
-``API_USGS_CONCURRENT`` resolves ``N``: an
-integer N > 1 allows N sub-requests in flight; ``1`` pins sequential
-dispatch (one request at a time); the literal ``unbounded`` removes
-the cap (``N=None``). The default (16) is the server-friendly sweet
-spot; higher values can trip USGS burst-protection 5xx in practice. The
-fan-out runs in a short-lived worker thread (an ``anyio`` blocking
-portal), so it works whether or not the caller is already inside an
-event loop (Jupyter / IPython / async apps).
+under one ``asyncio.gather`` sharing a single ``httpx.AsyncClient``. An
+``asyncio.Semaphore`` — not the client's connection pool, which is
+merely sized to match — caps the sub-requests in flight at ``N``; see
+:meth:`ChunkedCall._run` for why the gate must be the semaphore rather
+than the pool. ``API_USGS_CONCURRENT`` resolves ``N``: an integer N > 1
+allows N sub-requests in flight; ``1`` pins sequential dispatch (one
+request at a time); the literal ``unbounded`` lifts the cap. The
+default (16) is the server-friendly sweet spot; higher values can trip
+USGS burst-protection 5xx in practice. The fan-out runs in a
+short-lived worker thread (an ``anyio`` blocking portal), so it works
+whether or not the caller is already inside an event loop (Jupyter /
+IPython / async apps).
 
 Retries: each sub-request is retried on a transient failure (429,
 5xx, connect/read timeout) with exponential backoff + full jitter,
@@ -1560,15 +1554,16 @@ async def _run(self, max_concurrent: int | None) -> tuple[pd.DataFrame, Any]:
 
         The gather dispatches *every* pending sub-request, but an
         ``asyncio.Semaphore`` gates the fetches so at most
-        ``N = max_concurrent`` are in flight (``None`` for unbounded);
+        ``N = max_concurrent`` are in flight (``None`` degenerates to a
+        cap at the plan total — a gate that can never block);
         ``N=1`` is just a sequential gather (one request at a time) and
         ``total <= 1`` is just a one-element gather. The client's
         connection pool is sized to match
         (``httpx.Limits(max_connections=N, max_keepalive_connections=N)``)
         so in-flight sub-requests reuse keepalive connections. The
         semaphore must be the throttle, not the pool, for two reasons.
         First, time spent queued on connection acquisition counts
-        against the client's pool-acquire timeout (60 s via
+        against the client's pool-acquire timeout (from
         ``HTTPX_DEFAULTS``); a queued waiter's clock only resets when
         some response completes and httpcore reassigns the freed
         connection. So whenever every pooled connection stays busy past
@@ -1592,7 +1587,8 @@ async def _run(self, max_concurrent: int | None) -> tuple[pd.DataFrame, Any]:
         ----------
         max_concurrent : int or None
             Maximum sub-requests in flight (the semaphore value, and the
-            connection-pool size). ``None`` disables the cap.
+            connection-pool size). ``None`` lifts the cap: the gate
+            degenerates to the plan total and the pool goes unbounded.
 
         Returns
         -------
@@ -1614,14 +1610,15 @@ async def _run(self, max_concurrent: int | None) -> tuple[pd.DataFrame, Any]:
         # The semaphore is the throttle; the pool is merely sized to match
         # it (the ``httpx.Limits()`` defaults — ``max_connections=100``,
         # keepalive 20 — would bottleneck or churn connections under a
-        # different cap). Why the gate can't be the pool itself is in the
-        # method docstring: pool-acquire waits count against the client's
-        # pool timeout, semaphore waits don't.
+        # different cap). See the method docstring for why the gate can't
+        # be the pool itself. ``unbounded`` (``max_concurrent=None``) is a
+        # degenerate cap at the plan total — a semaphore that can never
+        # block — so gated is the only code path.
         limits = httpx.Limits(
             max_connections=max_concurrent, max_keepalive_connections=max_concurrent
         )
-        semaphore = (
-            None if max_concurrent is None else asyncio.Semaphore(max_concurrent)
+        semaphore = asyncio.Semaphore(
+            self.plan.total if max_concurrent is None else max_concurrent
         )
 
         async with httpx.AsyncClient(limits=limits, **HTTPX_DEFAULTS) as client:
@@ -1641,8 +1638,6 @@ async def fetch_gated(
                     a sub-request sleeping off a retry backoff isn't
                     holding a slot while it isn't touching the server.
                     """
-                    if semaphore is None:
-                        return await self.fetch(args)
                     async with semaphore:
                         return await self.fetch(args)
 
diff --git a/tests/waterdata_chunking_test.py b/tests/waterdata_chunking_test.py
@@ -1517,67 +1517,60 @@ async def fetch_async(args):
     return fetch_async
 
 
-def test_fan_out_in_flight_sub_requests_never_exceed_cap(monkeypatch):
-    """At most ``API_USGS_CONCURRENT`` sub-requests are in flight at once,
-    while still running genuinely in parallel up to that cap.
+@pytest.mark.parametrize(
+    ("cap", "expected_high_water"),
+    [
+        pytest.param(2, 2, id="capped"),
+        pytest.param("unbounded", len(_EIGHT_SINGLETON_SITES), id="unbounded"),
+    ],
+)
+def test_fan_out_in_flight_high_water_mark_is_the_cap(
+    monkeypatch, cap, expected_high_water
+):
+    """The fetch-level high-water mark of simultaneous sub-requests IS the
+    ``API_USGS_CONCURRENT`` cap — genuine parallelism up to it, never past
+    it — and ``unbounded`` degenerates to every sub-request at once.
 
     Regression: the cap used to be enforced only by the shared client's
-    connection-pool size, so every sub-request beyond it queued on
-    connection *acquisition* — subject to the client's pool-acquire
-    timeout and httpcore's thundering-herd reassignment (see
-    ``ChunkedCall._run``). The semaphore parks excess sub-requests
-    before they touch the pool, which this test observes directly: the
-    fetch-level high-water mark IS the cap, not the plan's total.
+    connection-pool size, so sub-requests beyond it queued on connection
+    *acquisition*, subject to the client's pool-acquire timeout and
+    httpcore's thundering-herd reassignment (see ``ChunkedCall._run``).
+    The semaphore parks excess sub-requests before they touch the pool.
     """
     in_flight = {"now": 0, "max": 0}
     fetch = _async_chunked_fetch(
-        monkeypatch, _concurrency_probe(in_flight), max_concurrent=2
+        monkeypatch, _concurrency_probe(in_flight), max_concurrent=cap
     )
 
     df, _ = fetch({"sites": list(_EIGHT_SINGLETON_SITES)})
 
     assert len(df) == len(_EIGHT_SINGLETON_SITES)  # all sub-requests completed
-    assert in_flight["max"] == 2  # parallel, but never beyond the cap
-
-
-def test_fan_out_unbounded_dispatches_every_sub_request_at_once(monkeypatch):
-    """``API_USGS_CONCURRENT=unbounded`` disables the gate: every pending
-    sub-request is in flight simultaneously."""
-    in_flight = {"now": 0, "max": 0}
-    fetch = _async_chunked_fetch(
-        monkeypatch, _concurrency_probe(in_flight), max_concurrent="unbounded"
-    )
-
-    df, _ = fetch({"sites": list(_EIGHT_SINGLETON_SITES)})
-
-    assert len(df) == len(_EIGHT_SINGLETON_SITES)
-    assert in_flight["max"] == len(_EIGHT_SINGLETON_SITES)
+    assert in_flight["max"] == expected_high_water
 
 
 def test_fan_out_outlives_pool_timeout_on_real_transport(monkeypatch):
     """End-to-end regression for the pool-timeout starvation bug: the
     fan-out must survive every pooled connection staying busy past the
-    client's pool-acquire timeout.
+    client's pool-acquire timeout (the stall mechanism is documented on
+    ``ChunkedCall._run``; at production scale think a batch of large,
+    slowly-streaming pages).
 
     Sub-requests here send real HTTP to a slow localhost server through
-    the chunker's shared client (fakes can't catch this —
-    ``MockTransport`` bypasses the connection pool). A queued waiter's
-    pool-timeout clock only resets when some response completes, so the
-    repro needs a *stall*: with the pool as the only throttle, 2
-    connections busy for 0.7 s each and a 0.3 s pool timeout pinned
-    below, the 4 queued sub-requests sat through 0.3 s with no
-    completion → ``httpx.PoolTimeout`` → (retries exhausted,
-    ``API_USGS_RETRIES=0``) a spurious resumable ``ServiceInterrupted``.
-    Gated by the semaphore, queued sub-requests never touch the pool
-    and the call completes. (Production scale: 60 s timeout, tripped by
-    a batch of large, slowly-streaming pages.)
+    the chunker's shared client — fakes can't catch this, since
+    ``MockTransport`` bypasses the connection pool. With the pool as the
+    only throttle, 2 connections busy for 0.5 s each and the 0.25 s pool
+    timeout pinned below, the 2 queued sub-requests sat out the full
+    timeout with no completion to reset their clocks →
+    ``httpx.PoolTimeout`` → (retries exhausted, ``API_USGS_RETRIES=0``)
+    a spurious resumable ``ServiceInterrupted``. Gated by the semaphore,
+    queued sub-requests never touch the pool and the call completes.
     """
 
     class _SlowHandler(http.server.BaseHTTPRequestHandler):
         protocol_version = "HTTP/1.1"  # keepalive, so pooled connections reuse
 
         def do_GET(self):
-            time.sleep(0.7)  # hold the connection busy past the pool timeout
+            time.sleep(0.5)  # hold the connection busy past the pool timeout
             body = b'{"ok": true}'
             self.send_response(200)
             self.send_header("Content-Length", str(len(body)))
@@ -1592,10 +1585,10 @@ def log_message(self, *args):  # keep pytest output clean
     thread.start()
     url = f"http://127.0.0.1:{server.server_address[1]}/"
 
-    # Scale the production 60 s pool timeout down to 0.3 s so the
-    # pre-semaphore failure mode reproduces in test time.
+    # Scale the production pool timeout (see ``HTTPX_DEFAULTS``) down to
+    # 0.25 s so the pre-semaphore failure mode reproduces in test time.
     monkeypatch.setitem(
-        HTTPX_DEFAULTS, "timeout", httpx.Timeout(5.0, connect=1.0, pool=0.3)
+        HTTPX_DEFAULTS, "timeout", httpx.Timeout(5.0, connect=1.0, pool=0.25)
     )
 
     async def fetch_async(args):
@@ -1605,7 +1598,7 @@ async def fetch_async(args):
         assert resp.status_code == 200
         return pd.DataFrame({"id": [_atom_id(args)]}), resp
 
-    sites = _EIGHT_SINGLETON_SITES[:6]  # 2 in flight + 4 queued, 3 waves
+    sites = _EIGHT_SINGLETON_SITES[:4]  # 2 in flight + 2 queued, 2 waves
     try:
         fetch = _async_chunked_fetch(monkeypatch, fetch_async, max_concurrent=2)
         df, _ = fetch({"sites": sites})