1313under one ``asyncio.gather`` sharing a single ``httpx.AsyncClient``;
1414concurrency is bounded purely by the client's connection pool
1515(``httpx.Limits(max_connections=N, max_keepalive_connections=N)``), so
16- the pool — not a semaphore — throttles. ``API_USGS_CONCURRENT`` resolves
16+ the pool throttles. ``API_USGS_CONCURRENT`` resolves
1717``N``: an integer N > 1 caps connections at N; ``1`` pins a single
1818connection (one request at a time); the literal ``unbounded`` removes
1919the cap (``N=None``). The default (16) is the server-friendly sweet
@@ -153,13 +153,10 @@ def _read_concurrency_env() -> int | None:
153153
154154
155155# Retry-with-backoff defaults for transient sub-request failures (429 /
156- # 5xx / connect-read timeouts). All four are resolved at call time by
157- # ``RetryPolicy.from_env`` (the env var via ``monkeypatch.setenv``, the
158- # timing constants via ``monkeypatch.setattr`` on this module), so both
159- # are overridable in tests and by power users. Defaults: 4 retries, 0.5s
160- # base doubling under full jitter up to a 30s per-attempt ceiling, and
161- # honor a server ``Retry-After`` up to 60s before escalating to a
162- # resumable interruption instead.
156+ # 5xx / connect-read timeouts): 4 retries, 0.5s base doubling under full
157+ # jitter up to a 30s per-attempt ceiling, and honor a server
158+ # ``Retry-After`` up to 60s before escalating to a resumable interruption
159+ # instead.
163160_RETRIES_ENV = "API_USGS_RETRIES"
164161_RETRIES_DEFAULT = 4
165162_RETRY_BASE_BACKOFF = 0.5
@@ -196,7 +193,7 @@ class RetryPolicy:
196193 """Bounded retry-with-backoff config for transient sub-request failures.
197194
198195 An immutable value object that owns the *timing* decisions; the
199- exception taxonomy ("is this worth retrying at all?" ) lives in
196+ exception taxonomy (which failures are retryable ) lives in
200197 :func:`_retryable`. Backoff is exponential with **full jitter**
201198 (:func:`random.uniform` over ``[0, ceiling]``) so the concurrent
202199 fan-out's retries don't re-burst in lockstep. A server ``Retry-After``
@@ -376,14 +373,12 @@ def get_active_client() -> httpx.AsyncClient | None:
376373# ``ChunkedCall`` drives: an ``async def fetch(args) -> (df, response)``.
377374_Fetch = Callable [[dict [str , Any ]], Awaitable [tuple [pd .DataFrame , httpx .Response ]]]
378375
379- # Caller-supplied transform applied to the *combined* chunk result. It lets a
380- # resumed call (:meth:`ChunkedCall.resume` / :attr:`~ChunkedCall.partial_frame`
381- # / :attr:`~ChunkedCall.partial_response`) return the same shape as the
382- # un-interrupted call instead of the chunker's raw ``(frame, httpx.Response)``.
383- # The chunker stays generic — it only knows "post-process the assembled
384- # result"; the OGC getters inject the actual type-coercion / column-arrangement
385- # / ``BaseMetadata`` pipeline (see ``utils._finalize_ogc``). The default is
386- # identity, so direct ``ChunkedCall`` use and the tests are unaffected.
376+ # Caller-supplied transform applied to the combined chunk result, so a
377+ # resumed call returns the same shape as an un-interrupted one rather than
378+ # the chunker's raw ``(frame, httpx.Response)``. This keeps the chunker
379+ # generic: the OGC getters inject their post-processing (type coercion,
380+ # column arrangement, ``BaseMetadata``) through ``utils._finalize_ogc``.
381+ # The default is identity, so direct ``ChunkedCall`` use is unaffected.
387382_Finalize = Callable [[pd .DataFrame , httpx .Response ], tuple [pd .DataFrame , Any ]]
388383
389384
@@ -1114,18 +1109,16 @@ def _retryable(exc: BaseException) -> tuple[bool, float | None]:
11141109 """
11151110 Decide whether ``exc`` is a transient worth an automatic retry.
11161111
1117- Inspects only the *top-level* exception, by design — and so is
1118- deliberately narrower than :func:`_classify_chunk_error`, which walks
1119- the ``__cause__`` chain for resumability. ``_paginate`` raises an
1120- initial-request transient (429 / 5xx / :class:`httpx.TransportError`
1121- such as ``ConnectError`` / ``ReadTimeout``) *raw*, but re-wraps any
1122- mid-pagination failure as a ``RuntimeError``. Retrying only the raw,
1123- top-level transient means we re-issue a sub-request that made no
1124- progress (cheap), while a failure after partial pagination escalates
1125- to the resumable :class:`ChunkInterrupted` instead of being re-walked
1126- from page 1 — which would re-spend the very quota that was exhausted.
1127- ``httpx.InvalidURL`` is excluded (a too-long cursor won't fix on
1128- retry), and it only ever arises on a follow-up page anyway.
1112+ Only the *top-level* exception is inspected — unlike
1113+ :func:`_classify_chunk_error`, which walks the ``__cause__`` chain.
1114+ The distinction matters because ``_paginate`` raises an
1115+ initial-request transient (429 / 5xx / :class:`httpx.TransportError`)
1116+ *raw*, but wraps a mid-pagination failure as a ``RuntimeError``. So a
1117+ raw transient means a sub-request that made no progress and is cheap to
1118+ re-issue, whereas a mid-pagination failure is left to escalate to a
1119+ resumable :class:`ChunkInterrupted` rather than re-walked from page 1
1120+ (which would re-spend the quota just exhausted). ``httpx.InvalidURL``
1121+ is never retried — a too-long cursor won't fix on a retry.
11291122
11301123 Returns
11311124 -------
@@ -1620,9 +1613,8 @@ def resume(self) -> tuple[pd.DataFrame, Any]:
16201613
16211614 async def _run (self , max_concurrent : int | None ) -> tuple [pd .DataFrame , Any ]:
16221615 """
1623- The async execution core: gather every pending sub-request over
1624- one shared :class:`httpx.AsyncClient` and return the combined,
1625- finalized result.
1616+ Gather every pending sub-request over one shared
1617+ :class:`httpx.AsyncClient` and return the combined, finalized result.
16261618
16271619 Pending sub-requests (:meth:`_pending`) fan out under
16281620 ``asyncio.gather`` with ``return_exceptions=True`` so completed
@@ -1632,12 +1624,12 @@ async def _run(self, max_concurrent: int | None) -> tuple[pd.DataFrame, Any]:
16321624 ``.call``; ``exc.call.resume()`` then re-issues only the unfinished
16331625 indices through this same runner.
16341626
1635- Concurrency is bounded purely by the client's connection pool —
1627+ Concurrency is bounded by the client's connection pool —
16361628 ``httpx.Limits(max_connections=N, max_keepalive_connections=N)``
1637- where ``N = max_concurrent`` (``None`` for unbounded). There is no
1638- semaphore: the gather dispatches *every* pending sub-request and the
1639- pool throttles, so ``N=1`` is just a single-connection gather (one
1640- request at a time) and ``total <= 1`` is just a one-element gather.
1629+ where ``N = max_concurrent`` (``None`` for unbounded). The gather
1630+ dispatches *every* pending sub-request and the pool throttles, so
1631+ ``N=1`` is just a single-connection gather (one request at a time)
1632+ and ``total <= 1`` is just a one-element gather.
16411633 The shared client is published on :data:`_chunked_client` so
16421634 the paginated-loop helpers reuse its connection pool.
16431635
0 commit comments