Commit 399f44a
feat(waterdata): async parallel chunker over httpx.AsyncClient
Multi-value Water Data queries (many monitoring locations, many parameter
codes, large CQL2 filters) can exceed the server's ~8 KB URL/body limit
and need to be split into multiple sub-requests. This adds an
``async``-only chunker that:
* Plans the fan-out: every multi-value list parameter, and the cql-text
``filter`` (split along its top-level ``OR`` clauses), is modeled as a
chunkable axis; ``ChunkPlan`` greedily halves the biggest axis until every
sub-request URL fits the byte budget, then iterates the cartesian product
of the per-axis chunks.
* Dispatches concurrently: ``ChunkedCall._run`` gathers every pending
sub-request through one shared ``httpx.AsyncClient`` whose connection
pool is sized from ``API_USGS_CONCURRENT`` (default 16; ``unbounded``
removes the cap). A single ``anyio`` blocking portal lets the sync
facade work even when called from inside a running event loop (Jupyter,
async apps).
* Survives transient failures: the typed ``RateLimited`` (429) and
``ServiceUnavailable`` (5xx) errors trigger bounded retry-with-backoff
(``API_USGS_RETRIES`` default 4; full jitter; honors ``Retry-After``
up to a 60 s cap). Anything still failing after the retries escalates
to a resumable ``ChunkInterrupted`` subclass carrying ``.call``; call
``.call.resume()`` once the underlying condition clears, and only the
still-pending sub-requests are re-issued.
* Combines and finalizes: the OGC getters inject ``utils._finalize_ogc``
(type coercion, column arrangement, ``max_rows`` truncation,
``BaseMetadata``) through the chunker's ``finalize`` hook so a
successful first call and a resumed call yield the same shape.
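A minimal sketch of the greedy-halving plan, assuming each axis is a plain
list of strings sent comma-joined in the query string and that "biggest"
means the axis contributing the most encoded bytes to the longest
sub-request URL. ``plan_subrequests`` and its parameters are hypothetical,
not the library's ``ChunkPlan``; splitting the cql-text ``filter`` on
``OR`` clauses is omitted.

```python
import itertools
import math
from urllib.parse import urlencode


def _chunks(values, size):
    """Split ``values`` into consecutive chunks of at most ``size`` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def plan_subrequests(base_url, axes, scalars, budget):
    """Hypothetical greedy-halving planner (not the library's ``ChunkPlan``).

    ``axes`` maps each chunkable parameter to its value list; ``scalars`` are
    never split. Returns one params dict per sub-request.
    """
    axes = {name: list(values) for name, values in axes.items() if values}
    sizes = {name: len(values) for name, values in axes.items()}

    def chunk_bytes(name, chunk):
        # Encoded length of "name=v1%2Cv2..." (comma-joining is an assumption).
        return len(urlencode({name: ",".join(chunk)}))

    def longest(name):
        return max(_chunks(axes[name], sizes[name]),
                   key=lambda c: chunk_bytes(name, c))

    def worst_url_len():
        # Query parts are encoded independently, so the longest URL pairs
        # the longest chunk of every axis.
        params = dict(scalars)
        params.update({name: ",".join(longest(name)) for name in axes})
        return len(f"{base_url}?{urlencode(params)}")

    while worst_url_len() > budget:
        splittable = [name for name in axes if sizes[name] > 1]
        if not splittable:
            raise ValueError("a single-value sub-request still exceeds the budget")
        # Greedy step: halve the axis contributing the most bytes.
        biggest = max(splittable, key=lambda n: chunk_bytes(n, longest(n)))
        sizes[biggest] = math.ceil(sizes[biggest] / 2)

    # Fan out: one sub-request per element of the cartesian product.
    names = list(axes)
    per_axis = [_chunks(axes[name], sizes[name]) for name in names]
    return [
        {**scalars, **{name: ",".join(chunk) for name, chunk in zip(names, combo)}}
        for combo in itertools.product(*per_axis)
    ]
```

Because every sub-request uses at most the longest chunk of each axis,
checking that single worst-case URL against the budget bounds all of them.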
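The retry rule can be sketched as follows. This assumes
``API_USGS_RETRIES`` counts retries after the first attempt, a 1 s base
delay, numeric ``Retry-After`` values in seconds, and that the 60 s cap
also bounds the exponential term; ``TransientError``, ``backoff_delay``
and ``with_retries`` are hypothetical stand-ins for the commit's
``RateLimited`` / ``ServiceUnavailable`` handling.

```python
import asyncio
import random

MAX_DELAY = 60.0  # the 60 s cap from the commit message


class TransientError(Exception):
    """Hypothetical stand-in for RateLimited (429) / ServiceUnavailable (5xx)."""

    def __init__(self, retry_after=None):
        super().__init__("transient HTTP failure")
        self.retry_after = retry_after  # seconds from Retry-After, if any


def backoff_delay(attempt, retry_after=None, base=1.0, rng=random.random):
    """Delay before retry number ``attempt`` (0-based)."""
    if retry_after is not None:
        # Honor the server's hint, but never wait longer than the cap.
        return min(float(retry_after), MAX_DELAY)
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(MAX_DELAY, base * 2 ** attempt)


async def with_retries(fetch, retries=4, sleep=asyncio.sleep):
    """Await ``fetch()`` up to ``retries + 1`` times; re-raise the last failure."""
    for attempt in range(retries + 1):
        try:
            return await fetch()
        except TransientError as exc:
            if attempt == retries:
                raise  # left for the caller to escalate (assumed)
            await sleep(backoff_delay(attempt, exc.retry_after))
```

Full jitter spreads concurrent sub-requests' retries across the whole
window instead of synchronizing them, which matters when many chunks hit
the same 429 at once.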
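The gather-then-resume bookkeeping can be sketched under these assumptions:
``ChunkedCallSketch`` and ``ChunkInterruptedSketch`` are hypothetical, an
``asyncio.Semaphore`` stands in for the ``httpx`` connection-pool limit,
and ``asyncio.run`` replaces the ``anyio`` blocking portal (so, unlike the
commit's sync facade, it cannot be called from inside a running event loop).

```python
import asyncio


class ChunkInterruptedSketch(Exception):
    """Carries the call object so the caller can resume it later."""

    def __init__(self, call):
        super().__init__(f"{len(call.pending)} sub-request(s) still pending")
        self.call = call


class ChunkedCallSketch:
    def __init__(self, fetch, subrequests, concurrency=16):
        self.fetch = fetch                           # async callable: sub -> result
        self.pending = list(enumerate(subrequests))  # (position, sub-request)
        self.done = {}                               # position -> result
        self.concurrency = concurrency

    async def _run(self):
        # Bound in-flight requests; stands in for the httpx pool limit.
        sem = asyncio.Semaphore(self.concurrency)

        async def one(sub):
            async with sem:
                return await self.fetch(sub)

        outcomes = await asyncio.gather(
            *(one(sub) for _, sub in self.pending), return_exceptions=True
        )
        still_pending = []
        for (pos, sub), outcome in zip(self.pending, outcomes):
            if isinstance(outcome, BaseException):
                still_pending.append((pos, sub))  # re-issued on resume()
            else:
                self.done[pos] = outcome          # never fetched again
        self.pending = still_pending
        if self.pending:
            raise ChunkInterruptedSketch(self)
        # Reassemble in the original sub-request order.
        return [self.done[pos] for pos in sorted(self.done)]

    def run(self):
        return asyncio.run(self._run())

    resume = run  # resuming simply re-runs whatever is still pending
```

Completed results are kept by position, so a resumed call returns the same
ordering as an uninterrupted one.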
Surfaces and integration:
* ``dataretrieval/waterdata/chunking.py`` (new module): ``RetryPolicy``,
``ChunkPlan``, ``ChunkedCall``, ``ChunkInterrupted`` /
``QuotaExhausted`` / ``ServiceInterrupted``, ``multi_value_chunked``
decorator.
* ``utils._fetch_once`` is the decorated async fetcher; pagination
helpers ``_paginate`` / ``_walk_pages`` are async and share the
chunker's client via a ``ContextVar``.
* ``api.get_reference_table(... max_rows=...)``: new preview cap.
* ``_progress``: per-call status line (chunk count, pages, rows,
rate-limit remaining); ``API_USGS_PROGRESS`` switches it on or off.
Deps: add ``geopandas>=0.10`` + ``mapclassify`` to ``[doc]`` extras so
``WaterData_demo.ipynb``'s ``.set_crs().explore()`` cell executes (the
plain-pandas frame lacks ``.set_crs``).
Tests: full async chunker suite (planning, retry taxonomy, resume,
client-sharing, progress reporter, finalize injection) + live-API
regression tests covering every public getter. 298 offline + 63 live
tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 569ff38
11 files changed
Lines changed: 2049 additions & 462 deletions