Add optional parallel chunk processing via ThreadPoolExecutor

thodson-usgs · claude · thodson-usgs · commit 37e2d2afff04 · 2026-05-16T10:12:19.000-05:00
Draft. Adds `max_workers` to `@multi_value_chunked`; when set > 1,
sub-requests run concurrently in a ``ThreadPoolExecutor``. Default
(``None``) keeps the existing sequential behavior, including the
mid-call quota guard and ``QuotaExhausted.partial_frame``
resumability. Parallel mode forfeits both: workers can race past the
floor before any one observes the crossing, and any chunk failure
discards completed-but-uncollected results.
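
A minimal sketch of why a chunk failure discards finished work under
``executor.map``, with one possible way to keep it. ``fetch_once`` here
is a hypothetical stand-in (not the real fetcher), and
``run_parallel_keep_partial`` is an illustrative alternative that is not
part of this commit:

```python
import concurrent.futures


def fetch_once(chunk_id):
    # Hypothetical stand-in for one sub-request; chunk 1 simulates a 429.
    if chunk_id == 1:
        raise RuntimeError("429 Too Many Requests")
    return f"frame-{chunk_id}"


def run_parallel(chunk_ids, max_workers=4):
    # Same shape as the parallel branch in this commit: list() re-raises
    # the first failure (in input order), so results from chunks that
    # already finished never reach the caller.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(fetch_once, chunk_ids))


def run_parallel_keep_partial(chunk_ids, max_workers=4):
    # Illustrative alternative (an assumption, not this commit's code):
    # submit each chunk and collect successes and failures separately.
    completed, failed = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {ex.submit(fetch_once, c): c for c in chunk_ids}
        for fut in concurrent.futures.as_completed(futures):
            chunk_id = futures[fut]
            try:
                completed[chunk_id] = fut.result()
            except RuntimeError as exc:
                failed[chunk_id] = exc
    return completed, failed
```

With chunks ``[0, 1, 2]``, ``run_parallel`` raises and returns nothing,
while ``run_parallel_keep_partial`` still hands back the frames for
chunks 0 and 2 alongside the recorded failure for chunk 1.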

Benchmarked on 671 CAMELS sites x ``get_field_measurements``
(337,808 rows, ~14 paginated chunks):
- sync:        82.1s +/- 12.1s
- workers=2:  100.2s (unstable; one run truncated by 429)
- workers=4:   51.5s +/- 2.8s (~1.6x; one run lightly truncated)
- workers=8:   crashed on 429 mid-flight

Open question for follow-up: under parallelism we observe partial
data being returned silently when one chunk hits a paginated 429 --
``_walk_pages`` returns the rows it already has rather than
surfacing the truncation. The behavior predates this change, but
parallelism makes it easier to trigger. Investigate as part of any
future merge.
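
To make the failure mode concrete, here is a toy model of silent versus
surfaced truncation. It is not the real ``_walk_pages``; the
``(status, rows)`` page tuples, ``TruncatedResult``, and both walker
functions are hypothetical names introduced only for illustration:

```python
class TruncatedResult(Exception):
    """Hypothetical error carrying the rows gathered before the failure."""

    def __init__(self, rows, status):
        super().__init__(f"pagination stopped early with HTTP {status}")
        self.rows = rows
        self.status = status


def walk_pages_silent(pages):
    # Models the reported behavior: stop at the first non-200 page and
    # return what was collected, indistinguishable from a complete walk.
    rows = []
    for status, page_rows in pages:
        if status != 200:
            break
        rows.extend(page_rows)
    return rows


def walk_pages_strict(pages):
    # One possible fix: raise, but attach the partial rows so the caller
    # can decide whether to keep or retry them.
    rows = []
    for status, page_rows in pages:
        if status != 200:
            raise TruncatedResult(rows, status)
        rows.extend(page_rows)
    return rows


# Simulated pagination: the third page is rate limited.
pages = [(200, [1, 2]), (200, [3]), (429, []), (200, [4])]
```

The silent walker returns three of the four rows with no signal that a
page was lost; the strict walker exposes the same three rows on the
exception together with the 429 status.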

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/dataretrieval/waterdata/chunking.py b/dataretrieval/waterdata/chunking.py
@@ -35,6 +35,7 @@
 
 from __future__ import annotations
 
+import concurrent.futures
 import functools
 import itertools
 import math
@@ -339,6 +340,7 @@ def multi_value_chunked(
     url_limit: int | None = None,
     max_chunks: int | None = None,
     quota_safety_floor: int | None = None,
+    max_workers: int | None = None,
 ) -> Callable[[_FetchOnce], _FetchOnce]:
     """Decorator that splits multi-value list params across sub-requests
     so each URL fits ``url_limit`` bytes (defaults to
@@ -362,6 +364,14 @@ def multi_value_chunked(
     response)`` — so the two decorators compose cleanly. The planner is
     filter-aware so it doesn't raise prematurely when the inner filter
     chunker would have shrunk the per-sub-request URL on its own.
+
+    ``max_workers`` (default ``None``, sequential) runs sub-requests in a
+    ``ThreadPoolExecutor`` when ``> 1``. Parallel mode forfeits the mid-call
+    quota guard and ``QuotaExhausted.partial_frame`` resumability — workers
+    race past the floor before any one observes the crossing, and any
+    chunk failure discards completed-but-uncollected results. Empirically
+    helpful only on pagination-bound workloads, where shared per-IP rate
+    limits cap useful parallelism well below host CPU count.
     """
 
     def decorator(fetch_once: _FetchOnce) -> _FetchOnce:
@@ -385,10 +395,30 @@ def wrapper(
 
             keys = list(plan)
             total = math.prod(len(plan[k]) for k in keys)
+            sub_args_list = [
+                {**args, **dict(zip(keys, combo))}
+                for combo in itertools.product(*(plan[k] for k in keys))
+            ]
+
+            if max_workers is not None and max_workers > 1:
+                # Parallel mode forfeits the mid-call quota guard and
+                # ``QuotaExhausted.partial_frame`` resumability: workers race
+                # past the floor before any one observes the crossing, and a
+                # failed chunk discards completed-but-uncollected results.
+                with concurrent.futures.ThreadPoolExecutor(
+                    max_workers=max_workers
+                ) as executor:
+                    results = list(executor.map(fetch_once, sub_args_list))
+                frames = [r[0] for r in results]
+                responses = [r[1] for r in results]
+                return (
+                    _combine_chunk_frames(frames),
+                    _combine_chunk_responses(responses),
+                )
+
             frames: list[pd.DataFrame] = []
             responses: list[requests.Response] = []
-            for i, combo in enumerate(itertools.product(*(plan[k] for k in keys))):
-                sub_args = {**args, **dict(zip(keys, combo))}
+            for i, sub_args in enumerate(sub_args_list):
                 frame, response = fetch_once(sub_args)
                 frames.append(frame)
                 responses.append(response)