
Commit 493e4eb

test(waterdata): Add offline stress test for the joint chunker

Standalone runner (``python3 tests/stress_chunker.py``) that exercises the chunker across eight scenarios with the URL byte limit lowered well below the live API's. No live HTTP: it mocks fetch_once and uses the real _construct_api_requests for URL sizing.

Per-scenario invariants verified:

1. Every sub-request URL ≤ url_limit (primary correctness).
2. List-dim coverage: the union of distinct chunks issued for each list dim equals the input with no overlap (no data dropped, no duplicate fetches of the same atom within its dim).
3. Filter-clause coverage: the distinct filter chunks split back into clauses, concatenated in iteration order, equal the original clauses (lossless OR-disjunction).
4. Speedup vs the bail-floor-singleton baseline that the old two-decorator design would have produced in pathological cases.

Plus a greedy-search adaptation check: scanning ``url_limit`` across 1200 → 10000 confirms sub-request count is monotonically non-increasing as the budget grows (the planner adapts to the limit).

Scenarios:

A. Long sites only (pure list chunking)
B. Long filter only (pure filter chunking)
C. Long sites + long filter (joint trade-off, 1000× vs baseline)
D. 3-D list cartesian product (3000× vs baseline)
E. Lopsided clause sizes (worst-case sizing)
F. URL-encoding-heavy clauses (quote_plus inflation)
G. Very tight URL limit (singleton chunks)
H. Generous URL limit (no chunking needed)
I. url_limit sweep proving greedy adaptation

All 15 chunked calls pass every invariant.
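
For orientation, invariants 1 to 3 and the sweep check could be sketched roughly as below. This is a minimal illustration, not the code in ``tests/stress_chunker.py``: every function name is hypothetical, and the `" OR "` clause separator is an assumption based on the "OR-disjunction" wording above.

```python
# Hypothetical helpers sketching the invariants described in the commit
# message. None of these names come from the actual test file.

def check_url_limit(urls, url_limit):
    """Invariant 1: every sub-request URL fits within the byte budget."""
    return all(len(u.encode("utf-8")) <= url_limit for u in urls)


def check_list_coverage(chunks, original):
    """Invariant 2: the distinct chunks cover the input with no overlap.

    The same chunk may be issued many times (once per combination with
    other dims), so duplicates are collapsed before comparing.
    """
    distinct = list(dict.fromkeys(tuple(c) for c in chunks))
    flat = [item for chunk in distinct for item in chunk]
    no_overlap = len(flat) == len(set(flat))
    nothing_dropped = set(flat) == set(original)
    return no_overlap and nothing_dropped


def check_filter_coverage(filter_chunks, original_clauses, sep=" OR "):
    """Invariant 3: distinct filter chunks, split back into clauses and
    concatenated in iteration order, equal the original clause list.

    The separator is an assumption; the real chunker may join differently.
    """
    distinct = list(dict.fromkeys(filter_chunks))
    rebuilt = [clause for chunk in distinct for clause in chunk.split(sep)]
    return rebuilt == list(original_clauses)


def is_non_increasing(counts):
    """Sweep check: sub-request counts never grow as url_limit grows."""
    return all(a >= b for a, b in zip(counts, counts[1:]))
```

In a runner of this shape, the mocked fetch would record each sub-request's URL and arguments, and these checks would then be applied to the recorded calls for each scenario.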
1 parent f1588ae commit 493e4eb

1 file changed

Lines changed: 430 additions & 0 deletions
