Commit 493e4eb
test(waterdata): Add offline stress test for the joint chunker
Standalone runner (``python3 tests/stress_chunker.py``) that exercises
the chunker across eight scenarios with the URL byte limit lowered well
below the live API's. No live HTTP: the runner mocks ``fetch_once`` and
uses the real ``_construct_api_requests`` for URL sizing.
Per-scenario invariants verified:
1. Every sub-request URL ≤ url_limit (primary correctness).
2. List-dim coverage: the union of distinct chunks issued for each
list dim equals the input with no overlap (no data dropped, no
duplicate fetches of the same atom within its dim).
3. Filter-clause coverage: the distinct filter chunks split back into
clauses, concatenated in iteration order, equal the original
clauses (lossless OR-disjunction).
4. Speedup vs the bail-floor-singleton baseline that the old
two-decorator design would have produced in pathological cases.
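
As a rough illustration of invariants 1 to 3, the checks could look like the sketch below. The helper names (``check_url_limit``, ``check_list_coverage``, ``check_filter_coverage``) and the ``" OR "`` separator are assumptions made for this example; they are not taken from ``tests/stress_chunker.py``. Chunks are de-duplicated first because, in a joint plan, the same list chunk can be paired with several filter chunks.

```python
def check_url_limit(urls, url_limit):
    """Invariant 1: every sub-request URL fits within the byte budget."""
    return all(len(u.encode("utf-8")) <= url_limit for u in urls)


def check_list_coverage(chunks, original):
    """Invariant 2: the distinct chunks partition the input.

    No atom is dropped and no atom appears in two distinct chunks.
    """
    # De-duplicate chunks while keeping first-seen order.
    distinct = list(dict.fromkeys(tuple(c) for c in chunks))
    flat = [atom for chunk in distinct for atom in chunk]
    no_overlap = len(flat) == len(set(flat))
    return no_overlap and set(flat) == set(original)


def check_filter_coverage(filter_chunks, original_clauses, sep=" OR "):
    """Invariant 3: distinct filter chunks rebuild the original clauses.

    Assumes clauses are joined with `sep` and never contain it themselves.
    """
    distinct = list(dict.fromkeys(filter_chunks))
    rebuilt = [clause for chunk in distinct for clause in chunk.split(sep)]
    return rebuilt == list(original_clauses)
```

A check of this shape only inspects what was issued, so it can run against the arguments captured by a mocked fetch function without any network access.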
Plus a greedy-search adaptation check: scanning ``url_limit`` across
1200 → 10000 confirms sub-request count is monotonically non-increasing
as the budget grows (the planner adapts to the limit).
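
The sweep check can be sketched as follows. ``toy_count_requests`` is a hypothetical stand-in for the real planner (it packs atoms into URLs in order, with an assumed fixed base length and one separator byte per atom), and the step size of 400 is likewise an assumption; only the 1200 and 10000 endpoints come from the description above.

```python
def toy_count_requests(atoms, url_limit, base_len=40):
    """Toy stand-in for the planner: pack atoms into URLs greedily, in order."""
    count, used = 1, base_len
    for atom in atoms:
        cost = len(atom) + 1  # the atom plus one separator byte
        if base_len + cost > url_limit:
            raise ValueError("atom does not fit in a URL even on its own")
        if used + cost > url_limit:
            # Current URL is full: start a new sub-request.
            count, used = count + 1, base_len
        used += cost
    return count


def is_monotone_non_increasing(values):
    """True if each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


atoms = [f"USGS-{i:08d}" for i in range(500)]  # made-up site identifiers
limits = range(1200, 10001, 400)               # assumed step; endpoints from the text
counts = [toy_count_requests(atoms, lim) for lim in limits]
assert is_monotone_non_increasing(counts)
```

The property being asserted is that a larger budget never forces more sub-requests, which is the behaviour the sweep is described as confirming for the real planner.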
Scenarios (A-H), plus the sweep (I):
A. Long sites only (pure list chunking)
B. Long filter only (pure filter chunking)
C. Long sites + long filter (joint trade-off — 1000× vs baseline)
D. 3-D list cartesian product (3000× vs baseline)
E. Lopsided clause sizes (worst-case sizing)
F. URL-encoding-heavy clauses (quote_plus inflation)
G. Very tight URL limit (singleton chunks)
H. Generous URL limit (no chunking needed)
I. url_limit sweep proving greedy adaptation
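
To illustrate why scenario F matters, the sketch below measures how much a clause grows under ``urllib.parse.quote_plus``. The clause text is a made-up example and ``encoded_growth`` is a hypothetical helper, not part of the test runner; the point is only that sizing against the raw clause length would underestimate the final URL length.

```python
from urllib.parse import quote_plus


def encoded_growth(clause):
    """Return (raw_length, encoded_length) for a single filter clause."""
    return len(clause), len(quote_plus(clause))


# Hypothetical clause: '=' and quotes each expand to three characters,
# while spaces become '+' and keep their length.
clause = "parameter_code = '00060'"
raw_len, enc_len = encoded_growth(clause)
assert enc_len > raw_len
```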
All 15 chunked calls pass every invariant.
1 parent f1588ae commit 493e4eb
1 file changed
Lines changed: 430 additions & 0 deletions