Skip to content

Commit 3b11ce5

Browse files
thodson-usgsclaude
andcommitted
Fix doc/test wording around longest-vs-shortest clause and align fake build
Three corrections from PR review: - ``RequestTooLarge``'s docstring said the second irreducible case is "any chunkable filter reduced to its smallest top-level OR-clause." The planner actually probes at the inner chunker's bail-floor size, which is bounded below by the LONGEST clause (after URL-encoding), not the shortest. Rewrite the case to describe what the planner actually does. - ``test_plan_chunks_coordinates_with_filter_chunker``'s docstring said the planner models per-sub-request URL as ``worst-dim-chunk + shortest-clause``. Same direction error; corrected to ``longest-clause-after-encoding`` with the rationale (inner chunker's bail floor, not its happy-path output). - ``_fake_build`` test fixture used raw ``len(",".join(...))`` for list params, but the real ``_construct_api_requests`` URL goes through ``quote_plus``. For the all-alphanumeric values these tests use, the gap is 2 bytes per comma — small but enough to let a test pass against the fake while production would have a larger URL. Pull ``quote_plus`` into the fake so its byte count matches what the chunker's ``_request_bytes`` actually measures. No behavior change to the production chunker; 209 waterdata tests pass with no other tunings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 954fbcb commit 3b11ce5

2 files changed

Lines changed: 20 additions & 12 deletions

File tree

dataretrieval/waterdata/chunking.py

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -103,9 +103,10 @@
103103
class RequestTooLarge(ValueError):
104104
"""Raised when a chunked request cannot be issued. Two cases:
105105
(1) URL exceeds the byte limit even with every multi-value param at
106-
a singleton chunk and any chunkable filter reduced to its smallest
107-
top-level OR-clause; (2) the cartesian-product plan would issue more
108-
than ``max_chunks`` sub-requests."""
106+
a singleton chunk and any chunkable filter at the inner chunker's
107+
bail-floor size (the URL contribution of its longest single
108+
OR-clause, after URL-encoding); (2) the cartesian-product plan
109+
would issue more than ``max_chunks`` sub-requests."""
109110

110111

111112
class QuotaExhausted(RuntimeError):

tests/waterdata_test.py

Lines changed: 16 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -240,15 +240,20 @@ def _fake_build(*, base=200, **kwargs):
240240
"""Fake build_request: URL length deterministic in its inputs.
241241
242242
Mirrors the GET-routed shape: payload goes in the URL, body is None.
243-
The chunker's ``_request_bytes`` helper sums url + body, so this
244-
stays equivalent to URL-only sizing for these tests.
243+
List/string values are URL-encoded via ``quote_plus`` so the fake's
244+
byte count matches what the real ``_construct_api_requests`` would
245+
produce; otherwise an alphanumeric test could pass against the fake
246+
but fail in production once values containing ``%``, ``+``, ``/``,
247+
``&`` etc. (which expand under encoding) reach the same code path.
245248
"""
249+
from urllib.parse import quote_plus
250+
246251
bytes_ = base
247252
for v in kwargs.values():
248253
if isinstance(v, (list, tuple)):
249-
bytes_ += len(",".join(map(str, v)))
254+
bytes_ += len(quote_plus(",".join(map(str, v))))
250255
elif isinstance(v, str):
251-
bytes_ += len(v)
256+
bytes_ += len(quote_plus(v))
252257
return _FakeReq("x" * bytes_)
253258

254259

@@ -326,8 +331,10 @@ def test_plan_chunks_coordinates_with_filter_chunker():
326331
With the FULL filter in URL-length probes, singleton-per-dim URL still
327332
exceeds the limit and the planner would raise RequestTooLarge. With
328333
filter-aware probing, the planner models the per-sub-request URL as
329-
``worst-dim-chunk + shortest-clause`` (what the inner filter chunker
330-
will actually emit), sees it fits, and returns a plan.
334+
``worst-dim-chunk + longest-clause-after-encoding`` (the inner filter
335+
chunker's bail floor — it returns the FULL filter if any single
336+
clause exceeds the budget, so the longest clause is the smallest
337+
floor it can guarantee). The probe fits, plan returns.
331338
332339
Sanity-check the *negative*: with filter-aware probing disabled, the
333340
same inputs would raise.
@@ -338,9 +345,9 @@ def test_plan_chunks_coordinates_with_filter_chunker():
338345
"filter": " OR ".join(clauses),
339346
}
340347
# singleton+full-filter ≈ 200 + 10 + 86 = 296 (over limit 240) — would raise.
341-
# max-clause probe model ≈ 200 + 10 + 5 = 215 (under limit) — plan succeeds.
342-
# (Here all clauses are the same length, so min==max; the fix matters for
343-
# filters with lopsided clauses where min < max.)
348+
# longest-clause probe model ≈ 200 + 10 + 5 = 215 (under limit) — plan succeeds.
349+
# (Here all clauses are the same length, so longest == shortest; the
350+
# encoding-ratio coordination matters for lopsided clauses.)
344351
plan = _plan_chunks(args, _fake_build, url_limit=240)
345352
assert plan is not None # coordination prevented the premature raise
346353
assert len(plan["monitoring_location_id"]) > 1 # planner did split

0 commit comments

Comments
 (0)