Skip to content

Commit eddcffc

Browse files
thodson-usgsclaude
andauthored
fix(waterdata): raise RequestTooLarge for an unchunkable over-budget request (#309)
ChunkPlan's "no chunkable axes" branch returned immediately without sizing the request, deliberately leaving an over-budget URL for the server to reject. So a single large CQL-text `filter` with one big `IN (...)` clause (no top-level `OR`, hence no chunk axis) was shipped verbatim and failed with an opaque HTTP 414 — and not even RequestTooLarge. (The equivalent monitoring_location_id=[...] chunks fine.) Size-check the no-axes path: if the single request fits, pass through as before; if it's over budget and in the chunker's domain (cql-text / GET multi-value) there's nothing left to split, raise RequestTooLarge with actionable guidance instead of shipping it. A cql-json filter is NOT in the chunker's domain — chunking splits only cql-text — so an over-budget cql-json request is passed through unchanged regardless of size; the server judges it, not the chunker. This preserves the existing cql-json passthrough contract (tests/waterdata_filters_test.py:: test_cql_json_filter_is_not_chunked). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent dca49a7 commit eddcffc

2 files changed

Lines changed: 46 additions & 9 deletions

File tree

dataretrieval/waterdata/chunking.py

Lines changed: 23 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -871,13 +871,30 @@ def __init__(
871871
self.canonical_url: str | None = None
872872

873873
axes = _extract_axes(args)
874-
# No chunkable axes → skip ``build_request`` entirely; the
875-
# common Water Data call shape shouldn't pay for an unused
876-
# request prep on the passthrough hot path. The fetcher
877-
# will run with the user's args verbatim; if that produces
878-
# an over-budget URL, the server (or httpx itself) rejects.
879874
if not axes:
880-
return
875+
# No chunkable axis: nothing to split. If the single request fits,
876+
# run it verbatim (the common passthrough). ``_safe_request_bytes``
877+
# treats an un-constructable URL (httpx.InvalidURL, > 64 KB) as over
878+
# budget.
879+
if _safe_request_bytes(build_request, args, url_limit) <= url_limit:
880+
return
881+
# Over budget. A filter the chunker doesn't manage — cql-json — is
882+
# passed through unchanged (chunking applies only to cql-text); the
883+
# server, not us, judges it. Otherwise this is an in-domain shape we
884+
# would normally chunk but can't (a single large CQL ``IN`` clause
885+
# with no top-level ``OR``, or one oversized value), so raise an
886+
# actionable error instead of shipping it for an opaque HTTP 414.
887+
filter_expr = args.get("filter")
888+
if filter_expr is not None and not _is_chunkable(
889+
filter_expr, args.get("filter_lang")
890+
):
891+
return
892+
raise RequestTooLarge(
893+
f"Request exceeds {url_limit} bytes (URL + body) and has no "
894+
f"chunkable multi-value argument to split (e.g. a single large "
895+
f"CQL `IN` clause, or one oversized value). Narrow the query, "
896+
f"simplify the filter, or split the call manually."
897+
)
881898

882899
# Constructing the initial request can itself trip
883900
# ``httpx.InvalidURL`` (URL > 64 KB) — that's the canonical

tests/waterdata_chunking_test.py

Lines changed: 23 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -161,10 +161,30 @@ def test_extract_axes_skips_singletons_and_never_chunk_params():
161161

162162

163163
def test_chunk_plan_returns_passthrough_when_no_chunkable_axes():
164-
"""Scalar args with nothing to chunk → passthrough, even at a
165-
URL limit the request technically exceeds (the server may 414,
166-
but ``ChunkPlan`` has nothing to split)."""
164+
"""Scalar args with nothing to chunk and a request within the limit →
165+
passthrough (no axes)."""
167166
args = {"monitoring_location_id": "scalar-only"}
167+
plan = ChunkPlan(args, _fake_build, url_limit=1000)
168+
assert plan.axes == []
169+
assert plan.total == 1
170+
171+
172+
def test_chunk_plan_raises_when_unchunkable_request_exceeds_limit():
173+
"""A request with nothing to chunk that still exceeds the byte limit (e.g.
174+
a single large CQL ``IN`` clause with no top-level ``OR``) raises
175+
RequestTooLarge instead of being shipped for the server to reject with an
176+
opaque HTTP 414."""
177+
args = {"monitoring_location_id": "scalar-only"}
178+
with pytest.raises(RequestTooLarge):
179+
ChunkPlan(args, _fake_build, url_limit=10)
180+
181+
182+
def test_chunk_plan_passes_through_unchunkable_cql_json_over_limit():
183+
"""A cql-json filter is outside the chunker's domain (it splits only
184+
cql-text), so an over-budget cql-json request is passed through unchanged
185+
instead of raising — the server judges it, not us. Guards against the
186+
chunker hijacking the deliberate cql-json passthrough."""
187+
args = {"filter": "a OR b OR c", "filter_lang": "cql-json"}
168188
plan = ChunkPlan(args, _fake_build, url_limit=10)
169189
assert plan.axes == []
170190
assert plan.total == 1

0 commit comments

Comments
 (0)