Commit 9161fbd

thodson-usgs and claude authored
perf(waterdata): emit compact CQL2 JSON to halve POST chunk count (#292)
monitoring-locations is the one service that POSTs a CQL2 body (it doesn't support comma-separated multi-value GET). The body was pretty-printed via json.dumps(indent=4), ~39 B/value, so it counted ~2x against both the server's ~8 KB request-size cap and the chunk planner's byte budget.

The tightest separators (~17 B/value) roughly double how many ids fit per sub-request, halving the chunk count and API requests for large id lists:

    n_ids   indent=4   compact
      500          4         2
     1000          8         4
     5000         32        16

Live check: a 500-id query returns all 500 rows in 2 sub-requests (was 4). The WAF body limit (403) is empirically ~8.2-8.4 KB, so 8000-byte compact bodies stay safely under it.

Locked in with a compactness assertion on the monitoring-locations POST test.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
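The per-value cost quoted above can be checked with a short, self-contained sketch. It mirrors the query shape built by `_cql2_param`, but the property name `monitoring_location_id` and the synthetic ids are assumptions for illustration, so the exact byte counts will differ somewhat from the ~39 B and ~17 B measured on real ids.

```python
import json


def cql2_body(ids, **dumps_kwargs):
    """Serialize a CQL2 "in" filter shaped like the one _cql2_param builds."""
    query = {
        "op": "and",
        "args": [
            # Assumed property name, used here only for illustration.
            {"op": "in", "args": [{"property": "monitoring_location_id"}, ids]}
        ],
    }
    return json.dumps(query, **dumps_kwargs)


# Synthetic ids (hypothetical); real id lengths vary.
ids = [f"USGS-{n:08d}" for n in range(500)]

pretty = cql2_body(ids, indent=4)                # old behaviour
compact = cql2_body(ids, separators=(",", ":"))  # new behaviour

pretty_per_value = len(pretty.encode()) / len(ids)
compact_per_value = len(compact.encode()) / len(ids)

print(f"indent=4: {pretty_per_value:.1f} B/value")
print(f"compact:  {compact_per_value:.1f} B/value")
```

Most of the pretty-printed overhead comes from the newline and deep indentation repeated before every list element, which is why removing whitespace roughly halves the per-value cost.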
1 parent 7f64c2d commit 9161fbd

2 files changed

Lines changed: 19 additions & 2 deletions

dataretrieval/waterdata/utils.py

Lines changed: 12 additions & 2 deletions
@@ -316,15 +316,25 @@ def _cql2_param(args: dict[str, Any]) -> str:
     Returns
     -------
     str
-        JSON string representation of the CQL2 query.
+        Compact JSON string representation of the CQL2 query.
+
+    Notes
+    -----
+    Serialized with the tightest separators (no indentation or
+    whitespace). The body counts against the server's ~8 KB request-size
+    limit and against :func:`chunking._request_bytes` when planning
+    chunks, so every saved byte fits more values per POST: compact
+    encoding roughly halves the per-value cost versus pretty-printing,
+    which roughly doubles how many monitoring-location ids fit in one
+    sub-request and so halves the chunk count for large id lists.
     """
     filters = []
     for key, values in args.items():
         filters.append({"op": "in", "args": [{"property": key}, values]})

     query = {"op": "and", "args": filters}

-    return json.dumps(query, indent=4)
+    return json.dumps(query, separators=(",", ":"))


 def _default_headers():
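To illustrate why a smaller body reduces the chunk count, here is a minimal sketch of a greedy byte-budget planner. It is not the library's chunk planner (`chunking._request_bytes` is not reproduced here): `plan_chunks`, `body_bytes`, the property name, and the synthetic ids are all hypothetical, and only the 8000-byte budget comes from the commit message. The resulting counts depend on id length and on what else the real planner counts, so they need not match the table in the commit message.

```python
import json

BUDGET = 8000  # bytes per POST body, the figure quoted in the commit message


def body_bytes(ids, **dumps_kwargs):
    """Encoded size of a CQL2 "in" filter over ids (assumed property name)."""
    query = {
        "op": "and",
        "args": [
            {"op": "in", "args": [{"property": "monitoring_location_id"}, ids]}
        ],
    }
    return len(json.dumps(query, **dumps_kwargs).encode())


def plan_chunks(ids, budget=BUDGET, **dumps_kwargs):
    """Greedily pack ids into chunks whose serialized body fits the budget."""
    chunks, current = [], []
    for item in ids:
        # Start a new chunk when adding this id would exceed the budget.
        if current and body_bytes(current + [item], **dumps_kwargs) > budget:
            chunks.append(current)
            current = []
        current.append(item)
    if current:
        chunks.append(current)
    return chunks


ids = [f"USGS-{n:08d}" for n in range(500)]  # synthetic ids
pretty_chunks = plan_chunks(ids, indent=4)
compact_chunks = plan_chunks(ids, separators=(",", ":"))
print(len(pretty_chunks), "chunks pretty-printed vs", len(compact_chunks), "compact")
```

Re-serializing on every step is quadratic and only acceptable for a demonstration; a real planner would track the running size incrementally.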

tests/waterdata_test.py

Lines changed: 7 additions & 0 deletions
@@ -157,6 +157,13 @@ def test_construct_api_requests_monitoring_locations_post():
     assert req.method == "POST"
     assert req.headers["Content-Type"] == "application/query-cql-json"

+    # Body is serialized compactly (tight separators, no whitespace): the
+    # body counts against the server's ~8 KB request-size cap and the
+    # chunk planner's byte budget, so pretty-printing would needlessly
+    # halve how many ids fit per sub-request and double the chunk count.
+    raw = req.content.decode()
+    assert "\n" not in raw and ", " not in raw and ": " not in raw
+
     body = json.loads(req.content)
     # Top-level shape: AND over a list of per-param predicates.
     assert body["op"] == "and"
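The substring assertion in this test works as long as no value itself contains `", "` or `": "`. As a possible alternative, and not what the commit does, a round-trip comparison checks compactness without depending on the contents of the values. The helper name `is_compact_json` is hypothetical, and the check assumes the body was produced by `json.dumps` with default settings apart from `separators`.

```python
import json


def is_compact_json(raw: str) -> bool:
    """True if raw equals its own re-serialization with tight separators."""
    return raw == json.dumps(json.loads(raw), separators=(",", ":"))


query = {"op": "and", "args": [{"op": "in", "args": [{"property": "p"}, ["a", "b"]]}]}
compact = json.dumps(query, separators=(",", ":"))
pretty = json.dumps(query, indent=4)

# A value that happens to contain ", " and ": " inside the string itself.
tricky = json.dumps({"k": ["a, b: c"]}, separators=(",", ":"))

print(is_compact_json(compact), is_compact_json(pretty), is_compact_json(tricky))
```

The trade-off is that the round trip parses the body a second time, which is negligible in a unit test.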
