
Commit f85f318

thodson-usgs and claude committed
fix(waterdata): Drop dataclass slots=True for Python 3.9 compat
``slots=True`` for ``@dataclass`` requires Python 3.10. The package declares ``requires-python = ">=3.9"`` and CI tests 3.9, so the import was failing test collection on the 3.9 matrix cell. Dropping the kwarg loses a small memory optimization on short-lived ``_Axis`` instances (not material) and restores compatibility.

Also aligns one residual "sub-chunk" comment to "chunk"; the rest of the file already uses "chunk".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 95bd5f0 commit f85f318
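
For comparison, an alternative to dropping the kwarg is to pass ``slots=True`` only on interpreters that accept it. This is not what the commit does; it is a minimal sketch of a version-gated decorator, and the ``Axis`` class and its fields are hypothetical stand-ins for the real ``_Axis``.

```python
import sys
from dataclasses import FrozenInstanceError, dataclass

# dataclass() only accepts the ``slots`` keyword on Python 3.10 and later,
# so build the keyword arguments conditionally instead of hard-coding them.
_DATACLASS_KWARGS = {"frozen": True}
if sys.version_info >= (3, 10):
    _DATACLASS_KWARGS["slots"] = True


@dataclass(**_DATACLASS_KWARGS)
class Axis:
    """Hypothetical stand-in for the package's ``_Axis`` class."""

    name: str
    values: tuple


axis = Axis(name="monitoring_location_id", values=("USGS-01", "USGS-02"))

try:
    axis.name = "other"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("Axis is immutable on every supported Python version")
```

The gate keeps the memory saving on 3.10+ at the cost of one more code path to test. The commit message judges the saving immaterial for short-lived instances, which is a reasonable argument for the simpler fix.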

2 files changed

Lines changed: 3 additions & 3 deletions

NEWS.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit. A common chained-query pattern — pull a long site list from `get_monitoring_locations`, then feed it into `get_daily` — previously failed with HTTP 414 once the resulting URL grew past the limit; it now fans out across multiple sub-requests under the hood and returns one combined DataFrame. Every multi-value list parameter and the cql-text `filter` (split on its top-level `OR`s) is modeled as a chunkable axis; greedy halving splits the biggest chunk across all axes until each sub-request URL fits. After the first sub-request `ChunkedCall` reads `x-ratelimit-remaining`; if the rest of the plan won't fit the window it raises `RequestExceedsQuota` reporting the deficit. Mid-call transient failures (429 or 5xx) surface as a `ChunkInterrupted` subclass — `QuotaExhausted` for 429, `ServiceInterrupted` for 5xx — carrying the partial result plus a resumable call handle (`exc.call`); call `exc.call.resume()` to continue only the still-pending sub-requests once the underlying condition clears. Mirrors R `dataRetrieval`'s [#870](https://github.com/DOI-USGS/dataRetrieval/pull/870), generalized to N axes. Note one metadata-behavior change for paginated/chunked calls: `BaseMetadata.url` still reflects the user's original query (unchanged), but `BaseMetadata.header` now carries the *last* page/sub-request headers (so `x-ratelimit-remaining` is current) rather than the first, and `BaseMetadata.query_time` is now the cumulative wall-clock across pages instead of the first page's elapsed.
+**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit.

 **05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).

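
The NEWS entry above describes "greedy halving" across chunkable axes only in prose. The sketch below shows one way such a planner could work, under stated assumptions: the endpoint URL, the 8000-character limit, and the names `url_len` and `plan_chunks` are all hypothetical, and the package's actual `chunking.py` may differ in every detail.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://example.invalid/collections/daily/items"  # hypothetical endpoint
URL_LIMIT = 8000  # assumed stand-in for the server's ~8 KB limit


def url_len(params):
    """Length of the URL produced by comma-joining each multi-value parameter."""
    query = urlencode({name: ",".join(values) for name, values in params.items()})
    return len(BASE_URL) + 1 + len(query)  # +1 for the "?" separator


def plan_chunks(axes, limit=URL_LIMIT):
    """Halve the biggest chunk on any axis until every sub-request URL fits."""
    chunks = {name: [list(values)] for name, values in axes.items()}
    names = list(chunks)
    while True:
        # One sub-request is issued per combination of one chunk from each axis.
        worst = max(
            url_len(dict(zip(names, combo))) for combo in product(*chunks.values())
        )
        if worst <= limit:
            return chunks
        # Candidates are chunks that can still be split (more than one value).
        candidates = [
            (len(",".join(chunk)), name, index)
            for name, parts in chunks.items()
            for index, chunk in enumerate(parts)
            if len(chunk) > 1
        ]
        if not candidates:
            raise ValueError("request cannot be split small enough to fit the limit")
        _, name, index = max(candidates)
        chunk = chunks[name][index]
        mid = len(chunk) // 2
        # Replace the chunk in place with its two halves, preserving order.
        chunks[name][index:index + 1] = [chunk[:mid], chunk[mid:]]


sites = [f"USGS-{i:08d}" for i in range(1000)]
plan = plan_chunks({"monitoring_location_id": sites, "parameter_code": ["00060"]})
print(len(plan["monitoring_location_id"]), "site chunks")
```

Re-evaluating every combination on each pass is quadratic in the worst case. That is acceptable for a sketch, but a production planner would likely track sizes incrementally.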
dataretrieval/waterdata/chunking.py

Lines changed: 2 additions & 2 deletions
@@ -452,7 +452,7 @@ def _request_bytes(req: requests.PreparedRequest) -> int:
     return len(req.url) + body_len


-@dataclass(frozen=True, slots=True)
+@dataclass(frozen=True)
 class _Axis:
     """
     A single chunkable axis of one user-level request — a list of
@@ -932,7 +932,7 @@ def _combine_chunk_frames(frames: list[pd.DataFrame]) -> pd.DataFrame:
     Dedup is restricted to rows whose ``id`` is non-null. ``pandas``
     treats NaN==NaN as a duplicate for ``drop_duplicates``, so a
     blanket call would collapse every id-less row into a single one —
-    silent data loss if any sub-chunk emits features without an
+    silent data loss if any chunk emits features without an
     ``id`` field.
     """
     non_empty = [f for f in frames if not f.empty]
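
The docstring in the hunk above explains why deduplication is restricted to rows with a non-null ``id``. The following is a minimal sketch of that idea, not the body of ``_combine_chunk_frames`` (which this diff does not show); the function name ``combine_frames`` and the sample data are hypothetical.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate chunk results, deduplicating only rows that carry an ``id``."""
    non_empty = [f for f in frames if not f.empty]
    if not non_empty:
        return pd.DataFrame()
    combined = pd.concat(non_empty, ignore_index=True)
    if "id" not in combined.columns:
        return combined
    has_id = combined["id"].notna()
    # Dedup only the rows with an id; rows without one pass through untouched.
    deduped = combined[has_id].drop_duplicates(subset="id")
    return (
        pd.concat([deduped, combined[~has_id]])
        .sort_index()  # restore the original row order
        .reset_index(drop=True)
    )


chunk_a = pd.DataFrame({"id": ["a", "b", None], "value": [1, 2, 3]})
chunk_b = pd.DataFrame({"id": ["b", None], "value": [2, 4]})

safe = combine_frames([chunk_a, chunk_b])
blanket = pd.concat([chunk_a, chunk_b]).drop_duplicates(subset="id")
print(len(safe), len(blanket))
```

The blanket call keeps only one of the two id-less rows, which is the silent data loss the docstring warns about. The restricted version keeps both.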
