fix(waterdata): Drop dataclass slots=True for Python 3.9 compat

thodson-usgs · claude · thodson-usgs · commit f85f318fa1f2 · 2026-05-22T23:09:21.000-05:00
``slots=True`` for ``@dataclass`` requires Python 3.10. The package
declares ``requires-python = ">=3.9"`` and CI tests 3.9, so the import
was failing test collection on the 3.9 matrix cell. Dropping the kwarg
loses a small memory optimization on short-lived ``_Axis`` instances
(not material) and restores compatibility.
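
For reference, a minimal sketch of an alternative that keeps the optimization where it is available, assuming a version gate is acceptable. This is not what this commit does (it simply drops the kwarg); ``AxisSketch`` and ``_DATACLASS_KWARGS`` are hypothetical names used only for illustration.

```python
import sys
from dataclasses import dataclass

# Hypothetical helper: build the decorator kwargs once, adding slots=True
# only on interpreters that accept it (Python 3.10 and later).
_DATACLASS_KWARGS = {"frozen": True}
if sys.version_info >= (3, 10):
    _DATACLASS_KWARGS["slots"] = True


@dataclass(**_DATACLASS_KWARGS)
class AxisSketch:
    """Illustrative stand-in for a small frozen value object."""

    name: str
    values: tuple


axis = AxisSketch(name="sites", values=("a", "b"))
```

On 3.9 the class is created without ``__slots__``; on 3.10+ it gets them. The extra branch is arguably not worth it for short-lived instances, which is the trade-off the message above describes.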

Also aligns one residual "sub-chunk" comment to "chunk" — the rest of
the file already uses "chunk".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/NEWS.md b/NEWS.md
@@ -1,4 +1,4 @@
-**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit. A common chained-query pattern — pull a long site list from `get_monitoring_locations`, then feed it into `get_daily` — previously failed with HTTP 414 once the resulting URL grew past the limit; it now fans out across multiple sub-requests under the hood and returns one combined DataFrame. Every multi-value list parameter and the cql-text `filter` (split on its top-level `OR`s) is modeled as a chunkable axis; greedy halving splits the biggest chunk across all axes until each sub-request URL fits. After the first sub-request `ChunkedCall` reads `x-ratelimit-remaining`; if the rest of the plan won't fit the window it raises `RequestExceedsQuota` reporting the deficit. Mid-call transient failures (429 or 5xx) surface as a `ChunkInterrupted` subclass — `QuotaExhausted` for 429, `ServiceInterrupted` for 5xx — carrying the partial result plus a resumable call handle (`exc.call`); call `exc.call.resume()` to continue only the still-pending sub-requests once the underlying condition clears. Mirrors R `dataRetrieval`'s [#870](https://github.com/DOI-USGS/dataRetrieval/pull/870), generalized to N axes. Note one metadata-behavior change for paginated/chunked calls: `BaseMetadata.url` still reflects the user's original query (unchanged), but `BaseMetadata.header` now carries the *last* page/sub-request headers (so `x-ratelimit-remaining` is current) rather than the first, and `BaseMetadata.query_time` is now the cumulative wall-clock across pages instead of the first page's elapsed.
+**05/17/2026:** The OGC `waterdata` getters (`get_daily`, `get_continuous`, `get_field_measurements`, and the rest of the multi-value-capable functions) now transparently chunk requests whose URLs would otherwise exceed the server's ~8 KB byte limit.
 
 **05/16/2026:** Fixed silent truncation in the paginated `waterdata` request loops (`_walk_pages` and `get_stats_data`). Mid-pagination failures (HTTP 429, 5xx, network error) were previously swallowed — pagination would quietly stop and the function would return whatever rows it had collected, leaving callers with truncated DataFrames they had no way to detect. The loops now status-check every page like the initial request and raise `RuntimeError` on any failure, with the upstream exception chained as `__cause__` and a short menu of recovery actions (wait and retry, reduce the request, or obtain an API token) in the message. **Behavior change**: callers that previously consumed partial DataFrames on transient upstream blips will now see an exception; retry the call (possibly with a smaller `limit` or narrower query).
 
diff --git a/dataretrieval/waterdata/chunking.py b/dataretrieval/waterdata/chunking.py
@@ -452,7 +452,7 @@ def _request_bytes(req: requests.PreparedRequest) -> int:
     return len(req.url) + body_len
 
 
-@dataclass(frozen=True, slots=True)
+@dataclass(frozen=True)
 class _Axis:
     """
     A single chunkable axis of one user-level request — a list of
@@ -932,7 +932,7 @@ def _combine_chunk_frames(frames: list[pd.DataFrame]) -> pd.DataFrame:
     Dedup is restricted to rows whose ``id`` is non-null. ``pandas``
     treats NaN==NaN as a duplicate for ``drop_duplicates``, so a
     blanket call would collapse every id-less row into a single one —
-    silent data loss if any sub-chunk emits features without an
+    silent data loss if any chunk emits features without an
     ``id`` field.
     """
     non_empty = [f for f in frames if not f.empty]
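
The docstring in the hunk above describes dedup restricted to rows with a non-null ``id``. A minimal sketch of that idea follows, assuming a frame with an ``id`` column; ``dedup_non_null_ids`` is a hypothetical name and this is not the body of ``_combine_chunk_frames``, which is not shown in the diff.

```python
import pandas as pd


def dedup_non_null_ids(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate ids but keep every row whose id is missing."""
    has_id = df["id"].notna()
    # Deduplicate only the rows that actually carry an id.
    deduped = df[has_id].drop_duplicates(subset="id")
    # Rows without an id are passed through untouched.
    return pd.concat([deduped, df[~has_id]], ignore_index=True)


frame = pd.DataFrame({"id": ["a", "a", None, None], "value": [1, 2, 3, 4]})
result = dedup_non_null_ids(frame)
```

Here ``result`` keeps three rows (one ``"a"`` plus both id-less rows), whereas a blanket ``frame.drop_duplicates(subset="id")`` would keep only two, collapsing the id-less rows into one.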
