
Commit 44c434b

Merge remote-tracking branch 'upstream/main' into fix-paginated-truncation-errors
# Conflicts: # NEWS.md
2 parents 404b8ce + c755f6b commit 44c434b

14 files changed

Lines changed: 1150 additions & 513 deletions

NEWS.md

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 **05/14/2026:** Fixed two latent bugs in the paginated `waterdata` request loop (`_walk_pages` and `get_stats_data`). Previously, when `requests.Session.request(...)` itself raised mid-pagination (network error, timeout), the except block called `_error_body()` on the *prior page's* response, so the logged "error" described the wrong request and could itself crash on non-JSON bodies. Separately, no status-code check was performed on subsequent paginated responses, so a 5xx body that didn't include `numberReturned` was silently treated as an empty page — pagination quietly stopped and the user got truncated data with no error logged. The loop now status-checks each page like the initial request and reports the actual exception. The "best-effort" behavior (return whatever pages were collected) is preserved.

+**05/07/2026:** Bumped the declared minimum Python version from **3.8** to **3.9** (`pyproject.toml`'s `requires-python` and the ruff target). This brings the manifest in line with what was already being tested — CI's matrix has long covered only 3.9, 3.13, and 3.14, the `waterdata` test module already skipped itself on Python < 3.10, and several modules already use 3.9-only stdlib (e.g. `zoneinfo`). Users on 3.8 will no longer be able to install the package; please upgrade.
+
+**05/07/2026:** `waterdata.get_samples()` and `wqp.get_results()` now append a derived `<prefix>DateTime` UTC column for every Date/Time/TimeZone triplet in the response (e.g. `Activity_StartDate` + `Activity_StartTime` + `Activity_StartTimeZone` → `Activity_StartDateTime`). Both the WQX3 (`<X>Date`/`<X>Time`/`<X>TimeZone`) and legacy WQP (`<X>Date`/`<X>Time/Time`/`<X>Time/TimeZoneCode`) shapes are recognized; abbreviations like EST/EDT/CST/PST resolve to a UTC `Timestamp`, unknown codes resolve to `NaT`, and the original triplet columns are preserved. Returned rows are also now sorted by `Activity_StartDateTime` (or the legacy `ActivityStartDateTime`) — the underlying APIs return rows in an unstable order. Mirrors R's `create_dateTime` and end-of-pipeline sort. Closes #266.
+
 **05/06/2026:** Each remaining active function in `dataretrieval.nwis` now emits a per-function `DeprecationWarning` naming the `waterdata` replacement to migrate to (visible the first time users call each getter). The `nwis` module is scheduled for removal on or after **2027-05-06**.

 **05/06/2026:** Added `waterdata.get_ratings(...)` — wraps the new Water Data STAC catalog (`api.waterdata.usgs.gov/stac/v0/search`) for USGS stage-discharge rating curves. Returns parsed `exsa` / `base` / `corr` rating tables as a dict of DataFrames keyed by feature ID, or just the list of available STAC features when `download_and_parse=False`. Mirrors R's `read_waterdata_ratings`.
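The 05/14/2026 entry above describes the fixed loop shape: check the status of every page, log the exception that was actually raised for that request, and still return whatever was collected. Below is a minimal, hypothetical sketch of that shape, not the `waterdata` code itself. `walk_pages_sketch`, the injected `fetch_page` callable (assumed to return `(status_code, payload)` or raise), and the `"next"` payload key are illustrative assumptions chosen so the example runs without a network; the real loop calls `requests.Session.request(...)` and handles the initial request separately, which this sketch does not.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical page fetcher: returns (HTTP status, parsed JSON payload) or raises.
FetchPage = Callable[[str], Tuple[int, Dict[str, Any]]]


def walk_pages_sketch(first_url: str, fetch_page: FetchPage) -> List[Dict[str, Any]]:
    """Collect features across pages on a best-effort basis.

    Every page is status-checked, and a failure is logged using the exception
    or status of *that* request rather than the previous page's response.
    """
    features: List[Dict[str, Any]] = []
    url: Optional[str] = first_url
    while url is not None:
        try:
            status, payload = fetch_page(url)
        except Exception as exc:  # network error, timeout, ...
            # Report the exception actually raised for this URL.
            logger.error("Request for %s failed: %r", url, exc)
            break
        if status >= 400:
            # Without this check, an error body lacking "numberReturned"
            # would look like an empty page and silently end pagination.
            logger.error("Request for %s returned HTTP %d", url, status)
            break
        features.extend(payload.get("features", []))
        if not payload.get("numberReturned"):
            break
        url = payload.get("next")  # hypothetical next-page link
    # Best-effort: return whatever pages were collected before any failure.
    return features
```

Injecting the fetcher keeps the control flow testable: a fake that returns a 503 on page two, or raises a `TimeoutError`, exercises both failure branches without touching the network.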

dataretrieval/nwis.py

Lines changed: 13 additions & 5 deletions
@@ -291,10 +291,13 @@ def get_discharge_peaks(


 def get_gwlevels(**kwargs):
-    """Defunct: use ``waterdata.get_field_measurements()``."""
+    """Defunct: use ``waterdata.get_continuous()``, ``waterdata.get_daily()``,
+    or ``waterdata.get_field_measurements()``."""
     raise NameError(
-        "`nwis.get_gwlevels` has been replaced "
-        "with `waterdata.get_field_measurements()`."
+        "`nwis.get_gwlevels` has been replaced. Use "
+        "`waterdata.get_continuous()` for continuous (typically 15-minute) "
+        "values, `waterdata.get_daily()` for daily values, or "
+        "`waterdata.get_field_measurements()` for discrete/manual readings."
     )

@@ -885,7 +888,8 @@ def get_record(
         - 'site' : site description
         - 'measurements' : (defunct) use `waterdata.get_field_measurements`
         - 'peaks': discharge peaks
-        - 'gwlevels': (defunct) use `waterdata.get_field_measurements`
+        - 'gwlevels': (defunct) use `waterdata.get_continuous`,
+          `waterdata.get_daily`, or `waterdata.get_field_measurements`
         - 'pmcodes': (defunct) use `get_reference_table`
         - 'water_use': (defunct) no replacement available
         - 'ratings': get rating table
@@ -933,7 +937,11 @@ def get_record(

     defunct_replacements = {
         "measurements": "`waterdata.get_field_measurements`",
-        "gwlevels": "`waterdata.get_field_measurements`",
+        "gwlevels": (
+            "`waterdata.get_continuous` (continuous), "
+            "`waterdata.get_daily`, or `waterdata.get_field_measurements` "
+            "(discrete)"
+        ),
         "pmcodes": "`waterdata.get_reference_table`",
         "water_use": "no replacement available",
     }
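The 05/06/2026 NEWS entry says every remaining `nwis` getter now emits its own `DeprecationWarning` naming its `waterdata` replacement. This commit does not show how those warnings are raised, so the following is only a sketch of one common approach using the standard `warnings` module. `deprecated_in_favor_of` and `legacy_daily_getter` are hypothetical names, and pairing that stand-in with `waterdata.get_daily` is purely illustrative.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def deprecated_in_favor_of(replacement: str) -> Callable[[F], F]:
    """Hypothetical decorator: warn that a getter is deprecated, naming its replacement."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"`nwis.{func.__name__}` is deprecated and scheduled for removal "
                f"on or after 2027-05-06; use `{replacement}` instead.",
                DeprecationWarning,
                # Attribute the warning to the caller's line, not this wrapper.
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return cast(F, wrapper)

    return decorator


@deprecated_in_favor_of("waterdata.get_daily")
def legacy_daily_getter(**kwargs: Any) -> dict:
    # Placeholder body standing in for real request logic.
    return dict(kwargs)
```

A decorator keeps each replacement name next to the function it applies to, and `functools.wraps` preserves the wrapped function's name and docstring so `help()` and the warning text still show the original getter name.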

dataretrieval/utils.py

Lines changed: 102 additions & 0 deletions
@@ -94,6 +94,108 @@ def format_datetime(df, date_field, time_field, tz_field):
     return df


+# (time-suffix, tz-suffix) pairs that follow a "<prefix>Date" column.
+_TIME_TZ_SUFFIXES = (
+    # WQX3 / Samples, e.g.
+    # Activity_StartDate / Activity_StartTime / Activity_StartTimeZone
+    ("Time", "TimeZone"),
+    # Legacy WQP (slash-separated), e.g.
+    # ActivityStartDate / ActivityStartTime/Time / ActivityStartTime/TimeZoneCode
+    ("Time/Time", "Time/TimeZoneCode"),
+)
+
+
+def _build_utc_datetime(
+    date_series: pd.Series, time_series: pd.Series, tz_series: pd.Series
+) -> pd.Series:
+    """Combine date + time + tz-abbreviation columns into a UTC pandas Series.
+
+    Unknown timezone codes (and rows missing any of the three values) yield
+    ``NaT``. The input columns are not mutated.
+    """
+    offsets = tz_series.map(tz)
+    combined = (
+        date_series.astype("string")
+        + " "
+        + time_series.astype("string")
+        + " "
+        + offsets.astype("string")
+    )
+    return pd.to_datetime(
+        combined, format="%Y-%m-%d %H:%M:%S %z", utc=True, errors="coerce"
+    )
+
+
+def _attach_datetime_columns(df: pd.DataFrame) -> pd.DataFrame:
+    """Add ``<prefix>DateTime`` UTC columns for any Date/Time/TimeZone triplets
+    and sort the frame by the activity-start datetime.
+
+    Detects two naming patterns that appear in USGS Samples and Water Quality
+    Portal CSV responses:
+
+    * **WQX3** — ``<prefix>Date``, ``<prefix>Time``, ``<prefix>TimeZone``
+    * **Legacy WQP** — ``<prefix>Date``, ``<prefix>Time/Time``,
+      ``<prefix>Time/TimeZoneCode``
+
+    For every triplet present, a new ``<prefix>DateTime`` column is appended
+    holding a UTC ``Timestamp`` (offsets resolved via
+    :data:`dataretrieval.codes.tz`). The original Date/Time/TimeZone columns
+    are left intact, and an existing ``<prefix>DateTime`` column is never
+    overwritten.
+
+    Rows are sorted (and the index reset) by the canonical activity-start
+    datetime when present — ``Activity_StartDateTime`` (WQX3) or
+    ``ActivityStartDateTime`` (legacy WQP) — falling back to the first
+    detected ``*Date`` column. Mirrors R ``dataRetrieval``'s
+    end-of-pipeline sort in ``importWQP.R``.
+
+    Parameters
+    ----------
+    df : ``pandas.DataFrame``
+        DataFrame returned from a Samples or WQP CSV endpoint.
+
+    Returns
+    -------
+    df : ``pandas.DataFrame``
+        A new DataFrame with derivable ``<prefix>DateTime`` columns appended
+        and rows sorted by the activity-start datetime (if any date column
+        was detected).
+    """
+    columns = set(df.columns)
+    new_columns = {}
+    first_date_col = None
+    for col in df.columns:
+        if not col.endswith("Date"):
+            continue
+        if first_date_col is None:
+            first_date_col = col
+        prefix = col.removesuffix("Date")
+        target = prefix + "DateTime"
+        if target in columns or target in new_columns:
+            continue
+        for time_suffix, tz_suffix in _TIME_TZ_SUFFIXES:
+            time_col = prefix + time_suffix
+            tz_col = prefix + tz_suffix
+            if time_col in columns and tz_col in columns:
+                new_columns[target] = _build_utc_datetime(
+                    df[col], df[time_col], df[tz_col]
+                )
+                break
+    if new_columns:
+        # Concat in one shot — per-column assignment on a wide CSV-derived
+        # frame triggers pandas' fragmentation PerformanceWarning.
+        df = pd.concat([df, pd.DataFrame(new_columns, index=df.index)], axis=1)
+    if "Activity_StartDateTime" in df.columns:
+        sort_key = "Activity_StartDateTime"
+    elif "ActivityStartDateTime" in df.columns:
+        sort_key = "ActivityStartDateTime"
+    else:
+        sort_key = first_date_col
+    if sort_key is not None:
+        df = df.sort_values(by=sort_key, ignore_index=True)
+    return df
+
+
 class BaseMetadata:
     """Base class for metadata.

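To see what `_build_utc_datetime` and the sort in `_attach_datetime_columns` do to a small frame, the snippet below re-implements the same recipe outside the package. `TZ_OFFSETS` is a hypothetical four-entry stand-in for `dataretrieval.codes.tz`, assumed to map abbreviations to UTC-offset strings that `%z` can parse; the rows are made up, and only the WQX3 column shape is exercised (the legacy slash-separated shape goes through the same recipe with different column names).

```python
import pandas as pd

# Hypothetical subset standing in for dataretrieval.codes.tz:
# abbreviation -> UTC offset string accepted by the %z directive.
TZ_OFFSETS = {"EST": "-0500", "EDT": "-0400", "CST": "-0600", "PST": "-0800"}


def build_utc_datetime(date_s: pd.Series, time_s: pd.Series, tz_s: pd.Series) -> pd.Series:
    """Same recipe as the diff's _build_utc_datetime, with a local offset table."""
    offsets = tz_s.map(TZ_OFFSETS)  # unknown codes -> NaN
    combined = (
        date_s.astype("string")
        + " "
        + time_s.astype("string")
        + " "
        + offsets.astype("string")
    )
    # A missing piece propagates <NA>; errors="coerce" turns such rows into NaT.
    return pd.to_datetime(combined, format="%Y-%m-%d %H:%M:%S %z", utc=True, errors="coerce")


# Made-up WQX3-shaped rows, deliberately out of chronological order.
df = pd.DataFrame(
    {
        "Activity_StartDate": ["2024-07-01", "2024-01-15", "2024-03-10"],
        "Activity_StartTime": ["09:00:00", "10:30:00", "12:00:00"],
        "Activity_StartTimeZone": ["EDT", "EST", "XYZ"],
    }
)
df["Activity_StartDateTime"] = build_utc_datetime(
    df["Activity_StartDate"], df["Activity_StartTime"], df["Activity_StartTimeZone"]
)
# Sort like the helper does; NaT rows go last by default.
df = df.sort_values(by="Activity_StartDateTime", ignore_index=True)
print(df[["Activity_StartDate", "Activity_StartDateTime"]])
```

The unknown `XYZ` code becomes `NaT` and, with `sort_values`' default `na_position="last"`, sorts to the bottom; `ignore_index=True` resets the index the same way the diff's helper does.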