Commit 630cc8c

thodson-usgs and claude committed
get_ratings: surface RDB header as df.attrs and document the nwis dep

Two non-functional follow-ups suggested during review of #269:

(1) Document the cross-module reach into nwis._read_rdb. Rating files use the same USGS RDB shape as NWIS responses, so the parser is already reusable as-is — no refactor of the legacy nwis module is needed. Added a comment at the import site explaining why the private import is intentional and what to watch for if _read_rdb ever moves.

(2) Surface the RDB #-prefixed header block. Each parsed rating frame now carries provenance in df.attrs:

- df.attrs["comment"]: the list of "#"-prefixed header lines (rating id, parameter, expansion type, last-shifted timestamp, warnings, etc.).
- df.attrs["url"]: the asset URL it was fetched from.

R's read_waterdata_ratings exposes the comment block via comment(df); pandas's standard `attrs` dict is the Python equivalent. Done in ratings.py only — does not touch nwis.

A live spot-check against api.waterdata.usgs.gov on USGS-01104475 exsa shows the 31-line USGS header survives intact (gauge name, parameter code, rating expansion, etc.).

One new unit test pins the behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent e304763 commit 630cc8c
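
As a rough illustration of how a caller might use the provenance described in the commit message, the sketch below searches a list of header lines for a keyword. It uses only the standard library. The `attrs` dict is a hypothetical stand-in for `df.attrs` on a frame returned by `get_ratings`; the header text, the URL, and the helper name `find_header_lines` are all assumptions made for the example, not part of the library.

```python
def find_header_lines(comment: list[str], keyword: str) -> list[str]:
    """Return header lines containing keyword (case-insensitive), '#' stripped."""
    needle = keyword.lower()
    return [
        line.lstrip("#").strip()
        for line in comment
        if needle in line.lower()
    ]


# Hypothetical stand-in for df.attrs; real USGS headers will look different.
attrs = {
    "comment": [
        "# example gauge name",
        "# rating id: 12.0",
        "# parameter: discharge",
    ],
    "url": "https://example.invalid/ratings/sample.rdb",
}

rating_lines = find_header_lines(attrs["comment"], "rating")
parameter_lines = find_header_lines(attrs["comment"], "PARAMETER")
```

Because the header is stored as a plain list of strings, this kind of lookup needs no second read of the RDB file on disk.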

2 files changed

Lines changed: 52 additions & 3 deletions

dataretrieval/waterdata/ratings.py

Lines changed: 27 additions & 3 deletions
@@ -24,6 +24,11 @@
 import pandas as pd
 import requests
 
+# Rating files use the same USGS RDB shape as NWIS responses (comment
+# block prefixed with ``#``, header row, format-spec row, then tab-separated
+# data), so we reuse the parser already in ``nwis``. ``_read_rdb`` is private;
+# if it ever moves or its contract changes we want a loud failure here, hence
+# the explicit import rather than a copy.
 from dataretrieval.nwis import _read_rdb
 
 from .utils import BASE_URL, _default_headers, _format_api_dates
@@ -84,6 +89,18 @@ def _search(
     return response.json().get("features", [])
 
 
+def _extract_rdb_comment(rdb: str) -> list[str]:
+    """Return the RDB ``#``-prefixed comment block as a list of header lines.
+
+    The comment block carries useful per-rating metadata — rating id,
+    parameter description, expansion type, last-shifted timestamp, etc.
+    R's ``read_waterdata_ratings`` exposes this via ``comment(df)``; we
+    attach it to ``df.attrs["comment"]`` so callers can inspect or log
+    provenance without re-reading the on-disk RDB.
+    """
+    return [line for line in rdb.splitlines() if line.startswith("#")]
+
+
 def _download_and_parse(
     feature: dict[str, Any],
     file_path: str,
@@ -100,7 +117,10 @@ def _download_and_parse(
     with open(target, "w") as f:
         f.write(response.text)
 
-    return _read_rdb(response.text)
+    df = _read_rdb(response.text)
+    df.attrs["comment"] = _extract_rdb_comment(response.text)
+    df.attrs["url"] = url
+    return df
 
 
 def get_ratings(
@@ -168,8 +188,12 @@ def get_ratings(
     dict[str, pandas.DataFrame] or list[dict]
         When ``download_and_parse=True`` (the default), a dict keyed by
         feature ID (e.g. ``"USGS-01104475.exsa.rdb"``) mapping to a parsed
-        ``DataFrame``. When ``download_and_parse=False``, the raw list of
-        STAC feature dicts as returned by the search endpoint.
+        ``DataFrame``. Each frame carries provenance in
+        ``df.attrs["comment"]`` (the RDB ``#``-prefixed header lines, like
+        rating id, parameter, last-shifted timestamp) and
+        ``df.attrs["url"]`` (the asset URL it was fetched from). When
+        ``download_and_parse=False``, the raw list of STAC feature dicts
+        as returned by the search endpoint.
 
     Raises
     ------
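
The RDB layout named in the import-site comment (a `#`-prefixed comment block, a header row, a format-spec row, then tab-separated data) can be sketched with the standard library alone. This is not `nwis._read_rdb`, whose internals are not shown in this commit; `parse_rdb_sketch` is a hypothetical helper. In the sample text, the two comment lines match the ones asserted in the unit test, while the column names, values, and URL are assumptions made for the example.

```python
import csv

# Hypothetical sample: only the two comment lines come from the unit test;
# the column names and values are made up for illustration.
SAMPLE_RDB = (
    "# header line one\n"
    "# header line two\n"
    "INDEP\tDEP\n"
    "16N\t16N\n"
    "1.00\t10.5\n"
    "2.00\t42.0\n"
)


def parse_rdb_sketch(text):
    """Split RDB text into (comment lines, list of row dicts)."""
    comments, body = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            comments.append(line)      # provenance header
        elif line.strip():
            body.append(line)          # header row, format row, data rows
    reader = csv.reader(body, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return comments, []            # no tabular section at all
    next(reader, None)                 # skip the format-spec row
    rows = [dict(zip(header, record)) for record in reader]
    return comments, rows


comments, rows = parse_rdb_sketch(SAMPLE_RDB)
# Mirrors the two keys the commit stores in df.attrs (URL is a placeholder).
provenance = {"comment": comments, "url": "https://example.invalid/sample.rdb"}
```

The sketch leaves every value as a string. A real parser would presumably use the format-spec row to pick column types, which is one reason to reuse an existing parser rather than copy it.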

tests/waterdata_ratings_test.py

Lines changed: 25 additions & 0 deletions
@@ -100,6 +100,31 @@ def test_get_ratings_mocked_search_and_download(requests_mock, tmp_path):
     assert "monitoring_location_id IN ('USGS-01104475')" in qs["filter"][0]
 
 
+def test_get_ratings_attaches_rdb_comment_and_url(requests_mock, tmp_path):
+    """Each parsed frame should carry its RDB header + source URL in df.attrs."""
+    requests_mock.get(
+        "https://api.waterdata.usgs.gov/stac/v0/search",
+        json=_stub_search_response(),
+    )
+    asset_url = (
+        "https://api.waterdata.usgs.gov/stac-files/ratings/USGS.01104475.exsa.rdb"
+    )
+    requests_mock.get(asset_url, text=_SAMPLE_RDB)
+
+    out = get_ratings(
+        monitoring_location_id="USGS-01104475",
+        file_type="exsa",
+        file_path=str(tmp_path),
+    )
+    df = out["USGS-01104475.exsa.rdb"]
+    # The fixture has two `# ...` lines at the top; both should land in attrs.
+    assert df.attrs["comment"] == [
+        "# header line one",
+        "# header line two",
+    ]
+    assert df.attrs["url"] == asset_url
+
+
 def test_get_ratings_download_and_parse_false_returns_features(requests_mock):
     requests_mock.get(
         "https://api.waterdata.usgs.gov/stac/v0/search",

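
The new test covers the common case of a comment block at the top of the file. The sketch below probes a few edge cases of the same `startswith("#")` filter; it is an illustration, not part of the test suite. `extract_comment` is a local copy of the one-line filter from the diff, and the input text is made up for the example.

```python
def extract_comment(rdb: str) -> list[str]:
    # Same one-line filter as _extract_rdb_comment in the diff above.
    return [line for line in rdb.splitlines() if line.startswith("#")]


# Made-up input with Windows line endings, an indented '#', and a
# '#' line that appears after the data rows.
tricky = (
    "# top of file\r\n"
    "  # indented, so not matched\r\n"
    "A\tB\r\n"
    "5s\t5s\r\n"
    "1\t2\r\n"
    "# note after the data\r\n"
)

result = extract_comment(tricky)
```

Three things follow from how the filter is written. `splitlines()` removes `\r\n` endings cleanly. A `#` preceded by whitespace is skipped. A `#` line anywhere in the file is kept, not only those in the leading block. Whether real rating files ever contain the last two cases is not something this commit states.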