Commit 20daa00

thodson-usgs and claude committed
feat(waterdata): add waterdata.xarray module returning CF datasets
Add dataretrieval.waterdata.xarray, optional-dependency wrappers that mirror the Water Data time-series getters but return CF-conventions xarray.Dataset objects instead of bare DataFrames.

- Ragged (CF contiguous ragged array) layout by default; pass dense=True for the NaN-filled (monitoring_location_id, time) grid with one named variable per parameter.
- CF metadata is derived from columns the getters already return (unit_of_measure -> units, statistic_id -> cell_methods, parameter_code -> standard_name/vertical_datum), plus a cached parameter-name lookup; sites carry cf_role=timeseries_id with lon/lat.
- Coverage: get_daily, get_continuous, get_latest_continuous, get_latest_daily, get_nearest_continuous, get_peaks, get_field_measurements, get_samples, and preliminary get_stats_por / get_stats_date_range.
- xarray is an optional extra (pip install dataretrieval[xarray]); the core package never imports it.

Hash-valued ID columns are dropped inside the xarray builders, so the plain getters are left untouched. CF vocabulary maps live in waterdata.types (xarray-free, plain data).

Adds a demo notebook + docs entry and offline converter unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
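To make the two layouts named in the commit message concrete, here is a minimal sketch in plain Python (no xarray) of a contiguous ragged layout versus a NaN-filled dense grid. It is not the module's implementation: the `records` sample data and the helper names `to_ragged` and `to_dense` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical long-format records: (monitoring_location_id, time, value).
records = [
    ("USGS-01", "2024-01-01", 10.0),
    ("USGS-01", "2024-01-02", 11.0),
    ("USGS-01", "2024-01-03", 12.0),
    ("USGS-02", "2024-01-02", 5.0),
]


def to_ragged(records):
    """Contiguous ragged layout: flat observation arrays plus a row count per site."""
    by_site = defaultdict(list)
    for site, time, value in records:
        by_site[site].append((time, value))
    sites = sorted(by_site)
    times, values, row_size = [], [], []
    for site in sites:
        obs = sorted(by_site[site])       # keep each site's block in time order
        row_size.append(len(obs))         # number of observations in this block
        times.extend(t for t, _ in obs)
        values.extend(v for _, v in obs)
    return {"site": sites, "row_size": row_size, "time": times, "value": values}


def to_dense(records):
    """Dense layout: a (site, time) grid with NaN wherever a site has no value."""
    sites = sorted({s for s, _, _ in records})
    times = sorted({t for _, t, _ in records})
    lookup = {(s, t): v for s, t, v in records}
    grid = [[lookup.get((s, t), math.nan) for t in times] for s in sites]
    return {"site": sites, "time": times, "value": grid}


ragged = to_ragged(records)
dense = to_dense(records)
```

The ragged form stores only the four real observations and records how many belong to each site, while the dense form pads the second site with NaN so that every site shares one time axis.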
1 parent ad4e980 commit 20daa00

7 files changed

Lines changed: 2348 additions & 0 deletions

dataretrieval/waterdata/types.py

Lines changed: 59 additions & 0 deletions
@@ -74,3 +74,62 @@
         "count",
     ],
 }
+
+
+# --- CF / xarray vocabulary mappings ---------------------------------------
+# Lookup tables used by :mod:`dataretrieval.waterdata.xarray` to translate
+# USGS terms into CF-conventions metadata. Each is intentionally partial:
+# anything not listed falls back to a sensible default (raw unit string kept
+# verbatim; no standard_name emitted) rather than guessing a wrong CF term.
+# They are plain data, so they live here rather than in the (xarray-optional)
+# converter module and can be extended without importing xarray.
+
+# USGS unit strings -> UDUNITS / CF-canonical form.
+CF_UNIT_MAP = {
+    "ft^3/s": "ft3 s-1",
+    "ft3/s": "ft3 s-1",
+    "ft": "ft",
+    "in": "in",
+    "degC": "degC",
+    "deg C": "degC",
+    "uS/cm": "uS/cm",
+    "mg/l": "mg L-1",
+    "mg/L": "mg L-1",
+    # UDUNITS 'ton' is the US short ton; 'short_ton' is not a valid UDUNITS name.
+    "tons/day": "ton day-1",
+    "%": "percent",
+}
+
+# USGS statistic_id -> the operator in a CF ``cell_methods`` string.
+CF_CELL_METHODS = {
+    "00001": "maximum",
+    "00002": "minimum",
+    "00003": "mean",
+    "00006": "sum",
+    "00008": "median",
+    "00011": "point",  # instantaneous
+}
+
+# USGS 5-digit parameter code -> CF standard_name. Deliberately conservative;
+# codes without a confident match are left without a standard_name.
+CF_STANDARD_NAMES = {
+    "00060": "water_volume_transport_in_river_channel",
+    # 00010 (water temperature) is intentionally omitted: ``water_temperature``
+    # is NOT a CF standard name, and the only valid CF water-temperature name,
+    # ``sea_water_temperature``, is wrong-domain for USGS freshwater/groundwater.
+    # Leaving it unmapped keeps the variable's ``long_name`` without emitting an
+    # invalid or misleading ``standard_name``.
+    "00065": "water_surface_height_above_reference_datum",
+    "63160": "water_surface_height_above_reference_datum",
+    "00045": "lwe_thickness_of_precipitation_amount",
+}
+
+# USGS parameter code -> vertical reference datum, attached as a
+# ``vertical_datum`` attribute. The two water-surface-height parameters share
+# the CF standard_name water_surface_height_above_reference_datum, so the datum
+# distinguishes them: gage height (00065) is measured from a local site (gage)
+# datum, while stream water level (63160) is referenced to NAVD88.
+CF_VERTICAL_DATUM = {
+    "00065": "local site datum",
+    "63160": "NAVD88",
+}
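
The comments in the diff describe a fall-back policy: unknown units are kept verbatim, and unknown parameter codes get no `standard_name`. The sketch below shows one way a converter might apply these tables under that policy. It is illustrative only: the function name `cf_attrs` is hypothetical, the tables are trimmed copies so the block is self-contained, and the `"time: <operator>"` form of `cell_methods` assumes the statistic is taken over a `time` dimension, which the diff does not show.

```python
# Trimmed copies of the tables from the diff, so this sketch runs on its own.
CF_UNIT_MAP = {"ft^3/s": "ft3 s-1", "mg/l": "mg L-1", "%": "percent"}
CF_CELL_METHODS = {"00003": "mean", "00011": "point"}
CF_STANDARD_NAMES = {
    "00060": "water_volume_transport_in_river_channel",
    "00065": "water_surface_height_above_reference_datum",
}
CF_VERTICAL_DATUM = {"00065": "local site datum", "63160": "NAVD88"}


def cf_attrs(parameter_code, unit_of_measure, statistic_id=None):
    """Build CF-style attributes for one variable (hypothetical helper).

    Unknown units pass through unchanged; unknown codes simply omit the
    corresponding attribute instead of guessing a CF term.
    """
    attrs = {"units": CF_UNIT_MAP.get(unit_of_measure, unit_of_measure)}
    if parameter_code in CF_STANDARD_NAMES:
        attrs["standard_name"] = CF_STANDARD_NAMES[parameter_code]
    if parameter_code in CF_VERTICAL_DATUM:
        attrs["vertical_datum"] = CF_VERTICAL_DATUM[parameter_code]
    if statistic_id in CF_CELL_METHODS:
        # Assumption: the statistic applies along a dimension named "time".
        attrs["cell_methods"] = f"time: {CF_CELL_METHODS[statistic_id]}"
    return attrs


discharge = cf_attrs("00060", "ft^3/s", "00003")
temperature = cf_attrs("00010", "degC")  # neither the unit nor the code is mapped here
gage_height = cf_attrs("00065", "ft")
```

Because `00010` has no entry in `CF_STANDARD_NAMES`, the temperature variable ends up with only a `units` attribute, which matches the intent stated in the diff of leaving it without an invalid or misleading `standard_name`.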