Commit 047a53e

thodson-usgs and claude authored
refactor(waterdata)!: snake_case get_samples params (camelCase kept via shim) (#331)
* refactor(waterdata)!: snake_case get_samples params (camelCase kept via shim)

  Standardize the modern waterdata getter surface on snake_case parameter names ahead of the breaking 1.2.0 release. `get_samples` and `get_samples_summary` were the last getters exposing the Samples API's native camelCase param names; every other OGC getter already uses snake_case.

  The function still sends the Samples API its native camelCase query parameters: a module-level `_SAMPLES_PARAM_TO_API` dict maps each public snake_case parameter to its camelCase wire name just before the request is built (mirroring how the OGC getters map e.g. `skipGeometry`/`bbox`).

  Mappings follow `get_monitoring_locations`: `stateFips`->`state_code`, `countyFips`->`county_code`, `countryFips`->`country_code`, `boundingBox`->`bbox`, `monitoringLocationIdentifier`->`monitoring_location_id`; the rest are snake_cased (`usgsPCode`->`usgs_pcode`, `hydrologicUnit`->`hydrologic_unit`, etc.). Docstrings now document each snake_case parameter and note its underlying Samples-API camelCase name.

  A new generic, testable `_accept_legacy_kwargs(mapping)` decorator (dataretrieval/waterdata/utils.py) lets both getters still accept the old camelCase names, translating them to the new snake_case params and emitting a DeprecationWarning that names the replacement. Existing callers (including the demo notebooks, left untouched) keep working with a warning.

  BREAKING CHANGE: `get_samples` / `get_samples_summary` parameters are now snake_case. The old camelCase names still work but emit a DeprecationWarning and will be removed in a future release.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd

* test(waterdata): cover backward compat of every legacy camelCase get_samples kwarg

  The existing tests check a couple of deprecated camelCase params end-to-end. Add a single unit test that iterates the whole `_SAMPLES_LEGACY_KWARGS` mapping and asserts, for every legacy name, that it is still accepted, emits a `DeprecationWarning` naming the snake_case replacement, is renamed to that param, and round-trips to the same Samples-API wire name it always used, so every existing camelCase call site keeps producing an identical request. A future param renamed without a legacy alias now fails this test.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd

* test(waterdata): legacy camelCase get_samples returns identical to snake_case

  Adds an end-to-end equivalence test: passing every renamed parameter as its legacy camelCase name produces a byte-identical request URL AND an identical DataFrame to the snake_case call (verified for all 21 renamed params at once, offline via pytest-httpx). This is the strongest backward-compat guarantee: the camelCase shim changes nothing the caller observes but the parameter names. It complements the existing per-name mapping test and the warning-path tests.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
  Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
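
The commit message describes `_accept_legacy_kwargs(mapping)` but the diff shown here does not include its body (it lives in dataretrieval/waterdata/utils.py). As a rough illustration only, a decorator with that behavior might look like the sketch below. The names `accept_legacy_kwargs` and `fetch` are hypothetical, and raising `TypeError` when both spellings are passed is an assumption, not something the commit states.

```python
import functools
import warnings


def accept_legacy_kwargs(mapping):
    """Hypothetical stand-in for ``_accept_legacy_kwargs``.

    ``mapping`` maps each deprecated keyword name to its replacement.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old_name, new_name in mapping.items():
                if old_name not in kwargs:
                    continue
                if new_name in kwargs:
                    # Assumption: passing both spellings is treated as an error.
                    raise TypeError(
                        f"{func.__name__}() got both {old_name!r} and {new_name!r}"
                    )
                # Warn, naming the replacement, then rename the keyword.
                warnings.warn(
                    f"{old_name!r} is deprecated; use {new_name!r} instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@accept_legacy_kwargs({"usgsPCode": "usgs_pcode"})
def fetch(usgs_pcode=None):
    # Toy getter that simply echoes the argument it received.
    return usgs_pcode
```

Calling `fetch(usgsPCode="00400")` in this sketch returns the same value as `fetch(usgs_pcode="00400")` and additionally emits one `DeprecationWarning` that mentions `usgs_pcode`.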
1 parent af9b238 commit 047a53e

3 files changed

Lines changed: 352 additions & 96 deletions

dataretrieval/waterdata/api.py

Lines changed: 129 additions & 75 deletions
@@ -36,6 +36,7 @@
     _OUTPUT_ID_BY_SERVICE,
     GEOPANDAS,
     SAMPLES_URL,
+    _accept_legacy_kwargs,
     _as_str_list,
     _check_profiles,
     _construct_cql_request,
@@ -2259,32 +2260,70 @@ def _get_samples_csv(
     return df, response
 
 
+# Map the public snake_case ``get_samples`` parameters to the camelCase query
+# parameter names the Samples API expects on the wire. ``characteristic`` is
+# already snake_case-compatible (single word) and is sent unchanged. The
+# remaining snake_case params are bookkeeping (``service``/``profile``/
+# ``ssl_check``) and never reach the request.
+_SAMPLES_PARAM_TO_API = {
+    "activity_media_name": "activityMediaName",
+    "activity_start_date_lower": "activityStartDateLower",
+    "activity_start_date_upper": "activityStartDateUpper",
+    "activity_type_code": "activityTypeCode",
+    "characteristic_group": "characteristicGroup",
+    "characteristic_user_supplied": "characteristicUserSupplied",
+    "bbox": "boundingBox",
+    "country_code": "countryFips",
+    "state_code": "stateFips",
+    "county_code": "countyFips",
+    "site_type_code": "siteTypeCode",
+    "site_type_name": "siteTypeName",
+    "usgs_pcode": "usgsPCode",
+    "hydrologic_unit": "hydrologicUnit",
+    "monitoring_location_id": "monitoringLocationIdentifier",
+    "organization_id": "organizationIdentifier",
+    "point_location_latitude": "pointLocationLatitude",
+    "point_location_longitude": "pointLocationLongitude",
+    "point_location_within_miles": "pointLocationWithinMiles",
+    "project_id": "projectIdentifier",
+    "record_identifier_user_supplied": "recordIdentifierUserSupplied",
+}
+
+# Deprecated camelCase keyword names (the Samples-API spelling) accepted for
+# backward compatibility, mapped to the new snake_case parameter names. Derived
+# from ``_SAMPLES_PARAM_TO_API`` so the two never drift apart.
+_SAMPLES_LEGACY_KWARGS = {
+    api_name: py_name for py_name, api_name in _SAMPLES_PARAM_TO_API.items()
+}
+
+
+@_accept_legacy_kwargs(_SAMPLES_LEGACY_KWARGS)
 def get_samples(
     ssl_check: bool = True,
     service: SERVICES = "results",
     profile: PROFILES = "fullphyschem",
-    activityMediaName: str | Iterable[str] | None = None,
-    activityStartDateLower: str | None = None,
-    activityStartDateUpper: str | None = None,
-    activityTypeCode: str | Iterable[str] | None = None,
-    characteristicGroup: str | Iterable[str] | None = None,
+    activity_media_name: str | Iterable[str] | None = None,
+    activity_start_date_lower: str | None = None,
+    activity_start_date_upper: str | None = None,
+    activity_type_code: str | Iterable[str] | None = None,
+    characteristic_group: str | Iterable[str] | None = None,
     characteristic: str | Iterable[str] | None = None,
-    characteristicUserSupplied: str | Iterable[str] | None = None,
-    boundingBox: list[float] | None = None,
-    countryFips: str | Iterable[str] | None = None,
-    stateFips: str | Iterable[str] | None = None,
-    countyFips: str | Iterable[str] | None = None,
-    siteTypeCode: str | Iterable[str] | None = None,
-    siteTypeName: str | Iterable[str] | None = None,
-    usgsPCode: str | Iterable[str] | None = None,
-    hydrologicUnit: str | Iterable[str] | None = None,
-    monitoringLocationIdentifier: str | Iterable[str] | None = None,
-    organizationIdentifier: str | Iterable[str] | None = None,
-    pointLocationLatitude: float | None = None,
-    pointLocationLongitude: float | None = None,
-    pointLocationWithinMiles: float | None = None,
-    projectIdentifier: str | Iterable[str] | None = None,
-    recordIdentifierUserSupplied: str | Iterable[str] | None = None,
+    characteristic_user_supplied: str | Iterable[str] | None = None,
+    bbox: list[float] | None = None,
+    country_code: str | Iterable[str] | None = None,
+    state_code: str | Iterable[str] | None = None,
+    county_code: str | Iterable[str] | None = None,
+    site_type_code: str | Iterable[str] | None = None,
+    site_type_name: str | Iterable[str] | None = None,
+    usgs_pcode: str | Iterable[str] | None = None,
+    hydrologic_unit: str | Iterable[str] | None = None,
+    monitoring_location_id: str | Iterable[str] | None = None,
+    organization_id: str | Iterable[str] | None = None,
+    point_location_latitude: float | None = None,
+    point_location_longitude: float | None = None,
+    point_location_within_miles: float | None = None,
+    project_id: str | Iterable[str] | None = None,
+    record_identifier_user_supplied: str | Iterable[str] | None = None,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Search Samples database for USGS water quality data.
     This is a wrapper function for the Samples database API. All potential
@@ -2320,35 +2359,38 @@ def get_samples(
         "actgroup", "count"
         projects - "project", "projectmonitoringlocationweight"
         organizations - "organization", "count"
-    activityMediaName : string or iterable of strings, optional
+    activity_media_name : string or iterable of strings, optional
         Name or code indicating environmental medium in which sample was taken.
         Call ``get_codes("samplemedia")`` for the valid inputs.
-        Example: "Water".
-    activityStartDateLower : string, optional
+        Example: "Water". (Samples API: ``activityMediaName``)
+    activity_start_date_lower : string, optional
         The start date if using a date range. Takes the format YYYY-MM-DD.
         The logic is inclusive, i.e. it will also return results that
         match the date. If left as None, will pull all data on or before
-        activityStartDateUpper, if populated.
-    activityStartDateUpper : string, optional
+        ``activity_start_date_upper``, if populated.
+        (Samples API: ``activityStartDateLower``)
+    activity_start_date_upper : string, optional
         The end date if using a date range. Takes the format YYYY-MM-DD.
         The logic is inclusive, i.e. it will also return results that
         match the date. If left as None, will pull all data after
-        activityStartDateLower up to the most recent available results.
-    activityTypeCode : string or iterable of strings, optional
+        ``activity_start_date_lower`` up to the most recent available results.
+        (Samples API: ``activityStartDateUpper``)
+    activity_type_code : string or iterable of strings, optional
         Text code that describes type of field activity performed.
-        Example: "Sample-Routine, regular".
-    characteristicGroup : string or iterable of strings, optional
+        Example: "Sample-Routine, regular". (Samples API: ``activityTypeCode``)
+    characteristic_group : string or iterable of strings, optional
         Characteristic group is a broad category of characteristics
         describing one or more results. Call ``get_codes("characteristicgroup")``
         for the valid inputs.
-        Example: "Organics, PFAS"
+        Example: "Organics, PFAS" (Samples API: ``characteristicGroup``)
     characteristic : string or iterable of strings, optional
         Characteristic is a specific category describing one or more results.
         Call ``get_codes("characteristics")`` for the valid inputs.
-        Example: "Suspended Sediment Discharge"
-    characteristicUserSupplied : string or iterable of strings, optional
+        Example: "Suspended Sediment Discharge" (Samples API: ``characteristic``)
+    characteristic_user_supplied : string or iterable of strings, optional
         A user supplied characteristic name describing one or more results.
-    boundingBox: list of four floats, optional
+        (Samples API: ``characteristicUserSupplied``)
+    bbox : list of four floats, optional
         Filters on the associated monitoring location's point location
         by checking if it is located within the specified geographic area.
         The logic is inclusive, i.e. it will include locations that overlap
@@ -2361,55 +2403,63 @@ def get_samples(
         * Eastern-most longitude
         * Northern-most latitude
 
-        Example: [-92.8,44.2,-88.9,46.0]
-    countryFips : string or iterable of strings, optional
-        Example: "US" (United States)
-    stateFips : string or iterable of strings, optional
+        Example: [-92.8,44.2,-88.9,46.0] (Samples API: ``boundingBox``)
+    country_code : string or iterable of strings, optional
+        Example: "US" (United States) (Samples API: ``countryFips``)
+    state_code : string or iterable of strings, optional
         Call ``get_codes("states")`` for the valid inputs.
-        Example: "US:15" (United States: Hawaii)
-    countyFips : string or iterable of strings, optional
+        Example: "US:15" (United States: Hawaii) (Samples API: ``stateFips``)
+    county_code : string or iterable of strings, optional
         Call ``get_codes("counties")`` for the valid inputs.
         Example: "US:15:001" (United States: Hawaii, Hawaii County)
-    siteTypeCode : string or iterable of strings, optional
+        (Samples API: ``countyFips``)
+    site_type_code : string or iterable of strings, optional
         An abbreviation for a certain site type. Call ``get_codes("sitetype")``
         for the valid inputs.
-        Example: "GW" (Groundwater site)
-    siteTypeName : string or iterable of strings, optional
+        Example: "GW" (Groundwater site) (Samples API: ``siteTypeCode``)
+    site_type_name : string or iterable of strings, optional
         A full name for a certain site type. Call ``get_codes("sitetype")``
         for the valid inputs.
-        Example: "Well"
-    usgsPCode : string or iterable of strings, optional
+        Example: "Well" (Samples API: ``siteTypeName``)
+    usgs_pcode : string or iterable of strings, optional
         5-digit number used in the US Geological Survey computerized
         data system, National Water Information System (NWIS), to
         uniquely identify a specific constituent (the ``parameterCode`` column
         of ``get_codes("characteristics")``).
         Example: "00060" (Discharge, cubic feet per second)
-    hydrologicUnit : string or iterable of strings, optional
+        (Samples API: ``usgsPCode``)
+    hydrologic_unit : string or iterable of strings, optional
         Max 12-digit number used to describe a hydrologic unit.
-        Example: "070900020502"
-    monitoringLocationIdentifier : string or iterable of strings, optional
+        Example: "070900020502" (Samples API: ``hydrologicUnit``)
+    monitoring_location_id : string or iterable of strings, optional
         A monitoring location identifier has two parts: the agency code
         and the location number, separated by a dash (-).
         Example: "USGS-040851385"
-    organizationIdentifier : string or iterable of strings, optional
+        (Samples API: ``monitoringLocationIdentifier``)
+    organization_id : string or iterable of strings, optional
         Designator used to uniquely identify a specific organization.
         Currently only accepting the organization "USGS".
-    pointLocationLatitude : float, optional
+        (Samples API: ``organizationIdentifier``)
+    point_location_latitude : float, optional
         Latitude for a point/radius query (decimal degrees). Must be used
-        with pointLocationLongitude and pointLocationWithinMiles.
-    pointLocationLongitude : float, optional
+        with ``point_location_longitude`` and ``point_location_within_miles``.
+        (Samples API: ``pointLocationLatitude``)
+    point_location_longitude : float, optional
         Longitude for a point/radius query (decimal degrees). Must be used
-        with pointLocationLatitude and pointLocationWithinMiles.
-    pointLocationWithinMiles : float, optional
+        with ``point_location_latitude`` and ``point_location_within_miles``.
+        (Samples API: ``pointLocationLongitude``)
+    point_location_within_miles : float, optional
         Radius for a point/radius query. Must be used with
-        pointLocationLatitude and pointLocationLongitude
-    projectIdentifier : string or iterable of strings, optional
+        ``point_location_latitude`` and ``point_location_longitude``.
+        (Samples API: ``pointLocationWithinMiles``)
+    project_id : string or iterable of strings, optional
         Designator used to uniquely identify a data collection project. Project
         identifiers are specific to an organization (e.g. USGS).
-        Example: "ZH003QW03"
-    recordIdentifierUserSupplied : string or iterable of strings, optional
+        Example: "ZH003QW03" (Samples API: ``projectIdentifier``)
+    record_identifier_user_supplied : string or iterable of strings, optional
         Internal AQS record identifier that returns 1 entry. Only available
         for the "results" service.
+        (Samples API: ``recordIdentifierUserSupplied``)
 
     Returns
     -------
@@ -2432,34 +2482,37 @@ def get_samples(
 
         >>> # Get PFAS results within a bounding box
         >>> df, md = dataretrieval.waterdata.get_samples(
-        ...     boundingBox=[-90.2, 42.6, -88.7, 43.2],
-        ...     characteristicGroup="Organics, PFAS",
+        ...     bbox=[-90.2, 42.6, -88.7, 43.2],
+        ...     characteristic_group="Organics, PFAS",
         ... )
 
         >>> # Get all activities for the Commonwealth of Virginia over a date range
         >>> df, md = dataretrieval.waterdata.get_samples(
         ...     service="activities",
         ...     profile="sampact",
-        ...     activityStartDateLower="2023-10-01",
-        ...     activityStartDateUpper="2024-01-01",
-        ...     stateFips="US:51",
+        ...     activity_start_date_lower="2023-10-01",
+        ...     activity_start_date_upper="2024-01-01",
+        ...     state_code="US:51",
         ... )
 
         >>> # Get all pH samples for two sites in Utah
         >>> df, md = dataretrieval.waterdata.get_samples(
-        ...     monitoringLocationIdentifier=[
+        ...     monitoring_location_id=[
         ...         "USGS-393147111462301",
         ...         "USGS-393343111454101",
         ...     ],
-        ...     usgsPCode="00400",
+        ...     usgs_pcode="00400",
         ... )
 
     """
 
     _check_profiles(service, profile)
 
-    # Build argument dictionary, omitting None values
-    params = _get_args(locals(), exclude={"ssl_check", "profile"})
+    # Build argument dictionary, omitting None values. Parameters are the
+    # public snake_case names here; translate them to the camelCase names the
+    # Samples API expects just before building the request.
+    args = _get_args(locals(), exclude={"ssl_check", "profile"})
+    params = {_SAMPLES_PARAM_TO_API.get(key, key): value for key, value in args.items()}
 
     params.update({"mimeType": "text/csv"})
 
@@ -2474,8 +2527,9 @@ def get_samples(
     return df, BaseMetadata(response)
 
 
+@_accept_legacy_kwargs({"monitoringLocationIdentifier": "monitoring_location_id"})
 def get_samples_summary(
-    monitoringLocationIdentifier: str,
+    monitoring_location_id: str,
     ssl_check: bool = True,
 ) -> tuple[pd.DataFrame, BaseMetadata]:
     """Get a summary of discrete water-quality samples at a single monitoring location.
@@ -2493,13 +2547,13 @@ def get_samples_summary(
 
     Parameters
     ----------
-    monitoringLocationIdentifier : string
+    monitoring_location_id : string
         A monitoring location identifier has two parts, separated by a dash
         (``-``): the agency code and the location number. Examples:
         ``"USGS-040851385"``, ``"AZ014-320821110580701"``,
         ``"CAX01-15304600"``. Bare location numbers without an agency prefix
         are accepted by the service but return an empty result, so a prefix
-        is effectively required.
+        is effectively required. (Samples API: ``monitoringLocationIdentifier``)
     ssl_check : bool, optional
         Check the SSL certificate. Default is True.
 
@@ -2516,18 +2570,18 @@ def get_samples_summary(
 
         >>> # What discrete-sample data is available at this site?
         >>> df, md = dataretrieval.waterdata.get_samples_summary(
-        ...     monitoringLocationIdentifier="USGS-04074950"
+        ...     monitoring_location_id="USGS-04074950"
         ... )
 
     """
-    if not isinstance(monitoringLocationIdentifier, str):
+    if not isinstance(monitoring_location_id, str):
         raise TypeError(
-            "monitoringLocationIdentifier must be a string; the Samples "
+            "monitoring_location_id must be a string; the Samples "
             "summary service accepts exactly one monitoring location per "
-            f"request, got {type(monitoringLocationIdentifier).__name__}."
+            f"request, got {type(monitoring_location_id).__name__}."
         )
 
-    url = f"{SAMPLES_URL}/summary/{quote(monitoringLocationIdentifier, safe='')}"
+    url = f"{SAMPLES_URL}/summary/{quote(monitoring_location_id, safe='')}"
     params = {"mimeType": "text/csv"}
 
     df, response = _get_samples_csv(url, params, ssl_check)
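
The translation step this diff adds to `get_samples` (public snake_case names renamed to camelCase wire names, unmapped names passed through unchanged) can be tried in isolation. The sketch below copies three of the 21 mapping entries from the diff. The helper name `to_wire_params` and the non-underscored constant names are hypothetical and exist only for this illustration.

```python
# Excerpt of the mapping from the diff (three of the 21 entries).
SAMPLES_PARAM_TO_API = {
    "bbox": "boundingBox",
    "state_code": "stateFips",
    "usgs_pcode": "usgsPCode",
}

# Inverse view, derived the same way the diff builds _SAMPLES_LEGACY_KWARGS,
# so the legacy names cannot drift from the wire names.
SAMPLES_LEGACY_KWARGS = {api: py for py, api in SAMPLES_PARAM_TO_API.items()}


def to_wire_params(args):
    """Rename snake_case keys to wire names; unmapped keys pass through."""
    return {SAMPLES_PARAM_TO_API.get(key, key): value for key, value in args.items()}


# ``characteristic`` has no mapping entry, so it is sent unchanged.
params = to_wire_params({"characteristic": "pH", "state_code": "US:51"})
print(params)  # {'characteristic': 'pH', 'stateFips': 'US:51'}
```

Using `.get(key, key)` rather than a plain lookup is what lets single-word parameters such as `characteristic` stay out of the mapping table.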
