Skip to content

Commit 459f391

Browse files
thodson-usgsclaude
andcommitted
refactor(waterdata): module coherence — unify id extraction and empty-frame idiom
Two behavior-preserving coherence fixes in the OGC response path (live get_daily/get_continuous/get_monitoring_locations bit-for-bit unchanged; 184 tests + README/time-conventions doctests pass): - `_get_resp_data`'s geopandas branch now extracts `id` the same way as its own non-geopandas branch — `[f.get("id") for f in features]` — instead of a second full `pd.json_normalize(features)["id"]`. Drops a redundant whole-feature normalize pass and is None-safe (the json_normalize form KeyErrors on id-less features); the two halves of the function now match. - Add `_empty_frame(geopd)` and call it at the four sites that spelled out `gpd.GeoDataFrame() if geopd else pd.DataFrame()`, centralizing the "empty page, preserve the geo/plain type" invariant that was folklore repeated across the OGC and stats parsers. Deferred as a coherence backlog (out of scope for this bugfix PR — mostly public-API or behavior changes): unifying error handling across the samples/codes/ratings direct-httpx callers (raw raise_for_status vs typed _raise_for_non_200), the ssl_check exposure asymmetry, get_codes returning a bare DataFrame, get_stats_data's inline finalizer vs the OGC closure, the samples GET-tail duplication, and assorted api.py docstring/naming drift. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent a37cb5c commit 459f391

1 file changed

Lines changed: 20 additions & 9 deletions

File tree

dataretrieval/waterdata/utils.py

Lines changed: 20 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -722,6 +722,17 @@ def _next_req_url(
722722
return None
723723

724724

725+
def _empty_frame(geopd: bool) -> pd.DataFrame:
726+
"""Empty result frame that preserves the result type.
727+
728+
Returns a ``GeoDataFrame`` when geopandas is installed (so a later
729+
``pd.concat`` with a real geo page keeps geometry/CRS) and a plain
730+
``DataFrame`` otherwise. Centralizes the "empty page, keep the type"
731+
invariant the OGC and stats parsers both rely on.
732+
"""
733+
return gpd.GeoDataFrame() if geopd else pd.DataFrame()
734+
735+
725736
def _get_resp_data(
726737
resp: httpx.Response,
727738
geopd: bool,
@@ -768,11 +779,9 @@ def _get_resp_data(
768779
if body is None:
769780
body = resp.json()
770781
if not body.get("numberReturned"):
771-
# Preserve the GeoDataFrame type on empty short-circuit so a
772-
# downstream ``pd.concat([empty_page, geo_page])`` doesn't
773-
# downgrade the geopd-installed user's result to a plain
774-
# DataFrame (stripping geometry/CRS).
775-
return gpd.GeoDataFrame() if geopd else pd.DataFrame()
782+
# No rows on this page — short-circuit with a type-preserving
783+
# empty frame (see ``_empty_frame``).
784+
return _empty_frame(geopd)
776785

777786
# Defensive: a 200 with ``numberReturned > 0`` but missing
778787
# ``features`` is a real schema-drift shape (mirrors the guard in
@@ -781,7 +790,7 @@ def _get_resp_data(
781790
# transient transport error to ``_paginate``'s exception handler.
782791
features = body.get("features") or []
783792
if not features:
784-
return gpd.GeoDataFrame() if geopd else pd.DataFrame()
793+
return _empty_frame(geopd)
785794

786795
if not geopd:
787796
df = pd.json_normalize([f.get("properties") or {} for f in features], sep="_")
@@ -798,7 +807,9 @@ def _get_resp_data(
798807

799808
# Organize json into geodataframe and make sure id column comes along.
800809
df = gpd.GeoDataFrame.from_features(features)
801-
df["id"] = pd.json_normalize(features)["id"].values
810+
# Same id extraction as the non-geopandas branch above: None-safe and
811+
# avoids a second full ``json_normalize`` pass over every feature.
812+
df["id"] = [f.get("id") for f in features]
802813
df = df[["id"] + [col for col in df.columns if col != "id"]]
803814

804815
# If no geometry present, then return pandas dataframe. A geodataframe
@@ -1361,7 +1372,7 @@ def _handle_stats_nesting(
13611372
fields like ``geometry.type`` from leaking into the result.
13621373
"""
13631374
if body is None:
1364-
return gpd.GeoDataFrame() if geopd else pd.DataFrame()
1375+
return _empty_frame(geopd)
13651376

13661377
# An empty (or missing) features list — a real mid-pagination
13671378
# shape — would otherwise crash the downstream merge with
@@ -1372,7 +1383,7 @@ def _handle_stats_nesting(
13721383
# plain DataFrame and strip geometry/CRS.
13731384
features = body.get("features") or []
13741385
if not features:
1375-
return gpd.GeoDataFrame() if geopd else pd.DataFrame()
1386+
return _empty_frame(geopd)
13761387

13771388
# The geopd-missing warning is emitted once at import (see top of module);
13781389
# doing it here would log per page.

0 commit comments

Comments
 (0)