fix(wqp): preserve leading zeros on code columns (HUCs, parameter codes, FIPS) #311

Merged
thodson-usgs merged 1 commit into DOI-USGS:main from thodson-usgs:fix/wqp-preserve-leading-zeros
Jun 1, 2026

@thodson-usgs

Collaborator

Problem

All nine WQP getters read the response with a bare `pd.read_csv(StringIO(text), delimiter=",", low_memory=False)`, which infers code columns as int/float and silently drops their significant leading zeros:

USGS parameter code  "00060"     -> 60
HUC8                 "07090002"  -> 7090002
FIPS / qualifier codes           -> numeric, zeros lost

R dataRetrieval reads these as character. (The NWIS RDB path is unaffected — it pins site_no/parm_cd to str already.)
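The inference behaviour described above can be reproduced outside the library. The snippet below is a minimal sketch using only pandas; the variable names are illustrative and not taken from the codebase.

```python
from io import StringIO

import pandas as pd

text = (
    "Location_HUCEightDigitCode,USGSpcode,ResultMeasureValue\n"
    "07090002,00060,1.5\n"
)

# Bare read: pandas infers the code columns as integers, so the zeros vanish.
inferred = pd.read_csv(StringIO(text), delimiter=",", low_memory=False)

# Reading everything as text keeps the codes exactly as they were sent.
as_text = pd.read_csv(StringIO(text), delimiter=",", dtype=str)

huc_inferred = inferred["Location_HUCEightDigitCode"].iloc[0]
huc_text = as_text["Location_HUCEightDigitCode"].iloc[0]
pcode_inferred = inferred["USGSpcode"].iloc[0]
pcode_text = as_text["USGSpcode"].iloc[0]
```

Reading every column as text would also turn `ResultMeasureValue` into a string, which is why a blanket `dtype=str` alone is not a sufficient fix.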

Fix

Add a `_read_wqp_csv` helper (used by all nine read sites). It reads the header, then re-reads the response with `dtype=str` for any column whose name marks it as a code or identifier: the name ends with "code", or contains "identifier", "huc", or "fips". This covers both the legacy and WQX3.0 column schemas while leaving value columns (e.g. `ResultMeasureValue`) numeric.

Verification

csv = "Location_HUCEightDigitCode,USGSpcode,ResultMeasureValue\n07090002,00060,1.5\n"
_read_wqp_csv(csv)
#   HUC8 -> "07090002"   (was np.int64(7090002))
#   pcode-> "00060"      (was np.int64(60))
#   ResultMeasureValue -> 1.5 (float, unchanged)

Added a regression test; the full wqp suite (15) passes — df.shape/df.size and the derived *DateTime columns are unchanged by the dtype shift. ruff clean.

Note: the committed WQX3 fixture (wqp3_results.txt) was itself generated post-corruption (its HUC cell is already 7090002), so the regression test uses a constructed row that actually carries a leading zero.
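To illustrate why the existing fixture cannot exercise the fix, the sketch below compares a fixture-like cell (zero already lost) with a constructed cell. The CSV strings are made up for illustration and are not copied from `wqp3_results.txt`.

```python
from io import StringIO

import pandas as pd

column = "Location_HUCEightDigitCode"
fixture_like = f"{column}\n7090002\n"  # leading zero already gone upstream
constructed = f"{column}\n07090002\n"  # row that still carries the zero


def first_cell(text, **kwargs):
    """Read a one-column CSV and return its first data cell."""
    return pd.read_csv(StringIO(text), **kwargs)[column].iloc[0]


# The fixture-like row carries the same digits with or without dtype=str.
fixture_inferred = str(first_cell(fixture_like))
fixture_as_str = first_cell(fixture_like, dtype=str)

# Only the constructed row distinguishes the two read strategies.
constructed_inferred = str(first_cell(constructed))
constructed_as_str = first_cell(constructed, dtype=str)
```

Because `dtype=str` cannot restore digits that were never in the text, a test built on the fixture would confirm only the column's type, not that a leading zero survives.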

🤖 Generated with Claude Code

…es, FIPS)

The nine WQP getters read responses with a bare
`pd.read_csv(StringIO(text), delimiter=",", low_memory=False)`, which infers
code columns as int/float and silently drops their significant leading zeros:
a USGS parameter code "00060" became 60, HUC8 "07090002" became 7090002.
(R dataRetrieval reads these as character.)

Add a `_read_wqp_csv` helper that reads the header, then re-reads with
`dtype=str` for any column whose name is a code/identifier (ends with "code",
or contains "identifier"/"huc"/"fips") — covering both the legacy and WQX3.0
column schemas — while leaving value columns numeric. All nine read sites use
it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thodson-usgs force-pushed the fix/wqp-preserve-leading-zeros branch from b4961e0 to ed85222 (May 31, 2026 22:10)
thodson-usgs marked this pull request as ready for review (June 1, 2026 03:20)
thodson-usgs merged commit 316af70 into DOI-USGS:main (Jun 1, 2026)
8 checks passed
thodson-usgs deleted the fix/wqp-preserve-leading-zeros branch (June 1, 2026 03:20)