Commit df90403

thodson-usgs and claude committed
docs(userguide): rewrite timeconventions onto the Water Data API
Replace the NWIS get_record examples (one of which used the decommissioned gwlevels service) with waterdata.get_continuous / get_daily, and update the guide to the Water Data API datetime model: time is a column (not the index), tz-aware datetime64[us, UTC] for continuous data and tz-naive dates for daily, demonstrated with the .dt.tz_convert idiom. Example output captured from live calls. Drops the NWIS-only index / PR#58 notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c450c40 commit df90403

1 file changed

Lines changed: 60 additions & 64 deletions

docs/source/userguide/timeconventions.rst
@@ -3,78 +3,74 @@
 Datetime Information
 --------------------
 
-``dataretrieval`` attempts to normalize time data to UTC time when converting
-web service data into dataframes. To do this, in-built pandas functions are
-used; either :obj:`pandas.to_datetime()` during the initial datetime object
-conversion, or :obj:`pandas.DataFrame.tz_localize()` if the datetime objects
-exist but are not UTC-localized. In most cases (single-site and multi-site),
-``dataretrieval`` assigns the datetime information as the dataframe *index*,
-the exception to this is when incomplete datetime information is available, in
-these cases integers are used as the dataframe index (see `PR#58`_ for more
-details).
-
-.. _PR#58: https://github.com/DOI-USGS/dataretrieval-python/pull/58
+``dataretrieval`` normalizes time data to UTC when converting Water Data API
+responses into dataframes. Timestamps are returned in the ``time`` column (the
+dataframe itself uses a default integer index). For sub-daily data — such as
+continuous (instantaneous) values — ``time`` is a timezone-aware
+``datetime64[us, UTC]`` column. Daily values represent a whole calendar day,
+so their ``time`` column is timezone-naive (dates only).
 
 
 Inspecting Timestamps
 *********************
 
-For single sites, the index of the returned dataframe contains pandas
-timestamps.
+For continuous data, the ``time`` column holds UTC-localized pandas timestamps.
+
+.. code:: python
+
+    >>> from dataretrieval import waterdata
+    >>> df, md = waterdata.get_continuous(
+    ... monitoring_location_id="USGS-05427718",
+    ... parameter_code="00060",
+    ... time="2024-03-01/2024-03-02",
+    ... )
+    >>> df["time"].head()
+    0 2024-03-01 00:00:00+00:00
+    1 2024-03-01 00:15:00+00:00
+    2 2024-03-01 00:30:00+00:00
+    3 2024-03-01 00:45:00+00:00
+    4 2024-03-01 01:00:00+00:00
+    Name: time, dtype: datetime64[us, UTC]
+
+Each timestamp has the format ``YYYY-MM-DD HH:MM:SS+HH:MM``. Because the values
+are localized to UTC, the offset (``+HH:MM``) is ``+00:00``. You can convert
+them to a local timezone of your choosing with the pandas ``.dt`` accessor.
 
 .. code:: python
 
-    >>> import dataretrieval.nwis as nwis
-    >>> site = '03339000'
-    >>> df = nwis.get_record(sites=site, service='peaks',
-    ... start='2015-01-01', end='2017-12-31')
-    >>> print(df)
-    agency_cd site_no peak_tm peak_va peak_cd gage_ht gage_ht_cd year_last_pk ag_dt ag_tm ag_gage_ht ag_gage_ht_cd
-    datetime
-    2015-06-08 00:00:00+00:00 USGS 03339000 17:30 25100 C 22.83 NaN NaN NaN NaN NaN NaN
-    2015-12-29 00:00:00+00:00 USGS 03339000 18:45 37600 C 26.66 NaN NaN NaN NaN NaN NaN
-    2017-05-05 00:00:00+00:00 USGS 03339000 04:45 17000 C 18.47 NaN NaN NaN NaN NaN NaN
-
-Here the index of the dataframe ``df`` is a set of datetime objects. Each has
-the format, ``YYYY-MM-DD HH:MM:SS+HH:MM``. Because these timestamps are
-localized to be in UTC, the expected offset (``+HH:MM``) is ``+00:00``.
-These values can be converted to a local timezone of your choosing using
-:obj:`pandas` functionality.
+    >>> df["time"] = df["time"].dt.tz_convert("America/New_York")
+    >>> df["time"].head()
+    0 2024-02-29 19:00:00-05:00
+    1 2024-02-29 19:15:00-05:00
+    2 2024-02-29 19:30:00-05:00
+    3 2024-02-29 19:45:00-05:00
+    4 2024-02-29 20:00:00-05:00
+    Name: time, dtype: datetime64[us, America/New_York]
+
+After conversion the timestamps carry New York's offset — ``-05:00`` during
+standard time, or ``-04:00`` during daylight saving time, since New York is 4
+or 5 hours behind UTC depending on the time of year. Note that the first
+midnight-UTC reading rolls back to the previous calendar day (``2024-02-29``)
+once shifted into New York time.
+
+
+Daily values
+************
+
+Daily data summarize a whole calendar day, so the ``time`` column is
+timezone-naive — no offset is applied.
 
 .. code:: python
 
-    >>> df.index = df.index.tz_convert(tz='America/New_York')
-    >>> print(df)
-    agency_cd site_no peak_tm peak_va peak_cd gage_ht gage_ht_cd year_last_pk ag_dt ag_tm ag_gage_ht ag_gage_ht_cd
-    datetime
-    2015-06-07 20:00:00-04:00 USGS 03339000 17:30 25100 C 22.83 NaN NaN NaN NaN NaN NaN
-    2015-12-28 19:00:00-05:00 USGS 03339000 18:45 37600 C 26.66 NaN NaN NaN NaN NaN NaN
-    2017-05-04 20:00:00-04:00 USGS 03339000 04:45 17000 C 18.47 NaN NaN NaN NaN NaN NaN
-
-Above, the index was converted to localize the timestamps to New York.
-In the updated dataframe index, the resulting timestamps now have offsets of
-``-04:00`` and ``-05:00`` as New York is either 4 or 5 hours behind UTC
-depending on the time of year (due to daylight savings).
-
-When information for multiple sites is requested, ``dataretrieval`` creates a
-dataframe with a multi-index, with the first entry containing the site number,
-and the second containing the datetime information.
-
-.. doctest::
-
-    >>> import dataretrieval.nwis as nwis
-    >>> sites = ['180049066381200', '290000095192602']
-    >>> df = nwis.get_record(sites=sites, service='gwlevels',
-    ... start='2021-10-01', end='2022-01-01')
-    >>> df
-    agency_cd site_tp_cd lev_dt lev_tm lev_tz_cd ... lev_dt_acy_cd lev_acy_cd lev_src_cd lev_meth_cd lev_age_cd
-    site_no datetime ...
-    180049066381200 2021-10-04 19:54:00+00:00 USGS GW 2021-10-04 19:54 +0000 ... m NaN S S A
-    2021-11-16 14:28:00+00:00 USGS GW 2021-11-16 14:28 +0000 ... m NaN S S A
-    2021-12-09 10:43:00+00:00 USGS GW 2021-12-09 10:43 +0000 ... m NaN S S A
-    290000095192602 2021-12-08 19:07:00+00:00 USGS GW 2021-12-08 19:07 +0000 ... m NaN S S P
-    <BLANKLINE>
-    [4 rows x 15 columns]
-
-Here note that the default datetime index information returned is also UTC
-localized, and therefore the offset values are ``+00:00``.
+    >>> df, md = waterdata.get_daily(
+    ... monitoring_location_id="USGS-05427718",
+    ... parameter_code="00060",
+    ... time="2024-03-01/2024-03-05",
+    ... )
+    >>> df["time"].head()
+    0 2024-03-01
+    1 2024-03-02
+    2 2024-03-03
+    3 2024-03-04
+    4 2024-03-05
+    Name: time, dtype: datetime64[us]