docs(userguide): rewrite timeconventions onto the Water Data API

thodson-usgs · claude · thodson-usgs · commit df904037b857 · 2026-05-26T13:53:08.000-05:00
Replace the NWIS get_record examples (one of which used the decommissioned gwlevels service) with waterdata.get_continuous / get_daily, and update the guide to the Water Data API datetime model: time is a column (not the index), tz-aware datetime64[us, UTC] for continuous data and tz-naive dates for daily, demonstrated with the .dt.tz_convert idiom. Example output was captured from live calls. Drop the NWIS-only index / PR#58 notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/docs/source/userguide/timeconventions.rst b/docs/source/userguide/timeconventions.rst
@@ -3,78 +3,74 @@
 Datetime Information
 --------------------
 
-``dataretrieval`` attempts to normalize time data to UTC time when converting
-web service data into dataframes. To do this, in-built pandas functions are
-used; either :obj:`pandas.to_datetime()` during the initial datetime object
-conversion, or :obj:`pandas.DataFrame.tz_localize()` if the datetime objects
-exist but are not UTC-localized. In most cases (single-site and multi-site),
-``dataretrieval`` assigns the datetime information as the dataframe *index*,
-the exception to this is when incomplete datetime information is available, in
-these cases integers are used as the dataframe index (see `PR#58`_ for more
-details).
-
-.. _PR#58: https://github.com/DOI-USGS/dataretrieval-python/pull/58
+``dataretrieval`` normalizes time data to UTC when converting Water Data API
+responses into dataframes. Timestamps are returned in the ``time`` column (the
+dataframe itself uses a default integer index). For sub-daily data — such as
+continuous (instantaneous) values — ``time`` is a timezone-aware
+``datetime64[us, UTC]`` column. Daily values represent a whole calendar day,
+so their ``time`` column is timezone-naive (dates only).
 
 
 Inspecting Timestamps
 *********************
 
-For single sites, the index of the returned dataframe contains pandas
-timestamps.
+For continuous data, the ``time`` column holds UTC-localized pandas timestamps.
+
+.. code:: python
+
+    >>> from dataretrieval import waterdata
+    >>> df, md = waterdata.get_continuous(
+    ...     monitoring_location_id="USGS-05427718",
+    ...     parameter_code="00060",
+    ...     time="2024-03-01/2024-03-02",
+    ... )
+    >>> df["time"].head()
+    0   2024-03-01 00:00:00+00:00
+    1   2024-03-01 00:15:00+00:00
+    2   2024-03-01 00:30:00+00:00
+    3   2024-03-01 00:45:00+00:00
+    4   2024-03-01 01:00:00+00:00
+    Name: time, dtype: datetime64[us, UTC]
+
+Each timestamp has the format ``YYYY-MM-DD HH:MM:SS+HH:MM``. Because the values
+are localized to UTC, the offset (``+HH:MM``) is ``+00:00``. You can convert
+them to a local timezone of your choosing with the pandas ``.dt`` accessor.
 
 .. code:: python
 
-    >>> import dataretrieval.nwis as nwis
-    >>> site = '03339000'
-    >>> df = nwis.get_record(sites=site, service='peaks',
-    ...                      start='2015-01-01', end='2017-12-31')
-    >>> print(df)
-                              agency_cd   site_no peak_tm  peak_va peak_cd  gage_ht  gage_ht_cd  year_last_pk  ag_dt  ag_tm  ag_gage_ht  ag_gage_ht_cd
-    datetime
-    2015-06-08 00:00:00+00:00      USGS  03339000   17:30    25100       C    22.83         NaN           NaN    NaN    NaN         NaN            NaN
-    2015-12-29 00:00:00+00:00      USGS  03339000   18:45    37600       C    26.66         NaN           NaN    NaN    NaN         NaN            NaN
-    2017-05-05 00:00:00+00:00      USGS  03339000   04:45    17000       C    18.47         NaN           NaN    NaN    NaN         NaN            NaN
-
-Here the index of the dataframe ``df`` is a set of datetime objects. Each has
-the format, ``YYYY-MM-DD HH:MM:SS+HH:MM``. Because these timestamps are
-localized to be in UTC, the expected offset (``+HH:MM``) is ``+00:00``.
-These values can be converted to a local timezone of your choosing using
-:obj:`pandas` functionality.
+    >>> df["time"] = df["time"].dt.tz_convert("America/New_York")
+    >>> df["time"].head()
+    0   2024-02-29 19:00:00-05:00
+    1   2024-02-29 19:15:00-05:00
+    2   2024-02-29 19:30:00-05:00
+    3   2024-02-29 19:45:00-05:00
+    4   2024-02-29 20:00:00-05:00
+    Name: time, dtype: datetime64[us, America/New_York]
+
+After conversion the timestamps carry New York's UTC offset: ``-05:00`` during
+standard time and ``-04:00`` during daylight saving time. The example above
+falls in standard time, so every value shows ``-05:00``. Note that the first
+midnight-UTC reading rolls back to the previous calendar day (``2024-02-29``)
+once shifted into New York time.
+
+
+Daily values
+************
+
+Daily data summarize a whole calendar day, so the ``time`` column is
+timezone-naive — no offset is applied.
 
 .. code:: python
 
-    >>> df.index = df.index.tz_convert(tz='America/New_York')
-    >>> print(df)
-                              agency_cd   site_no peak_tm  peak_va peak_cd  gage_ht  gage_ht_cd  year_last_pk  ag_dt  ag_tm  ag_gage_ht  ag_gage_ht_cd
-    datetime
-    2015-06-07 20:00:00-04:00      USGS  03339000   17:30    25100       C    22.83         NaN           NaN    NaN    NaN         NaN            NaN
-    2015-12-28 19:00:00-05:00      USGS  03339000   18:45    37600       C    26.66         NaN           NaN    NaN    NaN         NaN            NaN
-    2017-05-04 20:00:00-04:00      USGS  03339000   04:45    17000       C    18.47         NaN           NaN    NaN    NaN         NaN            NaN
-
-Above, the index was converted to localize the timestamps to New York.
-In the updated dataframe index, the resulting timestamps now have offsets of
-``-04:00`` and ``-05:00`` as New York is either 4 or 5 hours behind UTC
-depending on the time of year (due to daylight savings).
-
-When information for multiple sites is requested, ``dataretrieval`` creates a
-dataframe with a multi-index, with the first entry containing the site number,
-and the second containing the datetime information.
-
-.. doctest::
-
-    >>> import dataretrieval.nwis as nwis
-    >>> sites = ['180049066381200', '290000095192602']
-    >>> df = nwis.get_record(sites=sites, service='gwlevels',
-    ...                      start='2021-10-01', end='2022-01-01')
-    >>> df
-                                              agency_cd site_tp_cd      lev_dt lev_tm lev_tz_cd  ...  lev_dt_acy_cd  lev_acy_cd  lev_src_cd  lev_meth_cd lev_age_cd
-    site_no         datetime                                                                     ...
-    180049066381200 2021-10-04 19:54:00+00:00      USGS         GW  2021-10-04  19:54     +0000  ...              m         NaN           S            S          A
-                    2021-11-16 14:28:00+00:00      USGS         GW  2021-11-16  14:28     +0000  ...              m         NaN           S            S          A
-                    2021-12-09 10:43:00+00:00      USGS         GW  2021-12-09  10:43     +0000  ...              m         NaN           S            S          A
-    290000095192602 2021-12-08 19:07:00+00:00      USGS         GW  2021-12-08  19:07     +0000  ...              m         NaN           S            S          P
-    <BLANKLINE>
-    [4 rows x 15 columns]
-
-Here note that the default datetime index information returned is also UTC
-localized, and therefore the offset values are ``+00:00``.
+    >>> df, md = waterdata.get_daily(
+    ...     monitoring_location_id="USGS-05427718",
+    ...     parameter_code="00060",
+    ...     time="2024-03-01/2024-03-05",
+    ... )
+    >>> df["time"].head()
+    0   2024-03-01
+    1   2024-03-02
+    2   2024-03-03
+    3   2024-03-04
+    4   2024-03-05
+    Name: time, dtype: datetime64[us]
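
The timezone behaviour this guide describes can be checked offline with plain pandas. The sketch below is not part of the patch and makes no Water Data API call: the two frames are synthetic stand-ins for ``get_continuous`` and ``get_daily`` results, and ``to_local`` is a hypothetical helper, not a ``dataretrieval`` function.

```python
import pandas as pd

# Synthetic stand-in for a continuous-data response (assumption: a tz-aware
# UTC "time" column, as the guide describes). The timestamps straddle the
# 2024 US daylight saving change, which took effect at 07:00 UTC on March 10.
continuous = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2024-03-01 00:00",
                "2024-03-10 06:45",
                "2024-03-10 07:00",
                "2024-07-01 00:00",
            ],
            utc=True,
        ),
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# Synthetic stand-in for a daily-values response (assumption: tz-naive dates).
daily = pd.DataFrame(
    {"time": pd.to_datetime(["2024-03-01", "2024-03-02"]), "value": [1.5, 2.5]}
)


def to_local(times: pd.Series, tz: str) -> pd.Series:
    """Hypothetical helper: convert tz-aware timestamps, pass naive ones through."""
    if times.dt.tz is None:
        # Daily values carry no offset, so there is nothing to convert.
        return times
    return times.dt.tz_convert(tz)


local = to_local(continuous["time"], "America/New_York")

# UTC offset of each converted timestamp, in hours.
offsets_hours = [ts.utcoffset().total_seconds() / 3600 for ts in local]

print(local)
print(offsets_hours)  # -5.0 before the change, -4.0 after it
print(to_local(daily["time"], "America/New_York").dt.tz)  # None: still naive
```

Passing daily values through unchanged is a choice made for this sketch; calling ``.dt.tz_convert`` directly on a timezone-naive column raises ``TypeError``.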