3 | 3 | Datetime Information |
4 | 4 | -------------------- |
5 | 5 |
6 | | -``dataretrieval`` attempts to normalize time data to UTC time when converting |
7 | | -web service data into dataframes. To do this, in-built pandas functions are |
8 | | -used; either :obj:`pandas.to_datetime()` during the initial datetime object |
9 | | -conversion, or :obj:`pandas.DataFrame.tz_localize()` if the datetime objects |
10 | | -exist but are not UTC-localized. In most cases (single-site and multi-site), |
11 | | -``dataretrieval`` assigns the datetime information as the dataframe *index*, |
12 | | -the exception to this is when incomplete datetime information is available, in |
13 | | -these cases integers are used as the dataframe index (see `PR#58`_ for more |
14 | | -details). |
15 | | - |
16 | | -.. _PR#58: https://github.com/DOI-USGS/dataretrieval-python/pull/58 |
| 6 | +``dataretrieval`` normalizes sub-daily timestamps to UTC when converting
| 7 | +Water Data API responses into dataframes. Timestamps are returned in the
| 8 | +``time`` column, and the dataframe itself uses a default integer index. For
| 9 | +sub-daily data, such as continuous (instantaneous) values, ``time`` is a
| 10 | +timezone-aware ``datetime64[us, UTC]`` column. Daily values represent a whole
| 11 | +calendar day, so their ``time`` column is timezone-naive (no UTC offset).
17 | 12 |
18 | 13 |
19 | 14 | Inspecting Timestamps |
20 | 15 | ********************* |
21 | 16 |
22 | | -For single sites, the index of the returned dataframe contains pandas |
23 | | -timestamps. |
| 17 | +For continuous data, the ``time`` column holds UTC-localized pandas timestamps. |
| 18 | + |
| 19 | +.. code:: python |
| 20 | +
| 21 | + >>> from dataretrieval import waterdata |
| 22 | + >>> df, md = waterdata.get_continuous( |
| 23 | + ... monitoring_location_id="USGS-05427718", |
| 24 | + ... parameter_code="00060", |
| 25 | + ... time="2024-03-01/2024-03-02", |
| 26 | + ... ) |
| 27 | + >>> df["time"].head() |
| 28 | + 0 2024-03-01 00:00:00+00:00 |
| 29 | + 1 2024-03-01 00:15:00+00:00 |
| 30 | + 2 2024-03-01 00:30:00+00:00 |
| 31 | + 3 2024-03-01 00:45:00+00:00 |
| 32 | + 4 2024-03-01 01:00:00+00:00 |
| 33 | + Name: time, dtype: datetime64[us, UTC] |
| 34 | +
| 35 | +Each timestamp has the format ``YYYY-MM-DD HH:MM:SS+HH:MM``. Because the values |
| 36 | +are localized to UTC, the offset (``+HH:MM``) is ``+00:00``. You can convert |
| 37 | +them to a local timezone of your choosing with the pandas ``.dt`` accessor. |
24 | 38 |
25 | 39 | .. code:: python |
26 | 40 |
27 | | - >>> import dataretrieval.nwis as nwis |
28 | | - >>> site = '03339000' |
29 | | - >>> df = nwis.get_record(sites=site, service='peaks', |
30 | | - ... start='2015-01-01', end='2017-12-31') |
31 | | - >>> print(df) |
32 | | - agency_cd site_no peak_tm peak_va peak_cd gage_ht gage_ht_cd year_last_pk ag_dt ag_tm ag_gage_ht ag_gage_ht_cd |
33 | | - datetime |
34 | | - 2015-06-08 00:00:00+00:00 USGS 03339000 17:30 25100 C 22.83 NaN NaN NaN NaN NaN NaN |
35 | | - 2015-12-29 00:00:00+00:00 USGS 03339000 18:45 37600 C 26.66 NaN NaN NaN NaN NaN NaN |
36 | | - 2017-05-05 00:00:00+00:00 USGS 03339000 04:45 17000 C 18.47 NaN NaN NaN NaN NaN NaN |
37 | | -
38 | | -Here the index of the dataframe ``df`` is a set of datetime objects. Each has |
39 | | -the format, ``YYYY-MM-DD HH:MM:SS+HH:MM``. Because these timestamps are |
40 | | -localized to be in UTC, the expected offset (``+HH:MM``) is ``+00:00``. |
41 | | -These values can be converted to a local timezone of your choosing using |
42 | | -:obj:`pandas` functionality. |
| 41 | + >>> df["time"] = df["time"].dt.tz_convert("America/New_York") |
| 42 | + >>> df["time"].head() |
| 43 | + 0 2024-02-29 19:00:00-05:00 |
| 44 | + 1 2024-02-29 19:15:00-05:00 |
| 45 | + 2 2024-02-29 19:30:00-05:00 |
| 46 | + 3 2024-02-29 19:45:00-05:00 |
| 47 | + 4 2024-02-29 20:00:00-05:00 |
| 48 | + Name: time, dtype: datetime64[us, America/New_York] |
| 49 | +
| 50 | +After conversion the timestamps carry New York's UTC offset: ``-05:00``
| 51 | +during standard time and ``-04:00`` during daylight saving time. Note that
| 52 | +readings taken before 05:00 UTC on 2024-03-01 (including all five shown
| 53 | +above) fall on the previous calendar day, ``2024-02-29``, once shifted into
| 54 | +New York time.
| 55 | + |
| 56 | + |
| 57 | +Daily Values
| 58 | +************ |
| 59 | + |
| 60 | +Daily data summarize a whole calendar day, so the ``time`` column is
| 61 | +timezone-naive: no UTC offset is attached to the values.
43 | 62 |
44 | 63 | .. code:: python |
45 | 64 |
46 | | - >>> df.index = df.index.tz_convert(tz='America/New_York') |
47 | | - >>> print(df) |
48 | | - agency_cd site_no peak_tm peak_va peak_cd gage_ht gage_ht_cd year_last_pk ag_dt ag_tm ag_gage_ht ag_gage_ht_cd |
49 | | - datetime |
50 | | - 2015-06-07 20:00:00-04:00 USGS 03339000 17:30 25100 C 22.83 NaN NaN NaN NaN NaN NaN |
51 | | - 2015-12-28 19:00:00-05:00 USGS 03339000 18:45 37600 C 26.66 NaN NaN NaN NaN NaN NaN |
52 | | - 2017-05-04 20:00:00-04:00 USGS 03339000 04:45 17000 C 18.47 NaN NaN NaN NaN NaN NaN |
53 | | -
54 | | -Above, the index was converted to localize the timestamps to New York. |
55 | | -In the updated dataframe index, the resulting timestamps now have offsets of |
56 | | -``-04:00`` and ``-05:00`` as New York is either 4 or 5 hours behind UTC |
57 | | -depending on the time of year (due to daylight savings). |
58 | | - |
59 | | -When information for multiple sites is requested, ``dataretrieval`` creates a |
60 | | -dataframe with a multi-index, with the first entry containing the site number, |
61 | | -and the second containing the datetime information. |
62 | | - |
63 | | -.. doctest:: |
64 | | - |
65 | | - >>> import dataretrieval.nwis as nwis |
66 | | - >>> sites = ['180049066381200', '290000095192602'] |
67 | | - >>> df = nwis.get_record(sites=sites, service='gwlevels', |
68 | | - ... start='2021-10-01', end='2022-01-01') |
69 | | - >>> df |
70 | | - agency_cd site_tp_cd lev_dt lev_tm lev_tz_cd ... lev_dt_acy_cd lev_acy_cd lev_src_cd lev_meth_cd lev_age_cd |
71 | | - site_no datetime ... |
72 | | - 180049066381200 2021-10-04 19:54:00+00:00 USGS GW 2021-10-04 19:54 +0000 ... m NaN S S A |
73 | | - 2021-11-16 14:28:00+00:00 USGS GW 2021-11-16 14:28 +0000 ... m NaN S S A |
74 | | - 2021-12-09 10:43:00+00:00 USGS GW 2021-12-09 10:43 +0000 ... m NaN S S A |
75 | | - 290000095192602 2021-12-08 19:07:00+00:00 USGS GW 2021-12-08 19:07 +0000 ... m NaN S S P |
76 | | - <BLANKLINE> |
77 | | - [4 rows x 15 columns] |
78 | | - |
79 | | -Here note that the default datetime index information returned is also UTC |
80 | | -localized, and therefore the offset values are ``+00:00``. |
| 65 | + >>> df, md = waterdata.get_daily( |
| 66 | + ... monitoring_location_id="USGS-05427718", |
| 67 | + ... parameter_code="00060", |
| 68 | + ... time="2024-03-01/2024-03-05", |
| 69 | + ... ) |
| 70 | + >>> df["time"].head() |
| 71 | + 0 2024-03-01 |
| 72 | + 1 2024-03-02 |
| 73 | + 2 2024-03-03 |
| 74 | + 3 2024-03-04 |
| 75 | + 4 2024-03-05 |
| 76 | + Name: time, dtype: datetime64[us] |
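The UTC-to-local conversion described in this change can be tried without a network call. The following is a minimal sketch that assumes only pandas is installed: instead of calling ``waterdata.get_continuous()``, it builds a hypothetical stand-in for the ``time`` column (five 15-minute UTC readings starting 2024-03-01), converts it to New York time, and checks that converting back recovers the original values. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for df["time"] from waterdata.get_continuous():
# five 15-minute readings, already localized to UTC.
time_utc = pd.Series(
    pd.date_range("2024-03-01 00:00", periods=5, freq="15min", tz="UTC"),
    name="time",
)

# tz_convert changes only the wall-clock time and offset shown;
# the underlying instants are unchanged.
time_ny = time_utc.dt.tz_convert("America/New_York")
print(time_ny.iloc[0])  # 2024-02-29 19:00:00-05:00

# Converting back to UTC recovers the original column exactly.
print(time_ny.dt.tz_convert("UTC").equals(time_utc))  # True

# After the 2024-03-10 daylight saving change, New York is 4 hours behind UTC.
later = pd.Timestamp("2024-03-15 12:00", tz="UTC").tz_convert("America/New_York")
print(later)  # 2024-03-15 08:00:00-04:00
```

Because ``tz_convert`` only relabels the same instants, it is safe to apply for display and undo later; nothing is lost by converting.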
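Because daily ``time`` values carry no offset, they do not mix directly with UTC-aware continuous timestamps; pandas raises ``TypeError`` for ordering comparisons between naive and aware timestamps. The sketch below, again using only pandas and hypothetical stand-in data rather than a real ``waterdata.get_daily()`` call, drops the offset from a converted continuous reading and matches it against the daily dates by calendar day. It assumes, for illustration only, that New York local time defines the calendar day; the documentation does not state which timezone sets the daily boundaries.

```python
import pandas as pd

# Hypothetical stand-in for df["time"] from waterdata.get_daily():
# calendar dates with no timezone attached.
daily_time = pd.Series(pd.date_range("2024-03-01", periods=5, freq="D"), name="time")
print(daily_time.dt.tz)  # None

# One continuous reading, stored in UTC and converted to New York time.
# ASSUMPTION: New York local time defines the calendar day for matching.
reading = pd.Timestamp("2024-03-02 03:30", tz="UTC").tz_convert("America/New_York")

# Ordering comparisons between naive and aware timestamps raise TypeError.
try:
    daily_time.iloc[0] < reading
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False

# Drop the offset (keeping the local wall-clock time), then truncate to midnight.
local_day = reading.tz_localize(None).normalize()
print(local_day)  # 2024-03-01 00:00:00

# The naive local day can now be compared with the naive daily dates.
print(daily_time[daily_time == local_day].index.tolist())  # [0]
```

``tz_localize(None)`` keeps the local wall-clock time, whereas ``tz_convert(None)`` converts to UTC before dropping the offset, which here would give the UTC day (2024-03-02) instead of the New York day.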