Commit 97ff9ee

Add option to include timestamps without values when fetching data via get_samples_aggregate() (#147)
1 parent c2c3215 commit 97ff9ee

4 files changed: +95 -3 lines changed

datareservoirio/client.py

Lines changed: 5 additions & 0 deletions
@@ -431,6 +431,7 @@ def get(
             df = pd.DataFrame(columns=("index", "values")).astype({"index": "int64"})

         try:
+            # When we move to pandas 3, the .loc here breaks with None start and end, haven't dug into why yet
             series = (
                 df.set_index("index").squeeze("columns").loc[start:end].copy(deep=True)
             )
@@ -466,6 +467,7 @@ def get_samples_aggregate(
         aggregation_period=None,
         aggregation_function=None,
         max_page_size=_DEFAULT_MAX_PAGE_SIZE,
+        include_empty_aggregations=False,
     ):
         """
         Retrieve a series from DataReservoir.io using the samples/aggregate endpoint.
@@ -489,6 +491,8 @@ def get_samples_aggregate(
         max_page_size : optional
             Maximum number of samples to return per page. The method automatically follows links
             to next pages and returns the entire series. For advanced usage.
+        include_empty_aggregations : optional
+            Whether to include empty aggregations with no data in the returned series. Default is False.
         Returns
         -------
         pandas.Series
@@ -550,6 +554,7 @@ def get_samples_aggregate(
         params["aggregationFunction"] = aggregation_function
         params["start"] = start.isoformat()
         params["end"] = end.isoformat()
+        params["includeEmptyAggregations"] = include_empty_aggregations

         next_page_link = f"{environment.api_base_url}reservoir/timeseries/{series_id}/samples/aggregate?{urlencode(params)}"
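
Review note on the added `params["includeEmptyAggregations"]` line: `urlencode` converts non-string values with `str()`, so a Python bool ends up in the query string as `True` or `False` with a capital first letter. The sketch below shows this with the standard library only. Whether the samples/aggregate endpoint parses booleans case-insensitively is not stated in this commit, and the lowercase variant is a hypothetical alternative, not something the client does.

```python
from urllib.parse import urlencode

# Only parameter names that appear in the diff are used here.
params = {
    "aggregationFunction": "mean",
    "includeEmptyAggregations": False,  # passed as a Python bool, as in the commit
}

# urlencode calls str() on each value, so the bool becomes "False".
query = urlencode(params)
print(query)  # aggregationFunction=mean&includeEmptyAggregations=False

# Hypothetical alternative: send a lowercase literal if the API expects one.
params_lower = dict(params, includeEmptyAggregations=str(False).lower())
query_lower = urlencode(params_lower)
print(query_lower)  # aggregationFunction=mean&includeEmptyAggregations=false
```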

docs/user_guide/access_data.rst

Lines changed: 8 additions & 1 deletion
@@ -29,12 +29,19 @@ is *"tick"* (100 nanoseconds).
                                               aggregation_period='15m',
                                               aggregation_function='mean')

-    # Get all data for selected time period
+    # Get all available data for selected time period
     timeseries = client.get_samples_aggregate(series_id,
                                               start='2024-01-01', end='2024-01-02',
                                               aggregation_period='tick',
                                               aggregation_function='mean')

+    # Get all datapoints resampled to 1 minute even if there is no data. Empty values will be filled with NaN.
+    timeseries = client.get_samples_aggregate(series_id,
+                                              start='2024-01-01', end='2024-01-02',
+                                              aggregation_period='1m',
+                                              aggregation_function='mean',
+                                              include_empty_aggregations=True)
+
 .. note::

     :py:meth:`Client.get_samples_aggregate` returns a :py:class:`pandas.Series`. The :py:mod:`start`, :py:mod:`end`, :py:mod:`aggregation_period` and :py:mod:`aggregation_function` parameters are required.

docs/user_guide/advanced_config.rst

Lines changed: 81 additions & 1 deletion
@@ -151,4 +151,84 @@ Using the :py:mod:`max_page_size` parameter in :py:mod:`get_samples_aggregate` m

 The :py:meth:`Client.get_samples_aggregate` method uses an endpoint that has support for paging of responses. This means that instead of making one big request, it might make a series of smaller requests traversing links to next pages returned in each partial response.

-Normally this is something you don't have to think about. In case you do want to change the maximum number of results returned in one page, you can use the parameter called ``max_page_size`` to alter this number.
+Normally this is something you don't have to think about. In case you do want to change the maximum number of results returned in one page, you can use the parameter called ``max_page_size`` to alter this number.
+
+Using the :py:mod:`include_empty_aggregations` parameter in :py:mod:`get_samples_aggregate` method
+---------------------------------------------------------------------------------------------------
+
+The :py:meth:`Client.get_samples_aggregate` method aggregates data into fixed intervals based on the ``aggregation_period`` parameter. By default, the method only returns aggregations that contain data.
+
+The ``include_empty_aggregations`` parameter controls whether to include aggregation intervals that have no data points. This is useful when you need a complete time series with regular intervals, even for periods where no measurements were recorded.
+
+**Default behavior (include_empty_aggregations=False):**
+
+When ``include_empty_aggregations`` is ``False`` (default), only aggregations with data are returned. This results in a sparse series that may have gaps.
+
+.. code-block:: python
+
+    import datareservoirio as drio
+
+    auth = drio.Authenticator()
+    client = drio.Client(auth)
+
+    # Returns only aggregations with data
+    timeseries = client.get_samples_aggregate(
+        'your-series-id',
+        start='2026-02-23',
+        end='2026-02-24',
+        aggregation_period='1m',
+        aggregation_function='mean',
+        include_empty_aggregations=False # Default
+    )
+
+    print(timeseries)
+
+    # Result will only include time intervals that have data.
+    # 2026-02-23 00:03:00+00:00 2.2
+    # 2026-02-23 23:56:00+00:00 1.0
+
+**With empty aggregations (include_empty_aggregations=True):**
+
+When ``include_empty_aggregations`` is ``True``, all aggregation intervals within the specified time range are returned, with ``NaN`` (Not a Number) values for intervals that contain no data.
+
+.. code-block:: python
+
+    import datareservoirio as drio
+
+    auth = drio.Authenticator()
+    client = drio.Client(auth)
+
+    # Returns all aggregations, including those with no data
+    timeseries = client.get_samples_aggregate(
+        'your-series-id',
+        start='2026-02-23',
+        end='2026-02-24',
+        aggregation_period='1m',
+        aggregation_function='mean',
+        include_empty_aggregations=True
+    )
+
+    print(timeseries)
+
+    # Result has a complete time series with NaN values where data is missing
+    # 2026-02-23 00:00:00+00:00 NaN
+    # 2026-02-23 00:01:00+00:00 NaN
+    # 2026-02-23 00:02:00+00:00 NaN
+    # 2026-02-23 00:03:00+00:00 2.2
+    # 2026-02-23 00:04:00+00:00 NaN
+    # ..
+    # 2026-02-23 23:55:00+00:00 NaN
+    # 2026-02-23 23:56:00+00:00 1.0
+    # 2026-02-23 23:57:00+00:00 NaN
+    # 2026-02-23 23:58:00+00:00 NaN
+    # 2026-02-23 23:59:00+00:00 NaN
+
+**Use Cases:**
+
+* **Analysis requiring regular intervals:** Set ``include_empty_aggregations=True`` when your analysis requires evenly-spaced data points (e.g., time-series forecasting models that expect regular intervals).
+
+* **Detecting data gaps:** Set ``include_empty_aggregations=True`` if you need to identify periods with missing measurements.
+
+* **Visualization:** Set ``include_empty_aggregations=True`` when creating time-series plots that should display the full time range uniformly.
+
+* **Memory efficiency:** Use ``include_empty_aggregations=False`` (default) if storage or memory is a concern and you only need data-bearing intervals.
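
Review note on the use cases listed in the added documentation: the gap-detection case can be illustrated offline with plain pandas. This is a minimal sketch under stated assumptions. The `dense` series is a hand-built stand-in for what a call with `include_empty_aggregations=True` might return, with values borrowed from the documentation example. It is not real output from the service. It also assumes that the default sparse result is the same series with the `NaN` rows dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a result fetched with include_empty_aggregations=True.
index = pd.date_range("2026-02-23 00:00", periods=6, freq="1min", tz="UTC")
dense = pd.Series([np.nan, np.nan, np.nan, 2.2, np.nan, 1.0], index=index)

# Detecting data gaps: timestamps of aggregation intervals without data.
gaps = dense.index[dense.isna()]
print(len(gaps))  # 4

# Assumed equivalent of the default (sparse) result: only data-bearing intervals.
sparse = dense.dropna()

# A sparse result can be put back on a regular grid client-side with reindex,
# provided the expected interval grid is known.
rebuilt = sparse.reindex(index)
print(rebuilt.equals(dense))  # True
```

Reindexing client-side needs the caller to construct the interval grid. The new parameter leaves that to the service.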

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ classifiers = [
 dependencies = [
     "numpy",
     "oauthlib",
-    "pandas",
+    "pandas < 3",
     "pyarrow",
     "requests",
     "requests-oauthlib",
