Commit f839918

Reference period (#41)
* Change to reference period
1 parent 9dec856 commit f839918

5 files changed

Lines changed: 69 additions & 56 deletions

doc/main.md

Lines changed: 22 additions & 16 deletions
@@ -21,7 +21,7 @@ upload your datasets to HDX.
  - [Configuring Logging](#configuring-logging)
  - [Operations on HDX Objects](#operations-on-hdx-objects)
  - [Dataset Specific Operations](#dataset-specific-operations)
- - [Dataset Date](#dataset-date)
+ - [Reference Period](#reference-period)
  - [Expected Update Frequency](#expected-update-frequency)
  - [Location](#location)
  - [Tags](#tags)
@@ -48,6 +48,10 @@ The library has detailed API documentation which can be found in the menu at the
  ## Breaking Changes
+ From 5.9.8, get_date_of_dataset has become get_reference_period,
+ set_date_of_dataset has become set_reference_period and set_dataset_year_range
+ has become set_reference_period_year_range
+
  From 5.9.7, Python 3.7 no longer supported

  From 5.8.2, date handling uses timezone aware dates instead of naive dates and defaults
@@ -208,7 +212,7 @@ virtualenv if not installed:
  from HDX and view the date of the dataset:

  dataset = Dataset.read_from_hdx("novel-coronavirus-2019-ncov-cases")
- print(dataset.get_date_of_dataset())
+ print(dataset.get_reference_period())

  11. You can search for datasets on HDX and get their resources:
@@ -226,14 +230,14 @@ virtualenv if not installed:
  server. With a dataset to which you have permissions, change the dataset date:

  dataset = Dataset.read_from_hdx("ID OR NAME OF DATASET")
- print(dataset.get_date_of_dataset()) # record this
- dataset.set_date_of_dataset("2015-07-26")
- print(dataset.get_date_of_dataset())
+ print(dataset.get_reference_period()) # record this
+ dataset.set_reference_period("2015-07-26")
+ print(dataset.get_reference_period())
  dataset.update_in_hdx()

  14. You can view it on HDX before changing it back (if you have an API key):

- dataset.set_date_of_dataset("PREVIOUS DATE")
+ dataset.set_reference_period("PREVIOUS DATE")
  dataset.update_in_hdx()

  15. Exit and remove virtualenv:
@@ -561,35 +565,37 @@ example:
  dataset.remove_showcase(showcase)

- ### Dataset Date
+ ### Reference Period

- Dataset date is a mandatory field in HDX. This date is the date of the data in the
- dataset, not to be confused with when data was last added/changed in the dataset. It can
- be a single date or a range.
+ Reference Period is a mandatory field in HDX. It is the time period for which
+ data are collected or calculated and to which, as a result, they refer. The
+ reference period may be of any length: a year, a month, or even a day. It
+ should not be confused with when data was last added/changed in the dataset.
+ It can be a single date or a range.

- To get the dataset date, you can do as shown below. It returns a dictionary containing
+ To get the reference period, you can do as shown below. It returns a dictionary containing
  keys "startdate" (start date as datetime), "enddate" (end date as datetime),
  "startdate_str" (start date as string), "enddate_str" (end date as string) and ongoing
  (whether the end date rolls forward every day). You can supply a
  [date format](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior).
  If you do not, the output format will be an
  [ISO 8601 date](https://en.wikipedia.org/wiki/ISO_8601) e.g. 2007-01-25.

- dataset_date = dataset.get_date_of_dataset("OPTIONAL FORMAT")
+ reference_period = dataset.get_reference_period("OPTIONAL FORMAT")

- To set the dataset date, you must pass either datetime.datetime objects or strings to
+ To set the reference period, you must pass either datetime.datetime objects or strings to
  the function below. It accepts a start date and an optional end date which if not
  supplied is assumed to be the same as the start date. Instead of the end date, the flag
  "ongoing" which by default is False can be set to True which indicates that the end date
  rolls forward every day.

- dataset.set_date_of_dataset("START DATE", "END DATE")
+ dataset.set_reference_period("START DATE", "END DATE")

- The method below allows you to set the dataset's date using a year range. The start and
+ The method below allows you to set the reference period using a year range. The start and
  end year can be supplied as integers or strings. If no end year is supplied then the
  range will be from the beginning of the start year to the end of that year.

- dataset.set_dataset_year_range(START YEAR, END YEAR)
+ dataset.set_reference_period_year_range(START YEAR, END YEAR)

  ### Expected Update Frequency
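To make the year-range rule in the Reference Period documentation concrete, here is a minimal sketch in plain Python of how a start year and an optional end year could map onto a start and end datetime. The helper name `year_range_to_period` is hypothetical and is not part of the HDX library; the 00:00:00 and 23:59:59 UTC boundaries are an assumption borrowed from the `set_reference_period` docstring in this commit.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple, Union

Year = Union[int, str]


def year_range_to_period(
    start_year: Year, end_year: Optional[Year] = None
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: turn a year or year range into (start, end) datetimes.

    If no end year is given, the period covers the whole of the start year.
    """
    first = int(start_year)
    last = first if end_year is None else int(end_year)
    if last < first:
        raise ValueError("end year must not precede start year")
    # Assumed boundaries: first second of the first year, last second of the last year.
    start = datetime(first, 1, 1, tzinfo=timezone.utc)
    end = datetime(last, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    return start, end
```

Years may be given as integers or strings, mirroring what the documentation says about the real method's arguments.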

src/hdx/data/dataset.py

Lines changed: 17 additions & 17 deletions
@@ -1232,7 +1232,7 @@ def autocomplete(
  """
  return cls._autocomplete(name, limit, configuration)

- def get_date_of_dataset(
+ def get_reference_period(
  self,
  date_format: Optional[str] = None,
  today: datetime = now_utc(),
@@ -1253,14 +1253,14 @@ def get_date_of_dataset(
  self.data.get("dataset_date"), date_format, today
  )

- def set_date_of_dataset(
+ def set_reference_period(
  self,
  startdate: Union[datetime, str],
  enddate: Union[datetime, str, None] = None,
  ongoing: bool = False,
  ignore_timeinfo: bool = True,
  ) -> None:
- """Set dataset date from either datetime objects or strings. Any time and time
+ """Set reference period from either datetime objects or strings. Any time and time
  zone information will be ignored by default (meaning that the time of the start
  date is set to 00:00:00, the time of any end date is set to 23:59:59 and the
  time zone is set to UTC). To have the time and time zone accounted for, set
@@ -1279,12 +1279,12 @@ def set_date_of_dataset(
  startdate, enddate, ongoing, ignore_timeinfo
  )

- def set_dataset_year_range(
+ def set_reference_period_year_range(
  self,
  dataset_year: Union[str, int, Iterable],
  dataset_end_year: Optional[Union[str, int]] = None,
  ) -> List[int]:
- """Set dataset date as a range from year or start and end year.
+ """Set reference period as a range from year or start and end year.

  Args:
  dataset_year (Union[str, int, Iterable]): Dataset year given as string or int or range in an iterable
@@ -2307,17 +2307,17 @@ def get_hdx_url(self) -> Optional[str]:
  return f"{self.configuration.get_hdx_site_url()}/dataset/{name}"

  def remove_dates_from_title(
- self, change_title: bool = True, set_dataset_date: bool = False
+ self, change_title: bool = True, set_reference_period: bool = False
  ) -> List[Tuple[datetime, datetime]]:
  """Remove dates from dataset title returning sorted the dates that were found in
  title. The title in the dataset metadata will be changed by default. The
- dataset's metadata field dataset date will not be changed by default, but if
- set_dataset_date is True, then the range with the lowest start date will be used
- to set the dataset date field.
+ dataset's metadata field reference period will not be changed by default, but if
+ set_reference_period is True, then the range with the lowest start date will be used
+ to set the reference period field.

  Args:
  change_title (bool): Whether to change the dataset title. Defaults to True.
- set_dataset_date (bool): Whether to set dataset date from date or range in title. Defaults to False.
+ set_reference_period (bool): Whether to set reference period from date or range in title. Defaults to False.

  Returns:
  List[Tuple[datetime,datetime]]: Date ranges found in title
@@ -2328,9 +2328,9 @@ def remove_dates_from_title(
  newtitle, ranges = DatasetTitleHelper.get_dates_from_title(title)
  if change_title:
  self.data["title"] = newtitle
- if set_dataset_date and len(ranges) != 0:
+ if set_reference_period and len(ranges) != 0:
  startdate, enddate = ranges[0]
- self.set_date_of_dataset(startdate, enddate)
+ self.set_reference_period(startdate, enddate)
  return ranges

  def generate_resource_from_rows(
@@ -2431,7 +2431,7 @@ def generate_resource_from_iterator(
  to it the dataset. The returned dictionary will contain the resource in the key
  resource, headers in the key headers and list of rows in the key rows.

- The date of dataset can optionally be set by supplying a column in which the
+ The reference period can optionally be set by supplying a column in which the
  date or year is to be looked up. Note that any timezone information is ignored
  and UTC assumed. Alternatively, a function can be supplied to handle any dates
  in a row. It should accept a row and should return None to ignore the row or a
@@ -2465,7 +2465,7 @@ def generate_resource_from_iterator(
  folder (str): Folder to which to write file containing rows
  filename (str): Filename of file to write rows
  resourcedata (Dict): Resource data
- datecol (Optional[Union[int,str]]): Date column for setting dataset date. Defaults to None (don't set).
+ datecol (Optional[Union[int,str]]): Date column for setting reference period. Defaults to None (don't set).
  yearcol (Optional[Union[int,str]]): Year column for setting dataset year range. Defaults to None (don't set).
  date_function (Optional[Callable[[Dict],Optional[Dict]]]): Date function to call for each row. Defaults to None.
  quickcharts (Optional[Dict]): Dictionary containing optional keys: hashtag, values, cutdown and/or cutdownhashtags
@@ -2588,7 +2588,7 @@ def datecol_function(row):
  else:
  retdict["startdate"] = dates[0]
  retdict["enddate"] = dates[1]
- self.set_date_of_dataset(dates[0], dates[1])
+ self.set_reference_period(dates[0], dates[1])
  resource = self.generate_resource_from_rows(
  folder,
  filename,
@@ -2648,7 +2648,7 @@ def download_and_generate_resource(
  (which will be in dict or list form depending upon the dict_rows argument) and
  outputs a modified row.

- The date of dataset can optionally be set by supplying a column in which the
+ The reference period can optionally be set by supplying a column in which the
  date or year is to be looked up. Note that any timezone information is ignored
  and UTC assumed. Alternatively, a function can be supplied to handle any dates
  in a row. It should accept a row and should return None to ignore the row or a
@@ -2684,7 +2684,7 @@ def download_and_generate_resource(
  resourcedata (Dict): Resource data
  header_insertions (Optional[ListTuple[Tuple[int,str]]]): List of (position, header) to insert. Defaults to None.
  row_function (Optional[Callable[[List[str],Dict],Dict]]): Function to call for each row. Defaults to None.
- datecol (Optional[str]): Date column for setting dataset date. Defaults to None (don't set).
+ datecol (Optional[str]): Date column for setting reference period. Defaults to None (don't set).
  yearcol (Optional[str]): Year column for setting dataset year range. Defaults to None (don't set).
  date_function (Optional[Callable[[Dict],Optional[Dict]]]): Date function to call for each row. Defaults to None.
  quickcharts (Optional[Dict]): Dictionary containing optional keys: hashtag, values, cutdown and/or cutdownhashtags
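The renamed getter still reads the stored `dataset_date` field, which the tests in this commit show as a string such as `[2020-01-07T00:00:00 TO *]`. The following is a rough sketch of how such a string might be turned into the dictionary the documentation describes. `parse_reference_period` is a hypothetical stand-in, not the library's implementation; it assumes the bracketed `START TO END` form only, treats `*` as an ongoing end date, and assumes all times are UTC.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def parse_reference_period(
    value: str, today: datetime, date_format: Optional[str] = None
) -> Dict:
    """Hypothetical parser for a "[START TO END]" reference period string."""
    start_text, end_text = value.strip("[]").split(" TO ")
    start = datetime.fromisoformat(start_text).replace(tzinfo=timezone.utc)
    ongoing = end_text == "*"
    if ongoing:
        # An ongoing period ends at the last second of "today".
        end = today.replace(
            hour=23, minute=59, second=59, microsecond=0, tzinfo=timezone.utc
        )
    else:
        end = datetime.fromisoformat(end_text).replace(tzinfo=timezone.utc)

    def as_text(moment: datetime) -> str:
        # Fall back to ISO 8601 when no strftime format is supplied.
        return moment.strftime(date_format) if date_format else moment.isoformat()

    return {
        "startdate": start,
        "enddate": end,
        "startdate_str": as_text(start),
        "enddate_str": as_text(end),
        "ongoing": ongoing,
    }
```

Passing `today` explicitly, as the tests do, keeps the ongoing case deterministic.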

test-requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
  pytest==7.2.1
  pytest-cov==4.0.0
- tox==4.4.4
+ tox==4.4.5
  -r requirements.txt

tests/hdx/data/test_dataset_noncore.py

Lines changed: 27 additions & 20 deletions
@@ -194,34 +194,34 @@ def test_get_hdx_url(
  def test_get_set_date_of_dataset(self):
  dataset = Dataset({"dataset_date": "[2020-01-07T00:00:00 TO *]"})
- result = dataset.get_date_of_dataset(today=datetime(2020, 11, 17))
+ result = dataset.get_reference_period(today=datetime(2020, 11, 17))
  assert result == {
  "startdate": datetime(2020, 1, 7, 0, 0, tzinfo=timezone.utc),
  "enddate": datetime(2020, 11, 17, 23, 59, 59, tzinfo=timezone.utc),
  "startdate_str": "2020-01-07T00:00:00+00:00",
  "enddate_str": "2020-11-17T23:59:59+00:00",
  "ongoing": True,
  }
- dataset.set_date_of_dataset("2020-02-09")
- result = dataset.get_date_of_dataset("%d/%m/%Y")
+ dataset.set_reference_period("2020-02-09")
+ result = dataset.get_reference_period("%d/%m/%Y")
  assert result == {
  "startdate": datetime(2020, 2, 9, 0, 0, tzinfo=timezone.utc),
  "enddate": datetime(2020, 2, 9, 23, 59, 59, tzinfo=timezone.utc),
  "startdate_str": "09/02/2020",
  "enddate_str": "09/02/2020",
  "ongoing": False,
  }
- dataset.set_date_of_dataset("2020-02-09", "2020-10-20")
- result = dataset.get_date_of_dataset("%d/%m/%Y")
+ dataset.set_reference_period("2020-02-09", "2020-10-20")
+ result = dataset.get_reference_period("%d/%m/%Y")
  assert result == {
  "startdate": datetime(2020, 2, 9, 0, 0, tzinfo=timezone.utc),
  "enddate": datetime(2020, 10, 20, 23, 59, 59, tzinfo=timezone.utc),
  "startdate_str": "09/02/2020",
  "enddate_str": "20/10/2020",
  "ongoing": False,
  }
- dataset.set_date_of_dataset("2020-02-09", ongoing=True)
- result = dataset.get_date_of_dataset(
+ dataset.set_reference_period("2020-02-09", ongoing=True)
+ result = dataset.get_reference_period(
  "%d/%m/%Y", today=datetime(2020, 3, 9, 0, 0)
  )
  assert result == {
@@ -234,17 +234,17 @@ def test_get_set_date_of_dataset(self):
  def test_set_dataset_year_range(self, configuration):
  dataset = Dataset()
- retval = dataset.set_dataset_year_range(2001, 2015)
+ retval = dataset.set_reference_period_year_range(2001, 2015)
  assert retval == [2001, 2015]
- retval = dataset.set_dataset_year_range("2010", "2017")
+ retval = dataset.set_reference_period_year_range("2010", "2017")
  assert retval == [2010, 2017]
- retval = dataset.set_dataset_year_range("2013")
+ retval = dataset.set_reference_period_year_range("2013")
  assert retval == [2013]
- retval = dataset.set_dataset_year_range({2005, 2002, 2003})
+ retval = dataset.set_reference_period_year_range({2005, 2002, 2003})
  assert retval == [2002, 2003, 2005]
- retval = dataset.set_dataset_year_range([2005, 2002, 2003])
+ retval = dataset.set_reference_period_year_range([2005, 2002, 2003])
  assert retval == [2002, 2003, 2005]
- retval = dataset.set_dataset_year_range((2005, 2002, 2003))
+ retval = dataset.set_reference_period_year_range((2005, 2002, 2003))
  assert retval == [2002, 2003, 2005]

  def test_is_set_subnational(self):
@@ -879,7 +879,10 @@ def test_remove_dates_from_title(self):
  assert dataset.remove_dates_from_title() == list()
  assert dataset["title"] == title
  assert "dataset_date" not in dataset
- assert dataset.remove_dates_from_title(set_dataset_date=True) == list()
+ assert (
+ dataset.remove_dates_from_title(set_reference_period=True)
+ == list()
+ )
  title = "ICA Armenia, 2017 - Drought Risk, 1981-2015"
  dataset["title"] = title
  expected = [
@@ -901,7 +904,8 @@ def test_remove_dates_from_title(self):
  assert "dataset_date" not in dataset
  dataset["title"] = title
  assert (
- dataset.remove_dates_from_title(set_dataset_date=True) == expected
+ dataset.remove_dates_from_title(set_reference_period=True)
+ == expected
  )
  assert dataset["title"] == newtitle
  assert (
@@ -917,7 +921,8 @@ def test_remove_dates_from_title(self):
  )
  ]
  assert (
- dataset.remove_dates_from_title(set_dataset_date=True) == expected
+ dataset.remove_dates_from_title(set_reference_period=True)
+ == expected
  )
  assert dataset["title"] == "Mon_State_Village_Tract_Boundaries 9999"
  assert (
@@ -926,7 +931,8 @@ def test_remove_dates_from_title(self):
  )
  dataset["title"] = "Mon_State_Village_Tract_Boundaries 2001 99"
  assert (
- dataset.remove_dates_from_title(set_dataset_date=True) == expected
+ dataset.remove_dates_from_title(set_reference_period=True)
+ == expected
  )
  assert dataset["title"] == "Mon_State_Village_Tract_Boundaries 99"
  assert (
@@ -935,7 +941,8 @@ def test_remove_dates_from_title(self):
  )
  dataset["title"] = "Mon_State_Village_Tract_Boundaries 9999 2001 99"
  assert (
- dataset.remove_dates_from_title(set_dataset_date=True) == expected
+ dataset.remove_dates_from_title(set_reference_period=True)
+ == expected
  )
  assert dataset["title"] == "Mon_State_Village_Tract_Boundaries 9999 99"
  assert (
@@ -1702,7 +1709,7 @@ def test_load_save_to_json(self, vocabulary_read):
  dataset.set_organization("fb7c2910-6080-4b66-8b4f-0be9b6dc4d8e")
  start_date = "2020-02-09"
  end_date = "2020-10-20"
- dataset.set_date_of_dataset(start_date, end_date)
+ dataset.set_reference_period(start_date, end_date)
  expected_update_frequency = "Every day"
  dataset.set_expected_update_frequency(expected_update_frequency)
  dataset.set_subnational(False)
@@ -1721,7 +1728,7 @@ def test_load_save_to_json(self, vocabulary_read):
  dataset = Dataset.load_from_json(path)
  assert dataset["name"] == name
  assert dataset["maintainer"] == maintainer
- dateinfo = dataset.get_date_of_dataset()
+ dateinfo = dataset.get_reference_period()
  assert dateinfo["startdate_str"][:10] == start_date
  assert dateinfo["enddate_str"][:10] == end_date
  assert (

tests/hdx/data/test_update_logic.py

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ def new_dataset(self, dataset_data):
  new_dataset.set_organization("c021f6be-3598-418e-8f7f-c7a799194dba")
  new_dataset.set_expected_update_frequency("Every month")
  new_dataset.set_subnational(False)
- new_dataset.set_dataset_year_range(1961, 2019)
+ new_dataset.set_reference_period_year_range(1961, 2019)
  new_dataset.add_country_location("zmb")
  new_dataset.add_tag("hxl")
  return new_dataset
@@ -82,7 +82,7 @@ def dataset(self, dataset_data, resources_yaml):
  dataset.set_organization("c021f6be-3598-418e-8f7f-c7a799194dba")
  dataset.set_expected_update_frequency("Every month")
  dataset.set_subnational(False)
- dataset.set_dataset_year_range(1961, 2019)
+ dataset.set_reference_period_year_range(1961, 2019)
  dataset.add_country_location("zmb")
  dataset.add_tag("hxl")
  dataset["id"] = "3adc4bb0-faef-42ae-bd67-0ea08918a629"
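Since the Breaking Changes note says the old method names were replaced from 5.9.8, calling code that has to run against releases on both sides of the rename could look the method up by name. This is only a sketch under that assumption: `call_renamed` and the `_FakeDataset` stand-in are hypothetical and exist so the example runs without the HDX library installed.

```python
from typing import Any

# Old name -> new name, as listed in the Breaking Changes note of this commit.
RENAMES = {
    "get_date_of_dataset": "get_reference_period",
    "set_date_of_dataset": "set_reference_period",
    "set_dataset_year_range": "set_reference_period_year_range",
}


def call_renamed(obj: Any, old_name: str, *args: Any, **kwargs: Any) -> Any:
    """Call the new method if the object has it, otherwise fall back to the old name."""
    method = getattr(obj, RENAMES[old_name], None)
    if method is None:
        method = getattr(obj, old_name)
    return method(*args, **kwargs)


class _FakeDataset:
    """Hypothetical stand-in for a dataset object from a release after the rename."""

    def __init__(self) -> None:
        self.period = None

    def set_reference_period(self, startdate, enddate=None) -> None:
        # Mirror the documented rule: a missing end date equals the start date.
        self.period = (startdate, enddate or startdate)

    def get_reference_period(self):
        return self.period


dataset = _FakeDataset()
call_renamed(dataset, "set_date_of_dataset", "2015-07-26")
print(call_renamed(dataset, "get_date_of_dataset"))
```

Pinning the dependency to a single side of 5.9.8 and using the matching names directly is the simpler option where that is possible.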
