
Commit f51065d

Mike committed
Add resource method mark_data_updated and parameter data_updated to create_in_hdx and update_in_hdx of resource
1 parent 9912e19 commit f51065d

4 files changed: 164 additions & 65 deletions
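In short, the commit lets code signal that externally hosted resource data has changed. A minimal sketch assembled from the documentation changes below (placeholders in capitals; the Configuration import path has moved between library versions, so treat the imports as assumptions):

    from hdx.api.configuration import Configuration  # hdx.hdx_configuration in older releases
    from hdx.data.dataset import Dataset

    Configuration.create(hdx_site="stage", user_agent="MyOrg_MyProject")
    dataset = Dataset.read_from_hdx("DATASET_ID_OR_NAME")

    # New method: flag a URL-backed resource as having fresh data so that
    # last_modified is set to now when the dataset or resource is updated.
    resource = dataset.get_resource(0)
    resource.mark_data_updated()
    dataset.update_in_hdx()

    # New parameter: the same effect when updating the resource directly.
    resource.update_in_hdx(data_updated=True)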

File tree

documentation/main.md

Lines changed: 83 additions & 51 deletions
@@ -111,7 +111,7 @@ the drop down menu
 a. Pass this key as a parameter or within a dictionary
 
 b. Create a JSON or YAML file. The default path is
-**.hdx\_configuration.yaml** in the current user's home
+**.hdx_configuration.yaml** in the current user's home
 directory. Then put in the YAML file:
 
 hdx_key: "HDX API KEY"
@@ -203,16 +203,17 @@ virtualenv if not installed:
 9. Use configuration defaults.
 
 If you only want to read data, then connect to the production HDX
-server, replacing A_Quick_Example with something short that describes your project:
+server, making sure that you replace MyOrg_MyProject with something that
+describes your organisation and project:
 
-Configuration.create(hdx_site="prod", user_agent="A_Quick_Example", hdx_read_only=True)
+Configuration.create(hdx_site="prod", user_agent="MyOrg_MyProject", hdx_read_only=True)
 
 If you want to write data, then for experimentation, do not use the
 production HDX server. Instead you can use one of the test servers.
 Assuming you have an API key stored in a file **.hdxkey** in the
 current user's home directory:
 
-Configuration.create(hdx_site="stage", user_agent="A_Quick_Example")
+Configuration.create(hdx_site="stage", user_agent="MyOrg_MyProject")
 
 10. Read this dataset
 [Novel Coronavirus (COVID-19) Cases Data](https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases)
@@ -247,7 +248,22 @@ virtualenv if not installed:
 dataset.set_reference_period("PREVIOUS DATE")
 dataset.update_in_hdx()
 
-15. Exit and remove virtualenv:
+15. If you are storing your data on HDX, you can upload a new file to a
+resource:
+
+resource = dataset.get_resource(0)
+resource.set_file_to_upload("PATH TO FILE")
+resource.update_in_hdx()
+
+16. Alternatively, if you are using a URL to point to data held externally from
+HDX, you can mark that the data has been updated before updating the
+resource or parent dataset:
+
+resource = dataset.get_resource(2)
+resource.mark_data_updated()
+dataset.update_in_hdx()
+
+17. Exit and remove virtualenv:
 
 exit()
 deactivate
@@ -270,7 +286,7 @@ facades set up both logging and HDX configuration.
 The default configuration loads an internal HDX configuration located within the
 library, and assumes that there is an API key file called **.hdxkey** in the current
 user's home directory **\~** and a YAML project configuration located relative to your
-working directory at **config/project\_configuration.yaml** which you must create. The
+working directory at **config/project_configuration.yaml** which you must create. The
 project configuration is used for any configuration specific to your project.
 
 The default logging configuration reads a configuration file internal to the library
@@ -335,34 +351,35 @@ appropriate keyword arguments ie.
 
 You must supply a user agent using one of the following approaches:
 
-1. Populate parameter **user\_agent** (which can simply be the name of your project)
-2. Supply **user\_agent\_config\_yaml** which should point to a YAML file which
-contains a parameter **user\_agent**
-3. Supply **user\_agent\_config\_yaml** which should point to a YAML file and populate
-**user\_agent\_lookup** which is a key to look up in the YAML file which should be of
-form:
+1. Populate parameter **user_agent** (which should be the name of your
+organisation and project)
+2. Supply **user_agent_config_yaml** which should point to a YAML file which
+contains a parameter **user_agent**
+3. Supply **user_agent_config_yaml** which should point to a YAML file and populate
+**user_agent_lookup** which is a key to look up in the YAML file which should
+be of form:
 
 myproject:
 user_agent: test
 myproject2:
 user_agent: test2
 
-4. Include **user\_agent** in one of the configuration dictionaries or files outlined in
+4. Include **user_agent** in one of the configuration dictionaries or files outlined in
 the table below eg.
-**hdx\_config\_json** or **project\_config\_dict**.
+**hdx_config_json** or **project_config_dict**.
 
 **KEYWORD ARGUMENTS** can be:
 
 |Choose|Argument|Type|Value|Default|
 |---|---|---|---|---|
-| |hdx\_site|Optional\[str\]|HDX site to use eg. prod, feature|test|
-| |hdx\_read\_only|bool|Read only or read/write access to HDX|False|
-| |hdx\_key|Optional\[str\]|HDX key (not needed for read only)||
-|Above or one of:|hdx\_config\_dict|dict|Dictionary with hdx\_site, hdx\_read\_only, hdx\_key||
-|or|hdx\_config\_json|str|Path to JSON configuration with values as above||
-|or|hdx\_config\_yaml|str|Path to YAML configuration with values as above||
-|Zero or one of:|project\_config\_dict|dict|Project specific configuration dictionary||
-|or|project\_config\_json|str|Path to JSON Project||
+| |hdx_site|Optional\[str\]|HDX site to use eg. prod, feature|test|
+| |hdx_read_only|bool|Read only or read/write access to HDX|False|
+| |hdx_key|Optional\[str\]|HDX key (not needed for read only)||
+|Above or one of:|hdx_config_dict|dict|Dictionary with hdx_site, hdx_read_only, hdx_key||
+|or|hdx_config_json|str|Path to JSON configuration with values as above||
+|or|hdx_config_yaml|str|Path to YAML configuration with values as above||
+|Zero or one of:|project_config_dict|dict|Project specific configuration dictionary||
+|or|project_config_json|str|Path to JSON Project||
 
 To access the configuration, you use the **read** method of the **Configuration** class as follows:
 
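For reference, the lookup variant described in approach 3 above could be wired up as follows; a sketch only, with an illustrative file path and the myproject key from the YAML example:

    Configuration.create(
        hdx_site="stage",
        user_agent_config_yaml="PATH TO YAML FILE",  # contains the myproject mapping above
        user_agent_lookup="myproject",               # resolves to user_agent: test
    )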
@@ -383,7 +400,7 @@ Configuration instances passed to the constructors of HDX objects like Dataset e
 ## Configuring Logging
 
 If you use a facade from **hdx.facades**, then logging will go to console and errors to
-file. If you are not using a facade, you can call **setup\_logging** which takes
+file. If you are not using a facade, you can call **setup_logging** which takes
 an argument error_file which is False by default. If set to True, errors will be written
 to a file.
 
@@ -409,13 +426,13 @@ Then use the logger like this:
 
 ## Operations on HDX Objects
 
-You can read an existing HDX object with the static **read\_from\_hdx** method
+You can read an existing HDX object with the static **read_from_hdx** method
 which takes an identifier parameter and returns the an object of the appropriate HDX
 object type eg. **Dataset** or **None** depending upon whether the object was read eg.
 
 dataset = Dataset.read_from_hdx("DATASET_ID_OR_NAME")
 
-You can search for datasets and resources in HDX using the **search\_in\_hdx** method
+You can search for datasets and resources in HDX using the **search_in_hdx** method
 which takes a query parameter and returns the a list of objects of the appropriate HDX
 object type eg. **list[Dataset]**. Here is an example:
 
@@ -464,9 +481,9 @@ and recommended, while JSON is also accepted eg.
 
 dataset.update_from_json([path])
 
-The default path if unspecified is **config/hdx\_TYPE\_static.yaml** for YAML and
-**config/hdx\_TYPE\_static.json** for JSON where TYPE is an HDX object's type like
-dataset or resource eg. **config/hdx\_showcase\_static.json**. The YAML file takes the
+The default path if unspecified is **config/hdx_TYPE_static.yaml** for YAML and
+**config/hdx_TYPE_static.json** for JSON where TYPE is an HDX object's type like
+dataset or resource eg. **config/hdx_showcase_static.json**. The YAML file takes the
 following form:
 
 owner_org: "acled"
@@ -485,35 +502,37 @@ Notice how you can define resources (each resource starts with a dash "-") withi
 file as shown above.
 
 You can check if all the fields required by HDX are populated by
-calling **check\_required\_fields**. This will throw an exception if any fields are
+calling **check_required_fields**. This will throw an exception if any fields are
 missing. Before the library posts data to HDX, it will call this method automatically.
 You can provide a list of fields to ignore in the check. An example usage:
 
 resource.check_required_fields([ignore_fields])
 
 Once the HDX object is ready ie. it has all the required metadata, you simply
-call **create\_in\_hdx** eg.
+call **create_in_hdx** eg.
 
 dataset.create_in_hdx(allow_no_resources, update_resources,
 update_resources_by_name,
 remove_additional_resources)
 
-Existing HDX objects can be updated by calling **update\_in\_hdx** eg.
+If the object already exists, it will be updated. You can also update
+explicitly by calling **update_in_hdx** eg.
 
 dataset.update_in_hdx(update_resources, update_resources_by_name,
 remove_additional_resources)
 
-You can delete HDX objects using **delete\_from\_hdx** and update an object that
-already exists in HDX with the method **update\_in\_hdx**. These take various boolean
-parameters that all have defaults and are documented in the API docs. They do not return
-anything and they throw exceptions for failures like the object to update not existing.
+You can delete HDX objects using **delete_from_hdx** and update an object that
+already exists in HDX with the method **update_in_hdx**. These take various
+boolean parameters that all have defaults and are documented in the API docs.
+They do not return anything and they throw exceptions for failures like the
+object to update not existing.
 
 ## Dataset Specific Operations
 
 A dataset can have resources and can be in a showcase.
 
 If you wish to add resources, you can supply a list and call
-the **add\_update\_resources** function, for example:
+the **add_update_resources** function, for example:
 
 resources = [{
 "name": xlsx_resourcename,
@@ -528,27 +547,27 @@ the **add\_update\_resources** function, for example:
 resource["description"] = resource["url"].rsplit("/", 1)[-1]
 dataset.add_update_resources(resources)
 
-Calling **add\_update\_resources** creates a list of HDX Resource objects in
+Calling **add_update_resources** creates a list of HDX Resource objects in
 dataset and operations can be performed on those objects.
 
-To see the list of resources, you use the **get\_resources** function eg.
+To see the list of resources, you use the **get_resources** function eg.
 
 resources = dataset.get_resources()
 
 If you wish to add one resource, you can supply an id string, dictionary or Resource
-object and call the **add\_update\_resource**\* function, for example:
+object and call the **add_update_resource** function, for example:
 
 dataset.add_update_resource(resource)
 
-You can delete a Resource object from the dataset using the **delete\_resource** function, for example:
+You can delete a Resource object from the dataset using the **delete_resource** function, for example:
 
 dataset.delete_resource(resource)
 
 You can get all the resources from a list of datasets as follows:
 
 resources = Dataset.get_all_resources(datasets)
 
-To see the list of showcases a dataset is in, you use the **get\_showcases** function eg.
+To see the list of showcases a dataset is in, you use the **get_showcases** function eg.
 
 showcases = dataset.get_showcases()
 
@@ -562,12 +581,12 @@ If you wish to add the dataset to a showcase, you must first create the showcase
 "url": "http://visualisation/url/"})
 showcase.create_in_hdx()
 
-Then you can supply an id, dictionary or Showcase object and call the **add\_showcase**
+Then you can supply an id, dictionary or Showcase object and call the **add_showcase**
 function, for example:
 
 dataset.add_showcase(showcase)
 
-You can remove the dataset from a showcase using the **remove\_showcase** function, for
+You can remove the dataset from a showcase using the **remove_showcase** function, for
 example:
 
 dataset.remove_showcase(showcase)
@@ -678,7 +697,7 @@ occur if a valid region name is supplied.
 
 dataset.add_region_location("M49 REGION CODE")
 
-**add\_region\_location** accepts regions, intermediate regions or subregions as
+**add_region_location** accepts regions, intermediate regions or subregions as
 specified on the
 [UNStats M49](https://unstats.un.org/unsd/methodology/m49/overview/) website.
 
@@ -875,7 +894,7 @@ You can download a resource using the **download** function eg.
 
 url, path = resource.download("FOLDER_TO_DOWNLOAD_TO")
 
-If you do not supply **FOLDER\_TO\_DOWNLOAD\_TO**, then a temporary folder is used.
+If you do not supply **FOLDER_TO_DOWNLOAD_TO**, then a temporary folder is used.
 
 Before creating or updating a resource, it is possible to specify the path to a local
 file to upload to the HDX filestore if that is preferred over hosting the file
@@ -889,16 +908,29 @@ There is a getter to read the value back:
 
 file_to_upload = resource.get_file_to_upload()
 
+To indicate that the data in an external resource (given by a URL) has been
+updated, call **mark_data_updated** on the resource, before calling
+**create_in_hdx** or **update_in_hdx** on the dataset which will result in the
+resource `last_modified` field being set to now. Alternatively, when calling
+**create_in_hdx** or **update_in_hdx** on the resource, it is possible to
+supply the parameter `data_updated` eg.
+
+resource.update_in_hdx(data_updated=True)
+
+If the method **set_file_to_upload** is used to supply a file, the resource
+`last_modified` field is set to now automatically regardless of the value of
+`data_updated`.
+
 ## Showcase Management
 
 The **Showcase** class enables you to manage showcases, creating, deleting and updating
 (as for other HDX objects) according to your permissions.
 
-To see the list of datasets a showcase is in, you use the **get\_datasets** function eg.
+To see the list of datasets a showcase is in, you use the **get_datasets** function eg.
 
 datasets = showcase.get_datasets()
 
-If you wish to add a dataset to a showcase, you call the **add\_dataset** function, for
+If you wish to add a dataset to a showcase, you call the **add_dataset** function, for
 example:
 
 showcase.add_dataset(dataset)
@@ -1028,10 +1060,10 @@ Next create a file called **run.py** and copy into it the code below.
 facade(main, hdx_site="test")
 
 The above file will create in HDX a dataset generated by a function called
-**generate\_dataset** that can be found in the file **my\_code.py** which we will now
+**generate_dataset** that can be found in the file **my_code.py** which we will now
 write.
 
-Create a file **my\_code.py** and copy into it the code below:
+Create a file **my_code.py** and copy into it the code below:
 
 #!/usr/bin/python
 # -*- coding: utf-8 -*-
@@ -1050,7 +1082,7 @@ Create a file **my\_code.py** and copy into it the code below:
 """
 logger.debug("Generating dataset!")
 
-You can then fill out the function **generate\_dataset** as required.
+You can then fill out the function **generate_dataset** as required.
 
 # IDMC Example
 
@@ -1061,7 +1093,7 @@ folder. If you run it unchanged, it will overwrite the existing datasets in the
 organisation! Therefore, you should run it against a test server. If you use it as a
 basis for your code, you will need to modify the dataset **name** in **idmc.py** and
 change the organisation information to your organisation. Also update metadata in
-**config/hdx\_dataset\_static.yaml** appropriately.
+**config/hdx_dataset_static.yaml** appropriately.
 
 The IDMC scraper creates a dataset per country in HDX, populating all the required
 metadata. It then creates resources with files held on the HDX filestore.
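A short usage sketch of the `data_updated` behaviour documented in the resource section above (the path and resource indices are placeholders): uploading a file always bumps `last_modified`, while URL-backed resources need the explicit flag.

    # Filestore-backed resource: supplying a file sets last_modified to now
    # automatically, whatever data_updated says.
    resource = dataset.get_resource(0)
    resource.set_file_to_upload("PATH TO FILE")
    resource.update_in_hdx(data_updated=False)  # last_modified still updated

    # URL-backed resource: nothing is uploaded, so signal the change explicitly.
    url_resource = dataset.get_resource(2)
    url_resource.update_in_hdx(data_updated=True)  # sets last_modified to now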

src/hdx/data/hdxobject.py

Lines changed: 3 additions & 3 deletions
@@ -291,13 +291,13 @@ def _hdx_update(
             force_active (bool): Make object state active. Defaults to False.
             **kwargs: See below
             operation (str): Operation to perform eg. patch. Defaults to update.
-            ignore_field (str): Any field to ignore when checking dataset metadata. Defaults to None.
+            ignore_field (str): Any field to ignore when checking metadata. Defaults to None.
 
         Returns:
             None
         """
         self._check_kwargs_fields(object_type, **kwargs)
-        operation = kwargs.get("operation", "update")
+        operation = kwargs.pop("operation", "update")
         self._save_to_hdx(
             operation, id_field_name, files_to_upload, force_active
         )
@@ -319,7 +319,7 @@ def _merge_hdx_update(
             force_active (bool): Make object state active. Defaults to False.
             **kwargs: See below
             operation (str): Operation to perform eg. patch. Defaults to update.
-            ignore_field (str): Any field to ignore when checking dataset metadata. Defaults to None.
+            ignore_field (str): Any field to ignore when checking metadata. Defaults to None.
 
         Returns:
             None
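The one behavioural change in this file is swapping dict.get for dict.pop when reading the operation keyword. A standalone illustration of the difference (not library code):

    kwargs = {"operation": "patch", "ignore_field": "resources"}

    # dict.get reads the value but leaves the key behind, so "operation"
    # would still travel onward wherever **kwargs is forwarded.
    operation = kwargs.get("operation", "update")
    print(operation, "operation" in kwargs)  # patch True

    kwargs = {"operation": "patch", "ignore_field": "resources"}

    # dict.pop consumes the key, so downstream calls no longer receive it.
    operation = kwargs.pop("operation", "update")
    print(operation, "operation" in kwargs)  # patch False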
