data api client design

Discussion of the data-api client requirements and implementation

Motivation

The data-api client, climada.util.api_client.Client, is meant for

providing a generic Python interface to the public CLIMADA data api
creating CLIMADA Python objects, such as Exposures, Hazard or ImpactFunc, from dataset files of the CLIMADA data api in a comfortable, easy-to-use way.

The implemented methods are meant to be as natural as possible and to hide away the boilerplate code that downloads files, reads them and converts their content into CLIMADA objects. Additionally, they should cache files on the local filesystem in order to save resources of the api server.

Classes

DataTypeShortInfo

only datatype and group

DataTypeInfo

... plus status, description and properties

DatasetInfo

datatype (DataTypeShortInfo), name and version (unique in the db)
status, activation date and expiration date
uuid
description, doi and license
files (FileInfo)
properties ("name", "value" pairs)

FileInfo

dataset uuid and file name (unique in the db)
format, size and checksum
url
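
The four classes above can be pictured as plain data containers. The following is a minimal sketch using dataclasses; the attribute names and types are assumptions made for illustration and are not taken from the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataTypeShortInfo:
    data_type: str        # e.g. "litpop"
    data_type_group: str  # "exposures", "hazard" or "impact_func"


@dataclass
class DataTypeInfo(DataTypeShortInfo):
    # everything from DataTypeShortInfo plus the fields below
    status: str = ""
    description: str = ""
    properties: list = field(default_factory=list)


@dataclass
class FileInfo:
    uuid: str         # uuid of the dataset the file belongs to
    file_name: str    # (uuid, file_name) is unique in the db
    file_format: str
    file_size: int
    checksum: str
    url: str


@dataclass
class DatasetInfo:
    uuid: str
    data_type: DataTypeShortInfo
    name: str         # (data_type, name, version) is unique in the db
    version: str
    status: str
    properties: dict  # "name": "value" pairs
    files: list       # list of FileInfo
    description: str = ""
    doi: Optional[str] = None
    license: str = ""
    activation_date: Optional[str] = None
    expiration_date: Optional[str] = None
```

Keeping FileInfo separate from DatasetInfo mirrors the list above: a dataset owns any number of files, and each file refers back to its dataset through the uuid.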

Methods

list_data_type_infos

returns: a list of DataTypeInfo objects

arguments:

data type group (exposures, hazard, impact_func)

purpose: show what kind of datasets are available from climada.ethz.ch

comments:

used to be get_datatypes

suggestions:

get_data_type_info

returns: a DataTypeInfo object

arguments:

data type name

purpose: give additional information about a data type: the mandatory and optional properties of datasets of this type.

comments:

used to be get_data_type
was more useful when collecting data type properties was expensive and list_data_type_infos skipped it.
with CLIMADA Data API 1.0 this is no longer the case.

suggestions:

remove the method and add an optional parameter to list_data_type_infos.

get_properties_datatype

returns:

arguments:

purpose:

comments:

suggestions:

list_dataset_infos

returns: a list of DatasetInfo objects

arguments:

data type
a dictionary of properties
dataset status

purpose: query climada.ethz.ch for datasets matching given arguments

comments:

used to be get_datasets

suggestions:

get_dataset_info

returns: a single DatasetInfo object

arguments: same as list_dataset_infos

purpose: same as list_dataset_infos, but raises a descriptive exception if the query does not yield exactly one dataset.

comments:

used to be get_dataset
is somewhat superfluous. However, the method may be used within get_hazard and get_exposures, where it can provide easy-to-understand feedback if a query is ambiguous or contradictory.
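
The "exactly one dataset" rule can be sketched as a small helper that wraps the result of a list query. The exception names NoResult and AmbiguousResult and the function pick_single are hypothetical, and datasets are represented as plain dictionaries purely for illustration.

```python
class NoResult(Exception):
    """Hypothetical exception: the query matched no dataset."""


class AmbiguousResult(Exception):
    """Hypothetical exception: the query matched more than one dataset."""


def pick_single(datasets, query):
    """Return the only dataset in `datasets` or raise a descriptive error.

    `datasets` stands for the result of a list query, `query` for the
    arguments that produced it; both are only used to build the message.
    """
    if not datasets:
        raise NoResult(f"no dataset matches the query {query}")
    if len(datasets) > 1:
        found = ", ".join(f"{d['name']} ({d['version']})" for d in datasets)
        raise AmbiguousResult(
            f"{len(datasets)} datasets match the query {query}: {found}"
        )
    return datasets[0]
```

Listing the matching names and versions in the message lets the caller see which property to add in order to narrow the query down.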

suggestions:

download_dataset

returns: the path of the target directory and the downloaded files

arguments:

dataset
target directory, default: SYSTEM_DIR from config
organize path hierarchically? 'data type group'/'data type'/'dataset name'/'version', default: yes
consistency check method, default: compare sizes

purpose: download the whole data set (all files) into the given target directory - or just point to the files if they have been downloaded before

comments:

the default check is perhaps too optimistic. If files were maliciously replaced on the server, a more thorough check (md5) could increase security.

suggestions:

remove optional arguments?
return files only without target directory
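
The hierarchical target path and the two consistency checks discussed above might look roughly like this. It is a sketch only: the function names, parameter names and the plain hex md5 format are assumptions, not the client's actual interface.

```python
import hashlib
from pathlib import Path


def dataset_dir(target_dir, group, data_type, name, version, organize_path=True):
    """Build the download directory, optionally with the hierarchical layout."""
    base = Path(target_dir)
    if organize_path:
        return base / group / data_type / name / version
    return base


def size_matches(path, expected_size):
    """Cheap default check: compare the local file size with the expected one."""
    return Path(path).stat().st_size == expected_size


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """More thorough check: compare the md5 digest of the local file."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        # read in chunks so that large files need not fit into memory
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

The size check only needs a stat call, while the md5 check reads the whole file, which is the trade-off behind the default.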

download_file

returns:

path

arguments:

FileInfo
target directory
consistency check
number of retries

purpose: downloads a single file from a dataset to the given target destination, checking success and retrying in case of failure

comments: used in download_dataset, to_hazard and to_exposures. In the latter two it mainly serves to allow a target destination and to skip downloads of non-hdf5 files, which is a hypothetical use case.

suggestions:

turn it into a private method
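
The download, check and retry cycle described in the purpose can be sketched independently of any HTTP library. Here fetch and check are hypothetical callables supplied by the caller, and retries counts the additional attempts after the first one, which is an assumption about what "number of retries" means.

```python
def download_with_retries(fetch, check, retries=3):
    """Call fetch() until check(path) succeeds or the retries are used up.

    fetch: callable that downloads the file and returns its local path
    check: callable that takes the path and returns True if the file is fine
    retries: number of additional attempts after the first one
    """
    last_error = None
    for _ in range(retries + 1):
        try:
            path = fetch()
        except OSError as err:
            # network and filesystem problems are retried
            last_error = err
            continue
        if check(path):
            return path
        last_error = RuntimeError(f"consistency check failed for {path}")
    raise RuntimeError(f"download failed after {retries + 1} attempts") from last_error
```

Passing the consistency check in as a callable keeps the retry loop unchanged whether sizes or checksums are compared.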

get_hazard

returns: a Hazard object

arguments:

hazard type (i.e., any data type from the 'hazard' data type group)
(target directory for file download)
arguments from get_dataset_info (without data type or data type group)

purpose: search climada.ethz.ch for a matching hazard, download the file and read it into a Hazard object.

comments:

used to include a concatenation of Hazard objects in case more than one dataset matched the requirements and the concatenation was supported (depending on the properties of the datasets). In the current version, concatenation must be done outside of the Client.

suggestions:

remove the target directory argument and use download_dataset instead of download_file

get_exposures

returns: an Exposures object

arguments:

exposures type (i.e., any data type from the 'exposures' data type group)
(target directory for file download)
arguments from get_dataset_info (without data type or data type group)

purpose: search climada.ethz.ch for a matching exposures dataset, download the file and read it into an Exposures object.

comments:

used to include a concatenation of Exposures objects in case more than one dataset matched the requirements. In the current version, concatenation must be done outside of the Client.

suggestions:

remove the target directory argument and use download_dataset instead of download_file

get_litpop_default

returns: a LitPop object

arguments:

country
(target directory for file download)

purpose: get the global or country litpop exposures object with fin_mode 'pc' and exponents '(1,1)'

comments:

the method itself carries the default values for litpop exposures. That is a bit odd.

suggestions:

move the default values to the config file?
introduce default values for all data types, not just litpop? In this case, augment get_exposures and get_hazard with default handling. That would also be more comfortable for the user than getting error messages about ambiguous queries.
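
The second suggestion could be sketched as a table of per-data-type defaults that are merged with the properties given by the user. The table DEFAULT_PROPERTIES and the function with_defaults are hypothetical; only the litpop values are taken from the description above, and in practice the table might live in the config file.

```python
# Hypothetical defaults per data type.
DEFAULT_PROPERTIES = {
    "litpop": {"fin_mode": "pc", "exponents": "(1,1)"},
}


def with_defaults(data_type, properties=None):
    """Return the query properties with the data type's defaults filled in.

    Values given explicitly in `properties` override the defaults.
    """
    merged = dict(DEFAULT_PROPERTIES.get(data_type, {}))
    merged.update(properties or {})
    return merged
```

With such a helper, get_exposures and get_hazard could complete an underspecified query before sending it, instead of failing on an ambiguous result.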
