data api client design
Discussion of the data-api client requirements and implementation
The data-api client, climada.util.api_client.Client, is meant for
- providing a generic Python interface to the public CLIMADA data api
- creating climada Python objects, such as Exposures, Hazard or ImpactFunc, from dataset files of the CLIMADA data api in a comfortable, easy-to-use way.
The implemented methods are supposed to be as natural as possible and to hide away boilerplate code that downloads files, reads them and converts their content into CLIMADA objects. Additionally, they should take care of caching files on the local filesystem in order to save resources of the api server.
only datatype and group
... plus status, description and properties
- datatype (DataTypeShortInfo), name and version (unique in the db)
- status, activation date and expiration date
- uuid
- description, doi and license
- files (FileInfo)
- properties ("name", "value" pairs)
- dataset uuid and file name (unique in the db)
- format, size and checksum
- url
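The field lists above can be sketched as plain dataclasses. This is only an illustration of how the four info objects relate to each other. The attribute names (for example `data_type_group`, `file_format`, `check_sum`) and the choice of types are assumptions made for the sketch, not the names used by the actual client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataTypeShortInfo:
    """Only datatype and group."""
    data_type: str        # e.g. "litpop" (hypothetical value)
    data_type_group: str  # e.g. "exposures"


@dataclass
class DataTypeInfo(DataTypeShortInfo):
    """... plus status, description and properties."""
    status: str = ""
    description: str = ""
    properties: List[dict] = field(default_factory=list)


@dataclass
class FileInfo:
    """One file of a dataset; (uuid, file_name) is unique in the db."""
    uuid: str         # uuid of the dataset the file belongs to
    file_name: str
    file_format: str
    file_size: int    # size in bytes
    check_sum: str
    url: str


@dataclass
class DatasetInfo:
    """One dataset; (data_type, name, version) is unique in the db."""
    uuid: str
    data_type: DataTypeShortInfo
    name: str
    version: str
    status: str
    properties: Dict[str, str]  # "name", "value" pairs
    files: List[FileInfo]
    description: str = ""
    doi: Optional[str] = None
    license: str = ""
    activation_date: Optional[str] = None
    expiration_date: Optional[str] = None
```

Nesting DataTypeShortInfo inside DatasetInfo keeps the dataset listing light: the full DataTypeInfo is only needed when the properties of a data type are of interest.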
returns: a list of DataTypeInfo objects
arguments:
- data type group (exposures, hazard, impact_func)
purpose: show what kind of datasets are available from climada.ethz.ch
comments:
- used to be get_datatypes
suggestions:
returns: a DataTypeInfo object
arguments:
- data type name
purpose: give additional information about a data type: mandatory and optional properties of datasets from this type.
comments:
- used to be get_data_type
- used to be more useful when collecting data type properties was expensive and list_data_type_infos skipped it. With CLIMADA Data API 1.0 this is not the case anymore.
suggestions:
- remove the method and add an optional parameter to list_data_type_infos.
returns:
arguments:
purpose:
comments:
suggestions:
returns: a list of DatasetInfo objects
arguments:
- data type
- a dictionary of properties
- dataset status
purpose: query climada.ethz.ch for datasets matching given arguments
comments:
- used to be get_datasets
suggestions:
returns: a single DatasetInfo object
arguments: same as list_dataset_infos
purpose: same as list_dataset_infos, but raises a descriptive exception if the query doesn't yield exactly 1 dataset.
comments:
- used to be get_dataset
- is somewhat superfluous. However, the method may be used within get_hazard and get_exposures, and there it may provide easy-to-understand feedback if a query is ambiguous or contradictory.
suggestions:
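A minimal sketch of the "exactly 1 dataset" check described above. The exception classes and the helper name `exactly_one` are hypothetical and only serve to illustrate what a descriptive error could look like. The actual client may name and structure this differently.

```python
class NoResult(Exception):
    """Hypothetical exception: the query matched no dataset."""


class AmbiguousResult(Exception):
    """Hypothetical exception: the query matched more than one dataset."""


def exactly_one(datasets, query):
    """Return the single element of `datasets` or raise a descriptive error.

    `datasets` is the list returned by a dataset query, `query` is whatever
    was used to search (only used for the error message).
    """
    if not datasets:
        raise NoResult(f"no dataset found for query {query!r}")
    if len(datasets) > 1:
        raise AmbiguousResult(
            f"{len(datasets)} datasets match query {query!r}; "
            "add properties to narrow the search"
        )
    return datasets[0]
```

Used inside get_hazard and get_exposures, such a check tells the user whether to loosen the query (no result) or to add properties (ambiguous result).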
returns: the path of the target directory and the downloaded files
arguments:
- dataset
- target directory, default: SYSTEM_DIR from config
- organize path hierarchically? 'data group type'/'data type'/'dataset name'/'version', default: yes
- consistency check method, default: compare sizes
purpose: download the whole data set (all files) into the given target directory - or just point to the files if they have been downloaded before
comments:
- the default check is perhaps too optimistic. In case of a mischievous replacing of files on the server a more thorough check (md5) could increase security.
suggestions:
- remove optional arguments?
- return files only without target directory
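To illustrate the trade-off between the two consistency checks and the hierarchical path layout, here is a small sketch using only the standard library. The function names `target_path`, `size_matches` and `md5_matches` are made up for this example. The assumption is that the expected size and an md5 hex digest are available from the file's FileInfo.

```python
import hashlib
from pathlib import Path


def target_path(root, group, data_type, name, version):
    """Build 'data group type'/'data type'/'dataset name'/'version' below root."""
    return Path(root) / group / data_type / name / version


def size_matches(path, expected_size):
    """Cheap default check: the file exists and has the expected size in bytes."""
    path = Path(path)
    return path.is_file() and path.stat().st_size == expected_size


def md5_matches(path, expected_md5):
    """More thorough check: compare the md5 hex digest of the file content."""
    path = Path(path)
    if not path.is_file():
        return False
    digest = hashlib.md5()
    with path.open("rb") as stream:
        # read in 1 MiB chunks so large hazard files need not fit in memory
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5
```

The size check only reads file metadata, while the md5 check reads the whole file, so it is noticeably slower for large datasets. That is the cost of the extra safety discussed in the comments.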
returns:
- path
arguments:
- FileInfo
- target directory
- consistency check
- number of retries
purpose: downloads a single file from a dataset to the given target destination, checking success and retrying in case of failure
comments: used in download_dataset, to_hazard and to_exposures. In the latter two mainly to allow a target destination and to skip downloads of non-hdf5 files, which is a hypothetical use case.
suggestions:
- turn it into a private method
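The "check success and retry" behaviour can be sketched independently of the actual download code. In the sketch below, `fetch` and `check` are hypothetical callables standing in for the download step and the consistency check. The parameter is read as the total number of attempts, which is an assumption.

```python
import time


def download_with_retries(fetch, check, attempts=3, wait_seconds=0.0):
    """Call `fetch()` until `check(path)` succeeds or `attempts` is exhausted.

    fetch: callable without arguments that downloads the file and returns its path
    check: callable taking the path and returning True if the file is consistent
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            path = fetch()
            if check(path):
                return path
            last_error = RuntimeError(f"consistency check failed on attempt {attempt}")
        except OSError as err:  # e.g. connection problems or disk errors
            last_error = err
        if attempt < attempts:
            time.sleep(wait_seconds)
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

Keeping the retry loop separate from the download and the check makes it easy to swap the size comparison for a checksum comparison without touching the retry logic.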
returns: a Hazard object
arguments:
- hazard type (i.e., any data type from the 'hazard' data type group)
- (target directory for file download)
- arguments from get_dataset_info (without data type or data type group)
purpose: search climada.ethz.ch for a matching hazard, download the file and read it into a Hazard object.
comments:
- used to include a concatenation of Hazard objects in case more than one dataset matches the requirements and the concatenation is somehow supported (depending on the properties of the datasets). In the current version, however, concatenation must be done outside of the Client.
suggestions:
- remove the target directory argument and use download_dataset instead of download_file
returns: an Exposures object
arguments:
- exposures type (i.e., any data type from the 'exposures' data type group)
- (target directory for file download)
- arguments from get_dataset_info (without data type or data type group)
purpose: search climada.ethz.ch for matching exposures, download the file and read it into an Exposures object.
comments:
- used to include a concatenation of Exposures objects in case more than one dataset matches the requirements. In the current version, however, concatenation must be done outside of the Client.
suggestions:
- remove the target directory argument and use download_dataset instead of download_file
returns: a Litpop object
arguments:
- country
- (target directory for file download)
purpose: get the global or country litpop exposures object with fin_mode 'pc' and exponents '(1,1)'
comments:
- the method itself carries the default values for litpop exposures, which is a bit odd.
suggestions:
- move the default values to the config file?
- introduce default values for all data types, not just litpop? In this case augment get_exposures and get_hazard with default handling, which would also be more comfortable for the user than getting error messages about ambiguous queries.