
data api client design

emanuel-schmid edited this page Jan 17, 2022 · 20 revisions

Discussion of the data-api client requirements and implementation

Motivation

The data-api client, climada.util.api_client.Client, is meant for

  • providing a generic Python interface to the public CLIMADA data api
  • creating CLIMADA Python objects, such as Exposures, Hazard or ImpactFunc, from dataset files of the CLIMADA data api in a convenient way.

The implemented methods should feel as natural as possible and hide the boilerplate code that downloads files, reads them and converts their content into CLIMADA objects. They should also cache files on the local filesystem, so that repeated requests do not consume resources of the api server.
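The caching requirement can be sketched as follows. This is only an illustration, not the Client's actual code: the function name cached_download, the injected fetch callable and the use of MD5 as checksum algorithm are all assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_download(url: str, target: Path, expected_md5: str,
                    fetch: Callable[[str], bytes]) -> Path:
    """Return `target`, calling `fetch(url)` only if no valid local copy exists.

    `fetch` is a hypothetical stand-in for the actual HTTP download.
    MD5 is an assumed checksum algorithm.
    """
    if target.is_file():
        local_md5 = hashlib.md5(target.read_bytes()).hexdigest()
        if local_md5 == expected_md5:
            return target  # cache hit: the server is not contacted

    # cache miss (or corrupt local file): download and verify before storing
    data = fetch(url)
    if hashlib.md5(data).hexdigest() != expected_md5:
        raise ValueError(f"checksum mismatch for {url}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Comparing the checksum, rather than only checking that the file exists, also catches downloads that were interrupted halfway.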

Classes

DataTypeShortInfo

only datatype and group

DataTypeInfo

everything in DataTypeShortInfo, plus status, description and properties

DatasetInfo

  • datatype, name and version (unique in the db)
  • status, activation date and expiration date
  • uuid
  • description, doi and license
  • files (FileInfo)
  • properties ("name", "value" pairs)

FileInfo

  • dataset uuid and file name (unique in the db)
  • format, size and checksum
  • url
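The four classes could be modelled as plain dataclasses. The sketch below follows the attribute lists above; the concrete field names, types and defaults are assumptions for illustration and may differ from the real implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataTypeShortInfo:
    data_type: str
    data_type_group: str  # e.g. exposures, hazard, impact_func


@dataclass
class DataTypeInfo(DataTypeShortInfo):
    status: str = ""
    description: str = ""
    # property descriptions of the data type (structure assumed)
    properties: List[dict] = field(default_factory=list)


@dataclass
class FileInfo:
    uuid: str         # uuid of the owning dataset; unique together with file_name
    file_name: str
    file_format: str
    file_size: int    # assumed to be in bytes
    check_sum: str
    url: str


@dataclass
class DatasetInfo:
    uuid: str
    data_type: DataTypeShortInfo
    name: str         # (data_type, name, version) is unique in the db
    version: str
    status: str
    properties: Dict[str, str]  # the "name", "value" pairs
    files: List[FileInfo]
    description: str = ""
    doi: Optional[str] = None
    license: str = ""
    activation_date: Optional[str] = None
    expiration_date: Optional[str] = None
```

Storing the properties as a dictionary keeps lookups such as info.properties["some_name"] simple, at the cost of allowing only one value per property name.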

Methods

list_data_type_infos

returns: a list of DataTypeInfo objects

arguments:

  • data type group (exposures, hazard, impact_func)

purpose: show what kind of datasets are available from climada.ethz.ch

comments:

  • used to be get_datatypes

suggestions:

get_data_type_info

returns: a DataTypeInfo object

arguments:

  • data type name

purpose: give additional information about a data type: the mandatory and optional properties of datasets of this type.

comments:

  • used to be get_data_type
  • was more useful when collecting data type properties was expensive and list_data_type_infos skipped it.
    With CLIMADA Data API 1.0 this is no longer the case.

suggestions:

  • remove the method and add an optional parameter to list_data_type_infos.

list_dataset_infos

returns: a list of DatasetInfo objects

arguments:

  • data type
  • a dictionary of properties
  • dataset status

purpose: query climada.ethz.ch for datasets matching given arguments

comments:

  • used to be get_datasets

suggestions:

get_dataset_info

returns: a single DatasetInfo object

arguments: same as list_dataset_infos

purpose: same as list_dataset_infos - but raise a descriptive exception if the query doesn't yield exactly 1 dataset.

comments:

  • used to be get_dataset
  • is somewhat superfluous. However the method may be used within get_hazard and get_exposures and there it may provide easy to understand feedback if a query is ambiguous or contradictory.
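The "exactly 1 dataset" check could look roughly like this. The exception classes and the helper name exactly_one are hypothetical and chosen for this sketch only; the point is that an empty result and an ambiguous result produce different, self-explanatory messages.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


class NoResult(Exception):
    """The query matched no dataset (hypothetical exception class)."""


class AmbiguousResult(Exception):
    """The query matched more than one dataset (hypothetical exception class)."""


def exactly_one(matches: Sequence[T], query: dict) -> T:
    """Return the single element of `matches` or raise a descriptive error."""
    if not matches:
        raise NoResult(f"no dataset matches the query {query}")
    if len(matches) > 1:
        raise AmbiguousResult(
            f"{len(matches)} datasets match the query {query}; "
            "add properties to narrow it down"
        )
    return matches[0]
```

With such a helper, get_dataset_info reduces to a call of list_dataset_infos followed by exactly_one on its result.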

suggestions:

get_hazard

returns: a Hazard object

arguments:

  • hazard type (i.e., any data type from the 'hazard' data type group)
  • arguments from get_dataset_info (without data type or data type group)

purpose: search climada.ethz.ch for a matching hazard, download the file and read it into a Hazard object.

comments:

  • used to concatenate Hazard objects when more than one dataset matched the requirements and the properties of the datasets allowed it. In the current version, concatenation must be done outside of the Client.

suggestions:

get_exposures

returns: an Exposures object

arguments:

  • exposures type (i.e., any data type from the 'exposures' data type group)

purpose: search climada.ethz.ch for a matching exposures dataset, download the file and read it into an Exposures object.

comments:

  • used to concatenate Exposures objects when more than one dataset matched the requirements. In the current version, concatenation must be done outside of the Client.

suggestions:
