Background
The current sources field of the schema contains an array of sourcePropertyItems. Each item contains the following elements:
- property: which property of the feature the source is associated with
- dataset: the dataset identifier, the name of the source
- license: the license that the data falls under
- record_id: the feature source identifier (how you identify the input feature in the dataset)
- update_time: inconsistently applied, but the time that the source data was last updated
- confidence: even more inconsistently applied, but how confident we are in the quality of the data or its existence
The goal here is to provide provenance for our data, allowing users to trace back and see what Overture data was constructed from. In other words, a user can look up the specific record_id within the identified dataset and see where the data really comes from. The problem is that our dataset identifier is not unique: it carries no version information, so it cannot distinguish one snapshot of a source from another. As such, I suggest that we update our sourcePropertyItem definition to include the following:
- property: unchanged
- dataset: unchanged, but will be put on a deprecation path
- license: unchanged
- record_id: unchanged
- update_time: unchanged
- confidence: unchanged
- provider: the name of the entity that produced the data, e.g. meta, esri, microsoft, osm
- resource: the subject or type of data supplied by the provider, e.g. division-names, buildings, planet
- version: a sortable identifier such as a date or number, e.g. 2026-02-13, 5.3, A5692
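As a rough illustration, the proposed item could be mirrored in Python as a dataclass. This is a hypothetical sketch, not the schema's own definition: the field types, which fields are optional (including making the deprecated dataset optional), and the snapshot_key helper are assumptions. Required fields are listed first only because Python dataclasses require fields with defaults to come last, and the example values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourcePropertyItem:
    """Hypothetical Python mirror of the proposed sourcePropertyItem.

    Field types and optionality are assumptions; the schema is authoritative.
    """

    property: str                       # feature property the source is associated with
    license: str                        # license the source data falls under
    record_id: str                      # identifies the input feature in the source data
    provider: str                       # entity that produced the data, e.g. "meta"
    resource: str                       # subject/type of data, e.g. "buildings"
    version: str                        # sortable snapshot identifier, e.g. "2026-02-13"
    dataset: Optional[str] = None       # unchanged, but on a deprecation path
    update_time: Optional[str] = None   # when the source data was last updated
    confidence: Optional[float] = None  # confidence in quality or existence

    def snapshot_key(self) -> tuple[str, str, str]:
        """Hypothetical helper: provider + resource + version name one input snapshot."""
        return (self.provider, self.resource, self.version)


# Illustrative values only.
item = SourcePropertyItem(
    property="geometry",
    license="ODbL-1.0",
    record_id="w123456",
    provider="osm",
    resource="planet",
    version="2026-02-13",
)
print(item.snapshot_key())  # ('osm', 'planet', '2026-02-13')
```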
Basically, the dataset is being split into provider + resource. This is being done because a number of our providers give us multiple resources, and we want to make it easy for users to filter on either a single resource or all of a provider's resources. Combined, provider + resource + version uniquely identify a specific input data snapshot, and the record_id can then be used to find the specific entity within that snapshot.
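The two uses described in the previous paragraph can be sketched in a few lines, assuming source items arrive as plain dicts parsed from JSON: filtering on a provider (all of its resources) or on a single provider + resource, and building a provenance key from provider + resource + version + record_id. The function names filter_sources and provenance_key and the sample records are hypothetical.

```python
from typing import Iterable, Optional


def filter_sources(
    items: Iterable[dict],
    provider: str,
    resource: Optional[str] = None,
) -> list[dict]:
    """Keep items from one provider; if resource is given, narrow to that resource."""
    return [
        item
        for item in items
        if item.get("provider") == provider
        and (resource is None or item.get("resource") == resource)
    ]


def provenance_key(item: dict) -> tuple[str, str, str, str]:
    """provider + resource + version pick the snapshot; record_id picks the entity."""
    return (item["provider"], item["resource"], item["version"], item["record_id"])


# Hypothetical sample records.
sources = [
    {"provider": "esri", "resource": "buildings", "version": "2026-02-13", "record_id": "a1"},
    {"provider": "esri", "resource": "division-names", "version": "5.3", "record_id": "b2"},
    {"provider": "microsoft", "resource": "buildings", "version": "A5692", "record_id": "c3"},
]

all_esri = filter_sources(sources, "esri")                    # every esri resource
esri_buildings = filter_sources(sources, "esri", "buildings")  # one esri resource
```

Filtering on provider alone covers the "all resources" case that a single combined dataset string would make awkward, while the four-part key is what lets a user go from an Overture feature back to one entity in one input snapshot.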