
Commit d7e03af

clevinson and claude authored
Clev/model metadata v1 (#11)
* Add Model experiment type and refactor Experiment hierarchy

  Split Experiment into abstract base class + InSituExperiment to separate in-situ-specific fields (vertical_coverage, permits, meteorological data, etc.) from fields shared with model experiments. Intervention and Tracer now inherit from InSituExperiment.

  New Model class (is_a: Experiment) with ModelComponent, ModelGrid, and ModelInputDetails. Renamed model_simulation.yaml to model.yaml. Added GridType and ModelComponentType enums.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add Dataset hierarchy: abstract Dataset, FieldDataset, and ModelSimulationDataset

  Refactor Dataset into an abstract base class with two concrete subclasses, mirroring the Experiment hierarchy pattern:
  - Dataset (abstract): common fields shared by all dataset types (name, description, project_id, experiment_id, dataset_type, data_submitter, license, filenames, etc.)
  - FieldDataset: field/observational-specific fields (temporal_coverage, platform_info, data_product_type, qc_flag_scheme, calibration_files, variables)
  - ModelSimulationDataset: model-specific fields (simulation_type, start/end_datetime, spin_up_protocol, output_frequency, time_stepping_scheme, alkalinity_perturbation_description, hardware_configuration, model_output_variables) with a conditional rule requiring alkalinity_perturbation_description when simulation_type = "perturbation"

  Also adds HardwareConfiguration class and two new enums: SimulationType (counterfactual/perturbation) and ModelSimulationVariable (10 common model output variables).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* reset Dataset and Experiment to no longer be abstract classes

* experiment-level model metadata v1 feedback

* reset custom field from grid_type to model_component_type

* add conditional rendering rule for model_component_type_custom

  Show and require model_component_type_custom only when model_component_type is "other". Also fixes typo "Compoent" → "Component".

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix 'Veritical' typo in ModelGrid vertical_coordinate_type title

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* updates for dataset level model metadata

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 66b91f3 commit d7e03af

8 files changed

Lines changed: 1937 additions & 326 deletions

File tree

project/jsonschema/oae_data_protocol.schema.json

Lines changed: 843 additions & 132 deletions
Large diffs are not rendered by default.

src/oae_data_protocol/datamodel/oae_data_protocol.py

Lines changed: 683 additions & 61 deletions
Large diffs are not rendered by default.

src/oae_data_protocol/schema/dataset.yaml

Lines changed: 117 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -5,23 +5,27 @@ default_prefix: oae
55

66
classes:
77
Dataset:
8+
# TODO: We may want to revert to set 'abstract: true' again once we set `include_range_class_descendents: true`
9+
# As of right now, setting Dataset to abstract means we don't get it in the JSON Schema at all and it breaks
10+
# downstream usage
11+
# abstract: true
812
description: >-
9-
A dataset related to an OAE experiment. Generally following guidelines & best practices as outlined in
13+
Abstract base class for all dataset types. Contains fields common to both field/observational and model simulation
14+
datasets. Generally following guidelines & best practices as outlined in
1015
[science-on-schema.org](https://github.com/ESIPFed/science-on-schema.org/blob/main/guides/Dataset.md)
1116
slots:
1217
- name
1318
- description
1419
- project_id
1520
- experiment_id
16-
- temporal_coverage
1721
slot_usage:
1822
- name:
1923
title: Dataset Title
2024
range: string
2125
required: true
2226
description: >-
2327
A brief descriptive sentence that summarizes the content of a dataset. Here is one example:
24-
28+
2529
"Dissolved inorganic carbon, total alkalinity, pH, temperature, salinity and other variables collected from
2630
profile and discrete sample observations using CTD, Niskin bottle, and other instruments from R/V Wecoma in
2731
the U.S. West Coast California Current System during the 2011 West Coast Ocean Acidification Cruise
@@ -46,7 +50,7 @@ classes:
4650
# TODO: Add pre/post condition slots for _custom field rendering
4751
title: Dataset Type
4852
description: >-
49-
Selected controlled vocabularies for data types relevant to mCDR have been referenced from NASAs SeaBASS
53+
Selected controlled vocabularies for data types relevant to mCDR have been referenced from NASA's SeaBASS
5054
metadata system and are provided below, for additional data types of optical characteristics see the [SeaBASS
5155
controlled definitions list](https://seabass.gsfc.nasa.gov/wiki/metadataheaders#data_type). Additional data
5256
types have been included to meet the needs of mCDR field projects.
@@ -82,6 +86,30 @@ classes:
8286
title: Fair Use Data Request
8387
description: A statement from the data producer regarding how this dataset should be used.
8488
range: string
89+
filenames:
90+
# TODO: Should be upgraded to be compatilbe with schema:distribution
91+
# https://github.com/ESIPFed/science-on-schema.org/blob/main/guides/Dataset.md#distributions
92+
title: Filenames
93+
range: string
94+
required: true
95+
multivalued: true
96+
# TODO: workaround since JSON Schema treats [] values as satisfying the "required" constraint
97+
# It might be nice to fix this in a linkml PR to have minItems=1 automatically set for required lists in the
98+
# JSONSchema generator
99+
minimum_cardinality: 1
100+
101+
exact_mappings:
102+
- "schema:Dataset"
103+
- "dcat:Dataset"
104+
105+
FieldDataset:
106+
is_a: Dataset
107+
description: >-
108+
A field or observational dataset related to an OAE experiment. Contains fields specific to in-situ data
109+
collection such as platform information, calibration files, QC flags, and measured variables.
110+
slots:
111+
- temporal_coverage
112+
attributes:
85113
data_product_type:
86114
title: Data Product Type
87115
description: >-
@@ -119,18 +147,93 @@ classes:
119147
slot_uri: schema:variableMeasured
120148
range: Variable
121149
multivalued: true
150+
151+
ModelOutputDataset:
152+
is_a: Dataset
153+
description: >-
154+
A model simulation output dataset. Contains fields specific to computational model output including
155+
simulation configuration, output variables, and hardware information.
156+
slot_usage:
122157
filenames:
123-
# TODO: Should be upgraded to be compatilbe with schema:distribution
124-
# https://github.com/ESIPFed/science-on-schema.org/blob/main/guides/Dataset.md#distributions
125-
title: Filenames
158+
title: Model Output Filenames
159+
attributes:
160+
simulation_type:
161+
title: Simulation Type
162+
description: "Whether this is a counterfactual (control/baseline) or perturbation simulation."
163+
range: SimulationType
164+
required: true
165+
spin_up_protocol:
166+
title: Spin-up Protocol
167+
description: "Description of the model spin-up process."
126168
range: string
169+
start_datetime:
170+
title: Start Date and Time
171+
description: "Start date and time of the simulation in UTC ISO-8601."
172+
range: datetime
173+
required: true
174+
end_datetime:
175+
title: End Date and Time
176+
description: "End date and time of the simulation in UTC ISO-8601."
177+
range: datetime
127178
required: true
179+
output_frequency:
180+
title: Output Frequency
181+
description: "Frequency of model output (e.g., 'hourly mean', 'daily mean')."
182+
range: string
183+
time_stepping_scheme:
184+
title: Time-stepping Scheme
185+
description: "Time-stepping method and time step used in the simulation."
186+
range: string
187+
alkalinity_perturbation_description:
188+
title: Alkalinity Perturbation Description
189+
description: >-
190+
Description of the alkalinity perturbation applied in the simulation. Required when simulation_type
191+
is "perturbation".
192+
range: string
193+
hardware_configuration:
194+
title: Hardware Configuration
195+
description: "Details about the computational hardware used for the simulation."
196+
range: HardwareConfiguration
197+
inlined: true
198+
model_output_variables:
199+
title: Key Model Output Variables
200+
description: "Checklist of variables included in the model simulation output."
201+
range: ModelOutputVariable
128202
multivalued: true
129-
# TODO: workaround since JSON Schema treats [] values as satisfying the "required" constraint
130-
# It might be nice to fix this in a linkml PR to have minItems=1 automatically set for required lists in the
131-
# JSONSchema generator
132-
minimum_cardinality: 1
203+
rules:
204+
- preconditions:
205+
slot_conditions:
206+
simulation_type:
207+
equals_string: "perturbation"
208+
postconditions:
209+
slot_conditions:
210+
alkalinity_perturbation_description:
211+
required: true
133212

134-
exact_mappings:
135-
- "schema:Dataset"
136-
- "dcat:Dataset"
213+
HardwareConfiguration:
214+
description: "Details about the computational hardware used to run a model simulation."
215+
attributes:
216+
machine:
217+
title: Machine
218+
description: "Name of the machine or cluster (e.g., 'Perlmutter')."
219+
range: string
220+
operating_system:
221+
title: Operating System
222+
description: "Operating system used (e.g., 'Linux')."
223+
range: string
224+
cpu_gpu_details:
225+
title: CPU/GPU Details
226+
description: "Details about CPU/GPU hardware or link to specifications."
227+
range: string
228+
memory:
229+
title: Memory
230+
description: "Memory available (e.g., '512 GB DDR4')."
231+
range: string
232+
storage:
233+
title: Storage
234+
description: "Storage available (e.g., '44 PB')."
235+
range: string
236+
parallelization:
237+
title: Parallelization
238+
description: "Parallelization details (e.g., '3 nodes, 108 ntasks per node')."
239+
range: string
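
Note on the `ModelOutputDataset` constraints: the following is a minimal Python sketch of how a consumer might check two constraints that appear in this file, the non-empty `filenames` list (`minimum_cardinality: 1`) and the rule that makes `alkalinity_perturbation_description` required when `simulation_type` is "perturbation". The helper name `check_model_output_dataset` and the plain-dict record shape are assumptions made for illustration; this is not the generated datamodel or a LinkML validator, and it ignores the other required fields (`name`, `start_datetime`, `end_datetime`).

```python
from typing import Any

# Permissible values copied from the SimulationType enum in enums.yaml.
SIMULATION_TYPES = {"counterfactual", "perturbation"}


def check_model_output_dataset(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a ModelOutputDataset-like dict.

    Hypothetical helper: it covers only the constraints discussed above.
    """
    problems: list[str] = []

    # `required: true` plus `minimum_cardinality: 1`: an empty list is rejected.
    filenames = record.get("filenames")
    if not isinstance(filenames, list) or len(filenames) == 0:
        problems.append("filenames must be a list with at least one entry")

    simulation_type = record.get("simulation_type")
    if simulation_type not in SIMULATION_TYPES:
        problems.append("simulation_type must be 'counterfactual' or 'perturbation'")

    # Conditional rule: precondition on simulation_type, postcondition on the
    # description. Assumption: an empty string is treated as missing, which is
    # stricter than a bare `required` check.
    if simulation_type == "perturbation" and not record.get(
        "alkalinity_perturbation_description"
    ):
        problems.append(
            "alkalinity_perturbation_description is required for perturbation runs"
        )

    return problems
```

A counterfactual record with one filename yields no problems, while a perturbation record with `filenames: []` and no description yields two.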

src/oae_data_protocol/schema/enums.yaml

Lines changed: 65 additions & 1 deletion
@@ -224,11 +224,75 @@ enums:
224224
description: "Nahcolite (NaHCO₃) used as an alkalinity source."
225225
other:
226226
description: "Enter a custom value in the field provided"
227+
GridType:
228+
description: "Type of grid in a multi-grid or nested model configuration"
229+
permissible_values:
230+
inner_grid:
231+
description: "Inner (nested, higher-resolution) grid"
232+
outer_grid:
233+
description: "Outer (coarser-resolution) grid"
234+
single_grid:
235+
description: "Single grid (no nesting)"
236+
237+
ModelComponentType:
238+
description: "Type of model component"
239+
permissible_values:
240+
physics:
241+
title: "Physics"
242+
description: "Physical model component (e.g., ocean circulation)"
243+
bgc_ecosystem:
244+
title: "BGC / Ecosystem"
245+
description: "Biogeochemical or ecosystem model component"
246+
sea_ice:
247+
title: "Sea Ice"
248+
description: "Sea Ice model component"
249+
atmosphere:
250+
title: "Atmosphere"
251+
description: "Atmosphere model component"
252+
other:
253+
description: "Other model component (e.g., sea ice, sediment, atmosphere)"
254+
227255
DataProductType:
228256
permissible_values:
229257
originally_collected_dataset:
230258
description: A dataset collected from a research cruise or laboratory experiment
231259
data_compilation_product:
232260
description: (e.g., SOCAT, GLODAP)
233261
derived_product:
234-
description: (e.g. gridded products, or model output).
262+
description: (e.g. gridded products, or model output).
263+
264+
SimulationType:
265+
description: "Type of model simulation dataset"
266+
permissible_values:
267+
counterfactual:
268+
description: "Control/baseline simulation without alkalinity perturbation"
269+
perturbation:
270+
description: "Simulation with alkalinity perturbation applied"
271+
272+
ModelOutputVariable:
273+
description: "Variables commonly included in model simulation output datasets"
274+
permissible_values:
275+
air_sea_co2_flux:
276+
title: "Air-sea CO2 flux"
277+
description: "Air-sea exchange of carbon dioxide"
278+
dissolved_inorganic_carbon:
279+
title: "Dissolved Inorganic Carbon"
280+
description: "Dissolved inorganic carbon (DIC)"
281+
total_alkalinity:
282+
title: "Total Alkalinity"
283+
description: "Total alkalinity (TA)"
284+
temperature:
285+
description: "Temperature"
286+
salinity:
287+
description: "Salinity"
288+
ph:
289+
title: "pH"
290+
description: "pH of seawater"
291+
phytoplankton:
292+
description: "Phytoplankton biomass or concentration"
293+
horizontal_velocity:
294+
title: "Horizontal velocity"
295+
description: "Horizontal velocity components (u, v)"
296+
vertical_velocity:
297+
title: "Vertical velocity"
298+
description: "Vertical velocity component (w)"
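
Note on `ModelComponentType`: the commit message describes a rule that shows and requires `model_component_type_custom` only when `model_component_type` is "other". That rule lives in a file not rendered here, so the following is only a rough Python sketch of the idea. The enum members mirror the permissible values above; the function `check_component` is hypothetical, and accepting any custom value on a non-"other" type is this sketch's own choice.

```python
from enum import Enum
from typing import Optional


class ModelComponentType(str, Enum):
    """Mirror of the ModelComponentType permissible values in enums.yaml."""

    PHYSICS = "physics"
    BGC_ECOSYSTEM = "bgc_ecosystem"
    SEA_ICE = "sea_ice"
    ATMOSPHERE = "atmosphere"
    OTHER = "other"


def check_component(component_type: str, custom: Optional[str] = None) -> bool:
    """Return True when the pair satisfies the 'custom required for other' rule.

    Hypothetical helper. Raises ValueError for a value outside the enum.
    """
    if ModelComponentType(component_type) is ModelComponentType.OTHER:
        # "other" needs a non-empty custom description.
        return bool(custom)
    # Any listed component type is acceptable without a custom value.
    return True
```

For example, `check_component("other")` fails the rule, while `check_component("other", "sediment")` and `check_component("physics")` both pass.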

src/oae_data_protocol/schema/experiment.yaml

Lines changed: 21 additions & 10 deletions
@@ -9,14 +9,16 @@ default_prefix: oae
99

1010
classes:
1111
Experiment:
12+
# TODO: We may want to revert to set 'abstract: true' again once we set `include_range_class_descendents: true`
13+
# As of right now, setting Experiment to abstract means we don't get it in the JSON Schema at all and it breaks
14+
# downstream usage
15+
# abstract: true
1216
description: >-
13-
Experiment metadata applies to a specific study but remains consistent across datasets.
17+
Abstract base class for all experiment types. Contains fields common to both in-situ and model experiments.
1418
slots:
1519
- name
1620
- description
1721
- spatial_coverage
18-
- vertical_coverage
19-
- permits
2022
- project_id
2123
- experiment_id
2224
slot_usage:
@@ -40,8 +42,6 @@ classes:
4042
Latitude/longitude bounds of observed data in experiment, provided in decimal degrees as westernmost
4143
longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]
4244
required: true
43-
vertical_coverage:
44-
description: Minimum and maximum depths of observations in meters.
4545
attributes:
4646
experiment_type:
4747
title: mCDR Experiment Type
@@ -60,15 +60,26 @@ classes:
6060
inlined_as_list: true
6161
required: true
6262
start_datetime:
63-
title: Start Date and Time
63+
title: Start Date and Time (UTC)
6464
description: "Start date and time of experiment in UTC ISO-8601"
6565
range: datetime
6666
required: true
6767
end_datetime:
68-
title: End Date and Time
68+
title: End Date and Time (UTC)
6969
description: "End date and time of experiment in UTC ISO-8601"
7070
range: datetime
71-
required: true
71+
InSituExperiment:
72+
is_a: Experiment
73+
description: >-
74+
Experiment metadata for in-situ studies (interventions, tracer studies, etc.). Contains fields specific to
75+
field-based experiments that don't apply to model experiments.
76+
slots:
77+
- vertical_coverage
78+
- permits
79+
slot_usage:
80+
vertical_coverage:
81+
description: Minimum and maximum depths of observations in meters.
82+
attributes:
7283
data_conflicts_and_unreported_data:
7384
title: Data Conflicts and Unreported Data
7485
description: >-
@@ -91,7 +102,7 @@ classes:
91102
as digitized laboratory notebooks, blogs, etc., may be linked here.
92103
range: string
93104
Intervention:
94-
is_a: Experiment
105+
is_a: InSituExperiment
95106
mixins:
96107
- InterventionDetails
97108
- DosingDetails
@@ -118,7 +129,7 @@ classes:
118129

119130

120131
Tracer:
121-
is_a: Experiment
132+
is_a: InSituExperiment
122133
description: "Additional metadata that applies to experiments where a tracer study was conducted"
123134
mixins:
124135
- TracerDetails
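
Note on the refactored hierarchy: the sketch below restates the `is_a` chain from this diff as plain Python dataclasses, to show which fields move down to `InSituExperiment`. It is not the generated code in `oae_data_protocol.py`. The field types, the reduced field set, and the empty `Model` class (taken from the commit message, whose schema file is not rendered here) are assumptions. `end_datetime` is optional here because the diff appears to drop its `required: true`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experiment:
    """Fields shared by in-situ and model experiments (illustrative subset)."""

    name: str
    experiment_id: str
    start_datetime: str  # UTC ISO-8601 string; the schema uses range: datetime
    end_datetime: Optional[str] = None  # assumption: no longer required


@dataclass
class InSituExperiment(Experiment):
    """In-situ-only fields moved off the base class in this commit."""

    vertical_coverage: Optional[str] = None  # type is an assumption
    permits: list[str] = field(default_factory=list)  # type is an assumption


@dataclass
class Intervention(InSituExperiment):
    pass


@dataclass
class Tracer(InSituExperiment):
    pass


@dataclass
class Model(Experiment):
    """Model experiments inherit the shared fields only."""
```

With this layout a `Tracer` is both an `InSituExperiment` and an `Experiment`, while a `Model` instance carries no `permits` or `vertical_coverage` attribute.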
