
Commit 9a8b983

clevinson and claude authored
feat: add underway dataset type, data_accessibility field, align box format with SOSO (#28)
* chore: regenerate stale validation.schema.json (restores missing CO2Equilibrator etc.)

  oae_data_protocol.validation.schema.json on main was stale relative to the source YAMLs: it was missing several classes that exist in src/oae_data_protocol/schema/ (e.g. CO2Equilibrator from instrument.yaml, ContinuousCO2Calibration, and related discriminated-union variants). Someone merged YAML changes without running make gen-validation-schema.

  Running the generator against the current YAMLs with the pinned linkml toolchain (linkml 1.8.7 / linkml-runtime 1.8.3 per poetry.lock) produces this diff: adds 1229 lines of legitimate missing content, no removals. Also bumps the generation timestamp in datamodel/oae_data_protocol.py by 1 line as a side effect.

  No schema meaning changes: main's oae_data_protocol.schema.json is already up to date, so this only touches validation.schema.json and the datamodel timestamp.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add underway dataset type, data_accessibility field, align box format with SOSO

  Three issues from oae-data-commons:

  #97: Add 'underway' to DatasetType enum, for data collected continuously from a moving platform (e.g., ship underway system sampling surface seawater during transit).

  #99: Add new DataAccessibility enum (open_access, conditional_access, scheduled_access) and a required data_accessibility slot on the base Dataset class. Field-level description enumerates all three options so the form can surface them via a single tooltip; per-value descriptions remain on the enum for downstream consumers.

  #88: Align the GeoShape box format with science-on-schema.org. SOSO specifies the box string as two space-separated corner points, the southwest (lower-left) corner followed by the northeast (upper-right) corner, with each point written as `<latitude> <longitude>` in decimal degrees: "<minLat> <minLon> <maxLat> <maxLon>".

  The protocol's spatial_coverage and GeoShape.box descriptions previously documented a longitude-first ordering, which contradicted SOSO and was the root cause of the lat/lon flip reported in jstorylong's bug. This commit rewrites the box format documentation in core.yaml, experiment.yaml, and model.yaml; updates example box strings in src/docs/files/metadata-format.md to lat-first; and cites the SOSO guide URL.

  Doc audit (per CLAUDE.md):
  - src/docs/files/vocabularies.md: added underway and DataAccessibility rows
  - src/docs/files/Datasets/index.md: added data_accessibility to key fields table
  - src/docs/files/metadata-format.md: flipped box example to lat-first

  Refs: submarine-mrv/oae-data-commons#97 #99 #88

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Apply suggestion from @clevinson

* fix: correct SOSO anchor link to #spatial-coverage

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: simplify spatial_coverage descriptions, keep SOSO detail on GeoShape.box only

  The serialization format (lat-first, SOSO spec, example, link) now lives only on the GeoShape.box attribute description. The spatial_coverage slot descriptions on Project, Experiment, and ModelGrid just describe what the field represents. The nested GeoShape.box description propagates through the schema and is visible to both UI tooltips and LLM consumers.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
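The lat-first box format described in the commit message can be illustrated with a small Python sketch. This is not code from the repository: the names `BoundingBox`, `parse_soso_box`, and `lon_first_to_soso` are hypothetical, and the range checks assume latitudes in [-90, 90] and longitudes in [-180, 180]. The longitude pair is deliberately not ordered, on the assumption that a box may cross the antimeridian.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Hypothetical container for a parsed box string."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


def parse_soso_box(box: str) -> BoundingBox:
    """Parse "<minLat> <minLon> <maxLat> <maxLon>" (lat-first, space-separated)."""
    parts = box.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 space-separated values, got {len(parts)}")
    min_lat, min_lon, max_lat, max_lon = (float(p) for p in parts)
    # Assumed valid ranges; a flipped (lon-first) string often fails here.
    for lat in (min_lat, max_lat):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude out of range: {lat}")
    for lon in (min_lon, max_lon):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"longitude out of range: {lon}")
    if min_lat > max_lat:
        raise ValueError("southwest latitude must not exceed northeast latitude")
    return BoundingBox(min_lat, min_lon, max_lat, max_lon)


def lon_first_to_soso(box: str) -> str:
    """Reorder a legacy "<minLon> <minLat> <maxLon> <maxLat>" string to lat-first."""
    min_lon, min_lat, max_lon, max_lat = box.split()
    return f"{min_lat} {min_lon} {max_lat} {max_lon}"


if __name__ == "__main__":
    # Example string taken from the GeoShape.box description in this commit.
    print(parse_soso_box("39.3280 120.1633 40.445 123.7878"))
    print(lon_first_to_soso("120.1633 39.3280 123.7878 40.445"))
```

A range check like this only catches a lat/lon flip when a longitude has magnitude greater than 90; boxes nearer the prime meridian would still need a manual review.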
1 parent c23ec11 commit 9a8b983

12 files changed

Lines changed: 1216 additions & 129 deletions

project/jsonschema/oae_data_protocol.schema.json

Lines changed: 42 additions & 13 deletions
@@ -1851,6 +1851,16 @@
   "title": "ContinuousTAVariable",
   "type": "object"
 },
+"DataAccessibility": {
+  "description": "Level of access to a dataset.",
+  "enum": [
+    "open_access",
+    "conditional_access",
+    "scheduled_access"
+  ],
+  "title": "DataAccessibility",
+  "type": "string"
+},
 "DataProductType": {
   "description": "",
   "enum": [
@@ -1870,6 +1880,11 @@
   "title": "Author List (for citation)",
   "type": "string"
 },
+"data_accessibility": {
+  "$ref": "#/$defs/DataAccessibility",
+  "description": "Level of access to this dataset. Open Access data are freely available without restriction. Conditional Access data are available upon request, subject to review. Scheduled Access data will become openly available after a specified date.",
+  "title": "Data Accessibility"
+},
 "data_submitter": {
   "$ref": "#/$defs/Person",
   "title": "Data Submitter"
@@ -1930,7 +1945,8 @@
   "experiment_id",
   "filenames",
   "dataset_type",
-  "data_submitter"
+  "data_submitter",
+  "data_accessibility"
 ],
 "title": "Dataset",
 "type": "object"
@@ -1957,6 +1973,7 @@
   "model_output",
   "socioeconomic",
   "net_tow",
+  "underway",
   "other"
 ],
 "title": "DatasetType",
@@ -3199,7 +3216,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Latitude/longitude bounds of observed data in experiment, provided in decimal degrees as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]",
+  "description": "Latitude/longitude bounds of observed data in experiment, expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "start_datetime": {
@@ -3261,7 +3278,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Latitude/longitude bounds of project site (e.g., boundary domain of observations or relevant activities) provided in decimal degrees as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]",
+  "description": "Latitude/longitude bounds of project site (e.g., boundary domain of observations or relevant activities), expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "temporal_coverage": {
@@ -3320,6 +3337,11 @@
     "null"
   ]
 },
+"data_accessibility": {
+  "$ref": "#/$defs/DataAccessibility",
+  "description": "Level of access to this dataset. Open Access data are freely available without restriction. Conditional Access data are available upon request, subject to review. Scheduled Access data will become openly available after a specified date.",
+  "title": "Data Accessibility"
+},
 "data_product_type": {
   "$ref": "#/$defs/DataProductType",
   "description": "\"Controlled vocabulary\" One of the three choices: (a) Originally collected dataset (e.g., a dataset collected from a research cruise or laboratory experiment), (b) Data compilation product (e.g., SOCAT, GLODAP), or (c) Derived product (e.g., gridded products, or model output).",
@@ -3459,7 +3481,8 @@
   "experiment_id",
   "filenames",
   "dataset_type",
-  "data_submitter"
+  "data_submitter",
+  "data_accessibility"
 ],
 "title": "FieldDataset",
 "type": "object"
@@ -3548,7 +3571,7 @@
 "description": "The geographic shape of a place. A GeoShape can be described using several properties whose values are based on latitude/longitude pairs. Either whitespace or commas can be used to separate latitude and longitude; whitespace should be used when writing a list of several such points. (imported from schema.org)",
 "properties": {
   "box": {
-    "description": "A box defined by two latitude-longitude points, southwest and northeast.",
+    "description": "A bounding box defined by two corner points \u2014 the southwest (lower-left) corner followed by the northeast (upper-right) corner. Per science-on-schema.org, each point is written as `<latitude> <longitude>` in decimal degrees (WGS 84), with all four values space-separated: `\"<minLat> <minLon> <maxLat> <maxLon>\"`. Example: `\"39.3280 120.1633 40.445 123.7878\"`. See https://github.com/ESIPFed/science-on-schema.org/blob/main/guides/Dataset.md#spatial-coverage",
     "type": "string"
   },
   "line": {
@@ -3867,7 +3890,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Latitude/longitude bounds of observed data in experiment, provided in decimal degrees as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]",
+  "description": "Latitude/longitude bounds of observed data in experiment, expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "start_datetime": {
@@ -4096,7 +4119,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Latitude/longitude bounds of observed data in experiment, provided in decimal degrees as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]",
+  "description": "Latitude/longitude bounds of observed data in experiment, expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "start_datetime": {
@@ -4358,7 +4381,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Latitude/longitude bounds of observed data in experiment, provided in decimal degrees as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]",
+  "description": "Latitude/longitude bounds of observed data in experiment, expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "start_datetime": {
@@ -4568,7 +4591,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Latitude/longitude bounds of observed data in experiment, provided in decimal degrees as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]",
+  "description": "Latitude/longitude bounds of observed data in experiment, expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "spin_up_protocol": {
@@ -4743,7 +4766,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Bounding box for this grid, provided as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude.",
+  "description": "Bounding box for this grid, expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "vertical_coordinate_type": {
@@ -4870,6 +4893,11 @@
   "title": "Author List (for citation)",
   "type": "string"
 },
+"data_accessibility": {
+  "$ref": "#/$defs/DataAccessibility",
+  "description": "Level of access to this dataset. Open Access data are freely available without restriction. Conditional Access data are available upon request, subject to review. Scheduled Access data will become openly available after a specified date.",
+  "title": "Data Accessibility"
+},
 "data_submitter": {
   "$ref": "#/$defs/Person",
   "title": "Data Submitter"
@@ -4980,7 +5008,8 @@
   "experiment_id",
   "filenames",
   "dataset_type",
-  "data_submitter"
+  "data_submitter",
+  "data_accessibility"
 ],
 "then": {
   "properties": {
@@ -5620,7 +5649,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Latitude/longitude bounds of project site (e.g., boundary domain of observations or relevant activities) provided in decimal degrees as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]",
+  "description": "Latitude/longitude bounds of project site (e.g., boundary domain of observations or relevant activities), expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "temporal_coverage": {
@@ -6082,7 +6111,7 @@
 },
 "spatial_coverage": {
   "$ref": "#/$defs/SpatialCoverage",
-  "description": "Latitude/longitude bounds of observed data in experiment, provided in decimal degrees as westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude. [S, W, N, E]",
+  "description": "Latitude/longitude bounds of observed data in experiment, expressed as a schema.org GeoShape bounding box.",
   "title": "Spatial Coverage"
 },
 "start_datetime": {

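Because this commit makes data_accessibility a required field on Dataset, existing records that lack it would presumably stop validating. The following Python sketch shows one way a client might pre-check a record before submission. It is an illustration only, not the project's validator: `check_dataset` and both constants are hypothetical names, and the required-field tuple lists only the entries visible in the diff context, so the real schema may require more.

```python
# Enum values copied from the DataAccessibility definition in the diff.
DATA_ACCESSIBILITY = {"open_access", "conditional_access", "scheduled_access"}

# Assumption: only the required fields visible in the diff hunks are listed;
# the full "required" array in the schema may contain additional entries.
VISIBLE_REQUIRED_FIELDS = (
    "experiment_id",
    "filenames",
    "dataset_type",
    "data_submitter",
    "data_accessibility",
)


def check_dataset(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []
    for field in VISIBLE_REQUIRED_FIELDS:
        if field not in record:
            errors.append(f"missing required field: {field}")
    value = record.get("data_accessibility")
    if value is not None and value not in DATA_ACCESSIBILITY:
        errors.append(f"invalid data_accessibility: {value!r}")
    return errors


if __name__ == "__main__":
    # Hypothetical record using the new 'underway' dataset type.
    record = {
        "experiment_id": "exp-1",
        "filenames": ["track.csv"],
        "dataset_type": "underway",
        "data_submitter": {"name": "A. Submitter"},
    }
    print(check_dataset(record))
    record["data_accessibility"] = "open_access"
    print(check_dataset(record))
```

For full validation, running the generated JSON Schema through a JSON Schema validator would be more reliable than a hand-written check like this one.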