Commit a3f42c4

Merge branch 'improved-terms'
# Conflicts: # standard/template/sections/clause_9_zarr_encoding_core.adoc
2 parents 413a25a + 4f68cb6 commit a3f42c4

10 files changed

Lines changed: 304 additions & 404 deletions

standard/template/geozarr-spec.adoc

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -45,23 +45,23 @@ include::sections/clause_6_informative_text.adoc[]
4545

4646
include::sections/clause_7_unified_data_model.adoc[]
4747

48-
include::sections/clause_8_conformance.adoc[]
48+
// Discarded: include::sections/clause_8_conformance.adoc[]
4949

5050
include::sections/clause_9_zarr_encoding.adoc[]
5151

52-
include::sections/clause_10_geotiff_encoding.adoc[]
52+
// include::sections/clause_10_geotiff_encoding.adoc[]
5353

5454
////
5555
add or remove annexes after "A" as necessary
5656
////
57-
include::sections/annex-a.adoc[]
57+
//include::sections/annex-a.adoc[]
5858

59-
include::sections/annex-n.adoc[]
59+
// include::sections/annex-n.adoc[]
6060

6161
////
6262
Revision History should be the last annex before the Bibliography
6363
Bibliography should be the last annex
6464
////
65-
include::sections/annex-history.adoc[]
65+
// include::sections/annex-history.adoc[]
6666

67-
include::sections/annex-bibliography.adoc[]
67+
//include::sections/annex-bibliography.adoc[]
Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,21 +1,21 @@
11
.Preface
22

3-
The GeoZarr Unified Data Model and Encoding Standard defines a layered, standards-based framework for representing and encoding geospatial and scientific datasets in the Zarr format. It integrates foundational specifications such as the Unidata Common Data Model (CDM), the CF Conventions, and selected OGC and community standards to enable semantic, structural, and operational interoperability across Earth observation platforms and geospatial ecosystems.
4-
5-
This Standard introduces a unified model that harmonises metadata structures, array-based data representations, coordinate referencing, and multiscale tiling semantics. It provides a coherent framework that facilitates encoding into Zarr v2 and v3, supporting scalable, cloud-native workflows.
6-
7-
The purpose of this document is to provide implementation guidance and normative structure for consistent, interoperable adoption of GeoZarr across tools, platforms, and services. This work extends prior standardisation efforts within the OGC, including OGC API – Tiles, the Tile Matrix Set Standard, and EO metadata conventions, and anticipates integration with catalogue systems such as STAC.
3+
The GeoZarr Standard defines a layered, standards-based framework for representing and encoding geospatial and scientific datasets in the Zarr format. The purpose of this document is to provide implementation guidance and normative structure for consistent, interoperable adoption of GeoZarr across tools, platforms, and services. This work extends prior standardisation efforts within the OGC, including OGC API – Tiles, the Tile Matrix Set Standard, and EO metadata conventions, and anticipates integration with catalogue systems such as STAC.
84

95
This Standard has been developed in collaboration with contributors from Earth observation, climate science, geospatial analysis, and cloud-native geodata infrastructure communities. Future work may extend this model to additional storage formats, API services, and semantic layers.
106

117
[abstract]
128
== Abstract
139

14-
The GeoZarr Unified Data Model and Encoding Standard specifies a conceptual and implementation framework for representing multidimensional, geospatial datasets using the Zarr format. This Standard builds upon the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, and introduces interoperable constructs for tiling, georeferencing, and metadata integration.
10+
Zarr provides efficient chunked storage for n-dimensional arrays but does not provide the semantic constructs required for geospatial and scientific data workflows.
1511

16-
The model defines core elements—dimensions, coordinate variables, data variables, attributes—and optional extensions for multi-resolution overviews, affine geotransforms, and STAC metadata. Encoding guidance is provided for Zarr Version 2 and Zarr Version 3, including chunking, group hierarchy, and metadata conventions.
12+
GeoZarr defines an abstract data model and a set of conventions for representing geospatial and scientific datasets in the Zarr format:
1713

18-
GeoZarr aims to bridge scientific and geospatial communities by enabling round-trip transformations with formats such as NetCDF and GeoTIFF, and supporting compatibility with tools in the scientific Python and geospatial ecosystems. This Standard enables scalable, standards-compliant, and semantically rich data structures for cloud-native Earth observation applications.
14+
- Bridges the Unidata Common Data Model (CDM) and the Zarr format by defining how the semantic constructs of the CDM are represented within Zarr’s storage model.
15+
- Supports community metadata standards like CF, GeoTIFF, and GDAL.
16+
- Extends the CDM for geospatial use through multiscale overviews and affine transformations.
17+
18+
By providing a standardized framework for geospatial semantics, GeoZarr enables scientific and geospatial applications to fully utilize cloud-native storage architectures while maintaining the rich metadata and coordinate referencing required for Earth observation workflows. The result is a modern, scalable approach to storing and accessing geospatial data that meets the needs of both data providers and consumers.
1919

2020
== Submitters
2121

@@ -29,4 +29,4 @@ All questions regarding this submission should be directed to the editor or the
2929
|Brianna Pagán _(editor)_ | DevSeed
3030
|Ryan Abernathey| EarthMover
3131
| TBD | TBD
32-
|===
32+
|===
Lines changed: 27 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,31 @@
11
== Scope
22

3-
The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic unified data model, the specification of its encoding into Zarr Version 2 and Version 3, and the establishment of extension points to support interoperability with external metadata and tiling standards.
3+
The GeoZarr Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic data model, the specification of its encoding into Zarr Version 2 and Version 3, and a set of extensions to support affine transformations and overviews.
44

5-
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models, such as the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, with operational encoding formats suitable for cloud-native storage and analysis.
5+
These capabilities are necessary for geospatial data because Zarr does not provide semantic constructs for geospatial data interpretation. Applications need to understand not just array shapes and values, but coordinate meanings, projection parameters, and scientific metadata. GeoZarr fills this gap without compromising Zarr's performance characteristics.
66

7-
Typical use cases include the storage, transformation, discovery, and processing of raster and gridded data, data cubes with temporal or vertical dimensions, and catalogue-enabled datasets integrated with metadata standards such as STAC and OGC Tile Matrix Sets.
7+
=== Why GeoZarr Exists
8+
9+
Zarr, by design, is a low-level container for storing n-dimensional arrays and metadata. While this simplicity is a strength for performance and interoperability, it means Zarr lacks higher-level concepts that geospatial applications require:
10+
11+
* *Coordinate Systems:* No native way to associate spatial or temporal meaning with array dimensions
12+
* *Grid Mappings:* No standard mechanism for projection and coordinate reference system metadata
13+
* *Semantic Metadata:* No conventions for units, standard names, or scientific attributes
14+
* *Variable Relationships:* No formal distinction between coordinate variables and data variables
15+
16+
These concepts are essential for geospatial workflows but must be layered on top of Zarr's array storage. GeoZarr provides this semantic layer through proven standards (Common Data Model and CF conventions) while preserving Zarr's cloud-native advantages.
17+
18+
=== Relationship to Zarr Core Concepts
19+
20+
GeoZarr builds upon Zarr's foundational concepts of <<term-store,stores>> and <<term-hierarchy, hierarchies>>. A Zarr store provides the storage and retrieval interface (e.g., filesystem, cloud object storage), while a hierarchy defines the logical tree structure of groups and arrays within that store. GeoZarr specifies how to organize and structure hierarchies to support geospatial semantics, without modifying the underlying store interface.
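The split between store and hierarchy can be sketched with an in-memory dictionary standing in for a store. This is an illustration only: the key layout loosely follows the Zarr v3 convention of one `zarr.json` metadata document per node, the array metadata is deliberately abbreviated (it is not a complete Zarr array document), and the node names `measurements` and `temperature` as well as the helpers `put_json` and `list_nodes` are hypothetical.

```python
import json

# In-memory stand-in for a store: keys are paths, values are bytes.
store = {}

def put_json(key, doc):
    """Write a JSON metadata document under the given store key."""
    store[key] = json.dumps(doc).encode("utf-8")

# Hypothetical hierarchy: root group -> measurements group -> temperature array.
put_json("zarr.json", {"zarr_format": 3, "node_type": "group"})
put_json("measurements/zarr.json", {"zarr_format": 3, "node_type": "group"})
# Array metadata abbreviated for the sketch (shape, data type, codecs omitted).
put_json("measurements/temperature/zarr.json", {"zarr_format": 3, "node_type": "array"})
store["measurements/temperature/c/0/0"] = b"..."  # placeholder chunk payload

def list_nodes(store):
    """Recover the logical hierarchy (path -> node type) from the flat store keys."""
    nodes = {}
    for key, value in store.items():
        if key == "zarr.json" or key.endswith("/zarr.json"):
            path = "/" + key[: -len("zarr.json")].rstrip("/")
            nodes[path] = json.loads(value)["node_type"]
    return nodes

nodes = list_nodes(store)
```

The store only knows about keys and bytes; the tree of groups and arrays is reconstructed from the metadata documents, which is the layer GeoZarr builds on.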
21+
22+
=== Use Cases and Applications
23+
24+
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models with operational encoding formats suitable for cloud-native storage and analysis.
25+
26+
Typical use cases include:
27+
* Storage and processing of raster and gridded data
28+
* Management of data cubes with temporal or vertical dimensions
29+
* Integration with catalogue systems through standardized metadata
30+
* Multi-resolution tiling for efficient visualization and analysis
31+
* Cloud-optimized access to large geospatial datasets

standard/template/sections/clause_2_conformance.adoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
== Conformance
22

3+
> WARNING: This section should be ignored for now. Requirements classes will be designed and summarized here once the specification is complete.
4+
35
The GeoZarr Unified Data Model is structured around a modular set of requirements classes. These classes define the conformance criteria for datasets and implementations adopting the GeoZarr specification. Each class provides a distinct set of structural or semantic expectations, facilitating interoperability across a broad spectrum of geospatial and scientific use cases.
46

57
The *Core* requirements class defines the minimal compliance necessary to claim conformance with the GeoZarr Unified Data Model. It is intentionally open and permissive, supporting incremental adoption and broad compatibility with existing Zarr tools and data models based on the Unidata Common Data Model (CDM).

standard/template/sections/clause_4_terms_and_definitions.adoc

Lines changed: 28 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -2,18 +2,34 @@
22

33
=== Terms and definitions
44

5+
The GeoZarr specification inherits terms from the following sources:
6+
7+
* https://docs.unidata.ucar.edu/netcdf-java/5.2/userguide/common_data_model_overview.html#data-access-layer-object-model[Unidata Common Data Model]
8+
9+
* https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#concepts-and-terminology[Zarr concepts and terminology]
10+
11+
12+
==== affine transformation
13+
14+
An affine transformation is a geometric mapping that preserves points, straight lines, and parallelism. It combines linear transformations (such as rotation, scaling, reflection, or shear) with translation.
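As a rough illustration of this definition, the sketch below applies a 2D affine transformation that maps array indices (column, row) to spatial coordinates. The six-coefficient layout, the function name `apply_affine`, and the example grid are assumptions made for this example, not the encoding defined by GeoZarr.

```python
def apply_affine(coeffs, col, row):
    """Map (col, row) to (x, y) with x = a*col + b*row + c and y = d*col + e*row + f."""
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y

# Hypothetical north-up grid: 10 m pixels, origin at (500000, 4100000).
# The negative e coefficient makes y decrease as the row index grows.
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)

origin = apply_affine(coeffs, 0, 0)
corner = apply_affine(coeffs, 3, 2)
```

Here the linear part (a, b, d, e) carries scaling, rotation, or shear, and (c, f) carries the translation.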
15+
16+
517
==== array
618

719
A multidimensional, regularly spaced collection of values (e.g., raster data or gridded measurements), typically indexed by dimensions such as time, latitude, longitude, or spectral band.
820

921
==== chunk
1022

11-
A sub-array representing a partition of a larger array, used to optimise data access and storage. In Zarr, data is stored and accessed as a collection of independently compressed chunks.
23+
A sub-array representing a partition of a larger array, used to optimize data access and storage. In Zarr, data is stored and accessed as a collection of independently compressed chunks.
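To make the partitioning concrete, the following sketch computes how many chunks a regular chunk grid needs along each dimension and which chunk holds a given element. The shapes and helper names are hypothetical, and the sketch assumes a regular grid in which edge chunks may be only partially filled.

```python
import math

def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(math.ceil(n / c) for n, c in zip(array_shape, chunk_shape))

def locate(index, chunk_shape):
    """Return (chunk index, offset inside that chunk) for an element index."""
    chunk_index = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_index, offset

# Hypothetical 1000 x 2500 array stored in 256 x 512 chunks.
grid = chunk_grid_shape((1000, 2500), (256, 512))
where = locate((700, 1030), (256, 512))
```

Reading the element at (700, 1030) therefore requires fetching and decompressing only one chunk rather than the whole array.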
1224

1325
==== coordinate variable
1426

1527
A one-dimensional array whose values define the coordinate system for a dimension of one or more data variables. Typical examples include latitude, longitude, time, or vertical levels.
1628

29+
==== data model
30+
31+
A data model is an *abstract*, conceptual framework that defines how data is structured, organized, and interpreted, independent of any particular storage medium or implementation. In contrast, a file format represents a concrete realization of this model, defining how the data is stored on disk.
32+
1733
==== data variable
1834

1935
An array containing the primary geospatial or scientific measurements of interest (e.g., temperature, reflectance). Data variables are defined over one or more dimensions and associated with attributes.
@@ -22,29 +38,32 @@ An array containing the primary geospatial or scientific measurements of interes
2238

2339
An index axis along which arrays are organised. Dimensions provide a naming and ordering scheme for accessing data in multidimensional arrays (e.g., `time`, `x`, `y`, `band`).
2440

25-
==== group
41+
==== dataset
2642

27-
A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections).
43+
*Avoid using:* this term is overloaded and avoided in this document. A dataset usually represents a self-contained group of variables within a hierarchical data structure. These variables often share one or more dimensions, and the dataset is the unit that can be opened by a data access library (see <<variable-group,variable group>>).
2844

2945
==== metadata
3046

3147
Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable.
3248

33-
==== multiscale dataset
49+
==== overview
50+
51+
A downscaled representation of a variable that facilitates rapid data display and efficient zooming. Overviews provide lower-resolution versions of the original data, enabling quick visualization and access without reading the full-resolution array. Multiple overview levels may be generated to support progressive rendering across different scales.
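A minimal sketch of how overview levels might be generated is shown below. It assumes a 2D variable, a fixed downsampling factor of 2, and simple block averaging; these choices, like the function names, are assumptions for illustration and are not prescribed by the definition above.

```python
def downsample_2x(grid):
    """Average non-overlapping 2x2 blocks; assumes even height and width."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

def build_overviews(grid, levels):
    """Return [full resolution, overview 1, overview 2, ...]."""
    pyramid = [grid]
    for _ in range(levels):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid

# Hypothetical 4 x 4 variable with values 0..15.
full = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_overviews(full, 2)
```

A viewer zoomed far out can read the smallest level that still matches the screen resolution instead of the full-resolution array.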
52+
53+
==== store
3454

35-
A dataset that includes multiple representations of the same data variable at varying spatial resolutions. Each resolution level is associated with a tile matrix from an OGC Tile Matrix Set.
55+
A system that provides storage and retrieval operations for Zarr hierarchies, as defined in the https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#stores[Zarr core specification]. A store implements the abstract store interface and can be backed by various storage technologies such as filesystems, cloud object storage, or databases. GeoZarr hierarchies are stored within and accessed through Zarr stores.
3656

3757
==== tile matrix set
3858

3959
A spatial tiling scheme defined by a hierarchy of zoom levels and consistent grid parameters (e.g., scale, CRS). Tile Matrix Sets enable spatial indexing and tiling of gridded data.
4060

41-
==== transform
61+
[[variable-group]]
62+
==== variable group
4263

43-
An affine transformation used to convert between grid coordinates and geospatial coordinates, typically defined using the GDAL GeoTransform convention.
64+
A variable group is a container that includes a coherent collection of variables sharing the same dimensional structure and coordinate system (and may contain additional variables or subgroups). It is conceptually equivalent to an xarray Dataset.
4465

45-
==== unified data model (UDM)
4666

47-
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations.
4867

4968
=== Abbreviated Terms
5069

Lines changed: 6 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,30 +1,14 @@
11
[[overview]]
22
== Overview
33

4-
The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing multidimensional geospatial data using the Zarr format. Developed under the guidance of the OGC GeoZarr Standards Working Group (SWG), the Standard establishes conventions for encoding scientific and Earth observation datasets in a way that promotes scalability, interoperability, and compatibility with cloud-native infrastructure.
4+
The **GeoZarr Standard** defines an **abstract data model** and a set of **conventions** for representing and describing geospatial and scientific datasets using the **Zarr** format.
55

6-
GeoZarr is built on widely adopted community standards, including the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions. It introduces additional extensions and structural constructs to support multi-resolution tiling, geospatial referencing, and catalogue-enabled metadata integration (e.g., STAC).
6+
Zarr provides efficient, chunked storage for n-dimensional arrays but does not include the semantic constructs required for geospatial and scientific data workflows. The **Unidata Common Data Model (CDM)** addresses this gap by introducing essential concepts that structure information through **variables**, **groups**, **coordinates**, and **metadata**. This abstract data model provides the semantic framework that enables structured interpretation of array-based data on top of Zarr’s storage foundation.
77

8-
This Standard provides both:
8+
The **primary objective** of GeoZarr is to specify how the **CDM** is encoded within Zarr. GeoZarr provides normative rules for encoding these CDM concepts in Zarr and thereby standardises the encoding practices already adopted by CDM-compatible libraries such as **xarray** and **nczarr**, promoting consistent interpretation and interoperability across tools and platforms.
99

10-
* **Core requirements**, which define minimal compliance to represent array-based datasets using CDM constructs in Zarr, supporting open and permissive adoption across use cases.
11-
* **Modular extension classes**, which define additional capabilities such as time series support, affine geotransform referencing, multi-resolution overviews, and projection coordinates, in line with OGC and community practices.
10+
By defining an **abstract model** based on the **CDM** and a corresponding **encoding for Zarr**, GeoZarr establishes an explicit relationship between **the conceptual structure of the data** and **its physical storage representation**. Zarr defines how data are stored and accessed as chunked, hierarchical arrays, while GeoZarr specifies how this stored structure represents the scientific and geospatial meaning of the dataset.
1211

13-
These modular components enable GeoZarr to serve a wide range of applications—from basic EO data storage to high-performance, cloud-native visualisation and analytics workflows.
14-
15-
=== Encodings
16-
17-
GeoZarr supports encoding in both Zarr Version 2 and Zarr Version 3. Each version defines how arrays, groups, and metadata are stored within a directory-based structure. All metadata is encoded in JSON-compatible formats, ensuring both human readability and machine interoperability.
18-
19-
Encoding guidelines include:
20-
21-
* Hierarchical grouping of datasets via Zarr groups.
22-
* Dimension indexing and binding via dimension metadata.
23-
* Attribute-based metadata compliant with CF conventions.
24-
* Multi-resolution overviews aligned with OGC Tile Matrix Sets.
25-
* Optional integration of STAC metadata for discovery and cataloguing.
26-
27-
JSON is the primary format for metadata, attributes, and structural declarations. Implementations are encouraged to support standardised naming conventions, EPSG code references, and structured metadata to facilitate search, validation, and transformation across platforms.
28-
29-
GeoZarr does not prescribe a single interface for data access. Instead, it enables **serverless and cloud-native** data access strategies by aligning its model with chunked, parallelisable storage patterns that are optimised for use in object stores and analytical environments.
12+
As a **secondary objective**, GeoZarr extends the **CDM base layer** with additional capabilities required for geospatial and cloud-native applications. These extensions include **multiscale overviews**, which enable the representation of data at multiple levels of detail, and **affine transformations**, which define the spatial relationship between data coordinates and real-world locations. All extensions remain fully aligned with the CDM framework.
3013

14+
The **CDM** base layer also provides a **generic framework** capable of hosting metadata from a wide range of community standards. GeoZarr encourages the use of the **Climate and Forecast (CF) Conventions**, which are themselves defined around the CDM model, without imposing them as mandatory. This flexibility also supports metadata from other domain-specific standards such as **GeoTIFF**, **GDAL**, and similar geospatial conventions.
