You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The GeoZarr Unified Data Model and Encoding Standard defines a layered, standards-based framework for representing and encoding geospatial and scientific datasets in the Zarr format. It integrates foundational specifications such as the Unidata Common Data Model (CDM), the CF Conventions, and selected OGC and community standards to enable semantic, structural, and operational interoperability across Earth observation platforms and geospatial ecosystems.
4
-
5
-
This Standard introduces a unified model that harmonises metadata structures, array-based data representations, coordinate referencing, and multiscale tiling semantics. It provides a coherent framework that facilitates encoding into Zarr v2 and v3, supporting scalable, cloud-native workflows.
6
-
7
-
The purpose of this document is to provide implementation guidance and normative structure for consistent, interoperable adoption of GeoZarr across tools, platforms, and services. This work extends prior standardisation efforts within the OGC, including OGC API – Tiles, the Tile Matrix Set Standard, and EO metadata conventions, and anticipates integration with catalogue systems such as STAC.
3
+
The GeoZarr Standard defines a layered, standards-based framework for representing and encoding geospatial and scientific datasets in the Zarr format. The purpose of this document is to provide implementation guidance and normative structure for consistent, interoperable adoption of GeoZarr across tools, platforms, and services. This work extends prior standardisation efforts within the OGC, including OGC API – Tiles, the Tile Matrix Set Standard, and EO metadata conventions, and anticipates integration with catalogue systems such as STAC.
8
4
9
5
This Standard has been developed in collaboration with contributors from Earth observation, climate science, geospatial analysis, and cloud-native geodata infrastructure communities. Future work may extend this model to additional storage formats, API services, and semantic layers.
10
6
11
7
[abstract]
12
8
== Abstract
13
9
14
-
The GeoZarr Unified Data Model and Encoding Standard specifies a conceptual and implementation framework for representing multidimensional, geospatial datasets using the Zarr format. This Standard builds upon the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, and introduces interoperable constructs for tiling, georeferencing, and metadata integration.
10
+
Zarr provides efficient chunked storage for n-dimensional arrays but do not provide with the semantic constructs required for geospatial and scientific data workflows.
15
11
16
-
The model defines core elements—dimensions, coordinate variables, data variables, attributes—and optional extensions for multi-resolution overviews, affine geotransforms, and STAC metadata. Encoding guidance is provided for Zarr Version 2 and Zarr Version 3, including chunking, group hierarchy, and metadata conventions.
12
+
GeoZarr defines an abstract data model and a set of conventions for representing geospatial and scientific datasets in the Zarr format:
17
13
18
-
GeoZarr aims to bridge scientific and geospatial communities by enabling round-trip transformations with formats such as NetCDF and GeoTIFF, and supporting compatibility with tools in the scientific Python and geospatial ecosystems. This Standard enables scalable, standards-compliant, and semantically rich data structures for cloud-native Earth observation applications.
14
+
- GeoZarr bridges the Unidata CDM and the Zarr format. GeoZarr establishes the link between the Unidata Common Data Model (CDM) and the Zarr format by defining how the semantic constructs of the CDM are represented within Zarr’s storage model.
15
+
- Supports community metadata standards like CF, GeoTIFF, and GDAL.
16
+
- Extends CDM for geospatial through multiscale overviews and affine transformations.
17
+
18
+
By providing a standardized framework for geospatial semantics, GeoZarr enables scientific and geospatial applications to fully utilize cloud-native storage architectures while maintaining the rich metadata and coordinate referencing required for Earth observation workflows. The result is a modern, scalable approach to storing and accessing geospatial data that meets the needs of both data providers and consumers.
19
19
20
20
== Submitters
21
21
@@ -29,4 +29,4 @@ All questions regarding this submission should be directed to the editor or the
The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic unified data model, the specification of its encoding into Zarr Version 2 and Version 3, and the establishment of extension points to support interoperability with external metadata and tiling standards.
3
+
The GeoZarr Standard defines a conceptual and implementation framework for representing and encoding geospatial and scientific datasets using the Zarr format. The scope of this Standard includes the definition of a format-agnostic data model, the specification of its encoding into Zarr Version 2 and Version 3, and a set of extensions to support affine transformations and overviews.
4
4
5
-
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models, such as the Unidata Common Data Model (CDM) and the Climate and Forecast (CF) Conventions, with operational encoding formats suitable for cloud-native storage and analysis.
5
+
These capabilities are necessary for geospatial data because Zarr does not provide semantic constructs for geospatial data interpretation. Applications need to understand not just array shapes and values, but coordinate meanings, projection parameters, and scientific metadata. GeoZarr fills this gap without compromising Zarr's performance characteristics.
6
6
7
-
Typical use cases include the storage, transformation, discovery, and processing of raster and gridded data, data cubes with temporal or vertical dimensions, and catalogue-enabled datasets integrated with metadata standards such as STAC and OGC Tile Matrix Sets.
7
+
=== Why GeoZarr Exists
8
+
9
+
Zarr, by design, is a low-level container for storing n-dimensional arrays and metadata. While this simplicity is a strength for performance and interoperability, it means Zarr lacks higher-level concepts that geospatial applications require:
10
+
11
+
* *Coordinate Systems:* No native way to associate spatial or temporal meaning with array dimensions
12
+
* *Grid Mappings:* No standard mechanism for projection and coordinate reference system metadata
13
+
* *Semantic Metadata:* No conventions for units, standard names, or scientific attributes
14
+
* *Variable Relationships:* No formal distinction between coordinate variables and data variables
15
+
16
+
These concepts are essential for geospatial workflows but must be layered on top of Zarr's array storage. GeoZarr provides this semantic layer through proven standards (Common Data Model and CF conventions) while preserving Zarr's cloud-native advantages.
17
+
18
+
=== Relationship to Zarr Core Concepts
19
+
20
+
GeoZarr builds upon Zarr's foundational concepts of <<term-store,stores>> and <<term-hierarchy, hierarchies>>. A Zarr store provides the storage and retrieval interface (e.g., filesystem, cloud object storage), while a hierarchy defines the logical tree structure of groups and arrays within that store. GeoZarr specifies how to organize and structure hierarchies to support geospatial semantics, without modifying the underlying store interface.
21
+
22
+
=== Use Cases and Applications
23
+
24
+
This Standard addresses the needs of Earth observation, environmental monitoring, and geospatial analysis applications that require efficient, scalable access to multidimensional datasets. It enables the harmonisation of existing data models with operational encoding formats suitable for cloud-native storage and analysis.
25
+
26
+
Typical use cases include:
27
+
* Storage and processing of raster and gridded data
28
+
* Management of data cubes with temporal or vertical dimensions
29
+
* Integration with catalogue systems through standardized metadata
30
+
* Multi-resolution tiling for efficient visualization and analysis
31
+
* Cloud-optimized access to large geospatial datasets
Copy file name to clipboardExpand all lines: standard/template/sections/clause_2_conformance.adoc
+2Lines changed: 2 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,5 +1,7 @@
1
1
== Conformance
2
2
3
+
> WARNING: This section should be ignored and requirements classes should be designed and summarized here once the specification is completed.
4
+
3
5
The GeoZarr Unified Data Model is structured around a modular set of requirements classes. These classes define the conformance criteria for datasets and implementations adopting the GeoZarr specification. Each class provides a distinct set of structural or semantic expectations, facilitating interoperability across a broad spectrum of geospatial and scientific use cases.
4
6
5
7
The *Core* requirements class defines the minimal compliance necessary to claim conformance with the GeoZarr Unified Data Model. It is intentionally open and permissive, supporting incremental adoption and broad compatibility with existing Zarr tools and data models based on the Unidata Common Data Model (CDM).
Copy file name to clipboardExpand all lines: standard/template/sections/clause_4_terms_and_definitions.adoc
+28-9Lines changed: 28 additions & 9 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -2,18 +2,34 @@
2
2
3
3
=== Terms and definitions
4
4
5
+
GeoZarr specification inherits the terms from the following sources:
6
+
7
+
* https://docs.unidata.ucar.edu/netcdf-java/5.2/userguide/common_data_model_overview.html#data-access-layer-object-model[Unidata Common Data Model]
8
+
9
+
* https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#concepts-and-terminology[Zarr concepts and terminology].
10
+
11
+
12
+
==== affine transformation
13
+
14
+
An affine transformation is a geometric mapping that preserves points, straight lines, and parallelism. It combines linear transformations (such as rotation, scaling, reflection, or shear) with translation.
15
+
16
+
5
17
==== array
6
18
7
19
A multidimensional, regularly spaced collection of values (e.g., raster data or gridded measurements), typically indexed by dimensions such as time, latitude, longitude, or spectral band.
8
20
9
21
==== chunk
10
22
11
-
A sub-array representing a partition of a larger array, used to optimise data access and storage. In Zarr, data is stored and accessed as a collection of independently compressed chunks.
23
+
A sub-array representing a partition of a larger array, used to optimize data access and storage. In Zarr, data is stored and accessed as a collection of independently compressed chunks.
12
24
13
25
==== coordinate variable
14
26
15
27
A one-dimensional array whose values define the coordinate system for a dimension of one or more data variables. Typical examples include latitude, longitude, time, or vertical levels.
16
28
29
+
==== data model
30
+
31
+
A data model is an *abstract*, conceptual framework that defines how data is structured, organized, and interpreted, independent of any particular storage medium or implementation. In contrast, a file format represents a concrete realization of this model, defining how the data is stored on disk.
32
+
17
33
==== data variable
18
34
19
35
An array containing the primary geospatial or scientific measurements of interest (e.g., temperature, reflectance). Data variables are defined over one or more dimensions and associated with attributes.
@@ -22,29 +38,32 @@ An array containing the primary geospatial or scientific measurements of interes
22
38
23
39
An index axis along which arrays are organised. Dimensions provide a naming and ordering scheme for accessing data in multidimensional arrays (e.g., `time`, `x`, `y`, `band`).
24
40
25
-
==== group
41
+
==== dataset
26
42
27
-
A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections).
43
+
*Avoid using:* this term is overloaded and avoided in this document. A dataset usually represent a self-contained group of variables within a hierarchical data structure. They often share one or more dimensions and represent the unit that can be opened by a data access library (see <<variable-group,variable group>>)
28
44
29
45
==== metadata
30
46
31
47
Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable.
32
48
33
-
==== multiscale dataset
49
+
==== overview
50
+
51
+
A downscaled representation of a variable that facilitates rapid data display and efficient zooming. Overviews provide lower-resolution versions of the original data, enabling quick visualization and access without reading the full-resolution array. Multiple overview levels may be generated to support progressive rendering across different scales.
52
+
53
+
==== store
34
54
35
-
A dataset that includes multiple representations of the same data variable at varying spatial resolutions. Each resolution level is associated with a tile matrix from an OGC Tile Matrix Set.
55
+
A system that provides storage and retrieval operations for Zarr hierarchies, as defined in the https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#stores[Zarr core specification]. A store implements the abstract store interface and can be backed by various storage technologies such as filesystems, cloud object storage, or databases. GeoZarr hierarchies are stored within and accessed through Zarr stores.
36
56
37
57
==== tile matrix set
38
58
39
59
A spatial tiling scheme defined by a hierarchy of zoom levels and consistent grid parameters (e.g., scale, CRS). Tile Matrix Sets enable spatial indexing and tiling of gridded data.
40
60
41
-
==== transform
61
+
[[variable-group]]
62
+
==== variable group
42
63
43
-
An affine transformation used to convert between grid coordinates and geospatial coordinates, typically defined using the GDAL GeoTransform convention.
64
+
A variable group is a container that includes a coherent collection of variables sharing the same dimensional structure and coordinate system ( and may contain additional variables or subgroups). It is conceptually equivalent to an xarray Dataset..
44
65
45
-
==== unified data model (UDM)
46
66
47
-
A conceptual model that defines how to structure geospatial data in Zarr using CDM-based constructs, including support for coordinate referencing, metadata integration, and multiscale representations.
The GeoZarr Unified Data Model and Encoding Standard defines a conceptual and implementation framework for representing multidimensional geospatial data using the Zarr format. Developed under the guidance of the OGC GeoZarr Standards Working Group (SWG), the Standard establishes conventions for encoding scientific and Earth observation datasets in a way that promotes scalability, interoperability, and compatibility with cloud-native infrastructure.
4
+
The **GeoZarr Standard** defines an **abstract data model** and a set of **conventions** for representing and describing geospatial and scientific datasets using the **Zarr** format.
5
5
6
-
GeoZarr is built on widely adopted community standards, including the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions. It introduces additional extensions and structural constructs to support multi-resolution tiling, geospatial referencing, and catalogue-enabled metadata integration (e.g., STAC).
6
+
Zarr provides efficient, chunked storage for n-dimensional arrays but does not include the semantic constructs required for geospatial and scientific data workflows. The **Unidata Common Data Model (CDM)** addresses this gap by introducing essential concepts that structure information through **variables**, **groups**, **coordinates**, and **metadata**. This abstract data model provides the semantic framework that enables structured interpretation of array-based data on top of Zarr’s storage foundation.
7
7
8
-
This Standard provides both:
8
+
The **primary objective** of GeoZarr is to specify how the **CDM** is encoded within Zarr. GeoZarr provides normative rules for encoding these CDM concepts in Zarr and thereby standardises the encoding practices already adopted by CDM-compatible libraries such as **xarray** and **nczarr**, promoting consistent interpretation and interoperability across tools and platforms.
9
9
10
-
* **Core requirements**, which define minimal compliance to represent array-based datasets using CDM constructs in Zarr, supporting open and permissive adoption across use cases.
11
-
* **Modular extension classes**, which define additional capabilities such as time series support, affine geotransform referencing, multi-resolution overviews, and projection coordinates, in line with OGC and community practices.
10
+
By defining an **abstract model** based on the **CDM** and a corresponding **encoding for Zarr**, GeoZarr establishes an explicit relationship between **the conceptual structure of the data** and **its physical storage representation**. Zarr defines how data are stored and accessed as chunked, hierarchical arrays, while GeoZarr specifies how this stored structure represents the scientific and geospatial meaning of the dataset..
12
11
13
-
These modular components enable GeoZarr to serve a wide range of applications—from basic EO data storage to high-performance, cloud-native visualisation and analytics workflows.
14
-
15
-
=== Encodings
16
-
17
-
GeoZarr supports encoding in both Zarr Version 2 and Zarr Version 3. Each version defines how arrays, groups, and metadata are stored within a directory-based structure. All metadata is encoded in JSON-compatible formats, ensuring both human readability and machine interoperability.
18
-
19
-
Encoding guidelines include:
20
-
21
-
* Hierarchical grouping of datasets via Zarr groups.
22
-
* Dimension indexing and binding via dimension metadata.
23
-
* Attribute-based metadata compliant with CF conventions.
24
-
* Multi-resolution overviews aligned with OGC Tile Matrix Sets.
25
-
* Optional integration of STAC metadata for discovery and cataloguing.
26
-
27
-
JSON is the primary format for metadata, attributes, and structural declarations. Implementations are encouraged to support standardised naming conventions, EPSG code references, and structured metadata to facilitate search, validation, and transformation across platforms.
28
-
29
-
GeoZarr does not prescribe a single interface for data access. Instead, it enables **serverless and cloud-native** data access strategies by aligning its model with chunked, parallelisable storage patterns that are optimised for use in object stores and analytical environments.
12
+
As a **secondary objective**, GeoZarr extends the **CDM base layer** with additional capabilities required for geospatial and cloud-native applications. These extensions include **multiscale overviews**, which enable the representation of data at multiple levels of detail, and **affine transformations**, which define the spatial relationship between data coordinates and real-world locations. All extensions remain fully aligned with the CDM framework.
30
13
14
+
The **CDM** base layer also provides a **generic framework** capable of hosting metadata from a wide range of community standards. GeoZarr encourages the use of the **Climate and Forecast (CF) Conventions**, which are themselves defined around the CDM model, without imposing them as mandatory. This flexibility also supports metadata from other domain-specific standards such as **GeoTIFF**, **GDAL**, and similar geospatial conventions.
0 commit comments