Commit 9add75b

Add GeoZarr Specification Contribution documentation and update navigation (#21)
1 parent a817ad6 commit 9add75b

3 files changed

Lines changed: 154 additions & 0 deletions

README.md

Lines changed: 14 additions & 0 deletions
@@ -290,6 +290,20 @@ This library implements the GeoZarr specification 0.4 with the following key req
4. **Multiscales Structure**: Overview levels are stored as children groups with proper tile matrix metadata
5. **Native CRS**: Coordinate reference systems are preserved without reprojection

## Contributing to GeoZarr Specification

Building this implementation produced feedback that we contributed to the GeoZarr specification development process. Based on our real-world experience with Earth observation data, we identified and reported three areas for improvement:

### Key Contributions

- **[Arbitrary Coordinate Systems Support](https://github.com/zarr-developers/geozarr-spec/issues/81)**: Advocating for native CRS preservation rather than a bias toward web mapping tile schemes
- **[Chunking Performance Optimization](https://github.com/zarr-developers/geozarr-spec/issues/82)**: Proposing flexible chunking strategies for optimal performance
- **[Multiscale Hierarchy Clarification](https://github.com/zarr-developers/geozarr-spec/issues/83)**: Providing clear structure definitions for multiscale implementations

Our implementation demonstrates that scientific accuracy and performance can be maintained while working with arbitrary coordinate systems, not just web mapping projections. This is particularly important for Earth observation data, which often comes in UTM zones, polar stereographic, or other scientific projections.

For detailed information about our contributions, see our [GeoZarr Specification Contribution documentation](docs/geozarr-specification-contribution.md).

## Development

### Setting up Development Environment

docs/geozarr-specification-contribution.md

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
# GeoZarr Specification Contribution

This document outlines our contribution to the GeoZarr specification based on our implementation experience with the EOPF GeoZarr data model.

## Overview

Our implementation of GeoZarr-compliant data conversion for Earth Observation data has revealed several areas where the current specification could be improved to better support scientific use cases. We have contributed feedback to the GeoZarr specification development process through detailed GitHub issues.

## Key Issues Identified and Reported

### 1. Arbitrary Coordinate Systems Support

**Issue:** [zarr-developers/geozarr-spec#81](https://github.com/zarr-developers/geozarr-spec/issues/81)

**Problem:** The current specification has an implicit bias toward web mapping tile schemes (WebMercatorQuad), which may discourage scientific applications that work with native coordinate reference systems.

**Our Solution:** Our implementation demonstrates:

- Creation of "Native CRS Tile Matrix Sets" for arbitrary projections
- Multiscale pyramids working with UTM and other scientific projections
- Proper scale denominator calculations for non-web CRS
- Chunking strategies optimized for native coordinate systems

**Impact:** This is critical for Earth observation data, which often comes in UTM zones, polar stereographic, or other scientific projections, where preserving the native CRS maintains scientific accuracy.
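
To illustrate the scale denominator point, here is a minimal sketch. It is not the library's `create_native_crs_tile_matrix_set()`; the helper names `scale_denominator` and `native_tile_matrices`, their parameters, and the output keys are hypothetical. It assumes a projected CRS with units in metres (such as UTM) and the OGC Tile Matrix Set convention of a 0.28 mm standardized rendering pixel, so that the scale denominator is the cell size in metres divided by 0.00028. Level 0 is the native resolution, matching the hierarchy used in this document, and the grid size and 10 m cell size are example values.

```python
import math

# OGC Tile Matrix Set convention: a "standardized rendering pixel" is 0.28 mm.
STANDARD_PIXEL_SIZE_M = 0.00028


def scale_denominator(cell_size: float, meters_per_unit: float = 1.0) -> float:
    """Scale denominator for a cell size expressed in CRS units."""
    return cell_size * meters_per_unit / STANDARD_PIXEL_SIZE_M


def native_tile_matrices(width, height, cell_size, tile_size=256, levels=3):
    """Hypothetical helper: describe `levels` tile matrices in the native CRS.

    Level 0 is the native resolution; each further level doubles the cell size.
    """
    matrices = []
    for level in range(levels):
        factor = 2 ** level
        level_width = math.ceil(width / factor)
        level_height = math.ceil(height / factor)
        level_cell = cell_size * factor
        matrices.append({
            "id": str(level),
            "cellSize": level_cell,
            "scaleDenominator": scale_denominator(level_cell),
            "tileWidth": tile_size,
            "tileHeight": tile_size,
            # Number of tiles needed to cover the grid in each direction.
            "matrixWidth": math.ceil(level_width / tile_size),
            "matrixHeight": math.ceil(level_height / tile_size),
        })
    return matrices


# Example values: a 10980 x 10980 grid with 10 m cells in a metre-based CRS.
tms = native_tile_matrices(width=10980, height=10980, cell_size=10.0)
for matrix in tms:
    print(matrix["id"], matrix["cellSize"], matrix["scaleDenominator"], matrix["matrixWidth"])
```

Because the cell size stays in the CRS's own units, no reprojection is involved; a CRS in degrees or feet would need a different `meters_per_unit`.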

### 2. Chunking Performance Optimization

**Issue:** [zarr-developers/geozarr-spec#82](https://github.com/zarr-developers/geozarr-spec/issues/82)

**Problem:** The specification requires strict 1:1 mapping between Zarr chunks and tile matrix tiles, which prevents optimal chunking strategies for different data types and storage backends.

**Our Solution:** We implemented chunk alignment logic that picks the largest divisor of the dimension size that does not exceed the target chunk size:

```python
def calculate_aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
    """Calculate a chunk size that divides evenly into the dimension size."""
    if target_chunk_size >= dimension_size:
        return dimension_size

    # Find the largest divisor that is <= target_chunk_size
    for chunk_size in range(target_chunk_size, 0, -1):
        if dimension_size % chunk_size == 0:
            return chunk_size
    return 1
```

**Impact:** This approach prevents chunk overlap issues with Dask while optimizing for actual data dimensions rather than arbitrary tile sizes, significantly improving performance.
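
To make the behavior concrete, the sketch below repeats the function so that it runs on its own and applies it to a few inputs. The dimension sizes and targets are illustrative assumptions, not values taken from the implementation.

```python
def calculate_aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
    """Calculate a chunk size that divides evenly into the dimension size."""
    if target_chunk_size >= dimension_size:
        return dimension_size

    # Find the largest divisor that is <= target_chunk_size
    for chunk_size in range(target_chunk_size, 0, -1):
        if dimension_size % chunk_size == 0:
            return chunk_size
    return 1


# Illustrative dimension: 10980 = 2^2 * 3^2 * 5 * 61
for target in (4096, 1024, 512):
    print(target, calculate_aligned_chunk_size(10980, target))
# 4096 -> 3660, 1024 -> 915, 512 -> 366

# A target at or above the dimension size yields a single chunk.
print(calculate_aligned_chunk_size(300, 512))  # 300

# A prime dimension has no divisor other than 1 and itself.
print(calculate_aligned_chunk_size(10007, 512))  # 1
```

The last case shows a limitation of a pure divisor search: a dimension with no suitable divisor falls back to very small chunks, so callers may want to enforce a lower bound. That is a design consideration, not something the source specifies.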

### 3. Multiscale Hierarchy Structure Clarification

**Issue:** [zarr-developers/geozarr-spec#83](https://github.com/zarr-developers/geozarr-spec/issues/83)

**Problem:** The specification describes multiscale encoding but doesn't clearly define the exact hierarchical structure and relationship between parent groups and zoom level children.

**Our Solution:** We implemented a clear hierarchy structure:

```
/measurements/r10m/           # Parent group with multiscales metadata
├── 0/                        # Native resolution (zoom level 0)
│   ├── band1
│   ├── band2
│   └── spatial_ref
├── 1/                        # First overview level
│   ├── band1
│   ├── band2
│   └── spatial_ref
└── 2/                        # Second overview level
    ├── band1
    ├── band2
    └── spatial_ref
```

**Impact:** This provides a concrete, tested pattern for implementing multiscale hierarchies that other implementations can follow.
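
The hierarchy above can be enumerated programmatically. The following is a minimal sketch under stated assumptions: `multiscale_layout` is a hypothetical helper (not part of the implementation), each overview level halves the resolution, and `spatial_ref` is treated as a scalar grid-mapping variable. It only computes array paths and shapes and does not write any Zarr data.

```python
import math


def multiscale_layout(parent, shape, bands, num_levels=3, factor=2):
    """Hypothetical helper: map each array path to its expected shape.

    Level 0 holds the native resolution; level n is downsampled by factor**n.
    """
    height, width = shape
    layout = {}
    for level in range(num_levels):
        scale = factor ** level
        level_shape = (math.ceil(height / scale), math.ceil(width / scale))
        group = f"{parent.rstrip('/')}/{level}"
        for name in bands:
            layout[f"{group}/{name}"] = level_shape
        # Assumption: spatial_ref is a scalar variable carrying CRS metadata.
        layout[f"{group}/spatial_ref"] = ()
    return layout


layout = multiscale_layout("/measurements/r10m/", (10980, 10980), ["band1", "band2"])
for path, array_shape in layout.items():
    print(path, array_shape)
```

Storing `spatial_ref` in every level group, as in the tree above, keeps each level self-describing when it is opened on its own.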

## Implementation Evidence

Our implementation provides concrete evidence for these improvements:

### Native CRS Preservation

- **Function:** `create_native_crs_tile_matrix_set()`
- **Purpose:** Creates custom tile matrix sets for arbitrary coordinate reference systems
- **Benefit:** Maintains scientific accuracy without unnecessary reprojection

### Robust Processing

- **Function:** `write_dataset_band_by_band_with_validation()`
- **Purpose:** Handles large datasets with retry logic and validation
- **Benefit:** Production-ready robustness for real-world data processing
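
The general pattern of writing band by band with retries and validation can be sketched as follows. This is not the body of `write_dataset_band_by_band_with_validation()`; the function `write_with_retry`, its parameters, and the callables passed to it are hypothetical stand-ins, and the flaky writer only simulates a transient storage error.

```python
import time


def write_with_retry(write_band, validate_band, band_names, max_retries=3, delay_s=0.0):
    """Hypothetical sketch: write each band, validate it, and retry on failure.

    Returns a dict mapping band name to the attempt number that succeeded.
    """
    attempts_used = {}
    for name in band_names:
        for attempt in range(1, max_retries + 1):
            try:
                write_band(name)
                if not validate_band(name):
                    raise ValueError(f"validation failed for {name}")
                attempts_used[name] = attempt
                break
            except Exception:
                # Give up and re-raise once the retry budget is exhausted.
                if attempt == max_retries:
                    raise
                time.sleep(delay_s)
    return attempts_used


# Simulated writer that fails once for one band, then succeeds.
calls = {}


def flaky_write(name):
    calls[name] = calls.get(name, 0) + 1
    if name == "b03" and calls[name] == 1:
        raise OSError("simulated transient storage error")


result = write_with_retry(flaky_write, lambda name: True, ["b02", "b03"])
print(result)  # {'b02': 1, 'b03': 2}
```

Processing one band at a time means a failure only repeats the work for that band rather than for the whole dataset.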

### Comprehensive Metadata Handling

- **Function:** `_add_coordinate_metadata()`
- **Purpose:** Handles diverse coordinate types (time, angle, band, detector)
- **Benefit:** Supports the full range of Earth observation data structures

### Cloud Storage Optimization

- **Features:** S3 support with credential validation, storage options handling
- **Benefit:** Enables cloud-native workflows with proper error handling

## Specification Sections Addressed

Our contributions target specific sections of the GeoZarr specification:

- **Section 9.7.3** (Tile Matrix Set Representation) - Native CRS support
- **Section 9.7.4** (Chunk Layout Alignment) - Flexible chunking
- **Section 9.7.1** (Hierarchical Layout) - Clear structure definition
- **Section 9.7.2** (Metadata Encoding) - Metadata placement guidance
110+
111+
## Benefits for the Earth Observation Community
112+
113+
These contributions specifically benefit Earth observation and scientific data applications:
114+
115+
1. **Scientific Accuracy:** Preserving native CRS prevents distortion from unnecessary reprojections
116+
2. **Performance:** Optimized chunking improves processing speed and reduces memory usage
117+
3. **Clarity:** Clear hierarchy definitions enable consistent implementations
118+
4. **Robustness:** Production patterns support real-world deployment scenarios
119+
120+
## Future Work
121+
122+
We continue to monitor the specification development and will contribute additional feedback as our implementation evolves. Areas for potential future contribution include:
123+
124+
- Cloud storage optimization patterns
125+
- Coordinate variable handling for diverse data types
126+
- Integration with STAC metadata standards
127+
- Guidance for time dimension handling
128+
129+
## Related Documentation
130+
131+
- [Converter Documentation](converter.md) - Technical details of our implementation
132+
- [README](../README.md) - Project overview and usage
133+
- [ADR-101](adr/ADR-101-geozarr-specification-implementation-approach.md) - Architecture decisions (when available)
134+
135+
## Links
136+
137+
- [GeoZarr Specification Repository](https://github.com/zarr-developers/geozarr-spec)
138+
- [Our GitHub Issues](https://github.com/zarr-developers/geozarr-spec/issues?q=is%3Aissue+author%3Aemmanuelmathot)
139+
- [Project Issue #74](https://github.com/developmentseed/sentinel-zarr-explorer-coordination/issues/74)

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -52,3 +52,4 @@ markdown_extensions:
nav:
- Home: index.md
- Using the Converter: converter.md
- GeoZarr Specification Contribution: geozarr-specification-contribution.md
