Merged
14 changes: 14 additions & 0 deletions README.md
@@ -290,6 +290,20 @@ This library implements the GeoZarr specification 0.4 with the following key req
4. **Multiscales Structure**: Overview levels are stored as children groups with proper tile matrix metadata
5. **Native CRS**: Coordinate reference systems are preserved without reprojection

## Contributing to GeoZarr Specification

Building this implementation has produced feedback for the GeoZarr specification development process. Based on our experience with real Earth observation data, we have identified and reported several areas for improvement:

### Key Contributions

- **[Arbitrary Coordinate Systems Support](https://github.com/zarr-developers/geozarr-spec/issues/81)**: Advocating for native CRS preservation instead of web mapping bias
- **[Chunking Performance Optimization](https://github.com/zarr-developers/geozarr-spec/issues/82)**: Proposing flexible chunking strategies for optimal performance
- **[Multiscale Hierarchy Clarification](https://github.com/zarr-developers/geozarr-spec/issues/83)**: Providing clear structure definitions for multiscale implementations

Our implementation demonstrates that scientific accuracy and performance can be maintained while working with arbitrary coordinate systems, not just web mapping projections. This is particularly important for Earth observation data that often comes in UTM zones, polar stereographic, or other scientific projections.

For detailed information about our contributions, see our [GeoZarr Specification Contribution documentation](docs/geozarr-specification-contribution.md).

## Development

### Setting up Development Environment
139 changes: 139 additions & 0 deletions docs/geozarr-specification-contribution.md
@@ -0,0 +1,139 @@
# GeoZarr Specification Contribution

This document outlines our contribution to the GeoZarr specification based on our implementation experience with the EOPF GeoZarr data model.

## Overview

Our implementation of GeoZarr-compliant data conversion for Earth Observation data has revealed several areas where the current specification could be improved to better support scientific use cases. We have contributed feedback to the GeoZarr specification development process through detailed GitHub issues.

## Key Issues Identified and Reported

### 1. Arbitrary Coordinate Systems Support

**Issue:** [zarr-developers/geozarr-spec#81](https://github.com/zarr-developers/geozarr-spec/issues/81)

**Problem:** The current specification has an implicit bias toward web mapping tile schemes (WebMercatorQuad), which may discourage scientific applications that work with native coordinate reference systems.

**Our Solution:** Our implementation successfully demonstrates:

- Creation of "Native CRS Tile Matrix Sets" for arbitrary projections
- Multiscale pyramids working with UTM and other scientific projections
- Proper scale denominator calculations for non-web CRS
- Chunking strategies optimized for native coordinate systems

**Impact:** This is critical for Earth observation data that often comes in UTM zones, polar stereographic, or other scientific projections where preserving the native CRS maintains scientific accuracy.
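
The scale denominator step for a non-web CRS can be sketched as follows. This is a minimal illustration, not the body of `create_native_crs_tile_matrix_set()`: the names `scale_denominator` and `sketch_tile_matrices`, the dictionary layout, the 256-pixel tile size, the factor-of-two step between levels, and the example UTM bounds are all assumptions. It relies on the OGC Tile Matrix Set convention of a 0.28 mm standardized rendering pixel and treats level 0 as native resolution.

```python
import math

# OGC Tile Matrix Set "standardized rendering pixel size" (0.28 mm), in meters.
STANDARD_PIXEL_SIZE_M = 0.00028


def scale_denominator(cell_size: float, meters_per_unit: float = 1.0) -> float:
    """Scale denominator for a cell size given in CRS units (meters for UTM)."""
    return cell_size * meters_per_unit / STANDARD_PIXEL_SIZE_M


def sketch_tile_matrices(bounds, native_resolution, tile_size, levels):
    """Hypothetical helper: one tile matrix per level, level 0 = native resolution."""
    left, bottom, right, top = bounds
    matrices = []
    for level in range(levels):
        # Assumption: each overview level halves the resolution.
        cell_size = native_resolution * 2 ** level
        width_px = math.ceil((right - left) / cell_size)
        height_px = math.ceil((top - bottom) / cell_size)
        matrices.append({
            "id": str(level),
            "cellSize": cell_size,
            "scaleDenominator": scale_denominator(cell_size),
            "pointOfOrigin": [left, top],  # top-left corner in native CRS units
            "tileWidth": tile_size,
            "tileHeight": tile_size,
            "matrixWidth": math.ceil(width_px / tile_size),
            "matrixHeight": math.ceil(height_px / tile_size),
        })
    return matrices


# Illustrative UTM extent of 109.8 km x 109.8 km at 10 m resolution.
example = sketch_tile_matrices((300000, 4890240, 409800, 5000040), 10.0, 256, 3)
for matrix in example:
    print(matrix["id"], matrix["cellSize"], matrix["matrixWidth"], matrix["matrixHeight"])
```

Because the cell size stays in the units of the native CRS, no reprojection is involved at any level; only the grid spacing changes.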

### 2. Chunking Performance Optimization

**Issue:** [zarr-developers/geozarr-spec#82](https://github.com/zarr-developers/geozarr-spec/issues/82)

**Problem:** The specification requires strict 1:1 mapping between Zarr chunks and tile matrix tiles, which prevents optimal chunking strategies for different data types and storage backends.

**Our Solution:** We implemented chunk alignment logic that picks the largest chunk size, no greater than the target, that divides the dimension evenly:

```python
def calculate_aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
    """Calculate a chunk size that divides evenly into the dimension size."""
    if target_chunk_size >= dimension_size:
        return dimension_size

    # Find the largest divisor that is <= target_chunk_size
    for chunk_size in range(target_chunk_size, 0, -1):
        if dimension_size % chunk_size == 0:
            return chunk_size
    return 1
```

**Impact:** Because every chunk divides the dimension evenly, this approach avoids the chunk overlap issues that arise with Dask, and it sizes chunks to the actual data dimensions rather than to arbitrary tile sizes, which improves performance.
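
As a worked example, the alignment function can be exercised on a few dimension sizes. The definition is repeated so the snippet runs on its own. The inputs are illustrative assumptions: 10980 is the pixel width of a Sentinel-2 10 m band, and the target sizes are arbitrary.

```python
def calculate_aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
    """Largest divisor of dimension_size that is <= target_chunk_size."""
    if target_chunk_size >= dimension_size:
        return dimension_size
    for chunk_size in range(target_chunk_size, 0, -1):
        if dimension_size % chunk_size == 0:
            return chunk_size
    return 1


# 10980 = 3 * 3660, so a 4096 target is rounded down to 3660 (3 chunks per axis).
print(calculate_aligned_chunk_size(10980, 4096))  # 3660
# 10980 = 12 * 915, so a 1024 target is rounded down to 915 (12 chunks per axis).
print(calculate_aligned_chunk_size(10980, 1024))  # 915
# A target larger than the dimension yields a single chunk.
print(calculate_aligned_chunk_size(512, 1024))    # 512
# Edge case: a prime dimension has no divisor other than 1 below the target.
print(calculate_aligned_chunk_size(101, 16))      # 1
```

The last case shows a limitation of a pure divisor search: for dimensions with few small factors the result can collapse to very small chunks, so callers may want to enforce a lower bound or fall back to uneven edge chunks.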

### 3. Multiscale Hierarchy Structure Clarification

**Issue:** [zarr-developers/geozarr-spec#83](https://github.com/zarr-developers/geozarr-spec/issues/83)

**Problem:** The specification describes multiscale encoding but doesn't clearly define the exact hierarchical structure and relationship between parent groups and zoom level children.

**Our Solution:** We implemented a clear hierarchy structure:

```
/measurements/r10m/          # Parent group with multiscales metadata
├── 0/                       # Native resolution (zoom level 0)
│   ├── band1
│   ├── band2
│   └── spatial_ref
├── 1/                       # First overview level
│   ├── band1
│   ├── band2
│   └── spatial_ref
└── 2/                       # Second overview level
    ├── band1
    ├── band2
    └── spatial_ref
```

**Impact:** This provides a concrete, tested pattern for implementing multiscale hierarchies that other implementations can follow.
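
The layout above can be expressed programmatically. The following sketch only enumerates the expected group paths and array shapes; it does not write any Zarr data. The helper name `sketch_multiscale_layout`, the factor-of-two downsampling between levels, and the 10980 x 10980 native shape are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def sketch_multiscale_layout(
    parent: str,
    band_names: Sequence[str],
    native_shape: Tuple[int, int],
    levels: int,
) -> Dict[str, Dict[str, tuple]]:
    """Hypothetical helper: map each level group path to its arrays and shapes."""
    height, width = native_shape
    layout = {}
    for level in range(levels):
        # Assumption: each overview level halves both spatial dimensions.
        factor = 2 ** level
        shape = (math.ceil(height / factor), math.ceil(width / factor))
        group_path = f"{parent.rstrip('/')}/{level}"
        arrays = {name: shape for name in band_names}
        arrays["spatial_ref"] = ()  # scalar variable carrying the CRS definition
        layout[group_path] = arrays
    return layout


layout = sketch_multiscale_layout(
    "/measurements/r10m/", ["band1", "band2"], (10980, 10980), levels=3
)
for group_path, arrays in layout.items():
    print(group_path, arrays)
```

Every level group carries its own `spatial_ref`, so each level can be opened as a self-describing dataset without consulting the parent group.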

## Implementation Evidence

Our implementation provides concrete evidence for these improvements:

### Native CRS Preservation

- **Function:** `create_native_crs_tile_matrix_set()`
- **Purpose:** Creates custom tile matrix sets for arbitrary coordinate reference systems
- **Benefit:** Maintains scientific accuracy without unnecessary reprojection

### Robust Processing

- **Function:** `write_dataset_band_by_band_with_validation()`
- **Purpose:** Handles large datasets with retry logic and validation
- **Benefit:** Production-ready robustness for real-world data processing
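
The retry-and-validate pattern can be sketched generically. This is not the signature or body of `write_dataset_band_by_band_with_validation()`: the helper `write_bands_with_retry`, its callback parameters, and the choice to retry only on `OSError` or failed validation are hypothetical and shown only to illustrate the idea.

```python
import time
from typing import Callable, Dict, Iterable


def write_bands_with_retry(
    band_names: Iterable[str],
    write_band: Callable[[str], None],
    validate_band: Callable[[str], bool],
    max_retries: int = 3,
    retry_delay_s: float = 0.0,
) -> Dict[str, int]:
    """Hypothetical helper: write each band, validate it, retry on failure.

    Returns the number of attempts each band needed.
    """
    attempts_used: Dict[str, int] = {}
    for name in band_names:
        for attempt in range(1, max_retries + 1):
            try:
                write_band(name)
                if validate_band(name):
                    attempts_used[name] = attempt
                    break
                error: Exception = ValueError(f"validation failed for band {name!r}")
            except OSError as exc:  # e.g. a transient storage error
                error = exc
            if attempt == max_retries:
                raise RuntimeError(
                    f"giving up on band {name!r} after {max_retries} attempts"
                ) from error
            time.sleep(retry_delay_s)
    return attempts_used
```

Writing one band at a time keeps a failure local: only the affected band is retried, and bands already validated are left untouched.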

### Comprehensive Metadata Handling

- **Function:** `_add_coordinate_metadata()`
- **Purpose:** Handles diverse coordinate types (time, angle, band, detector)
- **Benefit:** Supports the full range of Earth observation data structures

### Cloud Storage Optimization

- **Features:** S3 support with credential validation, storage options handling
- **Benefit:** Enables cloud-native workflows with proper error handling

## Specification Sections Addressed

Our contributions target specific sections of the GeoZarr specification:

- **Section 9.7.3** (Tile Matrix Set Representation) - Native CRS support
- **Section 9.7.4** (Chunk Layout Alignment) - Flexible chunking
- **Section 9.7.1** (Hierarchical Layout) - Clear structure definition
- **Section 9.7.2** (Metadata Encoding) - Metadata placement guidance

## Benefits for the Earth Observation Community

These contributions specifically benefit Earth observation and scientific data applications:

1. **Scientific Accuracy:** Preserving native CRS prevents distortion from unnecessary reprojections
2. **Performance:** Optimized chunking improves processing speed and reduces memory usage
3. **Clarity:** Clear hierarchy definitions enable consistent implementations
4. **Robustness:** Production patterns support real-world deployment scenarios

## Future Work

We continue to monitor the specification development and will contribute additional feedback as our implementation evolves. Areas for potential future contribution include:

- Cloud storage optimization patterns
- Coordinate variable handling for diverse data types
- Integration with STAC metadata standards
- Guidance for time dimension handling

## Related Documentation

- [Converter Documentation](converter.md) - Technical details of our implementation
- [README](../README.md) - Project overview and usage
- [ADR-101](adr/ADR-101-geozarr-specification-implementation-approach.md) - Architecture decisions (when available)

## Links

- [GeoZarr Specification Repository](https://github.com/zarr-developers/geozarr-spec)
- [Our GitHub Issues](https://github.com/zarr-developers/geozarr-spec/issues?q=is%3Aissue+author%3Aemmanuelmathot)
- [Project Issue #74](https://github.com/developmentseed/sentinel-zarr-explorer-coordination/issues/74)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -52,3 +52,4 @@ markdown_extensions:
nav:
  - Home: index.md
  - Using the Converter: converter.md
  - GeoZarr Specification Contribution: geozarr-specification-contribution.md