
Commit 03a727d (1 parent: 9add75b)

**Add comprehensive FAQ documentation and enhance installation guide**

- Created a new FAQ document (faq.md) covering common questions and troubleshooting for the EOPF GeoZarr library.
- Added detailed installation instructions (installation.md) including methods, dependencies, and troubleshooting tips.
- Introduced a Quick Start guide (quickstart.md) to facilitate initial usage of the library.
- Updated index.md to improve navigation and highlight key features of the library.
- Revised geozarr-specification-contribution.md to reflect changes in related documentation links.
- Enhanced mkdocs.yml for better organization and navigation of the documentation site.
- Specified version requirements for pydantic-zarr in uv.lock for consistency.

12 files changed: 2257 additions & 58 deletions

.github/workflows/build_uv_cache.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -8,6 +8,8 @@ on:
       - "uv.lock"
       - "pyproject.toml"
   workflow_dispatch:
+  schedule:
+    - cron: "0 0 */5 * *" # Every 5 days, before cache expiry

 jobs:
   build-cache:
```
docs/api-reference.md

Lines changed: 361 additions & 0 deletions
# API Reference

Complete reference for the EOPF GeoZarr library's Python API.

## Core Functions

### create_geozarr_dataset

The main function for converting EOPF datasets to GeoZarr format.

```python
def create_geozarr_dataset(
    dt_input: xr.DataTree,
    groups: List[str],
    output_path: str,
    spatial_chunk: int = 4096,
    min_dimension: int = 256,
    tile_width: int = 256,
    max_retries: int = 3,
    **storage_kwargs
) -> xr.DataTree
```

**Parameters:**

- `dt_input` (xr.DataTree): Input EOPF DataTree to convert
- `groups` (List[str]): List of group paths to process (e.g., `["/measurements/r10m"]`)
- `output_path` (str): Output path for the GeoZarr dataset (local or S3)
- `spatial_chunk` (int, optional): Target spatial chunk size. Default: 4096
- `min_dimension` (int, optional): Minimum dimension size for processing. Default: 256
- `tile_width` (int, optional): Tile width for multiscale levels. Default: 256
- `max_retries` (int, optional): Maximum retry attempts for operations. Default: 3
- `**storage_kwargs`: Additional storage options (S3 credentials, etc.)

**Returns:**

- `xr.DataTree`: The converted GeoZarr-compliant DataTree

**Example:**

```python
import xarray as xr
from eopf_geozarr import create_geozarr_dataset

dt = xr.open_datatree("input.zarr", engine="zarr")
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m", "/measurements/r20m"],
    output_path="output.zarr",
    spatial_chunk=2048
)
```

## Conversion Functions

### setup_datatree_metadata_geozarr_spec_compliant

Sets up GeoZarr-compliant metadata for a DataTree.

```python
def setup_datatree_metadata_geozarr_spec_compliant(
    dt: xr.DataTree,
    geozarr_groups: Dict[str, xr.Dataset]
) -> None
```

### write_geozarr_group

Writes a single group to GeoZarr format with proper metadata.

```python
def write_geozarr_group(
    group_path: str,
    datasets: Dict[str, xr.Dataset],
    output_path: str,
    spatial_chunk: int = 4096,
    max_retries: int = 3,
    **storage_kwargs
) -> None
```

### create_geozarr_compliant_multiscales

Creates multiscales metadata compliant with the GeoZarr specification.

```python
def create_geozarr_compliant_multiscales(
    datasets: Dict[str, xr.Dataset],
    tile_width: int = 256
) -> List[Dict[str, Any]]
```

## Utility Functions

### calculate_aligned_chunk_size

Calculates an optimal chunk size that aligns with the data dimensions.

```python
def calculate_aligned_chunk_size(
    dimension_size: int,
    target_chunk_size: int
) -> int
```

**Parameters:**

- `dimension_size` (int): Size of the data dimension
- `target_chunk_size` (int): Desired chunk size

**Returns:**

- `int`: Optimal aligned chunk size

**Example:**

```python
from eopf_geozarr.conversion.utils import calculate_aligned_chunk_size

# For a 10980x10980 image with a target chunk size of 4096
chunk_size = calculate_aligned_chunk_size(10980, 4096)
print(chunk_size)  # prints 3660 (10980 / 3 = 3660)
```
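
The alignment shown above can be pictured as choosing the largest divisor of the dimension that does not exceed the target chunk size, so every chunk has exactly the same size. The following is an illustrative sketch of that idea, not the library's actual implementation (the name `aligned_chunk_size` is hypothetical):

```python
def aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
    """Return the largest divisor of dimension_size not exceeding target_chunk_size."""
    for candidate in range(min(target_chunk_size, dimension_size), 0, -1):
        if dimension_size % candidate == 0:
            return candidate
    return dimension_size  # unreachable for positive sizes: 1 always divides

print(aligned_chunk_size(10980, 4096))  # prints 3660, matching the example above
```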

### downsample_2d_array

Downsamples a 2D array by a factor of 2 using mean aggregation.

```python
def downsample_2d_array(
    data: np.ndarray,
    factor: int = 2
) -> np.ndarray
```
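
Mean aggregation averages each `factor x factor` block of the input into one output cell. A minimal pure-Python sketch of the idea (the real function operates on NumPy arrays; `downsample_2d` is an illustrative name):

```python
def downsample_2d(data, factor=2):
    """Average each factor x factor block of a 2-D list of numbers."""
    rows, cols = len(data) // factor, len(data[0]) // factor
    return [
        [
            sum(data[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(cols)
        ]
        for i in range(rows)
    ]

print(downsample_2d([[1, 2], [3, 4]]))  # prints [[2.5]]
```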

### validate_existing_band_data

Validates existing band data against expected specifications.

```python
def validate_existing_band_data(
    dataset: xr.Dataset,
    band_name: str,
    expected_shape: Tuple[int, ...],
    expected_chunks: Tuple[int, ...]
) -> bool
```

## File System Functions

### Storage Path Utilities

```python
# Path normalization and validation
def normalize_path(path: str) -> str
def is_s3_path(path: str) -> bool
def parse_s3_path(s3_path: str) -> tuple[str, str]

# Storage options
def get_storage_options(path: str, **kwargs: Any) -> Optional[Dict[str, Any]]
def get_s3_storage_options(s3_path: str, **s3_kwargs: Any) -> Dict[str, Any]
```
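
For intuition, splitting an S3 URL into bucket and key can be done with the standard library. This sketch (with hypothetical names `is_s3` and `parse_s3`) mirrors what such helpers typically do, but is not the library's exact implementation:

```python
from urllib.parse import urlparse

def is_s3(path: str) -> bool:
    """True when the path uses the s3:// scheme."""
    return path.startswith("s3://")

def parse_s3(path: str) -> tuple:
    """Split 's3://bucket/prefix/key' into (bucket, key)."""
    parsed = urlparse(path)
    return parsed.netloc, parsed.path.lstrip("/")

print(parse_s3("s3://my-bucket/data.zarr"))  # prints ('my-bucket', 'data.zarr')
```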

### S3 Operations

```python
# S3 store creation and validation
def create_s3_store(s3_path: str, **s3_kwargs: Any) -> str
def validate_s3_access(s3_path: str, **s3_kwargs: Any) -> tuple[bool, Optional[str]]
def s3_path_exists(s3_path: str, **s3_kwargs: Any) -> bool

# S3 metadata operations
def write_s3_json_metadata(
    s3_path: str,
    metadata: Dict[str, Any],
    **s3_kwargs: Any
) -> None

def read_s3_json_metadata(s3_path: str, **s3_kwargs: Any) -> Dict[str, Any]
```

### Zarr Operations

```python
# Zarr group operations
def open_zarr_group(path: str, mode: str = "r", **kwargs: Any) -> zarr.Group
def open_s3_zarr_group(s3_path: str, mode: str = "r", **s3_kwargs: Any) -> zarr.Group

# Metadata consolidation
def consolidate_metadata(output_path: str, **storage_kwargs) -> None
async def async_consolidate_metadata(output_path: str, **storage_kwargs) -> None
```

## Metadata Functions

### Coordinate Metadata

```python
def _add_coordinate_metadata(ds: xr.Dataset) -> None
```

Adds proper coordinate metadata, including:

- `_ARRAY_DIMENSIONS` attributes
- CF standard names
- Coordinate variable attributes

### Grid Mapping

```python
def _setup_grid_mapping(ds: xr.Dataset, grid_mapping_var_name: str) -> None
def _add_geotransform(ds: xr.Dataset, grid_mapping_var: str) -> None
```

### CRS and Tile Matrix

```python
def create_native_crs_tile_matrix_set(
    crs: Any,
    transform: Any,
    width: int,
    height: int,
    tile_width: int = 256
) -> Dict[str, Any]
```

Creates a tile matrix set for a native CRS (non-Web Mercator).

## Overview Generation

### calculate_overview_levels

```python
def calculate_overview_levels(
    width: int,
    height: int,
    min_dimension: int = 256
) -> List[int]
```

Calculates appropriate overview levels based on data dimensions.
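
A common policy for multiscale generation is to halve the resolution until either dimension drops below `min_dimension`. The sketch below illustrates that policy; it is an assumption about the algorithm, not the library's exact code:

```python
def overview_levels(width: int, height: int, min_dimension: int = 256) -> list:
    """Collect downsampling factors 1, 2, 4, ... while both dims stay >= min_dimension."""
    levels, factor = [], 1
    while min(width, height) // factor >= min_dimension:
        levels.append(factor)
        factor *= 2
    return levels

print(overview_levels(10980, 10980))  # prints [1, 2, 4, 8, 16, 32]
```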

### create_overview_dataset_all_vars

```python
def create_overview_dataset_all_vars(
    ds: xr.Dataset,
    overview_factor: int
) -> xr.Dataset
```

Creates an overview dataset with all variables downsampled.

## Error Handling

### Retry Logic

```python
def write_dataset_band_by_band_with_validation(
    ds: xr.Dataset,
    output_path: str,
    max_retries: int = 3,
    **storage_kwargs
) -> None
```

Writes the dataset with robust error handling and retry logic.
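
The retry behavior can be sketched as a loop that re-attempts the write and re-raises after the final attempt. This is illustrative only; `write_with_retries` and `flaky_write` are hypothetical helpers, not part of the library's API:

```python
import time

def write_with_retries(write_fn, max_retries: int = 3, base_delay: float = 0.0):
    """Call write_fn, retrying on any exception, up to max_retries attempts."""
    for attempt in range(1, max_retries + 1):
        try:
            return write_fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * attempt)  # linear backoff between attempts

# Example: a writer that fails twice before succeeding
attempts = {"n": 0}
def flaky_write():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("transient failure")
    return "ok"

print(write_with_retries(flaky_write))  # prints ok
```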

## Constants and Enums

### Coordinate Attributes

```python
def _get_x_coord_attrs() -> Dict[str, Any]
def _get_y_coord_attrs() -> Dict[str, Any]
```

Return standard attributes for the X and Y coordinates.

### Grid Mapping Detection

```python
def is_grid_mapping_variable(ds: xr.Dataset, var_name: str) -> bool
```

Determines whether a variable is a grid mapping variable.
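
In CF conventions, a grid mapping variable is one that other variables reference via their `grid_mapping` attribute. A minimal sketch of that check over plain dictionaries of attributes (not the library's xarray-based implementation; all names here are illustrative):

```python
def is_grid_mapping(attrs_by_var: dict, var_name: str) -> bool:
    """True if any other variable's 'grid_mapping' attribute names var_name."""
    return any(
        attrs.get("grid_mapping") == var_name
        for name, attrs in attrs_by_var.items()
        if name != var_name
    )

variables = {
    "spatial_ref": {},  # holds the CRS; carries no grid_mapping itself
    "b04": {"grid_mapping": "spatial_ref"},
}
print(is_grid_mapping(variables, "spatial_ref"))  # prints True
```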

## Usage Examples

### Basic Conversion

```python
import xarray as xr
from eopf_geozarr import create_geozarr_dataset

# Load and convert
dt = xr.open_datatree("input.zarr", engine="zarr")
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m"],
    output_path="output.zarr"
)
```

### Advanced S3 Usage

```python
from eopf_geozarr.conversion.fs_utils import (
    validate_s3_access,
    get_s3_storage_options
)

# Validate S3 access
s3_path = "s3://my-bucket/data.zarr"
is_valid, error = validate_s3_access(s3_path)

if is_valid:
    # Get storage options
    storage_opts = get_s3_storage_options(s3_path)

    # Convert with S3
    dt_geozarr = create_geozarr_dataset(
        dt_input=dt,
        groups=["/measurements/r10m"],
        output_path=s3_path,
        **storage_opts
    )
```

### Custom Chunking

```python
from eopf_geozarr.conversion.utils import calculate_aligned_chunk_size

# Calculate optimal chunks for your data
width, height = 10980, 10980
optimal_chunk = calculate_aligned_chunk_size(width, 4096)

dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m"],
    output_path="output.zarr",
    spatial_chunk=optimal_chunk
)
```

## Type Hints

The library uses comprehensive type hints. Import types as needed:

```python
from typing import Dict, List, Optional, Tuple, Any
import xarray as xr
import numpy as np
```

## Error Types

Common exceptions you may encounter:

- `ValueError`: Invalid parameters or data
- `FileNotFoundError`: Missing input files
- `PermissionError`: Insufficient permissions for S3 or file operations
- `zarr.errors.ArrayNotFoundError`: Missing Zarr arrays

For detailed error handling examples, see the [FAQ](faq.md).
