# API Reference

Complete reference for the EOPF GeoZarr library's Python API.

## Core Functions

### create_geozarr_dataset

The main function for converting EOPF datasets to GeoZarr format.

```python
def create_geozarr_dataset(
    dt_input: xr.DataTree,
    groups: List[str],
    output_path: str,
    spatial_chunk: int = 4096,
    min_dimension: int = 256,
    tile_width: int = 256,
    max_retries: int = 3,
    **storage_kwargs
) -> xr.DataTree
```

**Parameters:**

- `dt_input` (xr.DataTree): Input EOPF DataTree to convert
- `groups` (List[str]): List of group paths to process (e.g., `["/measurements/r10m"]`)
- `output_path` (str): Output path for the GeoZarr dataset (local or S3)
- `spatial_chunk` (int, optional): Target spatial chunk size. Default: 4096
- `min_dimension` (int, optional): Minimum dimension size for processing. Default: 256
- `tile_width` (int, optional): Tile width for multiscale levels. Default: 256
- `max_retries` (int, optional): Maximum retry attempts for operations. Default: 3
- `**storage_kwargs`: Additional storage options (S3 credentials, etc.)

**Returns:**

- `xr.DataTree`: The converted GeoZarr-compliant DataTree

**Example:**

```python
import xarray as xr
from eopf_geozarr import create_geozarr_dataset

dt = xr.open_datatree("input.zarr", engine="zarr")
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m", "/measurements/r20m"],
    output_path="output.zarr",
    spatial_chunk=2048
)
```

## Conversion Functions

### setup_datatree_metadata_geozarr_spec_compliant

Sets up GeoZarr-compliant metadata for a DataTree.

```python
def setup_datatree_metadata_geozarr_spec_compliant(
    dt: xr.DataTree,
    geozarr_groups: Dict[str, xr.Dataset]
) -> None
```

### write_geozarr_group

Writes a single group to GeoZarr format with proper metadata.

```python
def write_geozarr_group(
    group_path: str,
    datasets: Dict[str, xr.Dataset],
    output_path: str,
    spatial_chunk: int = 4096,
    max_retries: int = 3,
    **storage_kwargs
) -> None
```

### create_geozarr_compliant_multiscales

Creates multiscales metadata compliant with the GeoZarr specification.

```python
def create_geozarr_compliant_multiscales(
    datasets: Dict[str, xr.Dataset],
    tile_width: int = 256
) -> List[Dict[str, Any]]
```

## Utility Functions

### calculate_aligned_chunk_size

Calculates an optimal chunk size that aligns with the data dimensions.

```python
def calculate_aligned_chunk_size(
    dimension_size: int,
    target_chunk_size: int
) -> int
```

**Parameters:**

- `dimension_size` (int): Size of the data dimension
- `target_chunk_size` (int): Desired chunk size

**Returns:**

- `int`: Optimal aligned chunk size

**Example:**

```python
from eopf_geozarr.conversion.utils import calculate_aligned_chunk_size

# For a 10980x10980 image with target 4096 chunks
chunk_size = calculate_aligned_chunk_size(10980, 4096)
print(chunk_size)  # Returns 3660 (10980 / 3 = 3660)
```
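The alignment idea can be sketched as choosing the largest divisor of the dimension that does not exceed the target, so chunks tile the dimension exactly. This is an illustrative stand-in consistent with the documented example above, not the library's actual implementation:

```python
def aligned_chunk_size(dimension_size: int, target_chunk_size: int) -> int:
    """Illustrative sketch: largest divisor of dimension_size <= target."""
    best = 1
    for candidate in range(1, min(dimension_size, target_chunk_size) + 1):
        if dimension_size % candidate == 0:
            best = candidate
    return best

# A 10980-pixel Sentinel-2 dimension with a 4096 target:
# 10980 = 3 * 3660, and 3660 is the largest divisor <= 4096.
print(aligned_chunk_size(10980, 4096))  # 3660
```

Divisor-aligned chunks avoid ragged edge chunks, which keeps every Zarr chunk the same shape.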

### downsample_2d_array

Downsamples a 2D array by a given factor (default 2) using mean aggregation.

```python
def downsample_2d_array(
    data: np.ndarray,
    factor: int = 2
) -> np.ndarray
```
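Mean-aggregation downsampling of this kind is often implemented with a reshape trick. The sketch below is a hypothetical stand-in, not the library's code; it trims edges so the dimensions divide evenly by the factor:

```python
import numpy as np

def downsample_2d(data: np.ndarray, factor: int = 2) -> np.ndarray:
    """Mean-aggregate non-overlapping factor x factor blocks (illustrative)."""
    h, w = data.shape
    # Trim edges so both dimensions divide evenly by the factor
    h, w = h - h % factor, w - w % factor
    blocks = data[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

a = np.arange(16, dtype=float).reshape(4, 4)
print(downsample_2d(a))  # 2x2 array of block means
```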

### validate_existing_band_data

Validates existing band data against expected specifications.

```python
def validate_existing_band_data(
    dataset: xr.Dataset,
    band_name: str,
    expected_shape: Tuple[int, ...],
    expected_chunks: Tuple[int, ...]
) -> bool
```

## File System Functions

### Storage Path Utilities

```python
# Path normalization and validation
def normalize_path(path: str) -> str
def is_s3_path(path: str) -> bool
def parse_s3_path(s3_path: str) -> tuple[str, str]

# Storage options
def get_storage_options(path: str, **kwargs: Any) -> Optional[Dict[str, Any]]
def get_s3_storage_options(s3_path: str, **s3_kwargs: Any) -> Dict[str, Any]
```
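For illustration, minimal versions of the S3 path helpers might look like the following sketch of the expected behavior (the names here are hypothetical stand-ins, not the library's implementation):

```python
from urllib.parse import urlparse

def is_s3_path_sketch(path: str) -> bool:
    """True if the path uses the s3:// scheme."""
    return path.startswith("s3://")

def parse_s3_path_sketch(s3_path: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket, key)."""
    parsed = urlparse(s3_path)
    return parsed.netloc, parsed.path.lstrip("/")

print(parse_s3_path_sketch("s3://my-bucket/data.zarr"))  # ('my-bucket', 'data.zarr')
```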

### S3 Operations

```python
# S3 store creation and validation
def create_s3_store(s3_path: str, **s3_kwargs: Any) -> str
def validate_s3_access(s3_path: str, **s3_kwargs: Any) -> tuple[bool, Optional[str]]
def s3_path_exists(s3_path: str, **s3_kwargs: Any) -> bool

# S3 metadata operations
def write_s3_json_metadata(
    s3_path: str,
    metadata: Dict[str, Any],
    **s3_kwargs: Any
) -> None

def read_s3_json_metadata(s3_path: str, **s3_kwargs: Any) -> Dict[str, Any]
```

### Zarr Operations

```python
# Zarr group operations
def open_zarr_group(path: str, mode: str = "r", **kwargs: Any) -> zarr.Group
def open_s3_zarr_group(s3_path: str, mode: str = "r", **s3_kwargs: Any) -> zarr.Group

# Metadata consolidation
def consolidate_metadata(output_path: str, **storage_kwargs) -> None
async def async_consolidate_metadata(output_path: str, **storage_kwargs) -> None
```

## Metadata Functions

### Coordinate Metadata

```python
def _add_coordinate_metadata(ds: xr.Dataset) -> None
```

Adds proper coordinate metadata including:

- `_ARRAY_DIMENSIONS` attributes
- CF standard names
- Coordinate variable attributes

### Grid Mapping

```python
def _setup_grid_mapping(ds: xr.Dataset, grid_mapping_var_name: str) -> None
def _add_geotransform(ds: xr.Dataset, grid_mapping_var: str) -> None
```

### CRS and Tile Matrix

```python
def create_native_crs_tile_matrix_set(
    crs: Any,
    transform: Any,
    width: int,
    height: int,
    tile_width: int = 256
) -> Dict[str, Any]
```

Creates a tile matrix set for a native CRS (non-Web Mercator).

## Overview Generation

### calculate_overview_levels

```python
def calculate_overview_levels(
    width: int,
    height: int,
    min_dimension: int = 256
) -> List[int]
```

Calculates appropriate overview levels based on data dimensions.
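One plausible reading of this contract is that the returned values are power-of-two downsampling factors, stopping before either dimension falls below `min_dimension`. The sketch below illustrates that reading only; the library's exact rule may differ:

```python
def overview_levels_sketch(width: int, height: int, min_dimension: int = 256) -> list[int]:
    """Power-of-two downsampling factors while both dims stay >= min_dimension."""
    factors = []
    factor = 1
    while min(width, height) // factor >= min_dimension:
        factors.append(factor)
        factor *= 2
    return factors

# For a 10980x10980 Sentinel-2 band with the default min_dimension of 256:
print(overview_levels_sketch(10980, 10980))  # [1, 2, 4, 8, 16, 32]
```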

### create_overview_dataset_all_vars

```python
def create_overview_dataset_all_vars(
    ds: xr.Dataset,
    overview_factor: int
) -> xr.Dataset
```

Creates an overview dataset with all variables downsampled.

## Error Handling

### Retry Logic

```python
def write_dataset_band_by_band_with_validation(
    ds: xr.Dataset,
    output_path: str,
    max_retries: int = 3,
    **storage_kwargs
) -> None
```

Writes a dataset with robust error handling and retry logic.

## Constants and Enums

### Coordinate Attributes

```python
def _get_x_coord_attrs() -> Dict[str, Any]
def _get_y_coord_attrs() -> Dict[str, Any]
```

Returns standard attributes for X and Y coordinates.

### Grid Mapping Detection

```python
def is_grid_mapping_variable(ds: xr.Dataset, var_name: str) -> bool
```

Determines whether a variable is a grid mapping variable.

## Usage Examples

### Basic Conversion

```python
import xarray as xr
from eopf_geozarr import create_geozarr_dataset

# Load and convert
dt = xr.open_datatree("input.zarr", engine="zarr")
dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m"],
    output_path="output.zarr"
)
```

### Advanced S3 Usage

```python
from eopf_geozarr.conversion.fs_utils import (
    validate_s3_access,
    get_s3_storage_options
)

# Validate S3 access
s3_path = "s3://my-bucket/data.zarr"
is_valid, error = validate_s3_access(s3_path)

if is_valid:
    # Get storage options
    storage_opts = get_s3_storage_options(s3_path)

    # Convert with S3
    dt_geozarr = create_geozarr_dataset(
        dt_input=dt,
        groups=["/measurements/r10m"],
        output_path=s3_path,
        **storage_opts
    )
```

### Custom Chunking

```python
from eopf_geozarr.conversion.utils import calculate_aligned_chunk_size

# Calculate optimal chunks for your data
width, height = 10980, 10980
optimal_chunk = calculate_aligned_chunk_size(width, 4096)

dt_geozarr = create_geozarr_dataset(
    dt_input=dt,
    groups=["/measurements/r10m"],
    output_path="output.zarr",
    spatial_chunk=optimal_chunk
)
```

## Type Hints

The library uses comprehensive type hints. Import types as needed:

```python
from typing import Dict, List, Optional, Tuple, Any
import xarray as xr
import numpy as np
```

## Error Types

Common exceptions you may encounter:

- `ValueError`: Invalid parameters or data
- `FileNotFoundError`: Missing input files
- `PermissionError`: Insufficient permissions for S3 or file operations
- `zarr.errors.ArrayNotFoundError`: Missing Zarr arrays

For detailed error handling examples, see the [FAQ](faq.md).