+- **Local Cluster**: Automatically starts a local dask cluster with multiple workers
+- **Dashboard Access**: Provides access to the dask dashboard for monitoring (shown in verbose mode)
+- **Automatic Cleanup**: Properly closes the cluster even if errors occur during processing
+- **Chunk Alignment**: Automatically aligns Zarr chunks with dask chunks to prevent data corruption
+- **Memory Efficiency**: Better memory management through parallel chunk processing
+- **Error Handling**: Graceful handling of dask import errors with helpful installation instructions
+
+#### Chunk Alignment
+
+The library includes advanced chunk alignment logic to prevent the common issue of overlapping chunks when using dask:
+
+- **Smart Detection**: Automatically detects if data is dask-backed and uses existing chunk structure
+- **Aligned Calculation**: Uses `calculate_aligned_chunk_size()` to find optimal chunk sizes that divide evenly into data dimensions
+- **Proper Rechunking**: Ensures datasets are rechunked to match encoding before writing
+- **Fallback Logic**: For non-dask arrays, uses reasonable chunk sizes that don't exceed data dimensions
+
+This prevents errors like:
+```
+❌ Failed to write tci after 2 attempts: Specified Zarr chunks encoding['chunks']=(1, 3660, 3660)
+for variable named 'tci' would overlap multiple Dask chunks
+```
+
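To make the overlap condition in the error above concrete, here is a minimal sketch of the alignment rule along a single dimension. It is a simplified model, not the library's or xarray's actual check: the function name `zarr_chunk_overlaps_dask` and the Dask chunk layouts are hypothetical and chosen only for illustration.

```python
from itertools import accumulate
from typing import Sequence


def zarr_chunk_overlaps_dask(dask_chunks: Sequence[int], zarr_chunk: int) -> bool:
    """Return True if some Zarr chunk would span more than one Dask chunk.

    Hypothetical helper: models one dimension only. A Zarr chunk straddles
    two Dask chunks whenever an interior Dask chunk edge is not a multiple
    of the Zarr chunk size.
    """
    if zarr_chunk <= 0:
        raise ValueError("zarr_chunk must be positive")
    # Cumulative end offsets of each Dask chunk; the last one is the array end.
    interior_edges = list(accumulate(dask_chunks))[:-1]
    return any(edge % zarr_chunk != 0 for edge in interior_edges)


# Assumed layouts for a dimension of size 10980 (illustrative values only).
print(zarr_chunk_overlaps_dask((4096, 4096, 2788), 3660))  # True: edges 4096 and 8192 are misaligned
print(zarr_chunk_overlaps_dask((3660, 3660, 3660), 3660))  # False: every edge is a multiple of 3660
```

Under this model, rechunking the data so that every Dask chunk edge falls on a multiple of the Zarr chunk size removes the overlap.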
 #### S3 Python API
 
 ```python
@@ -202,7 +246,22 @@ Downsample a 2D array using block averaging.
 
 #### `calculate_aligned_chunk_size`
 
-Calculate a chunk size that aligns well with the data dimension.
+Calculate a chunk size that divides evenly into the dimension size. This ensures that Zarr chunks align properly with the data dimensions, preventing chunk overlap issues when writing with Dask.
+
+**Parameters:**
+- `dimension_size` (int): Size of the dimension to chunk
+- `target_chunk_size` (int): Desired chunk size
+
+**Returns:**
+- `int`: Aligned chunk size that divides evenly into dimension_size
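One way such an aligned size could be computed is to search downward from the target for the largest divisor of the dimension. The sketch below assumes that strategy. `largest_divisor_at_most` is a hypothetical helper, not the implementation of `calculate_aligned_chunk_size`, which may choose differently.

```python
def largest_divisor_at_most(dimension_size: int, target_chunk_size: int) -> int:
    """Return the largest divisor of dimension_size that is <= target_chunk_size.

    Hypothetical helper illustrating one possible alignment strategy.
    """
    if dimension_size <= 0 or target_chunk_size <= 0:
        raise ValueError("both arguments must be positive")
    # A target at or above the dimension size means a single chunk suffices.
    if target_chunk_size >= dimension_size:
        return dimension_size
    # Walk down from the target; 1 always divides, so the loop always returns.
    for candidate in range(target_chunk_size, 0, -1):
        if dimension_size % candidate == 0:
            return candidate
    return 1


print(largest_divisor_at_most(5490, 3660))   # 2745, since 5490 = 2 * 2745
print(largest_divisor_at_most(10980, 4096))  # 3660, since 10980 = 3 * 3660
```

Because the result always divides the dimension exactly, every chunk has the same size and no partial chunk is left at the end of the dimension.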
+
+**Example:**
+```python
+from eopf_geozarr.conversion.utils import calculate_aligned_chunk_size
+
+# For a dimension of size 5490 with target chunk size 3660