Commit 5ab750d

feat: let Dask use shard-aligned writes when sharding enabled (#54)
When --enable-sharding is set, pass align_chunks=False so that Dask rechunks to shard boundaries instead of chunk boundaries.

This aligns with Dask PRs #12104/#12105 for atomic shard writes and prevents partial-shard read-modify-rewrite cycles.

Non-sharded workflows are unchanged (align_chunks=True is preserved).

Co-authored-by: Emmanuel Mathot <emmanuel.mathot@gmail.com>
Parent commit: 47f5282

1 file changed

Lines changed: 3 additions & 1 deletion

src/eopf_geozarr/conversion/geozarr.py

@@ -716,14 +716,16 @@ def create_geozarr_compliant_multiscales(

         # Write the overview dataset
         overview_group = f"{group_name}/{level}"
+        # When sharding enabled, let Dask rechunk to shard boundaries
+        align_chunks_flag = False if enable_sharding else True
         overview_ds.to_zarr(
             output_path,
             group=overview_group,
             mode="w",
             consolidated=True,
             zarr_format=3,
             encoding=encoding,
-            align_chunks=True,
+            align_chunks=align_chunks_flag,
             storage_options=storage_options,
         )
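
To illustrate why the commit message cares about shard boundaries, here is a minimal pure-Python sketch. It is not Dask's or xarray's implementation; the function name `partial_shard_writes` and the sizes used are hypothetical and chosen only for illustration. It reports which shards along one dimension would be shared between two Dask chunks, which is the situation that can force a shard to be read, modified, and rewritten.

```python
def partial_shard_writes(dask_chunks, shard_size):
    """Return indices of shards that are split between neighbouring Dask chunks.

    dask_chunks: sizes of the Dask chunks along one dimension (hypothetical input).
    shard_size:  size of one Zarr shard along the same dimension.
    """
    total = sum(dask_chunks)
    partial = set()
    start = 0
    for size in dask_chunks:
        stop = start + size
        # A chunk that starts inside a shard shares that shard with the previous chunk.
        if start % shard_size != 0:
            partial.add(start // shard_size)
        # A chunk that ends inside a shard (other than at the array end)
        # shares that shard with the next chunk.
        if stop % shard_size != 0 and stop != total:
            partial.add(stop // shard_size)
        start = stop
    return sorted(partial)


# Assumed layout: dimension of 1024, inner chunks of 256, shards of 512.
chunk_aligned = (256, 256, 256, 256)  # Dask chunks follow the inner chunk grid
shard_aligned = (512, 512)            # Dask chunks follow the shard grid

print(partial_shard_writes(chunk_aligned, 512))
print(partial_shard_writes(shard_aligned, 512))
```

Under these assumed sizes, the chunk-aligned layout splits every shard between two writers, while the shard-aligned layout lets each task write whole shards.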
