Zarr version
v3.1.2
Numcodecs version
v0.16.2
Python Version
3.13.3
Operating System
Windows
Installation
pip install in venv
Description
When data is passed directly to a create_array call together with the shards parameter, only part of the data is written to the store. The problem disappears when the shards parameter is omitted. It also disappears when the data is assigned only after the array has been created.
Steps to reproduce
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
# ]
# ///
#
# This script automatically imports the development branch of zarr to check for issues
import numpy as np
import zarr
na = np.random.random((2000, 2000))
store = zarr.storage.MemoryStore()  # Can also be LocalStore; the result is the same
root = zarr.group(store)
za_no_shard = root.create_array("noshard", data=na, chunks=(1000, 1000), fill_value=np.nan, overwrite=True)
za_shard = root.create_array("shard", data=na, chunks=(1000, 1000), shards=(2000, 1000), fill_value=np.nan, overwrite=True)
print(np.isnan(na).sum())  # 0 as expected
print(np.isnan(za_no_shard[:]).sum())  # 0 as expected
print(f"{np.isnan(za_shard[:]).sum()} should be 0!")  # 2,000,000 (half the chunks are missing)
# The problem occurs only when using the "data" parameter of create_array. Direct assignment works:
za_shard[:] = na
print(np.isnan(za_shard[:]).sum())  # 0 as expected
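To narrow down which chunks come back as the fill value, a small NumPy-only check along these lines might help. It is only a sketch: `all_nan_chunks` is a hypothetical helper and not part of zarr, and the `demo` array is a stand-in for a faulty read-back, not real output from the script above.

```python
import numpy as np


def all_nan_chunks(values, chunks):
    """Return the grid indices of chunks in a 2-D array that contain only NaN."""
    values = np.asarray(values)
    missing = []
    for i in range(0, values.shape[0], chunks[0]):
        for j in range(0, values.shape[1], chunks[1]):
            block = values[i:i + chunks[0], j:j + chunks[1]]
            if np.isnan(block).all():
                # Record the chunk's position in the chunk grid, not its offset.
                missing.append((i // chunks[0], j // chunks[1]))
    return missing


# Stand-in data (assumption): the second row of 2x2 chunks is left at the fill value.
demo = np.random.random((4, 4))
demo[2:, :] = np.nan
print(all_nan_chunks(demo, chunks=(2, 2)))
```

Against the reproducer, the equivalent call would be `all_nan_chunks(za_shard[:], (1000, 1000))`, run before the direct assignment.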
Additional output
No response