
Commit 93b6d1b

Merge branch 'main' into ian/simple-tree
2 parents 40fcc02 + 03355b8 commit 93b6d1b

19 files changed

Lines changed: 568 additions & 17 deletions


docs/overrides/stylesheets/extra.css

Lines changed: 0 additions & 5 deletions
@@ -56,11 +56,6 @@
 .md-header .md-search__input {
   background-color: rgba(255, 255, 255, 0.15);
   border: 1px solid rgba(255, 255, 255, 0.2);
-  color: white;
-}
-
-.md-header .md-search__input::placeholder {
-  color: rgba(255, 255, 255, 0.7);
 }
 
 /* Navigation tabs */

docs/user-guide/glossary.md

Lines changed: 110 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,110 @@
1+
# Glossary
2+
3+
This page defines key terms used throughout the zarr-python documentation and API.
4+
5+
## Array Structure
6+
7+
### Array
8+
9+
An N-dimensional typed array stored in a Zarr [store](#store). An array's
10+
[metadata](#metadata) defines its shape, data type, chunk layout, and codecs.
11+
12+
### Chunk
13+
14+
The fundamental unit of data in a Zarr array. An array is divided into chunks
15+
along each dimension according to the [chunk grid](#chunk-grid), which is currently
16+
part of Zarr's private API. Each chunk is independently compressed and encoded
17+
through the array's [codec](#codec) pipeline.
18+
19+
When [sharding](#shard) is used, "chunk" refers to the inner chunks within each
20+
shard, because those are the compressible units. The chunks are the smallest units
21+
that can be read independently.
22+
23+
!!! warning "Convention specific to zarr-python"
24+
The use of "chunk" to mean the inner sub-chunk within a shard is a convention
25+
adopted by zarr-python's `Array` API. In the Zarr V3 specification and in other
26+
Zarr implementations, "chunk" may refer to the top-level grid cells (which
27+
zarr-python calls "shards" when the sharding codec is used). Be aware of this
28+
distinction when working across libraries.
29+
30+
**API**: [`Array.chunks`][zarr.Array.chunks] returns the chunk shape. When
31+
sharding is used, this is the inner chunk shape.
32+
33+
### Chunk Grid
34+
35+
The partitioning of an array's elements into [chunks](#chunk). In Zarr V3, the
36+
chunk grid is defined in the array [metadata](#metadata) and determines the
37+
boundaries of each storage object.
38+
39+
When sharding is used, the chunk grid defines the [shard](#shard) boundaries,
40+
not the inner chunk boundaries. The inner chunk shape is defined within the
41+
[sharding codec](#shard).
42+
43+
**API**: The `chunk_grid` field in array metadata contains the storage-level
44+
grid.
45+
46+
### Shard
47+
48+
A storage object that contains one or more [chunks](#chunk). Sharding reduces the
49+
number of objects in a [store](#store) by grouping chunks together, which
50+
improves performance on file systems and object storage.
51+
52+
Within each shard, chunks are compressed independently and can be read
53+
individually. However, writing requires updating the full shard for consistency,
54+
making shards the unit of writing and chunks the unit of reading.
55+
56+
Sharding is implemented as a [codec](#codec) (the sharding indexed codec).
57+
When sharding is used:
58+
59+
- The [chunk grid](#chunk-grid) in metadata defines the shard boundaries
60+
- The sharding codec's `chunk_shape` defines the inner chunk size
61+
- Each shard contains `shard_shape / chunk_shape` chunks per dimension
62+
63+
**API**: [`Array.shards`][zarr.Array.shards] returns the shard shape, or `None`
64+
if sharding is not used. [`Array.chunks`][zarr.Array.chunks] returns the inner
65+
chunk shape.
66+
67+
## Storage
68+
69+
### Store
70+
71+
A key-value storage backend that holds Zarr data and metadata. Stores implement
72+
the [`zarr.abc.store.Store`][] interface. Examples include local file systems,
73+
cloud object storage (S3, GCS, Azure), zip files, and in-memory dictionaries.
74+
75+
Each [chunk](#chunk) or [shard](#shard) is stored as a single value (object or
76+
file) in the store, addressed by a key derived from its grid coordinates.
77+
78+
### Metadata
79+
80+
The JSON document (`zarr.json`) that describes an [array](#array) or group. For
81+
arrays, metadata includes the shape, data type, [chunk grid](#chunk-grid), fill
82+
value, and [codec](#codec) pipeline. Metadata is stored alongside the data in
83+
the [store](#store). Zarr-Python does not yet expose its internal metadata
84+
representation as part of its public API.
85+
86+
## Codecs
87+
88+
### Codec
89+
90+
A transformation applied to array data during reading and writing. Codecs are
91+
chained into a pipeline and come in three types:
92+
93+
- **Array-to-array**: Transforms like transpose that rearrange array elements
94+
- **Array-to-bytes**: Serialization that converts an array to a byte sequence
95+
(exactly one required)
96+
- **Bytes-to-bytes**: Compression or checksums applied to the serialized bytes
97+
98+
The [sharding indexed codec](#shard) is a special array-to-bytes codec that
99+
groups multiple [chunks](#chunk) into a single storage object.
100+
101+
## API Properties
102+
103+
The following properties are available on [`zarr.Array`][]:
104+
105+
| Property | Description |
106+
|----------|-------------|
107+
| `.chunks` | Chunk shape — the inner chunk shape when sharding is used |
108+
| `.shards` | Shard shape, or `None` if no sharding |
109+
| `.nchunks` | Total number of independently compressible units across the array |
110+
| `.cdata_shape` | Number of independently compressible units per dimension |
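Note on the glossary's shard arithmetic: the statement that each shard contains `shard_shape / chunk_shape` chunks per dimension can be checked by hand. The sketch below uses plain Python and does not call zarr-python. The array, shard, and chunk shapes are illustrative values, and the helper names (`ceil_div`, `chunks_per_shard`, `cells_per_dim`) are hypothetical. It also assumes each shard holds a whole number of inner chunks.

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, so a partial final cell still counts."""
    return -(-a // b)


def chunks_per_shard(shard_shape, chunk_shape):
    """Inner chunks along each dimension of a single shard."""
    for s, c in zip(shard_shape, chunk_shape):
        if s % c != 0:
            # Assumption of this sketch: shards hold a whole number of chunks.
            raise ValueError(f"shard size {s} is not a multiple of chunk size {c}")
    return tuple(s // c for s, c in zip(shard_shape, chunk_shape))


def cells_per_dim(array_shape, cell_shape):
    """Grid cells along each dimension of the array."""
    return tuple(ceil_div(a, c) for a, c in zip(array_shape, cell_shape))


array_shape = (1000, 1000)  # illustrative values, not taken from the commit
shard_shape = (200, 200)
chunk_shape = (50, 50)

per_shard = chunks_per_shard(shard_shape, chunk_shape)        # (4, 4)
shards_per_dim = cells_per_dim(array_shape, shard_shape)      # (5, 5)
inner_chunks_per_dim = cells_per_dim(array_shape, chunk_shape)  # (20, 20)
total_inner_chunks = math.prod(inner_chunks_per_dim)          # 400

print(per_shard, shards_per_dim, inner_chunks_per_dim, total_inner_chunks)
```

Under the glossary's definitions, `inner_chunks_per_dim` and `total_inner_chunks` are the quantities that the `.cdata_shape` and `.nchunks` rows of the table describe, while `shards_per_dim` counts the storage objects per dimension.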

docs/user-guide/index.md

Lines changed: 4 additions & 0 deletions
@@ -35,6 +35,10 @@ Take your skills to the next level:
 - **[Extending](extending.md)** - Extend functionality with custom code
 - **[Consolidated Metadata](consolidated_metadata.md)** - Advanced metadata management
 
+## Reference
+
+- **[Glossary](glossary.md)** - Definitions of key terms (chunks, shards, codecs, etc.)
+
 ## Need Help?
 
 - Browse the [API Reference](../api/zarr/index.md) for detailed function documentation

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ nav:
     - user-guide/gpu.md
     - user-guide/consolidated_metadata.md
     - user-guide/experimental.md
+    - user-guide/glossary.md
   - Examples:
     - user-guide/examples/custom_dtype.md
   - API Reference:

src/zarr/abc/store.py

Lines changed: 35 additions & 1 deletion
@@ -16,7 +16,16 @@
 
     from zarr.core.buffer import Buffer, BufferPrototype
 
-__all__ = ["ByteGetter", "ByteSetter", "Store", "set_or_delete"]
+__all__ = [
+    "ByteGetter",
+    "ByteSetter",
+    "Store",
+    "SupportsDeleteSync",
+    "SupportsGetSync",
+    "SupportsSetSync",
+    "SupportsSyncStore",
+    "set_or_delete",
+]
 
 
 @dataclass(frozen=True, slots=True)
@@ -700,6 +709,31 @@ async def delete(self) -> None: ...
     async def set_if_not_exists(self, default: Buffer) -> None: ...
 
 
+@runtime_checkable
+class SupportsGetSync(Protocol):
+    def get_sync(
+        self,
+        key: str,
+        *,
+        prototype: BufferPrototype | None = None,
+        byte_range: ByteRequest | None = None,
+    ) -> Buffer | None: ...
+
+
+@runtime_checkable
+class SupportsSetSync(Protocol):
+    def set_sync(self, key: str, value: Buffer) -> None: ...
+
+
+@runtime_checkable
+class SupportsDeleteSync(Protocol):
+    def delete_sync(self, key: str) -> None: ...
+
+
+@runtime_checkable
+class SupportsSyncStore(SupportsGetSync, SupportsSetSync, SupportsDeleteSync, Protocol): ...
+
+
 async def set_or_delete(byte_setter: ByteSetter, value: Buffer | None) -> None:
     """Set or delete a value in a byte setter
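Note on the new protocols: because they are decorated with `@runtime_checkable`, callers can test for sync support with `isinstance` without the store inheriting from anything. The sketch below shows that mechanism in isolation. It redefines a simplified `SupportsGetSync` (the `bytes` return type and single-argument signature are simplifications, not the signature in the diff), and `DictStore` and `AsyncOnlyStore` are hypothetical classes.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsGetSync(Protocol):
    # Simplified stand-in for the protocol in the diff.
    def get_sync(self, key: str) -> bytes | None: ...


class DictStore:
    """Hypothetical store that offers a synchronous getter."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {"a": b"1"}

    def get_sync(self, key: str) -> bytes | None:
        return self._data.get(key)


class AsyncOnlyStore:
    """Hypothetical store with no synchronous methods."""

    async def get(self, key: str) -> bytes | None:
        return None


# Structural check: only the presence of the method matters.
has_sync = isinstance(DictStore(), SupportsGetSync)
lacks_sync = isinstance(AsyncOnlyStore(), SupportsGetSync)
print(has_sync, lacks_sync)
```

A runtime protocol check only verifies that the named methods exist. It does not compare signatures or return types, so a store with a differently shaped `get_sync` would still pass.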

src/zarr/core/chunk_grids.py

Lines changed: 8 additions & 5 deletions
@@ -126,11 +126,14 @@ def normalize_chunks(chunks: Any, shape: tuple[int, ...], typesize: int) -> tupl
         chunks = tuple(int(chunks) for _ in shape)
 
     # handle dask-style chunks (iterable of iterables)
-    if all(isinstance(c, (tuple | list)) for c in chunks):
-        # take first chunk size for each dimension
-        chunks = tuple(
-            c[0] for c in chunks
-        )  # TODO: check/error/warn for irregular chunks (e.g. if c[0] != c[1:-1])
+    if all(isinstance(c, (tuple, list)) for c in chunks):
+        for i, c in enumerate(chunks):
+            if any(x != y for x, y in itertools.pairwise(c[:-1])) or (len(c) > 1 and c[-1] > c[0]):
+                raise ValueError(
+                    f"Irregular chunk sizes in dimension {i}: {tuple(c)}. "
+                    "Only uniform chunks (with an optional smaller final chunk) are supported."
+                )
+        chunks = tuple(c[0] for c in chunks)
 
     # handle bad dimensionality
     if len(chunks) > len(shape):

src/zarr/storage/_common.py

Lines changed: 38 additions & 1 deletion
@@ -5,7 +5,13 @@
 from pathlib import Path
 from typing import TYPE_CHECKING, Any, Literal, Self, TypeAlias
 
-from zarr.abc.store import ByteRequest, Store
+from zarr.abc.store import (
+    ByteRequest,
+    Store,
+    SupportsDeleteSync,
+    SupportsGetSync,
+    SupportsSetSync,
+)
 from zarr.core.buffer import Buffer, default_buffer_prototype
 from zarr.core.common import (
     ANY_ACCESS_MODE,
@@ -228,6 +234,37 @@ async def is_empty(self) -> bool:
         """
         return await self.store.is_empty(self.path)
 
+    # -------------------------------------------------------------------
+    # Synchronous IO delegation
+    # -------------------------------------------------------------------
+
+    def get_sync(
+        self,
+        *,
+        prototype: BufferPrototype | None = None,
+        byte_range: ByteRequest | None = None,
+    ) -> Buffer | None:
+        """Synchronous read — delegates to ``self.store.get_sync(self.path, ...)``."""
+        if not isinstance(self.store, SupportsGetSync):
+            raise TypeError(f"Store {type(self.store).__name__} does not support synchronous get.")
+        if prototype is None:
+            prototype = default_buffer_prototype()
+        return self.store.get_sync(self.path, prototype=prototype, byte_range=byte_range)
+
+    def set_sync(self, value: Buffer) -> None:
+        """Synchronous write — delegates to ``self.store.set_sync(self.path, value)``."""
+        if not isinstance(self.store, SupportsSetSync):
+            raise TypeError(f"Store {type(self.store).__name__} does not support synchronous set.")
+        self.store.set_sync(self.path, value)
+
+    def delete_sync(self) -> None:
+        """Synchronous delete — delegates to ``self.store.delete_sync(self.path)``."""
+        if not isinstance(self.store, SupportsDeleteSync):
+            raise TypeError(
+                f"Store {type(self.store).__name__} does not support synchronous delete."
+            )
+        self.store.delete_sync(self.path)
+
     def __truediv__(self, other: str) -> StorePath:
         """Combine this store path with another path"""
         return self.__class__(self.store, _dereference_path(self.path, other))
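Note on the delegation pattern: each new `StorePath` method first checks that the wrapped store satisfies the matching protocol, then forwards the call with `self.path` as the key, and raises `TypeError` otherwise. The sketch below reproduces that shape with hypothetical stand-ins (`PathHandle`, `DictStore`, `AsyncOnlyStore`) and `bytes` in place of `Buffer`. It is not the zarr-python implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSetSync(Protocol):
    # Simplified stand-in: the diff uses Buffer rather than bytes.
    def set_sync(self, key: str, value: bytes) -> None: ...


class DictStore:
    """Hypothetical store with a synchronous setter."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def set_sync(self, key: str, value: bytes) -> None:
        self.data[key] = value


class AsyncOnlyStore:
    """Hypothetical store without synchronous methods."""

    async def set(self, key: str, value: bytes) -> None:
        return None


@dataclass
class PathHandle:
    """Hypothetical analogue of StorePath: a store plus a fixed key."""

    store: object
    path: str

    def set_sync(self, value: bytes) -> None:
        # Fail early with a clear message if the store cannot do sync IO.
        if not isinstance(self.store, SupportsSetSync):
            raise TypeError(
                f"Store {type(self.store).__name__} does not support synchronous set."
            )
        self.store.set_sync(self.path, value)


store = DictStore()
PathHandle(store, "group/c0").set_sync(b"abc")

try:
    PathHandle(AsyncOnlyStore(), "group/c0").set_sync(b"abc")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(store.data, error_message)
```

Checking the protocol inside each method, rather than once at construction, lets a single wrapper type cover stores with and without sync support.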

src/zarr/storage/_local.py

Lines changed: 50 additions & 0 deletions
@@ -187,6 +187,56 @@ def __repr__(self) -> str:
     def __eq__(self, other: object) -> bool:
         return isinstance(other, type(self)) and self.root == other.root
 
+    # -------------------------------------------------------------------
+    # Synchronous store methods
+    # -------------------------------------------------------------------
+
+    def _ensure_open_sync(self) -> None:
+        if not self._is_open:
+            if not self.read_only:
+                self.root.mkdir(parents=True, exist_ok=True)
+            if not self.root.exists():
+                raise FileNotFoundError(f"{self.root} does not exist")
+            self._is_open = True
+
+    def get_sync(
+        self,
+        key: str,
+        *,
+        prototype: BufferPrototype | None = None,
+        byte_range: ByteRequest | None = None,
+    ) -> Buffer | None:
+        if prototype is None:
+            prototype = default_buffer_prototype()
+        self._ensure_open_sync()
+        assert isinstance(key, str)
+        path = self.root / key
+        try:
+            return _get(path, prototype, byte_range)
+        except (FileNotFoundError, IsADirectoryError, NotADirectoryError):
+            return None
+
+    def set_sync(self, key: str, value: Buffer) -> None:
+        self._ensure_open_sync()
+        self._check_writable()
+        assert isinstance(key, str)
+        if not isinstance(value, Buffer):
+            raise TypeError(
+                f"LocalStore.set(): `value` must be a Buffer instance. "
+                f"Got an instance of {type(value)} instead."
+            )
+        path = self.root / key
+        _put(path, value)
+
+    def delete_sync(self, key: str) -> None:
+        self._ensure_open_sync()
+        self._check_writable()
+        path = self.root / key
+        if path.is_dir():
+            shutil.rmtree(path)
+        else:
+            path.unlink(missing_ok=True)
+
     async def get(
         self,
         key: str,
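Note on the file-system semantics: in the `LocalStore` sync methods, a read of a missing key returns `None` instead of raising, and a delete removes a directory recursively or unlinks a file, ignoring a missing path. The sketch below exercises the same standard-library calls against a temporary directory. The helpers `read_or_none` and `delete_key` are hypothetical and stand in for the store methods and the private `_get` helper, which is not shown in the diff.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def read_or_none(root: Path, key: str) -> bytes | None:
    """Return the bytes stored under key, or None if no such file exists."""
    try:
        return (root / key).read_bytes()
    except (FileNotFoundError, IsADirectoryError, NotADirectoryError):
        return None


def delete_key(root: Path, key: str) -> None:
    """Remove a file or a whole directory; a missing path is not an error."""
    path = root / key
    if path.is_dir():
        shutil.rmtree(path)
    else:
        path.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "group").mkdir()
    (root / "group" / "c0").write_bytes(b"abc")

    found = read_or_none(root, "group/c0")
    missing = read_or_none(root, "group/c1")

    delete_key(root, "group/c1")  # missing key: silently ignored
    delete_key(root, "group")     # directory: removed with its contents
    group_exists = (root / "group").exists()

print(found, missing, group_exists)
```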

src/zarr/storage/_memory.py

Lines changed: 43 additions & 1 deletion
@@ -77,6 +77,49 @@ def __eq__(self, other: object) -> bool:
             and self.read_only == other.read_only
         )
 
+    # -------------------------------------------------------------------
+    # Synchronous store methods
+    # -------------------------------------------------------------------
+
+    def get_sync(
+        self,
+        key: str,
+        *,
+        prototype: BufferPrototype | None = None,
+        byte_range: ByteRequest | None = None,
+    ) -> Buffer | None:
+        if prototype is None:
+            prototype = default_buffer_prototype()
+        if not self._is_open:
+            self._is_open = True
+        assert isinstance(key, str)
+        try:
+            value = self._store_dict[key]
+            start, stop = _normalize_byte_range_index(value, byte_range)
+            return prototype.buffer.from_buffer(value[start:stop])
+        except KeyError:
+            return None
+
+    def set_sync(self, key: str, value: Buffer) -> None:
+        self._check_writable()
+        if not self._is_open:
+            self._is_open = True
+        assert isinstance(key, str)
+        if not isinstance(value, Buffer):
+            raise TypeError(
+                f"MemoryStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
+            )
+        self._store_dict[key] = value
+
+    def delete_sync(self, key: str) -> None:
+        self._check_writable()
+        if not self._is_open:
+            self._is_open = True
+        try:
+            del self._store_dict[key]
+        except KeyError:
+            logger.debug("Key %s does not exist.", key)
+
     async def get(
         self,
         key: str,
@@ -122,7 +165,6 @@ async def set(self, key: str, value: Buffer, byte_range: tuple[int, int] | None
             raise TypeError(
                 f"MemoryStore.set(): `value` must be a Buffer instance. Got an instance of {type(value)} instead."
             )
-
         if byte_range is not None:
             buf = self._store_dict[key]
             buf[byte_range[0] : byte_range[1]] = value
