Commit 0ae8db9

Merge branch 'main' of github.com:zarr-developers/zarr-python into refactor/metadata-package
2 parents b7b055e + 8217046 commit 0ae8db9

26 files changed

Lines changed: 1382 additions & 141 deletions

changes/3679.feature.md

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,3 @@
1+
Adds a new in-memory storage backend called `ManagedMemoryStore`. Instances of `ManagedMemoryStore`
2+
function similarly to `MemoryStore`, but instances of `ManagedMemoryStore` can be constructed from
3+
a URL like `memory://store`.
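
The documentation changes in this commit pass the same `memory://...` URL to several separate calls and expect them to see the same data. The diff does not show how `ManagedMemoryStore` achieves this, so the following is only an illustrative sketch of the general idea, assuming a process-wide registry keyed by the name in the URL. The names `_REGISTRY` and `open_memory_store` are hypothetical and are not part of zarr-python.

```python
from urllib.parse import urlparse

# Hypothetical process-wide registry: store name -> key/value mapping.
# This is NOT zarr-python's implementation, only a sketch of the concept.
_REGISTRY: dict[str, dict[str, bytes]] = {}


def open_memory_store(url: str) -> dict[str, bytes]:
    """Return the mapping registered under `url`, creating it on first use."""
    parsed = urlparse(url)
    if parsed.scheme != "memory":
        raise ValueError(f"expected a memory:// URL, got {url!r}")
    # For "memory://store", urlparse places "store" in `netloc`.
    name = parsed.netloc + parsed.path
    return _REGISTRY.setdefault(name, {})


first = open_memory_store("memory://store")
first["zarr.json"] = b"{}"

# A second lookup by the same URL reaches the data written through the first.
second = open_memory_store("memory://store")
assert second is first
assert "zarr.json" in second
```

With a plain `MemoryStore()` instance, the object itself has to be passed around; a name-based lookup like the one sketched here is what lets a string stand in for the store.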

changes/3781.feature.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Added `Struct` class (subclass of `Structured`) implementing the zarr-extensions `struct` dtype spec. Uses object-style field format and dict fill values. Legacy `Structured` remains available for backward compatibility.

docs/user-guide/arrays.md

Lines changed: 3 additions & 4 deletions
@@ -14,15 +14,14 @@ np.random.seed(0)
1414

1515
```python exec="true" session="arrays" source="above" result="ansi"
1616
import zarr
17-
store = zarr.storage.MemoryStore()
18-
z = zarr.create_array(store=store, shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
17+
z = zarr.create_array(store="memory://arrays-demo", shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
1918
print(z)
2019
```
2120

2221
The code above creates a 2-dimensional array of 32-bit integers with 10000 rows
2322
and 10000 columns, divided into chunks where each chunk has 1000 rows and 1000
24-
columns (and so there will be 100 chunks in total). The data is written to a
25-
[`zarr.storage.MemoryStore`][] (e.g. an in-memory dict). See
23+
columns (and so there will be 100 chunks in total). The data is written to an
24+
in-memory store (see [`zarr.storage.MemoryStore`][] for more details). See
2625
[Persistent arrays](#persistent-arrays) for details on storing arrays in other stores,
2726
and see [Data types](data_types.md) for an in-depth look at the data types supported
2827
by Zarr.

docs/user-guide/attributes.md

Lines changed: 7 additions & 8 deletions
@@ -3,33 +3,32 @@
33
Zarr arrays and groups support custom key/value attributes, which can be useful for
44
storing application-specific metadata. For example:
55

6-
```python exec="true" session="arrays" source="above" result="ansi"
6+
```python exec="true" session="attributes" source="above" result="ansi"
77
import zarr
8-
store = zarr.storage.MemoryStore()
9-
root = zarr.create_group(store=store)
8+
root = zarr.create_group(store="memory://attributes-demo")
109
root.attrs['foo'] = 'bar'
1110
z = root.create_array(name='zzz', shape=(10000, 10000), dtype='int32')
1211
z.attrs['baz'] = 42
1312
z.attrs['qux'] = [1, 4, 7, 12]
1413
print(sorted(root.attrs))
1514
```
1615

17-
```python exec="true" session="arrays" source="above" result="ansi"
16+
```python exec="true" session="attributes" source="above" result="ansi"
1817
print('foo' in root.attrs)
1918
```
2019

21-
```python exec="true" session="arrays" source="above" result="ansi"
20+
```python exec="true" session="attributes" source="above" result="ansi"
2221
print(root.attrs['foo'])
2322
```
24-
```python exec="true" session="arrays" source="above" result="ansi"
23+
```python exec="true" session="attributes" source="above" result="ansi"
2524
print(sorted(z.attrs))
2625
```
2726

28-
```python exec="true" session="arrays" source="above" result="ansi"
27+
```python exec="true" session="attributes" source="above" result="ansi"
2928
print(z.attrs['baz'])
3029
```
3130

32-
```python exec="true" session="arrays" source="above" result="ansi"
31+
```python exec="true" session="attributes" source="above" result="ansi"
3332
print(z.attrs['qux'])
3433
```
3534

docs/user-guide/consolidated_metadata.md

Lines changed: 4 additions & 5 deletions
@@ -27,8 +27,7 @@ import zarr
2727
import warnings
2828

2929
warnings.filterwarnings("ignore", category=UserWarning)
30-
store = zarr.storage.MemoryStore()
31-
group = zarr.create_group(store=store)
30+
group = zarr.create_group(store="memory://consolidated-metadata-demo")
3231
print(group)
3332
array = group.create_array(shape=(1,), name='a', dtype='float64')
3433
print(array)
@@ -45,7 +44,7 @@ print(array)
4544
```
4645

4746
```python exec="true" session="consolidated_metadata" source="above" result="ansi"
48-
result = zarr.consolidate_metadata(store)
47+
result = zarr.consolidate_metadata("memory://consolidated-metadata-demo")
4948
print(result)
5049
```
5150

@@ -56,7 +55,7 @@ that can be used.:
5655
from pprint import pprint
5756
import io
5857

59-
consolidated = zarr.open_group(store=store)
58+
consolidated = zarr.open_group(store="memory://consolidated-metadata-demo")
6059
consolidated_metadata = consolidated.metadata.consolidated_metadata.metadata
6160

6261
# Note: pprint can be used without capturing the output regularly
@@ -76,7 +75,7 @@ With nested groups, the consolidated metadata is available on the children, recu
7675
```python exec="true" session="consolidated_metadata" source="above" result="ansi"
7776
child = group.create_group('child', attributes={'kind': 'child'})
7877
grandchild = child.create_group('child', attributes={'kind': 'grandchild'})
79-
consolidated = zarr.consolidate_metadata(store)
78+
consolidated = zarr.consolidate_metadata("memory://consolidated-metadata-demo")
8079

8180
output = io.StringIO()
8281
pprint(consolidated['child'].metadata.consolidated_metadata, stream=output, width=60)

docs/user-guide/data_types.md

Lines changed: 31 additions & 0 deletions
@@ -229,6 +229,37 @@ here, it's possible to create it yourself: see [Adding New Data Types](#adding-n
229229
#### Struct-like
230230
- [Structured][zarr.dtype.Structured]
231231

232+
!!! note "Zarr V3 Structured Data Types"
233+
234+
In Zarr V3, structured data types are specified using the `struct` extension defined in the
235+
[zarr-extensions repository](https://github.com/zarr-developers/zarr-extensions/tree/main/data-types/struct).
236+
The JSON representation uses an object format for fields:
237+
238+
```json
239+
{
240+
"name": "struct",
241+
"configuration": {
242+
"fields": [
243+
{"name": "x", "data_type": "float32"},
244+
{"name": "y", "data_type": "int64"}
245+
]
246+
}
247+
}
248+
```
249+
250+
For backward compatibility, Zarr Python also accepts the legacy `structured` name with
251+
tuple-format fields when reading existing data.
252+
253+
Fill values for structured types are represented as JSON objects mapping field names to values:
254+
255+
```json
256+
{"x": 1.5, "y": 42}
257+
```
258+
259+
When using structured types with multi-byte fields, the `bytes` codec must specify an
260+
explicit `endian` parameter. If omitted, Zarr Python assumes little-endian for legacy
261+
compatibility but emits a warning.
262+
232263
### Example Usage
233264

234265
This section demonstrates the basic usage of Zarr data types.
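
The note added in this hunk requires an explicit `endian` for structured types with multi-byte fields. Why byte order matters can be seen with Python's standard-library `struct` module, used here purely as a stand-in (it is unrelated to Zarr's `Struct` class). The sketch packs one record with the two fields from the JSON example, `x` as float32 and `y` as int64; the values `1.5` and `42` are taken from the fill-value example.

```python
import struct

# One record with the fields from the JSON example: x is float32, y is int64.
record = (1.5, 42)

# "<" and ">" select little- and big-endian, with no padding between fields.
little = struct.pack("<fq", *record)
big = struct.pack(">fq", *record)

print(len(little))   # 12 (4 bytes for x, 8 bytes for y)
print(little.hex())  # 0000c03f2a00000000000000
print(big.hex())     # 3fc00000000000000000002a

# Decoding with the wrong byte order silently yields different values.
wrong = struct.unpack(">fq", little)
assert wrong != record

# Single-byte fields are identical in either byte order.
assert struct.pack("<b", 7) == struct.pack(">b", 7)
```

The same 12 bytes decode to different values depending on the assumed order, which is why a missing `endian` can only be guessed at for multi-byte fields and is irrelevant for single-byte ones.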

docs/user-guide/gpu.md

Lines changed: 1 addition & 2 deletions
@@ -20,9 +20,8 @@ buffers used internally by Zarr via `enable_gpu()`.
2020
import zarr
2121
import cupy as cp
2222
zarr.config.enable_gpu()
23-
store = zarr.storage.MemoryStore()
2423
z = zarr.create_array(
25-
store=store, shape=(100, 100), chunks=(10, 10), dtype="float32",
24+
store="memory://gpu-demo", shape=(100, 100), chunks=(10, 10), dtype="float32",
2625
)
2726
type(z[:10, :10])
2827
# cupy.ndarray

docs/user-guide/groups.md

Lines changed: 2 additions & 4 deletions
@@ -8,8 +8,7 @@ To create a group, use the [`zarr.group`][] function:
88

99
```python exec="true" session="groups" source="above" result="ansi"
1010
import zarr
11-
store = zarr.storage.MemoryStore()
12-
root = zarr.create_group(store=store)
11+
root = zarr.create_group(store="memory://groups-demo")
1312
print(root)
1413
```
1514

@@ -105,8 +104,7 @@ Diagnostic information about arrays and groups is available via the `info`
105104
property. E.g.:
106105

107106
```python exec="true" session="groups" source="above" result="ansi"
108-
store = zarr.storage.MemoryStore()
109-
root = zarr.group(store=store)
107+
root = zarr.group(store="memory://diagnostics-demo")
110108
foo = root.create_group('foo')
111109
bar = foo.create_array(name='bar', shape=1000000, chunks=100000, dtype='int64')
112110
bar[:] = 42

src/zarr/codecs/bytes.py

Lines changed: 16 additions & 1 deletion
@@ -1,6 +1,7 @@
11
from __future__ import annotations
22

33
import sys
4+
import warnings
45
from dataclasses import dataclass, replace
56
from enum import Enum
67
from typing import TYPE_CHECKING
@@ -9,6 +10,7 @@
910
from zarr.core.buffer import Buffer, NDBuffer
1011
from zarr.core.common import JSON, parse_enum, parse_named_configuration
1112
from zarr.core.dtype.common import HasEndianness
13+
from zarr.core.dtype.npy.structured import Struct
1214

1315
if TYPE_CHECKING:
1416
from typing import Self
@@ -56,7 +58,20 @@ def to_dict(self) -> dict[str, JSON]:
5658
return {"name": "bytes", "configuration": {"endian": self.endian.value}}
5759

5860
def evolve_from_array_spec(self, array_spec: ArraySpec) -> Self:
59-
if not isinstance(array_spec.dtype, HasEndianness):
61+
if isinstance(array_spec.dtype, Struct):
62+
if array_spec.dtype.has_multi_byte_fields():
63+
if self.endian is None:
64+
warnings.warn(
65+
"Missing 'endian' for structured dtype with multi-byte fields. "
66+
"Assuming little-endian for legacy compatibility.",
67+
UserWarning,
68+
stacklevel=2,
69+
)
70+
return replace(self, endian=Endian.little)
71+
else:
72+
if self.endian is not None:
73+
return replace(self, endian=None)
74+
elif not isinstance(array_spec.dtype, HasEndianness):
6075
if self.endian is not None:
6176
return replace(self, endian=None)
6277
elif self.endian is None:
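
The branching added to `evolve_from_array_spec` can be paraphrased as a stand-alone function. This is a sketch under stated assumptions, not the library's code: the rendered diff has lost its indentation, so the nesting of the `Struct` branch (in particular, that the `else` pairs with `has_multi_byte_fields()`) is one plausible reading. The `resolve_endian` function and its boolean parameters are hypothetical stand-ins for the `isinstance` checks, and the final branch is passed through unchanged because the hunk ends before showing it.

```python
import warnings
from enum import Enum


class Endian(Enum):
    big = "big"
    little = "little"


def resolve_endian(endian, *, is_struct, has_multi_byte_fields, has_endianness):
    """Return the endian setting the codec would end up with (hypothetical helper)."""
    if is_struct:
        if has_multi_byte_fields:
            if endian is None:
                warnings.warn(
                    "Missing 'endian' for structured dtype with multi-byte fields. "
                    "Assuming little-endian for legacy compatibility.",
                    UserWarning,
                    stacklevel=2,
                )
                return Endian.little
            return endian
        # Assumed nesting: only single-byte fields, so byte order is dropped.
        return None
    if not has_endianness:
        return None
    # The hunk is cut off before this case is shown; pass the value through.
    return endian


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = resolve_endian(
        None, is_struct=True, has_multi_byte_fields=True, has_endianness=False
    )

assert chosen is Endian.little
assert len(caught) == 1
```

An explicitly configured byte order on a multi-byte struct is left alone; only the missing case triggers the warning and the little-endian fallback.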

src/zarr/core/array.py

Lines changed: 5 additions & 1 deletion
@@ -69,6 +69,7 @@
6969
)
7070
from zarr.core.config import config as zarr_config
7171
from zarr.core.dtype import (
72+
Structured,
7273
VariableLengthBytes,
7374
VariableLengthUTF8,
7475
ZDType,
@@ -4879,10 +4880,13 @@ def default_serializer_v3(dtype: ZDType[Any, Any]) -> ArrayBytesCodec:
48794880
length strings and variable length bytes have hard-coded serializers -- ``VLenUTF8Codec`` and
48804881
``VLenBytesCodec``, respectively.
48814882
4883+
Structured data types with multi-byte fields use ``BytesCodec`` with little-endian encoding.
48824884
"""
48834885
serializer: ArrayBytesCodec = BytesCodec(endian=None)
48844886

4885-
if isinstance(dtype, HasEndianness):
4887+
if isinstance(dtype, HasEndianness) or (
4888+
isinstance(dtype, Structured) and dtype.has_multi_byte_fields()
4889+
):
48864890
serializer = BytesCodec(endian="little")
48874891
elif isinstance(dtype, HasObjectCodec):
48884892
if dtype.object_codec_id == "vlen-bytes":
