Commit 8217046

BrianMichell and d-v-b authored
Implement support for structured and struct zarr-extension defined dtypes (#3781)
* Implement official support for `structured` and `struct` dtypes according to new extension.
* Update docs and add changelog
* Implement `struct` subclass instead of modifying `structured`
* Remove Structured from AnyDType to fix test_match_dtype_unique
* Revert removal of linting ignore flag
* Fix dtype support
* Fix import sort order -- Linting
* Resolve #3781 (comment) Only register `Struct` and allow it to appropriately pick up `structured` dtype
* Linting

---------

Co-authored-by: Davis Vann Bennett <davis.v.bennett@gmail.com>
1 parent be0a7b8 commit 8217046

10 files changed

Lines changed: 441 additions & 51 deletions

changes/3781.feature.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Added `Struct` class (subclass of `Structured`) implementing the zarr-extensions `struct` dtype spec. Uses object-style field format and dict fill values. Legacy `Structured` remains available for backward compatibility.

docs/user-guide/data_types.md

Lines changed: 31 additions & 0 deletions
@@ -229,6 +229,37 @@ here, it's possible to create it yourself: see [Adding New Data Types](#adding-n
 #### Struct-like
 - [Structured][zarr.dtype.Structured]
 
+!!! note "Zarr V3 Structured Data Types"
+
+    In Zarr V3, structured data types are specified using the `struct` extension defined in the
+    [zarr-extensions repository](https://github.com/zarr-developers/zarr-extensions/tree/main/data-types/struct).
+    The JSON representation uses an object format for fields:
+
+    ```json
+    {
+      "name": "struct",
+      "configuration": {
+        "fields": [
+          {"name": "x", "data_type": "float32"},
+          {"name": "y", "data_type": "int64"}
+        ]
+      }
+    }
+    ```
+
+    For backward compatibility, Zarr Python also accepts the legacy `structured` name with
+    tuple-format fields when reading existing data.
+
+    Fill values for structured types are represented as JSON objects mapping field names to values:
+
+    ```json
+    {"x": 1.5, "y": 42}
+    ```
+
+    When using structured types with multi-byte fields, the `bytes` codec must specify an
+    explicit `endian` parameter. If omitted, Zarr Python assumes little-endian for legacy
+    compatibility but emits a warning.
+
 ### Example Usage
 
 This section will demonstrates the basic usage of Zarr data types.
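
Reviewer sketch (not part of the commit): the note added above distinguishes the legacy tuple-format `structured` metadata from the object-format `struct` metadata. The snippet below illustrates one way that mapping could look in plain Python. The helper names `upgrade_fields`, `upgrade_metadata`, and `missing_fill_fields` are hypothetical, and the legacy layout of `[name, data_type]` pairs is an assumption based on the phrase "tuple-format fields"; none of this is Zarr Python's actual implementation.

```python
from typing import Any

# Hypothetical helpers for illustration only; NOT part of Zarr Python's API.


def upgrade_fields(legacy_fields: list) -> list[dict[str, Any]]:
    """Turn [[name, data_type], ...] into [{"name": ..., "data_type": ...}, ...]."""
    upgraded = []
    for name, data_type in legacy_fields:
        # Assumption: a nested legacy struct appears as a {"name": "structured", ...} object.
        if isinstance(data_type, dict) and data_type.get("name") == "structured":
            data_type = upgrade_metadata(data_type)
        upgraded.append({"name": name, "data_type": data_type})
    return upgraded


def upgrade_metadata(legacy: dict[str, Any]) -> dict[str, Any]:
    """Rewrite legacy `structured` metadata as object-format `struct` metadata."""
    fields = upgrade_fields(legacy["configuration"]["fields"])
    return {"name": "struct", "configuration": {"fields": fields}}


def missing_fill_fields(metadata: dict[str, Any], fill_value: dict[str, Any]) -> set[str]:
    """Return the declared field names that the dict fill value does not cover."""
    declared = {field["name"] for field in metadata["configuration"]["fields"]}
    return declared - set(fill_value)


# Assumed legacy layout: a list of [name, data_type] pairs.
legacy = {
    "name": "structured",
    "configuration": {"fields": [["x", "float32"], ["y", "int64"]]},
}
upgraded = upgrade_metadata(legacy)
print(upgraded)
print(missing_fill_fields(upgraded, {"x": 1.5, "y": 42}))
```

With the fill value from the docs, `{"x": 1.5, "y": 42}`, every declared field is covered, so the second call returns an empty set.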

src/zarr/codecs/bytes.py

Lines changed: 16 additions & 1 deletion
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import sys
+import warnings
 from dataclasses import dataclass, replace
 from enum import Enum
 from typing import TYPE_CHECKING
@@ -9,6 +10,7 @@
 from zarr.core.buffer import Buffer, NDBuffer
 from zarr.core.common import JSON, parse_enum, parse_named_configuration
 from zarr.core.dtype.common import HasEndianness
+from zarr.core.dtype.npy.structured import Struct
 
 if TYPE_CHECKING:
     from typing import Self
@@ -56,7 +58,20 @@ def to_dict(self) -> dict[str, JSON]:
         return {"name": "bytes", "configuration": {"endian": self.endian.value}}
 
     def evolve_from_array_spec(self, array_spec: ArraySpec) -> Self:
-        if not isinstance(array_spec.dtype, HasEndianness):
+        if isinstance(array_spec.dtype, Struct):
+            if array_spec.dtype.has_multi_byte_fields():
+                if self.endian is None:
+                    warnings.warn(
+                        "Missing 'endian' for structured dtype with multi-byte fields. "
+                        "Assuming little-endian for legacy compatibility.",
+                        UserWarning,
+                        stacklevel=2,
+                    )
+                    return replace(self, endian=Endian.little)
+            else:
+                if self.endian is not None:
+                    return replace(self, endian=None)
+        elif not isinstance(array_spec.dtype, HasEndianness):
             if self.endian is not None:
                 return replace(self, endian=None)
         elif self.endian is None:
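
Reviewer sketch (not part of the commit): the new `Struct` branch above can be read as a small decision table. The standalone function below restates that table without importing zarr. `resolve_struct_endian` and the local `Endian` enum are stand-ins introduced for illustration, and the function models only the `Struct` branch; the final `elif self.endian is None:` branch is cut off in the hunk and is deliberately not modeled.

```python
import warnings
from enum import Enum
from typing import Optional


class Endian(Enum):
    """Local stand-in for the codec's Endian enum (assumption, not imported from zarr)."""

    big = "big"
    little = "little"


def resolve_struct_endian(endian: Optional[Endian], has_multi_byte_fields: bool) -> Optional[Endian]:
    """Hypothetical restatement of the Struct branch of evolve_from_array_spec."""
    if has_multi_byte_fields:
        if endian is None:
            # Missing endian: fall back to little-endian and warn, as in the diff.
            warnings.warn(
                "Missing 'endian' for structured dtype with multi-byte fields. "
                "Assuming little-endian for legacy compatibility.",
                UserWarning,
                stacklevel=2,
            )
            return Endian.little
        # An explicit endian is kept unchanged.
        return endian
    # Only single-byte fields: byte order is meaningless, so it is dropped.
    return None


# Capture the warning so the fallback path can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = resolve_struct_endian(None, has_multi_byte_fields=True)

print(resolved, len(caught))
```

An explicit `endian` on a struct with multi-byte fields passes through untouched, while any `endian` on a struct with only single-byte fields is reset to `None`.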

src/zarr/core/array.py

Lines changed: 5 additions & 1 deletion
@@ -69,6 +69,7 @@
 )
 from zarr.core.config import config as zarr_config
 from zarr.core.dtype import (
+    Structured,
     VariableLengthBytes,
     VariableLengthUTF8,
     ZDType,
@@ -4879,10 +4880,13 @@ def default_serializer_v3(dtype: ZDType[Any, Any]) -> ArrayBytesCodec:
     length strings and variable length bytes have hard-coded serializers -- ``VLenUTF8Codec`` and
     ``VLenBytesCodec``, respectively.
 
+    Structured data types with multi-byte fields use ``BytesCodec`` with little-endian encoding.
     """
     serializer: ArrayBytesCodec = BytesCodec(endian=None)
 
-    if isinstance(dtype, HasEndianness) or (
+    if isinstance(dtype, HasEndianness) or (
+        isinstance(dtype, Structured) and dtype.has_multi_byte_fields()
+    ):
         serializer = BytesCodec(endian="little")
     elif isinstance(dtype, HasObjectCodec):
         if dtype.object_codec_id == "vlen-bytes":
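
Reviewer sketch (not part of the commit): the docstring line added above ties the little-endian default to structs with multi-byte fields. The snippet below uses only Python's standard `struct` module, not zarr, to illustrate why byte order matters for such records and not for single-byte ones. The record layout (a `float32` followed by an `int64`) mirrors the `x`/`y` example from the docs; everything else is illustrative.

```python
import struct

# One record with a float32 field and an int64 field, packed without padding.
little = struct.pack("<fq", 1.5, 42)  # little-endian
big = struct.pack(">fq", 1.5, 42)     # big-endian

# Same values and same size, but different bytes on disk.
print(len(little), little != big)

# Reading little-endian bytes as big-endian silently yields the wrong values.
misread = struct.unpack(">fq", little)
print(misread != (1.5, 42))

# A record made only of single-byte fields is identical in either byte order.
single_little = struct.pack("<Bb", 200, -3)
single_big = struct.pack(">Bb", 200, -3)
print(single_little == single_big)
```

This is why a missing `endian` is harmless for structs made only of single-byte fields but ambiguous as soon as one field spans more than one byte.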

src/zarr/core/dtype/__init__.py

Lines changed: 11 additions & 3 deletions
@@ -21,7 +21,13 @@
 from zarr.core.dtype.npy.complex import Complex64, Complex128
 from zarr.core.dtype.npy.float import Float16, Float32, Float64
 from zarr.core.dtype.npy.int import Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64
-from zarr.core.dtype.npy.structured import Structured, StructuredJSON_V2, StructuredJSON_V3
+from zarr.core.dtype.npy.structured import (
+    Struct,
+    StructJSON_V3,
+    Structured,
+    StructuredJSON_V2,
+    StructuredJSON_V3,
+)
 from zarr.core.dtype.npy.time import (
     DateTime64,
     DateTime64JSON_V2,
@@ -75,6 +81,8 @@
     "RawBytes",
     "RawBytesJSON_V2",
     "RawBytesJSON_V3",
+    "Struct",
+    "StructJSON_V3",
     "Structured",
     "StructuredJSON_V2",
     "StructuredJSON_V3",
@@ -124,7 +132,7 @@
     | ComplexFloatDType
     | StringDType
     | BytesDType
-    | Structured
+    | Struct
     | TimeDType
     | VariableLengthBytes
 )
@@ -137,7 +145,7 @@
     *COMPLEX_FLOAT_DTYPE,
     *STRING_DTYPE,
     *BYTES_DTYPE,
-    Structured,
+    Struct,
     *TIME_DTYPE,
     VariableLengthBytes,
 )
