
Commit f8c0c5d

d-v-b and claude authored
feat:metadata package (#3919)
* feat(metadata): scaffold zarr-metadata package structure

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build(metadata): depend on zarr-metadata via local uv workspace source
* feat(metadata): add JSON, NamedConfig, NamedRequiredConfig primitives
* feat(metadata): add v3 array metadata types
* feat(metadata): add v3 consolidated metadata type
* feat(metadata): add v3 group metadata type
* feat(metadata): wire up zarr_metadata.v3 re-exports
* feat(metadata): add faithful v2 array metadata types
* feat(metadata): add v2 group metadata type
* feat(metadata): add v2 consolidated metadata type (canonical impl, not spec)
* feat(metadata): wire up zarr_metadata.v2 re-exports
* feat(metadata): add ArrayMetadata, GroupMetadata version-polymorphic unions
* feat(metadata): add Codec envelope and blosc codec configurations
* feat(metadata): add dtype types (DType, LengthBytesConfig, FixedLengthBytesConfig, TimeConfig)
* test(metadata): smoke + structural tests for the package
* refactor(common): re-export JSON, NamedConfig, NamedRequiredConfig from zarr-metadata
* refactor(metadata): re-export v3 types from zarr-metadata
* refactor(metadata): re-export faithful v2 array metadata type
* refactor(codecs): re-export blosc codec configurations from zarr-metadata
* refactor(abc): re-export CodecJSON from zarr-metadata
* refactor(dtype): re-export DTypeJSON from zarr-metadata
* refactor(dtype): re-export LengthBytesConfig from zarr-metadata
* refactor(dtype): re-export FixedLengthBytesConfig from zarr-metadata
* refactor(dtype): re-export TimeConfig from zarr-metadata

* refactor(metadata): use tuple[int, ...] for fixed-length fields + typed NumcodecsConfig

  Spec-defined metadata fields with fixed length and no mutation semantics are typed as tuples, not Sequence. Applies to:
  - v2 ArrayMetadataV2.shape, .chunks
  - v2 DataTypeV2Structured.shape
  - v2 ArrayMetadataV2.filters (tuple of codec configs)
  - v3 RegularChunkGridConfig.chunk_shape
  - v3 RectilinearChunkGridConfig.chunk_shapes

  Adds zarr_metadata.v2.codec.NumcodecsConfig, a TypedDict modeling the v2 spec shape for compressors and filters: a required 'id' field plus arbitrary codec-specific extras. ArrayMetadataV2.compressor and .filters now reference this type instead of an untyped Mapping[str, JSON].

* refactor(metadata): fix explicit re-exports and complete DateTimeUnit migration

  Three fixes:
  1. Add missing "μs" unit to zarr_metadata.dtype.time.DateTimeUnit so it matches zarr-python's DateTimeUnit. zarr.core.dtype.npy.common.DateTimeUnit now re-exports from zarr-metadata (downstream consumers like zarr.core.dtype.npy.time pick it up transitively).
  2. Replace `from X import Y as LegacyName` with `from X import Y` followed by a module-level `LegacyName: TypeAlias = Y` binding. mypy under `strict = true` rejected the renamed-import form under the explicit-re-export check ("Module 'X' does not explicitly export attribute 'Y'"), affecting 13 call sites across the codebase. The TypeAlias form makes the alias a proper type (mypy uses it in annotations) while preserving runtime introspection (`.__annotations__` access on the aliased TypedDict). Affects:
     - src/zarr/core/dtype/common.py (DTypeJSON)
     - src/zarr/core/metadata/v2.py (ArrayV2MetadataDict)
     - src/zarr/core/metadata/v3.py (ArrayMetadataJSON_V3 + 5 others)
  3. noqa: UP040 on the TypeAlias bindings. ruff prefers the `type` keyword (PEP 695), but that wraps the alias in a TypeAliasType, which breaks the `.__annotations__` lookup used by tests.

  The 12 remaining "unused type: ignore" mypy errors in v3.py are pre-existing (same count on the pre-refactor state) and unrelated to this work.

* refactor(metadata): extract primitives to common.py to break import cycle

  Moves JSON, NamedConfig, NamedRequiredConfig out of zarr_metadata/__init__.py into zarr_metadata/common.py. Submodules (v2/*, v3/*) now import from zarr_metadata.common directly, avoiding the circular import that occurred when v2.codec was loaded during __init__.py execution.

  Also:
  - v3.array declares RegularChunkGrid/RectilinearChunkGrid as direct TypedDict classes instead of NamedRequiredConfig aliases, simplifying the types and enabling more precise chunk-grid annotations downstream.
  - v2.consolidated.ConsolidatedMetadataV2.metadata value type widened to GroupMetadataV2 | ArrayMetadataV2 | JSON.
  - Added spec links to v2/{array,codec} docstrings.

  zarr_metadata/__init__.py continues to re-export JSON, NamedConfig, NamedRequiredConfig at the top level so zarr.core.common keeps resolving.

* fix(metadata): address review findings

  Three issues surfaced by final code review:
  1. Add py.typed marker to zarr-metadata. Without it, PEP 561 makes type checkers treat zarr-metadata as untyped, cascading into ~44 spurious mypy errors in zarr (subclassing Any, unused type: ignore, etc).
  2. RegularChunkGrid.configuration was accidentally typed NotRequired when converted from NamedRequiredConfig to a direct TypedDict class. Per spec, chunk_shape is mandatory. Make configuration required.
  3. RectilinearDimSpec was declared as tuples but zarr's compress_rle returned lists, and the to_dict producer built lists. Align producers with the declared type: compress_rle now returns list[int | tuple[int, int]], expand_rle accepts both list and tuple RLE pairs, to_dict builds tuples. The tuple shape is correct per spec: each RLE pair is a JSON array of exactly two elements (size, count), a fixed-cardinality structure that a tuple models more faithfully than a mutable list.

  Mypy error count now matches main (32) with these fixes in place.

* refactor(metadata): remove consolidated_metadata from GroupMetadataV3

  consolidated_metadata is not in the core Zarr v3 spec as a field on group metadata. It has an (unmerged) extension spec and is implemented by zarr-python, but keeping it out of GroupMetadataV3 is the spec-faithful move. The extra_items=AllowedExtraField on GroupMetadataV3 already permits it to appear at runtime as an extension. ConsolidatedMetadataV3 remains available at zarr_metadata.v3.consolidated for consumers that want to type the extension shape.

  Also fix two stray lint issues (missing trailing newline in common.py, unused Mapping import in v2/array.py).

* chore(metadata): don't track zarr-metadata's uv.lock

  zarr-metadata is a library, not an application; its lockfile pins transitive dev versions that shouldn't be fixed in source. Untrack and gitignore.

* feat(metadata): add v3 codec types for bytes, crc32c, gzip, zstd, transpose, sharding

  Adds per-codec TypedDict configurations + name literals + full envelope types for every core v3 codec besides blosc (which is extended in the same style for consistency):
  - {Codec}CodecName: Literal["<name>"] (the spec "name" value)
  - {Codec}CodecConfiguration: TypedDict (the "configuration" body)
  - {Codec}Codec: NamedRequiredConfig (the full envelope)

  crc32c has no configuration fields, so Crc32cCodec uses NamedConfig (configuration optional) and no Configuration TypedDict is exported.

  The `V1` suffix is dropped from the Configuration types (except blosc, where V1 + Numcodecs disambiguate two concrete shapes). The other v3 codec specs aren't versioned at the codec level; there's only one shape per codec today, and an incompatible future change would land under a new codec name rather than a v2 of the same name.

  Also fixes pre-existing v2 test fixtures to include the now-required compressor/fill_value/order/filters fields on ArrayMetadataV2.

* refactor(metadata): define codec envelope TypedDicts explicitly

  Each {Codec}Codec envelope type is now an explicit TypedDict class with `name` and `configuration` fields, rather than a NamedRequiredConfig[...] generic alias. It is readable at the call site, surfaces the spec structure directly, and allows a real class-level docstring.

  Also:
  - Drop BloscCodecConfigurationNumcodecs from zarr-metadata. numcodecs-shape modeling belongs in zarr-python (which implements that shape), not in zarr-metadata (which is spec-only).
  - Rename BloscCodecConfigurationV1 to BloscCodecConfiguration, matching the unversioned naming used for the other codecs.
  - Restore BloscConfigV2 locally in zarr-python for the numcodecs shape.

* feat(metadata): add Final string constants for codec names and enum-valued fields

  Each codec now exports SCREAMING_CASE Final constants alongside the Literal types. Downstream packages can reference the spec-defined strings without retyping magic strings.

  Codec names: BLOSC_CODEC_NAME, BYTES_CODEC_NAME, CRC32C_CODEC_NAME, GZIP_CODEC_NAME, SHARDING_CODEC_NAME, TRANSPOSE_CODEC_NAME, ZSTD_CODEC_NAME.

  Enum-valued field values:
  - Blosc: BLOSC_SHUFFLE_{NOSHUFFLE,SHUFFLE,BITSHUFFLE}, BLOSC_CNAME_{LZ4,LZ4HC,BLOSCLZ,SNAPPY,ZLIB,ZSTD}
  - Bytes: BYTES_ENDIAN_{LITTLE,BIG} (also extracts the existing Literal into a new `Endian` alias)
  - Sharding: SHARDING_INDEX_LOCATION_{START,END} (and `IndexLocation` Literal alias)

* docs(metadata): say "codec metadata" instead of "codec envelope"

  Rewords docstrings and test names throughout the package: the {Codec}Codec TypedDict describes a codec's JSON metadata, not a "named-config envelope." Less jargon, consistent with the package name. Identifier names are unchanged (still BloscCodec, GzipCodec, etc.).

  Also renames v3/array.py chunk-grid docstrings for consistency (Regular/Rectilinear ChunkGrid "metadata" rather than "named-config container"), and updates the README.

* docs(metadata): use single-backtick markdown code formatting

  Convert all double-backtick RST-style inline code in zarr-metadata docstrings to single-backtick markdown style. The package's documentation will be rendered by mkdocs, which expects markdown, so single backticks render correctly as inline code.

* feat(metadata): add v3 spec data type metadata

  Models the spec-defined v3 data types from zarr-specs core and zarr-extensions:
  * `dtype/primitive.py` (NEW) - Final constants and `PrimitiveDTypeName` Literal union for all 14 core v3 primitives (bool, int8..int64, uint8..uint64, float16..float64, complex64, complex128).
  * `dtype/bytes.py` - adds `BYTES_DTYPE_NAME` and `BytesDTypeName` for the variable-length `bytes` extension; adds `NullTerminatedBytes` envelope TypedDict for `null_terminated_bytes` (zarr-extensions). Retains `FixedLengthBytesConfig` (re-exported by zarr-python).
  * `dtype/string.py` - adds `STRING_DTYPE_NAME`/`StringDTypeName` for the `string` extension; adds `FixedLengthUtf32` envelope. Retains `LengthBytesConfig`.
  * `dtype/time.py` - adds `NumpyDatetime64` and `NumpyTimedelta64` envelopes plus name constants/literals. The shared `TimeConfig` body is preserved.
  * `dtype/struct.py` (NEW) - the `struct` extension type, with `StructField`, `StructConfig`, and `Struct` envelope. Fields hold recursive `DType` values, supporting nested structs.

  The `r<N>` raw-bytes type from the core spec is parameterised on bit count, not a single literal name, so it isn't given a TypedDict; consumers match it against the wider `DType` alias.

  Tests updated and extended for the new types and constants.

* refactor(metadata): per-dtype modules with fill-value types and validators

  Restructure `zarr_metadata.dtype.*` so each spec data type lives in its own module, mirroring the per-codec layout in `zarr_metadata.codec.*` and the per-dtype directory layout in zarr-extensions.

  New per-type modules (one per spec data type): bool.py, int8/16/32/64.py, uint8/16/32/64.py, float16/32/64.py, complex64/128.py, bytes.py, string.py, numpy_datetime64.py, numpy_timedelta64.py, struct.py, raw.py

  Each module exports:
  - {DTYPE}_DTYPE_NAME (Final str)
  - {DType}DTypeName (Literal)
  - For envelope types: a {DType} TypedDict + a {DType}Configuration
  - {DType}FillValue alias for the JSON shape of `fill_value`

  Removed `null_terminated_bytes` and `fixed_length_utf32` from zarr-metadata: they are not in zarr-specs or zarr-extensions; they are zarr-python-specific. Their `LengthBytesConfig` and `FixedLengthBytesConfig` TypedDicts now live locally in zarr-python at src/zarr/core/dtype/npy/{string,bytes}.py.

  zarr.core.dtype.npy.common now imports `DateTimeUnit` from `zarr_metadata.dtype.numpy_datetime64`. zarr.core.dtype.npy.time imports `TimeConfig` (aliased from `NumpyDatetime64Configuration`).

  NewType + validating-constructor pattern for non-literal spec strings:
  - HexFloat{16,32,64} for the float hex-string fill values
  - Base64Bytes for the `bytes` base64 fill value
  - RawBytesDTypeName for the `r<N>` parameterised name

  These make spec-format constraints visible to the type system; the matching validating constructors (e.g. `hex_float32`) are the only runtime logic in the package and are minimal regex checks.

* refactor(metadata): per-grid and per-encoding modules for chunk_grid + chunk_key_encoding

  Move chunk-grid TypedDicts out of `v3/array.py` into per-type modules, mirroring the per-codec and per-dtype layouts:

      packages/zarr-metadata/src/zarr_metadata/v3/
      ├── chunk_grid/
      │   ├── __init__.py
      │   ├── regular.py          # core spec
      │   └── rectilinear.py      # zarr-extensions
      └── chunk_key_encoding/
          ├── __init__.py         # ChunkKeySeparator alias
          ├── default.py          # core spec
          └── v2.py               # core spec

  Each module exports:
  - {NAME}_NAME (Final str)
  - {Name} (TypedDict envelope)
  - {Name}Configuration (TypedDict body)
  - {Name}Name (Literal type of the `name` field)

  `v3/array.py` shrinks to just `AllowedExtraField`, `MetadataField`, and `ArrayMetadataV3`. `chunk_grid` and `chunk_key_encoding` fields stay typed as `MetadataField` (str | NamedConfig); narrowing them to a specific union belongs in a future validation layer, not in the spec-faithful types layer.

  Configuration TypedDicts renamed from `*Config` to `*Configuration` to match the dtype/codec naming. zarr.core.metadata.v3 re-exports preserve the legacy `*Config` aliases via `as` imports.

* refactor(metadata): move codec/ and dtype/ under v3/

  Both directories model v3-spec artifacts, so they belong under the v3/ subpackage alongside v3/array, v3/group, v3/consolidated, v3/chunk_grid, and v3/chunk_key_encoding. The principle is now: anything imported from `zarr_metadata.v3.X` is a v3-spec artifact; anything from `zarr_metadata.v2.X` is a v2-spec artifact; only true cross-version primitives sit at the top level (`zarr_metadata.JSON`, `NamedConfig`, `NamedRequiredConfig`, and the `ArrayMetadata`/`GroupMetadata` unions).

  Path moves:

      zarr_metadata.codec.* -> zarr_metadata.v3.codec.*
      zarr_metadata.dtype.* -> zarr_metadata.v3.dtype.*

  Internal imports inside the moved modules and zarr-python re-export sites updated accordingly.

  zarr.abc.codec imports the zarr-metadata Codec alias with a private name to avoid colliding with its own runtime `Codec` union (`ArrayArrayCodec | ArrayBytesCodec | BytesBytesCodec`), then re-exports it as `CodecJSON`.

* refactor(metadata): rename v3/dtype/ -> v3/data_type/

  Matches the v3 spec field name `data_type` exactly. All imports inside the package and in zarr-python re-export sites updated accordingly. The `DType` type alias keeps its short name (it's the widely understood abbreviation for "data type JSON shape"); only the module path changes.

* feat: add zarr-metadata package

  Adds packages/zarr-metadata, a sibling PyPI package that contains spec-defined Zarr v2 and v3 metadata types as pure-typing artifacts (TypedDicts, type aliases, Final string constants, NewType validators). No runtime logic beyond minimal regex validators for spec-format-locked strings (hex floats, base64 bytes, raw-bytes name).

  Layout (anything imported from `zarr_metadata.v3.X` is a v3-spec artifact; from `zarr_metadata.v2.X` is v2-spec; only true cross-version primitives sit at the top level):

      zarr_metadata/
      ├── common.py               # JSON, NamedConfig, NamedRequiredConfig
      ├── __init__.py             # ArrayMetadata, GroupMetadata unions
      ├── v2/
      │   └── array.py, group.py, consolidated.py, codec.py
      └── v3/
          ├── array.py, group.py, consolidated.py
          ├── chunk_grid/         {regular, rectilinear}
          ├── chunk_key_encoding/ {default, v2}
          ├── codec/              {blosc, bytes, crc32c, gzip,
          │                        sharding, transpose, zstd}
          └── data_type/          {bool, int8/16/32/64, uint8/16/32/64,
                                   float16/32/64, complex64/128, bytes,
                                   string, numpy_datetime64,
                                   numpy_timedelta64, struct, raw}

  zarr-python source is unchanged in this branch. zarr-metadata is shipped as an independent package; a follow-up PR will adopt it inside zarr-python once the package is published to PyPI.

* test(metadata): drop tests that don't actually test anything

  The structural tests constructed dict literals annotated with TypedDict types and asserted that a key we just inserted came back out, which exercises Python's dict, not the type. The TypedDict has no runtime shape check, so even a type-incompatible dict would pass. Pyright (in CI) is what actually verifies the shapes. Constant-equality tests (e.g. `assert NAME == "name"`) also tested nothing.

  Kept the validating-constructor tests for hex_float{16,32,64}, base64_bytes, and raw_bytes_dtype_name; those exercise real regex logic and catch real bugs.

  Replaced test_imports.py with a single smoke test that confirms the package loads and the top-level union types are reachable. Renamed test_structural.py -> test_validators.py to match what it actually tests.

  Test count: 47 -> 6, ~750 lines -> 89.

* build(metadata): lower minimum Python to 3.11

  zarr-metadata is a typing-only foundational package consumed by libraries; supporting one Python version below the current minimum widens the audience at minimal cost.

  The only blocker was PEP 695 generic class syntax in `common.py`:

      class NamedConfig[TName: str, TConfig: ...](TypedDict):

  Rewritten to the PEP 484 `Generic[T]` form, which works on 3.11+. The two affected classes carry `# noqa: UP046` comments since the ruff rule pushes toward the newer syntax that we deliberately avoid.

  All other modern features (PEP 604 `X | Y` unions at runtime, `NotRequired`, `extra_items=` via typing_extensions, `ReadOnly`) already work on 3.11.

  Verified: package imports and all 6 tests pass on Python 3.11.

* refactor: remove generic base metadata
* refactor: clean up codecs init
* fix: correct v2 structured dtype spec
* refactor: drop readonly for numcodecs config
* docs: improve docstring
* fix: use empty typeddict for crc32c config
* fix: remove arbitrary json from consolidated model
* fix: don't depend on zarr-metadata yet
* fix: typesize is not required
* fix: re-wire zarr-metadata up as a dependency for zarr-python
* chore: revert changes to src/zarr
* chore: mypy ignore the new package
* refactor: rename extra field
* refactor: we do a little refactoring
* test: more dtype tests
* chore: add ci
* chore: use typing extensions typeddict
* fix: unbreak ci
* chore: clean up top level exports
* chore: clarify extension fields and rename type
* allow must_understand: true, and add canonical nan strings

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
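The NewType + validating-constructor pattern described above can be sketched as follows. This is a hypothetical illustration, not the package's code: the `HexFloat{16,32,64}` and `hex_float{16,32,64}` names come from the commit message, but their signatures are assumed, as is the string format (a `0x` prefix followed by two hex digits per byte of the float). The `hex_float_value` helper and its big-endian IEEE 754 decoding are further assumptions, added only to show what such a string encodes. The sketch covers only the hex form of a float fill value.

```python
import re
import struct
from typing import NewType

# Hypothetical sketch of the NewType + validating-constructor pattern.
# Names mirror the commit message; signatures and regexes are assumptions.
HexFloat16 = NewType("HexFloat16", str)
HexFloat32 = NewType("HexFloat32", str)
HexFloat64 = NewType("HexFloat64", str)

# Assumed format: "0x" followed by two hex digits per byte of the float.
_PATTERNS = {
    2: re.compile(r"0x[0-9a-fA-F]{4}"),
    4: re.compile(r"0x[0-9a-fA-F]{8}"),
    8: re.compile(r"0x[0-9a-fA-F]{16}"),
}


def _check(value: str, n_bytes: int) -> str:
    # fullmatch rejects strings with extra leading or trailing characters
    if _PATTERNS[n_bytes].fullmatch(value) is None:
        raise ValueError(f"{value!r} is not a {8 * n_bytes}-bit hex float string")
    return value


def hex_float16(value: str) -> HexFloat16:
    return HexFloat16(_check(value, 2))


def hex_float32(value: str) -> HexFloat32:
    return HexFloat32(_check(value, 4))


def hex_float64(value: str) -> HexFloat64:
    return HexFloat64(_check(value, 8))


def hex_float_value(value: str) -> float:
    """Decode a validated hex string, assuming it holds big-endian IEEE 754 bits."""
    raw = bytes.fromhex(value[2:])
    fmt = {2: ">e", 4: ">f", 8: ">d"}[len(raw)]
    return struct.unpack(fmt, raw)[0]


print(hex_float_value(hex_float32("0x3f800000")))  # 1.0
```

Because `HexFloat32` is a `NewType`, a type checker rejects a plain `str` where a `HexFloat32` is expected, so the validating constructor becomes the sanctioned way to obtain one; at runtime the value is still an ordinary `str`.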
1 parent 3ede5e8 commit f8c0c5d

209 files changed

Lines changed: 8297 additions & 1 deletion

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+name: zarr-metadata
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'packages/zarr-metadata/**'
+      - '.github/workflows/zarr-metadata.yml'
+  pull_request:
+    paths:
+      - 'packages/zarr-metadata/**'
+      - '.github/workflows/zarr-metadata.yml'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  test:
+    name: pytest py=${{ matrix.python-version }}
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        shell: bash
+        working-directory: packages/zarr-metadata
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ['3.11', '3.12', '3.13', '3.14']
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          persist-credentials: false
+      - name: Install uv
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+        with:
+          enable-cache: true
+      - name: Set up Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
+      - name: Sync test dependency group
+        run: uv sync --group test --python ${{ matrix.python-version }}
+      - name: Run pytest
+        run: uv run --group test pytest tests
+
+  ruff:
+    name: ruff
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        shell: bash
+        working-directory: packages/zarr-metadata
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          persist-credentials: false
+      - name: Install uv
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+      - name: Run ruff
+        run: uvx ruff check .
+
+  pyright:
+    name: pyright
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        shell: bash
+        working-directory: packages/zarr-metadata
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          persist-credentials: false
+      - name: Install uv
+        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
+        with:
+          enable-cache: true
+      - name: Set up Python
+        run: uv python install 3.11
+      - name: Sync test dependency group
+        run: uv sync --group test --python 3.11
+      - name: Run pyright
+        run: uv run --group test --with pyright pyright src
+
+  zarr-metadata-complete:
+    name: zarr-metadata complete
+    needs: [test, ruff, pyright]
+    if: always()
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check failure
+        if: |
+          contains(needs.*.result, 'failure') ||
+          contains(needs.*.result, 'cancelled')
+        run: exit 1
+      - name: Success
+        run: echo Success!
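The `zarr-metadata-complete` job acts as a single aggregate check over `test`, `ruff`, and `pyright`. Its `if: always()` makes it run even when one of its `needs` jobs fails or is cancelled, so it can turn that outcome into its own failing status instead of being skipped. The `Check failure` condition can be mirrored in Python as below; `gate_passes` is a hypothetical helper for illustration, and each entry stands for one value of `needs.*.result` (`success`, `failure`, `cancelled`, or `skipped`).

```python
from collections.abc import Iterable


def gate_passes(results: Iterable[str]) -> bool:
    """Mirror of the workflow's failure check (hypothetical helper).

    The `Check failure` step runs `exit 1` when
    contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
    is true; any other mix of upstream results lets the job succeed.
    """
    seen = list(results)
    return not ("failure" in seen or "cancelled" in seen)


print(gate_passes(["success", "skipped", "success"]))  # True
print(gate_passes(["success", "failure", "success"]))  # False
```

Note that under this condition a `skipped` upstream job does not fail the gate; only `failure` and `cancelled` do.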

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -91,3 +91,6 @@ tests/.hypothesis
 
 zarr/version.py
 zarr.egg-info/
+
+# zarr-metadata package lockfile (a library, not an app)
+packages/zarr-metadata/uv.lock

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ repos:
     rev: v1.19.1
     hooks:
       - id: mypy
-        files: src|tests
+        files: ^(src|tests)/
         additional_dependencies:
           # Package dependencies
           - packaging
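The `files` change matters because pre-commit treats `files` as a Python regular expression matched with `re.search`, that is, anywhere in each path. The unanchored `src|tests` therefore also selected files inside the new package (for example `packages/zarr-metadata/src/...`), while `^(src|tests)/` only selects the repository's top-level `src/` and `tests/` directories. The paths below are illustrative.

```python
import re

OLD = re.compile(r"src|tests")     # pattern before this commit
NEW = re.compile(r"^(src|tests)/")  # pattern after this commit

paths = [
    "src/zarr/core/common.py",
    "tests/test_api.py",
    "packages/zarr-metadata/src/zarr_metadata/_common.py",
    "packages/zarr-metadata/tests/test_validators.py",
]

# pre-commit applies `files` with re.search, so test both patterns that way
results = {path: (bool(OLD.search(path)), bool(NEW.search(path))) for path in paths}

for path, (old, new) in results.items():
    print(f"{path}: old={old} new={new}")
```

Both top-level paths stay selected under either pattern; only the paths under `packages/zarr-metadata/` drop out with the anchored form, which keeps the zarr-python mypy hook off the new package.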

packages/zarr-metadata/LICENSE.txt

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2015-2025 Zarr Developers <https://github.com/zarr-developers>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

packages/zarr-metadata/README.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+# zarr-metadata
+
+Python type definitions for Zarr v2 and v3 metadata.
+
+## What this is
+
+A typed-data package: `TypedDict` definitions and `Literal` aliases for the
+JSON shapes specified by the [Zarr v2](https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html)
+and [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html)
+specifications, plus types for [`zarr-extensions`](https://github.com/zarr-developers/zarr-extensions/)
+and a few widely-used-but-unspecified entities (e.g. consolidated metadata).
+
+## What this is for
+
+These types describe the JSON shape of Zarr metadata. They are
+intended for libraries that **read, write, validate, or transform**
+Zarr metadata. Pair them with a runtime validator like
+[pydantic](https://docs.pydantic.dev/) to check JSON loaded from disk:
+
+```python
+import json
+from pydantic import TypeAdapter
+from zarr_metadata.v3.array import ArrayMetadataV3
+
+with open("zarr.json", "rb") as f:
+    raw = json.load(f)
+
+metadata = TypeAdapter(ArrayMetadataV3).validate_python(raw)
+```
+
+## What this is *not*
+
+- Not a parser or builder. There are no `make_array_metadata(...)` factories;
+  that surface belongs to consumer libraries.
+- Not a runtime validator on its own. Pair with `pydantic`, `msgspec`, or
+  similar to enforce shapes at decode time.
+
+Even with a runtime validator, these types only describe **structural**
+shape: they will not flag *semantically* invalid metadata, like a 3D v3
+array whose `dimension_names` has 4 entries instead of 3. That's a job
+for downstream validator routines.
+
+## Scope
+
+At minimum, this library supports what Zarr-Python needs: the complete
+Zarr v2 and v3 specs, consolidated metadata, and a subset of the metadata
+defined in `zarr-extensions`. We are generally open to contributions that
+add types for Zarr metadata with a published spec.
+
+## License
+
+[MIT](./LICENSE.txt)
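As the README notes, structural types cannot catch cross-field inconsistencies such as a `dimension_names` list that does not match the array's rank. A downstream semantic check might look like the sketch below. `semantic_errors` is a hypothetical helper, not part of zarr-metadata, and it covers only two v3 rank rules as examples: `dimension_names` (when present) needs one entry per dimension of `shape`, and a `regular` chunk grid's `chunk_shape` needs the same length as `shape`.

```python
from collections.abc import Mapping
from typing import Any


def semantic_errors(metadata: Mapping[str, Any]) -> list[str]:
    """Hypothetical rank-consistency checks for v3 array metadata."""
    errors: list[str] = []
    ndim = len(metadata["shape"])

    names = metadata.get("dimension_names")
    if names is not None and len(names) != ndim:
        errors.append(f"dimension_names has length {len(names)}, expected {ndim}")

    grid = metadata.get("chunk_grid")
    # Only the object form of a regular chunk grid is checked here.
    if isinstance(grid, Mapping) and grid.get("name") == "regular":
        chunk_shape = grid.get("configuration", {}).get("chunk_shape")
        if chunk_shape is not None and len(chunk_shape) != ndim:
            errors.append(f"chunk_shape has length {len(chunk_shape)}, expected {ndim}")
    return errors


example = {
    "shape": [100, 100, 100],
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [10, 10, 10]}},
    "dimension_names": ["x", "y", "z", "t"],
}
print(semantic_errors(example))  # ['dimension_names has length 4, expected 3']
```

Checks like these operate on a mapping that has already passed structural validation, which is why they sit naturally after a `TypeAdapter(...).validate_python(...)` step rather than inside the types themselves.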
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+[build-system]
+requires = ["hatchling>=1.29.0"]
+build-backend = "hatchling.build"
+
+[project]
+name = "zarr-metadata"
+version = "0.1.0"
+description = "Spec-defined metadata types for Zarr v2 and v3."
+readme = "README.md"
+requires-python = ">=3.11"
+license = "MIT"
+license-files = ["LICENSE.txt"]
+authors = [
+    { name = "Davis Bennett", email = "davis.v.bennett@gmail.com" },
+]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
+    "Programming Language :: Python :: 3.14",
+    "Typing :: Typed",
+]
+dependencies = [
+    "typing_extensions>=4.13",
+]
+
+[dependency-groups]
+test = ["pytest", "pydantic>=2"]
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/zarr_metadata"]
+
+[tool.ruff]
+extend = "../../pyproject.toml"
+target-version = "py311"
+
+[tool.pytest.ini_options]
+minversion = "7"
+testpaths = ["tests"]
+xfail_strict = true
+addopts = ["-ra", "--strict-config", "--strict-markers"]
+filterwarnings = [
+    "error",
+    # pydantic warns about ReadOnly TypedDict items not being enforced at runtime.
+    # That's expected here; we rely on type-checker enforcement, not pydantic mutation guards.
+    "ignore::UserWarning:pydantic._internal._generate_schema",
+]
+
+[tool.numpydoc_validation]
+checks = [
+    "GL10",
+    "SS04",
+    "PR02",
+    "PR03",
+    "PR05",
+    "PR06",
+]
+
+[tool.pyright]
+include = ["src"]
+enableExperimentalFeatures = true
+typeCheckingMode = "strict"
+pythonVersion = "3.11"
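The `filterwarnings` setting turns every warning into a test failure except `UserWarning`s raised from pydantic's schema-generation module. pytest feeds these entries to Python's `warnings` machinery, and when several entries match, the last matching one wins. The sketch below reproduces that setup directly with the standard library; `outcome` is a hypothetical helper, and the module names passed to `warnings.warn_explicit` are illustrative stand-ins for where a warning originates.

```python
import warnings

PYDANTIC_MODULE = "pydantic._internal._generate_schema"


def outcome(category: type[Warning], module: str) -> str:
    """Report whether a warning attributed to `module` is ignored or raised."""
    with warnings.catch_warnings():
        # "error": every warning becomes an exception...
        warnings.simplefilter("error")
        # ...except UserWarning from pydantic's schema module. filterwarnings()
        # inserts at the front of the filter list, so this entry is checked first.
        warnings.filterwarnings(
            "ignore", category=UserWarning, module=r"pydantic\._internal\._generate_schema"
        )
        try:
            warnings.warn_explicit("example", category, "example.py", 1, module=module)
        except Warning:
            return "error"
        return "ignored"


print(outcome(UserWarning, PYDANTIC_MODULE))         # ignored
print(outcome(UserWarning, "zarr_metadata.v3"))      # error
print(outcome(DeprecationWarning, PYDANTIC_MODULE))  # error
```

The exemption is narrow by design: a different warning category from the same pydantic module, or a `UserWarning` from anywhere else, still fails the test run.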
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+from zarr_metadata._common import NamedConfig
+from zarr_metadata.v2.array import (
+    ArrayDimensionSeparatorV2,
+    ArrayMetadataV2,
+    ArrayOrderV2,
+    DataTypeMetadataV2,
+)
+from zarr_metadata.v2.codec import CodecMetadataV2
+from zarr_metadata.v2.consolidated import ConsolidatedMetadataV2
+from zarr_metadata.v2.group import GroupMetadataV2
+from zarr_metadata.v3._common import MetadataFieldV3
+from zarr_metadata.v3.array import ArrayMetadataV3, ExtensionFieldV3
+from zarr_metadata.v3.consolidated import ConsolidatedMetadataV3
+from zarr_metadata.v3.group import GroupMetadataV3
+
+__version__ = "0.1.0"
+"""Hardcoded package version. Must match the `version` field in
+`pyproject.toml`; the sync is enforced by `tests/test_version.py`."""
+
+
+__all__ = [
+    "ArrayDimensionSeparatorV2",
+    "ArrayMetadataV2",
+    "ArrayMetadataV3",
+    "ArrayOrderV2",
+    "CodecMetadataV2",
+    "ConsolidatedMetadataV2",
+    "ConsolidatedMetadataV3",
+    "DataTypeMetadataV2",
+    "ExtensionFieldV3",
+    "GroupMetadataV2",
+    "GroupMetadataV3",
+    "MetadataFieldV3",
+    "NamedConfig",
+    "__version__",
+]
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+"""
+Top-level cross-version primitives for Zarr metadata.
+
+Version-specific types live under `zarr_metadata.v2` and `zarr_metadata.v3`.
+Codec and dtype spec types live under `zarr_metadata.v3.codec` and
+`zarr_metadata.v3.data_type`.
+"""
+
+from collections.abc import Mapping
+from typing import NotRequired
+
+from typing_extensions import TypedDict
+
+
+class NamedConfig(TypedDict):
+    """
+    Externally-tagged union member for a metadata field.
+
+    The `configuration` mapping holds arbitrary JSON-encodable values;
+    it is typed as `Mapping[str, object]` because the type system cannot
+    express or verify JSON-encodability.
+    """
+
+    name: str
+    configuration: NotRequired[Mapping[str, object]]

packages/zarr-metadata/src/zarr_metadata/py.typed

Whitespace-only changes.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+"""Zarr v2 metadata types."""
+
+from zarr_metadata.v2.array import (
+    ArrayDimensionSeparatorV2,
+    ArrayMetadataV2,
+    ArrayOrderV2,
+    DataTypeMetadataV2,
+)
+from zarr_metadata.v2.codec import CodecMetadataV2
+from zarr_metadata.v2.consolidated import ConsolidatedMetadataV2
+from zarr_metadata.v2.group import GroupMetadataV2
+
+__all__ = [
+    "ArrayDimensionSeparatorV2",
+    "ArrayMetadataV2",
+    "ArrayOrderV2",
+    "CodecMetadataV2",
+    "ConsolidatedMetadataV2",
+    "DataTypeMetadataV2",
+    "GroupMetadataV2",
+]
