You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue collects the results of a Claude-assisted review of src/zarr/ (at 86dabd5) covering four dimensions: bugs, performance, simplifications, and modernizations. Each finding was verified against the code by the reviewing agent before inclusion. Draft PRs addressing groups of these findings reference this issue.
Note: This issue lives on the d-v-b fork so the findings and draft PRs can be triaged before anything is proposed upstream.
Not actioned in a PR: S7 (medium-effort overload design), S9 (blocked on create() deprecation), M6/M7 (ruff rule enablement — judgement calls), M8 (needs a decision on intent), and P6 (refuted, see below).
🐛 Bugs
Storage & codec pipeline
B1 — MemoryStore returns wrong bytes for SuffixByteRequest when suffix exceeds object size (src/zarr/storage/_utils.py:155, medium). _normalize_byte_range_index computes start = len(data) - suffix with no lower clamp; a negative start is interpreted by numpy slicing as an offset from the end, silently returning too few bytes. For a 5-byte value, suffix=7 returns 2 bytes while suffix=10 returns all 5 — non-monotonic, and inconsistent with ZipStore/LocalStore which clamp with max(0, ...). Fix: clamp start = max(0, len(data) - suffix).
B2 — LoggingStore.get_partial_values exhausts a generator argument before forwarding it (src/zarr/storage/_logging.py:182, medium). The log line ",".join([k[0] for k in key_ranges]) fully consumes a one-shot iterator, so the wrapped store receives an empty iterable and returns []. Fix: materialize key_ranges = list(key_ranges) once.
B3 — getsize_prefix over-counts on MemoryStore/ObjectStore due to substring prefix matching (src/zarr/abc/store.py:539 + src/zarr/storage/_memory.py:202, medium). Store.getsize_prefix calls list_prefix(prefix) without a trailing /, and the substring-matching stores then also count sibling nodes like foobar/... when asked about foo. Affects Array.nbytes_stored. Fix: normalize the prefix with a trailing / in getsize_prefix (matching delete_dir/is_empty/clear).
B4 — ZipStore.close() raises AttributeError if the store was never opened (src/zarr/storage/_zip.py:121, low). self._lock is only created in _sync_open(), but close() unconditionally takes it, so with ZipStore(path): pass raises. Fix: guard close() on self._is_open or create the lock in __init__.
B5 — codecs_from_list builds an error message for invalid BytesBytes-after-ArrayArray ordering but never raises it (src/zarr/core/codec_pipeline.py:675, low). Sibling branches raise TypeError(msg); this branch assigns msg and falls through, silently accepting a spec-invalid codec order. Fix: add the missing raise TypeError(msg).
Core
B6 — DefaultChunkKeyEncoding.decode_chunk_key does not round-trip with encode_chunk_key (src/zarr/core/chunk_key_encodings.py:79, low). encode_chunk_key((0, 1)) → "c/0/1", but decode_chunk_key strips only the "c" character, leaving a leading separator, so int('') raises ValueError for every non-empty coordinate key. Fix: drop the "c" token after splitting instead of slicing one character.
B7 — Attribute updates mutate frozen-dataclass metadata in place (src/zarr/core/array.py:6011, src/zarr/core/group.py:1219, low). _update_attributes mutates metadata.attributes in place even though the metadata classes are frozen and hash by to_dict(), so an instance's hash can change after construction. Fix: build a new attributes dict and use dataclasses.replace, as update_shape already does.
B8 — Attributes.put clears live metadata before persisting (non-atomic despite docstring) (src/zarr/core/attributes.py:53, low). put empties the in-memory attributes dict before the store write; if save_metadata raises, memory and store disagree. Fix: compute the replacement dict first and apply it with a replace-based update. (Shares root cause with B7.)
🚀 Performance
P1 — ShardingCodec rebuilds codec pipelines on every shard access (src/zarr/codecs/sharding.py:367, high impact). codec_pipeline is a plain @property calling get_pipeline_class().from_codecs(...) per shard, and _shard_index_size/_decode_shard_index/_encode_shard_index each rebuild a fresh index pipeline per shard. Fix: cache on the frozen codec instance (cached_property/instance-keyed lru_cache).
P2 — default_buffer_prototype() reconstructed per inner chunk inside shards (src/zarr/codecs/sharding.py:105, medium). A debug assert prototype == default_buffer_prototype() runs config + registry lookups for every inner-chunk fetch on the hot read path. Fix: hoist or drop the assertion.
P3 — Per-shard rebuild of the inner chunk grid + indexer, identical across shards (src/zarr/codecs/sharding.py:429,561, medium). Each shard decode/encode reconstructs ChunkGrid.from_sizes, a BasicIndexer, and its materialized projection list, though they're identical for every shard. Fix: memoize keyed by (shard_shape, chunk_shape).
P4 — Repeated donfig config.get("async.concurrency") lookups on the per-chunk codec path (src/zarr/core/codec_pipeline.py:398,508,574,607,626, src/zarr/abc/codec.py:252,289,496, medium-low). With the default batch_size=1, each chunk incurs ~4 donfig dict traversals that cannot change mid-operation. Fix: read once per read/write call and thread through.
P5 — Deep group traversal serializes across sibling subgroups (src/zarr/core/group.py:3322, medium). _iter_members_deep drains each subgroup iterator sequentially, so latency does not overlap across branches on high-latency stores. Fix: fan out sibling iterators concurrently under the existing semaphore.
P6 — StorePath.__truediv__ re-normalizes already-normalized paths per chunk — refuted during implementation: __truediv__ calls _join_paths (a plain "/".join), not normalize_path; the join is already cheap. No action.
🧹 Simplifications
S1 — Duplicate overwrite-check logic in _create_v2/_create_v3/_create_array_metadata (src/zarr/core/array.py:579,660,4404). Three copies of the overwrite/supports_deletes dance; extract one helper.
S2 — Module-level _get_orthogonal_selection/_get_mask_selection/_get_coordinate_selection are single-caller pass-through layers (src/zarr/core/array.py:5526-5695). ~110 net lines removable by inlining indexer construction into the AsyncArray.get_* methods (the sync Array already bypasses them).
S3 — Repeated zarr_format == 2 ? metadata.dtype : metadata.data_type inline dispatch (src/zarr/core/array.py:5393,5739,968). Add a unifying zdtype property on both metadata classes.
S4 — Array.cdata_shape and Array._chunk_grid_shape duplicate the same delegation (src/zarr/core/array.py:2162-2182).
S5 — "Not yet implemented" warning boilerplate repeated in open_group and create (src/zarr/api/asynchronous.py:816,1013). Compress to one helper iterating {name: value}.
S6 — V3 branches of contains_array/contains_group duplicate _contains_node_v3 (src/zarr/storage/_common.py:591-661). Rewrite as (await _contains_node_v3(sp)) == "array" / "group". Implementation note: only contains_group was rewritten — contains_array raises ValueError (rather than returning False) when the node is a group, so the rewrite would change observable behavior.
S7 — parse_zarr_format defined three times (metadata/v2.py:280, metadata/v3.py:49, group.py:86). Unify with overloads for literal narrowing (medium effort).
S8 — default_filters_v3 is a trivially inlinable no-op (src/zarr/core/array.py:4836). Body is return () with one caller.
S9 — AsyncArray._create/_create_v2/_create_v3 trio (~140 lines) exists solely for the deprecated create() path (src/zarr/core/array.py:368-696). Removable when the legacy API is removed — tracked here, not actionable yet.
S10 — *_like functions re-check isinstance after _like_args already did (src/zarr/api/asynchronous.py:139,1091-1313). Fold fill_value extraction into _like_args.
✨ Modernizations (Python ≥ 3.12)
M1 — Hand-rolled batched() duplicates itertools.batched (src/zarr/core/codec_pipeline.py:45-50). Delete and use the stdlib version (3.12+).
M2 — typing_extensions.Buffer → collections.abc.Buffer (src/zarr/codecs/crc32c_.py:8,45,69). Stdlib since 3.12.
M3 — Unused P = ParamSpec("P") (src/zarr/core/sync.py:22). Defined, never referenced; delete.
M4 — Hand-rolled product()/reduce(operator.mul, ...)/np.prod() on int tuples → math.prod() (src/zarr/core/common.py:85, src/zarr/core/indexing.py:1190, src/zarr/core/array.py:908, src/zarr/core/chunk_grids.py:855). math.prod avoids numpy scalar round-trips; keep product = math.prod shim for existing importers. Implementation note: chunk_grids.py:689,704,711 operate on a numpy float array (not int tuples) and numcodecs/_codecs.py:21 is inside a docstring — both excluded.
M5 — os.path.exists/os.path.getsize in the otherwise pathlib-first _local.py (src/zarr/storage/_local.py:366,372). Use Path.exists()/Path.stat().st_size and drop import os.
M6 — Enable ruff PT011 (pytest.raises without match=) (pyproject.toml:348, already a TODO; 19 test files affected). Medium effort, tests only.
M8 — ceildiv zero-guard is ambiguous (src/zarr/core/common.py:88). if a == 0: return 0 only matters for ceildiv(0, 0); clarify intent or tighten to a divisor check.
Reviewed, no action proposed
@dataclass(slots=True) sweep: blocked by cached_property usage on codec classes and downstream-subclassing compat on public classes.
Removing from __future__ import annotations (60+ files): needs a per-file runtime-introspection audit; not worth the risk now.
typing_extensions.ReadOnly and generic TypedDict: correctly pinned to typing_extensions until Python 3.13 is the minimum / extra_items lands in stdlib.
Refuted as bugs after investigation: boundary-chunk is_complete_chunk semantics, chunk-grid boundary math, v2/v3 NaN fill-value round-trips, sync.py double-checked locking, LocalStore atomic writes, sharding index offset correction, BytesCodec endianness, MemoryStore.set_if_not_exists atomicity.
Not actioned: S13/S14/S21 (catalogued, medium-risk design passes).
🐛 Bugs (round 2)
B9 — Nested structured dtypes fail v2 JSON round-trip (src/zarr/core/dtype/common.py:63-84, high). check_structured_dtype_v2_inner recurses into itself for a nested field whose last element is a list of field pairs, so match_json raises ValueError: No Zarr data type found for data that Struct.to_json(zarr_format=2) itself just wrote. v3 is fine. Fix: recurse with the list-of-fields checker when the inner element is a non-string sequence.
B10 — Structured dtypes lose field offsets / padding / alignment on round-trip (src/zarr/core/dtype/npy/structured.py:186-216, high). from_native_dtype keeps only (name, dtype) pairs; to_native_dtype re-packs contiguously, so np.dtype([...], align=True) (itemsize 16) comes back as itemsize 9 and stored chunk bytes are misinterpreted. Fix: preserve offsets/itemsize, or loudly reject non-packed layouts.
B11 — _iter_members_deep drops use_consolidated_for_children on recursion (src/zarr/core/group.py:3343, high). Only immediate children get fresh metadata; grandchildren keep stale consolidated blobs, which consolidate_metadata then bakes into the new consolidation. Empirically confirmed. Implementation found two further holes in the same flag's plumbing: AsyncGroup._members short-circuited to consolidated metadata even when the flag was False, and sync Group.members never forwarded the flag at all — all three fixed in Fix consolidated-metadata staleness in deep iteration, require_array dtype check, attribute-update semantics, and sync timeout cancellation #190.
B12 — Group.update_attributes_async overwrites attributes while every other update path merges (src/zarr/core/group.py:1965 vs :1219, medium). Two "update" entry points on the same object disagree about whether existing keys survive. Fix: align on merge semantics.
B13 — require_array with default dtype=None spuriously rejects existing non-float64 arrays (src/zarr/core/group.py:1195-1201, medium). np.dtype(None) → float64, then np.can_cast(complex128, float64) fails. Fix: skip the dtype check when the caller didn't specify one.
B14 — Deprecation warning points users at a removed config key (src/zarr/storage/__init__.py:35-37, low). The default_compressor warning recommends array.v2_default_compressor.numeric, which is itself removed/deprecated. Fix: update the message.
B15 — sync() timeout leaves the coroutine running (src/zarr/core/sync.py:151-153, low). On TimeoutError the future is never cancelled, so I/O continues after the caller believes it aborted. Fix: cancel the future on the timeout branch.
B16 — CPU/GPU NDBuffer.create(fill_value=None) divergence (src/zarr/core/buffer/cpu.py:157 vs gpu.py:179, low/latent). CPU zero-fills on None; GPU returns uninitialized memory. Unreachable on current pipelines but a parity trap. Fix: make the backends agree.
B17 — UInt64.from_native_dtype is defined twice in the same class (src/zarr/core/dtype/npy/int.py:1100 and :1514, latent). Second definition silently shadows the first. Will be eliminated by the S11 refactor.
🚀 Performance (round 2)
P7 — _NumcodecsArrayBytesCodec._decode_single forces a full-chunk to_bytes() copy per chunk — refuted during implementation: numcodecs PCodec.decode/ZFPY.decode call ensure_bytes(), which passes bytes through unchanged but copies ndarray input — so to_bytes() is already the optimal form, and the proposed change measured slower (1.07µs vs 0.75µs per 16KB chunk). No action.
P8 — BytesCodec._decode_sync rebuilds the native dtype via dataclasses.replace(...) per chunk (src/zarr/codecs/bytes.py:89-93, medium). ~1µs/chunk, identical result for every chunk of a read. Fix: cache keyed on (dtype, endian).
P9 — zarr.open() with zarr_format=None probes array metadata, then re-probes the group on the same path (src/zarr/api/asynchronous.py:360-374, medium on high-latency stores). Opening a group via open() fetches zarr.json/.zattrs twice. Fix: single combined probe dispatching to array vs group; needs care to preserve v2/v3 precedence and warning semantics.
P10 — BasicIndexer.__iter__ builds 4 tuples per chunk in separate passes (src/zarr/core/indexing.py:611-619, low-medium; ~1.75µs/chunk, ~70ms over 40k chunks). Folded into the S20 dedup (single-pass shared helper).
🧹 Simplifications (round 2)
S11 — Lift int dtype boilerplate into BaseInt (src/zarr/core/dtype/npy/int.py:240-1551, ~450-550 lines). BaseFloat/BaseComplex already centralize from_native_dtype/to_native_dtype/_from_json_v2/_from_json_v3/to_json; the 8 int classes copy-paste all five. Real variation: item_size, HasEndianness (absent on Int8/UInt8), 2 Windows _check_native_dtype overrides. Also fixes B17 and a eeeeeeeeeeeeeeeee docstring typo (~line 1424).
S12 — Group.create's 120-line docstring is a verbatim copy of Group.create_array's (src/zarr/core/group.py:2380-2522). Replace with a one-line alias docstring (~115 lines, docs-only).
S13 — Trivial async _decode_single/_encode_single wrappers liftable into BaseCodec (7 codecs delegate identically, 3 differ only by asyncio.to_thread; ~130-160 lines). Catalogued — medium risk on the hot async path; not actioned this round.
S14 — Duplicated from_dict across 10 codecs (~40-50 lines + 10 type: ignores). Catalogued — needs a _codec_name ClassVar design pass; not actioned this round.
S15 — Three byte-identical _parse_*_codec functions (src/zarr/registry.py:197-254, ~40 lines). Collapse to one generic helper.
S16 — Seven copies of the value-must-be-Buffer check across stores (_memory.py, _local.py, _fsspec.py, _zip.py, ~25 lines). One Store._check_value helper on the ABC; keep message text identical.
S17 — _get_node_v2/_get_node_v3 + _read_group_metadata_v2/_v3 needless per-format splits (src/zarr/core/group.py:3402-3551, ~45 lines). Fold each pair into its dispatcher.
S19 — Dead code: _with_semaphore, collect_aiterator, scalar_failed_type_check_msg (sync.py:192-226, dtype/wrapper.py:280-292, ~35 lines). Grep-verified zero callers in src and tests.
S20 — Identical ChunkProjection.__iter__ bodies in Basic/Block indexers (src/zarr/core/indexing.py:611-619, :1119-1127). Extract one shared single-pass helper — implementing P10 at the same time.
S21 — Duplicated JSON-based __eq__/__hash__ in v2/v3 metadata (~20 lines). Catalogued — touches equality on a shared dataclass base; not actioned this round.
S22 — Double-parse: _parse_deprecated_compressor applied in both Group.create_array and AsyncGroup.create_array (group.py:2646 + :1131). Drop the sync-side pre-parse.
M10 — Enable ruff ASYNC rules; fix 2 real blocking calls in async methods (pyproject.toml + src/zarr/storage/_local.py:366,372). LocalStore.move/getsize call os.path.exists/os.path.getsize without asyncio.to_thread, unlike every other I/O in the file. (Supersedes the pathlib-only change at the same lines in the round-1 modernization PR.)
M11 — Un-ignore ruff TC006 — corrects round-1 M7: only 12 violations (in _cli/cli.py, _wrapper.py, testing/stateful.py, testing/utils.py), all auto-fixable, not 93. TC006 wants casts quoted, the opposite of what round 1 assumed.
M12 — Apply ruff PT011 (add match= to bare pytest.raises(ValueError)) — corrects round-1 M6: 15 violations in 2 files (tests/test_store/test_core.py, tests/test_store/test_zip.py), not 19 files.
Reviewed, no action proposed (round 2)
Indexing deep-dive came back clean: orthogonal/coordinate/mask/block read+write paths, duplicate/unsorted indices, boundary chunks, resize/append, all_equal NaN/signed-zero handling (the -0.0 != 0.0 bit-pattern branch is intentional, PR Write chunks with negative zero values and a zero fill value zarr-developers/zarr-python#3216) — all verified against numpy empirically.
Byte-range semantics verified consistent across FsspecStore and ObjectStore (exclusive ends match zarr's RangeByteRequest); obstore delete_dir segment matching correct.
sync/async API twins: no default drift, no missing return wrapping (verified via inspect).
Float/datetime/complex/vlen fill-value round-trips all correct, including NaN payloads (v3), signed zeros, subnormals, NaT sentinels, embedded null bytes.
ruff FURB/PERF: already enabled and clean; GitHub Actions: current majors, SHA-pinned; numpy scalar aliases: none in use.
This issue collects the results of a Claude-assisted review of
src/zarr/(at 86dabd5) covering four dimensions: bugs, performance, simplifications, and modernizations. Each finding was verified against the code by the reviewing agent before inclusion. Draft PRs addressing groups of these findings reference this issue.Draft PRs
Not actioned in a PR: S7 (medium-effort overload design), S9 (blocked on
create()deprecation), M6/M7 (ruff rule enablement — judgement calls), M8 (needs a decision on intent), and P6 (refuted, see below).🐛 Bugs
Storage & codec pipeline
SuffixByteRequestwhen suffix exceeds object size (src/zarr/storage/_utils.py:155, medium)._normalize_byte_range_indexcomputesstart = len(data) - suffixwith no lower clamp; a negative start is interpreted by numpy slicing as an offset from the end, silently returning too few bytes. For a 5-byte value,suffix=7returns 2 bytes whilesuffix=10returns all 5 — non-monotonic, and inconsistent withZipStore/LocalStorewhich clamp withmax(0, ...). Fix: clampstart = max(0, len(data) - suffix).LoggingStore.get_partial_valuesexhausts a generator argument before forwarding it (src/zarr/storage/_logging.py:182, medium). The log line",".join([k[0] for k in key_ranges])fully consumes a one-shot iterator, so the wrapped store receives an empty iterable and returns[]. Fix: materializekey_ranges = list(key_ranges)once.getsize_prefixover-counts on MemoryStore/ObjectStore due to substring prefix matching (src/zarr/abc/store.py:539+src/zarr/storage/_memory.py:202, medium).Store.getsize_prefixcallslist_prefix(prefix)without a trailing/, and the substring-matching stores then also count sibling nodes likefoobar/...when asked aboutfoo. AffectsArray.nbytes_stored. Fix: normalize the prefix with a trailing/ingetsize_prefix(matchingdelete_dir/is_empty/clear).ZipStore.close()raisesAttributeErrorif the store was never opened (src/zarr/storage/_zip.py:121, low).self._lockis only created in_sync_open(), butclose()unconditionally takes it, sowith ZipStore(path): passraises. Fix: guardclose()onself._is_openor create the lock in__init__.codecs_from_listbuilds an error message for invalid BytesBytes-after-ArrayArray ordering but never raises it (src/zarr/core/codec_pipeline.py:675, low). Sibling branches raiseTypeError(msg); this branch assignsmsgand falls through, silently accepting a spec-invalid codec order. Fix: add the missingraise TypeError(msg).Core
DefaultChunkKeyEncoding.decode_chunk_keydoes not round-trip withencode_chunk_key(src/zarr/core/chunk_key_encodings.py:79, low).encode_chunk_key((0, 1))→"c/0/1", butdecode_chunk_keystrips only the"c"character, leaving a leading separator, soint('')raisesValueErrorfor every non-empty coordinate key. Fix: drop the"c"token after splitting instead of slicing one character.src/zarr/core/array.py:6011,src/zarr/core/group.py:1219, low)._update_attributesmutatesmetadata.attributesin place even though the metadata classes are frozen and hash byto_dict(), so an instance's hash can change after construction. Fix: build a new attributes dict and usedataclasses.replace, asupdate_shapealready does.Attributes.putclears live metadata before persisting (non-atomic despite docstring) (src/zarr/core/attributes.py:53, low).putempties the in-memory attributes dict before the store write; ifsave_metadataraises, memory and store disagree. Fix: compute the replacement dict first and apply it with a replace-based update. (Shares root cause with B7.)🚀 Performance
ShardingCodecrebuilds codec pipelines on every shard access (src/zarr/codecs/sharding.py:367, high impact).codec_pipelineis a plain@propertycallingget_pipeline_class().from_codecs(...)per shard, and_shard_index_size/_decode_shard_index/_encode_shard_indexeach rebuild a fresh index pipeline per shard. Fix: cache on the frozen codec instance (cached_property/instance-keyedlru_cache).default_buffer_prototype()reconstructed per inner chunk inside shards (src/zarr/codecs/sharding.py:105, medium). A debugassert prototype == default_buffer_prototype()runs config + registry lookups for every inner-chunk fetch on the hot read path. Fix: hoist or drop the assertion.src/zarr/codecs/sharding.py:429,561, medium). Each shard decode/encode reconstructsChunkGrid.from_sizes, aBasicIndexer, and its materialized projection list, though they're identical for every shard. Fix: memoize keyed by(shard_shape, chunk_shape).config.get("async.concurrency")lookups on the per-chunk codec path (src/zarr/core/codec_pipeline.py:398,508,574,607,626,src/zarr/abc/codec.py:252,289,496, medium-low). With the defaultbatch_size=1, each chunk incurs ~4 donfig dict traversals that cannot change mid-operation. Fix: read once perread/writecall and thread through.src/zarr/core/group.py:3322, medium)._iter_members_deepdrains each subgroup iterator sequentially, so latency does not overlap across branches on high-latency stores. Fix: fan out sibling iterators concurrently under the existing semaphore.P6 —— refuted during implementation:StorePath.__truediv__re-normalizes already-normalized paths per chunk__truediv__calls_join_paths(a plain"/".join), notnormalize_path; the join is already cheap. No action.🧹 Simplifications
_create_v2/_create_v3/_create_array_metadata(src/zarr/core/array.py:579,660,4404). Three copies of theoverwrite/supports_deletesdance; extract one helper._get_orthogonal_selection/_get_mask_selection/_get_coordinate_selectionare single-caller pass-through layers (src/zarr/core/array.py:5526-5695). ~110 net lines removable by inlining indexer construction into theAsyncArray.get_*methods (the syncArrayalready bypasses them).zarr_format == 2 ? metadata.dtype : metadata.data_typeinline dispatch (src/zarr/core/array.py:5393,5739,968). Add a unifyingzdtypeproperty on both metadata classes.Array.cdata_shapeandArray._chunk_grid_shapeduplicate the same delegation (src/zarr/core/array.py:2162-2182).open_groupandcreate(src/zarr/api/asynchronous.py:816,1013). Compress to one helper iterating{name: value}.contains_array/contains_groupduplicate_contains_node_v3(src/zarr/storage/_common.py:591-661). Rewrite as(await _contains_node_v3(sp)) == "array" / "group". Implementation note: onlycontains_groupwas rewritten —contains_arrayraisesValueError(rather than returningFalse) when the node is a group, so the rewrite would change observable behavior.parse_zarr_formatdefined three times (metadata/v2.py:280,metadata/v3.py:49,group.py:86). Unify with overloads for literal narrowing (medium effort).default_filters_v3is a trivially inlinable no-op (src/zarr/core/array.py:4836). Body isreturn ()with one caller.AsyncArray._create/_create_v2/_create_v3trio (~140 lines) exists solely for the deprecatedcreate()path (src/zarr/core/array.py:368-696). Removable when the legacy API is removed — tracked here, not actionable yet.*_likefunctions re-checkisinstanceafter_like_argsalready did (src/zarr/api/asynchronous.py:139,1091-1313). Foldfill_valueextraction into_like_args.✨ Modernizations (Python ≥ 3.12)
batched()duplicatesitertools.batched(src/zarr/core/codec_pipeline.py:45-50). Delete and use the stdlib version (3.12+).typing_extensions.Buffer→collections.abc.Buffer(src/zarr/codecs/crc32c_.py:8,45,69). Stdlib since 3.12.P = ParamSpec("P")(src/zarr/core/sync.py:22). Defined, never referenced; delete.product()/reduce(operator.mul, ...)/np.prod()on int tuples →math.prod()(src/zarr/core/common.py:85,src/zarr/core/indexing.py:1190,src/zarr/core/array.py:908,src/zarr/core/chunk_grids.py:855).math.prodavoids numpy scalar round-trips; keepproduct = math.prodshim for existing importers. Implementation note:chunk_grids.py:689,704,711operate on a numpy float array (not int tuples) andnumcodecs/_codecs.py:21is inside a docstring — both excluded.os.path.exists/os.path.getsizein the otherwise pathlib-first_local.py(src/zarr/storage/_local.py:366,372). UsePath.exists()/Path.stat().st_sizeand dropimport os.pytest.raiseswithoutmatch=) (pyproject.toml:348, already a TODO; 19 test files affected). Medium effort, tests only.cast()literals, 93 auto-fixable occurrences). Cosmetic; large mechanical diff — judgement call.ceildivzero-guard is ambiguous (src/zarr/core/common.py:88).if a == 0: return 0only matters forceildiv(0, 0); clarify intent or tighten to a divisor check.Reviewed, no action proposed
@dataclass(slots=True)sweep: blocked bycached_propertyusage on codec classes and downstream-subclassing compat on public classes.from __future__ import annotations(60+ files): needs a per-file runtime-introspection audit; not worth the risk now.typing_extensions.ReadOnlyand genericTypedDict: correctly pinned to typing_extensions until Python 3.13 is the minimum /extra_itemslands in stdlib.is_complete_chunksemantics, chunk-grid boundary math, v2/v3 NaN fill-value round-trips,sync.pydouble-checked locking,LocalStoreatomic writes, sharding index offset correction,BytesCodecendianness,MemoryStore.set_if_not_existsatomicity.🤖 Generated with Claude Code
Round 2 (deeper sweep: dtype subsystem, indexing, group internals, fsspec/obstore, sync API)
Draft PRs (round 2)
zarr.open())Not actioned: S13/S14/S21 (catalogued, medium-risk design passes).
🐛 Bugs (round 2)
src/zarr/core/dtype/common.py:63-84, high).check_structured_dtype_v2_innerrecurses into itself for a nested field whose last element is a list of field pairs, somatch_jsonraisesValueError: No Zarr data type foundfor data thatStruct.to_json(zarr_format=2)itself just wrote. v3 is fine. Fix: recurse with the list-of-fields checker when the inner element is a non-string sequence.src/zarr/core/dtype/npy/structured.py:186-216, high).from_native_dtypekeeps only(name, dtype)pairs;to_native_dtypere-packs contiguously, sonp.dtype([...], align=True)(itemsize 16) comes back as itemsize 9 and stored chunk bytes are misinterpreted. Fix: preserve offsets/itemsize, or loudly reject non-packed layouts._iter_members_deepdropsuse_consolidated_for_childrenon recursion (src/zarr/core/group.py:3343, high). Only immediate children get fresh metadata; grandchildren keep stale consolidated blobs, whichconsolidate_metadatathen bakes into the new consolidation. Empirically confirmed. Implementation found two further holes in the same flag's plumbing:AsyncGroup._membersshort-circuited to consolidated metadata even when the flag was False, and syncGroup.membersnever forwarded the flag at all — all three fixed in Fix consolidated-metadata staleness in deep iteration, require_array dtype check, attribute-update semantics, and sync timeout cancellation #190.Group.update_attributes_asyncoverwrites attributes while every other update path merges (src/zarr/core/group.py:1965vs:1219, medium). Two "update" entry points on the same object disagree about whether existing keys survive. Fix: align on merge semantics.require_arraywith defaultdtype=Nonespuriously rejects existing non-float64 arrays (src/zarr/core/group.py:1195-1201, medium).np.dtype(None)→ float64, thennp.can_cast(complex128, float64)fails. Fix: skip the dtype check when the caller didn't specify one.src/zarr/storage/__init__.py:35-37, low). Thedefault_compressorwarning recommendsarray.v2_default_compressor.numeric, which is itself removed/deprecated. Fix: update the message.sync()timeout leaves the coroutine running (src/zarr/core/sync.py:151-153, low). OnTimeoutErrorthe future is never cancelled, so I/O continues after the caller believes it aborted. Fix: cancel the future on the timeout branch.NDBuffer.create(fill_value=None)divergence (src/zarr/core/buffer/cpu.py:157vsgpu.py:179, low/latent). CPU zero-fills onNone; GPU returns uninitialized memory. Unreachable on current pipelines but a parity trap. Fix: make the backends agree.UInt64.from_native_dtypeis defined twice in the same class (src/zarr/core/dtype/npy/int.py:1100and:1514, latent). Second definition silently shadows the first. Will be eliminated by the S11 refactor.🚀 Performance (round 2)
P7 —— refuted during implementation: numcodecs_NumcodecsArrayBytesCodec._decode_singleforces a full-chunkto_bytes()copy per chunkPCodec.decode/ZFPY.decodecallensure_bytes(), which passesbytesthrough unchanged but copies ndarray input — soto_bytes()is already the optimal form, and the proposed change measured slower (1.07µs vs 0.75µs per 16KB chunk). No action.BytesCodec._decode_syncrebuilds the native dtype viadataclasses.replace(...)per chunk (src/zarr/codecs/bytes.py:89-93, medium). ~1µs/chunk, identical result for every chunk of a read. Fix: cache keyed on(dtype, endian).zarr.open()withzarr_format=Noneprobes array metadata, then re-probes the group on the same path (src/zarr/api/asynchronous.py:360-374, medium on high-latency stores). Opening a group viaopen()fetcheszarr.json/.zattrstwice. Fix: single combined probe dispatching to array vs group; needs care to preserve v2/v3 precedence and warning semantics.BasicIndexer.__iter__builds 4 tuples per chunk in separate passes (src/zarr/core/indexing.py:611-619, low-medium; ~1.75µs/chunk, ~70ms over 40k chunks). Folded into the S20 dedup (single-pass shared helper).🧹 Simplifications (round 2)
BaseInt(src/zarr/core/dtype/npy/int.py:240-1551, ~450-550 lines).BaseFloat/BaseComplexalready centralizefrom_native_dtype/to_native_dtype/_from_json_v2/_from_json_v3/to_json; the 8 int classes copy-paste all five. Real variation:item_size,HasEndianness(absent on Int8/UInt8), 2 Windows_check_native_dtypeoverrides. Also fixes B17 and aeeeeeeeeeeeeeeeeedocstring typo (~line 1424).Group.create's 120-line docstring is a verbatim copy ofGroup.create_array's (src/zarr/core/group.py:2380-2522). Replace with a one-line alias docstring (~115 lines, docs-only)._decode_single/_encode_singlewrappers liftable intoBaseCodec(7 codecs delegate identically, 3 differ only byasyncio.to_thread; ~130-160 lines). Catalogued — medium risk on the hot async path; not actioned this round.from_dictacross 10 codecs (~40-50 lines + 10type: ignores). Catalogued — needs a_codec_nameClassVar design pass; not actioned this round._parse_*_codecfunctions (src/zarr/registry.py:197-254, ~40 lines). Collapse to one generic helper._memory.py,_local.py,_fsspec.py,_zip.py, ~25 lines). OneStore._check_valuehelper on the ABC; keep message text identical._get_node_v2/_get_node_v3+_read_group_metadata_v2/_v3needless per-format splits (src/zarr/core/group.py:3402-3551, ~45 lines). Fold each pair into its dispatcher.src/zarr/core/indexing.py:1337-1608, ~30 lines). Shared classifier helper + message constant._with_semaphore,collect_aiterator,scalar_failed_type_check_msg(sync.py:192-226,dtype/wrapper.py:280-292, ~35 lines). Grep-verified zero callers in src and tests.ChunkProjection.__iter__bodies in Basic/Block indexers (src/zarr/core/indexing.py:611-619,:1119-1127). Extract one shared single-pass helper — implementing P10 at the same time.__eq__/__hash__in v2/v3 metadata (~20 lines). Catalogued — touches equality on a shared dataclass base; not actioned this round._parse_deprecated_compressorapplied in bothGroup.create_arrayandAsyncGroup.create_array(group.py:2646+:1131). Drop the sync-side pre-parse.✨ Modernizations (round 2)
asyncio.ensure_future→asyncio.create_task(src/zarr/core/common.py:110). Soft-deprecated since 3.10; drop-in here.ASYNCrules; fix 2 real blocking calls in async methods (pyproject.toml+src/zarr/storage/_local.py:366,372).LocalStore.move/getsizecallos.path.exists/os.path.getsizewithoutasyncio.to_thread, unlike every other I/O in the file. (Supersedes the pathlib-only change at the same lines in the round-1 modernization PR.)_cli/cli.py,_wrapper.py,testing/stateful.py,testing/utils.py), all auto-fixable, not 93. TC006 wants casts quoted, the opposite of what round 1 assumed.match=to barepytest.raises(ValueError)) — corrects round-1 M6: 15 violations in 2 files (tests/test_store/test_core.py,tests/test_store/test_zip.py), not 19 files.Reviewed, no action proposed (round 2)
all_equalNaN/signed-zero handling (the-0.0 != 0.0bit-pattern branch is intentional, PR Write chunks with negative zero values and a zero fill value zarr-developers/zarr-python#3216) — all verified against numpy empirically.RangeByteRequest); obstoredelete_dirsegment matching correct.inspect).registry.get_codec_classlazy-load ordering,_NumcodecsCodecthread offloading (already offloaded),get_array_metadataconcurrent v2 probes, metadata__eq__cost (cold path), dtype registry O(n) matching (per-open, not per-chunk) — all fine.