Commit f2ec718

Remove dead code
1 parent 60ad5cb commit f2ec718

4 files changed, +67 -673 lines changed

docs/design/chunk-grid.md

Lines changed: 13 additions & 14 deletions
@@ -44,7 +44,7 @@ Prior iterations on the chunk grid design were based on the Zarr V3 spec's defin

 1. **A chunk grid is a concrete arrangement of chunks.** Not an abstract tiling pattern. This means that the chunk grid is bound to specific array dimensions, which enables the chunk grid to answer any question about any chunk (offset, size, count) without external parameters.
 2. **One implementation, multiple serialization forms.** A single `ChunkGrid` class handles all chunking logic. The serialization format (`"regular"` vs `"rectilinear"`) is chosen by the metadata layer, not the grid.
-3. **No chunk grid registry.** Simple name-based dispatch in `parse_chunk_grid()`.
+3. **No chunk grid registry.** Simple name-based dispatch in the metadata layer's `parse_chunk_grid()`.
 4. **Fixed vs Varying per dimension.** `FixedDimension(size, extent)` for uniform chunks; `VaryingDimension(edges, extent)` for per-chunk edge lengths with precomputed prefix sums. Avoids expanding regular dimensions into lists of identical values.
 5. **Transparent transitions.** Operations like `resize()` can move an array from regular to rectilinear chunking.
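Design point 4 mentions precomputed prefix sums for `VaryingDimension`. The sketch below illustrates why they help: once the cumulative end offsets are stored, finding the chunk that holds an array index is a binary search. The names `prefix_sums` and `locate_chunk` are hypothetical and exist only for this illustration; they are not taken from the zarr code in this commit.

```python
from bisect import bisect_right
from itertools import accumulate


def prefix_sums(edges):
    """Cumulative end offsets of each chunk, e.g. [10, 20, 30] -> [10, 30, 60]."""
    return list(accumulate(edges))


def locate_chunk(cumulative, index):
    """Return (chunk_index, offset_within_chunk) for an array index.

    Hypothetical helper for illustration only.
    """
    if not 0 <= index < cumulative[-1]:
        raise IndexError(f"index {index} is outside the span covered by the edges")
    # The first cumulative offset strictly greater than `index` marks its chunk.
    chunk = bisect_right(cumulative, index)
    start = cumulative[chunk - 1] if chunk else 0
    return chunk, index - start


cumulative = prefix_sums([10, 20, 30])
print(locate_chunk(cumulative, 0))   # (0, 0)
print(locate_chunk(cumulative, 10))  # (1, 0)
print(locate_chunk(cumulative, 59))  # (2, 29)
```

A uniform dimension needs none of this, since the chunk index is just `index // size`, which is one reason to keep fixed and varying dimensions as separate types.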

@@ -274,15 +274,15 @@ When `extent < sum(edges)`, the dimension is always stored as `VaryingDimension`
 {"name": "rectilinear", "configuration": {"kind": "inline", "chunk_shapes": [[10, 20, 30], [[25, 4]]]}}
 ```

-Both names deserialize to the same `ChunkGrid` class. The serialized form does not include the array extent — that comes from `shape` in array metadata and is passed to `parse_chunk_grid()` at construction time.
+Both names deserialize to the same `ChunkGrid` class. The serialized form does not include the array extent — that comes from `shape` in array metadata and is combined with the chunk grid when constructing a behavioral `ChunkGrid` via `ChunkGrid.from_metadata()`.

-**The `ChunkGrid` does not serialize itself.** The format choice (`"regular"` vs `"rectilinear"`) belongs to `ArrayV3Metadata`. The name is inferred from the chunk grid metadata DTO type (`RegularChunkGrid` `"regular"`, `RectilinearChunkGrid` `"rectilinear"`) or from `grid.is_regular` when a behavioral `ChunkGrid` is passed directly.
+**The `ChunkGrid` does not serialize itself.** The format choice (`"regular"` vs `"rectilinear"`) belongs to `ArrayV3Metadata`. Serialization and deserialization are handled by the metadata-layer chunk grid classes (`RegularChunkGrid` and `RectilinearChunkGrid` in `metadata/v3.py`), which provide `to_dict()` and `from_dict()` methods.

 For `create_array`, the format is inferred from the `chunks` argument: a flat tuple produces `"regular"`, a nested list produces `"rectilinear"`. The `_is_rectilinear_chunks()` helper detects nested sequences like `[[10, 20], [5, 5]]`.

 ##### Rectilinear spec compliance

-The rectilinear format requires `"kind": "inline"` (validated by `_validate_rectilinear_kind()`). Per the spec, each element of `chunk_shapes` can be:
+The rectilinear format requires `"kind": "inline"` (validated by `validate_rectilinear_kind()`). Per the spec, each element of `chunk_shapes` can be:

 - A bare integer `m`: repeated until `sum >= array_extent`
 - A list of bare integers: explicit per-chunk sizes
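The bare-integer rule above amounts to a ceiling division. A minimal sketch, assuming a hypothetical helper named `expand_bare_integer` (the removed `_decode_dim_spec()` further down in this diff did the equivalent with `ceildiv`):

```python
def expand_bare_integer(edge, array_extent):
    """Repeat `edge` until the running sum is >= `array_extent`.

    Hypothetical helper for illustration only.
    """
    if edge <= 0:
        raise ValueError(f"chunk edge length must be > 0, got {edge}")
    n_chunks = -(-array_extent // edge)  # ceiling division without math.ceil
    return [edge] * n_chunks


print(expand_bare_integer(10, 25))  # [10, 10, 10]
```

The last chunk keeps its full edge length of 10 even though only 5 elements of the array fall inside it, so the expanded edges sum to 30 rather than 25.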
@@ -291,13 +291,13 @@ The rectilinear format requires `"kind": "inline"` (validated by `_validate_rect
 RLE compression is used when serializing: runs of identical sizes become `[value, count]` pairs, singletons stay as bare integers.

 ```python
-# _compress_rle([10, 10, 10, 5]) -> [[10, 3], 5]
-# _expand_rle([[10, 3], 5]) -> [10, 10, 10, 5]
+# compress_rle([10, 10, 10, 5]) -> [[10, 3], 5]
+# expand_rle([[10, 3], 5]) -> [10, 10, 10, 5]
 ```
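The two comment lines above describe a round trip. The following is a minimal sketch of that behaviour, not the `compress_rle`/`expand_rle` implementations in `zarr.core.common`; the `_sketch` suffix marks both functions as hypothetical stand-ins that skip input validation.

```python
from itertools import groupby


def compress_rle_sketch(sizes):
    """Collapse runs of equal sizes into [value, count]; keep singletons bare."""
    out = []
    for value, run in groupby(sizes):
        count = sum(1 for _ in run)
        out.append([value, count] if count > 1 else value)
    return out


def expand_rle_sketch(spec):
    """Inverse of compress_rle_sketch: expand [value, count] pairs in place."""
    out = []
    for item in spec:
        if isinstance(item, list):
            value, count = item
            out.extend([value] * count)
        else:
            out.append(item)
    return out


print(compress_rle_sketch([10, 10, 10, 5]))  # [[10, 3], 5]
print(expand_rle_sketch([[10, 3], 5]))       # [10, 10, 10, 5]
```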

-For `FixedDimension` serialized as rectilinear, `_serialize_fixed_dim()` returns the bare integer `dim.size`. Per the rectilinear spec, a bare integer is repeated until the sum >= extent, preserving the full codec buffer size for boundary chunks.
+For a single-element `chunk_shapes` tuple like `(10,)`, `RectilinearChunkGrid.to_dict()` serializes it as a bare integer `10`. Per the rectilinear spec, a bare integer is repeated until the sum >= extent, preserving the full codec buffer size for boundary chunks.

-**Zero-extent handling:** Regular grids serialize zero-extent dimensions without issue (the format encodes only `chunk_shape`, no edges). Rectilinear grids reject zero-extent dimensions because the spec requires at least one positive-integer edge length per axis. This asymmetry is intentional and spec-compliant — documented in `serialize_chunk_grid()`.
+**Zero-extent handling:** Regular grids serialize zero-extent dimensions without issue (the format encodes only `chunk_shape`, no edges). Rectilinear grids cannot represent zero-extent dimensions because the spec requires at least one positive-integer edge length per axis.

 #### read_chunk_sizes / write_chunk_sizes

@@ -418,7 +418,7 @@ Level 3 — Shard index: ceil(shard_dim / subchunk_dim) entries per dimension

 The chunk grid is a concrete arrangement, not an abstract tiling pattern. A finite collection naturally has an extent. Storing it enables `__getitem__`, eliminates `dim_len` parameters from every method, and makes the grid self-describing.

-This does *not* mean `ArrayV3Metadata.shape` should delegate to the grid. The array shape remains an independent field in metadata. The extent is passed into the grid at construction time so it can answer boundary questions without external parameters. It is **not** serialized as part of the chunk grid JSON — it comes from the `shape` field in array metadata and is passed to `parse_chunk_grid()`.
+This does *not* mean `ArrayV3Metadata.shape` should delegate to the grid. The array shape remains an independent field in metadata. The extent is passed into the grid at construction time so it can answer boundary questions without external parameters. It is **not** serialized as part of the chunk grid JSON — it comes from the `shape` field in array metadata and is combined with the chunk grid configuration in `ChunkGrid.from_metadata()`.

 ### Why distinguish chunk_size from data_size?

@@ -466,7 +466,7 @@ The resolution:

 @d-v-b raised in #3534 that users need a way to say "these chunks are regular, but serialize as rectilinear" (e.g., to allow future append/extend workflows without format changes). @jhamman initially made nested-list input always produce `RectilinearChunkGrid`.

-The current branch resolves this via `_infer_chunk_grid_name()`, which extracts or infers the serialization name from the chunk grid input. When metadata is deserialized, the original name (from `{"name": "regular"}` or `{"name": "rectilinear"}`) flows through to `serialize_chunk_grid()` at write time. When a `ChunkGrid` is passed directly, the name is inferred from `grid.is_regular`. Current inference behavior:
+The current branch resolves this via the metadata-layer chunk grid classes. When metadata is deserialized, the original name (from `{"name": "regular"}` or `{"name": "rectilinear"}`) determines which metadata class is instantiated (`RegularChunkGrid` or `RectilinearChunkGrid`), and that class handles serialization via `to_dict()`. Current inference behavior for `create_array`:
 - `chunks=(10, 20)` (flat tuple) → infers `"regular"`
 - `chunks=[[10, 20], [5, 5]]` (nested lists with varying sizes) → infers `"rectilinear"`
 - `chunks=[[10, 10], [20, 20]]` (nested lists with uniform sizes) → `from_rectilinear` collapses to `FixedDimension`, so `is_regular=True` and infers `"regular"`
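The three inference cases listed above can be mimicked with a small stand-alone function. This is only a sketch of the described outcomes: `infer_grid_name` is a hypothetical name, and the uniformity check is a simplification of whatever `from_rectilinear` actually does when it collapses a dimension to `FixedDimension`.

```python
def infer_grid_name(chunks):
    """Guess the serialization name for a `chunks` argument.

    Hypothetical helper for illustration only.
    """
    nested = len(chunks) > 0 and all(isinstance(c, (list, tuple)) for c in chunks)
    if not nested:
        # Flat input such as (10, 20): one chunk size per dimension.
        return "regular"
    # Nested input: regular only if every dimension repeats a single edge length.
    uniform = all(len(set(dim)) == 1 for dim in chunks)
    return "regular" if uniform else "rectilinear"


print(infer_grid_name((10, 20)))              # regular
print(infer_grid_name([[10, 20], [5, 5]]))    # rectilinear
print(infer_grid_name([[10, 10], [20, 20]]))  # regular
```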
@@ -498,7 +498,7 @@ The current implementation partially realizes this separation:

 This means `ArrayV3Metadata.chunk_grid` is now a `ChunkGridMetadata` (the DTO union type), **not** the behavioral `ChunkGrid`. Code that previously accessed behavioral methods on `metadata.chunk_grid` (e.g., `all_chunk_coords()`, `__getitem__`) must now use the behavioral grid from the array layer instead.

-The name controls serialization format; `serialize_chunk_grid()` is called by `ArrayV3Metadata.to_dict()`. The behavioral grid handles all runtime queries.
+The name controls serialization format; each metadata DTO class provides its own `to_dict()` method for serialization. The behavioral grid handles all runtime queries.

 ## Prior art

@@ -569,10 +569,9 @@ If the design is accepted, the POC branch can be split into 5 incremental PRs. P
 - Zero changes to existing code

 **PR 2: Unified ChunkGrid class + serialization** (replaces hierarchy)
-- `ChunkGrid` with `from_regular`, `from_rectilinear`, `__getitem__`, `__iter__`, `all_chunk_coords`, `is_regular`, `chunk_shape`, `chunk_sizes`, `unique_edge_lengths`
-- `parse_chunk_grid()`, `serialize_chunk_grid()`, `_infer_chunk_grid_name()`
+- `ChunkGrid` with `from_regular`, `from_rectilinear`, `from_metadata`, `__getitem__`, `__iter__`, `all_chunk_coords`, `is_regular`, `chunk_shape`, `chunk_sizes`, `unique_edge_lengths`
 - `RegularChunkGrid` deprecation shim
-- `_infer_chunk_grid_name()` for serialization format inference
+- Metadata-layer serialization via `RegularChunkGrid.to_dict()`/`RectilinearChunkGrid.to_dict()`
 - Feature flag (`array.rectilinear_chunks`)

 **PR 3: Indexing generalization**

src/zarr/core/chunk_grids.py

Lines changed: 3 additions & 183 deletions
@@ -6,7 +6,6 @@
 import numbers
 import operator
 import warnings
-from collections.abc import Iterable, Sequence
 from dataclasses import dataclass, field
 from functools import reduce
 from typing import TYPE_CHECKING, Any, Literal, Protocol, TypeGuard, cast, runtime_checkable
@@ -16,21 +15,14 @@

 import zarr
 from zarr.core.common import (
-    JSON,
-    NamedConfig,
     ShapeLike,
     ceildiv,
-    compress_rle,
-    expand_rle,
-    parse_named_configuration,
     parse_shapelike,
-    validate_rectilinear_edges,
-    validate_rectilinear_kind,
 )
 from zarr.errors import ZarrUserWarning

 if TYPE_CHECKING:
-    from collections.abc import Iterator
+    from collections.abc import Iterable, Iterator, Sequence

     from zarr.core.array import ShardsLike
     from zarr.core.metadata import ArrayMetadata
@@ -107,7 +99,7 @@ def with_extent(self, new_extent: int) -> FixedDimension:
         """Re-bind to *new_extent* without modifying edges.

         Used when constructing a grid from existing metadata where edges
-        are already correct (e.g. ``parse_chunk_grid``). Raises on
+        are already correct. Raises on
         ``VaryingDimension`` if edges don't cover the new extent.
         """
         return FixedDimension(size=self.size, extent=new_extent)
@@ -203,7 +195,7 @@ def with_extent(self, new_extent: int) -> VaryingDimension:
         """Re-bind to *new_extent* without modifying edges.

         Used when constructing a grid from existing metadata where edges
-        are already correct (e.g. ``parse_chunk_grid``). Raises if the
+        are already correct. Raises if the
         existing edges don't cover *new_extent*.
         """
         edge_sum = self.cumulative[-1]
@@ -275,66 +267,6 @@ def is_boundary(self) -> bool:
         return self.shape != self.codec_shape


-# A single dimension's rectilinear chunk spec: bare int (uniform shorthand),
-# list of ints (explicit edges), or mixed RLE (e.g. [[10, 3], 5]).
-RectilinearDimSpec = int | list[int | list[int]]
-
-# The serialization format name for a chunk grid.
-ChunkGridName = Literal["regular", "rectilinear"]
-
-
-def _serialize_fixed_dim(dim: FixedDimension) -> RectilinearDimSpec:
-    """Compact rectilinear representation for a fixed-size dimension.
-
-    Per the rectilinear spec, a bare integer is repeated until the sum
-    >= extent. This preserves the full codec buffer size for boundary
-    chunks, matching the regular grid spec ("chunks at the border always
-    have the full chunk size").
-    """
-    return dim.size
-
-
-def _serialize_varying_dim(dim: VaryingDimension) -> RectilinearDimSpec:
-    """RLE-compressed rectilinear representation for a varying dimension."""
-    edges = list(dim.edges)
-    rle = compress_rle(edges)
-    if len(rle) < len(edges):
-        return rle
-    # mypy: list[int] is invariant, so it won't widen to list[int | list[int]]
-    return cast("RectilinearDimSpec", edges)
-
-
-def _decode_dim_spec(dim_spec: JSON, array_extent: int | None = None) -> list[int]:
-    """Decode a single dimension's chunk edge specification per the rectilinear spec.
-
-    Per the spec, each element of ``chunk_shapes`` can be:
-    - a bare integer ``m``: repeat ``m`` until the sum >= array extent
-    - an array of bare integers and/or ``[value, count]`` RLE pairs
-
-    Parameters
-    ----------
-    dim_spec
-        The raw JSON value for one dimension's chunk edges.
-    array_extent
-        Array length along this dimension. Required when *dim_spec* is a bare
-        integer (to know how many repetitions).
-    """
-    if isinstance(dim_spec, int):
-        if array_extent is None:
-            raise ValueError("Integer chunk_shapes shorthand requires array shape to expand.")
-        if dim_spec <= 0:
-            raise ValueError(f"Integer chunk edge length must be > 0, got {dim_spec}")
-        n = ceildiv(array_extent, dim_spec)
-        return [dim_spec] * n
-    if isinstance(dim_spec, list):
-        has_sublists = any(isinstance(e, list) for e in dim_spec)
-        if has_sublists:
-            return expand_rle(dim_spec)
-        else:
-            return [int(e) for e in dim_spec]
-    raise ValueError(f"Invalid chunk_shapes entry: {dim_spec}")
-
-
 def _is_rectilinear_chunks(chunks: Any) -> TypeGuard[Sequence[Sequence[int]]]:
     """Check if chunks is a nested sequence (e.g. [[10, 20], [5, 5]]).

@@ -628,118 +560,6 @@ def update_shape(self, new_shape: tuple[int, ...]) -> ChunkGrid:
         )
         return ChunkGrid(dimensions=dims)

-    # ChunkGrid does not serialize itself. The format choice ("regular" vs
-    # "rectilinear") belongs to the metadata layer. Use serialize_chunk_grid()
-    # for output and parse_chunk_grid() for input.
-
-
-def parse_chunk_grid(
-    data: dict[str, JSON] | ChunkGrid | NamedConfig[str, Any],
-    array_shape: tuple[int, ...],
-) -> ChunkGrid:
-    """Create a ChunkGrid from a metadata dict or existing grid, binding array shape.
-
-    This is the primary entry point for constructing a ChunkGrid from serialized
-    metadata. It always produces a grid with correct extent values.
-
-    Both ``"regular"`` and ``"rectilinear"`` grid names are supported. Rectilinear
-    grids are experimental and require the ``array.rectilinear_chunks`` config
-    option to be enabled; a ``ValueError`` is raised otherwise.
-    """
-    if isinstance(data, ChunkGrid):
-        # Re-bind extent if array_shape differs from what's stored
-        dims = tuple(
-            dim.with_extent(extent)
-            for dim, extent in zip(data.dimensions, array_shape, strict=True)
-        )
-        return ChunkGrid(dimensions=dims)
-
-    name_parsed, configuration_parsed = parse_named_configuration(data)
-
-    if name_parsed == "regular":
-        chunk_shape_raw = configuration_parsed.get("chunk_shape")
-        if chunk_shape_raw is None:
-            raise ValueError("Regular chunk grid requires 'chunk_shape' configuration")
-        if not isinstance(chunk_shape_raw, Sequence):
-            raise TypeError(f"chunk_shape must be a sequence, got {type(chunk_shape_raw)}")
-        return ChunkGrid.from_regular(array_shape, cast("Sequence[int]", chunk_shape_raw))
-
-    if name_parsed == "rectilinear":
-        validate_rectilinear_kind(cast("str | None", configuration_parsed.get("kind")))
-        chunk_shapes_raw = configuration_parsed.get("chunk_shapes")
-        if chunk_shapes_raw is None:
-            raise ValueError("Rectilinear chunk grid requires 'chunk_shapes' configuration")
-        if not isinstance(chunk_shapes_raw, Sequence):
-            raise TypeError(f"chunk_shapes must be a sequence, got {type(chunk_shapes_raw)}")
-        if len(chunk_shapes_raw) != len(array_shape):
-            raise ValueError(
-                f"chunk_shapes has {len(chunk_shapes_raw)} dimensions but array shape "
-                f"has {len(array_shape)} dimensions"
-            )
-        decoded: list[list[int]] = []
-        for dim_spec, extent in zip(chunk_shapes_raw, array_shape, strict=True):
-            decoded.append(_decode_dim_spec(dim_spec, array_extent=extent))
-        validate_rectilinear_edges(decoded, array_shape)
-        return ChunkGrid.from_rectilinear(decoded, array_shape=array_shape)
-
-    raise ValueError(f"Unknown chunk grid name: {name_parsed!r}")
-
-
-def serialize_chunk_grid(grid: ChunkGrid, name: ChunkGridName) -> dict[str, JSON]:
-    """Serialize a ChunkGrid to a metadata dict using the given format name.
-
-    The format choice ("regular" vs "rectilinear") belongs to the metadata layer,
-    not the grid itself. This function is called by ArrayV3Metadata.to_dict().
-    """
-    if name == "regular":
-        if not grid.is_regular:
-            raise ValueError(
-                "Cannot serialize a non-regular chunk grid as 'regular'. Use 'rectilinear' instead."
-            )
-        # The regular grid spec encodes only chunk_shape, not per-axis edges,
-        # so zero-extent dimensions are valid (they simply produce zero chunks).
-        return {
-            "name": "regular",
-            "configuration": {"chunk_shape": tuple(grid.chunk_shape)},
-        }
-
-    if name == "rectilinear":
-        # Zero-extent dimensions cannot be represented as rectilinear because
-        # the spec requires at least one positive-integer edge length per axis.
-        # This is intentionally asymmetric with the regular grid, which encodes
-        # only chunk_shape (no per-axis edges) and thus handles zero-extent
-        # arrays without issue.
-        if any(d.extent == 0 for d in grid.dimensions):
-            raise ValueError(
-                "Cannot serialize a zero-extent grid as 'rectilinear': "
-                "the spec requires all edge lengths to be positive integers."
-            )
-        chunk_shapes: list[RectilinearDimSpec] = []
-        for dim in grid.dimensions:
-            if isinstance(dim, FixedDimension):
-                chunk_shapes.append(_serialize_fixed_dim(dim))
-            elif isinstance(dim, VaryingDimension):
-                chunk_shapes.append(_serialize_varying_dim(dim))
-            else:
-                raise TypeError(f"Unexpected dimension type: {type(dim)}")
-        return {
-            "name": "rectilinear",
-            "configuration": {"kind": "inline", "chunk_shapes": chunk_shapes},
-        }
-
-    raise ValueError(f"Unknown chunk grid name for serialization: {name!r}")
-
-
-def _infer_chunk_grid_name(
-    data: dict[str, JSON] | ChunkGrid | NamedConfig[str, Any],
-    grid: ChunkGrid,
-) -> ChunkGridName:
-    """Extract or infer the chunk grid serialization name from the input."""
-    if isinstance(data, dict):
-        name, _ = parse_named_configuration(data)
-        return cast("ChunkGridName", name)
-    return "regular" if grid.is_regular else "rectilinear"
-

 def _guess_chunks(
     shape: tuple[int, ...] | int,

src/zarr/core/metadata/v3.py

Lines changed: 0 additions & 3 deletions
@@ -360,9 +360,6 @@ def resolve_chunks(
     Flat inputs like ``(10, 10)`` or a scalar ``int`` produce a ``RegularChunkGrid``
     after normalization via :func:`~zarr.core.chunk_grids.normalize_chunks`.

-    See Also
-    --------
-    parse_chunk_grid : Deserialize a chunk grid from stored JSON metadata.
     """
     from zarr.core.chunk_grids import _is_rectilinear_chunks, normalize_chunks
