Skip to content

Commit 278f44b

Browse files
committed
Clarify backwards compatibility design
1 parent 3e322ef commit 278f44b

File tree

1 file changed

+147
-20
lines changed

1 file changed

+147
-20
lines changed

design/chunk-grid.md

Lines changed: 147 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -511,40 +511,167 @@ The name controls serialization format; each metadata DTO class provides its own
511511

512512
## Migration
513513

514-
### Backwards compatibility
514+
### Public API compatibility
515515

516-
A module-level `__getattr__` shim in `chunk_grids.py` preserves the common downstream import pattern. Importing the old names emits a `DeprecationWarning` and returns the renamed metadata class:
516+
The user-facing API is fully backward-compatible. Existing code that creates, opens, reads, and writes zarr arrays continues to work without changes:
517+
518+
- `zarr.create_array`, `zarr.open`, `zarr.open_array`, `zarr.open_group` -- unchanged signatures. The `chunks` parameter type is *widened* (now also accepts nested sequences for rectilinear grids), but all existing call patterns still work.
519+
- `arr.chunks` -- returns `tuple[int, ...]` for regular arrays, same as before.
520+
- `arr.shape`, `arr.dtype`, `arr.ndim`, `arr.shards` -- unchanged.
521+
- Top-level `zarr` exports -- unchanged.
522+
- Rectilinear chunks are gated behind `zarr.config.set({'array.rectilinear_chunks': True})`, so they cannot be created accidentally.
523+
524+
New additions (purely additive): `arr.read_chunk_sizes`, `arr.write_chunk_sizes`, `zarr.experimental.ChunkGrid`, `zarr.experimental.ChunkSpec`.
525+
526+
The breaking changes discussed below are confined to **internal modules** (`zarr.core.chunk_grids`, `zarr.core.metadata.v3`, `zarr.core.indexing`) that downstream libraries like cubed and VirtualiZarr access directly.
527+
528+
### Internal API compatibility trade-off analysis
529+
530+
This section analyzes the internal breaking changes from the metadata/array separation and evaluates two strategies: (A) add backward-compatibility shims in zarr-python, vs. (B) require downstream packages to update. The baseline is **no shims at all**.
531+
532+
#### What breaks without any shims
533+
534+
Three API changes affect downstream code:
535+
536+
1. **`RegularChunkGrid` class removed from `zarr.core.chunk_grids`.** On `main`, `RegularChunkGrid` is defined in `chunk_grids.py` as a `Metadata` subclass. This branch replaces it with `RegularChunkGridMetadata` in `metadata/v3.py`. Without a shim, `from zarr.core.chunk_grids import RegularChunkGrid` raises `ImportError`.
537+
538+
2. **`RegularChunkGrid` no longer available from `zarr.core.metadata.v3`.** On `main`, `v3.py` imports `RegularChunkGrid` from `chunk_grids.py` for internal use. VirtualiZarr imports it from this location (`from zarr.core.metadata.v3 import RegularChunkGrid`). Without the internal import, this raises `ImportError`.
539+
540+
3. **`OrthogonalIndexer` constructor expects `ChunkGrid`, not `RegularChunkGrid`/`RegularChunkGridMetadata`.** Even if the import shims above resolve to `RegularChunkGridMetadata`, the indexer constructors access `chunk_grid._dimensions`, which only exists on the runtime `ChunkGrid` class. Cubed constructs `OrthogonalIndexer(selection, shape, RegularChunkGrid(chunk_shape=chunks))` directly.
541+
542+
#### Downstream impact without shims
543+
544+
**VirtualiZarr** (5 line changes across 2 files):
517545

518546
```python
519-
from zarr.core.chunk_grids import RegularChunkGrid # DeprecationWarning, returns RegularChunkGridMetadata
547+
# manifests/array.py (line 6): import
548+
- from zarr.core.metadata.v3 import ArrayV3Metadata, RegularChunkGrid
549+
+ from zarr.core.metadata.v3 import ArrayV3Metadata, RegularChunkGridMetadata
550+
551+
# manifests/array.py (line 53): isinstance check
552+
- if not isinstance(_metadata.chunk_grid, RegularChunkGrid):
553+
+ if not isinstance(_metadata.chunk_grid, RegularChunkGridMetadata):
554+
555+
# parsers/zarr.py (line 16): import
556+
- from zarr.core.chunk_grids import RegularChunkGrid
557+
+ from zarr.core.metadata.v3 import RegularChunkGridMetadata
520558

521-
grid = RegularChunkGrid(chunk_shape=(10, 20)) # works — same as RegularChunkGridMetadata(...)
522-
isinstance(grid, RegularChunkGrid) # True — RegularChunkGrid is RegularChunkGridMetadata
559+
# parsers/zarr.py (line 270): isinstance check
560+
- if not isinstance(array_v3_metadata.chunk_grid, RegularChunkGrid):
561+
+ if not isinstance(array_v3_metadata.chunk_grid, RegularChunkGridMetadata):
562+
563+
# parsers/zarr.py (line 390): cast
564+
- cast(RegularChunkGrid, metadata.chunk_grid).chunk_shape
565+
+ cast(RegularChunkGridMetadata, metadata.chunk_grid).chunk_shape
523566
```
524567

525-
The same shim exists for `RectilinearChunkGrid``RectilinearChunkGridMetadata`.
568+
The `manifests/array.py` import is from `zarr.core.metadata.v3` (never a documented export; VirtualiZarr relied on a transitive import). The `parsers/zarr.py` import is from `zarr.core.chunk_grids` (the canonical location on `main`). Both are straightforward renames. The `.chunk_shape` attribute is unchanged on the new class.
569+
570+
If VirtualiZarr needs to support both old and new zarr-python, a version-conditional import adds ~5 more lines.
571+
572+
**Cubed** (3 line changes in 1 file):
573+
574+
```python
575+
# core/ops.py (lines 626-631)
576+
def _create_zarr_indexer(selection, shape, chunks):
577+
if zarr.__version__[0] == "3":
578+
- from zarr.core.chunk_grids import RegularChunkGrid
579+
+ from zarr.core.chunk_grids import ChunkGrid
580+
from zarr.core.indexing import OrthogonalIndexer
581+
- return OrthogonalIndexer(selection, shape, RegularChunkGrid(chunk_shape=chunks))
582+
+ return OrthogonalIndexer(selection, shape, ChunkGrid.from_sizes(shape, chunks))
583+
```
584+
585+
Note that `ChunkGrid` is *not* a renamed class. `RegularChunkGrid(chunk_shape=chunks)` took only chunk sizes; `ChunkGrid.from_sizes(shape, chunks)` also requires the array shape. The `shape` parameter is already available at this call site.
586+
587+
If cubed needs to support both old and new zarr-python:
588+
589+
```python
590+
def _create_zarr_indexer(selection, shape, chunks):
591+
if zarr.__version__[0] == "3":
592+
from zarr.core.indexing import OrthogonalIndexer
593+
try:
594+
from zarr.core.chunk_grids import ChunkGrid
595+
return OrthogonalIndexer(selection, shape, ChunkGrid.from_sizes(shape, chunks))
596+
except ImportError:
597+
from zarr.core.chunk_grids import RegularChunkGrid
598+
return OrthogonalIndexer(selection, shape, RegularChunkGrid(chunk_shape=chunks))
599+
else:
600+
from zarr.indexing import OrthogonalIndexer
601+
return OrthogonalIndexer(selection, ZarrArrayIndexingAdaptor(shape, chunks))
602+
```
603+
604+
#### What shims can cover
605+
606+
**Shim 1: `__getattr__` in `chunk_grids.py`** (~15 lines)
607+
608+
Maps `RegularChunkGrid` to `RegularChunkGridMetadata` with a deprecation warning. Covers:
609+
- The `from zarr.core.chunk_grids import RegularChunkGrid` import pattern (used by cubed and VirtualiZarr's `parsers/zarr.py`)
610+
- `isinstance(x, RegularChunkGrid)` checks (because the name resolves to the actual class)
611+
- `RegularChunkGrid(chunk_shape=(...))` construction (because `RegularChunkGridMetadata` accepts the same arguments)
612+
613+
Does **not** cover: passing the result to `OrthogonalIndexer`, because `RegularChunkGridMetadata` lacks `._dimensions`.
614+
615+
**Shim 2: `__getattr__` in `metadata/v3.py`** (~12 lines)
616+
617+
Same pattern, covers VirtualiZarr's import from `zarr.core.metadata.v3`. Mirrors Shim 1 for a different import path.
618+
619+
**Shim 3: Auto-coerce `ChunkGridMetadata` in indexer constructors** (~30 lines)
620+
621+
A helper function + 1-line insertion in each of `BasicIndexer`, `OrthogonalIndexer`, `CoordinateIndexer`, and `MaskIndexer`:
622+
623+
```python
624+
def _resolve_chunk_grid(chunk_grid, shape):
625+
"""Coerce ChunkGridMetadata to runtime ChunkGrid if needed."""
626+
from zarr.core.chunk_grids import ChunkGrid as _ChunkGrid
627+
from zarr.core.metadata.v3 import ChunkGridMetadata
628+
if isinstance(chunk_grid, _ChunkGrid):
629+
return chunk_grid
630+
if isinstance(chunk_grid, ChunkGridMetadata):
631+
warnings.warn(
632+
"Passing ChunkGridMetadata to indexers is deprecated. "
633+
"Use ChunkGrid.from_sizes() instead.",
634+
DeprecationWarning, stacklevel=2,
635+
)
636+
if hasattr(chunk_grid, "chunk_shape"):
637+
return _ChunkGrid.from_sizes(shape, tuple(chunk_grid.chunk_shape))
638+
return _ChunkGrid.from_sizes(shape, chunk_grid.chunk_shapes)
639+
raise TypeError(f"Expected ChunkGrid or ChunkGridMetadata, got {type(chunk_grid)}")
640+
```
641+
642+
This covers cubed's `OrthogonalIndexer(selection, shape, RegularChunkGrid(...))` pattern end-to-end (combined with Shim 1).
643+
644+
#### Comparison
645+
646+
| | No shims | Shims 1+2 only | Shims 1+2+3 |
647+
|---|---|---|---|
648+
| **zarr-python additions** | 0 lines | ~27 lines | ~57 lines |
649+
| **VirtualiZarr changes** | 5 lines | 0 lines | 0 lines |
650+
| **Cubed changes** | 3 lines | 3 lines | 0 lines |
651+
| **Maintenance burden** | None | Low (deprecation shims are well-understood) | Medium (indexer coercion blurs metadata/runtime boundary) |
652+
| **API clarity** | Clean (metadata DTOs and runtime types are distinct) | Good (old names redirect to new names) | Weaker (indexers implicitly accept two type families) |
653+
654+
With Shims 1+2 only, VirtualiZarr's `manifests/array.py` import from `zarr.core.metadata.v3` is covered by Shim 2, and the `parsers/zarr.py` import from `zarr.core.chunk_grids` is covered by Shim 1. The `isinstance` checks work because both shims resolve to `RegularChunkGridMetadata`. The `cast` works because `.chunk_shape` is unchanged. So VirtualiZarr needs 0 changes with Shims 1+2. The 3 lines for cubed remain because Shim 1 resolves the import but `OrthogonalIndexer` still needs a runtime `ChunkGrid`.
526655

527656
### Downstream migration
528657

529-
| Old pattern | New pattern |
658+
Migration from `main` (where only `RegularChunkGrid` and the abstract `ChunkGrid` ABC exist):
659+
660+
| Old pattern (on `main`) | New pattern |
530661
|---|---|
531662
| `from zarr.core.chunk_grids import RegularChunkGrid` | `from zarr.core.metadata.v3 import RegularChunkGridMetadata` |
532-
| `from zarr.core.chunk_grids import RectilinearChunkGrid` | `from zarr.core.metadata.v3 import RectilinearChunkGridMetadata` |
533-
| `isinstance(cg, RegularChunkGrid)` | `isinstance(cg, RegularChunkGridMetadata)` or `cg.is_regular` on the `ChunkGrid` |
534-
| `isinstance(cg, RectilinearChunkGrid)` | `isinstance(cg, RectilinearChunkGridMetadata)` or `not cg.is_regular` |
535-
| `cg.chunk_shape` | `cg.chunk_shape` (unchanged on metadata objects) |
536-
| `cg.chunk_shapes` | `cg.chunk_shapes` (unchanged on metadata objects) |
537-
| Feature detection via class import | Version check or `hasattr(ChunkGrid, 'is_regular')` |
538-
539-
**[xarray#10880](https://github.com/pydata/xarray/pull/10880):** Replace `isinstance` checks with `.is_regular`. Write path simplifies with `chunks=[[...]]` API.
540-
541-
**[VirtualiZarr#877](https://github.com/zarr-developers/VirtualiZarr/pull/877):** Drop vendored `_is_nested_sequence`. Replace `isinstance` checks.
663+
| `from zarr.core.chunk_grids import ChunkGrid` (ABC) | `from zarr.core.chunk_grids import ChunkGrid` (concrete class, different API) |
664+
| `isinstance(cg, RegularChunkGrid)` | `isinstance(cg, RegularChunkGridMetadata)` or `grid.is_regular` on the runtime `ChunkGrid` |
665+
| `cg.chunk_shape` on `RegularChunkGrid` | `cg.chunk_shape` on `RegularChunkGridMetadata` (unchanged) |
666+
| `ChunkGrid.from_dict(data)` | `parse_chunk_grid(data)` from `zarr.core.metadata.v3` |
667+
| `chunk_grid.all_chunk_coords(array_shape)` | `chunk_grid.all_chunk_coords()` (shape now stored in grid) |
668+
| `chunk_grid.get_nchunks(array_shape)` | `chunk_grid.get_nchunks()` (shape now stored in grid) |
542669

543-
**[Icechunk#1338](https://github.com/earth-mover/icechunk/issues/1338):** Minimal impact — format changes driven by spec, not class hierarchy.
670+
During the earlier [#3534](https://github.com/zarr-developers/zarr-python/pull/3534) effort (which used separate `RegularChunkGrid`/`RectilinearChunkGrid` classes), downstream PRs and issues were opened to explore compatibility:
544671

545-
**[cubed#876](https://github.com/cubed-dev/cubed/issues/876):** Switch store creation to `ChunkGrid` API. @tomwhite confirmed in #3534 that rechunking with variable-sized intermediate chunks works.
672+
- xarray ([#10880](https://github.com/pydata/xarray/pull/10880)), VirtualiZarr ([#877](https://github.com/zarr-developers/VirtualiZarr/pull/877)), Icechunk ([#1338](https://github.com/earth-mover/icechunk/issues/1338)), cubed ([#876](https://github.com/cubed-dev/cubed/issues/876))
546673

547-
**HEALPix use case:** @tinaok demonstrated in #3534 that variable-chunked arrays arise naturally when grouping HEALPix cells by parent pixel — the chunk sizes come from `np.unique(parents, return_counts=True)`.
674+
These target #3534's API, not this branch's unified `ChunkGrid` design. New downstream POC branches for this design are linked in [Proofs of concepts](#proofs-of-concepts).
548675

549676
### Credits
550677

0 commit comments

Comments
 (0)