
Commit 943a2ab

Merge branch 'main' into use-xdist
2 parents: 5b00151 + 373910c


46 files changed (+6081 / -435 lines)

.github/workflows/releases.yml

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 name: Wheels
 
 on:
+  release:
+    types:
+      - published
   push:
     branches: [main]
   pull_request:
@@ -64,7 +67,7 @@ jobs:
     name: Upload to PyPI
     needs: [build_artifacts, test_dist_pypi]
     runs-on: ubuntu-latest
-    if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/v')
+    if: github.event_name == 'release'
     environment:
       name: releases
       url: https://pypi.org/p/zarr

.github/workflows/test.yml

Lines changed: 23 additions & 1 deletion
@@ -155,14 +155,36 @@ jobs:
         run: |
           hatch run doctest:test
 
+  benchmarks:
+    name: Benchmark smoke test
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          fetch-depth: 0
+          persist-credentials: false
+      - name: Set up Python
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
+        with:
+          python-version: '3.13'
+          cache: 'pip'
+      - name: Install Hatch
+        uses: pypa/hatch@257e27e51a6a5616ed08a39a408a21c35c9931bc
+        with:
+          version: '1.16.5'
+      - name: Run Benchmarks
+        run: |
+          hatch env run --env "test.py3.13-minimal" run-benchmark
+
   test-complete:
     name: Test complete
 
     needs:
       [
         test,
         test-upstream-and-min-deps,
-        doctests
+        doctests,
+        benchmarks
       ]
     if: always()
     runs-on: ubuntu-latest

TEAM.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 - @dstansby (David Stansby)
 - @dcherian (Deepak Cherian)
 - @TomAugspurger (Tom Augspurger)
+- @maxrjones (Max Jones)
 
 ## Emeritus core-developers
 - @alimanfoo (Alistair Miles)

changes/3802.feature.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+Add support for rectilinear (variable-sized) chunk grids. This feature is experimental and
+must be explicitly enabled via ``zarr.config.set({'array.rectilinear_chunks': True})``.
+
+Rectilinear chunks can be used through:
+
+- **Creating arrays**: Pass nested sequences (e.g., ``[[10, 20, 30], [50, 50]]``) to ``chunks``
+  in ``zarr.create_array``, ``zarr.from_array``, ``zarr.zeros``, ``zarr.ones``, ``zarr.full``,
+  ``zarr.open``, and related functions, or to ``chunk_shape`` in ``zarr.create``.
+- **Opening existing arrays**: Arrays stored with the ``rectilinear`` chunk grid are read
+  transparently via ``zarr.open`` and ``zarr.open_array``.
+- **Rectilinear sharding**: Shard boundaries can be rectilinear while inner chunks remain regular.
+
+**Breaking change**: The ``validate`` method on ``BaseCodec`` and ``CodecPipeline`` now receives
+a ``ChunkGridMetadata`` instance instead of a ``ChunkGrid`` instance for the ``chunk_grid``
+parameter. Third-party codecs that override ``validate`` and inspect the chunk grid will need to
+update their type annotations. No known downstream packages were using this parameter.
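The changelog describes a rectilinear grid as one explicit list of chunk sizes per dimension, such as `[[10, 20, 30], [50, 50]]`. To see why a read like `z[5:25, 0:5]` in the user guide touches two chunks along axis 0, here is a minimal pure-Python sketch that turns a size list into start offsets and locates chunks with a binary search. The helpers `chunk_offsets`, `chunk_index`, and `chunks_for_slice` are hypothetical illustrations, not zarr's internal implementation.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_offsets(sizes):
    """Start offset of every chunk along one dimension, plus the total extent."""
    return [0, *accumulate(sizes)]


def chunk_index(i, sizes):
    """Index of the chunk along one dimension that contains element i."""
    offsets = chunk_offsets(sizes)
    if not 0 <= i < offsets[-1]:
        raise IndexError(f"{i} is outside [0, {offsets[-1]})")
    # bisect_right finds the first offset strictly greater than i;
    # the chunk containing i starts at the offset just before it.
    return bisect_right(offsets, i) - 1


def chunks_for_slice(start, stop, sizes):
    """Chunk indices along one dimension touched by the half-open range [start, stop)."""
    if start >= stop:
        return range(0)
    return range(chunk_index(start, sizes), chunk_index(stop - 1, sizes) + 1)


# Hypothetical usage with the chunk sizes from the changelog example.
edges = [[10, 20, 30], [50, 50]]
print(chunk_offsets(edges[0]))                  # [0, 10, 30, 60]
print(list(chunks_for_slice(5, 25, edges[0])))  # [0, 1]
print(list(chunks_for_slice(0, 5, edges[1])))   # [0]
```

Binary search over cumulative offsets keeps each lookup logarithmic in the number of chunks, which matters once a dimension has many irregular chunks.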

changes/3846.bugfix.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Fix `ZipStore.list()`, `list_dir()`, and `exists()` to auto-open the zip file when called before `open()`, consistent with the existing behavior of `get()` and `set()`.
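The fix routes every read-side method through the same lazy-open path that `get()` and `set()` already used. The sketch below illustrates that pattern with the standard-library `zipfile` module. `LazyZipReader` is a hypothetical, read-only stand-in, not zarr's `ZipStore`, and its synchronous methods only loosely echo the store methods named in the changelog.

```python
import io
import zipfile


class LazyZipReader:
    """Hypothetical read-only store that opens its zip archive on first use.

    Illustrates the lazy-open pattern from the changelog entry; this is
    not zarr's ZipStore and omits writing, locking, and async methods.
    """

    def __init__(self, source):
        self._source = source  # filesystem path or binary file-like object
        self._zf = None        # zipfile.ZipFile, created lazily

    def open(self):
        if self._zf is None:
            self._zf = zipfile.ZipFile(self._source, mode="r")

    def _ensure_open(self):
        # Every public method funnels through here, so calling any of
        # them before open() behaves the same way.
        if self._zf is None:
            self.open()

    def get(self, key):
        self._ensure_open()
        try:
            return self._zf.read(key)
        except KeyError:
            return None

    def exists(self, key):
        self._ensure_open()
        return key in self._zf.namelist()

    def list(self):
        self._ensure_open()
        return self._zf.namelist()

    def list_dir(self, prefix):
        """Immediate children of `prefix`, like a directory listing."""
        self._ensure_open()
        prefix = (prefix.rstrip("/") + "/") if prefix else ""
        children = set()
        for name in self._zf.namelist():
            if name.startswith(prefix) and len(name) > len(prefix):
                children.add(name[len(prefix):].split("/", 1)[0])
        return sorted(children)


# Build a small in-memory archive, then query it without calling open().
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("zarr.json", b"{}")
    zf.writestr("c/0/0", b"\x00")
buf.seek(0)

store = LazyZipReader(buf)
print(store.exists("zarr.json"))  # True
print(store.list_dir(""))         # ['c', 'zarr.json']
```

Centralizing the check in one private helper is what keeps the methods consistent: a method added later only needs to call `_ensure_open()` to get the same behavior.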

design/chunk-grid.md

Lines changed: 711 additions & 0 deletions
Large diffs are not rendered by default.

docs/release-notes.md

Lines changed: 1 addition & 1 deletion
@@ -461,7 +461,7 @@
 - Test that a `ValueError` is raised for invalid byte range syntax in `StoreTests`. ([#2693](https://github.com/zarr-developers/zarr-python/issues/2693))
 - Separate instantiating and opening a store in `StoreTests`. ([#2693](https://github.com/zarr-developers/zarr-python/issues/2693))
 - Add a test for using Stores as a context managers in `StoreTests`. ([#2693](https://github.com/zarr-developers/zarr-python/issues/2693))
-- Implemented `LogingStore.open()`. ([#2693](https://github.com/zarr-developers/zarr-python/issues/2693))
+- Implemented `LoggingStore.open()`. ([#2693](https://github.com/zarr-developers/zarr-python/issues/2693))
 - `LoggingStore` is now a generic class. ([#2693](https://github.com/zarr-developers/zarr-python/issues/2693))
 - Change StoreTest's `test_store_repr`, `test_store_supports_writes`,
   `test_store_supports_partial_writes`, and `test_store_supports_listing`

docs/user-guide/arrays.md

Lines changed: 165 additions & 0 deletions
@@ -611,6 +611,171 @@ In this example a shard shape of (1000, 1000) and a chunk shape of (100, 100) is
 This means that `10*10` chunks are stored in each shard, and there are `10*10` shards in total.
 Without the `shards` argument, there would be 10,000 chunks stored as individual files.
 
+## Rectilinear (variable) chunk grids
+
+!!! warning "Experimental"
+    Rectilinear chunk grids are an experimental feature and may change in
+    future releases. This feature is expected to stabilize in Zarr version 3.3.
+
+Because the feature is still stabilizing, it is disabled by default and
+must be explicitly enabled:
+
+```python
+import zarr
+zarr.config.set({"array.rectilinear_chunks": True})
+```
+
+Or via the environment variable `ZARR_ARRAY__RECTILINEAR_CHUNKS=True`.
+
+The examples below assume this config has been set.
+
+By default, Zarr arrays use a regular chunk grid where every chunk along a
+given dimension has the same size (except possibly the final boundary chunk).
+Rectilinear chunk grids allow each chunk along a dimension to have a different
+size. This is useful when the natural partitioning of the data is not uniform —
+for example, satellite swaths of varying width, time series with irregular
+intervals, or spatial tiles of different extents.
+
+### Creating arrays with rectilinear chunks
+
+To create an array with rectilinear chunks, pass a nested list to the `chunks`
+parameter where each inner list gives the chunk sizes along one dimension:
+
+```python exec="true" session="arrays" source="above" result="ansi"
+zarr.config.set({"array.rectilinear_chunks": True})
+z = zarr.create_array(
+    store=zarr.storage.MemoryStore(),
+    shape=(60, 100),
+    chunks=[[10, 20, 30], [50, 50]],
+    dtype='int32',
+)
+print(z.info)
+```
+
+In this example the first dimension is split into three chunks of sizes 10, 20,
+and 30, while the second dimension is split into two equal chunks of size 50.
+
+### Reading and writing data
+
+Rectilinear arrays support the same indexing interface as regular arrays.
+Reads and writes that cross chunk boundaries of different sizes are handled
+automatically:
+
+```python exec="true" session="arrays" source="above" result="ansi"
+import numpy as np
+data = np.arange(60 * 100, dtype='int32').reshape(60, 100)
+z[:] = data
+# Read a slice that spans the first two chunks (sizes 10 and 20) along axis 0
+print(z[5:25, 0:5])
+```
+
+### Inspecting chunk sizes
+
+The `.write_chunk_sizes` property returns the actual data size of each storage
+chunk along every dimension. It works for both regular and rectilinear arrays
+and returns a tuple of tuples (matching the dask `Array.chunks` convention).
+When sharding is used, `.read_chunk_sizes` returns the inner chunk sizes instead:
+
+```python exec="true" session="arrays" source="above" result="ansi"
+print(z.write_chunk_sizes)
+```
+
+For regular arrays, this includes the boundary chunk:
+
+```python exec="true" session="arrays" source="above" result="ansi"
+z_regular = zarr.create_array(
+    store=zarr.storage.MemoryStore(),
+    shape=(100, 80),
+    chunks=(30, 40),
+    dtype='int32',
+)
+print(z_regular.write_chunk_sizes)
+```
+
+Note that the `.chunks` property is only available for regular chunk grids. For
+rectilinear arrays, use `.write_chunk_sizes` (or `.read_chunk_sizes`) instead.
+
+### Resizing and appending
+
+Rectilinear arrays can be resized. When growing past the current edge sum, a
+new chunk is appended covering the additional extent. When shrinking, the chunk
+edges are preserved and the extent is re-bound (chunks beyond the new extent
+simply become inactive):
+
+```python exec="true" session="arrays" source="above" result="ansi"
+z = zarr.create_array(
+    store=zarr.storage.MemoryStore(),
+    shape=(30,),
+    chunks=[[10, 20]],
+    dtype='float64',
+)
+z[:] = np.arange(30, dtype='float64')
+print(f"Before resize: chunk_sizes={z.write_chunk_sizes}")
+z.resize((50,))
+print(f"After resize: chunk_sizes={z.write_chunk_sizes}")
+```
+
+The `append` method also works with rectilinear arrays:
+
+```python exec="true" session="arrays" source="above" result="ansi"
+z.append(np.arange(10, dtype='float64'))
+print(f"After append: shape={z.shape}, chunk_sizes={z.write_chunk_sizes}")
+```
+
+### Compressors and filters
+
+Rectilinear arrays work with all codecs — compressors, filters, and checksums.
+Since each chunk may have a different size, the codec pipeline processes each
+chunk independently:
+
+```python exec="true" session="arrays" source="above" result="ansi"
+z = zarr.create_array(
+    store=zarr.storage.MemoryStore(),
+    shape=(60, 100),
+    chunks=[[10, 20, 30], [50, 50]],
+    dtype='float64',
+    filters=[zarr.codecs.TransposeCodec(order=(1, 0))],
+    compressors=[zarr.codecs.BloscCodec(cname='zstd', clevel=3)],
+)
+z[:] = np.arange(60 * 100, dtype='float64').reshape(60, 100)
+np.testing.assert_array_equal(z[:], np.arange(60 * 100, dtype='float64').reshape(60, 100))
+print("Roundtrip OK")
+```
+
+### Rectilinear shard boundaries
+
+Rectilinear chunk grids can also be used for shard boundaries when combined
+with sharding. In this case, the outer grid (shards) is rectilinear while the
+inner chunks remain regular. Each shard dimension must be divisible by the
+corresponding inner chunk size:
+
+```python exec="true" session="arrays" source="above" result="ansi"
+z = zarr.create_array(
+    store=zarr.storage.MemoryStore(),
+    shape=(120, 100),
+    chunks=(10, 10),
+    shards=[[60, 40, 20], [50, 50]],
+    dtype='int32',
+)
+z[:] = np.arange(120 * 100, dtype='int32').reshape(120, 100)
+print(z[50:70, 40:60])
+```
+
+Note that rectilinear inner chunks with sharding are not supported — only the
+shard boundaries can be rectilinear.
+
+### Metadata format
+
+Rectilinear chunk grid metadata uses run-length encoding (RLE) for compact
+serialization. When reading metadata, both bare integers and `[value, count]`
+pairs are accepted:
+
+- `[10, 20, 30]` — three chunks with explicit sizes
+- `[[10, 3]]` — three chunks of size 10 (RLE shorthand)
+- `[[10, 3], 5]` — three chunks of size 10, then one chunk of size 5
+
+When writing, Zarr automatically compresses repeated values into RLE format.
+
 ## Missing features in 3.0
 
 The following features have not been ported to 3.0 yet.
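The new user-guide section states several bookkeeping rules for chunk edges. RLE metadata may mix bare sizes and `[value, count]` pairs, regular grids end with a short boundary chunk, growing an array appends one chunk covering the new extent while shrinking keeps the existing edges, and shard sizes must be multiples of the inner chunk size. The sketch below restates those rules as plain functions over lists of ints so they can be checked in isolation. All helper names are hypothetical, and the choice to RLE-encode any run of two or more equal sizes is our assumption; zarr's own encoder may use a different threshold.

```python
def expand_rle(spec):
    """Expand a per-dimension size list that mixes bare ints and [value, count] pairs."""
    sizes = []
    for item in spec:
        if isinstance(item, int):
            sizes.append(item)
        else:
            value, count = item
            sizes.extend([value] * count)
    return sizes


def compress_rle(sizes):
    """Collapse runs of equal sizes into [value, count]; runs of length 1 stay bare.

    Assumption: any run of two or more becomes a pair. Zarr's exact
    threshold for switching to RLE may differ.
    """
    out = []
    i = 0
    while i < len(sizes):
        j = i
        while j < len(sizes) and sizes[j] == sizes[i]:
            j += 1
        run = j - i
        out.append([sizes[i], run] if run > 1 else sizes[i])
        i = j
    return out


def regular_edges(extent, chunk):
    """Explicit edge list of a regular grid, including the short boundary chunk."""
    full, rest = divmod(extent, chunk)
    return [chunk] * full + ([rest] if rest else [])


def resize_edges(edges, new_extent):
    """Grow past sum(edges) by appending one chunk; otherwise keep every edge."""
    total = sum(edges)
    if new_extent > total:
        return [*edges, new_extent - total]
    # Shrinking (or regrowing within the old edges) keeps the edge list;
    # chunks that start at or beyond new_extent are simply unused.
    return list(edges)


def check_shard_edges(shard_edges, inner_chunk):
    """Raise if any shard size is not a multiple of the regular inner chunk size."""
    for size in shard_edges:
        if size % inner_chunk:
            raise ValueError(f"shard size {size} is not divisible by inner chunk {inner_chunk}")


# Usage, with values taken from the user-guide examples above.
print(expand_rle([[10, 3], 5]))       # [10, 10, 10, 5]
print(compress_rle([10, 10, 10, 5]))  # [[10, 3], 5]
print(regular_edges(100, 30))         # [30, 30, 30, 10]
edges = resize_edges([10, 20], 50)    # grow the extent from 30 to 50
print(edges)                          # [10, 20, 20]
print(resize_edges(edges, 25))        # shrink: edges kept, [10, 20, 20]
check_shard_edges([60, 40, 20], 10)   # passes silently
```

Keeping the expanded edge list as the in-memory form and applying RLE only at serialization time means the resize and divisibility rules never need to know about the compact encoding.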

docs/user-guide/config.md

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ Configuration options include the following:
 - Default Zarr format `default_zarr_version`
 - Default array order in memory `array.order`
 - Whether empty chunks are written to storage `array.write_empty_chunks`
+- Enable experimental rectilinear chunks `array.rectilinear_chunks`
 - Whether missing chunks are filled with the array's fill value on read `array.read_missing_chunks` (default `True`). Set to `False` to raise a [`ChunkNotFoundError`][zarr.errors.ChunkNotFoundError] instead.
 - Async and threading options, e.g. `async.concurrency` and `threading.max_workers`
 - Selections of implementations of codecs, codec pipelines and buffers
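The config page lists the key `array.rectilinear_chunks`, and the arrays guide notes the same option can be set with `ZARR_ARRAY__RECTILINEAR_CHUNKS=True`. The sketch below shows the naming convention we assume links the two: strip the `ZARR_` prefix, lowercase the rest, treat `__` as a nesting separator, and parse the value as a Python literal when possible. This mirrors our understanding of the dask/donfig-style environment handling; `env_to_config` is a hypothetical helper, not zarr API.

```python
import ast


def env_to_config(environ, prefix="ZARR_"):
    """Map PREFIX_A__B=value variables to nested {"a": {"b": value}} dicts.

    Hypothetical helper illustrating the naming convention only. Values
    are parsed as Python literals when possible, else kept as strings.
    """
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables such as HOME are ignored
        path = name[len(prefix):].lower().split("__")
        try:
            value = ast.literal_eval(raw)  # "True" -> True, "8" -> 8
        except (ValueError, SyntaxError):
            value = raw                    # e.g. "C" stays the string "C"
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return config


print(env_to_config({"ZARR_ARRAY__RECTILINEAR_CHUNKS": "True", "HOME": "/root"}))
# {'array': {'rectilinear_chunks': True}}
```

Literal parsing matters here because an environment variable is always a string; without it, `"True"` would arrive as a non-empty string rather than a boolean.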
