Commit eb89a55

Merge branch 'main' into ig/spec0_py314
2 parents 715ec2a + 5d92e85

8 files changed: 129 additions, 62 deletions
.github/workflows/needs_release_notes.yml (2 additions, 2 deletions)

```diff
@@ -1,8 +1,8 @@
 name: "Pull Request Labeler"

 on:
-  pull_request_target:
-    types: [opened, reopened, synchronize]
+  pull_request_target:
+    types: [opened, reopened, synchronize]

 jobs:
   labeler:
```
.github/workflows/releases.yml (1 addition, 28 deletions)

```diff
@@ -6,11 +6,6 @@ on:
   pull_request:
     branches: [main]
   workflow_dispatch:
-    inputs:
-      tag:
-        description: 'Git tag to build and publish (e.g. v3.1.6)'
-        required: true
-        type: string

 permissions:
   contents: read
@@ -21,26 +16,7 @@ concurrency:

 jobs:

-  validate_tag:
-    if: github.event_name == 'workflow_dispatch'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Validate tag format
-        run: |
-          if [[ ! "${{ inputs.tag }}" =~ ^v[0-9]+\.[0-9]+\.[0-9]+([a-z]+[0-9]*)?$ ]]; then
-            echo "::error::Invalid tag format '${{ inputs.tag }}'. Expected format: v1.2.3, v1.2.3a1, v1.2.3rc1"
-            exit 1
-          fi
-      - name: Verify tag exists
-        run: |
-          git ls-remote --tags "${{ github.server_url }}/${{ github.repository }}" "${{ inputs.tag }}" | grep -q "${{ inputs.tag }}" || {
-            echo "::error::Tag '${{ inputs.tag }}' does not exist in the repository"
-            exit 1
-          }
-
   build_artifacts:
-    needs: [validate_tag]
-    if: always() && (needs.validate_tag.result == 'success' || needs.validate_tag.result == 'skipped')
     name: Build wheel on ubuntu-latest
     runs-on: ubuntu-latest
     strategy:
@@ -49,7 +25,6 @@ jobs:
     steps:
       - uses: actions/checkout@v6
         with:
-          ref: ${{ inputs.tag || github.ref }}
           submodules: true
           fetch-depth: 0

@@ -86,9 +61,7 @@ jobs:
   upload_pypi:
     needs: [build_artifacts, test_dist_pypi]
     runs-on: ubuntu-latest
-    if: >-
-      (github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/v'))
-      || (github.event_name == 'workflow_dispatch' && startsWith(inputs.tag, 'v'))
+    if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/v')
     environment:
       name: releases
       url: https://pypi.org/p/zarr
```
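For reference, the tag pattern enforced by the removed `validate_tag` step can be exercised outside the workflow. A minimal sketch, assuming Python's `re` module matches the same strings as the bash `[[ =~ ]]` extended regex used above:

```python
import re

# Same pattern as the removed "Validate tag format" step:
# v<major>.<minor>.<patch>, optionally followed by a pre-release
# suffix such as "a1" or "rc1".
TAG_RE = re.compile(r"^v[0-9]+\.[0-9]+\.[0-9]+([a-z]+[0-9]*)?$")


def is_valid_tag(tag: str) -> bool:
    """Return True if the tag matches the release-tag format."""
    return TAG_RE.match(tag) is not None


for tag, expected in [
    ("v3.1.6", True),
    ("v1.2.3a1", True),
    ("v1.2.3rc1", True),
    ("1.2.3", False),  # missing "v" prefix
    ("v1.2", False),   # missing patch component
]:
    assert is_valid_tag(tag) == expected
```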

changes/3828.misc.md (2 additions, new file)

```md
`CodecPipeline.read` and `CodecPipeline.read_batch` now return a tuple of typeddict objects
that each carry information about the request for a chunk from storage.
```
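Based on how `GetResult` is constructed and consumed elsewhere in this commit (`GetResult(status="present")`, `result["status"]`), the TypedDict's shape is presumably roughly the following sketch; the real definition lives in `zarr.abc.codec` and may carry additional fields:

```python
from typing import Literal, TypedDict


# Sketch of the TypedDict's apparent shape, inferred from its usage in
# this commit; not the authoritative definition.
class GetResult(TypedDict):
    # "present": chunk bytes were found and decoded into the output array;
    # "missing": the chunk was absent and the fill value was used instead.
    status: Literal["present", "missing"]


results: tuple[GetResult, ...] = (
    GetResult(status="present"),
    GetResult(status="missing"),
)
assert [r["status"] for r in results] == ["present", "missing"]
```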

changes/3836.doc.md (2 additions, new file)

```md
Corrects the type annotation reported for the `batch_info` parameter in the `CodecPipeline.write`
method docstring.
```

docs/user-guide/v3_migration.md (18 additions, 21 deletions)

```diff
@@ -198,32 +198,29 @@ after the 3.0.0 release. If features listed below are important to your use case
 of Zarr-Python, please open (or comment on) a
 [GitHub issue](https://github.com/zarr-developers/zarr-python/issues/new).

-- The following functions / methods have not been ported to Zarr-Python 3 yet:
+The following functions / methods have not been ported to Zarr-Python 3 yet:

-  * `zarr.copy` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
-  * `zarr.copy_all` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
-  * `zarr.copy_store` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
-  * `zarr.Group.move` ([issue #2108](https://github.com/zarr-developers/zarr-python/issues/2108))
+- `zarr.copy` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
+- `zarr.copy_all` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
+- `zarr.copy_store` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
+- `zarr.Group.move` ([issue #2108](https://github.com/zarr-developers/zarr-python/issues/2108))

-- The following features (corresponding to function arguments to functions in
+The following features (corresponding to function arguments to functions in
   `zarr`) have not been ported to Zarr-Python 3 yet. Using these features
   will raise a warning or a `NotImplementedError`:

-  * `cache_attrs`
-  * `cache_metadata`
-  * `chunk_store` ([issue #2495](https://github.com/zarr-developers/zarr-python/issues/2495))
-  * `meta_array`
-  * `object_codec` ([issue #2617](https://github.com/zarr-developers/zarr-python/issues/2617))
-  * `synchronizer` ([issue #1596](https://github.com/zarr-developers/zarr-python/issues/1596))
-  * `dimension_separator`
+- `cache_attrs`
+- `cache_metadata`
+- `chunk_store` ([issue #2495](https://github.com/zarr-developers/zarr-python/issues/2495))
+- `meta_array`
+- `object_codec` ([issue #2617](https://github.com/zarr-developers/zarr-python/issues/2617))
+- `synchronizer` ([issue #1596](https://github.com/zarr-developers/zarr-python/issues/1596))
+- `dimension_separator`

-- The following features that were supported by Zarr-Python 2 have not been ported
+The following features that were supported by Zarr-Python 2 have not been ported
   to Zarr-Python 3 yet:

-  * Structured arrays / dtypes ([issue #2134](https://github.com/zarr-developers/zarr-python/issues/2134))
-  * Fixed-length string dtypes ([issue #2347](https://github.com/zarr-developers/zarr-python/issues/2347))
-  * Datetime and timedelta dtypes ([issue #2616](https://github.com/zarr-developers/zarr-python/issues/2616))
-  * Object dtypes ([issue #2616](https://github.com/zarr-developers/zarr-python/issues/2616))
-  * Ragged arrays ([issue #2618](https://github.com/zarr-developers/zarr-python/issues/2618))
-  * Groups and Arrays do not implement `__enter__` and `__exit__` protocols ([issue #2619](https://github.com/zarr-developers/zarr-python/issues/2619))
-  * Default filters for object dtypes for Zarr format 2 arrays ([issue #2627](https://github.com/zarr-developers/zarr-python/issues/2627))
+- Object dtypes ([issue #2616](https://github.com/zarr-developers/zarr-python/issues/2616))
+- Ragged arrays ([issue #2618](https://github.com/zarr-developers/zarr-python/issues/2618))
+- Groups and Arrays do not implement `__enter__` and `__exit__` protocols ([issue #2619](https://github.com/zarr-developers/zarr-python/issues/2619))
+- Default filters for object dtypes for Zarr format 2 arrays ([issue #2627](https://github.com/zarr-developers/zarr-python/issues/2627))
```

src/zarr/abc/codec.py (9 additions, 3 deletions)

```diff
@@ -32,6 +32,7 @@
     "CodecInput",
     "CodecOutput",
     "CodecPipeline",
+    "GetResult",
     "SupportsSyncCodec",
 ]

@@ -429,13 +430,13 @@ async def read(
         batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
         out: NDBuffer,
         drop_axes: tuple[int, ...] = (),
-    ) -> None:
+    ) -> tuple[GetResult, ...]:
         """Reads chunk data from the store, decodes it and writes it into an output array.
         Partial decoding may be utilized if the codecs and stores support it.

         Parameters
         ----------
-        batch_info : Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple]]
+        batch_info : Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]]
             Ordered set of information about the chunks.
             The first slice selection determines which parts of the chunk will be fetched.
             The second slice selection determines where in the output array the chunk data will be written.
@@ -447,6 +448,11 @@ async def read(
             ``out``) to the fill value for the array.

         out : NDBuffer
+
+        Returns
+        -------
+        tuple[GetResult, ...]
+            One result per chunk in ``batch_info``.
         """
         ...

@@ -463,7 +469,7 @@ async def write(

         Parameters
         ----------
-        batch_info : Iterable[tuple[ByteSetter, ArraySpec, SelectorTuple, SelectorTuple]]
+        batch_info : Iterable[tuple[ByteSetter, ArraySpec, SelectorTuple, SelectorTuple, bool]]
             Ordered set of information about the chunks.
             The first slice selection determines which parts of the chunk will be encoded.
             The second slice selection determines where in the value array the chunk data is located.
```
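Since `read` now reports one result per chunk, callers can aggregate hit/miss statistics without a second pass over the store. A hypothetical caller-side sketch, with plain dicts standing in for the `GetResult` TypedDict:

```python
from collections import Counter

# Stand-in for the tuple returned by CodecPipeline.read: one entry per
# chunk in batch_info, in order.
results = (
    {"status": "present"},
    {"status": "missing"},
    {"status": "present"},
)

# Tally how many chunks were found versus filled with the fill value.
counts = Counter(r["status"] for r in results)
assert counts["present"] == 2
assert counts["missing"] == 1
```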

src/zarr/core/codec_pipeline.py (24 additions, 8 deletions)

```diff
@@ -13,6 +13,7 @@
     BytesBytesCodec,
     Codec,
     CodecPipeline,
+    GetResult,
 )
 from zarr.core.common import concurrent_map
 from zarr.core.config import config
@@ -248,47 +249,58 @@ async def read_batch(
         batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
         out: NDBuffer,
         drop_axes: tuple[int, ...] = (),
-    ) -> None:
+    ) -> tuple[GetResult, ...]:
+        results: list[GetResult] = []
         if self.supports_partial_decode:
+            batch_info_list = list(batch_info)
             chunk_array_batch = await self.decode_partial_batch(
                 [
                     (byte_getter, chunk_selection, chunk_spec)
-                    for byte_getter, chunk_spec, chunk_selection, *_ in batch_info
+                    for byte_getter, chunk_spec, chunk_selection, *_ in batch_info_list
                 ]
             )
             for chunk_array, (_, chunk_spec, _, out_selection, _) in zip(
-                chunk_array_batch, batch_info, strict=False
+                chunk_array_batch, batch_info_list, strict=False
             ):
                 if chunk_array is not None:
                     if drop_axes:
                         chunk_array = chunk_array.squeeze(axis=drop_axes)
                     out[out_selection] = chunk_array
+                    results.append(GetResult(status="present"))
                 else:
                     out[out_selection] = fill_value_or_default(chunk_spec)
+                    results.append(GetResult(status="missing"))
         else:
+            batch_info_list = list(batch_info)
             chunk_bytes_batch = await concurrent_map(
-                [(byte_getter, array_spec.prototype) for byte_getter, array_spec, *_ in batch_info],
+                [
+                    (byte_getter, array_spec.prototype)
+                    for byte_getter, array_spec, *_ in batch_info_list
+                ],
                 lambda byte_getter, prototype: byte_getter.get(prototype),
                 config.get("async.concurrency"),
             )
             chunk_array_batch = await self.decode_batch(
                 [
                     (chunk_bytes, chunk_spec)
                     for chunk_bytes, (_, chunk_spec, *_) in zip(
-                        chunk_bytes_batch, batch_info, strict=False
+                        chunk_bytes_batch, batch_info_list, strict=False
                     )
                 ],
             )
             for chunk_array, (_, chunk_spec, chunk_selection, out_selection, _) in zip(
-                chunk_array_batch, batch_info, strict=False
+                chunk_array_batch, batch_info_list, strict=False
             ):
                 if chunk_array is not None:
                     tmp = chunk_array[chunk_selection]
                     if drop_axes:
                         tmp = tmp.squeeze(axis=drop_axes)
                     out[out_selection] = tmp
+                    results.append(GetResult(status="present"))
                 else:
                     out[out_selection] = fill_value_or_default(chunk_spec)
+                    results.append(GetResult(status="missing"))
+        return tuple(results)

     def _merge_chunk_array(
         self,
@@ -468,15 +480,19 @@ async def read(
         batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
         out: NDBuffer,
         drop_axes: tuple[int, ...] = (),
-    ) -> None:
-        await concurrent_map(
+    ) -> tuple[GetResult, ...]:
+        batch_results = await concurrent_map(
             [
                 (single_batch_info, out, drop_axes)
                 for single_batch_info in batched(batch_info, self.batch_size)
             ],
             self.read_batch,
             config.get("async.concurrency"),
         )
+        results: list[GetResult] = []
+        for batch in batch_results:
+            results.extend(batch)
+        return tuple(results)

     async def write(
         self,
```
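One subtlety in the `read_batch` hunk above: `batch_info` is typed as an `Iterable`, so it may be a one-shot generator, and the method now iterates it more than once (once to build the decode batch, once to write results). The `batch_info_list = list(batch_info)` materialization guards against the second pass silently seeing nothing. A minimal sketch of the failure mode:

```python
def chunks():
    """A one-shot generator, as batch_info may be at the call site."""
    yield "chunk-0"
    yield "chunk-1"


batch_info = chunks()

first_pass = [c for c in batch_info]   # consumes the generator
second_pass = [c for c in batch_info]  # generator is already exhausted

assert first_pass == ["chunk-0", "chunk-1"]
assert second_pass == []  # silent data loss without list(...)

# Materializing once up front makes repeated iteration safe:
batch_info_list = list(chunks())
assert [c for c in batch_info_list] == [c for c in batch_info_list]
```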

tests/test_codec_pipeline.py (71 additions, new file)

```python
from __future__ import annotations

import pytest

import zarr
from zarr.core.buffer.core import default_buffer_prototype
from zarr.core.indexing import BasicIndexer
from zarr.storage import MemoryStore


@pytest.mark.parametrize(
    ("write_slice", "read_slice", "expected_statuses"),
    [
        # Write all chunks, read all — all present
        (slice(None), slice(None), ("present", "present", "present")),
        # Write first chunk only, read all — first present, rest missing
        (slice(0, 2), slice(None), ("present", "missing", "missing")),
        # Write nothing, read all — all missing
        (None, slice(None), ("missing", "missing", "missing")),
    ],
)
async def test_read_returns_get_results(
    write_slice: slice | None,
    read_slice: slice,
    expected_statuses: tuple[str, ...],
) -> None:
    """
    Test that CodecPipeline.read returns a tuple of GetResult with correct statuses.
    """
    store = MemoryStore()
    arr = zarr.open_array(store, mode="w", shape=(6,), chunks=(2,), dtype="int64", fill_value=-1)

    if write_slice is not None:
        arr[write_slice] = 0

    async_arr = arr._async_array
    pipeline = async_arr.codec_pipeline
    metadata = async_arr.metadata

    prototype = default_buffer_prototype()
    config = async_arr.config
    indexer = BasicIndexer(
        read_slice,
        shape=metadata.shape,
        chunk_grid=metadata.chunk_grid,
    )

    out_buffer = prototype.nd_buffer.empty(
        shape=indexer.shape,
        dtype=metadata.dtype.to_native_dtype(),
        order=config.order,
    )

    results = await pipeline.read(
        [
            (
                async_arr.store_path / metadata.encode_chunk_key(chunk_coords),
                metadata.get_chunk_spec(chunk_coords, config, prototype=prototype),
                chunk_selection,
                out_selection,
                is_complete_chunk,
            )
            for chunk_coords, chunk_selection, out_selection, is_complete_chunk in indexer
        ],
        out_buffer,
        drop_axes=indexer.drop_axes,
    )

    assert len(results) == len(expected_statuses)
    for result, expected_status in zip(results, expected_statuses, strict=True):
        assert result["status"] == expected_status
```
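The parametrization in this test depends on the array's chunk grid: shape `(6,)` with chunks `(2,)` yields three chunks, so writing `slice(0, 2)` touches only the first chunk and leaves the other two missing. A quick sanity check of that arithmetic:

```python
import math

shape, chunk = 6, 2
n_chunks = math.ceil(shape / chunk)
assert n_chunks == 3  # matches the three expected statuses per case

# slice(0, 2) writes elements 0 and 1, which land entirely in chunk 0:
touched = {i // chunk for i in range(0, 2)}
assert touched == {0}
```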
