Commit eb89a55

Merge branch 'main' into ig/spec0_py314
2 parents 715ec2a + 5d92e85

8 files changed: 129 additions, 62 deletions
.github/workflows/needs_release_notes.yml (2 additions, 2 deletions)

```diff
@@ -1,8 +1,8 @@
 name: "Pull Request Labeler"

 on:
-  pull_request_target:
-    types: [opened, reopened, synchronize]
+  pull_request_target:
+    types: [opened, reopened, synchronize]

 jobs:
   labeler:
```
.github/workflows/releases.yml (1 addition, 28 deletions)

```diff
@@ -6,11 +6,6 @@ on:
   pull_request:
     branches: [main]
   workflow_dispatch:
-    inputs:
-      tag:
-        description: 'Git tag to build and publish (e.g. v3.1.6)'
-        required: true
-        type: string

 permissions:
   contents: read
@@ -21,26 +16,7 @@ concurrency:

 jobs:

-  validate_tag:
-    if: github.event_name == 'workflow_dispatch'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Validate tag format
-        run: |
-          if [[ ! "${{ inputs.tag }}" =~ ^v[0-9]+\.[0-9]+\.[0-9]+([a-z]+[0-9]*)?$ ]]; then
-            echo "::error::Invalid tag format '${{ inputs.tag }}'. Expected format: v1.2.3, v1.2.3a1, v1.2.3rc1"
-            exit 1
-          fi
-      - name: Verify tag exists
-        run: |
-          git ls-remote --tags "${{ github.server_url }}/${{ github.repository }}" "${{ inputs.tag }}" | grep -q "${{ inputs.tag }}" || {
-            echo "::error::Tag '${{ inputs.tag }}' does not exist in the repository"
-            exit 1
-          }
-
   build_artifacts:
-    needs: [validate_tag]
-    if: always() && (needs.validate_tag.result == 'success' || needs.validate_tag.result == 'skipped')
     name: Build wheel on ubuntu-latest
     runs-on: ubuntu-latest
     strategy:
@@ -49,7 +25,6 @@ jobs:
     steps:
       - uses: actions/checkout@v6
         with:
-          ref: ${{ inputs.tag || github.ref }}
           submodules: true
           fetch-depth: 0

@@ -86,9 +61,7 @@ jobs:
   upload_pypi:
     needs: [build_artifacts, test_dist_pypi]
     runs-on: ubuntu-latest
-    if: >-
-      (github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/v'))
-      || (github.event_name == 'workflow_dispatch' && startsWith(inputs.tag, 'v'))
+    if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags/v')
     environment:
       name: releases
       url: https://pypi.org/p/zarr
```
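For reference, the tag pattern enforced by the removed `validate_tag` step can be exercised outside the workflow. A minimal sketch, assuming Python's `re` module matches the same strings as the bash `[[ =~ ]]` extended regex used above:

```python
import re

# Same pattern as the removed "Validate tag format" step:
# v<major>.<minor>.<patch>, optionally followed by a pre-release
# suffix such as "a1" or "rc1".
TAG_RE = re.compile(r"^v[0-9]+\.[0-9]+\.[0-9]+([a-z]+[0-9]*)?$")


def is_valid_tag(tag: str) -> bool:
    """Return True if the tag matches the release-tag format."""
    return TAG_RE.match(tag) is not None


for tag, expected in [
    ("v3.1.6", True),
    ("v1.2.3a1", True),
    ("v1.2.3rc1", True),
    ("1.2.3", False),  # missing "v" prefix
    ("v1.2", False),   # missing patch component
]:
    assert is_valid_tag(tag) == expected
```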

changes/3828.misc.md (2 additions, new file)

```md
`CodecPipeline.read` and `CodecPipeline.read_batch` now return a tuple of typeddict objects
that each carry information about the request for a chunk from storage.
```
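Based on how `GetResult` is constructed and consumed elsewhere in this commit (`GetResult(status="present")`, `result["status"]`), the TypedDict's shape is presumably roughly the following sketch; the real definition lives in `zarr.abc.codec` and may carry additional fields:

```python
from typing import Literal, TypedDict


# Sketch of the TypedDict's apparent shape, inferred from its usage in
# this commit; not the authoritative definition.
class GetResult(TypedDict):
    # "present": chunk bytes were found and decoded into the output array;
    # "missing": the chunk was absent and the fill value was used instead.
    status: Literal["present", "missing"]


results: tuple[GetResult, ...] = (
    GetResult(status="present"),
    GetResult(status="missing"),
)
assert [r["status"] for r in results] == ["present", "missing"]
```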

changes/3836.doc.md (2 additions, new file)

```md
Corrects the type annotation reported for the `batch_info` parameter in the `CodecPipeline.write`
method docstring.
```

docs/user-guide/v3_migration.md (18 additions, 21 deletions)

```diff
@@ -198,32 +198,29 @@ after the 3.0.0 release. If features listed below are important to your use case
 of Zarr-Python, please open (or comment on) a
 [GitHub issue](https://github.com/zarr-developers/zarr-python/issues/new).

-- The following functions / methods have not been ported to Zarr-Python 3 yet:
+The following functions / methods have not been ported to Zarr-Python 3 yet:

-  * `zarr.copy` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
-  * `zarr.copy_all` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
-  * `zarr.copy_store` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
-  * `zarr.Group.move` ([issue #2108](https://github.com/zarr-developers/zarr-python/issues/2108))
+- `zarr.copy` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
+- `zarr.copy_all` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
+- `zarr.copy_store` ([issue #2407](https://github.com/zarr-developers/zarr-python/issues/2407))
+- `zarr.Group.move` ([issue #2108](https://github.com/zarr-developers/zarr-python/issues/2108))

-- The following features (corresponding to function arguments to functions in
+The following features (corresponding to function arguments to functions in
   `zarr`) have not been ported to Zarr-Python 3 yet. Using these features
   will raise a warning or a `NotImplementedError`:

-  * `cache_attrs`
-  * `cache_metadata`
-  * `chunk_store` ([issue #2495](https://github.com/zarr-developers/zarr-python/issues/2495))
-  * `meta_array`
-  * `object_codec` ([issue #2617](https://github.com/zarr-developers/zarr-python/issues/2617))
-  * `synchronizer` ([issue #1596](https://github.com/zarr-developers/zarr-python/issues/1596))
-  * `dimension_separator`
+- `cache_attrs`
+- `cache_metadata`
+- `chunk_store` ([issue #2495](https://github.com/zarr-developers/zarr-python/issues/2495))
+- `meta_array`
+- `object_codec` ([issue #2617](https://github.com/zarr-developers/zarr-python/issues/2617))
+- `synchronizer` ([issue #1596](https://github.com/zarr-developers/zarr-python/issues/1596))
+- `dimension_separator`

-- The following features that were supported by Zarr-Python 2 have not been ported
+The following features that were supported by Zarr-Python 2 have not been ported
   to Zarr-Python 3 yet:

-  * Structured arrays / dtypes ([issue #2134](https://github.com/zarr-developers/zarr-python/issues/2134))
-  * Fixed-length string dtypes ([issue #2347](https://github.com/zarr-developers/zarr-python/issues/2347))
-  * Datetime and timedelta dtypes ([issue #2616](https://github.com/zarr-developers/zarr-python/issues/2616))
-  * Object dtypes ([issue #2616](https://github.com/zarr-developers/zarr-python/issues/2616))
-  * Ragged arrays ([issue #2618](https://github.com/zarr-developers/zarr-python/issues/2618))
-  * Groups and Arrays do not implement `__enter__` and `__exit__` protocols ([issue #2619](https://github.com/zarr-developers/zarr-python/issues/2619))
-  * Default filters for object dtypes for Zarr format 2 arrays ([issue #2627](https://github.com/zarr-developers/zarr-python/issues/2627))
+- Object dtypes ([issue #2616](https://github.com/zarr-developers/zarr-python/issues/2616))
+- Ragged arrays ([issue #2618](https://github.com/zarr-developers/zarr-python/issues/2618))
+- Groups and Arrays do not implement `__enter__` and `__exit__` protocols ([issue #2619](https://github.com/zarr-developers/zarr-python/issues/2619))
+- Default filters for object dtypes for Zarr format 2 arrays ([issue #2627](https://github.com/zarr-developers/zarr-python/issues/2627))
```

src/zarr/abc/codec.py (9 additions, 3 deletions)

```diff
@@ -32,6 +32,7 @@
     "CodecInput",
     "CodecOutput",
     "CodecPipeline",
+    "GetResult",
     "SupportsSyncCodec",
 ]

@@ -429,13 +430,13 @@ async def read(
         batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
         out: NDBuffer,
         drop_axes: tuple[int, ...] = (),
-    ) -> None:
+    ) -> tuple[GetResult, ...]:
         """Reads chunk data from the store, decodes it and writes it into an output array.
         Partial decoding may be utilized if the codecs and stores support it.

         Parameters
         ----------
-        batch_info : Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple]]
+        batch_info : Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]]
             Ordered set of information about the chunks.
             The first slice selection determines which parts of the chunk will be fetched.
             The second slice selection determines where in the output array the chunk data will be written.
@@ -447,6 +448,11 @@ async def read(
             ``out``) to the fill value for the array.

         out : NDBuffer
+
+        Returns
+        -------
+        tuple[GetResult, ...]
+            One result per chunk in ``batch_info``.
         """
         ...

@@ -463,7 +469,7 @@ async def write(

         Parameters
         ----------
-        batch_info : Iterable[tuple[ByteSetter, ArraySpec, SelectorTuple, SelectorTuple]]
+        batch_info : Iterable[tuple[ByteSetter, ArraySpec, SelectorTuple, SelectorTuple, bool]]
             Ordered set of information about the chunks.
             The first slice selection determines which parts of the chunk will be encoded.
             The second slice selection determines where in the value array the chunk data is located.
```
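Since `read` now reports one result per chunk, callers can aggregate hit/miss statistics without a second pass over the store. A hypothetical caller-side sketch, with plain dicts standing in for the `GetResult` TypedDict:

```python
from collections import Counter

# Stand-in for the tuple returned by CodecPipeline.read: one entry per
# chunk in batch_info, in order.
results = (
    {"status": "present"},
    {"status": "missing"},
    {"status": "present"},
)

# Tally how many chunks were found versus filled with the fill value.
counts = Counter(r["status"] for r in results)
assert counts["present"] == 2
assert counts["missing"] == 1
```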

src/zarr/core/codec_pipeline.py (24 additions, 8 deletions)

```diff
@@ -13,6 +13,7 @@
     BytesBytesCodec,
     Codec,
     CodecPipeline,
+    GetResult,
 )
 from zarr.core.common import concurrent_map
 from zarr.core.config import config
@@ -248,47 +249,58 @@ async def read_batch(
         batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
         out: NDBuffer,
         drop_axes: tuple[int, ...] = (),
-    ) -> None:
+    ) -> tuple[GetResult, ...]:
+        results: list[GetResult] = []
         if self.supports_partial_decode:
+            batch_info_list = list(batch_info)
             chunk_array_batch = await self.decode_partial_batch(
                 [
                     (byte_getter, chunk_selection, chunk_spec)
-                    for byte_getter, chunk_spec, chunk_selection, *_ in batch_info
+                    for byte_getter, chunk_spec, chunk_selection, *_ in batch_info_list
                 ]
             )
             for chunk_array, (_, chunk_spec, _, out_selection, _) in zip(
-                chunk_array_batch, batch_info, strict=False
+                chunk_array_batch, batch_info_list, strict=False
             ):
                 if chunk_array is not None:
                     if drop_axes:
                         chunk_array = chunk_array.squeeze(axis=drop_axes)
                     out[out_selection] = chunk_array
+                    results.append(GetResult(status="present"))
                 else:
                     out[out_selection] = fill_value_or_default(chunk_spec)
+                    results.append(GetResult(status="missing"))
         else:
+            batch_info_list = list(batch_info)
             chunk_bytes_batch = await concurrent_map(
-                [(byte_getter, array_spec.prototype) for byte_getter, array_spec, *_ in batch_info],
+                [
+                    (byte_getter, array_spec.prototype)
+                    for byte_getter, array_spec, *_ in batch_info_list
+                ],
                 lambda byte_getter, prototype: byte_getter.get(prototype),
                 config.get("async.concurrency"),
             )
             chunk_array_batch = await self.decode_batch(
                 [
                     (chunk_bytes, chunk_spec)
                     for chunk_bytes, (_, chunk_spec, *_) in zip(
-                        chunk_bytes_batch, batch_info, strict=False
+                        chunk_bytes_batch, batch_info_list, strict=False
                     )
                 ],
             )
             for chunk_array, (_, chunk_spec, chunk_selection, out_selection, _) in zip(
-                chunk_array_batch, batch_info, strict=False
+                chunk_array_batch, batch_info_list, strict=False
             ):
                 if chunk_array is not None:
                     tmp = chunk_array[chunk_selection]
                     if drop_axes:
                         tmp = tmp.squeeze(axis=drop_axes)
                     out[out_selection] = tmp
+                    results.append(GetResult(status="present"))
                 else:
                     out[out_selection] = fill_value_or_default(chunk_spec)
+                    results.append(GetResult(status="missing"))
+        return tuple(results)

     def _merge_chunk_array(
         self,
@@ -468,15 +480,19 @@ async def read(
         batch_info: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]],
         out: NDBuffer,
         drop_axes: tuple[int, ...] = (),
-    ) -> None:
-        await concurrent_map(
+    ) -> tuple[GetResult, ...]:
+        batch_results = await concurrent_map(
             [
                 (single_batch_info, out, drop_axes)
                 for single_batch_info in batched(batch_info, self.batch_size)
             ],
             self.read_batch,
             config.get("async.concurrency"),
         )
+        results: list[GetResult] = []
+        for batch in batch_results:
+            results.extend(batch)
+        return tuple(results)

     async def write(
         self,
```
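One subtlety in the `read_batch` hunk above: `batch_info` is typed as an `Iterable`, so it may be a one-shot generator, and the method now iterates it more than once (once to build the decode batch, once to write results). The `batch_info_list = list(batch_info)` materialization guards against the second pass silently seeing nothing. A minimal sketch of the failure mode:

```python
def chunks():
    """A one-shot generator, as batch_info may be at the call site."""
    yield "chunk-0"
    yield "chunk-1"


batch_info = chunks()

first_pass = [c for c in batch_info]   # consumes the generator
second_pass = [c for c in batch_info]  # generator is already exhausted

assert first_pass == ["chunk-0", "chunk-1"]
assert second_pass == []  # silent data loss without list(...)

# Materializing once up front makes repeated iteration safe:
batch_info_list = list(chunks())
assert [c for c in batch_info_list] == [c for c in batch_info_list]
```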

tests/test_codec_pipeline.py (71 additions, new file)

```python
from __future__ import annotations

import pytest

import zarr
from zarr.core.buffer.core import default_buffer_prototype
from zarr.core.indexing import BasicIndexer
from zarr.storage import MemoryStore


@pytest.mark.parametrize(
    ("write_slice", "read_slice", "expected_statuses"),
    [
        # Write all chunks, read all — all present
        (slice(None), slice(None), ("present", "present", "present")),
        # Write first chunk only, read all — first present, rest missing
        (slice(0, 2), slice(None), ("present", "missing", "missing")),
        # Write nothing, read all — all missing
        (None, slice(None), ("missing", "missing", "missing")),
    ],
)
async def test_read_returns_get_results(
    write_slice: slice | None,
    read_slice: slice,
    expected_statuses: tuple[str, ...],
) -> None:
    """
    Test that CodecPipeline.read returns a tuple of GetResult with correct statuses.
    """
    store = MemoryStore()
    arr = zarr.open_array(store, mode="w", shape=(6,), chunks=(2,), dtype="int64", fill_value=-1)

    if write_slice is not None:
        arr[write_slice] = 0

    async_arr = arr._async_array
    pipeline = async_arr.codec_pipeline
    metadata = async_arr.metadata

    prototype = default_buffer_prototype()
    config = async_arr.config
    indexer = BasicIndexer(
        read_slice,
        shape=metadata.shape,
        chunk_grid=metadata.chunk_grid,
    )

    out_buffer = prototype.nd_buffer.empty(
        shape=indexer.shape,
        dtype=metadata.dtype.to_native_dtype(),
        order=config.order,
    )

    results = await pipeline.read(
        [
            (
                async_arr.store_path / metadata.encode_chunk_key(chunk_coords),
                metadata.get_chunk_spec(chunk_coords, config, prototype=prototype),
                chunk_selection,
                out_selection,
                is_complete_chunk,
            )
            for chunk_coords, chunk_selection, out_selection, is_complete_chunk in indexer
        ],
        out_buffer,
        drop_axes=indexer.drop_axes,
    )

    assert len(results) == len(expected_statuses)
    for result, expected_status in zip(results, expected_statuses, strict=True):
        assert result["status"] == expected_status
```
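The parametrization in this test depends on the array's chunk grid: shape `(6,)` with chunks `(2,)` yields three chunks, so writing `slice(0, 2)` touches only the first chunk and leaves the other two missing. A quick sanity check of that arithmetic:

```python
import math

shape, chunk = 6, 2
n_chunks = math.ceil(shape / chunk)
assert n_chunks == 3  # matches the three expected statuses per case

# slice(0, 2) writes elements 0 and 1, which land entirely in chunk 0:
touched = {i // chunk for i in range(0, 2)}
assert touched == {0}
```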
