Skip to content

Commit b740cf2

Browse files
d-v-bclaudeilan-goldmaxrjoneschuckwondo
authored
feat:get_ranges (#3925)
* feat(core): add _coalesce module skeleton with CoalesceOptions and stub * test(core): add failing tests for coalesced_get basic cases * feat(core): implement coalesced_get for basic sequential cases * test(core): cover Offset/Suffix/None and mixed-cluster cases in coalesced_get * feat(core): run coalesced fetches concurrently under max_concurrency * test(core): cover key-missing (start/mid) and fetch-raises in coalesced_get * test(core): cover max_coalesced_bytes cap in coalesced_get * test(core): add coverage-invariant property test for coalesced_get * test(core): drop unused HEAVY_MERGE/NO_MERGE constants Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(core): shorten coalesced_get docstring summary line Split the overlong first line into a short numpydoc summary plus an extended description. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(core): drop dead invariant checks in merged-group path After the input split at the top of coalesced_get, merged groups only ever contain RangeByteRequest members. Replace the per-element isinstance filters (and the defensive ``else 0`` sort-key branch) with a single assertion at the top of the merged-group block and direct attribute access. Also remove the unreachable ``if total == 0: return`` guard (``indexed`` is non-empty by construction once we pass the earlier guard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(core): cover key-missing on uncoalescable input Exercise the ``kind == "missing"`` branch in the uncoalescable single-fetch arm for Offset/Suffix/None inputs, which was not hit by existing tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): cancel pending fetches on early exit and stop-after-miss Two related correctness issues in coalesced_get's drain loop: 1. When the consumer breaks out of the async-for (early exit), the generator's finally block only awaited in-flight tasks rather than cancelling them. That wasted I/O. Cancel first, then gather. 2. The drain loop waited on completion_queue for ``total`` entries, but after a "missing" or "error" we cancel pending tasks -- and cancelled tasks never enqueue a completion. With max_concurrency > 1 this could hang. Rework the drain loop to break out immediately on the first miss/error; the finally block handles cleanup. The new structure also collapses the redundant miss/error branches and removes the now-unused ``total``/``drained``/``stopped`` bookkeeping. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(core): cover mid-stream miss with concurrency > 1 Exercises the concurrent path where a missing key is observed while other fetches are still in flight. Uses an asyncio.Event to gate late arrivals until after the miss has been processed, giving the drain loop an opportunity to observe and discard post-stop completions, and verifies the iterator terminates cleanly without hanging or raising. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(core): verify consumer-break cancels pending fetches Drives many slow ranges with a small max_concurrency, breaks out of the async-for after the first yield, and verifies that at least one still-running fetch was cancelled rather than being left to run to completion. Cancellation is observed via a counter in the fetch's CancelledError branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(core): type coalesced_get as AsyncGenerator coalesced_get is implemented as an async generator (uses yield) and callers need access to aclose() to drive its finally block deterministically. Declaring the return type as AsyncGenerator instead of AsyncIterator exposes aclose()/asend()/athrow() through the type system, matches the runtime object, and lets consumers (e.g. the consumer-break test) avoid type-ignore escape hatches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(test): drop redundant pytestmark asyncio pyproject asyncio_mode=auto already covers async test dispatch; the explicit pytestmark was a vestige. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(storage): add private SupportsGetRanges protocol * test(storage): add failing tests for FsspecStore.get_ranges * feat(storage): add FsspecStore.get_ranges and coalesce_options kwarg * test(storage): add SupportsGetRanges conformance tests * chore: add towncrier fragment for get_ranges Used 0000.feature.md as a placeholder; rename to {pr-number}.feature.md once the PR is opened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: drop private-symbol mention from get_ranges changelog The SupportsGetRanges protocol is private; a user-facing release note shouldn't advertise it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: refactor tests * test: skip fsspec-backed get_ranges tests on old fsspec The min_deps CI job pins fsspec to 2023.10.0, which predates AsyncFileSystemWrapper. Wrapping a sync MemoryFileSystem fails there at fixture setup. Guard the affected tests with the same skipif pattern already used in test_fsspec.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Update src/zarr/core/_coalesce.py Co-authored-by: Ilan Gold <ilanbassgold@gmail.com> * Update src/zarr/core/_coalesce.py Co-authored-by: Ilan Gold <ilanbassgold@gmail.com> * Update src/zarr/core/_coalesce.py Co-authored-by: Ilan Gold <ilanbassgold@gmail.com> * chore: lint * refactor: better function design with explicit context * refactor(coalesce): split coalescing from coalesced get * refactor(coalesce): use sequence instead of iterable * Update src/zarr/storage/_fsspec.py Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com> * refactor: banish typeddict, we are fine with kwargs * refactor: apply suggestions from code review * refactor: use drop queue in favor of as_completed; abort early with FileNotFoundError when get yields None * refactor: define get_ranges on the store abc * Update src/zarr/core/_coalesce.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * Apply suggestion from @chuckwondo Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * Apply suggestion from @chuckwondo Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * refactor: use function-scoped defaults * fix: fix failing test * refactor: defaults in one place * hoist imports * Update tests/test_coalesce.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * Update src/zarr/abc/store.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * Update src/zarr/core/_coalesce.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * Update src/zarr/core/_coalesce.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * Update tests/test_coalesce.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * apply suggestions from code review * Update src/zarr/core/_coalesce.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * raise exception group * raise exception group * Update src/zarr/core/_coalesce.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> * Update src/zarr/core/_coalesce.py Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Ilan Gold <ilanbassgold@gmail.com> Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com> Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com>
1 parent 7e58df0 commit b740cf2

7 files changed

Lines changed: 1252 additions & 2 deletions

File tree

changes/3925.feature.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Add `zarr.abc.store.Store.get_ranges` for concurrent, coalesced multi-range reads from a single key. The method is defined on the `Store` ABC with a default implementation built on `Store.get`, so every store inherits a working version; stores with native multi-range backends (e.g. `FsspecStore`) can override for efficiency. Coalescing knobs (`max_concurrency`, `max_gap_bytes`, `max_coalesced_bytes`) are passed as keyword arguments to `get_ranges`. Failures from underlying fetches surface as a `BaseExceptionGroup` (PEP 654); callers should use `except*` to filter for specific exception types such as `FileNotFoundError`.

src/zarr/abc/store.py

Lines changed: 62 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,13 +4,14 @@
44
import json
55
from abc import ABC, abstractmethod
66
from dataclasses import dataclass
7+
from functools import partial
78
from itertools import starmap
89
from typing import TYPE_CHECKING, Literal, Protocol, runtime_checkable
910

1011
from zarr.core.sync import sync
1112

1213
if TYPE_CHECKING:
13-
from collections.abc import AsyncGenerator, AsyncIterator, Iterable
14+
from collections.abc import AsyncGenerator, AsyncIterator, Iterable, Sequence
1415
from types import TracebackType
1516
from typing import Any, Self
1617

@@ -616,6 +617,66 @@ async def _get_many(
616617
for req in requests:
617618
yield (req[0], await self.get(*req))
618619

620+
async def get_ranges(
621+
self,
622+
key: str,
623+
byte_ranges: Sequence[ByteRequest | None],
624+
*,
625+
prototype: BufferPrototype,
626+
max_concurrency: int = 10,
627+
max_gap_bytes: int = 1 << 20, # 1 MiB
628+
max_coalesced_bytes: int = 16 << 20, # 16 MiB
629+
) -> AsyncIterator[Sequence[tuple[int, Buffer | None]]]:
630+
"""Read many byte ranges from `key`.
631+
632+
Yields one batch per underlying I/O operation, each a sequence of
633+
`(input_index, Buffer | None)` tuples. Batches across yields arrive in
634+
completion order, not input order. The default implementation built
635+
into `Store` runs the coalescer over `self.get`, so subclasses get a
636+
working implementation for free; stores that have a more efficient
637+
backend (e.g. ranged HTTP, S3 byte-range fetches) should override.
638+
639+
Parameters
640+
----------
641+
key
642+
Storage key to read from.
643+
byte_ranges
644+
Input ranges. `None` means "the whole value".
645+
prototype
646+
Buffer prototype, forwarded to `self.get`.
647+
max_concurrency
648+
Maximum number of merged fetches in flight at once.
649+
max_gap_bytes
650+
Two `RangeByteRequest`s separated by at most this many bytes may
651+
be merged into one fetch.
652+
max_coalesced_bytes
653+
Upper bound on the size of a single merged fetch.
654+
655+
Raises
656+
------
657+
BaseExceptionGroup
658+
Failures from underlying fetches are reported as a
659+
`BaseExceptionGroup` (PEP 654) and should be handled with
660+
`except*`. Inner exceptions include `FileNotFoundError` if any
661+
fetch returns `None` (i.e. `key` is absent), and any exception
662+
raised by `self.get` for the corresponding range. Pending
663+
fetches are cancelled as soon as one task fails, so the group
664+
typically contains a single non-`CancelledError` exception even
665+
under high concurrency.
666+
"""
667+
# Local import: zarr.core._coalesce imports symbols from this module.
668+
from zarr.core._coalesce import coalesced_get
669+
670+
fetch = partial(self.get, key, prototype)
671+
async for group in coalesced_get(
672+
fetch,
673+
byte_ranges,
674+
max_concurrency=max_concurrency,
675+
max_gap_bytes=max_gap_bytes,
676+
max_coalesced_bytes=max_coalesced_bytes,
677+
):
678+
yield group
679+
619680
async def getsize(self, key: str) -> int:
620681
"""
621682
Return the size, in bytes, of a value in a Store.

src/zarr/core/_coalesce.py

Lines changed: 222 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,222 @@
1+
# src/zarr/core/_coalesce.py
2+
from __future__ import annotations
3+
4+
import asyncio
5+
from typing import TYPE_CHECKING, NamedTuple
6+
7+
from zarr.abc.store import RangeByteRequest
8+
9+
if TYPE_CHECKING:
10+
from collections.abc import AsyncGenerator, Awaitable, Callable, Sequence
11+
12+
from zarr.abc.store import ByteRequest
13+
from zarr.core.buffer import Buffer
14+
15+
16+
class _WorkerCtx(NamedTuple):
17+
"""Shared state passed to the per-task worker coroutines.
18+
19+
Bundling these lets the workers declare their dependencies as one
20+
parameter instead of capturing them implicitly via closure.
21+
"""
22+
23+
fetch: Callable[[ByteRequest | None], Awaitable[Buffer | None]]
24+
semaphore: asyncio.Semaphore
25+
26+
27+
async def _fetch_single(
28+
ctx: _WorkerCtx, idx: int, req: ByteRequest | None
29+
) -> Sequence[tuple[int, Buffer | None]]:
30+
"""Fetch one byte range. Raises FileNotFoundError if the key is absent."""
31+
async with ctx.semaphore:
32+
buf = await ctx.fetch(req)
33+
if buf is None:
34+
raise FileNotFoundError
35+
return ((idx, buf),)
36+
37+
38+
async def _fetch_group(
39+
ctx: _WorkerCtx, members: list[tuple[int, RangeByteRequest]]
40+
) -> Sequence[tuple[int, Buffer | None]]:
41+
"""Fetch one merged byte range and slice it back into per-input buffers.
42+
43+
`members` must already be sorted by `start`; callers in this module
44+
build it from the sorted mergeable list. Raises `FileNotFoundError`
45+
if the key is absent.
46+
"""
47+
if len(members) == 1:
48+
solo_idx, solo_req = members[0]
49+
return await _fetch_single(ctx, solo_idx, solo_req)
50+
51+
start = members[0][1].start
52+
end = max(r.end for _, r in members)
53+
async with ctx.semaphore:
54+
big = await ctx.fetch(RangeByteRequest(start, end))
55+
if big is None:
56+
raise FileNotFoundError
57+
sliced = [(idx, big[r.start - start : r.end - start]) for idx, r in members]
58+
return tuple(sliced)
59+
60+
61+
def coalesce_ranges(
62+
byte_ranges: Sequence[ByteRequest | None],
63+
*,
64+
max_gap_bytes: int,
65+
max_coalesced_bytes: int,
66+
) -> tuple[
67+
list[list[tuple[int, RangeByteRequest]]],
68+
list[tuple[int, ByteRequest | None]],
69+
]:
70+
"""Plan a set of byte-range fetches: which inputs merge, which stand alone.
71+
72+
Pure (no I/O). The result is the I/O plan a caller would execute: each
73+
group corresponds to one fetch of a coalesced byte range, and each
74+
uncoalescable item corresponds to one fetch of the original request.
75+
76+
All tuning knobs are required keyword arguments. `Store.get_ranges` is
77+
the public entry point and owns the canonical default values; this
78+
function takes them explicitly to avoid duplicating policy.
79+
80+
Parameters
81+
----------
82+
byte_ranges
83+
Input ranges. `None` means "the whole value".
84+
max_gap_bytes
85+
Two `RangeByteRequest`s separated by at most this many bytes may be
86+
merged into one fetch.
87+
max_coalesced_bytes
88+
Upper bound on the size of a single merged fetch.
89+
90+
Returns
91+
-------
92+
groups
93+
List of merged groups. Each group is a list of
94+
`(input_index, RangeByteRequest)` pairs sorted by `start`. A
95+
single-element group represents a `RangeByteRequest` that did not
96+
merge with any neighbor.
97+
uncoalescable
98+
List of `(input_index, request)` pairs for inputs that are not
99+
`RangeByteRequest` (`OffsetByteRequest`, `SuffixByteRequest`,
100+
`None`). Indices are preserved from the input order.
101+
102+
Notes
103+
-----
104+
Only `RangeByteRequest` inputs participate in coalescing. Two ranges
105+
merge when both: their gap (next `start` minus current group's running
106+
`end`) is `<= max_gap_bytes`, and the resulting merged span is
107+
`<= max_coalesced_bytes`.
108+
"""
109+
indexed = list(enumerate(byte_ranges))
110+
mergeable = [(i, r) for i, r in indexed if isinstance(r, RangeByteRequest)]
111+
uncoalescable: list[tuple[int, ByteRequest | None]] = [
112+
(i, r) for i, r in indexed if not isinstance(r, RangeByteRequest)
113+
]
114+
115+
# Sort mergeables by start offset, then merge. Track running start/end of the
116+
# current group so each merge step is O(1) instead of O(group size).
117+
mergeable.sort(key=lambda pair: pair[1].start)
118+
groups: list[list[tuple[int, RangeByteRequest]]] = []
119+
group_start = 0
120+
group_end = 0
121+
for pair in mergeable:
122+
_i, r = pair
123+
if groups and r.start - group_end <= max_gap_bytes:
124+
prospective_end = max(group_end, r.end)
125+
if prospective_end - group_start <= max_coalesced_bytes:
126+
groups[-1].append(pair)
127+
group_end = prospective_end
128+
continue
129+
groups.append([pair])
130+
group_start = r.start
131+
group_end = r.end
132+
133+
return groups, uncoalescable
134+
135+
136+
async def coalesced_get(
137+
fetch: Callable[[ByteRequest | None], Awaitable[Buffer | None]],
138+
byte_ranges: Sequence[ByteRequest | None],
139+
*,
140+
max_concurrency: int,
141+
max_gap_bytes: int,
142+
max_coalesced_bytes: int,
143+
) -> AsyncGenerator[Sequence[tuple[int, Buffer | None]]]:
144+
"""Read many byte ranges through `fetch` with coalescing and concurrency.
145+
146+
Nearby ranges are merged into a single underlying I/O, and merged fetches
147+
are run concurrently. Each yield corresponds to exactly one underlying I/O
148+
operation: a sequence of `(input_index, result)` tuples for all input
149+
ranges served by that I/O. Tuples within a yielded sequence are ordered by
150+
start offset. Yields across groups are in completion order, not input
151+
order.
152+
153+
All tuning knobs are required keyword arguments. `Store.get_ranges` is
154+
the public entry point and owns the canonical default values; this
155+
function takes them explicitly to avoid duplicating policy.
156+
157+
Parameters
158+
----------
159+
fetch
160+
Callable that reads one byte range and returns a `Buffer` (or `None`
161+
if the underlying key does not exist). Typically constructed via
162+
`functools.partial(store.get, key, prototype)`.
163+
byte_ranges
164+
Input ranges. `None` means "the whole value".
165+
max_concurrency
166+
Maximum number of merged fetches in flight at once.
167+
max_gap_bytes
168+
Forwarded to `coalesce_ranges`.
169+
max_coalesced_bytes
170+
Forwarded to `coalesce_ranges`.
171+
172+
Yields
173+
------
174+
Sequence[tuple[int, Buffer | None]]
175+
Per-I/O batch of `(input_index, result)` tuples.
176+
177+
Notes
178+
-----
179+
- Only `RangeByteRequest` inputs are coalesced. `OffsetByteRequest`,
180+
`SuffixByteRequest`, and `None` are each treated as uncoalescable
181+
(one fetch, one single-tuple yield per input).
182+
- Failures from underlying fetches surface as a `BaseExceptionGroup`
183+
(PEP 654). Inner exceptions include `FileNotFoundError` if a fetch
184+
returns `None`, plus any exception `fetch` raises. Pending fetches are
185+
cancelled as soon as one task fails, so the group typically contains a
186+
single non-`CancelledError` exception even under high concurrency.
187+
- Groups completed before the failure remain observable on the yields
188+
preceding the raise.
189+
- `GeneratorExit` raised by `aclose()` is filtered out so the iterator
190+
closes cleanly; callers don't see a group containing only it.
191+
"""
192+
if not byte_ranges:
193+
return
194+
195+
groups, singles = coalesce_ranges(
196+
byte_ranges,
197+
max_gap_bytes=max_gap_bytes,
198+
max_coalesced_bytes=max_coalesced_bytes,
199+
)
200+
201+
ctx = _WorkerCtx(fetch=fetch, semaphore=asyncio.Semaphore(max_concurrency))
202+
203+
# Launch all work as tasks. The semaphore bounds actual I/O concurrency.
204+
# TaskGroup wraps task exceptions in BaseExceptionGroup; we propagate the
205+
# group unchanged as part of the public contract (callers handle batch
206+
# failures via `except*` / PEP 654). GeneratorExit (raised when the
207+
# consumer calls aclose()) is filtered out so close completes cleanly.
208+
try:
209+
async with asyncio.TaskGroup() as tg:
210+
tasks = [
211+
*(tg.create_task(_fetch_group(ctx, group)) for group in groups),
212+
*(tg.create_task(_fetch_single(ctx, i, single)) for i, single in singles),
213+
]
214+
215+
for fut in asyncio.as_completed(tasks):
216+
yield await fut
217+
except BaseExceptionGroup as eg:
218+
# Strip GeneratorExits (consumer aclose()) and propagate whatever remains.
219+
_, other_errors = eg.split(GeneratorExit)
220+
221+
if other_errors is not None:
222+
raise other_errors from None

src/zarr/storage/_wrapper.py

Lines changed: 27 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
from typing import TYPE_CHECKING, cast
44

55
if TYPE_CHECKING:
6-
from collections.abc import AsyncGenerator, AsyncIterator, Iterable
6+
from collections.abc import AsyncGenerator, AsyncIterator, Iterable, Sequence
77
from types import TracebackType
88
from typing import Any, Self
99

@@ -103,6 +103,32 @@ async def get_partial_values(
103103
) -> list[Buffer | None]:
104104
return await self._store.get_partial_values(prototype, key_ranges)
105105

106+
async def get_ranges(
107+
self,
108+
key: str,
109+
byte_ranges: Sequence[ByteRequest | None],
110+
*,
111+
prototype: BufferPrototype,
112+
max_concurrency: int | None = None,
113+
max_gap_bytes: int | None = None,
114+
max_coalesced_bytes: int | None = None,
115+
) -> AsyncIterator[Sequence[tuple[int, Buffer | None]]]:
116+
"""Forward `get_ranges` to the wrapped store.
117+
118+
Default values for the coalescing kwargs are not declared here; the
119+
wrapped store decides them. `None` means "don't override the wrapped
120+
store's default".
121+
"""
122+
kwargs: dict[str, int] = {}
123+
if max_concurrency is not None:
124+
kwargs["max_concurrency"] = max_concurrency
125+
if max_gap_bytes is not None:
126+
kwargs["max_gap_bytes"] = max_gap_bytes
127+
if max_coalesced_bytes is not None:
128+
kwargs["max_coalesced_bytes"] = max_coalesced_bytes
129+
async for group in self._store.get_ranges(key, byte_ranges, prototype=prototype, **kwargs):
130+
yield group
131+
106132
async def exists(self, key: str) -> bool:
107133
return await self._store.exists(key)
108134

0 commit comments

Comments
 (0)