Commit 82a2b0c

Merge branch 'main' into fix-obstore-listdir
2 parents 99990a8 + 0e61449 commit 82a2b0c

13 files changed, with 896 additions and 18 deletions.

.github/workflows/lint.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+name: Lint
+
+on:
+  push:
+    branches: [main, 3.1.x]
+  pull_request:
+    branches: [main, 3.1.x]
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  lint:
+    name: Lint
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+      - name: Install prek
+        run: uv tool install prek
+      - name: Run prek
+        run: prek run --all-files

.pre-commit-config.yaml

Lines changed: 2 additions & 1 deletion
@@ -1,8 +1,9 @@
 ci:
   autoupdate_commit_msg: "chore: update pre-commit hooks"
   autoupdate_schedule: "monthly"
-  autofix_commit_msg: "style: pre-commit fixes"
   autofix_prs: false
+  skip: [] # pre-commit.ci only checks for updates, prek runs hooks locally
+
 default_stages: [pre-commit, pre-push]
 
 default_language_version:

changes/3658.misc.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Switch from `pre-commit` to [`prek`](https://github.com/j178/prek) for pre-commit checks.

docs/contributing.md

Lines changed: 30 additions & 6 deletions
@@ -109,26 +109,50 @@ All tests are automatically run via GitHub Actions for every pull request and mu
 
 > **Note:** Previous versions of Zarr-Python made extensive use of doctests. These tests were not maintained during the 3.0 refactor but may be brought back in the future. See issue #2614 for more details.
 
-### Code standards - using pre-commit
+### Code standards - using prek
 
 All code must conform to the PEP8 standard. Regarding line length, lines up to 100 characters are allowed, although please try to keep under 90 wherever possible.
 
-`Zarr` uses a set of `pre-commit` hooks and the `pre-commit` bot to format, type-check, and prettify the codebase. `pre-commit` can be installed locally by running:
+`Zarr` uses a set of git hooks managed by [`prek`](https://github.com/j178/prek), a fast, Rust-based pre-commit hook manager that is fully compatible with `.pre-commit-config.yaml` files. `prek` can be installed locally by running:
 
 ```bash
-python -m pip install pre-commit
+uv tool install prek
+```
+
+or:
+
+```bash
+pip install prek
 ```
 
 The hooks can be installed locally by running:
 
 ```bash
-pre-commit install
+prek install
+```
+
+This would run the checks every time a commit is created locally. The checks will by default only run on the files modified by a commit, but the checks can be triggered for all the files by running:
+
+```bash
+prek run --all-files
+```
+
+You can also run hooks only for files in a specific directory:
+
+```bash
+prek run --directory src/zarr
+```
+
+Or run hooks for files changed in the last commit:
+
+```bash
+prek run --last-commit
 ```
 
-This would run the checks every time a commit is created locally. These checks will also run on every commit pushed to an open PR, resulting in some automatic styling fixes by the `pre-commit` bot. The checks will by default only run on the files modified by a commit, but the checks can be triggered for all the files by running:
+To list all available hooks:
 
 ```bash
-pre-commit run --all-files
+prek list
 ```
 
 If you would like to skip the failing checks and push the code for further discussion, use the `--no-verify` option with `git commit`.

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,7 @@ maintainers = [
     { name = "Deepak Cherian" }
 ]
 requires-python = ">=3.11"
-# If you add a new dependency here, please also add it to .pre-commit-config.yml
+# If you add a new dependency here, please also add it to .pre-commit-config.yaml
 dependencies = [
     'packaging>=22.0',
     'numpy>=2.0',
@@ -428,6 +428,7 @@ markers = [
 ignore = [
     "PC111", # fix Python code in documentation - enable later
     "PC180", # for JavaScript - not interested
+    "PC902", # pre-commit.ci custom autofix message - not using autofix
 ]
 
 [tool.numpydoc_validation]

src/zarr/abc/store.py

Lines changed: 210 additions & 2 deletions
@@ -1,11 +1,14 @@
 from __future__ import annotations
 
+import asyncio
+import json
 from abc import ABC, abstractmethod
-from asyncio import gather
 from dataclasses import dataclass
 from itertools import starmap
 from typing import TYPE_CHECKING, Literal, Protocol, runtime_checkable
 
+from zarr.core.sync import sync
+
 if TYPE_CHECKING:
     from collections.abc import AsyncGenerator, AsyncIterator, Iterable
     from types import TracebackType
@@ -206,6 +209,211 @@ async def get(
         """
         ...
 
+    async def _get_bytes(
+        self, key: str, *, prototype: BufferPrototype, byte_range: ByteRequest | None = None
+    ) -> bytes:
+        """
+        Retrieve raw bytes from the store asynchronously.
+
+        This is a convenience method that wraps ``get()`` and converts the result
+        to bytes. Use this when you need the raw byte content of a stored value.
+
+        Parameters
+        ----------
+        key : str
+            The key identifying the data to retrieve.
+        prototype : BufferPrototype
+            The buffer prototype to use for reading the data.
+        byte_range : ByteRequest, optional
+            If specified, only retrieve a portion of the stored data.
+            Can be a ``RangeByteRequest``, ``OffsetByteRequest``, or ``SuffixByteRequest``.
+
+        Returns
+        -------
+        bytes
+            The raw bytes stored at the given key.
+
+        Raises
+        ------
+        FileNotFoundError
+            If the key does not exist in the store.
+
+        See Also
+        --------
+        get : Lower-level method that returns a Buffer object.
+        get_bytes : Synchronous version of this method.
+        get_json : Asynchronous method for retrieving and parsing JSON data.
+
+        Examples
+        --------
+        >>> store = await MemoryStore.open()
+        >>> await store.set("data", Buffer.from_bytes(b"hello world"))
+        >>> data = await store.get_bytes("data", prototype=default_buffer_prototype())
+        >>> print(data)
+        b'hello world'
+        """
+        buffer = await self.get(key, prototype, byte_range)
+        if buffer is None:
+            raise FileNotFoundError(key)
+        return buffer.to_bytes()
+
+    def _get_bytes_sync(
+        self, key: str = "", *, prototype: BufferPrototype, byte_range: ByteRequest | None = None
+    ) -> bytes:
+        """
+        Retrieve raw bytes from the store synchronously.
+
+        This is a synchronous wrapper around ``get_bytes()``. It should only
+        be called from non-async code. For async contexts, use ``get_bytes()``
+        instead.
+
+        Parameters
+        ----------
+        key : str, optional
+            The key identifying the data to retrieve. Defaults to an empty string.
+        prototype : BufferPrototype
+            The buffer prototype to use for reading the data.
+        byte_range : ByteRequest, optional
+            If specified, only retrieve a portion of the stored data.
+            Can be a ``RangeByteRequest``, ``OffsetByteRequest``, or ``SuffixByteRequest``.
+
+        Returns
+        -------
+        bytes
+            The raw bytes stored at the given key.
+
+        Raises
+        ------
+        FileNotFoundError
+            If the key does not exist in the store.
+
+        Warnings
+        --------
+        Do not call this method from async functions. Use ``get_bytes()`` instead
+        to avoid blocking the event loop.
+
+        See Also
+        --------
+        get_bytes : Asynchronous version of this method.
+        get_json_sync : Synchronous method for retrieving and parsing JSON data.
+
+        Examples
+        --------
+        >>> store = MemoryStore()
+        >>> await store.set("data", Buffer.from_bytes(b"hello world"))
+        >>> data = store.get_bytes_sync("data", prototype=default_buffer_prototype())
+        >>> print(data)
+        b'hello world'
+        """
+
+        return sync(self._get_bytes(key, prototype=prototype, byte_range=byte_range))
+
+    async def _get_json(
+        self, key: str, *, prototype: BufferPrototype, byte_range: ByteRequest | None = None
+    ) -> Any:
+        """
+        Retrieve and parse JSON data from the store asynchronously.
+
+        This is a convenience method that retrieves bytes from the store and
+        parses them as JSON.
+
+        Parameters
+        ----------
+        key : str
+            The key identifying the JSON data to retrieve.
+        prototype : BufferPrototype
+            The buffer prototype to use for reading the data.
+        byte_range : ByteRequest, optional
+            If specified, only retrieve a portion of the stored data.
+            Can be a ``RangeByteRequest``, ``OffsetByteRequest``, or ``SuffixByteRequest``.
+            Note: Using byte ranges with JSON may result in invalid JSON.
+
+        Returns
+        -------
+        Any
+            The parsed JSON data. This follows the behavior of ``json.loads()`` and
+            can be any JSON-serializable type: dict, list, str, int, float, bool, or None.
+
+        Raises
+        ------
+        FileNotFoundError
+            If the key does not exist in the store.
+        json.JSONDecodeError
+            If the stored data is not valid JSON.
+
+        See Also
+        --------
+        get_bytes : Method for retrieving raw bytes.
+        get_json_sync : Synchronous version of this method.
+
+        Examples
+        --------
+        >>> store = await MemoryStore.open()
+        >>> metadata = {"zarr_format": 3, "node_type": "array"}
+        >>> await store.set("zarr.json", Buffer.from_bytes(json.dumps(metadata).encode()))
+        >>> data = await store.get_json("zarr.json", prototype=default_buffer_prototype())
+        >>> print(data)
+        {'zarr_format': 3, 'node_type': 'array'}
+        """
+
+        return json.loads(await self._get_bytes(key, prototype=prototype, byte_range=byte_range))
+
+    def _get_json_sync(
+        self, key: str = "", *, prototype: BufferPrototype, byte_range: ByteRequest | None = None
+    ) -> Any:
+        """
+        Retrieve and parse JSON data from the store synchronously.
+
+        This is a synchronous wrapper around ``get_json()``. It should only
+        be called from non-async code. For async contexts, use ``get_json()``
+        instead.
+
+        Parameters
+        ----------
+        key : str, optional
+            The key identifying the JSON data to retrieve. Defaults to an empty string.
+        prototype : BufferPrototype
+            The buffer prototype to use for reading the data.
+        byte_range : ByteRequest, optional
+            If specified, only retrieve a portion of the stored data.
+            Can be a ``RangeByteRequest``, ``OffsetByteRequest``, or ``SuffixByteRequest``.
+            Note: Using byte ranges with JSON may result in invalid JSON.
+
+        Returns
+        -------
+        Any
+            The parsed JSON data. This follows the behavior of ``json.loads()`` and
+            can be any JSON-serializable type: dict, list, str, int, float, bool, or None.
+
+        Raises
+        ------
+        FileNotFoundError
+            If the key does not exist in the store.
+        json.JSONDecodeError
+            If the stored data is not valid JSON.
+
+        Warnings
+        --------
+        Do not call this method from async functions. Use ``get_json()`` instead
+        to avoid blocking the event loop.
+
+        See Also
+        --------
+        get_json : Asynchronous version of this method.
+        get_bytes_sync : Synchronous method for retrieving raw bytes without parsing.
+
+        Examples
+        --------
+        >>> store = MemoryStore()
+        >>> metadata = {"zarr_format": 3, "node_type": "array"}
+        >>> store.set("zarr.json", Buffer.from_bytes(json.dumps(metadata).encode()))
+        >>> data = store.get_json_sync("zarr.json", prototype=default_buffer_prototype())
+        >>> print(data)
+        {'zarr_format': 3, 'node_type': 'array'}
+        """
+
+        return sync(self._get_json(key, prototype=prototype, byte_range=byte_range))
+
     @abstractmethod
     async def get_partial_values(
         self,
@@ -278,7 +486,7 @@ async def _set_many(self, values: Iterable[tuple[str, Buffer]]) -> None:
         """
         Insert multiple (key, value) pairs into storage.
         """
-        await gather(*starmap(self.set, values))
+        await asyncio.gather(*starmap(self.set, values))
 
     @property
     def supports_consolidated_metadata(self) -> bool: